Comparing custom values in pandas

Date: 2021-05-14 11:23:29

Tags: python pandas dataframe

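I want to compare grades in pandas, but the grades are not numeric. In a new column [keep], I want to mark the row(s) holding the highest grade for each VIP_CODE with "yes", and write something else in [keep] for the other rows that share that code. The grade ranking is Gold > Silver > Bronze.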

Sample CSV:

VIP_CODE|Grade
123|Gold
321|Silver
123|Gold
321|Bronze
456|Silver
456|Gold
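The sample above is pipe-delimited, so `pd.read_csv` needs `sep='|'` to split it into the two columns. A minimal sketch, assuming the data is inlined as a string purely for illustration (with a real file you would pass its path instead) and that the grades are spelled Gold, Silver and Bronze:

```python
import io

import pandas as pd

# Sample data inlined as a string for illustration only; with a real file,
# pass its path to pd.read_csv instead of wrapping a string in StringIO.
raw = """VIP_CODE|Grade
123|Gold
321|Silver
123|Gold
321|Bronze
456|Silver
456|Gold
"""

# sep='|' tells pandas the fields are separated by pipes, not commas
df = pd.read_csv(io.StringIO(raw), sep='|')
print(df)
```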

Expected result:

VIP_CODE|Grade|keep
123|Gold|yes
321|Silver|yes
123|Gold|yes
321|Bronze|dup by 321
456|Silver|dup by 456
456|Gold|yes

Any help is much appreciated.

1 answer:

Answer (score: 2)

Try something like:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'VIP_CODE': {0: 123, 1: 321, 2: 123, 3: 321, 4: 456, 5: 456},
    'Grade': {0: 'Gold', 1: 'Silver', 2: 'Gold', 3: 'Bronze',
              4: 'Silver', 5: 'Gold'}
})

# Assign Numerical Value To Each Grade
df['weight'] = df['Grade'].map({'Gold': 2, 'Silver': 1, 'Bronze': 0})
# Get Max For Each Group
df['max'] = df.groupby('VIP_CODE')['weight'].transform('max')
# Where weight is max for group
df['keep'] = np.where(
    df['max'] == df['weight'],
    'yes',
    'dup by ' + df['VIP_CODE'].astype(str)
)

# Drop extra columns
df = df.drop(columns=['weight', 'max'])

# For Display
print(df.to_csv(sep='|', index=False))

Output:

VIP_CODE|Grade|keep
123|Gold|yes
321|Silver|yes
123|Gold|yes
321|Bronze|dup by 321
456|Silver|dup by 456
456|Gold|yes
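A variation on the same idea (not part of the original answer) is to store Grade as an ordered categorical, so the Gold > Silver > Bronze ranking lives in the dtype itself and the category codes (Bronze=0, Silver=1, Gold=2) replace the hand-written weight map. This sketch assumes every Grade value is one of the three listed categories; any other spelling would become NaN with code -1 and never match the group maximum.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'VIP_CODE': [123, 321, 123, 321, 456, 456],
    'Grade': ['Gold', 'Silver', 'Gold', 'Bronze', 'Silver', 'Gold'],
})

# Categories are listed from lowest to highest; ordered=True enables < and >
grade_type = pd.CategoricalDtype(['Bronze', 'Silver', 'Gold'], ordered=True)
df['Grade'] = df['Grade'].astype(grade_type)

# Ordered categoricals can be compared directly against a category label
print((df['Grade'] > 'Silver').tolist())

# Category codes follow the declared order: Bronze=0, Silver=1, Gold=2
codes = df['Grade'].cat.codes
# Broadcast each VIP_CODE group's highest code back onto every row
group_max = codes.groupby(df['VIP_CODE']).transform('max')

# Rows holding the group's best grade get 'yes'; ties (e.g. two Golds) all do
df['keep'] = np.where(
    codes == group_max,
    'yes',
    'dup by ' + df['VIP_CODE'].astype(str),
)
print(df.to_csv(sep='|', index=False))
```

With this layout, adding a new tier only means extending the category list rather than renumbering a dictionary of weights.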