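I want to compare grades in Pandas, but the grades are not numbers. For each VIP_CODE, I want a new column [keep] that marks the row(s) with the highest grade, while the other rows with the same code get some other note in [keep]. The grade order is Gold > Silver > Bronze.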
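One way to make text grades comparable in pandas is an ordered categorical, where the category order encodes the ranking. A minimal sketch, assuming the three labels Bronze, Silver and Gold are the only grades:

```python
import pandas as pd

# Categories are listed from lowest to highest rank; ordered=True enables <, >, min and max.
grades = pd.Series(
    pd.Categorical(
        ["Gold", "Silver", "Bronze"],
        categories=["Bronze", "Silver", "Gold"],
        ordered=True,
    )
)

print(grades > "Bronze")  # True, True, False
print(grades.max())       # Gold
print(grades.cat.codes)   # integer ranks: 2, 1, 0
```

The integer `cat.codes` follow the category order, so they can stand in for a hand-written numeric weight.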
Sample CSV:
VIP_CODE|Grade
123|Gold
321|Silver
123|Gold
321|Bronze
456|Silver
456|Gold
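Because the sample is pipe-delimited, it can be loaded with `read_csv` by passing `sep="|"`. A minimal sketch, assuming the data is pasted in as a string (with the grade spelled "Silver"); for a real file, the string buffer would be replaced by a path such as the hypothetical `"grades.csv"`:

```python
import io

import pandas as pd

# The sample from the question; in practice this would come from a file.
raw = """VIP_CODE|Grade
123|Gold
321|Silver
123|Gold
321|Bronze
456|Silver
456|Gold
"""

# sep="|" tells read_csv that columns are separated by pipes, not commas.
df = pd.read_csv(io.StringIO(raw), sep="|")
print(df)
```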
Expected result:
VIP_CODE|Grade|keep
123|Gold|yes
321|Silver|yes
123|Gold|yes
321|Bronze|dup by 321
456|Silver|dup by 456
456|Gold|yes
Any help is much appreciated.
Answer (score: 2)
Try something like:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'VIP_CODE': {0: 123, 1: 321, 2: 123, 3: 321, 4: 456, 5: 456},
    'Grade': {0: 'Gold', 1: 'Silver', 2: 'Gold', 3: 'Bronze',
              4: 'Silver', 5: 'Gold'}
})
# Assign Numerical Value To Each Grade
df['weight'] = df['Grade'].map({'Gold': 2, 'Silver': 1, 'Bronze': 0})
# Get Max For Each Group
df['max'] = df.groupby('VIP_CODE')['weight'].transform('max')
# 'yes' where the row's weight equals its group's max, else 'dup by <VIP_CODE>'
df['keep'] = np.where(
    df['max'] == df['weight'],
    'yes',
    'dup by ' + df['VIP_CODE'].astype(str)
)
# Drop extra columns
df = df.drop(columns=['weight', 'max'])
# For Display
print(df.to_csv(sep='|', index=False))
Resulting df:
VIP_CODE|Grade|keep
123|Gold|yes
321|Silver|yes
123|Gold|yes
321|Bronze|dup by 321
456|Silver|dup by 456
456|Gold|yes
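The same idea can be written with an ordered `CategoricalDtype` instead of a hand-written `map`. This is an alternative sketch, not the answer's code; the check for unknown labels is an added assumption. It matters because the answer's `map` turns any unmapped label (for example a misspelling like "Sliver") into NaN, and NaN never equals the group max, so such rows would silently be marked "dup by".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "VIP_CODE": [123, 321, 123, 321, 456, 456],
    "Grade": ["Gold", "Silver", "Gold", "Bronze", "Silver", "Gold"],
})

# Rank order from lowest to highest; labels outside this list become NaN.
rank = pd.CategoricalDtype(["Bronze", "Silver", "Gold"], ordered=True)
codes = df["Grade"].astype(rank).cat.codes  # Bronze=0, Silver=1, Gold=2, unknown=-1

# Fail loudly on unknown labels instead of silently ranking them lowest.
if (codes == -1).any():
    bad = df.loc[codes == -1, "Grade"].unique()
    raise ValueError(f"Unknown grade labels: {list(bad)}")

# Highest code per VIP_CODE, broadcast back to every row of that group.
group_max = codes.groupby(df["VIP_CODE"]).transform("max")

df["keep"] = np.where(
    codes == group_max,
    "yes",
    "dup by " + df["VIP_CODE"].astype(str),
)
print(df.to_csv(sep="|", index=False))
```

Comparing each row against the group maximum keeps ties, so both `123|Gold` rows stay "yes", which matches the expected result; an approach based on `duplicated()` would mark the second one as a duplicate.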