For a given DataFrame df:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'name': ['Peter', 'Max', None],
    'age': [50.0, np.nan, 60.0]
})
I want to groupby id and merge the rows of each group when the only other values in a column are None or nan, so that the resulting df looks like:
       age  id   name
id
1  0  50.0   1  Peter
2  1  60.0   2    Max
Is there a better solution than this:
def f(df):
    # Collect the distinct non-null names in this group.
    names = set(df['name']) - {None}
    if len(names) == 1:
        df['name'] = names.pop()
    else:
        print('Error: Names are not mergeable:', names)
    # Collect the distinct non-NaN ages in this group.
    ages = {age for age in df['age'] if not np.isnan(age)}
    if len(ages) == 1:
        df['age'] = ages.pop()
    else:
        print('Error: Ages are not mergeable:', ages)
    # After filling, the rows of a mergeable group are identical.
    df = df.drop_duplicates()
    return df

df.groupby('id').apply(f)
Answer 0 (score: 1)
Use groupby + first:
df.groupby('id').first()
Out[877]:
     age   name
id
1   50.0  Peter
2   60.0    Max
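The reason this works is that first() skips null entries by default, so it picks the first non-null value of each column within each group. Unlike the function in the question, it does not report groups that contain two different real values. A minimal sketch of adding that check follows; merge_groups is a hypothetical helper name, not part of pandas, and raising ValueError instead of printing is an assumption made for illustration.

```python
import numpy as np
import pandas as pd


def merge_groups(frame, key):
    """Hypothetical helper: collapse each group to one row, or fail on conflicts."""
    grouped = frame.groupby(key)
    value_cols = [c for c in frame.columns if c != key]
    # nunique() skips None/NaN by default, so a count above 1 means a group
    # holds two different real values in the same column.
    conflicts = grouped[value_cols].nunique() > 1
    if conflicts.any().any():
        raise ValueError(f"Groups are not mergeable:\n{conflicts}")
    # first() returns the first non-null entry of each column per group.
    return grouped.first()


df = pd.DataFrame({
    'id': [1, 2, 2],
    'name': ['Peter', 'Max', None],
    'age': [50.0, np.nan, 60.0],
})
merged = merge_groups(df, 'id')
print(merged)
```

If silent merging is acceptable, the plain df.groupby('id').first() call is enough and the conflict check can be dropped.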
Answer 1 (score: 1)
This is probably the slowest solution: you can sort the NaNs to the end of each column and drop them inside the groupby, i.e.
df = pd.DataFrame({
    'id': [1, 2, 2, 1, 2],
    'name': ['Peter', 'Max', None, 'Daniel', 'Sign'],
    'age': [50.0, np.nan, 60.0, 40, 30]
})
#     age  id    name
# 0  50.0   1   Peter
# 1   NaN   2     Max
# 2  60.0   2    None
# 3  40.0   1  Daniel
# 4  30.0   2    Sign
df.groupby('id').apply(lambda x: x.apply(sorted, key=pd.isnull).dropna()).reset_index(drop=True)
    age  id    name
0  50.0   1   Peter
1  40.0   1  Daniel
2  60.0   2     Max
3  30.0   2    Sign
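The one-liner above is dense, so here is a sketch of the same idea written out step by step. It loops over the groups explicitly instead of using groupby.apply, which is a swapped-in formulation rather than the answer's exact code; push_nulls_down is a hypothetical helper name. Note that each column is sorted independently, so values are paired by position after sorting (Max ends up with 60.0 even though they came from different rows).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 1, 2],
    'name': ['Peter', 'Max', None, 'Daniel', 'Sign'],
    'age': [50.0, np.nan, 60.0, 40, 30],
})


def push_nulls_down(group):
    """Hypothetical helper: move nulls to the bottom of every column, then drop them."""
    # sorted() is stable and False < True, so non-null values keep their
    # original order and nulls end up last in each column.
    compacted = {
        col: sorted(group[col], key=pd.isnull)
        for col in group.columns
    }
    # Rows that still contain a null are the leftover padding.
    return pd.DataFrame(compacted).dropna()


result = pd.concat(
    [push_nulls_down(group) for _, group in df.groupby('id')],
    ignore_index=True,
)
print(result)
```

With this input, group 2 keeps two rows because it holds two distinct non-null names and ages, which is why the result has four rows rather than one per id.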