Given the pandas DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'x': pd.Series([1.0, 1.0, 2.0, 1.0, 2.0]),
    'y': pd.Series([6.0, 7.0, 8.0, 9.0, 10.0]),
    'z': pd.Series([3, 2, 1, 0, 0])
})
grpd = df.groupby(['clients']).agg({
    'x': [np.sum, np.average],
    'y': [np.sum, np.average],
    'z': [np.sum, np.average]
})
In[55]: grpd
Out[53]:
          y           x            z
        sum average sum  average sum average
clients
A        21     7.0   4 1.333333   6       2
B        19     9.5   3 1.500000   0       0
How can I create a new column by applying a function to selected sub-columns?
The desired result is:
          y           x            z         new_col
        sum average sum  average sum average
clients
A        21     7.0   4 1.333333   6       2    0.19
B        19     9.5   3 1.500000   0       0    0.15
I have something like this in mind:
grpd['new_col'] = grpd[['x', 'y']].apply(lambda x: x[0]['sum'] / x[1]['sum'], axis=1)
Answer 0 (score: 0)
You can use a vectorized version of the operation:
grpd['new_col'] = grpd[('x', 'sum')]/grpd[('y', 'sum')]
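Or, for consistency (so that new_col has sum as its second-level index, just like x and y), assign to the key ('new_col', 'sum') rather than to 'new_col'.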
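A minimal, self-contained sketch of that consistent variant. It assumes the string aggregators 'sum' and 'mean' in place of np.sum and np.average, so the second-level label reads mean rather than average; the z column is left out for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'x': [1.0, 1.0, 2.0, 1.0, 2.0],
    'y': [6.0, 7.0, 8.0, 9.0, 10.0],
})

# Assumption: string aggregator names instead of np.sum / np.average.
# The result has two-level columns such as ('x', 'sum') and ('x', 'mean').
grpd = df.groupby('clients').agg({
    'x': ['sum', 'mean'],
    'y': ['sum', 'mean'],
})

# Assigning with a full (level-0, level-1) tuple gives the new column
# the same two-level shape as the existing ones.
grpd[('new_col', 'sum')] = grpd[('x', 'sum')] / grpd[('y', 'sum')]

print(grpd)
```

Assigning to the plain key 'new_col' also works, but the second-level label of that column is then an empty string.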