Pandas: operating on a column after a groupby aggregation

Posted: 2017-01-19 14:51:19

Tags: pandas

Given the following df, I want to group by column A and divide column D by the maximum of D within each group of A.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

I tried something like this:
max_by_id = df.groupby('A')['D'].max()
df = df.set_index('A')
df['D'] /= max_by_id.reset_index()['D']

but this gives me:

ValueError: cannot reindex from a duplicate axis

1 answer:

Answer 0 (score: 2)

The max computed by the aggregation on the groupby object has a reduced index (one row per group, labelled by A), hence the error. If you want to divide the original df column by the aggregate, call transform on the groupby object instead, so that the index aligns with df:

In [192]:    
df['D'].div(df.groupby('A')['D'].transform('max'))

Out[192]:
0   -0.601098
1   -0.553823
2   -0.408006
3    1.000000
4    0.312029
5    0.709397
6    1.000000
7    0.140932
Name: D, dtype: float64
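
A minimal, self-contained sketch of the transform approach, assuming made-up deterministic values in place of the random data (columns B and C are left out since they play no role). It also keeps the plain max aggregation so the two index shapes can be compared side by side.

```python
import pandas as pd

# Made-up deterministic values standing in for the random data in the question
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'D': [1.0, 2.0, 4.0, 5.0, 2.0, 4.0, 8.0, -4.0]})

# Plain aggregation: one row per group, indexed by A ('bar', 'foo')
agg = df.groupby('A')['D'].max()

# transform('max'): the group max repeated on every row, indexed like df
group_max = df.groupby('A')['D'].transform('max')

# The indexes match, so the division lines up row by row
df['D_scaled'] = df['D'].div(group_max)

print(agg)
print(df)
```

To overwrite D in place as in the question, df['D'] /= df.groupby('A')['D'].transform('max') does the same thing.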

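You can see the difference between transform and the plain aggregation:

In [193]:
df.groupby('A')['D'].transform('max')

Out[193]:
0    1.508660
1    1.378085
2    1.508660
3    1.378085
4    1.508660
5    1.378085
6    1.508660
7    1.508660
Name: D, dtype: float64

In [194]:
df.groupby('A')['D'].max()

Out[194]:
A
bar    1.378085
foo    1.508660
Name: D, dtype: float64

transform repeats each group's max on every original row, so the result has the same index as df, whereas max returns one value per group, indexed by A.

Also, when you call reset_index on the aggregate and then select D, the original A labels are dropped and the result is simply indexed 0, 1: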


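In [198]:
max_by_id.reset_index()['D']

Out[198]:
0    0.215997
1    0.962928
Name: D, dtype: float64

But before that you had already set the index of df to column 'A', so the two indexes (0, 1 versus the repeated foo/bar labels) do not line up, and this fails: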

df['D'] /= max_by_id.reset_index()['D']

Alternatively, you can do this in a single step by calling apply with a lambda on the grouped D column.

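A sketch of the apply-with-lambda variant, again on made-up deterministic data. The group_keys=False argument is an assumption added here, not part of the original answer: on recent pandas versions the default group_keys=True makes apply prepend the group key to the result's index, which would stop it from aligning with df.

```python
import pandas as pd

# Made-up deterministic values standing in for the random data in the question
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'D': [1.0, 2.0, 4.0, 5.0, 2.0, 4.0, 8.0, -4.0]})

# apply calls the lambda once per group; x is that group's slice of column D,
# so x / x.max() divides every value by its own group's maximum.
# group_keys=False keeps the original row labels (no group key added to the index).
scaled = df.groupby('A', group_keys=False)['D'].apply(lambda x: x / x.max())

# Assignment aligns on the index, so each value lands back on its original row
df['D'] = scaled
print(df)
```

For this kind of same-shaped result, df.groupby('A')['D'].transform(lambda x: x / x.max()) is an equivalent alternative that always returns the original index.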