拥有以下数据框,并希望添加一个新列,指示groupby操作中与max关联的值。所以我的DF会有很多这样的记录,我只想循环遍历每个组,类似于df.groupby('B')['C'].mean().idxmax()
,但是遍历整个DF然后创建一个新的col,结果如下:
A B C D
0 05641 KMPV 45 056
1 05649 KMPV 45 056
2 05650 KMPV 45 056
3 05651 KMPV 45 056
4 05653 KMPV 45 056
5 05678 KMPV 45 056
6 05654 KMPV 45 056
7 05609 KMPV 45 056
8 05620 KMPV 45 056
9 05633 KMPV 45 056
10 05601 KMPV 45 056
11 05602 KMPV 45 056
12 05603 KMPV 45 056
13 05662 KMVL 23 056
14 05665 KMVL 23 056
15 05672 KMVL 23 056
16 05677 KMVL 23 056
17 05648 KMVL 23 056
18 05652 KMVL 23 056
19 05680 KMVL 23 056
20 05655 KMVL 23 056
21 05656 KMVL 23 056
22 05657 KMVL 23 056
23 05661 KMVL 23 056
24 05681 KMVL 23 056
25 05682 KMVL 23 056
26 05647 K1V4 30 056
27 05658 K1V4 30 056
新的DataFrame,其中列指示KMPV具有最高值。
A B C D newval
0 05641 KMPV 45 056 KMPV
1 05649 KMPV 45 056 KMPV
2 05650 KMPV 45 056 KMPV
3 05651 KMPV 45 056 KMPV
4 05653 KMPV 45 056 KMPV
5 05678 KMPV 45 056 KMPV
6 05654 KMPV 45 056 KMPV
7 05609 KMPV 45 056 KMPV
8 05620 KMPV 45 056 KMPV
9 05633 KMPV 45 056 KMPV
10 05601 KMPV 45 056 KMPV
11 05602 KMPV 45 056 KMPV
12 05603 KMPV 45 056 KMPV
13 05662 KMVL 23 056 KMPV
14 05665 KMVL 23 056 KMPV
15 05672 KMVL 23 056 KMPV
16 05677 KMVL 23 056 KMPV
17 05648 KMVL 23 056 KMPV
18 05652 KMVL 23 056 KMPV
19 05680 KMVL 23 056 KMPV
20 05655 KMVL 23 056 KMPV
21 05656 KMVL 23 056 KMPV
22 05657 KMVL 23 056 KMPV
23 05661 KMVL 23 056 KMPV
24 05681 KMVL 23 056 KMPV
25 05682 KMVL 23 056 KMPV
26 05647 K1V4 30 056 KMPV
27 05658 K1V4 30 056 KMPV
更简单地说。如何从以下组中选择最大值?换句话说,我想过滤到056 - > KMPV和100 - > KJRB,因为他们拥有最高的数字等等。
A B
056 K1V4 30
KMPV 45
KMVL 23
100 KJRB 130
KNYC 84
KTEB 80
答案 0 :(得分:2)
快速又脏。
数据:强>
{'A': {0: 5641, 1: 5649, 2: 5650, 3: 5651, 4: 5653, 5: 5678, 6: 5654, 7: 5609, 8: 5620, 9: 5633, 10: 5601, 11: 5602, 12: 5603, 13: 5662, 14: 5665, 15: 5672, 16: 5677, 17: 5648, 18: 5652, 19: 5680, 20: 10155, 21: 10156, 22: 10157, 23: 10161, 24: 10181, 25: 10182, 26: 10147, 27: 10158}, 'C': {0: 45, 1: 45, 2: 45, 3: 45, 4: 45, 5: 45, 6: 45, 7: 45, 8: 45, 9: 45, 10: 45, 11: 45, 12: 45, 13: 23, 14: 23, 15: 23, 16: 23, 17: 23, 18: 23, 19: 23, 20: 23, 21: 23, 22: 23, 23: 23, 24: 23, 25: 23, 26: 30, 27: 30}, 'B': {0: 'KMPV', 1: 'KMPV', 2: 'KMPV', 3: 'KMPV', 4: 'KMPV', 5: 'KMPV', 6: 'KMPV', 7: 'KMPV', 8: 'KMPV', 9: 'KMPV', 10: 'KMPV', 11: 'KMPV', 12: 'KMPV', 13: 'KMVL', 14: 'KMVL', 15: 'KMVL', 16: 'KMVL', 17: 'KMVL', 18: 'KMVL', 19: 'KMVL', 20: 'KMVL', 21: 'KMVL', 22: 'KMVL', 23: 'KMVL', 24: 'KMVL', 25: 'KMVL', 26: 'K1V4', 27: 'K1V4'}, 'D': {0: 56, 1: 56, 2: 56, 3: 56, 4: 56, 5: 56, 6: 56, 7: 56, 8: 56, 9: 56, 10: 56, 11: 56, 12: 56, 13: 56, 14: 56, 15: 56, 16: 56, 17: 56, 18: 56, 19: 56, 20: 101, 21: 101, 22: 101, 23: 101, 24: 101, 25: 101, 26: 101, 27: 101}}
<强>代码:强>
import pandas as pd
df = pd.read_csv('so.csv')
df_ = df.groupby(['D'])
# Result of above line is a tuple.
for g, v in df_:
# Get max C of each group. Then get column B.
x = v[v['C'] == v['C'].max()]['B'].iat[0]
df.loc[df['D'] == g,'newval'] = x
print df
<强>结果:强>
A B C D newval
0 5641 KMPV 45 56 KMPV
...
13 5662 KMVL 23 56 KMPV
...
20 10155 KMVL 23 101 K1V4
...
26 10147 K1V4 30 101 K1V4
27 10158 K1V4 30 101 K1V4