Pandas: add a new column from the result of a groupby

Time: 2015-03-17 10:35:38

Tags: python pandas

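I have the following DataFrame and want to add a new column that indicates the value associated with the max from a groupby operation. My DF will contain many records like these, and I want to go through each group, similar to df.groupby('B')['C'].mean().idxmax(), but across the whole DF, and then create a new column with the result. The DataFrame looks like this: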

           A     B   C    D
    0   05641  KMPV  45  056
    1   05649  KMPV  45  056
    2   05650  KMPV  45  056
    3   05651  KMPV  45  056
    4   05653  KMPV  45  056
    5   05678  KMPV  45  056
    6   05654  KMPV  45  056
    7   05609  KMPV  45  056
    8   05620  KMPV  45  056
    9   05633  KMPV  45  056
    10  05601  KMPV  45  056
    11  05602  KMPV  45  056
    12  05603  KMPV  45  056
    13  05662  KMVL  23  056
    14  05665  KMVL  23  056
    15  05672  KMVL  23  056
    16  05677  KMVL  23  056
    17  05648  KMVL  23  056
    18  05652  KMVL  23  056
    19  05680  KMVL  23  056
    20  05655  KMVL  23  056
    21  05656  KMVL  23  056
    22  05657  KMVL  23  056
    23  05661  KMVL  23  056
    24  05681  KMVL  23  056
    25  05682  KMVL  23  056
    26  05647  K1V4  30  056
    27  05658  K1V4  30  056

The new DataFrame, where the added column indicates that KMPV has the highest value:

            A     B   C    D newval
    0   05641  KMPV  45  056   KMPV
    1   05649  KMPV  45  056   KMPV
    2   05650  KMPV  45  056   KMPV
    3   05651  KMPV  45  056   KMPV
    4   05653  KMPV  45  056   KMPV
    5   05678  KMPV  45  056   KMPV
    6   05654  KMPV  45  056   KMPV
    7   05609  KMPV  45  056   KMPV
    8   05620  KMPV  45  056   KMPV
    9   05633  KMPV  45  056   KMPV
    10  05601  KMPV  45  056   KMPV
    11  05602  KMPV  45  056   KMPV
    12  05603  KMPV  45  056   KMPV
    13  05662  KMVL  23  056   KMPV
    14  05665  KMVL  23  056   KMPV
    15  05672  KMVL  23  056   KMPV
    16  05677  KMVL  23  056   KMPV
    17  05648  KMVL  23  056   KMPV
    18  05652  KMVL  23  056   KMPV
    19  05680  KMVL  23  056   KMPV
    20  05655  KMVL  23  056   KMPV
    21  05656  KMVL  23  056   KMPV
    22  05657  KMVL  23  056   KMPV
    23  05661  KMVL  23  056   KMPV
    24  05681  KMVL  23  056   KMPV
    25  05682  KMVL  23  056   KMPV
    26  05647  K1V4  30  056   KMPV
    27  05658  K1V4  30  056   KMPV

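Put more simply: how do I select the maximum value from each of the following groups? In other words, I want to filter down to 056 -> KMPV and 100 -> KJRB, because they have the highest numbers, and so on.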

     A               B
    056            K1V4        30
                   KMPV        45
                   KMVL        23
    100            KJRB       130
                   KNYC        84
                   KTEB        80
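One possible way to express this filter, as a minimal sketch rather than a definitive answer: it assumes the grouped values above are available as a flat DataFrame with a value column named C (that column name and the variable names are assumptions, not part of the original data). For each A, `idxmax` gives the row label of the largest C, and `loc` then pulls out those rows.

```python
import pandas as pd

# Hypothetical flat version of the grouped values shown above.
# The column name "C" for the numbers is an assumption.
flat = pd.DataFrame({
    "A": ["056", "056", "056", "100", "100", "100"],
    "B": ["K1V4", "KMPV", "KMVL", "KJRB", "KNYC", "KTEB"],
    "C": [30, 45, 23, 130, 84, 80],
})

# For each A, find the row label where C is largest (first one on ties).
best_rows = flat.groupby("A")["C"].idxmax()

# Keep only those rows: one (A, B, C) row per group.
top = flat.loc[best_rows]
print(top)
```

If ties matter, note that `idxmax` keeps only the first maximal row per group.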

1 answer:

Answer 0 (score: 2)

Quick and dirty.

Data:

{'A': {0: 5641, 1: 5649, 2: 5650, 3: 5651, 4: 5653, 5: 5678, 6: 5654, 7: 5609, 8: 5620, 9: 5633, 10: 5601, 11: 5602, 12: 5603, 13: 5662, 14: 5665, 15: 5672, 16: 5677, 17: 5648, 18: 5652, 19: 5680, 20: 10155, 21: 10156, 22: 10157, 23: 10161, 24: 10181, 25: 10182, 26: 10147, 27: 10158}, 'C': {0: 45, 1: 45, 2: 45, 3: 45, 4: 45, 5: 45, 6: 45, 7: 45, 8: 45, 9: 45, 10: 45, 11: 45, 12: 45, 13: 23, 14: 23, 15: 23, 16: 23, 17: 23, 18: 23, 19: 23, 20: 23, 21: 23, 22: 23, 23: 23, 24: 23, 25: 23, 26: 30, 27: 30}, 'B': {0: 'KMPV', 1: 'KMPV', 2: 'KMPV', 3: 'KMPV', 4: 'KMPV', 5: 'KMPV', 6: 'KMPV', 7: 'KMPV', 8: 'KMPV', 9: 'KMPV', 10: 'KMPV', 11: 'KMPV', 12: 'KMPV', 13: 'KMVL', 14: 'KMVL', 15: 'KMVL', 16: 'KMVL', 17: 'KMVL', 18: 'KMVL', 19: 'KMVL', 20: 'KMVL', 21: 'KMVL', 22: 'KMVL', 23: 'KMVL', 24: 'KMVL', 25: 'KMVL', 26: 'K1V4', 27: 'K1V4'}, 'D': {0: 56, 1: 56, 2: 56, 3: 56, 4: 56, 5: 56, 6: 56, 7: 56, 8: 56, 9: 56, 10: 56, 11: 56, 12: 56, 13: 56, 14: 56, 15: 56, 16: 56, 17: 56, 18: 56, 19: 56, 20: 101, 21: 101, 22: 101, 23: 101, 24: 101, 25: 101, 26: 101, 27: 101}}

Code:

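import pandas as pd

# so.csv is assumed to hold the data shown above.
df = pd.read_csv('so.csv')
# Group by a single column name so that each group key is a plain value.
df_ = df.groupby('D')
# Iterating over a GroupBy object yields (group key, sub-DataFrame) pairs.
for g, v in df_:
    # Find the rows with the maximum C in this group, then take the first B.
    x = v[v['C'] == v['C'].max()]['B'].iat[0]
    # Write that B value to every row of the group.
    df.loc[df['D'] == g, 'newval'] = x
print(df)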

Result:

        A     B   C    D newval
0    5641  KMPV  45   56   KMPV
...
13   5662  KMVL  23   56   KMPV
...
20  10155  KMVL  23  101   K1V4
...
26  10147  K1V4  30  101   K1V4
27  10158  K1V4  30  101   K1V4
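The loop above can probably also be written without explicit iteration. The following is a sketch under stated assumptions, not the answerer's method: it uses a four-row subset of the data above (rows 0, 13, 20 and 26) so that it is self-contained, and the names `best_rows` and `best_b` are made up for illustration. `idxmax` picks the first row with the largest C per D group, which mirrors the `.iat[0]` choice in the loop.

```python
import pandas as pd

# Four rows taken from the answer's data (rows 0, 13, 20 and 26).
df = pd.DataFrame({
    "A": [5641, 5662, 10155, 10147],
    "B": ["KMPV", "KMVL", "KMVL", "K1V4"],
    "C": [45, 23, 23, 30],
    "D": [56, 56, 101, 101],
})

# Row label of the largest C within each D group (first one on ties).
best_rows = df.groupby("D")["C"].idxmax()

# Lookup Series: D value -> B value found on that group's best row.
best_b = df.loc[best_rows].set_index("D")["B"]

# Broadcast the per-group B value back onto every row of the group.
df["newval"] = df["D"].map(best_b)
print(df)
```

On larger frames this avoids rescanning the whole DataFrame once per group, which the `df['D'] == g` mask in the loop does.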