Calculating the probability for each group with pandas

Time: 2018-08-16 08:30:41

Tags: pandas

I have prepared the following data.

       group  score 
a         1     100 
b         2      80 
c         2      75 
d         2      65 
e         2      55 
f         3      45 
g         3      30 
h         4       1 
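For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data. The variable name df matches the code later in the thread; the integer dtypes and the a..h index labels are read off the printed table and are otherwise an assumption.

```python
import pandas as pd

# Sample data from the question: one score per row, rows labelled a..h.
df = pd.DataFrame(
    {
        "group": [1, 2, 2, 2, 2, 3, 3, 4],
        "score": [100, 80, 75, 65, 55, 45, 30, 1],
    },
    index=list("abcdefgh"),
)
print(df)
```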

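I want to use pandas to calculate a probability per group: each row's score divided by the total score of all groups up to and including a given group. I would like to get the following result.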

       group  score first second third fourth sum
a         1     100  100%   27%   22%   22%  171%
b         2      80    0%   21%   18%   17%   57%
c         2      75    0%   20%   17%   16%   53%
d         2      65    0%   17%   14%   14%   46%
e         2      55    0%   15%   12%   12%   39%
f         3      45    0%    0%   10%   10%   20%
g         3      30    0%    0%    7%    7%   13%
h         4       1    0%    0%    0%    2%    2%
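The percentages in the table appear to come from cumulative denominators: the column for a group divides by the total score of that group and all lower-numbered groups. This is a small plain-Python check of that reading (my interpretation of the table, not something the question states outright), using row a as the example.

```python
# Scores from the sample data, keyed by group.
scores = {1: [100], 2: [80, 75, 65, 55], 3: [45, 30], 4: [1]}

# Cumulative denominators: total score of all groups up to and including g.
running = 0
denominators = {}
for g in sorted(scores):
    running += sum(scores[g])
    denominators[g] = running

print(denominators)  # {1: 100, 2: 375, 3: 450, 4: 451}

# Row a has score 100; its share under each denominator.
for g, total in denominators.items():
    print(g, f"{100 / total:.0%}")
```

This gives 100%, 27%, 22%, 22% for row a, matching the first row of the table. By the same rule row h would come out at roughly 0.2% in the fourth column, so the 2% shown there may be a slip, though that is only a guess.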

The following code works (shown here for the second column only), but is there a better way?

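# Keep only the rows in groups 1 and 2, as a one-column frame
df_second = df[df['group'] <= 2]['score'].to_frame('score')
# Each score as a fraction of that subset's total
df_second['second'] = df_second / df_second.sum()
del df_second['score']
# Rows outside the subset get NaN after the join
df.join(df_second)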

2 answers:

Answer 0 (score: 1):

I think a loop is needed:

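# One column per group value i: score as a percentage of the
# total score of all rows with group <= i
for i in df['group'].unique():
    df[i] = (df['score'] / df.loc[df['group'] <= i, 'score'].sum()) * 100

# Sum the newly added percentage columns (everything after group and score)
df['sum'] = df.iloc[:, 2:].sum(axis=1)
print(df)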
   group  score      1          2          3          4         sum
a      1    100  100.0  26.666667  22.222222  22.172949  171.061838
b      2     80   80.0  21.333333  17.777778  17.738359  136.849470
c      2     75   75.0  20.000000  16.666667  16.629712  128.296378
d      2     65   65.0  17.333333  14.444444  14.412417  111.190195
e      2     55   55.0  14.666667  12.222222  12.195122   94.084011
f      3     45   45.0  12.000000  10.000000   9.977827   76.977827
g      3     30   30.0   8.000000   6.666667   6.651885   51.318551
h      4      1    1.0   0.266667   0.222222   0.221729    1.710618

Another solution, using a list comprehension:

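import pandas as pd

arr = df['group'].unique()
# One Series of percentages per group value, computed as in the loop above
comp = [(df['score'] / df.loc[df['group'] <= i, 'score'].sum()) * 100 for i in arr]
# Combine the Series side by side, using the group values as column names
df1 = pd.concat(comp, axis=1, keys=arr)
df1['sum'] = df1.sum(axis=1)
print(df1)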
       1          2          3          4         sum
a  100.0  26.666667  22.222222  22.172949  171.061838
b   80.0  21.333333  17.777778  17.738359  136.849470
c   75.0  20.000000  16.666667  16.629712  128.296378
d   65.0  17.333333  14.444444  14.412417  111.190195
e   55.0  14.666667  12.222222  12.195122   94.084011
f   45.0  12.000000  10.000000   9.977827   76.977827
g   30.0   8.000000   6.666667   6.651885   51.318551
h    1.0   0.266667   0.222222   0.221729    1.710618

# Attach the percentage columns to the original frame
df = df.join(df1)
print(df)
   group  score      1          2          3          4         sum
a      1    100  100.0  26.666667  22.222222  22.172949  171.061838
b      2     80   80.0  21.333333  17.777778  17.738359  136.849470
c      2     75   75.0  20.000000  16.666667  16.629712  128.296378
d      2     65   65.0  17.333333  14.444444  14.412417  111.190195
e      2     55   55.0  14.666667  12.222222  12.195122   94.084011
f      3     45   45.0  12.000000  10.000000   9.977827   76.977827
g      3     30   30.0   8.000000   6.666667   6.651885   51.318551
h      4      1    1.0   0.266667   0.222222   0.221729    1.710618
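The outputs above give every row a value in every column, whereas the expected table in the question shows 0% for rows whose group lies above the column's group. Here is a minimal sketch of one way to zero those rows with Series.where. The column names first to fourth are taken from the question; pairing them with the sorted group values via zip assumes there are exactly four groups.

```python
import pandas as pd

df = pd.DataFrame(
    {"group": [1, 2, 2, 2, 2, 3, 3, 4],
     "score": [100, 80, 75, 65, 55, 45, 30, 1]},
    index=list("abcdefgh"),
)

names = ["first", "second", "third", "fourth"]  # labels from the question
for name, g in zip(names, sorted(df["group"].unique())):
    in_range = df["group"] <= g                 # rows that count toward this column
    total = df.loc[in_range, "score"].sum()     # cumulative denominator
    # Percentage share for rows in range; rows outside the range become 0.
    df[name] = (df["score"] / total * 100).where(in_range, 0.0)

df["sum"] = df[names].sum(axis=1)
print(df.round(0))
```

With the zeros in place, the sum column drops to about 57 for row b and about 20 for row f, which lines up with the question's table.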

Answer 1 (score: 0):

See pandas groupby with apply:

g = df.groupby('group')
# Divides each group's values by that group's own total (not the cumulative total)
g.apply(lambda x: x / x.sum())
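Since the question asks for cumulative denominators rather than per-group ones, groupby can still supply them: sum the score per group, then take a running total. The sketch below combines that with NumPy broadcasting to avoid a Python loop. It is one possible vectorized variant, not what this answer proposes, and the names denoms, mask, pct and out are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"group": [1, 2, 2, 2, 2, 3, 3, 4],
     "score": [100, 80, 75, 65, 55, 45, 30, 1]},
    index=list("abcdefgh"),
)

# Total score per group (groupby sorts the keys), then the running total.
denoms = df.groupby("group")["score"].sum().cumsum()

# Broadcast an (n_rows, 1) column against a (1, n_groups) row.
scores = df["score"].to_numpy()[:, None]
groups = df["group"].to_numpy()[:, None]
mask = groups <= denoms.index.to_numpy()[None, :]   # row counts toward column?
pct = np.where(mask, scores / denoms.to_numpy()[None, :] * 100, 0.0)

out = pd.DataFrame(pct, index=df.index, columns=denoms.index)
out["sum"] = out.sum(axis=1)
print(df.join(out).round(2))
```

For eight rows the loop is just as good; the broadcast form mainly pays off when there are many distinct groups.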