Pandas: divide values by a sum based on an indicator

Time: 2018-10-17 06:14:41

Tags: python pandas dataframe pandas-groupby

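Before this gets marked as a duplicate, I have already looked at the following: question1 question2 source3

For each farmer, I am trying to calculate two things:
1) the percentage of the farmer's ripe fruit that is fruit x: (ripe fruit x) / (total ripe fruit)
2) the percentage of fruit x that is ripe: (ripe fruit x) / (total fruit x)

Both are based on the Ripe indicator column (1 means ripe, 0 means not ripe), with Number as the count of fruit.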

Input:

import pandas as pd

df = pd.DataFrame({'Farmer': ['Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Tims','Tims','Tims','Tims'],
                 'Fruit':['Apple','Apple','Apple','Grape','Grape','Grape','Grape','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry'],
                 'Type': ['Red','Yellow','Green','Red seedless','Red with seeds','Green','Purple','Montmorency','Morello','Bing','Rainer','Montmorency','Morello','Bing','Rainer'],
                 'Number':[2,6,2,1,1,6,2,3,1,3,3,3,1,3,3],
                 'Ripe':[1,1,0,1,0,1,1,0,0,0,1,0,0,0,1]})
df

    Farmer  Fruit   Number  Ripe    Type
0   Sallys  Apple   2        1      Red
1   Sallys  Apple   6        1      Yellow
2   Sallys  Apple   2        0      Green
3   Sallys  Grape   1        1      Red seedless
4   Sallys  Grape   1        0      Red with seeds
5   Sallys  Grape   6        1      Green
6   Sallys  Grape   2        1      Purple
7   Sallys  Cherry  3        0      Montmorency
8   Sallys  Cherry  1        0      Morello
9   Sallys  Cherry  3        0      Bing
10  Sallys  Cherry  3        1      Rainer
11  Tims    Cherry  3        0      Montmorency
12  Tims    Cherry  1        0      Morello
13  Tims    Cherry  3        0      Bing
14  Tims    Cherry  3        1      Rainer

Desired output:

    Farmer  Fruit   %(ripe fruit x)/(total ripe fruit)  %(ripe fruit x)/(total fruit x)
0   Sallys  Apple   40                                  80
1   Sallys  Grape   45                                  90
2   Sallys  Cherry  15                                  30
3   Tims    Cherry  100                                 30
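As a sanity check on the desired numbers, the two ratios for farmer Sallys can be worked out by hand from the Number column. This is only an arithmetic sketch in plain Python; the dictionary names `ripe`, `total`, `share_of_ripe` and `ripe_rate` are illustrative and not part of the question.

```python
# Ripe and total counts for farmer "Sallys", summed by hand from the Number column.
ripe = {"Apple": 2 + 6, "Grape": 1 + 6 + 2, "Cherry": 3}
total = {"Apple": 2 + 6 + 2, "Grape": 1 + 1 + 6 + 2, "Cherry": 3 + 1 + 3 + 3}

# All ripe fruit of this farmer, across every kind of fruit.
total_ripe = sum(ripe.values())

# 1) share of the farmer's ripe fruit that is fruit x
share_of_ripe = {fruit: 100 * n / total_ripe for fruit, n in ripe.items()}
# 2) share of fruit x that is ripe
ripe_rate = {fruit: 100 * ripe[fruit] / total[fruit] for fruit in ripe}

print(share_of_ripe)  # {'Apple': 40.0, 'Grape': 45.0, 'Cherry': 15.0}
print(ripe_rate)      # {'Apple': 80.0, 'Grape': 90.0, 'Cherry': 30.0}
```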

1 Answer:

Answer 0: (score: 2)

First aggregate with sum and reshape with unstack, then divide by the relevant sums with div:
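To see what the aggregate-and-reshape step produces, here is a small sketch on hypothetical toy data (the frame `toy` and the farmer "A" are made up for illustration). It also passes `fill_value=0` to `unstack`, which is an addition of this sketch: when a farmer has no ripe (or no unripe) rows for a fruit, the missing cell would otherwise be NaN.

```python
import pandas as pd

# Hypothetical toy data: farmer "A" has no ripe pears at all.
toy = pd.DataFrame({
    'Farmer': ['A', 'A', 'A'],
    'Fruit': ['Apple', 'Apple', 'Pear'],
    'Number': [4, 1, 5],
    'Ripe': [1, 0, 0],
})

# Sum Number per (Farmer, Fruit, Ripe), then pivot the Ripe level into columns 0 and 1.
# fill_value=0 puts 0 in the missing (A, Pear, ripe) cell instead of NaN.
wide = (toy.groupby(['Farmer', 'Fruit', 'Ripe'], sort=False)['Number']
           .sum()
           .unstack(fill_value=0))
print(wide)
```

Each row of `wide` is one (Farmer, Fruit) pair, column 1 holds the ripe count and column 0 the unripe count, so the row sum is the total for that fruit.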

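# Sum Number per (Farmer, Fruit, Ripe), then move Ripe into columns 0 and 1
df1 = df.groupby(['Farmer','Fruit','Ripe'], sort=False)['Number'].sum().unstack()

# Ripe fruit x / total ripe fruit of the farmer
# (Series.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() replaces it)
a = df1[1].div(df1[1].groupby(level=0).sum()).mul(100)
# Ripe fruit x / total fruit x
b = df1[1].div(df1.sum(axis=1)).mul(100)

keys = ('%(ripe fruit x)/(total ripe fruit)','%(ripe fruit x)/(total fruit x)')
df2 = pd.concat([a,b], axis=1, keys=keys).reset_index()
print(df2)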
   Farmer   Fruit  %(ripe fruit x)/(total ripe fruit)  \
0  Sallys   Apple                                40.0   
1  Sallys   Grape                                45.0   
2  Sallys  Cherry                                15.0   
3    Tims  Cherry                               100.0   

   %(ripe fruit x)/(total fruit x)  
0                             80.0  
1                             90.0  
2                             30.0  
3                             30.0
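An alternative sketch that avoids unstack, not taken from the answer above: multiply Number by the Ripe flag to get a ripe count per row, aggregate once, and use `groupby(...).transform('sum')` to broadcast the per-farmer ripe total back onto each (Farmer, Fruit) row. The column names `RipeNumber`, `pct_of_ripe` and `pct_ripe` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Farmer': ['Sallys'] * 11 + ['Tims'] * 4,
    'Fruit': ['Apple'] * 3 + ['Grape'] * 4 + ['Cherry'] * 8,
    'Number': [2, 6, 2, 1, 1, 6, 2, 3, 1, 3, 3, 3, 1, 3, 3],
    'Ripe': [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1],
})

# Ripe count per row: Number where Ripe == 1, otherwise 0.
df = df.assign(RipeNumber=df['Number'] * df['Ripe'])

# Ripe count and total count per (Farmer, Fruit), keeping first-seen order.
g = df.groupby(['Farmer', 'Fruit'], sort=False)[['RipeNumber', 'Number']].sum()

# Total ripe fruit per farmer, repeated on every row of that farmer.
ripe_per_farmer = g['RipeNumber'].groupby(level='Farmer').transform('sum')

out = pd.DataFrame({
    'pct_of_ripe': g['RipeNumber'].mul(100).div(ripe_per_farmer),
    'pct_ripe': g['RipeNumber'].mul(100).div(g['Number']),
}).reset_index()
print(out)
```

Because every intermediate Series shares the same (Farmer, Fruit) index, no index alignment across levels is needed, and a fruit with no ripe rows simply gets 0 rather than NaN.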