Pandas GroupBy on two columns: count per group, but compute the percentage against the per-month total

Time: 2018-06-07 01:46:53

Tags: python pandas dataframe pandas-groupby

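I have derived the grouping I want, but I would like to add a percentage column calculated against the total for each month, i.e. regardless of what the string in originating_system_id is.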

d = [('Total_RFQ_For_Month', 'size')]
df_RFQ_Channel = df.groupby(['Year_Month','originating_system_id'])['state'].agg(d)
#df_RFQ_Channel['RFQ_Pcent_For_Month'] = ?
display(df_RFQ_Channel)

Year_Month  originating_system_id   Total_RFQ_For_Month RFQ_Pcent_For_Month
2017-11              BBT                      59              7.90%
                     EUCR                     33              4.42%
                     MAXL                     6               0.80%
                     MXUS                     649             86.88%
2017-12              BBT                      36              73.47%
                     EUCR                     7               14.29%
                     MAXL                     6               12.24%
2018-01              BBT                      88              9.52%
                     EUCR                     26              2.81%
                     MAXL                     4               0.43%
                     MXUS                     800             86.58%
                     VOIX                     6               0.65%

Examples:

7.90% is BBT's Total_RFQ_For_Month (59) divided by the sum of all for 2017-11 (747) 
2.81% is EUCR's Total_RFQ_For_Month (26) divided by the sum of all for 2018-01 (924). 
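
The two figures above can be checked with plain arithmetic. A minimal sketch, using only the counts from the table in the question (the variable names are illustrative):

```python
# Counts copied from the expected output table in the question.
counts_2017_11 = {'BBT': 59, 'EUCR': 33, 'MAXL': 6, 'MXUS': 649}
counts_2018_01 = {'BBT': 88, 'EUCR': 26, 'MAXL': 4, 'MXUS': 800, 'VOIX': 6}

# Per-month totals, ignoring originating_system_id.
total_2017_11 = sum(counts_2017_11.values())   # 747
total_2018_01 = sum(counts_2018_01.values())   # 924

# Share of each system within its own month, as a percentage.
bbt_pct = round(counts_2017_11['BBT'] / total_2017_11 * 100, 2)    # 7.9
eucr_pct = round(counts_2018_01['EUCR'] / total_2018_01 * 100, 2)  # 2.81
print(bbt_pct, eucr_pct)
```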

2 answers:

Answer 0 (score: 3)

Use transform, which returns a Series of the same size as the original DataFrame, so the Total_RFQ_For_Month column can be divided by it:

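# df here is the grouped result (df_RFQ_Channel in the question);
# reset_index turns the MultiIndex levels into regular columns
df = df.reset_index()

# per-month total, repeated on every row of that month
s = df.groupby('Year_Month')['Total_RFQ_For_Month'].transform('sum')
df['RFQ_Pcent_For_Month'] = df['Total_RFQ_For_Month'].div(s).mul(100).round(2)
print(df)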
   Year_Month originating_system_id  Total_RFQ_For_Month  RFQ_Pcent_For_Month
0     2017-11                   BBT                   59                 7.90
1     2017-11                  EUCR                   33                 4.42
2     2017-11                  MAXL                    6                 0.80
3     2017-11                  MXUS                  649                86.88
4     2017-12                   BBT                   36                73.47
5     2017-12                  EUCR                    7                14.29
6     2017-12                  MAXL                    6                12.24
7     2018-01                   BBT                   88                 9.52
8     2018-01                  EUCR                   26                 2.81
9     2018-01                  MAXL                    4                 0.43
10    2018-01                  MXUS                  800                86.58
11    2018-01                  VOIX                    6                 0.65

As a percentage string:

df['RFQ_Pcent_For_Month'] = (df['Total_RFQ_For_Month'].div(s)
                                                     .mul(100)
                                                     .round(2)
                                                     .astype(str)
                                                     .add('%'))
print(df)
   Year_Month originating_system_id  Total_RFQ_For_Month RFQ_Pcent_For_Month
0     2017-11                   BBT                   59                7.9%
1     2017-11                  EUCR                   33               4.42%
2     2017-11                  MAXL                    6                0.8%
3     2017-11                  MXUS                  649              86.88%
4     2017-12                   BBT                   36              73.47%
5     2017-12                  EUCR                    7              14.29%
6     2017-12                  MAXL                    6              12.24%
7     2018-01                   BBT                   88               9.52%
8     2018-01                  EUCR                   26               2.81%
9     2018-01                  MAXL                    4               0.43%
10    2018-01                  MXUS                  800              86.58%
11    2018-01                  VOIX                    6               0.65%
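
Note that astype(str) drops trailing zeros, which is why the output above shows 7.9% and 0.8% where the question shows 7.90% and 0.80%. If the fixed two-decimal form matters, one possible sketch is to format each value with a format string instead. The counts below are the 2017-11 rows from the question; the variable names are only illustrative:

```python
import pandas as pd

# 2017-11 counts from the question (BBT, EUCR, MAXL, MXUS).
counts = pd.Series([59, 33, 6, 649])

# Unrounded percentage of the month's total.
pct = counts / counts.sum() * 100

# Format every value with exactly two decimals and a percent sign.
formatted = pct.map('{:.2f}%'.format)
print(formatted.tolist())
# ['7.90%', '4.42%', '0.80%', '86.88%']
```

The result is a column of strings, so keep the numeric column around if you still need to sort or aggregate on it.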

Detail:

print(s)
0     747
1     747
2     747
3     747
4      49
5      49
6      49
7     924
8     924
9     924
10    924
11    924
Name: Total_RFQ_For_Month, dtype: int64
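
If you would rather keep the MultiIndex from the question instead of calling reset_index, the same transform idea should also work by grouping on the index level. A minimal sketch, assuming the grouped frame is rebuilt by hand from the question's table (the rows list and the monthly_total name are illustrative, not from the original post):

```python
import pandas as pd

# Rebuild the grouped result from the question (counts copied from its table).
rows = [
    ('2017-11', 'BBT', 59), ('2017-11', 'EUCR', 33),
    ('2017-11', 'MAXL', 6), ('2017-11', 'MXUS', 649),
    ('2017-12', 'BBT', 36), ('2017-12', 'EUCR', 7), ('2017-12', 'MAXL', 6),
    ('2018-01', 'BBT', 88), ('2018-01', 'EUCR', 26), ('2018-01', 'MAXL', 4),
    ('2018-01', 'MXUS', 800), ('2018-01', 'VOIX', 6),
]
df_RFQ_Channel = (
    pd.DataFrame(rows, columns=['Year_Month', 'originating_system_id',
                                'Total_RFQ_For_Month'])
      .set_index(['Year_Month', 'originating_system_id'])
)

# Monthly totals broadcast back to every row, grouping on the index level.
monthly_total = (df_RFQ_Channel.groupby(level='Year_Month')['Total_RFQ_For_Month']
                               .transform('sum'))

df_RFQ_Channel['RFQ_Pcent_For_Month'] = (
    df_RFQ_Channel['Total_RFQ_For_Month'].div(monthly_total).mul(100).round(2)
)
print(df_RFQ_Channel)
```

Because transform returns a Series aligned to the original index, the division lines up row by row without any merge.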

Answer 1 (score: 1)

Steps to recreate your df:

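import pandas as pd

df = pd.DataFrame(columns=['Year_Month', 'originating_system_id', 'Total_RFQ_For_Month'])

# only two months

df.loc[0]=['2017-11','BBT',59]
df.loc[1]=['2017-11','EUCR',33]
df.loc[2]=['2017-11','MAXL',6]
df.loc[3]=['2017-11','MXUS',649]
df.loc[4]=['2017-12','BBT',36]
df.loc[5]=['2017-12','EUCR',7]
df.loc[6]=['2017-12','MAXL',88]

# rows added via .loc may leave the column as object dtype, so make it numeric
df['Total_RFQ_For_Month'] = df['Total_RFQ_For_Month'].astype(int)

# Same layout as your DF: one row per (Year_Month, originating_system_id)
gp1 = df.groupby(['Year_Month','originating_system_id']).sum()
gp2 = gp1.reset_index()

# total per month, renamed so it does not clash with the per-group column on merge
gp3 = (df[['Year_Month','Total_RFQ_For_Month']]
       .groupby(['Year_Month']).sum()
       .rename(columns={'Total_RFQ_For_Month': 'RFQ_For_Month_Sum'}))
gp2 = gp2.merge(gp3, on='Year_Month')

gp2['RFQ_Pcent_For_Month'] = ((gp2['Total_RFQ_For_Month']*100)/gp2['RFQ_For_Month_Sum']).round(3).astype(str).add('%')
# pass columns= by keyword; the positional axis argument is no longer accepted in pandas 2.x
gp2.drop(columns=['RFQ_For_Month_Sum'], inplace=True)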
