将熊猫数据框行值汇总为平均值和总和

时间:2018-10-04 08:34:14

标签: python pandas pivot

我有一个如下数据框:

data =[['A','ABC001','18M01',1,3],['A','ABC002','18M01',2,4],['A','ABC001','18M02',3,3],['B','ABC001','18M01',4,3],['B','ABC002','18M02',5,4],['B','ABC002','18M02',6,4]]
df = pd.DataFrame(data,columns=['Type','Product','Month','Requirement','Inventory'])
df

输入:

Type Product Month Requirement Inventory
A    ABC001  18M01     1         3
A    ABC002  18M01     2         4
A    ABC001  18M02     3         3
B    ABC001  18M01     4         3
B    ABC002  18M02     5         4
B    ABC002  18M02     6         4

我想做的就是将它总结成这样的数据框

输出:

Type    Product           Values         18M01    18M02
A       ABC001      Sum of Requirement     1       3
A       ABC001      Average of Inventory   3       3
A       ABC002      Sum of Requirement     2      NaN
A       ABC002      Average of Inventory   4      NaN
B       ABC001      Sum of Requirement     4      NaN
B       ABC001      Average of Inventory   3      NaN
B       ABC002      Sum of Requirement    NaN      11
B       ABC002      Average of Inventory  NaN      4

我可以很容易地在excel枢轴中创建它,但是在使用pandas枢轴时根本不知道。请帮助

2 个答案:

答案 0 :(得分:1)

我认为您需要通过summean进行汇总,在列中展平MultiIndex并通过stack通过unstack来重塑:

df1 = (df.groupby(['Type','Product','Month'])
         .agg({'Requirement': 'sum','Inventory':'mean'})
         .rename(columns={'Requirement':'Sum of Requirement',
                          'Inventory':'Average of Inventory'})
         .stack()
         .unstack(2)
         .reset_index()
         .rename(columns={'level_2':'Values'}))

print (df1)
Month Type Product                Values  18M01  18M02
0        A  ABC001    Sum of Requirement    1.0    3.0
1        A  ABC001  Average of Inventory    3.0    3.0
2        A  ABC002    Sum of Requirement    2.0    NaN
3        A  ABC002  Average of Inventory    4.0    NaN
4        B  ABC001    Sum of Requirement    4.0    NaN
5        B  ABC001  Average of Inventory    3.0    NaN
6        B  ABC002    Sum of Requirement    NaN   11.0
7        B  ABC002  Average of Inventory    NaN    4.0

答案 1 :(得分:1)

一种pivot_table的处理方式-

df1 = df.pivot_table('Requirement', ['Type','Product'], 'Month', aggfunc='sum')
df1['Values'] = 'Sum of Requirement'
df2 = df.pivot_table('Inventory', ['Type','Product'], 'Month', aggfunc='mean')
df2['Values'] = 'Average of Inventory'

df1.append(df2)

输出

Month         18M01  18M02                Values
Type Product
A    ABC001     1.0    3.0    Sum of Requirement
     ABC002     2.0    NaN    Sum of Requirement
B    ABC001     4.0    NaN    Sum of Requirement
     ABC002     NaN   11.0    Sum of Requirement
A    ABC001     3.0    3.0  Average of Inventory
     ABC002     4.0    NaN  Average of Inventory
B    ABC001     3.0    NaN  Average of Inventory
     ABC002     NaN    4.0  Average of Inventory

您可以添加reset_index()使其变得更好-

df1.append(df2).reset_index()

Month Type Product  18M01  18M02                Values
0        A  ABC001    1.0    3.0    Sum of Requirement
1        A  ABC002    2.0    NaN    Sum of Requirement
2        B  ABC001    4.0    NaN    Sum of Requirement
3        B  ABC002    NaN   11.0    Sum of Requirement
4        A  ABC001    3.0    3.0  Average of Inventory
5        A  ABC002    4.0    NaN  Average of Inventory
6        B  ABC001    3.0    NaN  Average of Inventory
7        B  ABC002    NaN    4.0  Average of Inventory