Weighted average in a pandas pivot table

Time: 2019-05-18 01:53:35

Tags: python pandas pivot

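I am trying to calculate a weighted average price in a pandas pivot table.

I tried groupby, which works fine with np.average. However, I have not been able to reproduce the result with pd.pivot_table.

I have a DataFrame constructed from a dictionary: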

import numpy as np
import pandas as pd

dict_data = {
    'Contract' : ['Contract 1', 'Contract 2', 'Contract 3', 'Contract 4', 'Contract 5', 'Contract 6', 'Contract 7', 'Contract 8', 'Contract 9', 'Contract 10', 'Contract 11', 'Contract 12'],
    'Contract_Date': ['01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019', '01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019', '01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019'],
    'Product': ['A','A','A','A','B','B','B','B', 'C','C','C','C'],
    'Delivery' : ['2019-01', '2019-01', '2019-02', '2019-03', '2019-01', '2019-01', '2019-02', '2019-03', '2019-01', '2019-01', '2019-02', '2019-03'],
    'Price' : [90, 95, 100, 105, 90, 95, 100, 105, 90, 95, 100, 105],
    'Balance': [50, 100, 150, 200, 50, 100, 150, 200, 50, 100, 150, 200]
}

df = pd.DataFrame.from_dict(dict_data)

df
    Contract        Contract_Date   Product     Delivery    Price   Balance
0   Contract 1      01/01/2019      A           2019-01     90      50
1   Contract 2      02/02/2019      A           2019-01     95      100 
2   Contract 3      03/03/2019      A           2019-02     100     150
3   Contract 4      04/03/2019      A           2019-03     105     200
4   Contract 5      01/01/2019      B           2019-01     90      50
5   Contract 6      02/02/2019      B           2019-01     95      100
6   Contract 7      03/03/2019      B           2019-02     100     150
7   Contract 8      04/03/2019      B           2019-03     105     200
8   Contract 9      01/01/2019      C           2019-01     90      50
9   Contract 10     02/02/2019      C           2019-01     95      100
10  Contract 11     03/03/2019      C           2019-02     100     150
11  Contract 12     04/03/2019      C           2019-03     105     200

Weighted average calculation using groupby:

df.groupby(['Product', 'Delivery']).apply(lambda x: np.average(x.Price, weights=x.Balance))

Output:

Product  Delivery
A        2019-01      93.333333
         2019-02     100.000000
         2019-03     105.000000
B        2019-01      93.333333
         2019-02     100.000000
         2019-03     105.000000
C        2019-01      93.333333
         2019-02     100.000000
         2019-03     105.000000
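
As a sanity check on these numbers, the 93.333333 for product A with delivery 2019-01 is the balance-weighted mean of the two contracts in that group, (90*50 + 95*100) / (50 + 100). A minimal sketch, using only those two rows from the table above (the variable names are just for illustration):

```python
import numpy as np

# Contracts 1 and 2: product A, delivery 2019-01
prices = np.array([90, 95])
balances = np.array([50, 100])

# Weighted average written out by hand: sum(price * weight) / sum(weight)
manual = (prices * balances).sum() / balances.sum()

# np.average performs the same calculation when weights are supplied
via_numpy = np.average(prices, weights=balances)

print(manual, via_numpy)
```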

This is what I tried, and where I got stuck:

# Define a dictionary with the functions to apply for a given column:
f = {'Balance': ['sum'], 'Price': [np.average(df.Price, weights=df.Balance)] }

# Construct a pivot table, applying the weighted average price function to 'Price'
df.pivot_table(
    columns='Delivery',
    values=['Balance', 'Price'],
    index='Product',
    aggfunc=f
).swaplevel(1,0,axis=1).sort_index(axis=1)
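
One likely reason this attempt gets stuck: np.average(df.Price, weights=df.Balance) is evaluated immediately, when the dictionary f is built, so the 'Price' entry holds a single number for the whole table rather than a function that pivot_table could apply per group. A small sketch that makes this visible (the name price_entry is only for illustration, and the frame is rebuilt with just the columns needed here so the block runs on its own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Delivery': ['2019-01', '2019-01', '2019-02', '2019-03'] * 3,
    'Price': [90, 95, 100, 105] * 3,
    'Balance': [50, 100, 150, 200] * 3,
})

# This is the expression that was placed inside the aggfunc dictionary.
price_entry = np.average(df.Price, weights=df.Balance)

print(price_entry)            # one overall weighted average for all rows
print(callable(price_entry))  # False: a number, not an aggregation function
```

A second obstacle is that an aggregation function passed to pivot_table for 'Price' only receives the Price values of each group, so it has no direct access to the matching Balance values.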

Expected output (showing the two values, Balance and Price, under a shared Delivery column):

Delivery    2019-01           2019-02           2019-03
            Balance  Price    Balance  Price    Balance Price
Product                         
A           150      93.333   150      100      200     105
B           150      93.333   150      100      200     105
C           150      93.333   150      100      200     105

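1 answer:

Answer 0 (score: 1)

I think you can fix your code by staying with groupby, returning both values from apply, and unstacking the result: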

(df.groupby(['Product', 'Delivery'])
   .apply(lambda x: pd.Series([np.average(x.Price, weights=x.Balance), x.Balance.sum()],
                              index=['Price', 'Balance']))
   .unstack())
Out[21]: 
              Price                 Balance                
Delivery    2019-01 2019-02 2019-03 2019-01 2019-02 2019-03
Product                                                    
A         93.333333   100.0   105.0   150.0   150.0   200.0
B         93.333333   100.0   105.0   150.0   150.0   200.0
C         93.333333   100.0   105.0   150.0   150.0   200.0
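
If the goal is to stay with pd.pivot_table, one possible approach is to pass a function that looks up the weights in the original frame through the index of the values it receives. This is a sketch rather than a definitive solution: the helper name weighted_price is made up for illustration, it assumes df has a unique index, and the frame is rebuilt here with only the needed columns so that the block runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Delivery': ['2019-01', '2019-01', '2019-02', '2019-03'] * 3,
    'Price': [90, 95, 100, 105] * 3,
    'Balance': [50, 100, 150, 200] * 3,
})


def weighted_price(prices):
    # `prices` is the Price Series of one (Product, Delivery) group. Its index
    # still refers to rows of df, so the matching Balance values can be
    # fetched from there (assumption: df has a unique index).
    return np.average(prices, weights=df.loc[prices.index, 'Balance'])


result = (
    df.pivot_table(
        index='Product',
        columns='Delivery',
        values=['Balance', 'Price'],
        aggfunc={'Balance': 'sum', 'Price': weighted_price},
    )
    .swaplevel(0, 1, axis=1)  # move Delivery to the top column level
    .sort_index(axis=1)       # group Balance and Price under each Delivery
)

print(result)
```

Passing plain entries in the aggfunc dictionary ('sum' and the function itself, not lists) keeps the columns at two levels, Delivery on top with Balance and Price beneath it, which is the layout asked for in the question.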