Calculating cumulative values of a DataFrame by month

Time: 2019-02-25 06:34:35

Tags: python pandas dataframe

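The DataFrame, loaded from Excel, looks like the left side of the screenshot. I want to calculate cumulative values of two of its columns, "Cost" and "Quantity (Basket)", by month ("Period").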

The desired result is shown on the right side of the screenshot.


import pandas as pd

data = {
    "Store": ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
    "Fruit": ["Avocado", "Avocado", "Avocado", "Berry", "Berry", "Berry",
              "Apple", "Apple", "Apple", "Orange", "Orange", "Orange"],
    "Period": ["2017-04", "2017-11", "2018-01", "2017-01", "2017-02", "2017-03",
               "2017-05", "2017-06", "2017-07", "2018-07", "2018-10", "2018-11"],
    "Cost": [450, 8682, 2372, 976, 329, 3752, 379, 5868, 5497, 1515, 3234, 5430],
    "Quantity (Basket)": [68, 72, 69, 70, 70, 57, 60, 58, 49, 80, 60, 64],
}

df = pd.DataFrame(data)

I tried several approaches, shown below, but still can't figure it out.

df["Period"] = pd.to_datetime(df["Period"])

df_1 = df.groupby(["Store", "Fruit", "Period"]).sum()
# or
df_2 = df.groupby(["Store", "Fruit", "Period"])['Cost'].agg('sum')
# or
df_3 = df.groupby(["Store", "Fruit", "Period"])['Cost'].sum()
# or
df_4 = df.groupby(["Store", "Fruit"])['Cost'].sum()

Could you point me to the right way to do this? Thanks.

1 answer:

Answer 0: (score: 1)

Use DataFrameGroupBy.cumsum together with join:

# Select several columns with a list (double brackets); a bare tuple of keys fails on pandas 2.x
df1 = df.join((df.groupby(["Store", "Fruit", "Period"])[['Cost', 'Quantity (Basket)']]
        .cumsum()
        .add_prefix('Accumulative ')))

print (df1)
   Store    Fruit     Period  Cost  Quantity (Basket)  Accumulative Cost  \
0      A  Avocado 2017-04-01   450                 68                450   
1      A  Avocado 2017-11-01  8682                 72               8682   
2      A  Avocado 2018-01-01  2372                 69               2372   
3      A    Berry 2017-01-01   976                 70                976   
4      A    Berry 2017-02-01   329                 70                329   
5      A    Berry 2017-03-01  3752                 57               3752   
6      B    Apple 2017-05-01   379                 60                379   
7      B    Apple 2017-06-01  5868                 58               5868   
8      B    Apple 2017-07-01  5497                 49               5497   
9      B   Orange 2018-07-01  1515                 80               1515   
10     B   Orange 2018-10-01  3234                 60               3234   
11     B   Orange 2018-11-01  5430                 64               5430   

    Accumulative Quantity (Basket)  
0                               68  
1                               72  
2                               69  
3                               70  
4                               70  
5                               57  
6                               60  
7                               58  
8                               49  
9                               80  
10                              60  
11                              64  
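Including "Period" in the grouping keys makes every group a single row in this data, which is why the accumulated columns above simply repeat the original values. A minimal sketch to check this, assuming a shortened copy of the sample data (the variable names are only for illustration):

```python
import pandas as pd

# Shortened copy of the question's sample data (two Store/Fruit pairs only)
df = pd.DataFrame({
    "Store": ["A", "A", "A", "B", "B", "B"],
    "Fruit": ["Avocado", "Avocado", "Avocado", "Apple", "Apple", "Apple"],
    "Period": ["2017-04", "2017-11", "2018-01", "2017-05", "2017-06", "2017-07"],
    "Cost": [450, 8682, 2372, 379, 5868, 5497],
})

# Rows per group when Period is part of the key: every group has exactly one row
sizes_with_period = df.groupby(["Store", "Fruit", "Period"]).size()

# Rows per group without Period: each Store/Fruit pair keeps its three rows
sizes_without_period = df.groupby(["Store", "Fruit"]).size()

# With one-row groups, the running total is identical to the original column
cumsum_with_period = df.groupby(["Store", "Fruit", "Period"])["Cost"].cumsum()

print(sizes_with_period)
print(sizes_without_period)
```

Grouping by "Store" and "Fruit" only leaves several periods in each group, so there is something to accumulate.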

# Grouping without "Period" lets the running total build up across months
df2 = df.join((df.groupby(["Store", "Fruit"])[['Cost', 'Quantity (Basket)']]
        .cumsum()
        .add_prefix('Accumulative ')))

print (df2)
   Store    Fruit     Period  Cost  Quantity (Basket)  Accumulative Cost  \
0      A  Avocado 2017-04-01   450                 68                450   
1      A  Avocado 2017-11-01  8682                 72               9132   
2      A  Avocado 2018-01-01  2372                 69              11504   
3      A    Berry 2017-01-01   976                 70                976   
4      A    Berry 2017-02-01   329                 70               1305   
5      A    Berry 2017-03-01  3752                 57               5057   
6      B    Apple 2017-05-01   379                 60                379   
7      B    Apple 2017-06-01  5868                 58               6247   
8      B    Apple 2017-07-01  5497                 49              11744   
9      B   Orange 2018-07-01  1515                 80               1515   
10     B   Orange 2018-10-01  3234                 60               4749   
11     B   Orange 2018-11-01  5430                 64              10179   

    Accumulative Quantity (Basket)  
0                               68  
1                              140  
2                              209  
3                               70  
4                              140  
5                              197  
6                               60  
7                              118  
8                              167  
9                               80  
10                             140  
11                             204
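cumsum accumulates in the current row order within each group, so the result above relies on the rows already being in chronological order, as they are in the sample data. If that is not guaranteed (an assumption about other inputs, not something the question states), one option is to sort by "Period" before accumulating. A rough sketch with deliberately shuffled rows:

```python
import pandas as pd

# Three Avocado rows from the sample data, deliberately out of chronological order
df = pd.DataFrame({
    "Store": ["A", "A", "A"],
    "Fruit": ["Avocado", "Avocado", "Avocado"],
    "Period": ["2018-01", "2017-04", "2017-11"],
    "Cost": [2372, 450, 8682],
})
df["Period"] = pd.to_datetime(df["Period"])

# Sort chronologically within each Store/Fruit pair, then accumulate
ordered = df.sort_values(["Store", "Fruit", "Period"])
running = (ordered.groupby(["Store", "Fruit"])[["Cost"]]
           .cumsum()
           .add_prefix("Accumulative "))

# join aligns on the index, so the original row order of df is preserved
result = df.join(running)
print(result)
```

Because join matches rows by index label, each running total lands on the row it belongs to even though the accumulation was done on the sorted copy.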