pandas计算多列的摘要DataFrame

时间:2018-02-24 12:59:30

标签: python pandas jupyter-notebook

我有一个包含各种产品,位置,Licence_ends的文件,我需要计算有多少产品按日期耗尽许可证,以及在该季度可以重新订购的数量,下面的示例数据:

    List<String> cheeses  = Arrays.asList("Gouda", "Edam");
    String x= "Gouda";

    String type =  cheeses.contains(x) ? "Gouda".equals(x) ? "Yummy Gouda" : "Cheese - but not Gouda" : "Maybe not dairy";

我想要实现的目标如下:

   Item    Store    Category    Licence_ends    Available_to_reorder
0  A01929  North    Office      2018 Q1         Yes
1  A02911  South    Windows     2019 Q3         Yes
2  B11282  North    Adobe       2019 Q2         No
3  C73162  East     Office      2018 Q4         Yes
4  A12817  West     Windows     2020 Q1         No

我从下面的代码开始,但我迷路了,不知道正确的方法:

   Store    Category    2018 Q1 2018 Q2 ... 2020 Q4
0  East     Windows       0       1           24   # cumulative sum of previous quarters
1  East     Office        1       2           11
2  East     Adobe         1       4           6
3  West     Windows       2       2           18
4  West     Office        0       0           0
...
11 South    Adobe         1       0           12
12 Total    All       col.sum()  col.sum()   col.sum()     

这是我正在制作的,但仅限于最后一家商店:

enter image description here

分别为每个类别。我尝试添加列表,系列,字典到空数据帧,我尝试追加,添加,分配,并没有得到我想要的。你能指点我到正确的方向吗?

我在SO中经历了大多数方法,并且在Wes Kinley的书中看到了@SaféBooks,但是无法登陆它。请帮忙。我必须在星期一之前完成,而且我绝对没有。

1 个答案:

答案 0 :(得分:2)

使用 aggfunc 参数中的lambda,使用条件逻辑和考虑pivot_table。下面演示随机数据,播种的重现性,当然还要添加开源类别。

数据

import numpy as np
import pandas as pd

np.random.seed(22)
LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

df = pd.DataFrame({'Item': ["".join(list(np.random.choice(LETTERS,1)) +
                                    [str(np.random.randint(1000, 9000))]) for _ in range(500)],
                   'Store': [np.random.choice(['North', 'South', 
                                               'East', 'West'],1).item(0) for _ in range(500)],
                   'Category': [np.random.choice(['Office', 'Windows', 
                                                  'Adobe', 'Open Source'],1).item(0) for _ in range(500)],
                   'Licence_ends': ["Q".join([str(np.random.randint(2018, 2021))] +
                                             [str(np.random.randint(1,4))]) for _ in range(500)],
                   'Available_to_reorder': [np.random.choice(['Yes', 'No'],1).item(0) for _ in range(500)]},
                  columns = ['Item', 'Store', 'Category', 'Licence_ends', 'Available_to_reorder'])

print(df.head())
#     Item  Store     Category Licence_ends Available_to_reorder
# 0  V7276   West  Open Source       2018Q2                  Yes
# 1  M8104   West      Windows       2020Q1                   No
# 2  E6478  North  Open Source       2019Q2                   No
# 3  W5587  South  Open Source       2018Q2                  Yes
# 4  U3952  South      Windows       2019Q3                   No
# 5  E1989   East       Office       2018Q1                   No
# 6  S6646   West      Windows       2019Q2                  Yes
# 7  N7616   West        Adobe       2019Q1                  Yes
# 8  H6410   East        Adobe       2020Q2                   No
# 9  J8176   West       Office       2020Q1                  Yes

数据透视表 (结果为多索引数据框)

pvt_df = df.pivot_table(index=['Store', 'Category'], columns='Licence_ends', values='Available_to_reorder', 
                        aggfunc = lambda x: sum(x=='Yes'), margins=True, margins_name='Total')

print(pvt_df)                                                            
# Licence_ends       2018Q1  2018Q2  2018Q3  2019Q1  2019Q2  2019Q3  2020Q1  2020Q2  2020Q3  Total
# Store Category                                                                                  
# East  Adobe           3.0     0.0     1.0     0.0     3.0     2.0     1.0     4.0     0.0     14
#       Office          1.0     3.0     4.0     2.0     NaN     4.0     1.0     1.0     1.0     17
#       Open Source     1.0     4.0     2.0     0.0     1.0     0.0     1.0     2.0     1.0     12
#       Windows         1.0     2.0     3.0     1.0     1.0     0.0     1.0     3.0     1.0     13
# North Adobe           3.0     4.0     1.0     1.0     1.0     1.0     3.0     0.0     2.0     16
#       Office          1.0     0.0     3.0     0.0     1.0     2.0     3.0     0.0     0.0     10
#       Open Source     3.0     1.0     0.0     1.0     1.0     2.0     2.0     1.0     2.0     13
#       Windows         2.0     2.0     5.0     0.0     2.0     2.0     1.0     1.0     3.0     18
# South Adobe           2.0     3.0     NaN     2.0     2.0     3.0     1.0     3.0     2.0     18
#       Office          4.0     3.0     1.0     2.0     NaN     2.0     3.0     2.0     2.0     19
#       Open Source     1.0     2.0     2.0     4.0     1.0     NaN     NaN     3.0     2.0     15
#       Windows         2.0     1.0     1.0     2.0     2.0     2.0     1.0     3.0     1.0     15
# West  Adobe           1.0     1.0     0.0     4.0     3.0     3.0     1.0     0.0     3.0     16
#       Office          1.0     1.0     3.0     3.0     3.0     2.0     2.0     2.0     1.0     18
#       Open Source     4.0     2.0     4.0     0.0     0.0     4.0     1.0     1.0     2.0     18
#       Windows         2.0     2.0     1.0     5.0     4.0     1.0     4.0     1.0     0.0     20
# Total                32.0    31.0    31.0    27.0    25.0    30.0    26.0    27.0    23.0    252