Question

I have a file listing various products with their Store, Category, and Licence_ends. I need to count how many products' licences expire in each quarter, and how many of them can be reordered in that quarter. Sample data:

       Item    Store    Category    Licence_ends    Available_to_reorder
    0  A01929  North    Office      2018 Q1         Yes
    1  A02911  South    Windows     2019 Q3         Yes
    2  B11282  North    Adobe       2019 Q2         No
    3  C73162  East     Office      2018 Q4         Yes
    4  A12817  West     Windows     2020 Q1         No

What I want to achieve is the following:

       Store    Category    2018 Q1 2018 Q2 ... 2020 Q4
    0  East     Windows       0       1           24   # cumulative sum of previous quarters
    1  East     Office        1       2           11
    2  East     Adobe         1       4           6
    3  West     Windows       2       2           18
    4  West     Office        0       0           0
    ...
    11 South    Adobe         1       0           12
    12 Total    All       col.sum()  col.sum()   col.sum()

I started writing code for this, but I got lost and don't know the right approach. What I am producing so far covers only the last store, separately for each category. I tried adding lists, Series, and dicts to an empty DataFrame, and I tried append, add, and assign, but none of them gave me what I want. Can you point me in the right direction?

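I have gone through most of the approaches on SO and looked through Wes McKinney's book on @SaféBooks, but I could not get it to work. Please help. I have to finish this by Monday, and I have nothing so far.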

Answer 1

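Consider pivot_table with a lambda in the aggfunc argument that applies the conditional logic. The demonstration below uses random data, seeded for reproducibility, and adds an Open Source category.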
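As a minimal sketch of the same idea on data small enough to check by hand, the snippet below applies a lambda aggfunc to the five sample rows from the question. It assumes Licence_ends is written without a space (2018Q1), and fill_value=0 is an addition that this answer does not use; it shows 0 instead of NaN where a store/category pair has no rows in a quarter.

```python
import pandas as pd

# The five sample rows from the question (Licence_ends written as "2018Q1").
sample = pd.DataFrame({
    "Item": ["A01929", "A02911", "B11282", "C73162", "A12817"],
    "Store": ["North", "South", "North", "East", "West"],
    "Category": ["Office", "Windows", "Adobe", "Office", "Windows"],
    "Licence_ends": ["2018Q1", "2019Q3", "2019Q2", "2018Q4", "2020Q1"],
    "Available_to_reorder": ["Yes", "Yes", "No", "Yes", "No"],
})

# One row per (Store, Category), one column per quarter.
# The lambda counts how many entries in each cell equal "Yes".
# fill_value=0 (an assumption, see above) fills cells that have no rows.
reorder_counts = sample.pivot_table(
    index=["Store", "Category"],
    columns="Licence_ends",
    values="Available_to_reorder",
    aggfunc=lambda s: (s == "Yes").sum(),
    fill_value=0,
)
print(reorder_counts)
```

A cell holds 0 either because its rows are all "No" (North/Adobe in 2019Q2) or because fill_value replaced a missing combination.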

Data

    import numpy as np
    import pandas as pd

    np.random.seed(22)  # seed so the random data is reproducible
    LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

    df = pd.DataFrame({
        'Item': ["".join(list(np.random.choice(LETTERS, 1)) + [str(np.random.randint(1000, 9000))])
                 for _ in range(500)],
        'Store': [np.random.choice(['North', 'South', 'East', 'West'], 1).item(0)
                  for _ in range(500)],
        'Category': [np.random.choice(['Office', 'Windows', 'Adobe', 'Open Source'], 1).item(0)
                     for _ in range(500)],
        # randint excludes its upper bound, so years are 2018-2020 and quarters are Q1-Q3 only;
        # use np.random.randint(1, 5) to include Q4
        'Licence_ends': ["Q".join([str(np.random.randint(2018, 2021))] + [str(np.random.randint(1, 4))])
                         for _ in range(500)],
        'Available_to_reorder': [np.random.choice(['Yes', 'No'], 1).item(0)
                                 for _ in range(500)]},
        columns=['Item', 'Store', 'Category', 'Licence_ends', 'Available_to_reorder'])

    print(df.head(10))
    #     Item  Store     Category  Licence_ends  Available_to_reorder
    # 0  V7276   West  Open Source        2018Q2                   Yes
    # 1  M8104   West      Windows        2020Q1                    No
    # 2  E6478  North  Open Source        2019Q2                    No
    # 3  W5587  South  Open Source        2018Q2                   Yes
    # 4  U3952  South      Windows        2019Q3                    No
    # 5  E1989   East       Office        2018Q1                    No
    # 6  S6646   West      Windows        2019Q2                   Yes
    # 7  N7616   West        Adobe        2019Q1                   Yes
    # 8  H6410   East        Adobe        2020Q2                    No
    # 9  J8176   West       Office        2020Q1                   Yes

Pivot table (the result is a MultiIndex DataFrame)

    pvt_df = df.pivot_table(index=['Store', 'Category'],
                            columns='Licence_ends',
                            values='Available_to_reorder',
                            aggfunc=lambda x: sum(x == 'Yes'),  # count the 'Yes' rows in each cell
                            margins=True, margins_name='Total')
    print(pvt_df)
    # Licence_ends       2018Q1  2018Q2  2018Q3  2019Q1  2019Q2  2019Q3  2020Q1  2020Q2  2020Q3  Total
    # Store Category
    # East  Adobe           3.0     0.0     1.0     0.0     3.0     2.0     1.0     4.0     0.0     14
    #       Office          1.0     3.0     4.0     2.0     NaN     4.0     1.0     1.0     1.0     17
    #       Open Source     1.0     4.0     2.0     0.0     1.0     0.0     1.0     2.0     1.0     12
    #       Windows         1.0     2.0     3.0     1.0     1.0     0.0     1.0     3.0     1.0     13
    # North Adobe           3.0     4.0     1.0     1.0     1.0     1.0     3.0     0.0     2.0     16
    #       Office          1.0     0.0     3.0     0.0     1.0     2.0     3.0     0.0     0.0     10
    #       Open Source     3.0     1.0     0.0     1.0     1.0     2.0     2.0     1.0     2.0     13
    #       Windows         2.0     2.0     5.0     0.0     2.0     2.0     1.0     1.0     3.0     18
    # South Adobe           2.0     3.0     NaN     2.0     2.0     3.0     1.0     3.0     2.0     18
    #       Office          4.0     3.0     1.0     2.0     NaN     2.0     3.0     2.0     2.0     19
    #       Open Source     1.0     2.0     2.0     4.0     1.0     NaN     NaN     3.0     2.0     15
    #       Windows         2.0     1.0     1.0     2.0     2.0     2.0     1.0     3.0     1.0     15
    # West  Adobe           1.0     1.0     0.0     4.0     3.0     3.0     1.0     0.0     3.0     16
    #       Office          1.0     1.0     3.0     3.0     3.0     2.0     2.0     2.0     1.0     18
    #       Open Source     4.0     2.0     4.0     0.0     0.0     4.0     1.0     1.0     2.0     18
    #       Windows         2.0     2.0     1.0     5.0     4.0     1.0     4.0     1.0     0.0     20
    # Total                32.0    31.0    31.0    27.0    25.0    30.0    26.0    27.0    23.0    252
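The target table in the question asks for a cumulative sum over the previous quarters, which the pivot above does not compute. One possible way to get it, sketched here on hypothetical toy data (only the column names come from the question), is to pivot without margins, take cumsum(axis=1) across the quarter columns, and append the Total row afterwards. Margins are left out on purpose so that a Total column does not end up inside the running sum.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
toy = pd.DataFrame({
    "Store": ["East", "East", "East", "West", "West", "West"],
    "Category": ["Office", "Office", "Windows", "Office", "Office", "Windows"],
    "Licence_ends": ["2018Q1", "2018Q2", "2018Q2", "2018Q1", "2018Q3", "2018Q3"],
    "Available_to_reorder": ["Yes", "Yes", "No", "Yes", "Yes", "Yes"],
})

# Per-quarter counts of reorderable items; fill_value=0 replaces NaN in empty cells.
per_quarter = toy.pivot_table(
    index=["Store", "Category"],
    columns="Licence_ends",
    values="Available_to_reorder",
    aggfunc=lambda s: (s == "Yes").sum(),
    fill_value=0,
)

# Running total from left to right; sorting the columns first keeps
# the "YYYYQn" labels in chronological order.
cumulative = per_quarter.sort_index(axis=1).cumsum(axis=1)

# Build the Total row from the column sums and append it at the bottom.
total = cumulative.sum().to_frame().T
total.index = pd.MultiIndex.from_tuples(
    [("Total", "All")], names=cumulative.index.names
)
result = pd.concat([cumulative, total])
print(result)
```

Calling result.reset_index() afterwards turns the Store and Category index levels back into ordinary columns, which is closer to the flat layout shown in the question.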
