Grouping items before a specific date

Time: 2018-06-01 13:31:44

Tags: python pandas

I have the following dataframe:

import pandas as pd

from io import StringIO
data = StringIO("""TitleCode,ReleaseDate,WeekEnding,TotalUnits
A,12/16/2017,12/2/2017 0:00,5
A,12/16/2017,12/9/2017 0:00,10
A,12/16/2017,12/16/2017 0:00,2
A,12/16/2017,12/23/2017 0:00,5
A,12/16/2017,12/30/2017 0:00,4
B,1/6/2018,1/13/2017 0:00,4
B,1/6/2018,1/20/2017 0:00,2
""")


result = StringIO("""TitleCode,ReleaseDate,WeekEnding,TotalUnits
A,12/16/2017,12/16/2017 0:00,17
A,12/16/2017,12/23/2017 0:00,5
A,12/16/2017,12/30/2017 0:00,4
B,1/6/2018,1/13/2017 0:00,4
B,1/6/2018,1/20/2017 0:00,2
""")
datadf = pd.read_csv(data, parse_dates=True)
resultdf = pd.read_csv(result, parse_dates=True)

datadf
  TitleCode ReleaseDate       WeekEnding  TotalUnits
0         A  12/16/2017   12/2/2017 0:00           5
1         A  12/16/2017   12/9/2017 0:00          10
2         A  12/16/2017  12/16/2017 0:00           2
3         A  12/16/2017  12/23/2017 0:00           5
4         A  12/16/2017  12/30/2017 0:00           4
5         B    1/6/2018   1/13/2017 0:00           4
6         B    1/6/2018   1/20/2017 0:00           2

resultdf
  TitleCode ReleaseDate       WeekEnding  TotalUnits
0         A  12/16/2017  12/16/2017 0:00          17
1         A  12/16/2017  12/23/2017 0:00           5
2         A  12/16/2017  12/30/2017 0:00           4
3         B    1/6/2018   1/13/2017 0:00           4
4         B    1/6/2018   1/20/2017 0:00           2

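The datadf dataframe shows item sales by week, along with each item's release date. I want to combine all pre-order sales, i.e. the sales that occurred before the release date, into a single row per item (as in resultdf).

The only approach I can think of is looping over the dataframe, but there must be a more efficient way.

Thanks!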

1 answer:

Answer 0 (score: 1)

# standardize datetime format for comparison
datadf['WeekEnding'] = pd.to_datetime(datadf.WeekEnding, format='%m/%d/%Y %H:%M')
datadf['ReleaseDate'] = pd.to_datetime(datadf.ReleaseDate, format='%m/%d/%Y')

# replace weekending with release date if smaller
datadf['WeekEnding'] = datadf['WeekEnding'].where(
    datadf['WeekEnding'] > datadf['ReleaseDate'], datadf['ReleaseDate']
)

datadf.groupby(
    ['TitleCode', 'ReleaseDate', 'WeekEnding']
).TotalUnits.sum().reset_index()
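
For anyone who wants to run the steps above end to end, here is a self-contained sketch on the question's sample data. One assumption to flag: the WeekEnding years for title B are written as 2018 here, whereas the question shows 2017. With the years as posted, both B rows would fall before the 1/6/2018 release date and be merged into a single 6-unit row, which does not match the expected output. The names `df` and `out` are only illustrative.

```python
from io import StringIO

import pandas as pd

# Sample data from the question. Assumption: title B's WeekEnding years are
# written as 2018 (the question shows 2017) so that B's sales fall after
# its release date, as the expected output implies.
csv = StringIO("""TitleCode,ReleaseDate,WeekEnding,TotalUnits
A,12/16/2017,12/2/2017 0:00,5
A,12/16/2017,12/9/2017 0:00,10
A,12/16/2017,12/16/2017 0:00,2
A,12/16/2017,12/23/2017 0:00,5
A,12/16/2017,12/30/2017 0:00,4
B,1/6/2018,1/13/2018 0:00,4
B,1/6/2018,1/20/2018 0:00,2
""")

# parse_dates=True would only try to parse the index, so the two date
# columns are converted explicitly instead.
df = pd.read_csv(csv)
df["WeekEnding"] = pd.to_datetime(df["WeekEnding"], format="%m/%d/%Y %H:%M")
df["ReleaseDate"] = pd.to_datetime(df["ReleaseDate"], format="%m/%d/%Y")

# Move every week that ends on or before the release date onto the release
# date; weeks after the release keep their own WeekEnding.
df["WeekEnding"] = df["WeekEnding"].where(
    df["WeekEnding"] > df["ReleaseDate"], df["ReleaseDate"]
)

# Rows that now share a WeekEnding collapse into one summed row.
out = df.groupby(
    ["TitleCode", "ReleaseDate", "WeekEnding"], as_index=False
)["TotalUnits"].sum()
print(out)
```

Under that assumption, the three weeks of title A ending on or before 12/16/2017 (5, 10 and 2 units) collapse into one 17-unit row, and the remaining four rows are unchanged.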
