Get grouped data from 2 dataframes with a condition

Time: 2017-10-20 15:18:54

Tags: python pandas

I have 2 dataframes:

print(df1)

userid  reg_date
1       2015-07-21
2       2015-07-11
3       2015-07-14

print(df2)

userid  date        status    amount
1       2015-07-22  CHARGED   11.68
1       2015-07-29  CHARGED   21.4
2       2015-07-13  CHARGED   18.98
2       2015-07-15  DECLINED  10.96

For each userid in df1, I need sum(amount) from df2 over the rows where status == "CHARGED" and reg_date + 7 days > date.

# result
userid amount
1      11.68
2      18.98
3      0

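I built the solution below. The problem is that when no rows in df2 satisfy the condition for a userid, that userid is missing from the output (it should be returned with 0).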


    import pandas as pd
    from datetime import timedelta

    df1 = pd.read_csv('Task2_data1.csv', sep=',', parse_dates=['reg_date'])
    df2 = pd.read_csv('Task2_data2.csv', sep=',', parse_dates=['date'])
    # replace decimal commas with dots before casting to float
    df2['amount'] = df2['amount'].replace(',', '.', regex=True).astype(float)
    df3 = pd.merge(df1, df2, how='outer', on='userid')
    # keep CHARGED rows dated less than 7 days after registration
    df3 = df3[(df3.status == 'CHARGED') &
              (df3.reg_date + timedelta(days=7) > df3.date)]
    print(df3.groupby(['userid'])['amount'].sum())
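
To see where userid 3 gets lost, here is a minimal sketch that rebuilds the two sample tables in memory (the CSV files are not available, so the literal frames and the names merged, row3, mask and surviving_users are assumptions for illustration). The outer merge does keep userid 3, but only as a row of missing values, and comparisons against missing values evaluate to False, so the boolean filter discards it before groupby ever runs.

```python
import pandas as pd

# Sample frames rebuilt from the tables shown in the question.
df1 = pd.DataFrame({
    "userid": [1, 2, 3],
    "reg_date": pd.to_datetime(["2015-07-21", "2015-07-11", "2015-07-14"]),
})
df2 = pd.DataFrame({
    "userid": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2015-07-22", "2015-07-29", "2015-07-13", "2015-07-15"]
    ),
    "status": ["CHARGED", "CHARGED", "CHARGED", "DECLINED"],
    "amount": [11.68, 21.4, 18.98, 10.96],
})

merged = pd.merge(df1, df2, how="outer", on="userid")

# userid 3 survives the outer merge, but its df2 columns are all missing.
row3 = merged[merged["userid"] == 3]

# Comparisons against NaN / NaT are False, so the mask rejects that row.
mask = (merged["status"] == "CHARGED") & (
    merged["reg_date"] + pd.Timedelta(days=7) > merged["date"]
)
surviving_users = sorted(merged.loc[mask, "userid"].unique())
```

After the filter only userids 1 and 2 remain, which is why the grouped sum has no entry for userid 3.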

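Is there another way to do this?

1 Answer:

Answer 0: (score: 1)

Use merge, then reindex with fill_value=0 to restore the users that were filtered out: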

In [4974]: dff = df2.merge(df1)

In [4975]: (dff[dff['status'].eq('CHARGED') & (dff['date']-dff['reg_date']).dt.days.lt(7)]
              .groupby('userid')['amount'].sum()
              .reindex(df1['userid'].unique(), fill_value=0)
              .reset_index())
Out[4975]:
   userid  amount
0       1   11.68
1       2   18.98
2       3    0.00
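
As an alternative to reindexing afterwards, one possible sketch is to keep every row and zero out the amounts that do not qualify, so that no userid is dropped in the first place. This is not the approach from the answer above; it swaps in a left merge plus Series.where. The in-memory frames and the names merged, qualifies and result are assumptions for illustration, and the condition is written as the strict comparison stated in the question (date < reg_date + 7 days).

```python
import pandas as pd

# Sample frames rebuilt from the tables shown in the question.
df1 = pd.DataFrame({
    "userid": [1, 2, 3],
    "reg_date": pd.to_datetime(["2015-07-21", "2015-07-11", "2015-07-14"]),
})
df2 = pd.DataFrame({
    "userid": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2015-07-22", "2015-07-29", "2015-07-13", "2015-07-15"]
    ),
    "status": ["CHARGED", "CHARGED", "CHARGED", "DECLINED"],
    "amount": [11.68, 21.4, 18.98, 10.96],
})

# A left merge keeps every userid from df1, even with no rows in df2.
merged = df1.merge(df2, on="userid", how="left")

# True only for CHARGED rows dated less than 7 days after registration;
# rows with missing status/date compare as False.
qualifies = merged["status"].eq("CHARGED") & (
    merged["date"] < merged["reg_date"] + pd.Timedelta(days=7)
)

# Replace non-qualifying amounts with 0 instead of dropping the rows.
result = (
    merged.assign(amount=merged["amount"].where(qualifies, 0.0))
    .groupby("userid", as_index=False)["amount"]
    .sum()
)
print(result)
```

On the sample data this yields one row per userid in df1, with userid 3 summing to 0. The trade-off is that the whole merged frame is carried through the groupby, whereas the filter-then-reindex version aggregates fewer rows.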