Remove duplicates based on a date condition

Time: 2019-07-18 15:41:38

Tags: python-3.x pandas

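I have a DataFrame like the one below, and I want to remove duplicates based on two criteria. 1) If Startdate is later than Month, the row is removed. 2) If Startdate is earlier than Month, keep the most recent record.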

>       COMP    Month       Startdate   bundle            result
> 0     TD3M    2018-03-01  2015-08-28  01_Essential      keep    
> 1     TD3M    2018-03-01  2018-07-17  04_Complete       remove
> 2     TD3M    2018-04-01  2015-08-28  01_Essential      keep
> 3     TD3M    2018-04-01  2018-07-17  04_Complete       remove
> 4     TD3M    2018-05-01  2015-08-28  01_Essential      keep
> 5     TD3M    2018-05-01  2018-07-17  04_Complete       remove
> 6     TD3M    2018-06-01  2015-08-28  01_Essential      keep
> 7     TD3M    2018-06-01  2018-07-17  04_Complete       remove
> 8     TD3M    2018-08-01  2015-08-28  01_Essential      remove
> 9     TD3M    2018-08-01  2018-07-17  04_Complete       keep
> 10    TD3M    2018-09-01  2015-08-28  01_Essential      remove
> 11    TD3M    2018-09-01  2018-07-17  04_Complete       keep

Expected output:

>       COMP    Month       Startdate   bundle            
> 0     TD3M    2018-03-01  2015-08-28  01_Essential      
> 2     TD3M    2018-04-01  2015-08-28  01_Essential     
> 4     TD3M    2018-05-01  2015-08-28  01_Essential     
> 6     TD3M    2018-06-01  2015-08-28  01_Essential     
> 9     TD3M    2018-08-01  2018-07-17  04_Complete  
> 11    TD3M    2018-09-01  2018-07-17  04_Complete          

2 answers:

Answer 0: (score: 1)

First, I drop your 'result' column:

df = df.drop(columns='result')

Next, make sure your Month and Startdate columns are in datetime format:

df.Month = pd.to_datetime(df.Month)
df.Startdate = pd.to_datetime(df.Startdate)

Then filter, and group by COMP and Month (aggregating with max):

df = df[df.Startdate <= df.Month]
df.groupby(['COMP', 'Month'], as_index=False).max()
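One caveat with the `.max()` aggregation: it takes the maximum of each column independently, so the returned `bundle` is the alphabetically largest one in the group, which is not necessarily the bundle on the row with the latest Startdate. For the sample data the two coincide. As a hedged alternative, not part of the original answer, the sketch below selects whole rows with `idxmax`. It rebuilds the question's sample data so that it runs on its own; the names `valid`, `latest_idx` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question (without the 'result' column).
df = pd.DataFrame({
    "COMP": ["TD3M"] * 12,
    "Month": [m for m in ["2018-03-01", "2018-04-01", "2018-05-01",
                          "2018-06-01", "2018-08-01", "2018-09-01"]
              for _ in range(2)],
    "Startdate": ["2015-08-28", "2018-07-17"] * 6,
    "bundle": ["01_Essential", "04_Complete"] * 6,
})
df["Month"] = pd.to_datetime(df["Month"])
df["Startdate"] = pd.to_datetime(df["Startdate"])

# Rule 1: drop rows whose Startdate is later than Month.
valid = df[df["Startdate"] <= df["Month"]]

# Rule 2: within each (COMP, Month) group, find the index label of the
# row with the latest Startdate, then select those complete rows.
latest_idx = valid.groupby(["COMP", "Month"])["Startdate"].idxmax()
result = valid.loc[latest_idx]

print(result)
```

Because `idxmax` returns index labels, `result` keeps the original row labels (0, 2, 4, 6, 9, 11), which matches the expected output in the question.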

Answer 1: (score: 0)

Here is one way, using sort_values and drop_duplicates:
df.query('Startdate<=Month').sort_values('Startdate').drop_duplicates('Month',keep='last')
Out[892]: 
    COMP      Month  Startdate        bundle result
0   TD3M 2018-03-01 2015-08-28  01_Essential   keep
2   TD3M 2018-04-01 2015-08-28  01_Essential   keep
4   TD3M 2018-05-01 2015-08-28  01_Essential   keep
6   TD3M 2018-06-01 2015-08-28  01_Essential   keep
9   TD3M 2018-08-01 2018-07-17   04_Complete   keep
11  TD3M 2018-09-01 2018-07-17   04_Complete   keep
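Note that `drop_duplicates('Month', ...)` deduplicates on Month alone, which is enough here because the sample contains a single COMP value. If the real data holds several companies, deduplicating on both COMP and Month is probably what is wanted. The sketch below illustrates this under that assumption. The second company `XY1Z` and its rows are hypothetical data made up for the example, and the final `sort_index()` is added only to restore the original row order.

```python
import pandas as pd

# Hypothetical data: two companies sharing the same Month.
# 'XY1Z' and its rows are invented for illustration only.
df = pd.DataFrame({
    "COMP": ["TD3M", "TD3M", "XY1Z", "XY1Z"],
    "Month": pd.to_datetime(["2018-08-01"] * 4),
    "Startdate": pd.to_datetime(["2015-08-28", "2018-07-17",
                                 "2016-01-01", "2018-09-01"]),
    "bundle": ["01_Essential", "04_Complete",
               "01_Essential", "04_Complete"],
})

result = (
    df.query("Startdate <= Month")        # drop rows starting after Month
      .sort_values("Startdate")           # oldest first, latest last
      .drop_duplicates(["COMP", "Month"], keep="last")  # latest per group
      .sort_index()                       # restore original row order
)

print(result)
```

With `drop_duplicates('Month', keep='last')` instead, only one row would survive for 2018-08-01 across both companies.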