Calculating the date difference for each unique premises ID in email data

Time: 2018-12-27 11:59:57

Tags: python loops dataframe

I have a dataframe that looks like this:

RefNo  TopicNo  BillA/c  PremisesNo  Date Age  TopicType
1      111      1234     54698       11/12/18  APSR
2      222      5698     123654      12/12/18  KLPO

I need to find all occurrences of each PremisesNo and work out the date difference between them, like this:

RefNo  TopicNo  BillA/c  PremisesNo  Date Age  TopicType  Diff
1      111      1234     54698       11/12/18  APSR       1
2      222      5698     54698       12/12/18  KLPO       0
3      333      5798     54698       12/12/18  KLPO       NA

I tried the following code:

df2 =[]
def occurence(df1):
    for ind, row in df2.iterrows():
        if ind in df['Premises Number'].unique():
            df2.append(df1['Premises Number'])
    return df2

occurence(df1)

But it does not give the required result. Any suggestions would be appreciated.


2 answers:

Answer 0 (score: 0)

You can group by PremisesNo and use diff on the Date Age column:

df['Diff'] = df.groupby('PremisesNo')['Date Age'].diff(-1).abs().dt.days

Using your sample dataframe:

         TopicNo  BillA/c PremisesNo Date Age   TopicType
RefNo                                                   
1          111     1234       54698 2018-12-11      APSR
2          222     5698       54698 2018-12-12      KLPO
3          333     5798       54698 2018-12-12      KLPO

First convert the Date Age column to datetime, then apply the operation above:

df['Date Age'] = pd.to_datetime(df['Date Age'], format = '%d/%m/%y')
df['Diff'] = df.groupby('PremisesNo')['Date Age'].diff(-1).abs().dt.days

         TopicNo  BillA/c  PremisesNo  Date Age    TopicType  Diff
RefNo                                                         
1          111     1234       54698   2018-12-11      APSR   1.0
2          222     5698       54698   2018-12-12      KLPO   0.0
3          333     5798       54698   2018-12-12      KLPO   NaN
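For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample rows from the question, with RefNo as the index and the dates written as day/month/two-digit year; the dataframe construction is my own reconstruction, not code from the question.

```python
import pandas as pd

# Sample rows reconstructed from the question (an assumption about how
# the real data is laid out); RefNo is used as the index.
df = pd.DataFrame(
    {
        "RefNo": [1, 2, 3],
        "TopicNo": [111, 222, 333],
        "BillA/c": [1234, 5698, 5798],
        "PremisesNo": [54698, 54698, 54698],
        "Date Age": ["11/12/18", "12/12/18", "12/12/18"],
        "TopicType": ["APSR", "KLPO", "KLPO"],
    }
).set_index("RefNo")

# Parse day/month/two-digit-year strings into datetimes.
df["Date Age"] = pd.to_datetime(df["Date Age"], format="%d/%m/%y")

# diff(-1) subtracts the next row within each PremisesNo group, so the
# last row of each group has nothing to compare with and ends up NaN.
df["Diff"] = df.groupby("PremisesNo")["Date Age"].diff(-1).abs().dt.days

print(df)
```

Note that diff works on row order, so if the real data is not already in date order within each premises, it would probably need sorting first.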

Answer 1 (score: 0)

To add to @nixon's answer, try the following.

Convert the Date Age column to a pandas datetime:

df['Date Age'] = pd.to_datetime(df['Date Age'], format='%d/%m/%y')  # day first, as in the sample data
df['Diff'] = df[['PremisesNo','Date Age']].groupby('PremisesNo')['Date Age'].diff()

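Where PremisesNo differs from the previous row (that is, on the first row of each run of the same premises), set Diff to None: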

df.loc[df.PremisesNo != df.PremisesNo.shift(),'Diff'] = None
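To see what that last step changes, here is a small sketch on made-up data. The four rows below are hypothetical and not from the question; they interleave two premises so that the grouped diff and the shift mask give different results.

```python
import pandas as pd

# Hypothetical rows (not from the question): premises 54698 appears
# again after a row for a different premises.
df = pd.DataFrame(
    {
        "PremisesNo": [54698, 54698, 123654, 54698],
        "Date Age": ["11/12/18", "12/12/18", "12/12/18", "15/12/18"],
    }
)
df["Date Age"] = pd.to_datetime(df["Date Age"], format="%d/%m/%y")

# Difference from the previous row of the same premises; the first
# occurrence of each premises is NaT.
df["Diff"] = df.groupby("PremisesNo")["Date Age"].diff()
grouped_only = df["Diff"].copy()

# Blank out every row whose PremisesNo differs from the row directly above.
df.loc[df["PremisesNo"] != df["PremisesNo"].shift(), "Diff"] = None

print(pd.DataFrame({"grouped_only": grouped_only, "after_mask": df["Diff"]}))
```

With rows already sorted by PremisesNo the mask should leave the grouped diff unchanged, because the first row of each group is already NaT. With interleaved rows like these, the mask also discards the difference on the last row, so whether to keep that step depends on which behaviour you want.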