Subtract rows based on ID column - pandas

Time: 2019-01-23 11:25:41

Tags: python pandas numpy pandas-groupby data-analysis

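I have a dataframe with the columns UserId, Date_watched, and Days_not_watch.

I want to find the number of days of gap each user left between viewings, so I want a column holding that value for every row of every user, and my dataframe should look like this: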

UserId    Date_watched    Days_not_watch
  1        2010-09-11         5
  1        2010-10-01         8
  1        2010-10-28         1
  2        2010-05-06         12
  2        2010-05-18         5
  3        2010-08-09         10
  3        2010-09-25         5

I have noted the formula used to calculate Gap next to the column names of the dataframe.

1 Answer:

Answer 0: (score: 2)

Here is one way to do it using groupby + shift:

import pandas as pd

# parse the dates, then sort by user and date
df['Date_watched'] = pd.to_datetime(df['Date_watched'])
df = df.sort_values(['UserId', 'Date_watched'])

# calculate groupwise start dates, shifted
grp = df.groupby('UserId')
starts = grp['Date_watched'].shift() + \
         pd.to_timedelta(grp['Days_not_watch'].shift(), unit='d')

# calculate timedelta gaps
df['Gap'] = (df['Date_watched'] - starts).fillna(pd.Timedelta(0))

# convert to days and then integers
df['Gap'] = (df['Gap'] / pd.Timedelta('1 day')).astype(int)

print(df)

   UserId Date_watched  Days_not_watch  Gap
0       1   2010-09-11               5    0
1       1   2010-10-01               8   15
2       1   2010-10-28               1   19
3       2   2010-05-06              12    0
4       2   2010-05-18               5    0
5       3   2010-08-09              10    0
6       3   2010-09-25               5   37
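
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby + shift idea. The sample data is copied from the question. The helper name add_gap is hypothetical and only used for illustration. The formula is inferred from the answer's code rather than quoted from the question: Gap = current Date_watched - previous Date_watched - previous Days_not_watch, computed within each UserId, with the first row of each user set to 0.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2, 3, 3],
    "Date_watched": ["2010-09-11", "2010-10-01", "2010-10-28",
                     "2010-05-06", "2010-05-18",
                     "2010-08-09", "2010-09-25"],
    "Days_not_watch": [5, 8, 1, 12, 5, 10, 5],
})


def add_gap(frame):
    """Hypothetical helper: return a sorted copy with an integer Gap column."""
    out = frame.copy()
    out["Date_watched"] = pd.to_datetime(out["Date_watched"])
    out = out.sort_values(["UserId", "Date_watched"])

    grp = out.groupby("UserId")
    # Time since the same user's previous row (NaT on each user's first row).
    since_prev = out["Date_watched"] - grp["Date_watched"].shift()
    # Days_not_watch from the same user's previous row, as a timedelta.
    prev_not_watch = pd.to_timedelta(grp["Days_not_watch"].shift(), unit="D")

    # First rows have no previous row, so their gap is filled with 0.
    gap = (since_prev - prev_not_watch).fillna(pd.Timedelta(0))
    out["Gap"] = gap.dt.days.astype(int)
    return out


result = add_gap(df)
print(result)
```

Working on a copy keeps the caller's dataframe unchanged, and using .dt.days is an alternative to dividing by pd.Timedelta('1 day') that gives the same whole-day values here.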