I have a dataset, and I want to know which users viewed which items for less than 10 s (deep_view - view < 10). The dataset looks like this:
user_id item_id action_type action_time
0 0365F7AE-5048-42B3-BB2C-8E637A380A3E 557082 view 1487423564
1 0365F7AE-5048-42B3-BB2C-8E637A380A3E 557166 view 1487424075
2 0365F7AE-5048-42B3-BB2C-8E637A380A3E 555824 view 1487424241
5 0365F7AE-5048-42B3-BB2C-8E637A380A3E 555824 deep_view 1487424345
3 0365F7AE-5048-42B3-BB2C-8E637A380A3E 554390 view 1487424395
4 0365F7AE-5048-42B3-BB2C-8E637A380A3E 557166 deep_view 1487424175
6 0365F7AE-5048-42B3-BB2C-8E637A380A3E 557082 deep_view 1487423680
7 0365F7AE-5048-42B3-BB2C-8E637A380A3E 554390 deep_view 1487424422
8 06068254-792D-4AFE-AC6C-DE43DB15D735 556134 view 1487417354
9 06068254-792D-4AFE-AC6C-DE43DB15D735 556134 deep_view 1487417411
10 06068254-792D-4AFE-AC6C-DE43DB15D735 550176 view 1487400366
11 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 519444 view 1487415176
12 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 555729 deep_view 1487412841
13 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 555171 deep_view 1487414707
14 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 555744 view 1487412883
15 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 555757 view 1487412616
16 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 555337 view 1487414331
17 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 555784 view 1487413081
18 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 555653 view 1487412036
19 077F63F3-3DF4-4041-B3C9-7BAB2BDCA795 555537 view 1487413842
I have tried this code:
def short_time(data):
    data = data.sort_values(by=['user_id', 'action_time'])
    id = []
    for i in range(data.shape[0] - 1):
        if data['action_type'][data.index[i]] == 'view' and data['action_type'][data.index[i + 1]] == 'deep_view' and \
                data['user_id'][data.index[i]] == data['user_id'][data.index[i + 1]] \
                and data['item_id'][data.index[i]] == data['item_id'][data.index[i + 1]]:
            if data['action_time'][data.index[i + 1]] - data["action_time"][data.index[i]] < 10:
                id.append(data.index[i])
    return data.loc[id, :]
It works, but it loops over every row, which is slow. Is there a better solution? Here is part of the output:
user_id item_id action_type action_time
301800 135973 558284 view 1487386449
457083 149124 544766 view 1487349898
203814 1258134 538039 view 1487382777
454537 1489322 550339 view 1487419315
131863 11703060 553010 view 1487424398
132345 11705467 546168 view 1487369955
137092 11761967 471721 view 1487425655
137236 11765536 539269 view 1487370412
137229 11765536 542229 view 1487370465
137238 11765536 462871 view 1487370491
137241 11765536 542217 view 1487370845
137276 11765536 550339 view 1487379656
137263 11765536 539302 view 1487379832
137275 11765536 541951 view 1487380143
137278 11765536 550737 view 1487381556
137208 11765536 541946 view 1487412335
138095 11776713 552341 view 1487413089
138898 11783870 542197 view 1487406728
138904 11783870 542235 view 1487406763
138903 11783870 541683 view 1487407348
138905 11783870 496537 view 1487407465
139175 11785631 550982 view 1487384606
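One loop-free way to express the same check might look like the sketch below. It keeps the same sort order as short_time, then compares each row with the row that follows it using DataFrame.shift. The function name short_time_vectorized and the threshold parameter are made up for illustration; they are not part of the code above.

```python
import pandas as pd

def short_time_vectorized(data, threshold=10):
    # Same ordering as the loop version: by user, then by time.
    data = data.sort_values(by=['user_id', 'action_time'])
    # nxt holds, for every row, the values of the row that follows it.
    # The last row has no successor, so its values are NaN and every
    # comparison against it evaluates to False.
    nxt = data.shift(-1)
    mask = (
        (data['action_type'] == 'view')
        & (nxt['action_type'] == 'deep_view')
        & (data['user_id'] == nxt['user_id'])
        & (data['item_id'] == nxt['item_id'])
        & (nxt['action_time'] - data['action_time'] < threshold)
    )
    # Keep the 'view' rows whose matching 'deep_view' came quickly enough.
    return data[mask]
```

Like the loop, this only pairs a view with the deep_view that immediately follows it for the same user and item after sorting.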
What's more, after finishing this task, I want to find out who has read the items from the first part (item = data.loc[id, :]['item_id']). I tried data[data['item_id'] == item], but it failed. I don't want to use for ... in ...
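For example:
As you can see, 558284 is an item_id in the output of the first task, so I want to know who has read this item in the dataset. In the dataset we can find rows like this: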
user_id item_id action_type action_time
2 0365F7AE-5048-42B3-BB2C-8E637A380A3E 555824 view 1487424241
5 0365F7AE-5048-42B3-BB2C-8E637A380A3E 555824 deep_view 1487424345
So the goal is to find all rows in the dataset whose item_id appears in the output of the first task. Can anyone help? Thanks.
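For the second part, a membership test with Series.isin may be what is needed. Comparing data['item_id'] == item tries to match two Series element by element, which generally does not work when they have different lengths or indexes. The sketch below assumes short_views is the DataFrame returned by the first task; the name rows_for_items is hypothetical.

```python
import pandas as pd

def rows_for_items(data, short_views):
    # Collect the distinct item_ids found by the first task.
    items = short_views['item_id'].unique()
    # Keep every row of the full dataset whose item_id is one of them,
    # regardless of user or action type.
    return data[data['item_id'].isin(items)]
```

Usage would be something like rows_for_items(data, short_time(data)), with no explicit for loop.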