Calculating a column in pandas based on other columns

Date: 2017-05-12 14:25:57

Tags: python pandas

I have a dataset and want to find which user viewed which item for less than 10 seconds (deep_view - view < 10). The dataset looks like this:

                               user_id  item_id  action_type  action_time
0   0365F7AE-5048-42B3-BB2C-8E637A380A3E   557082        view   1487423564
1   0365F7AE-5048-42B3-BB2C-8E637A380A3E   557166        view   1487424075 
2   0365F7AE-5048-42B3-BB2C-8E637A380A3E   555824        view   1487424241
5   0365F7AE-5048-42B3-BB2C-8E637A380A3E   555824   deep_view   1487424345 
3   0365F7AE-5048-42B3-BB2C-8E637A380A3E   554390        view   1487424395
4   0365F7AE-5048-42B3-BB2C-8E637A380A3E   557166   deep_view   1487424175    
6   0365F7AE-5048-42B3-BB2C-8E637A380A3E   557082   deep_view   1487423680
7   0365F7AE-5048-42B3-BB2C-8E637A380A3E   554390   deep_view   1487424422 
8   06068254-792D-4AFE-AC6C-DE43DB15D735   556134        view   1487417354
9   06068254-792D-4AFE-AC6C-DE43DB15D735   556134   deep_view   1487417411  
10  06068254-792D-4AFE-AC6C-DE43DB15D735   550176        view   1487400366  
11  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   519444        view   1487415176  
12  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   555729   deep_view   1487412841  
13  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   555171   deep_view   1487414707 
14  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   555744        view   1487412883 
15  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   555757        view   1487412616 
16  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   555337        view   1487414331  
17  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   555784        view   1487413081  
18  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   555653        view   1487412036
19  077F63F3-3DF4-4041-B3C9-7BAB2BDCA795   555537        view   1487413842  
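One way to express deep_view - view < 10 directly, without a loop, is to pivot so each (user_id, item_id) pair becomes one row with a view column and a deep_view column, then subtract. This is a minimal sketch, assuming each pair's earliest view and earliest deep_view are the two timestamps to compare (it does not require the deep_view to be the very next action, unlike the loop below). The function name short_views_pivot and the small sample frame are hypothetical.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data
data = pd.DataFrame({
    'user_id':     ['u1', 'u1', 'u1', 'u1', 'u2', 'u3', 'u3'],
    'item_id':     [100, 100, 200, 200, 100, 400, 400],
    'action_type': ['view', 'deep_view', 'view', 'deep_view', 'view', 'deep_view', 'view'],
    'action_time': [1000, 1005, 1100, 1150, 2000, 3000, 3005],
})

def short_views_pivot(df, threshold=10):
    # One row per (user_id, item_id), one column per action_type,
    # holding the earliest timestamp of that action (NaN if it never happened)
    pv = df.pivot_table(index=['user_id', 'item_id'], columns='action_type',
                        values='action_time', aggfunc='min')
    # Seconds between the first view and the first deep_view
    gap = pv['deep_view'] - pv['view']
    # Keep pairs where the deep_view came after the view and within the threshold;
    # NaN gaps (only one of the two actions present) compare as False
    mask = (gap >= 0) & (gap < threshold)
    return pv[mask].reset_index()

print(short_views_pivot(data))
```

The gap >= 0 guard drops pairs whose deep_view was logged before the view (u3 / 400 above), which a bare "< 10" test would otherwise accept.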

I have tried this code:

import pandas as pd

def short_time(data):
    # Sort so each user's actions are in chronological order
    data = data.sort_values(by=['user_id', 'action_time'])
    ids = []  # index labels of qualifying 'view' rows
    for i in range(data.shape[0] - 1):
        cur, nxt = data.index[i], data.index[i + 1]
        # A 'view' immediately followed by a 'deep_view' of the same item by the same user
        if (data.at[cur, 'action_type'] == 'view'
                and data.at[nxt, 'action_type'] == 'deep_view'
                and data.at[cur, 'user_id'] == data.at[nxt, 'user_id']
                and data.at[cur, 'item_id'] == data.at[nxt, 'item_id']):
            # Keep the view if the deep_view came less than 10 seconds later
            if data.at[nxt, 'action_time'] - data.at[cur, 'action_time'] < 10:
                ids.append(cur)
    return data.loc[ids, :]

It works, but it loops over every row in Python, which is slow. Is there a better solution? Here is part of the output:

          user_id  item_id  action_type  action_time
301800     135973   558284        view   1487386449
457083     149124   544766        view   1487349898
203814    1258134   538039        view   1487382777
454537    1489322   550339        view   1487419315
131863   11703060   553010        view   1487424398
132345   11705467   546168        view   1487369955
137092   11761967   471721        view   1487425655
137236   11765536   539269        view   1487370412
137229   11765536   542229        view   1487370465
137238   11765536   462871        view   1487370491
137241   11765536   542217        view   1487370845
137276   11765536   550339        view   1487379656
137263   11765536   539302        view   1487379832
137275   11765536   541951        view   1487380143
137278   11765536   550737        view   1487381556
137208   11765536   541946        view   1487412335
138095   11776713   552341        view   1487413089
138898   11783870   542197        view   1487406728
138904   11783870   542235        view   1487406763
138903   11783870   541683        view   1487407348
138905   11783870   496537        view   1487407465
139175   11785631   550982        view   1487384606
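The loop in short_time compares each row with the row right after it, so the same test can be vectorized by aligning every row with its successor via shift(-1). This is a sketch, assuming the same sort order and the same "next action" semantics as the loop; the name short_time_vectorized and the sample frame are hypothetical.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data
data = pd.DataFrame({
    'user_id':     ['u1', 'u1', 'u1', 'u1', 'u2', 'u2'],
    'item_id':     [100, 100, 200, 200, 100, 300],
    'action_type': ['view', 'deep_view', 'view', 'deep_view', 'view', 'view'],
    'action_time': [1000, 1005, 1100, 1150, 2000, 2010],
})

def short_time_vectorized(data, threshold=10):
    # Same ordering as the loop: each user's actions in chronological order
    d = data.sort_values(by=['user_id', 'action_time'])
    # Each row of `nxt` holds the values of the following row of `d`
    # (same index labels; the last row becomes NaN and never matches)
    nxt = d.shift(-1)
    mask = (
        (d['action_type'] == 'view')
        & (nxt['action_type'] == 'deep_view')
        & (d['user_id'] == nxt['user_id'])
        & (d['item_id'] == nxt['item_id'])
        & (nxt['action_time'] - d['action_time'] < threshold)
    )
    # Rows keep their original index labels, like data.loc[ids, :] in the loop
    return d[mask]

print(short_time_vectorized(data))
```

All comparisons run column-wise in pandas instead of once per row, which is where the speedup over the explicit loop comes from.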

More importantly, once this task is done, I want to find out who read the items from the first part (item = data.loc[id, :]['item_id']). I tried data[data['item_id'] == item], but it failed. I don't want to use for ... in ...
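The likely reason data[data['item_id'] == item] fails is that item is itself a Series: == between two Series compares them label by label and raises a ValueError when their indexes differ. Series.isin tests membership of each value instead. A minimal sketch; the sample frame and the stand-in for the first task's result (short) are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    'user_id':     ['u1', 'u1', 'u2', 'u3', 'u3'],
    'item_id':     [100, 100, 100, 200, 300],
    'action_type': ['view', 'deep_view', 'view', 'view', 'deep_view'],
    'action_time': [1000, 1005, 2000, 3000, 3050],
})

# Stand-in for the first task's result (one quick 'view' row)
short = data.loc[[0]]
item = short['item_id']  # a Series whose index is [0], not 0..4

# data['item_id'] == item would raise ValueError here, because the two
# Series are not identically labelled. isin() checks each value against
# the collection of item ids instead, regardless of index labels.
readers = data[data['item_id'].isin(item)]
print(readers)
```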

For example:

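As you can see, 558284 is an item_id in the output of the first task, so I want to know who in the dataset has read this item. In the dataset we can find these rows: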

                                user_id  item_id  action_type  action_time
2   0365F7AE-5048-42B3-BB2C-8E637A380A3E   555824        view   1487424241
5   0365F7AE-5048-42B3-BB2C-8E637A380A3E   555824   deep_view   1487424345 

So the goal is to find all rows whose item_id appears in the output of the first task. Can anyone help? Thanks.
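Another way to collect every row whose item_id appears in the first task's output is an inner join against the distinct item ids, followed by a groupby to list the users per item. This is a sketch, assuming "read" means any action (view or deep_view) on the item; the sample frame and the stand-in result short are hypothetical. Note that merge returns a fresh 0..n-1 index rather than the original row labels.

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    'user_id':     ['u1', 'u1', 'u2', 'u3', 'u3', 'u4'],
    'item_id':     [100, 100, 100, 200, 200, 100],
    'action_type': ['view', 'deep_view', 'view', 'view', 'deep_view', 'deep_view'],
    'action_time': [1000, 1005, 2000, 3000, 3050, 4000],
})

# Stand-in for the first task's output (one quick view of item 100)
short = data.loc[[0]]

# Inner join on the distinct item ids: only rows whose item_id
# appears in `short` are kept
wanted = short[['item_id']].drop_duplicates()
readers = data.merge(wanted, on='item_id', how='inner')

# Distinct users who acted on each of those items
users_per_item = readers.groupby('item_id')['user_id'].unique()
print(users_per_item)
```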

0 answers