Pandas DataFrame apply is very slow

Time: 2017-07-25 01:43:49

Tags: python pandas dataframe lambda apply

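I am trying to add binary variables that tell whether a product was in the user's last order, in the order before that, and so on, for the last 5 orders. I came up with the following pandas DataFrame expression. It does exactly what it is supposed to do, but it is extremely slow. What could I be doing wrong?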

Here is my dataframe:

order_id  user_id  order_number  product_id  us_last_order_number
2539329   1        1             196         10
2539329   1        1             14084       10
2539329   1        1             12427       10
2539329   1        1             26088       10
2539329   1        1             26405       10
2398795   1        2             196         10
2398795   1        2             10258       10
2398795   1        2             12427       10
2398795   1        2             13176       10
2398795   1        2             26088       10
2398795   1        2             13032       10
473747    1        3             196         10
473747    1        3             12427       10
473747    1        3             10258       10
473747    1        3             25133       10
473747    1        3             30450       10
2254736   1        4             196         10
2254736   1        4             12427       10
2254736   1        4             10258       10
2254736   1        4             25133       10
2254736   1        4             26405       10
431534    1        5             196         10
431534    1        5             12427       10
431534    1        5             10258       10
431534    1        5             25133       10
431534    1        5             10326       10
431534    1        5             17122       10
431534    1        5             41787       10
431534    1        5             13176       10
3367565   1        6             196         10



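import pandas as pd

# For each (user_id, product_id) pair, flag the user's last 5 orders:
# 1 = product was in that order, 0 = it was not, -1 = the order number is negative.
tmp2 = priors_orders_detail.groupby(['user_id', 'product_id']).apply(
    lambda x: [1 if item in x.order_number.tolist() else -1 if item < 0 else 0
               for item in range(x.us_last_order_number.iloc[0],
                                 x.us_last_order_number.iloc[0] - 5, -1)])

tmp2 = pd.DataFrame(tmp2).reset_index()
tmp2.columns = ['user_id', 'product_id', 'present_in_orders']

# Expand the list column into one column per order.
tmp2[['in_orders_1', 'in_orders_2', 'in_orders_3', 'in_orders_4', 'in_orders_5']] = pd.DataFrame(
    [x for x in tmp2.present_in_orders])
tmp2.drop('present_in_orders', axis=1, inplace=True)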
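
For context, `groupby(...).apply` calls the Python lambda once per (user_id, product_id) group, so the cost grows with the number of groups. Below is a minimal sketch of one possible vectorized alternative; it has not been benchmarked here. The function name `last_n_order_flags` is hypothetical, and the sketch assumes that `us_last_order_number` is constant within each user, that order numbers are positive integers, and that the key columns contain no missing values. The `sample` frame uses a few rows taken from the data shown earlier.

```python
import numpy as np
import pandas as pd


def last_n_order_flags(df, n=5):
    """Per (user_id, product_id), flag which of the user's last n orders contain the product.

    Hypothetical helper: assumes us_last_order_number is constant per user.
    """
    keys = ["user_id", "product_id"]
    grouped = df.groupby(keys, sort=True)
    pair_code = grouped.ngroup().to_numpy()         # row -> pair number, in sorted key order
    last = grouped["us_last_order_number"].first()  # one value per pair, same sorted order

    # offset 0 = the user's last order, 1 = the order before it, and so on
    offset = (df["us_last_order_number"] - df["order_number"]).to_numpy()
    recent = (offset >= 0) & (offset < n)

    # One row per pair, one column per order slot; mark the slots that were ordered.
    flags = np.zeros((len(last), n), dtype=np.int8)
    flags[pair_code[recent], offset[recent]] = 1

    # Same rule as the original lambda: an unmatched negative order number gets -1.
    order_numbers = last.to_numpy()[:, None] - np.arange(n)
    flags[(order_numbers < 0) & (flags == 0)] = -1

    columns = [f"in_orders_{i}" for i in range(1, n + 1)]
    return pd.DataFrame(flags, index=last.index, columns=columns).reset_index()


# A few rows from the dataframe above (not the full data).
sample = pd.DataFrame({
    "order_id": [3367565, 431534, 431534, 2398795],
    "user_id": [1, 1, 1, 1],
    "order_number": [6, 5, 5, 2],
    "product_id": [196, 196, 13176, 13176],
    "us_last_order_number": [10, 10, 10, 10],
})
result = last_n_order_flags(sample)
```

The idea is to compute, for every row, how many orders back it lies from the user's last order, and to write the flags into a NumPy array with a single indexed assignment instead of building a Python list per group.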

0 answers:

No answers