pandas wrapper raises ValueError

Time: 2016-07-25 08:10:42

Tags: python pandas sklearn-pandas

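I get the following error when I run my Python script, which uses pandas. It happens when running on 30 records of data. Please advise what is going wrong.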

  

Traceback (most recent call last):
  File "extractyooochoose2.py", line 32
    totalitems = [len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
  File "", line 13, in unique
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/pandas/core/groupby.py", line 620, in wrapper
    raise ValueError

The data and code are shown below.

import pandas as pd
import datetime as dt
clickspath='/tmp/gensim/yoochoose/yoochoose-clicks.dat'
buyspath='/tmp/gensim/yoochoose/yoochoose-buys.dat'
clicksdat=pd.read_csv(clickspath,header=None,dtype={'itemid': pd.np.str_,'Sid':pd.np.str_,'Timestamp':pd.np.str_,'itemcategory':pd.np.str_})
clicksdat.columns=['Sid','Timestamp','itemid','itemcategory']
buysdat=pd.read_csv(buyspath,header=None)
buysdat.columns=['Sid','Timestamp','itemid','price','qty']
segment={}
for i in range(24):
    if i<7:
        segment[i]='EM'
    elif i<10:
        segment[i]='M'
    elif i<13:
        segment[i]='A'
    elif i<18:
        segment[i]='E'
    elif i<23:
        segment[i]='N'
    elif i<25:
        segment[i]='MN'
#*******************************************
buyersession=buysdat.Sid.unique()
clickersession=clicksdat.Sid.unique()
maxtemp=[(dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ"))  for x in  clicksdat.groupby('Sid')['Timestamp'].max()]
mintemp=[dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ")  for x in  clicksdat.groupby('Sid')['Timestamp'].min()]
duration=[int((a-b).total_seconds()) for a,b  in zip(maxtemp,mintemp)]
day=[x.day for x in maxtemp]
month=[x.month for x in maxtemp]
noofnavigations=[clicksdat.groupby('Sid').count().Timestamp][0]
totalitems=[len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
totalcats=[len(x) for x in clicksdat.groupby('Sid')['itemcategory'].unique()]
timesegment= [segment[x.hour]for x in maxtemp]
segmentchange=[1 if (segment[x.hour]!=segment[y.hour]) else 0 for x,y in zip(maxtemp,mintemp)]
purchased=[x in buyersession for x in noofnavigations.index.values ]
percentile_list = pd.DataFrame({'purchased' : purchased,'duration':duration,'day':day,'month':month,'noofnavigations':noofnavigations,'totalitems':totalitems,'totalcats':totalcats,'timesegment':timesegment,'segmentchange':segmentchange  })
percentile_list.to_csv('/tmp/gensim/yoochoose/yoochoose-clicks1001.csv')
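
For comparison, the per-session counts that the list comprehensions above build can also be computed with `size()` and `nunique()`, which return one value per session directly. The sketch below is only an assumption about what the script intends, run on a tiny hand-made frame with illustrative values and a recent pandas on Python 3. It is not a confirmed fix for the ValueError.

```python
import pandas as pd

# Tiny hand-made frame using the question's column names (values are illustrative).
clicks = pd.DataFrame({
    "Sid": ["1", "1", "1", "2", "2"],
    "Timestamp": [
        "2014-04-07T10:51:09.277Z",
        "2014-04-07T10:54:09.868Z",
        "2014-04-07T10:57:00.306Z",
        "2014-04-07T13:56:37.614Z",
        "2014-04-07T13:59:50.710Z",
    ],
    "itemid": ["214536502", "214536500", "214536502", "214662742", "214662742"],
    "itemcategory": ["0", "0", "0", "0", "0"],
})

# Parse the timestamps once, vectorised, instead of calling strptime per element.
clicks["Timestamp"] = pd.to_datetime(
    clicks["Timestamp"], format="%Y-%m-%dT%H:%M:%S.%fZ"
)

grouped = clicks.groupby("Sid")
session_span = grouped["Timestamp"].max() - grouped["Timestamp"].min()

# Every Series below is indexed by Sid, so they align when combined.
features = pd.DataFrame({
    "noofnavigations": grouped.size(),
    "totalitems": grouped["itemid"].nunique(),
    "totalcats": grouped["itemcategory"].nunique(),
    "duration": session_span.dt.total_seconds().astype(int),
})
print(features)
```

Because each column is a Series indexed by `Sid`, the values stay aligned by session rather than by list position.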

Sample data is shown below.

sessioid,timestamp,itemid,category  
1,2014-04-07T10:51:09.277Z,214536502,0  
1,2014-04-07T10:54:09.868Z,214536500,0  
1,2014-04-07T10:54:46.998Z,214536506,0  
1,2014-04-07T10:57:00.306Z,214577561,0  
2,2014-04-07T13:56:37.614Z,214662742,0  
2,2014-04-07T13:57:19.373Z,214662742,0  
2,2014-04-07T13:58:37.446Z,214825110,0  
2,2014-04-07T13:59:50.710Z,214757390,0  
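
To try things on a small input, the sample rows above can be loaded from an in-memory string and each session checked on its own, so that a failing group reports its own exception instead of the bare ValueError. This is a rough sketch, not a diagnosis. It assumes Python 3 and a recent pandas (the question uses Python 2.7), and it assumes the header line in the sample is only descriptive, so it is skipped (drop `skiprows=1` if the real file has no header). Passing `names` at read time makes the `dtype` setting apply to named columns; in the question's script the columns are only renamed after reading.

```python
import io
import pandas as pd

# The sample rows from the question, including the descriptive header line.
sample = """sessioid,timestamp,itemid,category
1,2014-04-07T10:51:09.277Z,214536502,0
1,2014-04-07T10:54:09.868Z,214536500,0
1,2014-04-07T10:54:46.998Z,214536506,0
1,2014-04-07T10:57:00.306Z,214577561,0
2,2014-04-07T13:56:37.614Z,214662742,0
2,2014-04-07T13:57:19.373Z,214662742,0
2,2014-04-07T13:58:37.446Z,214825110,0
2,2014-04-07T13:59:50.710Z,214757390,0
"""

cols = ["Sid", "Timestamp", "itemid", "itemcategory"]

# Name the columns while reading and load everything as strings.
clicksdat = pd.read_csv(
    io.StringIO(sample), header=None, names=cols, skiprows=1, dtype=str
)

# Count unique items per session one group at a time, recording any exception.
unique_items = {}
failures = {}
for sid, group in clicksdat.groupby("Sid"):
    try:
        unique_items[sid] = len(group["itemid"].unique())
    except Exception as exc:  # keep the original error for inspection
        failures[sid] = repr(exc)

print(unique_items)
print(failures)
```

If `failures` stays empty on the sample but not on the full file, the offending session ids give a starting point for inspecting the raw rows.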

0 answers:

No answers