Missing first row from csv

Time: 2016-04-23 13:25:15

Tags: python python-3.x

I am reading a .csv file into a dataframe (CorpActionsDf), but when I print the head of CorpActionsDf, I see that the first row of data is missing:

Head of the .csv data:

BBG.XAMS.ASML.S 24/04/2015  0.7 Annual  Regular Cash
BBG.XAMS.ASML.S 25/04/2014  0.61    Annual  Regular Cash
BBG.XAMS.ASML.S 26/04/2013  0.53    Annual  Regular Cash
BBG.XAMS.ASML.S 26/11/2012  9.18    None    Return of Capital
BBG.XAMS.ASML.S 27/04/2012  0.46    Annual  Regular Cash

Head of CorpActionsDf:

                       date  factor_value reference             factor
unique_id                                                             
BBG.XAMS.ASML.S  25/04/2014          0.61    Annual       Regular Cash
BBG.XAMS.ASML.S  26/04/2013          0.53    Annual       Regular Cash
BBG.XAMS.ASML.S  26/11/2012          9.18      None  Return of Capital
BBG.XAMS.ASML.S  27/04/2012          0.46    Annual       Regular Cash
BBG.XAMS.ASML.S  26/04/2011          0.40    Annual       Regular Cash

As you can see, the first row of data in the csv is missing from the dataframe:

BBG.XAMS.ASML.S 24/04/2015  0.7 Annual  Regular Cash

My code is as follows:

import pandas as pd

def getCorpActionsData(rawStaticDataPath):
    pattern = 'CorporateActions' + '.csv'
    staticPath = rawStaticDataPath

    with open(staticPath + pattern, 'rt') as f:

        #staticDf=pd.read_csv(f,engine='c',header=0,index_col=0, parse_dates=True, infer_datetime_format=True,usecols=(0,3))
        CorpActionsDf = pd.read_csv(f, engine='c', header=0, index_col=0, parse_dates=True, infer_datetime_format=True, names=['unique_id', 'date', 'factor_value', 'reference', 'factor'])
        print('CorpActionsDf')
        print(CorpActionsDf.head())

Does anyone know what I'm missing?

Thanks

3 answers:

Answer 0 (score: 2)

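You have to pass None instead of 0 for the header parameter. Otherwise you are telling the code to treat line 0 as the line containing the column headers, which the names parameter then simply replaces, so that line is never read as data.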
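A minimal reproduction, assuming the real file is comma-separated and has no header line. The sample text and the `RAW` / `NAMES` variables are hypothetical, built from the rows shown in the question. Reading the same text once with `header=0` and once with `header=None` shows which call swallows the first row:

```python
import io

import pandas as pd

# Hypothetical sample mirroring the first rows of CorporateActions.csv;
# assumes the real file is comma-separated and has NO header line.
RAW = """BBG.XAMS.ASML.S,24/04/2015,0.7,Annual,Regular Cash
BBG.XAMS.ASML.S,25/04/2014,0.61,Annual,Regular Cash
BBG.XAMS.ASML.S,26/04/2013,0.53,Annual,Regular Cash
"""

NAMES = ["unique_id", "date", "factor_value", "reference", "factor"]

# header=0: line 0 is parsed as a header row and its labels are then
# overwritten by NAMES, so the first data row is consumed.
with_header0 = pd.read_csv(io.StringIO(RAW), header=0, index_col=0, names=NAMES)

# header=None: every line is data; column labels come only from NAMES.
with_none = pd.read_csv(io.StringIO(RAW), header=None, index_col=0, names=NAMES)

print(len(with_header0))          # 2
print(len(with_none))             # 3
print(with_none["date"].iloc[0])  # 24/04/2015
```

With `header=None` the 24/04/2015 row is back, which matches the symptom in the question.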

CorpActionsDf=pd.read_csv(f,engine='c',header=None,index_col=0, parse_dates=True, infer_datetime_format=True,names=['unique_id', 'date','factor_value','reference','factor'])        

Answer 1 (score: 1)

Have you tried header=None instead of header=0?

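The documentation says this about header:

"Default behavior is as if set to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names."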
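Per the quoted rule, when `names` is given you can also just leave `header` out, because the default then behaves like `None`. A small sketch with hypothetical headerless data comparing the omitted default against an explicit `header=0`:

```python
import io

import pandas as pd

RAW = "a,1\nb,2\nc,3\n"  # hypothetical headerless data

# header omitted + names given: pandas behaves as if header=None,
# so all three lines are kept as data.
implicit = pd.read_csv(io.StringIO(RAW), names=["key", "value"])

# header=0 + names given: the first line is read as a header and its
# labels are replaced by names, leaving only two data rows.
explicit0 = pd.read_csv(io.StringIO(RAW), header=0, names=["key", "value"])

print(implicit.shape)   # (3, 2)
print(explicit0.shape)  # (2, 2)
```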

 CorpActionsDf=pd.read_csv(f,engine='c',header=None,index_col=0, parse_dates=True, infer_datetime_format=True,names=['unique_id', 'date','factor_value','reference','factor']) 

Answer 2 (score: 0)

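I'm not sure you're using the parameters correctly. I don't know pandas well since I use NumPy, but looking at the Pandas documentation, I think your header and names parameters don't fit together.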

header=0 is meant for replacing existing column names, so you should write header=None:
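For contrast, `header=0` together with `names` is the right choice when the file really does start with a header line that you want to rename. A sketch using a hypothetical file whose first line is `id,dt,val`:

```python
import io

import pandas as pd

# Hypothetical file that DOES begin with a header line we want to rename.
RAW = "id,dt,val\nX,24/04/2015,0.7\nY,25/04/2014,0.61\n"

# header=0 consumes the existing header line; names supplies the new labels.
renamed = pd.read_csv(
    io.StringIO(RAW),
    header=0,
    names=["unique_id", "date", "factor_value"],
)

print(list(renamed.columns))  # ['unique_id', 'date', 'factor_value']
print(len(renamed))           # 2
```

Here no data is lost, because the line that gets consumed was a header to begin with.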

CorpActionsDf=pd.read_csv(f,engine='c',header=None,index_col=0, parse_dates=True, infer_datetime_format=True,names=['unique_id', 'date','factor_value','reference','factor']) 

Try it and let me know whether it works better. Otherwise you could use NumPy, and I can help you with that!