Unable to convert a simple text file into a pandas DataFrame

Time: 2017-07-17 19:00:46

Tags: python pandas dataframe text error-handling

This is my file:

raw_file ->

'Date\tValue\tSeries\tLabel\n07/01/2007\t687392\t31537611\tThis home\n08/01/2007\t750624\t31537611\tThis home\n09/01/2007\t769358\t31537611\tThis home\n10/01/2007\t802014\t31537611\tThis home\n11/01/2007\t815973\t31537611\tThis home\n12/01/2007\t806853\t31537611\tThis home\n01/01/2008\t836318\t31537611\tThis home\n02/01/2008\t856792\t31537611\tThis home\n03/01/2008\t854411\t31537611\tThis home\n04/01/2008\t826354\t31537611\tThis home\n05/01/2008\t789017\t31537611\tThis home\n06/01/2008\t754162\t31537611\tThis home\n07/01/2008\t749522\t31537611\tThis home\n08/01/2008\t757577\t31537611\tThis home\n'

type(raw_file) -> <type 'str'>

For some reason I can't use pd.read_csv(raw_file); if I do, I get this error:

File "pandas\_libs\parsers.pyx", line 710, in pandas._libs.parsers.TextReader._setup_parser_source (pandas\_libs\parsers.c:8873)
IOError: File Date  Value   Series  Label
07/01/2007  687392  31537611    This home
08/01/2007  750624  31537611    This home
does not exist

The best I could come up with is:

for row in raw_file.split('\n'):
   print(row.split('\t'))

This is slow. Is there a better way?

2 answers:

Answer 0: (score: 0)

Why not use the csv module and set the delimiter to \t?

https://docs.python.org/3.4/library/csv.html

for row in csv.reader(your_file, delimiter='\t'):
    print(row)  # do stuff
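
Because raw_file in the question is a string rather than an open file, the csv approach needs a file-like wrapper around it. A minimal sketch, assuming a shortened one-row copy of raw_file and using io.StringIO from the standard library:

```python
import csv
import io

# Shortened stand-in for the raw_file string from the question (assumption).
raw_file = "Date\tValue\tSeries\tLabel\n07/01/2007\t687392\t31537611\tThis home\n"

# csv.reader accepts any iterable of lines; StringIO makes the string one.
reader = csv.reader(io.StringIO(raw_file), delimiter="\t")
header = next(reader)   # first line holds the column names
rows = list(reader)     # remaining lines, each split on tabs

print(header)
print(rows)
```

This yields plain Python lists of strings; converting values to numbers or dates is left to the caller.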

Answer 1: (score: 0)

When you pass pandas a string as the filepath_or_buffer argument, it treats the string as a file name or URL, not as the data itself.

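From the docs:

  filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)

  The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv

Solution: use the io.StringIO() constructor: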

In [69]: pd.read_csv(io.StringIO(raw_file), delim_whitespace=True)
Out[69]:
              Date     Value Series Label
07/01/2007  687392  31537611   This  home
08/01/2007  750624  31537611   This  home
09/01/2007  769358  31537611   This  home
10/01/2007  802014  31537611   This  home
11/01/2007  815973  31537611   This  home
12/01/2007  806853  31537611   This  home
01/01/2008  836318  31537611   This  home
02/01/2008  856792  31537611   This  home
03/01/2008  854411  31537611   This  home
04/01/2008  826354  31537611   This  home
05/01/2008  789017  31537611   This  home
06/01/2008  754162  31537611   This  home
07/01/2008  749522  31537611   This  home
08/01/2008  757577  31537611   This  home
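
The output above shows a side effect of delim_whitespace=True: the label "This home" contains a space, so each data row splits into five fields against four header names, and the dates end up as the index with the remaining columns shifted one place. Since the data is tab-separated, passing sep='\t' may be a better fit. A minimal sketch, assuming a shortened two-row copy of raw_file:

```python
import io

import pandas as pd

# Shortened stand-in for the raw_file string from the question (assumption).
raw_file = (
    "Date\tValue\tSeries\tLabel\n"
    "07/01/2007\t687392\t31537611\tThis home\n"
    "08/01/2007\t750624\t31537611\tThis home\n"
)

# Wrap the string in a file-like object so pandas reads its contents
# instead of treating it as a path or URL, and split on tabs only.
df = pd.read_csv(io.StringIO(raw_file), sep="\t")

print(df.columns.tolist())
print(df)
```

With a tab separator, "This home" stays in a single Label column and Date remains an ordinary column. On Python 2, which the question appears to use, io.StringIO expects a unicode string, so a plain str would need decoding first.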