Ignoring strings when reading into an array

Date: 2014-01-07 05:48:03

Tags: python numpy

I am trying to load a .csv file into an array. However, the file looks like this:

"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422
 .................................
 .................................

I am trying to skip the leading string. So far, I have been dropping the first row:

 a = np.genfromtxt(file,delimiter=',',skiprows=1)   

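But I would like to know whether there is a way to read in the array while ignoring the string during parsing.

2 answers:

Answer 0: (score: 2)

You could use loadtxt(..., usecols=(1,2,3), ...), which avoids having to skip a line at the start of the file.

The usecols argument simply tells loadtxt which columns to extract, given as zero-based column indices.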

# Put data into file (in shell, just me copying the sample)
cat >> /tmp/data.csv
"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422

# In IPython
In [1]: import numpy as np

In [2]: a = np.loadtxt('/tmp/data.csv', usecols=(1,2,3), delimiter=',')

In [3]: a
Out[3]: 
array([[ 0.03435345, -1.234556  , -3.        ],
       [ 1.43567896, -1.45322124,  9.543422  ]])
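Since the question uses genfromtxt rather than loadtxt, here is a minimal sketch of the same idea with genfromtxt, which also accepts a usecols argument. The in-memory buffer is an assumption made so the snippet is self-contained; it stands in for the real file and holds the two sample rows from the question.

```python
import io

import numpy as np

# In-memory stand-in for the .csv file (assumption: same two rows as the sample).
sample = io.StringIO(
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

# Keep only columns 1, 2 and 3 (zero-based). Column 0, which holds the
# string on the first row, is never converted to a number.
a = np.genfromtxt(sample, delimiter=',', usecols=(1, 2, 3))
print(a)
```

Because the string column is simply never selected, no row has to be dropped.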

Answer 1: (score: 0)

Since only the first line of the file starts with the string, you could write a helper generator that strips that string:

import numpy as np

def helper(filename):
    with open(filename) as fin:
        # this could get more robust ... e.g. by doing typechecking if necessary.
        line = next(fin).split(',')
        yield ','.join(line[1:])
        for line in fin:
            yield line

arr = np.genfromtxt(helper('myfile.csv'), delimiter=',')
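The comment about type checking in the helper could be fleshed out roughly as follows. This is only a sketch: strip_leading_label is a hypothetical name, and the sample data is made up for illustration. It assumes that only the first line carries an extra leading label and that every row otherwise has the same number of numeric fields.

```python
import io

import numpy as np


def strip_leading_label(lines):
    """Yield lines unchanged, except that the first field of the first
    line is dropped when it cannot be parsed as a float."""
    lines = iter(lines)
    first = next(lines, None)
    if first is None:
        return  # empty input: nothing to yield
    fields = first.split(',')
    try:
        float(fields[0])
    except ValueError:
        # Not numeric (a quoted label or an empty field): drop it.
        fields = fields[1:]
    yield ','.join(fields)
    yield from lines


# Hypothetical data: a label on the first row only, three numbers per row.
sample = io.StringIO('"myfilename",1.0,2.0,3.0\n4.0,5.0,6.0\n')
arr = np.genfromtxt(strip_leading_label(sample), delimiter=',')
print(arr)
```

Taking an iterable of lines rather than a filename keeps the helper usable with open files and in-memory buffers alike.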