Question

Hello, I am trying to run a cell in a Jupyter notebook that loads a txt file. This is what I did:

dataset = numpy.loadtxt("C:/Users/jayjay/learning/try.txt", delimiter=",", skiprows=1)
# split into input (X) and output (Y) variables
X=dataset[:100,2:4]
Y=dataset[:100,4]

When I try to run this code, I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-64-d2d2260af43e> in <module>
----> 1 dataset = numpy.loadtxt("C:/Users/jayjay/learning/try.txt", delimiter=",", skiprows=1)
      2 # split into input (X) and output (Y) variables
      3 X=dataset[:100,2:4]
      4 Y=dataset[:100,4]


    ValueError: could not convert string to float: 'not 1'

In try.txt, I have data similar to this:

135,10,125,10,1
230,16,214,19,not 1
226,16,210,19,1
231,16,215,19,not 1
205,16,189,17,not 1

How can I fix this error? I am a self-taught beginner. Can anyone help me?

Answer 1

Use pandas to read the file:

import pandas
# file is the path to the text file
df = pandas.read_csv(file, sep=',')
numpydata = df.to_numpy()  # will give a numpy array
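
With the sample data from the question, the last column is read as strings, so to_numpy() on the whole frame gives an object array rather than floats. A minimal sketch of one way to finish the job, assuming that 'not 1' should become 0 (the question does not say so) and that the data has no header row. The names sample, labels, X and Y are illustrative only:

```python
import io
import pandas as pd

# Sample rows copied from the question; read from a string instead of a file.
sample = """135,10,125,10,1
230,16,214,19,not 1
226,16,210,19,1
231,16,215,19,not 1
205,16,189,17,not 1"""

df = pd.read_csv(io.StringIO(sample), sep=",", header=None)

# Column 4 holds the strings "1" and "not 1".
# Assumption: "1" maps to 1 and anything else maps to 0.
labels = (df[4].astype(str).str.strip() == "1").astype(int)

X = df[[2, 3]].to_numpy()  # same columns as dataset[:, 2:4] in the question
Y = labels.to_numpy()
print(X)
print(Y)
```

For the real file, the path would replace io.StringIO(sample), and header=None would be dropped if the first row holds column names.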

Answer 2

It's good that you provided a sample of the file:

In [1]: txt="""135,10,125,10,1 
   ...: 230,16,214,19,not 1 
   ...: 226,16,210,19,1 
   ...: 231,16,215,19,not 1 
   ...: 205,16,189,17,not 1"""

loadtxt accepts a list of strings in place of a file:

In [2]: np.loadtxt(txt.splitlines(),delimiter=',')                           
...
ValueError: could not convert string to float: 'not 1'

It tries to return a float array, but the string 'not 1' cannot be converted.

genfromtxt is similar, but gives nan where it cannot create a float:

In [3]: np.genfromtxt(txt.splitlines(),delimiter=',')                        
Out[3]: 
array([[135.,  10., 125.,  10.,   1.],
       [230.,  16., 214.,  19.,  nan],
       [226.,  16., 210.,  19.,   1.],
       [231.,  16., 215.,  19.,  nan],
       [205.,  16., 189.,  17.,  nan]])
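
The nan entries mark the rows whose last field could not be parsed, so they can be replaced after loading. A small sketch, assuming that 'not 1' should be treated as 0 (that mapping is an assumption, and any other unparsable value would be mapped the same way). The names lines, unparsed, X and Y are illustrative:

```python
import numpy as np

# Sample rows copied from the question.
lines = [
    "135,10,125,10,1",
    "230,16,214,19,not 1",
    "226,16,210,19,1",
    "231,16,215,19,not 1",
    "205,16,189,17,not 1",
]

data = np.genfromtxt(lines, delimiter=",")

# True where the last column could not be converted to a float.
unparsed = np.isnan(data[:, 4])

X = data[:, 2:4]
# Assumption: every unparsed label means 0.
Y = np.where(unparsed, 0.0, data[:, 4])
print(Y)
```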

You can skip the problem column:

In [4]: np.loadtxt(txt.splitlines(),delimiter=',', usecols=[0,1,2,3])        
Out[4]: 
array([[135.,  10., 125.,  10.],
       [230.,  16., 214.,  19.],
       [226.,  16., 210.,  19.],
       [231.,  16., 215.,  19.],
       [205.,  16., 189.,  17.]])

Or, since you are going to split the array into two arrays anyway:

In [8]: np.genfromtxt(txt.splitlines(),delimiter=',', usecols=[0,1,2,3], dtype=int)                                                               
Out[8]: 
array([[135,  10, 125,  10],
       [230,  16, 214,  19],
       [226,  16, 210,  19],
       [231,  16, 215,  19],
       [205,  16, 189,  17]])
In [9]: np.genfromtxt(txt.splitlines(),delimiter=',', usecols=[4], dtype=None, encoding=None)                                                     
Out[9]: array(['1', 'not 1', '1', 'not 1', 'not 1'], dtype='<U5')
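
Applied to the columns the question actually slices (2:4 for X, 4 for Y), the two-call approach might look like the sketch below. Turning the string labels into 1 and 0 is an assumption about what the asker wants, and the variable names are illustrative:

```python
import numpy as np

# Sample rows copied from the question.
lines = [
    "135,10,125,10,1",
    "230,16,214,19,not 1",
    "226,16,210,19,1",
    "231,16,215,19,not 1",
    "205,16,189,17,not 1",
]

# Numeric input columns, read as integers.
X = np.genfromtxt(lines, delimiter=",", usecols=[2, 3], dtype=int)

# Label column, read as strings.
labels = np.genfromtxt(lines, delimiter=",", usecols=[4], dtype=None, encoding=None)

# Assumption: "1" becomes 1 and "not 1" becomes 0.
Y = (labels == "1").astype(int)
print(X)
print(Y)
```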

dtype=None selects an appropriate dtype for each column.

In [10]: np.genfromtxt(txt.splitlines(),delimiter=',', dtype=None, encoding=None)
Out[10]: 
array([(135, 10, 125, 10, '1'), (230, 16, 214, 19, 'not 1'),
       (226, 16, 210, 19, '1'), (231, 16, 215, 19, 'not 1'),
       (205, 16, 189, 17, 'not 1')],
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8'), ('f4', '<U5')])

This is a structured array, with one field per column. And using a more advanced dtype specification:

In [13]: np.genfromtxt(txt.splitlines(),delimiter=',', dtype='4i,U5', encoding=None)                                                             
Out[13]: 
array([([135,  10, 125,  10], '1'), ([230,  16, 214,  19], 'not 1'),
       ([226,  16, 210,  19], '1'), ([231,  16, 215,  19], 'not 1'),
       ([205,  16, 189,  17], 'not 1')],
      dtype=[('f0', '<i4', (4,)), ('f1', '<U5')])
In [14]: _['f0']                                                             
Out[14]: 
array([[135,  10, 125,  10],
       [230,  16, 214,  19],
       [226,  16, 210,  19],
       [231,  16, 215,  19],
       [205,  16, 189,  17]], dtype=int32)
In [15]: __['f1']                                                            
Out[15]: array(['1', 'not 1', '1', 'not 1', 'not 1'], dtype='<U5')
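
The default field names f0, f1, ... can be replaced with readable ones through the names argument of genfromtxt. A short sketch; the field names c0 to c3 and label are made up for illustration and do not come from the question:

```python
import numpy as np

# Sample rows copied from the question.
lines = [
    "135,10,125,10,1",
    "230,16,214,19,not 1",
    "226,16,210,19,1",
    "231,16,215,19,not 1",
    "205,16,189,17,not 1",
]

# Hypothetical field names, one per column.
data = np.genfromtxt(
    lines,
    delimiter=",",
    dtype=None,
    encoding=None,
    names=["c0", "c1", "c2", "c3", "label"],
)

# Fields are selected by name; stack two of them into a 2-D array.
X = np.column_stack((data["c2"], data["c3"]))
labels = data["label"]
print(data.dtype.names)
print(labels)
```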

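So far I haven't tried to parse or convert those 'not 1' strings. We could construct a converter that turns them into a number, such as 0.

If I define a converter function such as: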

def foo(astr):
    if astr==b'not 1':
        astr = b'0'
    return int(astr)

In [31]: np.genfromtxt(txt.splitlines(),delimiter=',', converters={4:foo}, dtype=int)                                                            
Out[31]: 
array([[135,  10, 125,  10,   1],
       [230,  16, 214,  19,   0],
       [226,  16, 210,  19,   1],
       [231,  16, 215,  19,   0],
       [205,  16, 189,  17,   0]])

Or if the converter returns a float:

def foo(astr):
    if astr==b'not 1':
        astr = b'0'
    return float(astr)
In [39]: np.genfromtxt(txt.splitlines(),delimiter=',', converters={4:foo})   
Out[39]: 
array([[135.,  10., 125.,  10.,   1.],
       [230.,  16., 214.,  19.,   0.],
       [226.,  16., 210.,  19.,   1.],
       [231.,  16., 215.,  19.,   0.],
       [205.,  16., 189.,  17.,   0.]])
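
The same converter idea also works with loadtxt, which is what the question used. Whether a converter receives bytes or str depends on the NumPy version and the encoding argument, so the sketch below accepts both. The name label_to_float is illustrative, and mapping 'not 1' to 0 is an assumption:

```python
import numpy as np

# Sample rows copied from the question.
lines = [
    "135,10,125,10,1",
    "230,16,214,19,not 1",
    "226,16,210,19,1",
    "231,16,215,19,not 1",
    "205,16,189,17,not 1",
]

def label_to_float(value):
    # The field may arrive as bytes or as str, depending on NumPy/encoding.
    if isinstance(value, bytes):
        value = value.decode()
    value = value.strip()
    # Assumption: "not 1" means 0; anything else must be a plain number.
    return 0.0 if value == "not 1" else float(value)

data = np.loadtxt(lines, delimiter=",", converters={4: label_to_float})

# Same slicing as in the question.
X = data[:100, 2:4]
Y = data[:100, 4]
print(X)
print(Y)
```

For the real file, the path would replace lines, and skiprows=1 would be kept only if the first row is a header.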
