Question

I am trying to export a numpy array that contains unicode elements to a text file.

So far I have got the following to work, but it has no unicode characters:

import numpy as np

array_unicode=np.array([u'maca',u'banana',u'morango'])

with open('array_unicode.txt','wb') as f:
    np.savetxt(f,array_unicode,fmt='%s')

If I change the 'c' in 'maca' to 'ç', I get an error:

import numpy as np

array_unicode=np.array([u'maça',u'banana',u'morango'])

with open('array_unicode.txt','wb') as f:
    np.savetxt(f,array_unicode,fmt='%s')

Traceback:

Traceback (most recent call last):
  File "<ipython-input-48-24ff7992bd4c>", line 8, in <module>
    np.savetxt(f,array_unicode,fmt='%s')
  File "C:\Anaconda2\lib\site-packages\numpy\lib\npyio.py", line 1158, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)

How can I set up savetxt from numpy to write unicode characters?

Answer 1

In Python3 (ipython-qt terminal) I can do:

In [12]: b=[u'maça', u'banana',u'morango']

In [13]: np.savetxt('test.txt',b,fmt='%s')

In [14]: cat test.txt
ma�a
banana
morango

In [15]: with open('test1.txt','w') as f:
    ...:     for l in b:
    ...:         f.write('%s\n'%l)
    ...:         

In [16]: cat test1.txt
maça
banana
morango

In both Py2 and Py3, savetxt insists on saving in 'wb', byte mode. The line in your error has that asbytes function.

In my example b is a list, but that doesn't matter.

In [17]: c=np.array(['maça', 'banana','morango'])

In [18]: c
Out[18]: 
array(['maça', 'banana', 'morango'], 
      dtype='<U7')

writes the same. In py3 the default string type is unicode, so the u tag isn't needed, but it does no harm.

In Python2, I get your error with a plain write:

>>> b=[u'maça', u'banana',u'morango']
>>> with open('test.txt','w') as f:
...    for l in b:
...        f.write('%s\n'%l)
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)

Adding the encode gives a nice output:

>>> b=[u'maça', u'banana',u'morango']
>>> with open('test.txt','w') as f:
...    for l in b:
...        f.write('%s\n'%l.encode('utf-8'))
0729:~/mypy$ cat test.txt
maça
banana
morango

encode is a string method, so it has to be applied to the individual elements of an array (or list).
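As an alternative to calling encode on every element, the file object itself can do the encoding. The following is only a sketch, assuming io.open (available in both Python 2.6+ and Python 3) and a helper name, write_unicode_lines, that I made up for illustration; the temporary directory is just there to keep the example self-contained.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_unicode_lines(path, items, encoding='utf-8'):
    """Hypothetical helper: write one unicode string per line.

    io.open returns a text-mode file that encodes on write, so the
    elements do not need to be encoded one by one.
    """
    with io.open(path, 'w', encoding=encoding) as f:
        for item in items:
            f.write(u'%s\n' % item)


b = [u'maça', u'banana', u'morango']

# Write into a throwaway directory so the sketch leaves nothing behind.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'test_io.txt')
write_unicode_lines(path, b)

# Read the file back with the same encoding to check the round trip.
with io.open(path, 'r', encoding='utf-8') as f:
    read_back = f.read().splitlines()

print(read_back == b)
```

This bypasses np.savetxt entirely, so it only fits the simple one-string-per-line case from the question.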

Back on the py3 side, if I use the encode I get:

In [26]: c1=np.array([l.encode('utf-8') for l in b])

In [27]: c1
Out[27]: 
array([b'ma\xc3\xa7a', b'banana', b'morango'], 
      dtype='|S7')

In [28]: np.savetxt('test.txt',c1,fmt='%s')

In [29]: cat test.txt
b'ma\xc3\xa7a'
b'banana'
b'morango'

but with the right format, a plain write works:

In [33]: with open('test1.txt','wb') as f:
    ...:     for l in c1:
    ...:         f.write(b'%s\n'%l)
    ...:         

In [34]: cat test1.txt
maça
banana
morango

Such are the joys of mixing unicode and 2 generations of Python.

In case it helps, here is the code for the np.lib.npyio.asbytes function that np.savetxt uses (along with the wb file mode):

def asbytes(s):    # py3?
    if isinstance(s, bytes):
        return s
    return str(s).encode('latin1')

(Note that the encoding is fixed as 'latin1'.)
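That fixed 'latin1' would explain why a UTF-8 terminal shows a replacement character for the ç in the savetxt output. A small sketch of the difference between the two encodings, using only plain string methods (the variable names are mine):

```python
# -*- coding: utf-8 -*-
text = u'maça'

# latin1 stores ç as the single byte 0xE7.
latin1_bytes = text.encode('latin1')

# UTF-8 stores ç as the two bytes 0xC3 0xA7.
utf8_bytes = text.encode('utf-8')

print(repr(latin1_bytes))
print(repr(utf8_bytes))

# A lone 0xE7 is not valid UTF-8, so decoding the latin1 bytes as UTF-8
# (which is roughly what a UTF-8 terminal does) yields U+FFFD instead of ç.
decoded = latin1_bytes.decode('utf-8', errors='replace')
print(decoded == u'ma\ufffda')
```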

The np.char library applies a variety of string methods to the elements of a numpy array, so the np.array([x.encode...]) can be expressed as:

In [50]: np.char.encode(b,'utf-8')
Out[50]: 
array([b'ma\xc3\xa7a', b'banana', b'morango'], 
      dtype='|S7')

This can be convenient, though past testing indicates that it is not a time saver. It still has to apply the Python method to each element.
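For readers on a more recent setup: as far as I know, np.savetxt gained an encoding parameter in NumPy 1.14, which sidesteps the byte-mode problem on Python 3. The sketch below assumes such a version; save_and_reload is a made-up helper that writes to a temporary file and reads the lines back.

```python
import os
import tempfile

import numpy as np

words = np.array(['maça', 'banana', 'morango'])


def save_and_reload(arr, encoding='utf-8'):
    """Hypothetical helper: savetxt with an explicit encoding, then read back.

    Assumes NumPy >= 1.14, where np.savetxt accepts `encoding`.
    """
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        # Passing a filename lets savetxt open the file with the encoding.
        np.savetxt(path, arr, fmt='%s', encoding=encoding)
        with open(path, encoding=encoding) as f:
            return f.read().splitlines()
    finally:
        os.remove(path)


lines = save_and_reload(words)
print(lines)
```

On older NumPy versions this call fails with an unexpected keyword argument, and the manual write loop remains the fallback.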

Answer 2

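There are a number of ways to accomplish this; however, you need to set up your numpy array in a very specific way (usually using dtype) to allow for unicode characters in these circumstances.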

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np

dt = np.dtype((str, 10))  # (type, length) tuple: fixed-width string of length 10
array_unicode=np.array(['maça','banana','morangou'], dtype=dt)

with open('array_unicode.txt','wb') as f:
    np.savetxt(f, array_unicode, fmt='%s')

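You need to be aware of the string lengths within your array as well as the length you decide to set in the dtype. If it is too short, you will truncate your data; if it is too long, it is wasteful. I suggest you read the Numpy data type objects (dtype) documentation, as there are other ways you could consider setting up the array depending on the data format.

http://docs.scipy.org/doc/numpy-1.9.3/reference/arrays.dtypes.html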
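To make the length trade-off concrete, here is a small sketch on Python 3 with unicode ('U') dtypes; the widths 4 and 20 are arbitrary values picked for illustration.

```python
import numpy as np

words = ['maça', 'banana', 'morango']

# Letting numpy infer the width picks the longest string (7 characters).
inferred = np.array(words)

# A width that is too short silently truncates the longer entries.
too_short = np.array(words, dtype='U4')

# A width that is too long works but reserves more memory per element.
too_long = np.array(words, dtype='U20')

print(inferred.dtype, inferred.itemsize)   # 4 bytes per character
print(list(too_short))
print(too_long.itemsize)
```

Unless a fixed width is required for some other reason, letting numpy infer the dtype avoids both problems.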

Here is an alternative function that encodes the unicode strings as UTF-8 before saving:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np

array_unicode=np.array([u'maça',u'banana',u'morangou'])

def uniArray(array_unicode):
    items = [x.encode('utf-8') for x in array_unicode]
    array_unicode = np.array([items]) # remove the brackets for line breaks
    return array_unicode

with open('array_unicode.txt','wb') as f:
    np.savetxt(f, uniArray(array_unicode), fmt='%s')

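Basically, your np.savetxt call passes the array through uniArray for a quick conversion before writing. There may well be better ways than this, though it has been a while since I used numpy; it has always seemed to be somewhat touchy with encodings.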

Writing a numpy unicode array to a text file
