Read text file into numpy array

Time: 2017-07-18 14:59:31

Tags: python numpy genfromtxt

I'm trying to load a text file into a numpy array.

The structure is as follows:

THE 77534223
AND 30997177
ING 30679488
ENT 17902107
ION 17769261
HER 15277018
FOR 14686159
THA 14222073
NTH 14115952
[...]

But I have had no success using

import numpy as np
data = np.genfromtxt("english_trigrams.txt", dtype=(str,int), delimiter=' ')
print(data)
[['TH' '77']
 ['AN' '30']
 ['IN' '30']
 ...,
 ['JX' '1']
 ['JQ' '1']
 ['JQ' '1']]

I would like to have an (x,2) array with dtype str in the first column and dtype int in the second column.

Many thanks!

P.S:

  • Python 3.6.1
  • NumPy 1.13.0

1 answer:

Answer 0 (score: 0)

Various ways of loading this text:

In [470]: txt=b"""THE 77534223
     ...: AND 30997177
     ...: ING 30679488
     ...: ENT 17902107
     ...: ION 17769261
     ...: HER 15277018
     ...: FOR 14686159
     ...: THA 14222073
     ...: NTH 14115952"""

With dtype=None, genfromtxt deduces the correct dtype for each column and returns a structured array:

In [471]: data = np.genfromtxt(txt.splitlines(),dtype=None)
In [472]: data
Out[472]: 
array([(b'THE', 77534223), (b'AND', 30997177), (b'ING', 30679488),
       (b'ENT', 17902107), (b'ION', 17769261), (b'HER', 15277018),
       (b'FOR', 14686159), (b'THA', 14222073), (b'NTH', 14115952)],
      dtype=[('f0', 'S3'), ('f1', '<i4')])
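
The columns of such a structured array can be pulled out by field name. The following is a minimal sketch, assuming three of the sample lines held in memory rather than read from english_trigrams.txt; f0 and f1 are the default field names that genfromtxt assigns, as seen in the output above. Whether the string column comes back as bytes or str depends on the NumPy version, so the sketch only inspects the integer column.

```python
import numpy as np

# Three of the sample lines, held in memory instead of read from a file.
lines = ["THE 77534223", "AND 30997177", "ING 30679488"]

# dtype=None asks genfromtxt to infer a dtype for each column.
data = np.genfromtxt(lines, dtype=None)

trigrams = data["f0"]  # string column (bytes or str, depending on NumPy version)
counts = data["f1"]    # integer column

print(data.dtype.names)
print(counts.sum())
```

Each field is itself a one-dimensional array, so the usual array operations apply to a single column.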

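dtype=(str, int) is not a correct dtype specification. This gives the same kind of result as yours, but with only 1 character per element: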

In [473]: data = np.genfromtxt(txt.splitlines(),dtype=(str, int))
In [474]: data
Out[474]: 
array([['T', '7'],
       ['A', '3'],
       ['I', '3'],
       ['E', '1'],
       ['I', '1'],
       ['H', '1'],
       ['F', '1'],
       ['T', '1'],
       ['N', '1']],
      dtype='<U1')

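A little better, but the string field is too short. 'str' without a length produces a zero-length '<U' field, so every string comes back empty: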

In [475]: data = np.genfromtxt(txt.splitlines(),dtype='str,int')
In [476]: data
Out[476]: 
array([('', 77534223), ('', 30997177), ('', 30679488), ('', 17902107),
       ('', 17769261), ('', 15277018), ('', 14686159), ('', 14222073),
       ('', 14115952)],
      dtype=[('f0', '<U'), ('f1', '<i4')])

With an explicit string length the result is like the dtype=None case:

In [477]: data = np.genfromtxt(txt.splitlines(),dtype='U10,int')
In [478]: data
Out[478]: 
array([('THE', 77534223), ('AND', 30997177), ('ING', 30679488),
       ('ENT', 17902107), ('ION', 17769261), ('HER', 15277018),
       ('FOR', 14686159), ('THA', 14222073), ('NTH', 14115952)],
      dtype=[('f0', '<U10'), ('f1', '<i4')])
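
A regular two-dimensional array holds a single dtype, so an (x,2) array with a str column and an int column is only possible as an object array. The sketch below is one way to get there from the structured result, under some assumptions: the field names trigram and count are hypothetical labels chosen for illustration, and the input is three of the sample lines held in memory.

```python
import numpy as np

# Three of the sample lines, held in memory instead of read from a file.
lines = ["THE 77534223", "AND 30997177", "ING 30679488"]

# Explicit structured dtype; "trigram" and "count" are hypothetical field names.
dt = [("trigram", "U10"), ("count", "i8")]
data = np.genfromtxt(lines, dtype=dt)

# Object array of shape (n, 2): Python str in column 0, Python int in column 1.
table = np.array(data.tolist(), dtype=object)

# Alternative: a plain dict for looking up a count by trigram.
lookup = dict(zip(data["trigram"].tolist(), data["count"].tolist()))

print(table.shape)
print(lookup["ING"])
```

The structured array is usually the more convenient form for numeric work, since data["count"] stays a native integer array; the object array gives up that speed in exchange for the two-column layout.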