How to create a numpy.ndarray from an iteration of tuples

Time: 2014-06-30 03:22:04

Tags: python numpy multidimensional-array

I have the following loop:

# `results` is obtained from a MySQLdb query.

for row in results:
    print(row)

It prints tuples like these:

('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107)
('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883)
('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837)

My question is: from that iteration, how can I create a NumPy ndarray that looks like this:

array([['1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
       ['1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107],
       ['1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883],
       ['1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837]])

In the end, the ndarray should have shape (4, 8).

1 answer:

Answer 0 (score: 2)

Read it into a structured array:

In [30]:
import numpy as np
a=[('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
   ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
   ('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883),
   ('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837)]
np.array(a, dtype=('a10,a10,f4,f4,f4,f4,f4,f4'))
Out[30]:
array([('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
       ('1A9N', 'RBP', 0.045626699924468994, 0.053926799446344376, 0.331932008266449, 0.04640309885144234, 4.413359874888556e-06, 0.5221070051193237),
       ('1AQ3', 'RBP', 0.044447898864746094, 0.20111200213432312, 0.26858100295066833, 0.004975699819624424, 1.2850499744171406e-12, 0.48088300228118896),
       ('1AQ4', 'RBP', 0.01772320084273815, 0.3637459874153137, 0.30899500846862793, 0.0016986100235953927, 0.0, 0.30783700942993164)], 
      dtype=[('f0', 'S10'), ('f1', 'S10'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<f4')])
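
As a minimal sketch of how the rows from the loop could feed this directly: a structured array is built from a list of tuples, so the iterable can be materialized into a list first. The `results` list below is only a stand-in for the database output, and `U10`/`f8` are used instead of `a10`/`f4` on the assumption that the strings are Python 3 `str` and that double precision is wanted. Note that the result has shape (2,) here (one record per row, with eight named fields), not a two-dimensional (rows, 8) shape.

```python
import numpy as np

# Stand-in for the rows returned by the database query (assumption).
results = [
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
]

# Structured arrays expect a list of tuples, one tuple per record.
rows = [tuple(row) for row in results]
arr = np.array(rows, dtype='U10,U10,f8,f8,f8,f8,f8,f8')

print(arr.shape)         # one record per row; fields are not a second axis
print(arr['f0'])         # first column, by its auto-generated field name
print(arr['f3'].sum())   # numeric fields behave like ordinary float arrays
```

Each field is accessed by name (`f0` through `f7` are generated automatically when no names are given), which keeps the string and float columns in their own dtypes.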

You can also keep all of the values in an object dtype array:

In [46]:

np.array(a, dtype=object)
Out[46]:
array([['1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
       ['1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031,
        4.41336e-06, 0.522107],
       ['1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757,
        1.28505e-12, 0.480883],
       ['1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0,
        0.307837]], dtype=object)

But this is not ideal for float values, and it can also lead to undesired behavior:

In [48]:
b=np.array(a, dtype=object)
b[0]+b[1] #addition for float values and concatenation for string values
Out[48]:
array(['1A341A9N', 'RBPRBP', 0.0456267, 1.0539268, 0.331932, 0.0464031,
       4.41336e-06, 0.522107], dtype=object)
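
One possible way around this, sketched under the assumption that the first two columns are always labels and the rest are always numbers, is to slice the object array and cast the numeric part to a real float dtype. The names `labels` and `values` are illustrative only.

```python
import numpy as np

a = [
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
]
b = np.array(a, dtype=object)     # 2-D array, every element a Python object

labels = b[:, :2].astype(str)     # the two identifier columns
values = b[:, 2:].astype(float)   # the six numeric columns as float64

print(values.dtype, values.shape)
print(values[0] + values[1])      # plain element-wise float addition
```

With the numeric block split off, arithmetic no longer mixes string concatenation with float addition.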

pandas is another option:

In [43]:
import pandas as pd
print(pd.DataFrame(a))
      0    1         2         3         4         5             6         7
0  1A34  RBP  0.000000  1.000000  0.000000  0.000000  0.000000e+00  0.000000
1  1A9N  RBP  0.045627  0.053927  0.331932  0.046403  4.413360e-06  0.522107
2  1AQ3  RBP  0.044448  0.201112  0.268581  0.004976  1.285050e-12  0.480883
3  1AQ4  RBP  0.017723  0.363746  0.308995  0.001699  0.000000e+00  0.307837
In [44]:

pd.DataFrame(a).dtypes
Out[44]:
0     object
1     object
2    float64
3    float64
4    float64
5    float64
6    float64
7    float64
dtype: object

It allows each column to have its own dtype.
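
If a plain 2-D float array is still needed for the numeric part, a sketch along these lines may help. The column names are hypothetical (the question does not name them), and `to_numpy()` assumes a reasonably recent pandas version.

```python
import pandas as pd

a = [
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
]

# Column names are illustrative only; they are not given in the question.
cols = ['id', 'kind', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6']
df = pd.DataFrame(a, columns=cols)

# Select only the numeric columns and convert them to a homogeneous array.
numeric = df[cols[2:]].to_numpy()

print(numeric.shape)
print(numeric.dtype)
```

The label columns stay in the DataFrame, while `numeric` can be passed to NumPy routines that expect a homogeneous float array.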