Question

I have an np.ndarray (call it arr) that looks like this:

# python 3.7
import numpy as np

my_dtype = [("x", "float32"), ("y", "float32"),
                ("some_more", "int32"), ("and_more_stuff", "uint8")]  #
part1 = np.zeros(5, dtype=my_dtype)  # various lengths (here e.g. 5, 6 and 7)
part2 = np.zeros(6, dtype=my_dtype)
part3 = np.zeros(7, dtype=my_dtype)
# ... an a priori unknown number of "parts". Also of course there are values inside, not just zeros.

arr = np.array([part1, part2, part3])

(Maybe I shouldn't be using numpy arrays for this kind of data?)

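Now I want to do things with arr. For example, I want to find the overall minimum (a single number) across all "x" and "y" values in all subarrays. My solution looks horrible, which tells me I don't understand how to work with structured arrays (despite reading the docs and the official tutorial):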

arr[0][["x", "y"]][1] = (-3., 4.5) # put in some values as an example
all_x_y_coords= np.array([[list(mytuple) for mytuple in mylist] for mylist in
                   [list(part[["x", "y"]]) for part in arr]])
print(np.min(np.min(all_x_y_coords))) # gives -3. as desired, but at what cost?!

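Doing it this way is obviously impractical. How do I compute the minimum I want? The next thing I want to do is apply a rotation matrix to all the "x, y" pairs. Before writing something even more horrible than the code above, I thought I had better understand what I am doing wrong. Thanks a lot in advance for your help!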

Answer 1

In [167]: part1[['x','y']][1]=(-3, 4.5)                                         
In [168]: part1                                                                 
Out[168]: 
array([( 0., 0. , 0, 0), (-3., 4.5, 0, 0), ( 0., 0. , 0, 0),
       ( 0., 0. , 0, 0), ( 0., 0. , 0, 0)],
      dtype=[('x', '<f4'), ('y', '<f4'), ('some_more', '<i4'), ('and_more_stuff', 'u1')])

Since they all have the same dtype, they can be joined into a single array:

In [169]: arr = np.concatenate([part1,part2,part3])                             
In [170]: arr.shape                                                             
Out[170]: (18,)
In [171]: arr.dtype                                                             
Out[171]: dtype([('x', '<f4'), ('y', '<f4'), ('some_more', '<i4'), ('and_more_stuff', 'u1')])
In [172]: arr['x']                                                              
Out[172]: 
array([ 0., -3.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.], dtype=float32)
In [173]: np.min(arr['x'])                                                      
Out[173]: -3.0
In [174]: np.min(arr['y'])                                                      
Out[174]: 0.0
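The question asks for a single overall minimum across both fields. A minimal sketch of two ways to get it from the concatenated array, rebuilding the question's parts so the snippet runs on its own (the names overall_min, overall_min_b and xy are made up for illustration, and structured_to_unstructured assumes NumPy 1.16 or later):

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

my_dtype = [("x", "float32"), ("y", "float32"),
            ("some_more", "int32"), ("and_more_stuff", "uint8")]
part1 = np.zeros(5, dtype=my_dtype)
part2 = np.zeros(6, dtype=my_dtype)
part3 = np.zeros(7, dtype=my_dtype)
part1["x"][1] = -3.0   # same example values as in the question
part1["y"][1] = 4.5

arr = np.concatenate([part1, part2, part3])

# Option A: reduce each field separately, then combine the two results
overall_min = min(arr["x"].min(), arr["y"].min())

# Option B: convert the two fields to a plain (N, 2) float array, reduce once
xy = structured_to_unstructured(arr[["x", "y"]])
overall_min_b = xy.min()

print(overall_min, overall_min_b)
```

Option A avoids any conversion. Option B is probably more convenient once the x, y pairs are needed as an ordinary 2-D array for further math.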

Joining them with np.array just makes an object dtype array, which is little better (or even worse) than a list:

In [175]: arr1 = np.array([part1,part2,part3])                                  
In [176]: arr1.shape                                                            
Out[176]: (3,)
In [177]: arr1.dtype                                                            
Out[177]: dtype('O')

We can't do much with such an array without some sort of explicit iteration over its 3 elements.
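If the parts do need to stay separate, one possible approach is to keep them in a plain Python list and iterate explicitly, or to concatenate once and remember the boundaries so the parts can be recovered with np.split. This is only a sketch; parts, lengths, boundaries and parts_again are illustrative names, not part of the original answer:

```python
import numpy as np

my_dtype = [("x", "float32"), ("y", "float32"),
            ("some_more", "int32"), ("and_more_stuff", "uint8")]
# A plain list of structured arrays with different lengths
parts = [np.zeros(n, dtype=my_dtype) for n in (5, 6, 7)]
parts[0]["x"][1] = -3.0
parts[0]["y"][1] = 4.5

# Explicit iteration: minimum over both fields of every part
overall_min = min(min(p["x"].min(), p["y"].min()) for p in parts)

# Alternative: concatenate, but remember where each part ends
lengths = [len(p) for p in parts]
arr = np.concatenate(parts)
boundaries = np.cumsum(lengths)[:-1]      # split points between parts
parts_again = np.split(arr, boundaries)   # back to a list of 3 arrays

print(overall_min, [len(p) for p in parts_again])
```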

Answer 2

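It is possible without resorting to the 'recarray' option. I have been using structured arrays to represent geometry objects from a variety of sources.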

from numpy.lib.recfunctions import structured_to_unstructured as stu
pnts
array([( 0, 10. , 10. , 4), ( 1, 10. ,  0. , 2), ( 2,  1.5,  1.5, 1),
       ( 3,  0. , 10. , 1), ( 5,  3. ,  9. , 2), ( 6,  3. ,  3. , 1),
       ( 7,  9. ,  3. , 1), ( 8,  9. ,  9. , 1), (10,  2. ,  7. , 2),
       (11,  1. ,  7. , 1), (12,  2. ,  5. , 1), (14,  2. ,  8. , 2),
       (15,  1. ,  9. , 1), (16,  1. ,  8. , 1), (18,  8. ,  8. , 2),
       (19,  8. ,  4. , 1), (20,  4. ,  4. , 1), (21,  5. ,  7. , 1),
       (23,  6. ,  7. , 2), (24,  5. ,  5. , 1), (25,  7. ,  5. , 1),
       (27, 25. , 14. , 2), (28, 25. ,  4. , 1), (29, 15. ,  4. , 1),
       (30, 15. ,  6. , 1), (31, 23. ,  6. , 1), (32, 23. , 12. , 1),
       (33, 15. , 12. , 1), (34, 15. , 14. , 1), (36, 20. , 10. , 2),
       (37, 20. ,  8. , 1), (38, 12. ,  8. , 1), (39, 12. ,  2. , 1),
       (40, 20. ,  2. , 1), (41, 20. ,  0. , 1), (44, 14. , 10. , 3),
       (46, 11. ,  9. , 2), (47, 12. ,  8.5, 1), (48, 12. ,  9. , 1),
       (50, 10.5,  8.5, 2), (51, 10.5,  7. , 1), (52, 11.5,  7. , 1),
       (54, 10.5,  2. , 2), (55, 10.5,  0.5, 1), (56, 11.5,  0.5, 1),
       (60, 15. , 18. , 1)],
      dtype=[('New_ID', '<i4'), ('Xs', '<f8'), ('Ys', '<f8'), ('Num', '<i4')])

np.mean(stu(pnts[['Xs', 'Ys']]),axis=0)     # --- array([10.58,  6.77])
# or
(np.mean(pnts['Xs']), np.mean(pnts['Ys']))  # --- (10.576086956521738, 6.771739130434782)

Option 2 ... keep the data structure, then convert when appropriate

pnts2 = stu(pnts[['Xs', 'Ys']])

pnts2
array([[10. , 10. ],
       [10. ,  0. ],
       [ 1.5,  1.5],
... snip
       [10.5,  0.5],
       [11.5,  0.5],
       [15. , 18. ]])

np.mean(pnts2, axis=0)  # ---- array([10.58,  6.77])
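The same convert-when-needed pattern could cover the rotation the question asks about. A minimal sketch, assuming a 90 degree counter-clockwise rotation and a three-row stand-in for pnts with the same dtype (theta, rot, xy_rot and rotated are hypothetical names chosen here):

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured as stu

dt = [("New_ID", "<i4"), ("Xs", "<f8"), ("Ys", "<f8"), ("Num", "<i4")]
# Small stand-in for the pnts array shown above
pnts = np.array([(0, 10.0, 10.0, 4), (1, 10.0, 0.0, 2), (2, 1.5, 1.5, 1)],
                dtype=dt)

theta = np.pi / 2  # example angle: 90 degrees counter-clockwise
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

xy = stu(pnts[["Xs", "Ys"]])   # plain (N, 2) float array
xy_rot = xy @ rot.T            # rotate every (x, y) row vector

# Write the rotated coordinates back into a copy; other fields are untouched
rotated = pnts.copy()
rotated["Xs"] = xy_rot[:, 0]
rotated["Ys"] = xy_rot[:, 1]

print(rotated)
```

Assigning field by field keeps the other columns (New_ID, Num) as they were. Working on a copy is a choice made here so the original coordinates stay available.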

Indexing structured numpy arrays