Reshape an array according to keys

Posted: 2014-03-07 21:35:41

Tags: python arrays numpy reshape

I don't know the exact technical term for what I want to do, so I'll try to demonstrate it with an example:

I have two vectors of the same length, a and b, as follows:

In [41]:a
Out[41]:
array([ 0.61689215,  0.31368813,  0.47680184, ...,  0.84857976,
    0.97026244,  0.89725481])

In [42]:b
Out[42]:
array([35, 36, 37, ..., 36, 37, 38])

a contains N floats, and b contains N keys that take 10 distinct values: 35, 36, 37, ..., 43, 44.

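I would like to get a new matrix M with 10 columns, where the first column contains all the entries of a whose corresponding key in b is 35, the second column contains all the entries of a whose corresponding key in b is 36, and so on for all 10 columns of M.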
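To make the target layout concrete, here is a small hand-built sketch on toy data (hypothetical values, three keys instead of ten). It assumes, as the answers below do, that shorter columns are padded with NaN; the question itself doesn't say how columns of unequal length should be handled.

```python
import numpy as np

# Toy data (hypothetical values): 3 distinct keys instead of 10.
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
b = np.array([35, 36, 35, 37, 36, 35])

# The distinct keys label the columns; the most frequent key sets the row count.
keys, counts = np.unique(b, return_counts=True)
shape = (int(counts.max()), len(keys))

# Target M written out by hand: column j holds a[b == keys[j]] in original order,
# padded with NaN where a key occurs fewer times (the padding is an assumption).
M_expected = np.array([
    [0.1, 0.2, 0.4],
    [0.3, 0.5, np.nan],
    [0.6, np.nan, np.nan],
])

# Check each column against a boolean-mask selection of a.
matches = []
for j, k in enumerate(keys):
    col = M_expected[:, j]
    matches.append(bool(np.array_equal(col[~np.isnan(col)], a[b == k])))
print(shape, matches)  # (3, 3) [True, True, True]
```

In other words, column j of M is just a[b == keys[j]], and the only real work is stacking columns of different lengths.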

I hope this is clear. Thanks.

3 answers:

Answer 0 (score: 1)

itertools.groupby can be used to group the values (after sorting). Using numpy arrays is optional.
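The "after sorting" part matters: groupby only merges consecutive items with equal keys. A minimal illustration on a hypothetical list of keys:

```python
from itertools import groupby

keys = [36, 35, 36, 35]

# groupby only merges *consecutive* equal keys, so unsorted input
# produces one group per run rather than one group per key.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(keys)]
print(unsorted_groups)  # [(36, 1), (35, 1), (36, 1), (35, 1)]

# After sorting, equal keys are adjacent and each key forms a single group.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(keys))]
print(sorted_groups)    # [(35, 2), (36, 2)]
```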

import numpy as np
import itertools
N = 50
# a = np.random.rand(N)*100
a = np.random.randint(0, 100, N)  # ints to make printing more compact
b = np.random.randint(35, 45, N)

# make a structured array to easily sort both arrays together
dtype = np.dtype([('a', float), ('b', int)])
ab = np.empty(a.shape, dtype=dtype)
ab['a'] = a
ab['b'] = b
# ab = np.sort(ab, order=['b'])  # sorts by 'b', then by 'a' to break ties
I = np.argsort(b, kind='mergesort')  # stable: preserves the order of a within each key
ab = ab[I]

# now group, and extract a list of lists (one sublist of records per key)
gp = itertools.groupby(ab, lambda x: x['b'])
xx = [list(x[1]) for x in gp]
# [[y[0] for y in x] for x in xx] is the ragged list of lists of a-values

def filled(x):
    # pad every sublist with NaN up to the length of the longest one
    M = max(len(z) for z in x)
    return np.array([z + [np.nan] * (M - len(z)) for z in x])

print(filled([[y[1] for y in x] for x in xx]).T)  # the keys
print(filled([[y[0] for y in x] for x in xx]).T)  # the values of a
This produces:

[[ 35.  36.  37.  38.  39.  40.  41.  42.  43.  44.]
 [ 35.  36.  37.  38.  39.  40.  41.  42.  43.  44.]
 [ nan  36.  37.  nan  39.  40.  41.  42.  43.  44.]
 [ nan  36.  37.  nan  39.  40.  41.  42.  43.  44.]
 ...]

[[ 54.  69.  34.  28.  71.  53.  33.  19.  64.  56.]
 [ 90.  52.  11.   9.  50.  53.  25.  37.  69.  56.]
 [ nan  97.  31.  nan  69.  35.   2.  80.  91.  54.]
 [ nan  33.  87.  nan  47.  90.  81.  45.  86.  57.]
 ...]

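I'm using argsort with kind='mergesort' (a stable sort) to preserve the order of a within each sublist. np.sort sorts lexically on b and then on a (contrary to what I expected from the order parameter).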
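A small demonstration of the difference, on hypothetical toy values: the stable argsort keeps a's original order inside each key, while np.sort on the structured array uses the unlisted field 'a' to break ties.

```python
import numpy as np

a = np.array([0.9, 0.1, 0.5, 0.3])
b = np.array([36, 35, 36, 35])

# Stable sort on b alone: entries with equal keys keep their original order.
idx = np.argsort(b, kind='mergesort')
print(idx)      # [1 3 0 2]
print(a[idx])   # [0.1 0.3 0.9 0.5]

# Structured-array sort: fields not listed in `order` still break ties.
ab = np.zeros(len(a), dtype=[('a', float), ('b', int)])
ab['a'] = a
ab['b'] = b
ab_sorted = np.sort(ab, order=['b'])
print(ab_sorted['a'])  # [0.1 0.3 0.5 0.9]
```

For key 36 the stable argsort gives 0.9 then 0.5 (their order in a), whereas np.sort reorders them to 0.5, 0.9.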

Another approach is to use a Python dictionary, which also preserves the order of a. It may be slower on large arrays, but it hides fewer details:

import collections
# reuses a, b and filled() from the code above
d = collections.defaultdict(list)
for k, v in zip(b, a):
    d[k].append(v)
values = [d[k] for k in sorted(d)]  # one list per key, keys in ascending order
print(filled(values).T)
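Since the dictionary version loops in Python, here is a loop-free alternative for large arrays. It is not from the original answers: it reuses the stable-argsort idea above to compute each value's row within its column, then scatters everything into a NaN-filled matrix in one fancy-indexing assignment. `group_columns` is a hypothetical helper name, and I haven't benchmarked it against the loop versions.

```python
import numpy as np

def group_columns(a, b):
    """Hypothetical helper: column j of M holds a[b == keys[j]] in original order, NaN-padded."""
    keys, inverse, counts = np.unique(b, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()  # column index of every element of a
    # Stable sort by column index keeps a's original order within each key.
    order = np.argsort(inverse, kind='mergesort')
    # Sorted position where each key's block starts.
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    # Row of each element = its sorted position minus the start of its block.
    rows = np.empty(len(b), dtype=np.intp)
    rows[order] = np.arange(len(b)) - np.repeat(starts, counts)
    M = np.full((int(counts.max()), len(keys)), np.nan)
    M[rows, inverse] = a
    return keys, M

# Usage on toy data (hypothetical values).
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
b = np.array([35, 36, 35, 37, 36, 35])
keys, M = group_columns(a, b)
print(keys)  # [35 36 37]
print(M)
```

The column labels come back in `keys`, so you don't need to track them separately.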

Answer 1 (score: 0)

You can use pandas:

import numpy as np
import pandas as pd

a = np.random.rand(50)
b = np.random.randint(10, 15, 50)

s = pd.Series(a)
s.groupby(b).apply(pd.Series.reset_index, drop=True).unstack(level=0)

The output is:

          10        11        12        13        14
0   0.465079  0.041393  0.692856  0.634328  0.179690
1   0.934678  0.746048  0.060014  0.072626  0.824729
2   0.388190  0.510527  0.078662  0.077157  0.291183
3   0.972033  0.761159  0.017317  0.104768  0.278871
4   0.750713  0.430246  0.083407  0.262037  0.487742
5   0.216965  0.482364  0.820535  0.207008  0.276452
6   0.282038  0.607303  0.675856  0.994369  0.602059
7   0.897106  0.398808  0.312332  0.751388  0.878177
8   0.229121       NaN       NaN  0.061288  0.032066
9   0.810678       NaN       NaN       NaN  0.718237
10  0.571125       NaN       NaN       NaN  0.668292
11  0.410750       NaN       NaN       NaN  0.288145
12  0.984507       NaN       NaN       NaN       NaN
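A variation on the same pandas idea (a swapped-in technique, not the answer's code): number each value within its key group with GroupBy.cumcount, then pivot so that the key becomes the column and the within-group position becomes the row. The toy data is hypothetical.

```python
import numpy as np
import pandas as pd

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
b = np.array([35, 36, 35, 37, 36, 35])

df = pd.DataFrame({'key': b, 'val': a})
# Position of each value within its key group: 0, 1, 2, ...
df['row'] = df.groupby('key').cumcount()
# Rows = position within group, columns = key; missing cells become NaN.
M = df.pivot(index='row', columns='key', values='val')
print(M)
```

This avoids apply, which tends to be the slow part of a groupby when there are many groups.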

Answer 2 (score: 0)

Here is a way to do it without pandas (so you have to keep track of the column labels manually):

import numpy as np
from itertools import zip_longest  # izip_longest in Python 2
from collections import defaultdict

a = np.random.rand(50)
b = np.random.randint(10, 15, 50)
d = defaultdict(list)

for i, key_val in enumerate(b):
    d[key_val].append(a[i])

# iterate the keys in sorted order so the columns come out as 10, 11, ..., 14
output = np.asarray(list(zip_longest(*(d[k] for k in sorted(d)),
                                     fillvalue=np.nan)))

print(a)
print(b)
print(output)

This gives:

a

array([ 0.98688273,  0.95584584,  0.91011945,  0.56402919,  0.86185936,
        0.09380343,  0.69290659,  0.97238284,  0.81297425,  0.73446398,
        0.25927151,  0.44622982,  0.20537961,  0.61665218,  0.90168399,
        0.58556404,  0.47017152,  0.32278718,  0.15044929,  0.07859976,
        0.26715756,  0.38281878,  0.30169241,  0.47785937,  0.15377038,
        0.93395325,  0.79099068,  0.92471442,  0.03154578,  0.0437627 ,
        0.31711433,  0.78550517,  0.77062104,  0.76002167,  0.1842867 ,
        0.52935392,  0.16038216,  0.46510856,  0.4311615 ,  0.73923847,
        0.45499238,  0.2630405 ,  0.67722848,  0.1391463 ,  0.50800704,
        0.50618842,  0.19540159,  0.38150066,  0.82831838,  0.3383787 ])

b

array([14, 10, 13, 12, 12, 13, 13, 12, 11, 10, 10, 13, 14, 12, 11, 12, 14,
       12, 12, 14, 11, 10, 13, 13, 13, 10, 14, 11, 13, 11, 11, 11, 12, 10,
       11, 11, 14, 12, 12, 14, 13, 10, 11, 14, 13, 11, 10, 11, 12, 12])

output

array([[ 0.95584584,  0.81297425,  0.56402919,  0.91011945,  0.98688273],
       [ 0.73446398,  0.90168399,  0.86185936,  0.09380343,  0.20537961],
       [ 0.25927151,  0.26715756,  0.97238284,  0.69290659,  0.47017152],
       [ 0.38281878,  0.92471442,  0.61665218,  0.44622982,  0.07859976],
       [ 0.93395325,  0.0437627 ,  0.58556404,  0.30169241,  0.79099068],
       [ 0.76002167,  0.31711433,  0.32278718,  0.47785937,  0.16038216],
       [ 0.2630405 ,  0.78550517,  0.15044929,  0.15377038,  0.73923847],
       [ 0.19540159,  0.1842867 ,  0.77062104,  0.03154578,  0.1391463 ],
       [        nan,  0.52935392,  0.46510856,  0.45499238,         nan],
       [        nan,  0.67722848,  0.4311615 ,  0.50800704,         nan],
       [        nan,  0.50618842,  0.82831838,         nan,         nan],
       [        nan,  0.38150066,  0.3383787 ,         nan,         nan]])
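The key step in this answer is zip_longest, which transposes a list of ragged columns into rows and fills the gaps. A minimal illustration with hypothetical integer columns; fillvalue=None is used here for readability, whereas the answer passes np.nan:

```python
from itertools import zip_longest

# Three columns of different lengths (hypothetical values).
columns = [[1, 2, 3], [4], [5, 6]]

# Unpacking the columns into zip_longest yields one tuple per row;
# columns that have run out contribute the fill value.
rows = list(zip_longest(*columns, fillvalue=None))
print(rows)  # [(1, 4, 5), (2, None, 6), (3, None, None)]
```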