Replace the tail of an ndarray based on a condition

Time: 2018-03-12 13:38:23

Tags: python numpy numpy-broadcasting numpy-indexing

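I have a multidimensional array. Once it reaches a critical value along the last dimension, I want to change the tail of that dimension.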

import numpy as np

np.random.seed(100)
arr = np.random.uniform(size=100).reshape([2,5,2,5])
# array([[[[ 0.54340494,  0.27836939,  0.42451759,  0.84477613,  0.00471886],
#          [ 0.12156912,  0.67074908,  0.82585276,  0.13670659,  0.57509333]],
#         [[ 0.89132195,  0.20920212,  0.18532822,  0.10837689,  0.21969749],
#          [ 0.97862378,  0.81168315,  0.17194101,  0.81622475,  0.27407375]],
#         [[ 0.43170418,  0.94002982,  0.81764938,  0.33611195,  0.17541045],
#          [ 0.37283205,  0.00568851,  0.25242635,  0.79566251,  0.01525497]],
#         [[ 0.59884338,  0.60380454,  0.10514769,  0.38194344,  0.03647606],
#          [ 0.89041156,  0.98092086,  0.05994199,  0.89054594,  0.5769015 ]],
#         [[ 0.74247969,  0.63018394,  0.58184219,  0.02043913,  0.21002658],
#          [ 0.54468488,  0.76911517,  0.25069523,  0.28589569,  0.85239509]]],
#        [[[ 0.97500649,  0.88485329,  0.35950784,  0.59885895,  0.35479561],
#          [ 0.34019022,  0.17808099,  0.23769421,  0.04486228,  0.50543143]],
#         [[ 0.37625245,  0.5928054 ,  0.62994188,  0.14260031,  0.9338413 ],
#          [ 0.94637988,  0.60229666,  0.38776628,  0.363188  ,  0.20434528]],
#         [[ 0.27676506,  0.24653588,  0.173608  ,  0.96660969,  0.9570126 ],
#          [ 0.59797368,  0.73130075,  0.34038522,  0.0920556 ,  0.46349802]],
#         [[ 0.50869889,  0.08846017,  0.52803522,  0.99215804,  0.39503593],
#          [ 0.33559644,  0.80545054,  0.75434899,  0.31306644,  0.63403668]],
#         [[ 0.54040458,  0.29679375,  0.1107879 ,  0.3126403 ,  0.45697913],
#          [ 0.65894007,  0.25425752,  0.64110126,  0.20012361,  0.65762481]]]])

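Assume the critical value is 0.80. Once a value at or above 0.80 appears, all values after it need to change. Let's focus on the first two "rows", for which np.argmax returns [3,2]: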

where_bigger = np.argmax(arr >= 0.80, axis = 3)
# array([[[3, 2], ## used as example later !!!!!!!!!
#         [0, 0],
#         [1, 0],
#         [0, 0],
#         [0, 4]],
#        [[0, 0],
#         [4, 0],
#         [3, 0],
#         [3, 1],
#         [0, 0]]])
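One caveat worth keeping in mind when reading this output: np.argmax on a boolean array returns 0 both when the first element is True and when no element is True at all, so a 0 in where_bigger is ambiguous. A minimal sketch on a made-up two-row array (the names demo, hit, first_idx and has_hit are illustrative, not from the question):

```python
import numpy as np

demo = np.array([[0.9, 0.1, 0.2],    # threshold reached at index 0
                 [0.1, 0.2, 0.3]])   # threshold never reached
hit = demo >= 0.80

first_idx = np.argmax(hit, axis=-1)  # 0 for both rows
has_hit = hit.any(axis=-1)           # tells the two cases apart
```

Any approach built directly on the argmax indices would probably need something like has_hit to avoid blanking rows that never reach the threshold.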

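For example, first look at the element 3 in [3,2] (see above). Once we find a value at or above 0.80 (here at index 3), all subsequent values should be replaced with np.nan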

arr[0,0,0,3] ## 0.84477613 comes as first element in [3,2]
# [ 0.54340494,  0.27836939,  0.42451759,  0.84477613,  np.nan]

Similarly, for the element 2 in [3,2], all subsequent elements need to be set to np.nan

arr[0,0,1,2] ## 0.82585276 comes as second element in [3,2]
# [ 0.12156912,  0.67074908,  0.82585276,  np.nan,  np.nan]

Finally, we repeat this for every element found by argmax:

# array([[[[ 0.54340494,  0.27836939,  0.42451759,  0.84477613,  np.nan],
#          [ 0.12156912,  0.67074908,  0.82585276,      np.nan,  np.nan]],
#         [[ 0.89132195,      np.nan,      np.nan,      np.nan,  np.nan],
#          [ 0.97862378,      np.nan,      np.nan,      np.nan,  np.nan]],
#         [[ 0.43170418,  0.94002982,      np.nan,      np.nan,  np.nan],
# ...

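Is it possible to adjust the whole array at once, without looping? Perhaps slicing could be used. I was thinking of something like arr[where_bigger:] = np.nan, but that is obviously wrong. So far I have not been able to get any further.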

1 answer:

Answer 0 (score: 2)

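Your best bet is some kind of boolean mask. You can generate the tail with np.logical_or.accumulate, but the result will include the index that holds the threshold value itself. If you want to keep that first instance, you have to pad the mask so that it shifts by one position.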
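To make the accumulate-and-shift idea concrete, here is a small sketch on a single made-up row (the names row, hit, seen, tail and out are illustrative). np.logical_or.accumulate makes the mask True from the first hit onward; prepending a False and dropping the last entry shifts it right by one, so the first hit itself is kept.

```python
import numpy as np

row = np.array([0.1, 0.9, 0.2, 0.85, 0.3])
hit = row > 0.8                               # [F, T, F, T, F]
seen = np.logical_or.accumulate(hit)          # [F, T, T, T, T]: True from the first hit onward
tail = np.concatenate(([False], seen[:-1]))   # [F, F, T, T, T]: strictly after the first hit

out = row.copy()
out[tail] = np.nan                            # keeps 0.1 and 0.9, blanks the rest
```

A row with no value over the threshold gives an all-False mask and is left unchanged.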

# Prepend a column of False and drop the last column, shifting the mask right by one.
mask = np.c_[np.zeros(arr.shape[:-1] + (1,), dtype=bool), np.logical_or.accumulate(arr > 0.8, axis=-1)[..., :-1]]
arr[mask] = np.nan
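For comparison, the argmax indices from the question can also be turned into a mask by broadcasting them against the positions along the last axis. This is an alternative sketch, not the method given in this answer; it assumes the >= 0.80 comparison from the question, and the names hit, first_idx, has_hit, positions, mask_alt, mask_acc and result are illustrative. The accumulate-based mask is rebuilt here with np.concatenate in place of np.c_ purely so the two can be checked against each other.

```python
import numpy as np

np.random.seed(100)
arr = np.random.uniform(size=100).reshape([2, 5, 2, 5])

hit = arr >= 0.80
first_idx = np.argmax(hit, axis=-1)   # index of the first hit (0 if there is none)
has_hit = hit.any(axis=-1)            # rows that actually contain a hit

# Positions strictly after the first hit, only in rows that have one.
positions = np.arange(arr.shape[-1])
mask_alt = has_hit[..., None] & (positions > first_idx[..., None])

# Accumulate-and-shift mask, built for comparison.
seen = np.logical_or.accumulate(hit, axis=-1)
mask_acc = np.concatenate(
    [np.zeros(arr.shape[:-1] + (1,), dtype=bool), seen[..., :-1]], axis=-1
)

result = arr.copy()                   # work on a copy so arr stays intact
result[mask_alt] = np.nan
```

Both masks should agree; the accumulate version needs no separate check for rows without a hit, which is arguably why it is the simpler choice.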