Question

I have two arrays, A (length 3.8 million) and B (length 20k). As a minimal example, consider this case:

A = np.array([1,1,2,3,3,3,4,5,6,7,8,8])
B = np.array([1,2,8])

Now I want the resulting array to be:

C = np.array([3,3,3,4,5,6,7])

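That is, if a value in A is also found in B, remove it from A; otherwise keep it.

I would like to know whether there is a way to do this without a for loop, because the arrays are long and looping takes a long time.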

Answer 1

Using `searchsorted`

With B sorted, we can use searchsorted:

A[B[np.searchsorted(B,A)] !=  A]

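From the linked docs, searchsorted(a,v) finds the indices into a sorted array a such that, if the corresponding elements of v were inserted before those indices, the order of a would be preserved. So, with idx = searchsorted(B,A), indexing B as B[idx] gives a mapped version of B that lines up with every element of A. Comparing this mapped version against A tells us, for each element of A, whether there is a match in B. Finally, we index into A to select the non-matching elements.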
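The intermediate steps described above can be spelled out on the question's sample arrays. This is only an illustrative sketch; the names idx, mapped and mask are introduced here, and it relies on max(A) not exceeding max(B), which holds for this sample.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([1, 2, 8])  # must already be sorted

# Position in B where each element of A would be inserted (left side).
idx = np.searchsorted(B, A)
print(idx)     # [0 0 1 2 2 2 2 2 2 2 2 2]

# The element of B sitting at that position, one per element of A.
mapped = B[idx]
print(mapped)  # [1 1 2 8 8 8 8 8 8 8 8 8]

# True where the element of A has no equal partner in B.
mask = mapped != A
C = A[mask]
print(C)       # [3 3 3 4 5 6 7]
```

Duplicates in A survive because the mask is computed per element of A, not per unique value.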

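Generic case (B is not sorted):

If B is not already sorted, as the method requires, sort it first and then use the proposed method.

Alternatively, we can use the sorter argument of searchsorted: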

sidx = B.argsort()
out = A[B[sidx[np.searchsorted(B,A,sorter=sidx)]] != A]
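As a quick check of the sorter variant, here is a small sketch with B deliberately shuffled. The sample values are the question's; the shuffled order of B is an assumption made for illustration, and the same caveat applies that no element of A exceeds max(B).

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([8, 1, 2])  # same values as before, but not sorted

# sidx lists the positions of B in ascending order of value.
sidx = B.argsort()

# With sorter=sidx, searchsorted returns positions in the *sorted* view of B.
pos = np.searchsorted(B, A, sorter=sidx)

# sidx[pos] translates those positions back to indices into the original B.
out = A[B[sidx[pos]] != A]
print(out)  # [3 3 3 4 5 6 7]
```

This avoids building a sorted copy of B, at the cost of one extra level of indexing.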

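Using `in1d/isin`

We can also use np.in1d, which is pretty straightforward (the linked docs should help clarify): for each element of A it looks for any match in B, and we can then use boolean indexing with the inverted mask to pick out the non-matching elements: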

A[~np.in1d(A,B)]

The same with isin:

A[~np.isin(A,B)]

With the invert flag:

A[np.in1d(A,B,invert=True)]

A[np.isin(A,B,invert=True)]

This solves the generic case, where B is not necessarily sorted.
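A short sketch of the membership-mask approach on the sample data, with B left unsorted to show that no ordering is required. It uses np.isin only, since the NumPy documentation recommends isin over in1d for new code; the variable names are introduced here for illustration.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([8, 1, 2])  # order does not matter here

# Boolean mask: True where the element of A occurs anywhere in B.
in_B = np.isin(A, B)

# Two equivalent ways to keep the elements that are NOT in B.
C1 = A[~in_B]
C2 = A[np.isin(A, B, invert=True)]

print(C1)  # [3 3 3 4 5 6 7]
print(C2)  # [3 3 3 4 5 6 7]
```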

Answer 2

I am not very familiar with numpy, but how about using sets:

C = set(A.flat) - set(B.flat)

Edit: as pointed out in the comments, sets cannot hold duplicate values.

So another solution is to use a lambda expression:

C = np.array(list(filter(lambda x: x not in B, A)))
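To make the difference between the two snippets concrete, the sketch below runs both on the question's sample arrays. The names C_set and C_filter are introduced here for illustration. Note that `x not in B` scans B once per element of A, so the filter version is still a Python-level loop and is unlikely to suit arrays with millions of elements.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([1, 2, 8])

# Set difference: duplicates collapse, so the repeated 3s are lost.
C_set = set(A.flat) - set(B.flat)
print(sorted(int(x) for x in C_set))  # [3, 4, 5, 6, 7]

# filter + lambda: each element of A is tested on its own, so duplicates stay.
C_filter = np.array(list(filter(lambda x: x not in B, A)))
print(C_filter)  # [3 3 3 4 5 6 7]
```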

Answer 3

Adding to Divakar's answer above:

If the original array A has a wider range than B, this will give an "index out of bounds" error. See:

A = np.array([1,1,2,3,3,3,4,5,6,7,8,8,10,12,14])
B = np.array([1,2,8])

A[B[np.searchsorted(B,A)] !=  A]
>> IndexError: index 3 is out of bounds for axis 0 with size 3

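This happens because, in this example, np.searchsorted returns index 3 (one past the last valid index of B) as the insertion position for the elements 10, 12 and 14. Indexing B with that value in B[np.searchsorted(B,A)] then raises the IndexError.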
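The out-of-range index is easy to see in isolation. A minimal sketch, using only the three offending values from the example (the name `big` is introduced here for illustration):

```python
import numpy as np

B = np.array([1, 2, 8])
big = np.array([10, 12, 14])  # all larger than max(B)

# Each value would be inserted after the last element, i.e. at len(B).
idx = np.searchsorted(B, big)
print(idx)     # [3 3 3]
print(len(B))  # 3, so valid indices into B are 0, 1 and 2 only
```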

To avoid this, one possible approach is:

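import numpy as np

def subset_sorted_array(A,B):
    # Assumes A and B are both sorted in ascending order.
    # Keep only the leading part of A that can be looked up in B safely.
    Aa = A[np.where(A <= np.max(B))]
    Bb = (B[np.searchsorted(B,Aa)] !=  Aa)
    # Elements of A above max(B) cannot be in B, so mark them as kept.
    Bb = np.pad(Bb,(0,A.shape[0]-Aa.shape[0]), mode='constant', constant_values=True)
    return A[Bb]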

It works as follows:

# Take only the elements in A that would be inserted in B
Aa = A[np.where(A <= np.max(B))]

# Pad the resulting filter with 'Trues' - I split this in two operations for
# easier reading
Bb = (B[np.searchsorted(B,Aa)] !=  Aa)
Bb = np.pad(Bb,(0,A.shape[0]-Aa.shape[0]),  mode='constant', constant_values=True)

# Then you can filter A by Bb
A[Bb]
# For the input arrays above:
>> array([ 3,  3,  3,  4,  5,  6,  7, 10, 12, 14])

Note that this also works for arrays of strings and other types (any type for which the <= comparison operator is defined).
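For comparison, here is a different way to guard against the out-of-range index: clamp the positions returned by searchsorted instead of trimming and padding. This is not the approach from the answer above, just an alternative sketch; the function name remove_members_clipped is hypothetical, and it assumes B is sorted and non-empty. Unlike the padding approach, it does not need A to be sorted.

```python
import numpy as np

def remove_members_clipped(A, B):
    """Return the elements of A that do not occur in sorted, non-empty B."""
    idx = np.searchsorted(B, A)
    # Positions equal to len(B) belong to values above max(B); point them
    # at the last element instead, which can never compare equal to them.
    idx = np.minimum(idx, len(B) - 1)
    return A[B[idx] != A]

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 10, 12, 14])
B = np.array([1, 2, 8])
print(remove_members_clipped(A, B))  # [ 3  3  3  4  5  6  7 10 12 14]

# A does not have to be sorted for this variant.
A_unsorted = np.array([10, 1, 3, 8, 12, 3])
print(remove_members_clipped(A_unsorted, B))  # [10  3 12  3]
```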

Remove elements from one array if present in another array, keep duplicates - NumPy / Python
