Get the elements of array1 that are not in array2

Time: 2017-02-08 17:37:34

Tags: python performance numpy vectorization

Main question

What is a better / more pythonic way to retrieve the elements of one array that are not found in a different array? This is what I have:

idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
idata = np.vstack(idata)

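What I care about is performance. My data is an (X, Y, Z) array of size 7000 x 3 and my gdata is an (X, Y) array of size 11000 x 2.

Preamble

I am doing an octant search to find the n (e.g. 8) points (+) closest to my circle point (o) in each octant. This means my points (+) are reduced to only 64 (8 per octant). Then, for each point in gdata, I save the elements of data that were not selected.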


import tkinter as tk
from tkinter import filedialog
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from collections import defaultdict

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
data = pd.read_excel(file_path)
data = np.array(data, dtype=float)  # np.float was removed in NumPy 1.24
nrow, cols = data.shape

file_path1 = filedialog.askopenfilename()
gdata = pd.read_excel(file_path1)
gdata = np.array(gdata, dtype=float)
gnrow, gcols = gdata.shape

npoint_per_octant = 8  # number of nearest points to keep per octant
bins = np.linspace(-np.pi, np.pi, 9)
bins[-1] = np.inf  # handle edge case
octantsort = []

for j in range(gnrow):    
    delta = gdata[j, ::] - data[:, :2]
    angles = np.arctan2(delta[:, 1], delta[:, 0])
    octantsort = []

    for i in range(8):
        data_i = data[(bins[i] <= angles) & (angles < bins[i+1])]
        if data_i.size > 0:
            dist_order = np.argsort(cdist(data_i[:, :2], gdata[j, ::][np.newaxis]), axis=0)
            if dist_order.size < npoint_per_octant+1:
                [octantsort.append(data_i[dist_order[:npoint_per_octant][j]]) for j in range(dist_order.size)]
            else:
                [octantsort.append(data_i[dist_order[:npoint_per_octant][j]]) for j in range(npoint_per_octant)]
            final = np.vstack(octantsort)

    idata = [np.column_stack(data[k]) for k in range(len(data)) if data[k] not in final]
    idata = np.vstack(idata)

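Is there an efficient and pythonic way to improve the performance of the last two lines of code?

1 Answer:

Answer 0: (score: 0)

If I understand your code correctly, I see the following potential savings: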

  • final = ...
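  • Don't use arctan, it is expensive; since you only want octants, compare the coordinates with zero and with each other instead
  • Don't do a full argsort; use argpartition instead
  • Make your octantsort an "octantargsort", i.e. store indices into data rather than the data points themselves; this saves you the search in the last line and lets you use np.delete for the removal
  • Don't use append inside a list comprehension. This produces a list of Nones that is immediately discarded. You can use list.extend outside the comprehension instead
  • Besides, these list comprehensions look like a complicated way of converting data_i[dist_order[:npoint_per_octant]] to a list; why not simply cast it, or even keep it as an array, since you want to vstack in the end anyway?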
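
As a rough sketch of the arctan, argpartition and index-storing points above (not the code from this answer): the helper names octant_ids and nearest_indices_per_octant, the random test points and the centre are all made up for illustration. The octant labels are arbitrary 0..7 ids built from three comparisons, not an angular ordering.

```python
import numpy as np

def octant_ids(points, center):
    """Label each 2-D point with an octant id 0..7 around `center`.

    Uses only comparisons (no arctan2): sign of dx, sign of dy,
    and which side of the diagonal |dx| = |dy| the point lies on.
    """
    d = points - center
    dx, dy = d[:, 0], d[:, 1]
    return (dx < 0) * 4 + (dy < 0) * 2 + (np.abs(dx) < np.abs(dy)) * 1

def nearest_indices_per_octant(points, center, n):
    """Return indices into `points` of up to `n` nearest points per octant."""
    ids = octant_ids(points, center)
    dist2 = ((points - center) ** 2).sum(axis=1)  # squared distance is enough for ranking
    keep = []
    for o in range(8):
        idx = np.flatnonzero(ids == o)
        if idx.size > n:
            # argpartition puts the n smallest first (in arbitrary order)
            # without sorting the whole octant
            idx = idx[np.argpartition(dist2[idx], n)[:n]]
        keep.append(idx)
    return np.concatenate(keep)

# hypothetical data, only for demonstration
rng = np.random.default_rng(0)
pts = rng.random((1000, 2))
center = np.array([0.5, 0.5])

sel = nearest_indices_per_octant(pts, center, 8)
rest = np.delete(pts, sel, axis=0)  # everything that was not selected
```

Because the selection is kept as indices, removing the selected rows is a single np.delete call and no row-by-row membership test is needed.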

Here is some sample code illustrating these ideas:

import numpy as np

def discard_nearest_in_each_octant(eater, eaten, n_eaten_p_eater):
    # build octants
    # start with quadrants ...
    top, left = (eaten < eater).T
    quadrants = [np.where(v&h)[0] for v in (top, ~top) for h in (left, ~left)]
    dcoord2 = (eaten - eater)**2
    dc2quadrant = [dcoord2[q] for q in quadrants]
    # ... and split them
    oct4158 = [q[:, 0] < q[:, 1] for q in dc2quadrant]
    # main loop
    dc2octants = [[q[o], q[~o]] for q, o in zip(dc2quadrant, oct4158)]
    reloap = [[
        np.argpartition(o.sum(-1), n_eaten_p_eater)[:n_eaten_p_eater]
        if o.shape[0] > n_eaten_p_eater else None
        for o in opair] for opair in dc2octants]
    # translate indices
    octantargpartition = [q[so] if oap is None else q[np.where(so)[0][oap]]
                          for q, o, oaps in zip(quadrants, oct4158, reloap)
                          for so, oap in zip([o, ~o], oaps)]
    octantargpartition = np.concatenate(octantargpartition)
    return np.delete(eaten, octantargpartition, axis=0)
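
One more reason to prefer indices over the `data[k] not in final` test from the question: for NumPy arrays, `row in final` is evaluated as `(final == row).any()`, so it compares element-wise rather than row-wise. The small arrays and the `selected` index list below are made up purely to illustrate this, along with a boolean-mask alternative that keeps all columns of data.

```python
import numpy as np

# made-up data: three (X, Y, Z) rows
data = np.array([[0., 0., 1.],
                 [1., 2., 3.],
                 [4., 5., 6.]])
# made-up "final": shares only the single value 1.0 with data[1]
final = np.array([[1., 9., 9.]])

# element-wise comparison: one matching entry is enough to count as "in"
print(data[1] in final)  # True, although the row [1, 2, 3] is not in final

# with stored indices, the leftover rows come from a mask (or np.delete)
selected = np.array([2])              # hypothetical indices picked by the octant search
mask = np.ones(len(data), dtype=bool)
mask[selected] = False
idata = data[mask]                    # same rows as np.delete(data, selected, axis=0)
```

The mask version touches each row once, instead of comparing every row of data against every row of final.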