Question

我正在尝试查看节点是否位于球体的体积内，并将节点ID添加到列表中。但是，算法的效率非常慢，我不知道如何改进它。

我有两个清单。列表A的格式为 [{'num'：ID，'x'：VALUE，'y'：VALUE，'z'：VALUE] ，而列表B的格式为 [{' x'：VALUE，'y'：VALUE，'z'：VALUE，'rad'：VALUE}] 。

两个列表的大小每个都可以超过100,000个。

我目前的代码发布在下面，但效率非常低。

    filteredList = []
    for i in range(len(sList)):

            minx = (sList[i]['x']) - (sList[i]['radius'])
            maxx = (sList[i]['x']) + (sList[i]['radius'])
            miny = (sList[i]['y']) - (sList[i]['radius'])
            maxy = (sList[i]['y']) + (sList[i]['radius'])
            minz = (sList[i]['z']) - (sList[i]['radius'])
            maxz = (sList[i]['z']) + (sList[i]['radius'])

            for j in range(len(nList)):
                    if minx <= nList[j]['x'] <= maxx:
                            if miny <= nList[j]['y'] <= maxy:
                                    if minz <= nList[j]['z'] <= maxz:
                                            tmpRad = findRadius(sList[i],nList[j])
                                            if tmpRad <= sList[i]['radius']:
                                                    filteredList.append(int(nList[j]['num']))

我感到茫然，并欣赏任何想法。

编辑：添加有关数据格式的额外信息。

列表A （nList） - 定义节点，位置为x，y，z和标识符号
[{'y'：0.0，'x'：0.0，'num'：1.0，'z'：0.0}，
{'y'：0.0，'x'：1.0，'num'：2.0，'z'：0.0}，
{'y'：0.0，'x'：2.0，'num'：3.0，'z'：0.0}，
{'y'：0.0，'x'：3.0，'num'：4.0，'z'：0.0}，
{'y'：0.0，'x'：4.0，'num'：5.0，'z'：0.0}，
{'y'：0.0，'x'：5.0，'num'：6.0，'z'：0.0}，
{'y'：0.0，'x'：6.0，'num'：7.0，'z'：0.0}，
{'y'：0.0，'x'：7.0，'num'：8.0，'z'：0.0}，
{'y'：0.0，'x'：8.0，'num'：9.0，'z'：0.0}，
{'y'：0.0，'x'：9.0，'num'：10.0，'z'：0.0}]

列表B （sList） - 使用x，y，z，半径定义球体 [{'y'：18.0，'x'：25.0，'z'：26.0，'radius'：0.0056470000000000001}，
{'y'：29.0，'x'：23.0，'z'：45.0，'radius'：0.0066280000000000002}，
{'y'：46.0，'x'：29.0，'z'：13.0，'radius'：0.014350999999999999}，
{'y'：0.0，'x'：20.0，'z'：25.0，'radius'：0.014866000000000001}，
{'y'：27.0，'x'：31.0，'z'：18.0，'radius'：0.018311999999999998}，
{'y'：10.0，'x'：36.0，'z'：46.0，'radius'：0.024702000000000002}，
{'y'：27.0，'x'：13.0，'z'：48.0，'radius'：0.027300999999999999}，
{'y'：14.0，'x'：1.0，'z'：13.0，'radius'：0.033889000000000002}，
{'y'：31.0，'x'：20.0，'z'：11.0，'radius'：0.034118999999999997}，
{'y'：23.0，'x'：28.0，'z'：8.0，'radius'：0.036683}]

Answer 1

（这个答案涉及简单的优化和Python风格;它适用于现有算法，教授一些优化点，而不是用更有效的方法替换它。）

以下是一些要点，以使代码更易于阅读和理解：

迭代sList，而不是超出范围（len（sList））。 for i in range(len(sList))变为for i in sList，sList[i]变为i。
不需要那个tmpRad;把它内联。
而不是if a: if b: if c:使用if a and b and c。

现在我们在这：

filteredList = []
for i in sList:
    minx = i['x'] - i['radius']
    maxx = i['x'] + i['radius']
    miny = i['y'] - i['radius']
    maxy = i['y'] + i['radius']
    minz = i['z'] - i['radius']
    maxz = i['z'] + i['radius']

    for j in nList:
        if minx <= j['x'] <= maxx and miny <= j['y'] <= maxy and minz <= j['z'] <= maxz and findRadius(i,j) <= i['radius']:
            filteredList.append(int(j['num']))

（PEP 8建议将该长行拆分为不超过80个字符的行; PEP 8还会推荐filtered_list和s_list以及n_list而不是filteredList ，sList和nList。）

我已经将findRadius(i, j) <= i['radius']放在第一位用于样式，因为它看起来更有可能评估为false，从而加快了计算速度。然后我还内联了minx等变量：

filteredList = []
for i in sList:
    for j in nList:
        if findRadius(i, j) <= i['radius'] \
        and i['x'] - i['radius'] <= j['x'] <= i['x'] + i['radius'] \
        and i['y'] - i['radius'] <= j['y'] <= i['y'] + i['radius'] \
        and i['z'] - i['radius'] <= j['z'] <= i['z'] + i['radius']:
            filteredList.append(int(j['num']))

要考虑的一件事是i['x'] - i['radius'] <= j['x'] <= i['x'] + i['radius']可以简化;尝试从所有三个部分中减去i['x']。

您可以通过列表理解来进一步缩短这一点。

filteredList = [int(j['num']) for j in nList for i in sList
        if findRadius(i, j) <= i['radius']
        and i['x'] - i['radius'] <= j['x'] <= i['x'] + i['radius']
        and i['y'] - i['radius'] <= j['y'] <= i['y'] + i['radius']
        and i['z'] - i['radius'] <= j['z'] <= i['z'] + i['radius']]

最后，named tuples（这有副作用使它们也变得不可变，这可能是期望的吗？还要注意它只是Python 2.6，请阅读页面，了解如何使用旧版本的Python）的：

from collections import namedtuple

node = namedtuple('node', 'x y z num')
sphere = namedtuple('sphere', 'x y z radius')

nList = [
        node(x=0.0, y=0.0, z=0.0, num=1.0),
        node(x=1.0, y=0.0, z=0.0, num=2.0),
        node(x=2.0, y=0.0, z=0.0, num=3.0),
        node(x=3.0, y=0.0, z=0.0, num=4.0),
        node(x=4.0, y=0.0, z=0.0, num=5.0),
        node(x=5.0, y=0.0, z=0.0, num=6.0),
        node(x=6.0, y=0.0, z=0.0, num=7.0),
        node(x=7.0, y=0.0, z=0.0, num=8.0),
        node(x=8.0, y=0.0, z=0.0, num=9.0),
        node(x=9.0, y=0.0, z=0.0, num=10.0)]

sList = [
        sphere(x=25.0, y=18.0, z=26.0, radius=0.0056470000000000001),
        sphere(x=23.0, y=29.0, z=45.0, radius=0.0066280000000000002),
        sphere(x=29.0, y=46.0, z=13.0, radius=0.014350999999999999),
        sphere(x=20.0, y=0.0, z=25.0, radius=0.014866000000000001),
        sphere(x=31.0, y=27.0, z=18.0, radius=0.018311999999999998),
        sphere(x=36.0, y=10.0, z=46.0, radius=0.024702000000000002),
        sphere(x=13.0, y=27.0, z=48.0, radius=0.027300999999999999),
        sphere(x=1.0, y=14.0, z=13.0, radius=0.033889000000000002),
        sphere(x=20.0, y=31.0, z=11.0, radius=0.034118999999999997),
        sphere(x=28.0, y=23.0, z=8.0, radius=0.036683)]

然后，您可以sphere['radius']代替sphere.radius。这使得代码更整洁：

filteredList = [int(j.num) for j in nList for i in sList
        if findRadius(i, j) <= i.radius
        and i.x - i.radius <= j.x <= i.x + i.radius
        and i.y - i.radius <= j.y <= i.y + i.radius
        and i.z - i.radius <= j.z <= i.z + i.radius]

或者，没有列表理解，

filteredList = []
for i in sList:
    for j in nList:
        if findRadius(i, j) <= i.radius \
        and i.x - i.radius <= j.x <= i.x + i.radius \
        and i.y - i.radius <= j.y <= i.y + i.radius \
        and i.z - i.radius <= j.z <= i.z + i.radius:
            filteredList.append(int(j.num))

最后，选择更好的名字; [样式根据评论略有变化，将findRadius放在最后，因为它更有可能是计算上的昂贵 - 你是最好的判断，但是，

filteredList = [int(n.num) for n in nodes for s in spheres
        if s.x - s.radius <= n.x <= s.x + s.radius and
            s.y - s.radius <= n.y <= s.y + s.radius and
            s.z - s.radius <= n.z <= s.z + s.radius and
            findRadius(s, n) <= s.radius]

或者，

filteredList = []
for s in spheres:
    for n in nodes:
        if (s.x - s.radius <= n.x <= s.x + s.radius and
            s.y - s.radius <= n.y <= s.y + s.radius and
            s.z - s.radius <= n.z <= s.z + s.radius and
            findRadius(s, n) <= s.radius):
            filteredList.append(int(n.num))

（如果需要，您可以将srad = s.radius放在外部循环中以获得可能的轻微性能增益。）

Answer 2

我们可以从样本中删除

除非您需要按索引迭代列表，否则不应该，也要避免使用范围，并将ifs合并在一起

filteredList = []
for a in sList:

        minx = (a['x']) - (a['radius'])
        maxx = (a['x']) + (a['radius'])
        miny = (a['y']) - (a['radius'])
        maxy = (a['y']) + (a['radius'])
        minz = (a['z']) - (a['radius'])
        maxz = (a['z']) + (a['radius'])

        for b in nList:
                if minx <= b['x'] <= maxx and miny <= b['y'] <= maxy and minz <= b['z'] <= maxz:
                    tmpRad = findRadius(a,b)
                    if tmpRad <= a['radius']:
                        filteredList.append(int(b['num']))

Answer 3

首先，Python不是为那种迭代而构建的。使用索引来获取列表的每个元素是向后的，这是一种由低级语言教授的更快的脑损伤。在Python中，它实际上更慢。 range(len(whatever))实际上会创建一个新的数字列表，然后您处理从该列表中传递给您的数字。你真正想做的是只使用从whatever 传递给你的对象。

当我们处于此状态时，我们可以拉出多次检查的公共s['radius']位，并将边界框的所有if检查放在一行上。哦，我们不需要一个单独的'tmpRad'，我认为'num'已经是int并且不需要转换（如果它们有，为什么？为什么不让它们提前转换时间？）

这些都不会产生巨大的差异，但它至少使它更容易阅读，并且绝对不会受到伤害。

filteredList = []
for s in sList:
  radius = s['radius']
  minx = s['x'] - radius
  maxx = s['x'] + radius
  miny = s['y'] - radius
  maxy = s['y'] + radius
  minz = s['z'] - radius
  maxz = s['z'] + radius

  for n in nList:
    if (minx <= n['x'] <= maxx) and (miny <= n['y'] <= maxy) and \
       (minz <= n['z'] <= maxz) and (findRadius(s, n) <= radius): 
      filteredList.append(n['num'])

现在它至少清楚发生了什么。

但是，对于我们正在处理的问题的规模，听起来我们需要进行算法改进。你可能想要做的是使用某种BSP（二进制空间分区）技术。这种方式的工作原理是：

首先，我们将nList重新排列为树。我们根据x>是否将其切割成8个较小的列表。 0，是否y> 0和是否z>每个点0（3个布尔结果的8个组合）。然后，使用相同的标准 - 例如，每个都被切成8个。如果x / y / z的可能范围是-10..10，那么我们根据是否x> 1来切割“x> 0，y> 0，z> 0”列表。 5，y> 5，z> 5，等等。你明白了。
对于sList中的每个点，我们检查minx是否> 0等美丽的部分：如果minx＆gt; 0，我们不必检查任何'x＆lt; 0'列表，如果maxx＆lt; 0，我们不必检查任何'x＆gt; 0'列表。等等。我们弄清楚边界框与空间的8个“八分圆”中的哪一个相交;对于每一个，我们递归检查那些八分圆等的适当的八分圆，直到我们到达树的叶子，然后我们做正常的入点边框，然后进行点球测试。

Answer 4

实际上，您可以通过以下方式保存所有内容：

filteredList = [int(node['num']) for sphere in sList \
    for node in nList if findRadius(sphere,node)<=sphere['radius']]

如果从点到球体的球体的距离小于球体的半径，那么我想我们可以说它在球体中，对吧？

我认为findRadius的定义如下：

def findRadius(sphere,node):
    return ((node['x']-sphere['x'])**2 + \
            (node['y']-sphere['y'])**2 + \
            (node['z']-sphere['z'])**2)**.5

Answer 5

（AFAICT，以下解决方案在算法上比目前发布的任何其他答案都要快：大约O（ N log N ）vs O（N²）。警告：这个假设您在边界框之间没有大量重叠。）

如果允许预先计算索引结构：

将所有最小/最大x值推入集合并对其进行排序，从而创建跨越x轴的垂直区域列表。将每个区域与包含它的一组边界框相关联。
对min / max y值重复此过程，以创建水平区域列表，并将每个区域与其包含的边界框组相关联。
对于每个被测试的点：
- 使用二进制印章查找包含点的x坐标的水平区域。但是，你真正想要的是与该地区相关的一组边界框。
- 同样，找到与y坐标关联的一组边界框。
- 找到这两组的交集。
使用Pythagoras测试此残留物中的边界框。

Answer 6

接受了所有这些建议，我设法提出了比原始解决方案快50倍的解决方案。

我意识到瓶颈在于我正在使用的数据类型（dicts列表）。在我的演员阵容中循环使用多个列表非常慢，使用集合效率更高。

我做的第一件事就是实现命名元组。我知道我的节点列表是如何编号的，这提供了效率所需的哈希值。

def findNodesInSpheres(sList,nList,nx,ny,nz):
    print "Running findNodesInSpheres"
    filteredList = []
    for a in sList:
            rad = a.radius
            minx = (a.x) - (rad) if (a.x - rad > 0) else 0
            maxx = (a.x) + (rad) if (a.x + rad < nx ) else nx
            miny = (a.y) - (rad) if (a.y - rad > 0) else 0
            maxy = (a.y) + (rad) if (a.y + rad < ny ) else ny
            minz = (a.z) - (rad) if (a.z - rad > 0) else 0
            maxz = (a.z) + (rad) if (a.z + rad < nz ) else nz
            boundingBox = set([ (i + j * (nx + 1) + k * (nx + 1) * (ny + 1)) for i in range (int(minx),int(maxx)+1)
                            for j in range (int(miny),int(maxy)+1) for k in range(int(minz),int(maxz)+1) ])

            for b in sorted(boundingBox):
                    if findRadius(a,nList[b]) <= rad:
                            filteredList.append(nList[b].num)
    return filteredList

使用set（）而不是list提供了大量的加速。数据集越大（nx，ny，nz），加速越快。

仍然可以使用树实现和域分解来改进，如同建议的那样，但目前它可以工作。

感谢大家的建议！

Python优化词典列表之间的比较

6 个答案: