Algorithm for finding all points within a given distance of another point

Time: 2016-08-11 23:05:59

Tags: algorithm voronoi

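I was given this problem on an entry test for a job. I did not pass the test. I am disguising the problem out of deference to the company.

Imagine you have N people in a park of A x B space. If a person has no other person within 50 feet, he enjoys his privacy. Otherwise, his personal space is violated. Given a set of (x, y) positions, how many people will have their space violated?

For example, take this list in Python:

people = [(0,0), (1,1), (1000,1000)]

We would find 2 people whose space is violated: 1, 2.

We don't need to find every conflicting pair; just the total number of unique people.

You can't use a brute-force method to solve the problem. In other words, you can't simply loop over the array inside another loop over the array.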
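For reference, the brute-force version that the test rules out can be sketched as below. It is only useful as a baseline for checking faster solutions on small inputs. The function name is made up for illustration, and the strict `<` comparison is an assumption that matches the `d >= conflict_radius` test in the question's code.

```python
def count_violations_brute_force(people, radius):
    """Count people with at least one other person closer than `radius`.

    O(n^2) reference implementation; hypothetical helper, not from the question.
    """
    violated = set()
    r2 = radius * radius
    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            dx = people[i][0] - people[j][0]
            dy = people[i][1] - people[j][1]
            # compare squared distances to avoid the square root
            if dx * dx + dy * dy < r2:
                violated.add(i)
                violated.add(j)
    return len(violated)


people = [(0, 0), (1, 1), (1000, 1000)]
print(count_violations_brute_force(people, 50))  # 2
```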

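I have been working on this problem on and off for a few weeks, and although I have a solution that is faster than n^2, I have not come up with one that scales.

I think the only correct way to solve this problem is by using Fortune's algorithm?

Here is what I have in Python (not using Fortune's algorithm):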

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability
TEST = True

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

def _get_distance(x1, y1, x2, y2):
    """
    require: x1, y1, x2, y2: all integers
    return: a distance as a float
    """
    distance = math.sqrt(math.pow((x1 - x2), 2) + math.pow((y1 - y2),2))
    return distance

def check_real_distance(people1, people2, conflict_radius):
    """
    determine if two people are too close

    """
    # cheap rejection: a vertical gap larger than the radius rules out a conflict
    if abs(people2[1] - people1[1]) > conflict_radius:
        return False
    d = _get_distance(people1[0], people1[1], people2[0], people2[1])
    if d >= conflict_radius:
        return False
    return True

def check_for_conflicts(peoples, conflict_radius):
    # sort  people
    def sort_func1(the_tuple):
        return the_tuple[0]
    _peoples = []
    index = 0
    for people in peoples:
        _peoples.append((people[0], people[1], index))
        index += 1
    peoples = _peoples
    peoples = sorted(peoples, key = sort_func1)
    conflicts_dict = {}
    i = 0
    # use a type of sweep strategy
    while i < len(peoples) - 1:
        x_len = peoples[i + 1][0] - peoples[i][0]
        conflict = False
        conflicts_list =[peoples[i]]
        j = i + 1
        while x_len <= conflict_radius and j < len(peoples):
            x_len = peoples[j][0] - peoples[i][0]
            conflict = check_real_distance(peoples[i], peoples[j], conflict_radius)
            if conflict:
                people1 = peoples[i][2]
                people2 = peoples[j][2]
                conflicts_dict[people1] = True
                conflicts_dict[people2] = True
            j += 1
        i += 1
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))

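5 answers:

Answer 0: (score: 3)

As you can see from the comments, there are many approaches to this problem. In an interview situation you would probably want to list as many as you can and say what the strengths and weaknesses of each one are.

For the problem as stated, where you have a fixed radius, the simplest approach is probably rounding and hashing. k-d trees and the like are powerful data structures, but they are also quite complex, and if you don't need to query them repeatedly or add and remove objects, they may be overkill. Hashing can achieve linear time, versus n log n for spatial trees, although it may depend on the distribution of the points.

To understand hashing and rounding, think of it as partitioning your space into a grid of squares whose side length equals the radius you want to check against. Each square gets its own "zip code", which you can use as a hash key to store values in that square. You can compute the zip code of a point by dividing its x and y coordinates by the radius and rounding down, like this: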

def get_zip_code(x, y, radius):
    return str(int(math.floor(x/radius))) + "_" + str(int(math.floor(y/radius)))

I'm using strings because it's simple, but you can use anything as long as you generate a unique zip code for each square.
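As one alternative to a string key, a tuple of the two integer cell indices is hashable and works directly as a dictionary key. This is a minimal sketch; `get_cell` is a hypothetical name, not part of the answer's code.

```python
def get_cell(x, y, radius):
    """Return the grid cell containing (x, y) as an (int, int) tuple.

    Floor division rounds toward negative infinity, so negative
    coordinates land in the correct cell as well.
    """
    return (int(x // radius), int(y // radius))


print(get_cell(1250, 499, 500))  # (2, 0)
print(get_cell(-1, 0, 500))      # (-1, 0)
```

Neighbouring cells can then be found by adding -1, 0 or 1 to each index instead of offsetting the coordinates by the radius.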

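Create a dictionary where the keys are zip codes and the values are lists of all the people in that zip code. To check for conflicts, add the people one at a time, and before adding each one, test for conflicts with all the people in the same zip code and in the zip code's 8 neighbours. I've reused your method for keeping track of conflicts: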

def check_for_conflicts(peoples, conflict_radius):
    index = 0
    d = {}  # zip code -> list of [x, y, index] for people already added
    conflicts_dict = {}
    for person in peoples:
        # check for conflicts with people in this person's zip code
        # and neighbouring zip codes:
        for offset_x in range(-1, 2):
            for offset_y in range(-1, 2):
                offset_zip_code = get_zip_code(
                    person[0] + (offset_x * conflict_radius),
                    person[1] + (offset_y * conflict_radius),
                    conflict_radius)

                if offset_zip_code in d:
                    # get a list of people in this zip:
                    other_people = d[offset_zip_code]
                    # check for conflicts with each of them:
                    for other_person in other_people:
                        conflict = check_real_distance(person, other_person, conflict_radius)
                        if conflict:
                            people1 = index
                            people2 = other_person[2]
                            conflicts_dict[people1] = True
                            conflicts_dict[people2] = True

        # add the new person to their zip code
        zip_code = get_zip_code(person[0], person[1], conflict_radius)
        if zip_code not in d:
            d[zip_code] = []
        d[zip_code].append([person[0], person[1], index])
        index += 1

    return len(conflicts_dict.keys())

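The time complexity depends on a couple of things. If you increase the number of people but not the size of the space you distribute them in, it will be O(N^2), because the number of conflicts grows quadratically and you have to count them all. However, if you increase the space along with the number of people, so that the density stays the same, it will be closer to O(N).

If you are only counting unique people, you can keep a count of how many people in each zip code have at least 1 conflict. If that count equals the number of people in the zip code, you can exit the loop that checks a given zip code right after the first conflict with the new person, since no more unique people will be found there. You could also loop twice, adding all the people in the first loop and testing in the second, breaking out of the loop as soon as you find the first conflict for each person.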
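The two-pass variant described above could look roughly like this. It is a sketch under assumptions, not the answer's own code: the function names are invented, cells are keyed by integer tuples rather than strings, and a conflict means a distance strictly less than the radius, as in the question's code.

```python
from collections import defaultdict


def _has_conflict(idx, people, grid, radius):
    """Return True as soon as one other person is closer than `radius`."""
    x, y = people[idx]
    cx, cy = int(x // radius), int(y // radius)
    r2 = radius * radius
    # the person's own cell plus its 8 neighbours
    for nx in (cx - 1, cx, cx + 1):
        for ny in (cy - 1, cy, cy + 1):
            for other in grid.get((nx, ny), ()):
                if other == idx:
                    continue
                ox, oy = people[other]
                if (x - ox) ** 2 + (y - oy) ** 2 < r2:
                    return True  # first conflict is enough
    return False


def count_violations_grid(people, radius):
    """Pass 1 buckets everyone by cell; pass 2 tests each person once."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(people):
        grid[(int(x // radius), int(y // radius))].append(idx)
    return sum(1 for idx in range(len(people))
               if _has_conflict(idx, people, grid, radius))


people = [(0, 0), (1, 1), (1000, 1000)]
print(count_violations_grid(people, 50))  # 2
```

Because every person is already in the grid before testing starts, each person can stop at their own first conflict without having to mark the other party.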

Answer 1: (score: 0)

Here is my solution to this interesting problem:

import random


class Person():

    def __init__(self, x, y, conflict_radius=500):
        self.position = [x, y]
        self.valid = True
        self.radius = conflict_radius**2

    def validate_people(self, people):
        P0 = self.position

        for p in reversed(people):
            P1 = p.position
            dx = P1[0] - P0[0]
            dy = P1[1] - P0[1]
            dx2 = (dx * dx)

            if dx2 > self.radius:
                break

            dy2 = (dy * dy)
            d = dx2 + dy2

            if d <= self.radius:
                self.valid = False
                p.valid = False

    def __str__(self):
        p = self.position
        return "{0}:{1} - {2}".format(p[0], p[1], self.valid)


class Park():

    def __init__(self, num_people=10000, park_size=128000):
        random.seed(1)
        self.num_people = num_people
        self.park_size = park_size

    def gen_coord(self):
        return int(random.random() * self.park_size)

    def generate(self):
        return [[self.gen_coord(), self.gen_coord()] for i in range(self.num_people)]


def naive_solution(data):
    sorted_data = sorted(data, key=lambda x: x[0])
    len_sorted_data = len(sorted_data)
    result = []

    for index, pos in enumerate(sorted_data):
        print("{0}/{1} - {2}".format(index, len_sorted_data, len(result)))
        p = Person(pos[0], pos[1])
        p.validate_people(result)
        result.append(p)

    return result

if __name__ == '__main__':
    people_positions = Park().generate()

    # Run the sweep once; valid == True means the person has no conflict.
    people = naive_solution(people_positions)
    without_conflicts = sum(1 for p in people if p.valid)
    with_conflicts = len(people) - without_conflicts
    print("people with conflicts: {}".format(with_conflicts))
    print("people without conflicts: {}".format(without_conflicts))

I'm sure the code can still be optimized further.

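Answer 2: (score: 0)

You can look at this topcoder link and its "Closest pair" section. You can modify the closest-pair algorithm so that the distance h is always 50.

So, what you basically do is:

  • Sort the people by X coordinate.
  • Sweep from left to right.
  • Maintain a balanced binary tree that holds all the points within 50 (in X) of the current point. The key of the binary tree is the Y coordinate of the point.
  • Select the points whose Y lies between Y-50 and Y+50; the range can be located in lg(n) time using the binary tree.
  • So the overall complexity becomes n lg(n).
  • Make sure to mark the points you find so that you can skip them later.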

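You can use set in C++ as the binary tree. But I couldn't find out whether Python's set supports range queries or upper_bound and lower_bound. If someone knows, please point it out in the comments.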
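Python's built-in set is a hash set and has no ordered range queries, but the standard `bisect` module provides lower_bound/upper_bound style searches on a sorted list. The sweep above can be sketched with it as follows. This is an illustration under assumptions: the function name is invented, a conflict means a distance strictly less than the radius, and the "skip marked points" step is left out for clarity. Note that inserting into and deleting from a plain list shifts elements, so this is not a true n lg(n) structure like a balanced tree.

```python
import bisect


def count_violations_sweep(people, radius):
    """Left-to-right sweep with a Y-sorted window of nearby points."""
    order = sorted(range(len(people)), key=lambda i: people[i][0])
    window = []   # sorted list of (y, index) for points within `radius` in X
    oldest = 0    # position in `order` of the leftmost point still in window
    violated = set()
    r2 = radius * radius
    for i in order:
        x, y = people[i]
        # evict points that are now too far to the left
        while people[order[oldest]][0] < x - radius:
            j = order[oldest]
            window.pop(bisect.bisect_left(window, (people[j][1], j)))
            oldest += 1
        # lower_bound / upper_bound on the Y range [y - radius, y + radius]
        lo = bisect.bisect_left(window, (y - radius, -1))
        hi = bisect.bisect_right(window, (y + radius, len(people)))
        for oy, j in window[lo:hi]:
            ox = people[j][0]
            if (x - ox) ** 2 + (y - oy) ** 2 < r2:
                violated.add(i)
                violated.add(j)
        bisect.insort(window, (y, i))
    return len(violated)


people = [(0, 0), (1, 1), (1000, 1000)]
print(count_violations_sweep(people, 50))  # 2
```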

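Answer 3: (score: 0)

I found a way to solve the problem. Sort the list of coordinates by X value. Then look at each X value, one at a time. Sweep right, checking the current position against the next one, until the end of the sweep area (500 meters) is reached or a conflict is found.

If no conflict is found, sweep left in the same way. This technique avoids unnecessary checks. For example, if there are 1,000,000 people in the park, then all of them will be in conflict. The algorithm will check each person only once: as soon as a conflict is found, the search stops.

My running time seems to be O(N).

Here is the code: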

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
        and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)


def check_for_conflicts(peoples, conflict_radius):
    peoples.sort(key = lambda x: x[0])
    conflicts_dict = {}
    i = 0
    num_checks = 0
    # use a type of sweep strategy
    while i < len(peoples)  :
        conflict = False
        j = i + 1
        #sweep right
        while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j += 1
        j = i - 1
        #sweep left
        while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j -= 1
        i += 1
    print("num checks is {0}".format(num_checks))
    print("num checks per size is {0}".format(num_checks / NUM_PEOPLE))
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))

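Answer 4: (score: 0)

I came up with an answer that seems to take O(N) time. The strategy is to sort the array by X value. For each X value, sweep right until a conflict is found or the distance exceeds the conflict distance (500 m). If no conflict is found, sweep left in the same way. With this technique you limit the amount of searching.

Here is the code: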

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
        and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)


def check_for_conflicts(peoples, conflict_radius):
    peoples.sort(key = lambda x: x[0])
    conflicts_dict = {}
    i = 0
    num_checks = 0
    # use a type of sweep strategy
    while i < len(peoples)  :
        conflict = False
        j = i + 1
        #sweep right
        while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j += 1
        j = i - 1
        #sweep left
        while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j -= 1
        i += 1
    print("num checks is {0}".format(num_checks))
    print("num checks per size is {0}".format(num_checks / NUM_PEOPLE))
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))