Question

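Reading Guido's infamous answer to the question Sorting a million 32-bit integers in 2MB of RAM using Python, I discovered the module heapq.

I also discovered that I don't understand jack about it, and I don't know what I could do with it.

Can you explain to me (with the proverbial six-year-old as the target audience) what the heap queue algorithm is and what you can do with it?

Can you provide a simple Python snippet where using it (with the heapq module) solves a problem that is better solved with it than with something else?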

Answer 1

heapq implements binary heaps, which are partially sorted data structures. In particular, they have three interesting operations:

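heapify turns a list into a heap, in place, in O(n) time;
heappush adds an element to the heap in O(lg n) time;
heappop removes and returns the smallest element of the heap in O(lg n) time.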
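
As a minimal sketch of these three operations, here they are applied to a small list. The values are made up for illustration and do not come from the answer itself:

```python
import heapq

data = [5, 1, 8, 3, 2]        # illustrative values only

heapq.heapify(data)           # rearranges the list in place; data[0] is now the minimum
smallest_before_push = data[0]

heapq.heappush(data, 0)       # add an element; the heap property is preserved
first = heapq.heappop(data)   # removes and returns the smallest element
second = heapq.heappop(data)  # then the next smallest
```

Note that a heap is an ordinary Python list: only data[0] is guaranteed to be the minimum, and the rest of the list is not fully sorted.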

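Many interesting algorithms rely on heaps for performance. The simplest is probably partial sorting: getting the k smallest (or largest) elements of a list without sorting the entire list. heapq.nsmallest (and nlargest) does exactly that. The implementation of nlargest can be paraphrased as: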

from heapq import heapify, heappush, heappop

def nlargest(n, l):
    # make a heap of the first n elements
    heap = l[:n]
    heapify(heap)

    # loop over the other len(l)-n elements of l
    for i in range(n, len(l)):
        # push the current element onto the heap, so its size becomes n+1
        heappush(heap, l[i])
        # pop the smallest element off, so that the heap will contain
        # the largest n elements of l seen so far
        heappop(heap)

    return sorted(heap, reverse=True)

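Analysis: let N be the number of elements in l. heapify runs once, at a cost of O(n); that is negligible. Then, in a loop that runs N - n = O(N) times, we perform a heappush and a heappop at O(lg n) cost each, giving a total running time of O(N lg n). When N >> n, this is a big win compared to the other obvious algorithm, sorted(l, reverse=True)[:n], which takes O(N lg N) time.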
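
A quick way to convince yourself that the heap-based functions agree with the sort-everything approach is sketched below. The input size, the seed, and k are arbitrary choices for illustration:

```python
import heapq
import random

random.seed(0)  # fixed seed so the run is repeatable
values = [random.randrange(1_000_000) for _ in range(10_000)]
k = 5

# Heap-based partial sorts from the standard library
top_heap = heapq.nlargest(k, values)
bottom_heap = heapq.nsmallest(k, values)

# The "obvious" alternatives that sort the whole list first
top_sorted = sorted(values, reverse=True)[:k]
bottom_sorted = sorted(values)[:k]
```

Both pairs should hold the same elements in the same order. The difference is only in how much work is done to get there.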

Answer 2

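For example: you have a set of 1000 floating-point numbers. You want to repeatedly remove the smallest item from the set and replace it with a random number between 0 and 1. The fastest way to do it is with the heapq module: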

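import random
from heapq import heapify, heappush, heappop

heap = [0.0] * 1000
# heapify(heap)   # usually you need this, but not if the list is initially sorted
while True:
    x = heappop(heap)                  # remove the smallest item
    heappush(heap, random.random())    # insert its replacement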
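
As a side note, the standard library also provides heapq.heapreplace, which pops the smallest item and pushes a new one in a single call. The sketch below uses it with a bounded loop so that it terminates. The iteration count and the seed are arbitrary, and the removed list exists only so the result can be inspected:

```python
import heapq
import random

random.seed(1)               # arbitrary seed, for repeatability
heap = [0.0] * 1000          # an all-equal list already satisfies the heap property
removed = []                 # kept only for inspection

for _ in range(5000):        # bounded, unlike the endless loop in the answer
    # pop the current minimum and push the replacement in one step
    x = heapq.heapreplace(heap, random.random())
    removed.append(x)
```

The heap stays at 1000 elements throughout, and the first 1000 values removed are the initial zeros.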

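Each iteration takes time that is logarithmic in the length of the heap (i.e. around 7 units for a list of length 1000). The other solutions take linear time (i.e. around 1000 units, which is about 140 times slower, and the gap widens as the length grows):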

import random

lst = [0.0] * 1000
while True:
    x = min(lst)    # linear
    lst.remove(x)   # linear
    lst.append(random.random())

Or:

import random

lst = [0.0] * 1000
while True:
    x = lst.pop()   # get the largest one in this example
    lst.append(random.random())
    lst.sort()      # linear (in this case)

Or even:

import bisect
import random

lst = [0.0] * 1000
while True:
    x = lst.pop()   # get the largest one in this example
    bisect.insort(lst, random.random())   # linear
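
To check that the heap version and the min/remove version really do the same job, here is a bounded sketch that feeds both the same random stream and compares what they remove. The function names run_heap and run_list, the sizes, and the seed are all made up for this illustration. The two pop-from-the-end variants are left out because they remove the largest item rather than the smallest:

```python
import heapq
import random

def run_heap(steps, size, seed):
    """Remove the minimum `steps` times using a heap (logarithmic per step)."""
    rng = random.Random(seed)
    heap = [0.0] * size
    out = []
    for _ in range(steps):
        out.append(heapq.heappop(heap))
        heapq.heappush(heap, rng.random())
    return out

def run_list(steps, size, seed):
    """Remove the minimum `steps` times using min/remove (linear per step)."""
    rng = random.Random(seed)
    lst = [0.0] * size
    out = []
    for _ in range(steps):
        x = min(lst)
        lst.remove(x)
        lst.append(rng.random())
        out.append(x)
    return out

# Same seed, so both loops see identical replacement values
same = run_heap(500, 100, 42) == run_list(500, 100, 42)
```

If the two sequences match, the only remaining difference between the approaches is the cost of each step.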

What is a heap queue?
