Question

I'm working on this LeetCode problem link and found an impressive solution that uses the heapq module; the function runs remarkably fast. Here is the program:

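from itertools import islice
import heapq

def nlargest(n, iterable):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, reverse=True)[:n]
    """
    if n < 0:
        return []
    it = iter(iterable)
    # Seed the heap with the first n items.
    result = list(islice(it, n))
    if not result:
        return result
    # Min-heap: result[0] is always the smallest of the n items kept so far.
    heapq.heapify(result)
    _heappushpop = heapq.heappushpop  # local alias avoids an attribute lookup per iteration
    for elem in it:
        # Push elem and pop the smallest in one step; the heap stays at size n.
        _heappushpop(result, elem)
    # The heap now holds the n largest items; sort them in descending order.
    result.sort(reverse=True)
    return result

print(nlargest(5, [10, 122, 2, 3, 3, 4, 5, 5, 10, 12, 23, 18, 17, 15, 100, 101]))
# prints [122, 101, 100, 23, 18]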

This algorithm is really clever; you can also visualize it here: LINK
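For a text-only view of the same idea, here is a minimal sketch: a hypothetical `nlargest_trace`, a copy of the function above that prints the heap after each step so you can watch the min-heap keep only the n largest values seen so far. It assumes Python 3 (f-strings).

```python
import heapq
from itertools import islice


def nlargest_trace(n, iterable):
    """Hypothetical traced copy of nlargest that prints every heap step."""
    if n < 0:
        return []
    it = iter(iterable)
    result = list(islice(it, n))  # the first n items seed the heap
    if not result:
        return result
    heapq.heapify(result)  # result[0] is now the smallest of the n kept
    print("after heapify:", result)
    for elem in it:
        # heappushpop returns the smaller of elem and the old minimum;
        # if elem is not larger than result[0], the heap is unchanged.
        evicted = heapq.heappushpop(result, elem)
        print(f"push {elem:>3}, evict {evicted:>3} -> heap {result}")
    result.sort(reverse=True)
    return result


top = nlargest_trace(3, [10, 122, 2, 3, 100, 5])
print(top)  # prints [122, 100, 10]
```

Note how an element such as 5 is discarded immediately: it is not larger than the current minimum, so it costs a single comparison and never enters the heap.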

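But I'm having trouble working out the time complexity of the whole algorithm. Here is my analysis; please correct me if I'm wrong!

Time complexity: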

result = list(islice(it, n)) -> O(n)

heapq.heapify(result) -> O(len(result))

for elem in it:
    _heappushpop(result, elem)  -> I am confused about this part

result.sort(reverse=True) -> O(len(result)*log(len(result)))
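A minimal sketch that checks what the loop body does: `heappushpop(heap, x)` returns the same value as `heappush` followed by `heappop`, but the heap never grows past n items, so every call works on a heap of size n. `check_pushpop` is a hypothetical helper, and the input is assumed to be random integers.

```python
import heapq
import random


def check_pushpop(n, m, seed=0):
    """Compare heappushpop with separate push + pop calls on random data."""
    rng = random.Random(seed)
    data = [rng.randrange(1000) for _ in range(m)]
    heap = data[:n]
    heapq.heapify(heap)
    reference = list(heap)  # a second heap driven by separate push + pop calls
    for x in data[n:]:
        a = heapq.heappushpop(heap, x)
        heapq.heappush(reference, x)  # size temporarily grows to n + 1
        b = heapq.heappop(reference)  # ...and shrinks back to n
        assert a == b  # both return the smallest of the n + 1 items
        assert len(heap) == n  # heappushpop never grows the heap
    # Both heaps end up holding the same multiset of values.
    return sorted(heap) == sorted(reference)


print(check_pushpop(5, 200))
```

Because the heap size is fixed at n, one heappushpop does at most one sift along a path of length about log2(n), which is where the O(log n) per element comes from.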

Can anyone help me understand the overall time complexity of the algorithm?

Answer 1

So you have two relevant parameters here: n (the number of items to return) and M (the number of items in the dataset).

islice(it, n) -- O(n)
heapify(result) -- O(n), because len(result) = n
for elem in it: _heappushpop(result, elem) -- performs M-n operations of O(log n) each, because len(result) stays n, i.e. (M-n)*log n
result.sort(reverse=True) -- O(n*log n)
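To check the (M-n)*log n term empirically, here is a sketch that counts the `<` comparisons made during the heappushpop loop (heapq compares items with `<`). `Counted` and `loop_comparisons` are hypothetical helpers. The input is assumed to be ascending integers, which is the worst case: every new element beats the current minimum, so every call sifts it all the way down to a leaf.

```python
import heapq
import math
from itertools import islice


class Counted:
    """Wraps a value and counts every < comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def loop_comparisons(n, m):
    """Comparisons spent in the heappushpop loop on ascending input of size m."""
    it = iter([Counted(v) for v in range(m)])
    result = list(islice(it, n))
    heapq.heapify(result)
    Counted.comparisons = 0  # count only the loop, not heapify
    for elem in it:
        heapq.heappushpop(result, elem)
    return Counted.comparisons


if __name__ == "__main__":
    m = 20000
    for n in (4, 16, 256, 4096):
        c = loop_comparisons(n, m)
        print(f"n={n:>5}  comparisons/element={c / (m - n):.2f}  log2(n)={math.log2(n):.0f}")
```

The per-element count should track log2(n) plus a small constant. On random input, by contrast, most elements lose the very first comparison against result[0] and are discarded at once, so typical inputs are much cheaper than this worst-case bound.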

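Overall:

n + n + (M-n)*log n + n*log n

which gives O(M*log n). You can easily see that the dominant part is the heappushpop loop (assuming M >> n; otherwise the problem is less interesting, because the solution more or less reduces to sorting).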
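Written out with the per-step costs from the list above, the sum collapses as follows. This is a sketch assuming 2 ≤ n ≤ M; strictly, each heappushpop costs O(1 + log n), so for n = 1 the total is O(M).

```latex
\begin{aligned}
T(M, n) &= \underbrace{n}_{\text{islice}}
         + \underbrace{n}_{\text{heapify}}
         + \underbrace{(M-n)\log n}_{\text{heappushpop loop}}
         + \underbrace{n\log n}_{\text{sort}} \\
        &= 2n + M\log n \\
        &= O(M\log n) \qquad (2 \le n \le M)
\end{aligned}
```

The loop and the final sort together contribute exactly M log n, and the 2n term is dominated by it, which is why only the loop matters asymptotically.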

It's worth pointing out that there are linear-time algorithms for this problem, so if your dataset is very large, they are worth a try.
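One well-known option is quickselect. The sketch below uses randomized quickselect (a swapped-in technique, not necessarily what the linked algorithms describe): expected O(M) time to move the n largest items to the front, then O(n log n) to sort them, for O(M + n log n) overall. `nlargest_quickselect` and its `seed` parameter are hypothetical names for illustration.

```python
import random


def nlargest_quickselect(n, iterable, seed=None):
    """Sketch: the n largest items via randomized quickselect.

    Expected O(M) to isolate the n largest, then O(n log n) to sort them.
    The worst case is O(M^2); median-of-medians pivoting would make the
    selection step worst-case linear.
    """
    items = list(iterable)
    if n <= 0:
        return []
    if n >= len(items):
        return sorted(items, reverse=True)
    rng = random.Random(seed)
    lo, hi = 0, len(items)  # the boundary index n always lies in [lo, hi)
    while True:
        pivot = items[rng.randrange(lo, hi)]
        window = items[lo:hi]
        bigger = [x for x in window if x > pivot]
        equal = [x for x in window if x == pivot]
        smaller = [x for x in window if x < pivot]
        items[lo:hi] = bigger + equal + smaller  # three-way partition, descending
        b = lo + len(bigger)
        e = b + len(equal)
        if n < b:
            hi = b  # boundary is inside the "bigger" block
        elif n > e:
            lo = e  # boundary is inside the "smaller" block
        else:
            break  # items[:n] now holds the n largest values
    return sorted(items[:n], reverse=True)


data = [10, 122, 2, 3, 3, 4, 5, 5, 10, 12, 23, 18, 17, 15, 100, 101]
print(nlargest_quickselect(5, data))  # prints [122, 101, 100, 23, 18]
```

The trade-off is memory: quickselect needs all M items in a list, while the heap solution only ever holds n items, which is why the heap approach suits streams and very large iterables.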

Overall complexity of finding the Kth largest element in Python
