Question

Suppose I have a list of lists:

S = [list1, list2, ...]

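I want to write a function find such that, for an input x, it checks whether x is in one of the sublists of S and returns that sublist, or returns None if no sublist contains x.

(Note: the intersection of any two sublists is empty, so at most one list can be found.)

My code is simple: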

def find(x):
    for L in S:
        if x in L:
            return L
    return None

But I have seen people write it like this:

def find(x):
    try:
        return next(L for L in S if x in L)
    except StopIteration:
        return None

What is the difference between the two versions? Is the second preferred over the first (for example, from a software-project perspective)?

Answer 1

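The difference is that the second version builds a generator that yields each sublist of S that contains x.

It then tries to return the first object yielded by that generator by calling next.

Conceptually, there is not much difference between the two snippets. Note how both use for L in S -> if x in L: the first is a traditional for loop containing an if statement, the second is the same logic in comprehension form. Both versions are lazy, i.e. they return as soon as a match is found.

I think your code is perfectly fine. The second version could pass a default value to next to avoid the manual exception handling, i.e.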
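The laziness claim can be checked with a small sketch. The names `CountingList`, `find_loop` and `find_next` are hypothetical helpers introduced only for this illustration, and S is passed as a parameter instead of being read from a global. The list subclass counts how often `in` is evaluated, so both versions should stop after the second sublist when looking for "b".

```python
class CountingList(list):
    """A list that counts how many times `in` is evaluated against it."""
    checks = 0  # counter shared by all instances

    def __contains__(self, item):
        CountingList.checks += 1
        return super().__contains__(item)


def find_loop(x, S):
    # Traditional loop: returns as soon as a sublist contains x.
    for L in S:
        if x in L:
            return L
    return None


def find_next(x, S):
    # Generator version: next() pulls items only until the first match.
    try:
        return next(L for L in S if x in L)
    except StopIteration:
        return None


S = [CountingList(["a"]), CountingList(["b", "c"]), CountingList(["d"])]

for func in (find_loop, find_next):
    CountingList.checks = 0
    result = func("b", S)
    # Both versions perform 2 membership checks and never look at ["d"].
    print(func.__name__, result, CountingList.checks)
```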

return next((L for L in S if x in L), None)

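which tries to return the first item yielded by the generator, or None if there is no such item. Is it worth building a generator that is only supposed to yield a single item here, and is it more readable? I would say "probably not", but that is just my opinion.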
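For completeness, here is the one-liner as a full function, as a minimal sketch. Passing S as a second parameter is an assumption made so the snippet is self-contained. Note that the generator expression needs its own parentheses once next receives a second argument.

```python
def find(x, S):
    # The second argument to next() is returned when the generator is
    # exhausted, so no try/except around StopIteration is needed.
    return next((L for L in S if x in L), None)


S = [["a"], ["b", "c"], ["d"]]
print(find("b", S))  # the sublist containing "b"
print(find("f", S))  # None, because no sublist contains "f"
```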

Answer 2

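Your code is fine, but it can be written more concisely with a list comprehension. The second solution uses a generator expression to create a generator. Since the intersection of any two sublists is known to be empty, the generator yields at most one element.

Using a generator here introduces some overhead. If you are only comparing a few lists, a list comprehension can be noticeably faster, even though it always checks every sublist instead of stopping at the first match.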

def find_list(x, S):
    ret = [L for L in S if x in L]
    return ret[0] if len(ret) else None

def find_iter(x, S):
    ret = (L for L in S if x in L)
    try:
        return next(ret)
    except StopIteration:
        return None

Timings from an interactive IPython shell:

In [1]: S = [["a"], ["b", "c",], ["d"]]

In [2]: %timeit find_list("b", S)
1000000 loops, best of 3: 475 ns per loop

In [3]: %timeit find_list("f", S)
1000000 loops, best of 3: 349 ns per loop

In [4]: %timeit find_iter("b", S)
1000000 loops, best of 3: 802 ns per loop

In [5]: %timeit find_iter("f", S)
100000 loops, best of 3: 1.58 µs per loop

Edit

With the optimized generator version suggested by @timgeb, the generator expression gets closer:

def find_iter_opt(x, S):
    ret = (L for L in S if x in L)
    return next(ret, None)

In [8]: %timeit find_iter_opt("b", S)
1000000 loops, best of 3: 751 ns per loop

In [9]: %timeit find_iter_opt("f", S)
1000000 loops, best of 3: 597 ns per loop
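To repeat these measurements outside IPython, something like the following sketch with the standard timeit module could be used. The `bench` helper and the `large` input are hypothetical additions, not part of the original test. Absolute numbers depend on the machine and Python version, so none are shown here. The larger input is included because the list comprehension checks every sublist while the generator stops at the first match, so the ranking may change as S grows.

```python
import timeit


def find_list(x, S):
    ret = [L for L in S if x in L]
    return ret[0] if len(ret) else None


def find_iter_opt(x, S):
    ret = (L for L in S if x in L)
    return next(ret, None)


def bench(func, x, S, number=1000):
    """Return the average time per call in seconds."""
    return timeit.timeit(lambda: func(x, S), number=number) / number


small = [["a"], ["b", "c"], ["d"]]
# Hypothetical larger input: 1000 disjoint single-item sublists.
large = [[i] for i in range(1000)]

cases = (("small", small, "b"), ("large, early match", large, 0))
for label, S, x in cases:
    for func in (find_list, find_iter_opt):
        print(label, func.__name__, bench(func, x, S))
```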
