Question

Why is the implementation of startswith slower than slicing?

In [1]: x = 'foobar'

In [2]: y = 'foo'

In [3]: %timeit x.startswith(y)
1000000 loops, best of 3: 321 ns per loop

In [4]: %timeit x[:3] == y
10000000 loops, best of 3: 164 ns per loop

Surprisingly, even when the length calculation is included, slicing is still significantly faster:

In [5]: %timeit x[:len(y)] == y
1000000 loops, best of 3: 251 ns per loop

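Note: the first part of this behaviour is mentioned in Python for Data Analysis (Chapter 3), but no explanation is given there.

If it helps: here is the C code for startswith; and here is the output of dis.dis: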

In [6]: import dis

In [7]: dis_it = lambda x: dis.dis(compile(x, '<none>', 'eval'))

In [8]: dis_it('x[:3]==y')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (3)
              6 SLICE+2             
              7 LOAD_NAME                1 (y)
             10 COMPARE_OP               2 (==)
             13 RETURN_VALUE        

In [9]: dis_it('x.startswith(y)')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_ATTR                1 (startswith)
              6 LOAD_NAME                2 (y)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE
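
The listing above comes from a Python 2 interpreter (the SLICE+2 opcode no longer exists in Python 3). As a rough sketch, assuming Python 3.4 or later, the same comparison can be repeated on a current interpreter. The helper name opnames is made up for this illustration, and the opcodes it reports vary between Python versions.

```python
import dis

def opnames(expression):
    # Compile the expression in 'eval' mode, as dis_it does above, and
    # collect the opcode names instead of printing a full listing.
    code = compile(expression, '<none>', 'eval')
    return [instruction.opname for instruction in dis.get_instructions(code)]

slice_ops = opnames('x[:3] == y')
startswith_ops = opnames('x.startswith(y)')

print(slice_ops)
print(startswith_ops)
```

In either version, the slicing expression is handled by dedicated opcodes, while startswith needs an attribute lookup followed by a call.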

Answer 1

Some of the performance difference can be explained by the time the . operator needs for the attribute lookup:

>>> x = 'foobar'
>>> y = 'foo'
>>> sw = x.startswith
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 316 ns per loop
>>> %timeit sw(y)
1000000 loops, best of 3: 267 ns per loop
>>> %timeit x[:3] == y
10000000 loops, best of 3: 151 ns per loop

Another part of the difference can be explained by the fact that startswith is a function, and even a no-op function call takes some time:

>>> def f():
...     pass
... 
>>> %timeit f()
10000000 loops, best of 3: 105 ns per loop

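This does not fully explain the difference, because the version using slicing and len also calls a function and is still faster (compare with sw(y) above, 267 ns):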

>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 213 ns per loop

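My only guess is that Python may optimize lookup time for built-in functions, or that len calls are heavily optimized (which is probably true). That could be tested with a custom len function. Or this may be where the differences identified by LastCoder come in. Note also larsmans' results, which show that startswith is actually faster for longer strings. The whole line of reasoning above applies only to cases where the overhead I am talking about actually matters.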
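
The custom-len test suggested above can be sketched as follows. This is only an illustration written for Python 3.5 or later (it relies on the globals argument of timeit.repeat). The names my_len, namespace and measure are made up here, and the absolute numbers will differ from the Python 2 timings quoted in this answer.

```python
import timeit

def my_len(s):
    # Hypothetical pure-Python wrapper around len, used to see how much
    # a Python-level function call adds compared with the built-in.
    return len(s)

namespace = {'x': 'foobar', 'y': 'foo', 'my_len': my_len}
namespace['sw'] = namespace['x'].startswith  # bound method, looked up once

statements = [
    'x.startswith(y)',
    'sw(y)',
    'x[:3] == y',
    'x[:len(y)] == y',
    'x[:my_len(y)] == y',
]

def measure(stmt, number=200000, repeat=3):
    # Best of `repeat` runs, converted to nanoseconds per loop.
    best = min(timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat))
    return best / number * 1e9

timings = {stmt: measure(stmt) for stmt in statements}
for stmt, ns in timings.items():
    print('%-20s %8.1f ns per loop' % (stmt, ns))
```

If the my_len variant turns out noticeably slower than the len variant, that would support the guess that calling the built-in is cheaper than an ordinary Python-level call.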

Answer 2

The comparison isn't fair, because you are only measuring the case where startswith returns True.

>>> x = 'foobar'
>>> y = 'fool'
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 221 ns per loop
>>> %timeit x[:3] == y  # note: length mismatch
10000000 loops, best of 3: 122 ns per loop
>>> %timeit x[:4] == y
10000000 loops, best of 3: 158 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 210 ns per loop
>>> sw = x.startswith
>>> %timeit sw(y)
10000000 loops, best of 3: 176 ns per loop

Also, for much longer strings, startswith is a lot faster:

>>> import random
>>> import string
>>> x = '%030x' % random.randrange(256**10000)
>>> len(x)
20000
>>> y = x[:4000]
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 211 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 469 ns per loop
>>> sw = x.startswith
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop

This is still true when there is no match.

# change last character of y
>>> y = y[:-1] + chr((ord(y[-1]) + 1) % 256)
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 210 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 470 ns per loop
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop
# change first character of y
>>> y = chr((ord(y[0]) + 1) % 256) + y[1:]
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 210 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 442 ns per loop
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop

So startswith is probably slower for short strings because it is optimized for long ones.

(Trick for getting random strings taken from this answer.)
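
The long-string experiment can also be run as a standalone script. The sketch below assumes Python 3.5 or later and builds the random string with random.choice instead of the hex-formatting trick, so that its length is exactly 20000. The names bump, cases and best_ns are made up for this illustration. One plausible reason for the gap is that slicing first has to build a new 4000-character string, while startswith can compare in place.

```python
import random
import timeit

random.seed(0)  # fixed seed so the run is repeatable
x = ''.join(random.choice('0123456789abcdef') for _ in range(20000))
y = x[:4000]

def bump(ch):
    # Return a hex digit that is guaranteed to differ from ch.
    return '0' if ch != '0' else '1'

cases = {
    'match': y,
    'last character differs': y[:-1] + bump(y[-1]),
    'first character differs': bump(y[0]) + y[1:],
}

def best_ns(stmt, env, number=20000, repeat=3):
    # Best of `repeat` runs, converted to nanoseconds per loop.
    return min(timeit.repeat(stmt, globals=env, number=number, repeat=repeat)) / number * 1e9

results = {}
for label, prefix in cases.items():
    env = {'x': x, 'y': prefix}
    # Record what both approaches return, then time them.
    results[label] = (x.startswith(prefix), x[:len(prefix)] == prefix)
    print('%-24s startswith: %7.1f ns   slice: %7.1f ns' % (
        label, best_ns('x.startswith(y)', env), best_ns('x[:len(y)] == y', env)))
```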

Answer 3

startswith is more complex than slicing...

result = _string_tailmatch(self,
                           PyTuple_GET_ITEM(subobj, i),
                           start, end, -1);

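This is not a simple character-comparison loop for a needle at the start of a haystack. We are looking at a for loop that iterates over a vector/tuple (subobj) and calls another function (_string_tailmatch) on each element. Multiple function calls carry overhead for the stack, argument sanity checks and so on...

startswith is a library function, while slicing appears to be built into the language.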

if (!stringlib_parse_args_finds("startswith", args, &subobj, &start, &end))
    return NULL;
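
To make the extra work visible, here is a rough Python sketch of the control flow that the C excerpts suggest: argument handling, a loop over a tuple of prefixes, and a helper that does the actual comparison. It is not the CPython implementation. The names startswith_sketch and tailmatch_sketch are made up for this illustration, and the index handling is an assumption modelled on the documented behaviour of str.startswith.

```python
def tailmatch_sketch(s, sub, start, end):
    # Hypothetical stand-in for the C helper _string_tailmatch:
    # normalise the indices, then compare one candidate prefix.
    n = len(s)
    if end > n:
        end = n
    elif end < 0:
        end = max(end + n, 0)
    if start < 0:
        start = max(start + n, 0)
    if end - len(sub) < start:
        return False  # not enough room for the prefix in s[start:end]
    return s[start:start + len(sub)] == sub

def startswith_sketch(s, prefix, start=0, end=None):
    # Argument handling comes first (parsed by a helper in the C code).
    if end is None:
        end = len(s)
    if isinstance(prefix, tuple):
        # One helper call per candidate prefix, as in the C for loop.
        for candidate in prefix:
            if tailmatch_sketch(s, candidate, start, end):
                return True
        return False
    return tailmatch_sketch(s, prefix, start, end)

print(startswith_sketch('foobar', 'foo'))
print(startswith_sketch('foobar', ('baz', 'bar'), 3))
```

Even for a plain three-character prefix, every call goes through the argument handling and the index normalisation before any character is compared.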

Answer 4

To quote the docs, startswith does more than you might think:

str.startswith(prefix[, start[, end]])


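Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.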
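
A short illustration of the extra features described in that documentation entry. The variable names are chosen only for this example.

```python
x = 'foobar'

plain = x.startswith('foo')                # single prefix
with_tuple = x.startswith(('baz', 'foo'))  # any prefix in the tuple may match
with_start = x.startswith('bar', 3)        # test begins at index 3
with_end = x.startswith('foo', 0, 2)       # only x[0:2] is considered

print(plain, with_tuple, with_start, with_end)
```

A plain slice comparison such as x[:3] == y covers only the first of these cases.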

Answer 5

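Calling a function is quite expensive. However, I don't know whether that is also the case for built-in functions written in C.

Be aware, though, that slicing may also involve a function call, depending on the object being sliced.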
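
As a small sketch of that last point, assuming Python 3: slicing an instance of a user-defined class goes through its __getitem__ method, which is an ordinary Python-level call. The class name LoggingSequence is made up for this illustration.

```python
class LoggingSequence:
    # Hypothetical wrapper that records every index or slice it receives.
    def __init__(self, data):
        self._data = data
        self.calls = []

    def __getitem__(self, index):
        self.calls.append(index)
        return self._data[index]

seq = LoggingSequence('foobar')
prefix = seq[:3]  # invokes LoggingSequence.__getitem__ with a slice object

print(prefix, seq.calls)
```

For built-in str objects the slicing is handled in C, so no Python-level function call is involved.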
