Question

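Suppose we have a huge sorted list of integers. What is the fastest way to slice this list using an upper limit that is not itself in the list?

For example, suppose our list is: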

l=list(range(0,1000000, 2))

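(This is a simple example; the list can be of any length with no fixed interval between items, so it cannot be derived from a range.)

We want a slice containing the items that are less than limit=1001.

What is the fastest way to achieve this, preferably without checking every item in the list? A common approach is a list comprehension such as [i for i in l if i<limit], but that checks every item of l and compares it with limit. If the limit were in the list, we could use something like l[:l.index(limit)], but what if it is not in the list? Any ideas?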
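To make the two approaches mentioned above concrete, here is a minimal sketch using the example list from the question. The names `below` and `below_via_index` are only illustrative and do not come from the original post.

```python
l = list(range(0, 1_000_000, 2))
limit = 1001

# Works, but compares every element of l against limit: O(n).
below = [i for i in l if i < limit]

# Only works when limit is an element of l;
# otherwise list.index raises ValueError.
try:
    below_via_index = l[:l.index(limit)]
except ValueError:
    below_via_index = None
```

Here `limit` is odd and the list holds only even numbers, so the `l.index` variant fails and `below_via_index` stays `None`.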

Answer 1

You can use bisect:

import bisect
l[:bisect.bisect_left(l, limit)]
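As a hedged sketch of how the standard-library `bisect` module could be applied to the example in the question: the helper name `slice_below` is hypothetical and not part of the original answer, and the list is assumed to be sorted in ascending order.

```python
import bisect

def slice_below(sorted_list, limit):
    """Return the items of sorted_list that are strictly less than limit."""
    # bisect_left returns the first index whose value is >= limit,
    # located by binary search in O(log n) comparisons.
    return sorted_list[:bisect.bisect_left(sorted_list, limit)]

l = list(range(0, 1_000_000, 2))
result = slice_below(l, 1001)
```

It makes no difference whether `limit` is in the list: `bisect_left` always returns an insertion point. If items equal to the limit should be included in the slice, `bisect.bisect_right` can be used instead.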

Answer 2

I just want to provide some comparative timings for the two answers.

Given this benchmark:

import bisect 
import time 

def f1(l, tgt):
    return bisect.bisect_left(l, tgt)  # first index with l[idx] >= tgt, same result as f2 and f3

def f2(l,tgt):
    slice_condition = lambda num: num >= tgt
    try:
        slice_idx = next(idx for idx, num in enumerate(l) if slice_condition(num))
    except StopIteration:
        slice_idx = len(l)
    return slice_idx 

def f3(l,tgt):
    return next((idx for idx, val in enumerate(l) if val>=tgt), len(l))


def cmpthese(funcs, args=(), cnt=10, rate=True, micro=True):
    """Generate a Perl style function benchmark"""
    from copy import deepcopy
    def pprint_table(table):
        """Perl style table output"""
        def format_field(field, fmt='{:,.0f}'):
            if type(field) is str: return field
            if type(field) is tuple: return field[1].format(field[0])
            return fmt.format(field)     

        def get_max_col_w(table, index):
            return max([len(format_field(row[index])) for row in table])         

        col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
        for i,row in enumerate(table):
            # left col
            row_tab=[row[0].ljust(col_paddings[0])]
            # rest of the cols
            row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
            print(' '.join(row_tab))                

    results={}
    for i in range(cnt):
        for f in funcs:
            # copy the arguments outside the timed region
            local_args=deepcopy(args)
            start=time.perf_counter_ns()
            f(*local_args)
            stop=time.perf_counter_ns()
            results.setdefault(f.__name__, []).append(stop-start)
    results={k:float(sum(v))/len(v) for k,v in results.items()}     
    fastest=sorted(results,key=results.get, reverse=True)
    table=[['']]
    if rate: table[0].append('rate/sec')
    if micro: table[0].append('\u03bcsec/pass')
    table[0].extend(fastest)
    for e in fastest:
        tmp=[e]
        if rate:
            tmp.append('{:,}'.format(int(round(float(cnt)*1000000.0/results[e]))))

        if micro:
            tmp.append('{:,.1f}'.format(results[e]/float(cnt)))

        for x in fastest:
            if x==e: tmp.append('--')
            else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
        table.append(tmp) 

    pprint_table(table)                    

if __name__=='__main__':
    import sys
    print(sys.version)
    
    small=range(1_000)
    mid=range(100_000)
    large=range(1_000_000)
    cases=(
        ('small, found', small, len(small)//2),
        ('small, not found', small, len(small)),
        ('mid, found', mid, len(mid)//2),
        ('mid, not found', mid, len(mid)),
        ('large, found', large, len(large)//2),
        ('large, not found', large, len(large))
    )
    for txt, x, tgt in cases:
        print(f'\n{txt}:')
        l=list(x)
        args=(l,tgt)
        cmpthese([f1,f2,f3],args)

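If you run it with small, mid, and large lists, each with the cases 1) found in the middle or 2) scanned all the way to the end, you can see that bisect is significantly faster, by orders of magnitude.

On my machine the benchmark prints: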

3.9.1 (default, Jan 30 2021, 15:51:59) 
[Clang 12.0.0 (clang-1200.0.32.29)]

small, found:
   rate/sec μsec/pass      f2      f3     f1
f2      182   5,501.4      --  -59.2% -98.8%
f3      445   2,246.9  144.9%      -- -96.9%
f1   14,562      68.7 7911.4% 3172.0%     --

small, not found:
   rate/sec μsec/pass       f2      f3     f1
f2       90  11,053.2       --  -58.8% -99.5%
f3      220   4,555.5   142.6%      -- -98.7%
f1   17,349      57.6 19076.4% 7803.3%     --

mid, found:
   rate/sec μsec/pass        f2       f3     f1
f2        2 561,882.8        --   -57.2% -99.9%
f3        4 240,253.2    133.9%       -- -99.9%
f1    2,942     339.9 165184.0% 70573.1%     --

mid, not found:
   rate/sec   μsec/pass        f2        f3      f1
f2        1 1,119,041.1        --    -58.0% -100.0%
f3        2   469,960.8    138.1%        --  -99.9%
f1    3,804       262.9 425552.8% 178660.3%      --

large, found:
   rate/sec   μsec/pass        f2       f3     f1
f2        0 5,833,734.0        --   -55.3% -99.9%
f3        0 2,605,010.2    123.9%       -- -99.9%
f1      335     2,988.1 195135.5% 87080.9%     --

large, not found:
   rate/sec    μsec/pass        f2        f3      f1
f2        0 11,553,311.3        --    -54.4% -100.0%
f3        0  5,264,216.7    119.5%        -- -100.0%
f1      710      1,408.9 819923.5% 373540.2%      --

Answer 3

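A quick and easy way to do this in O(n) (specifically 2k, where k is the index at which the list is sliced), without relying on binary search, is to use an iterator to find the index of the first element that is not below the limit, and then slice up to that point: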

slice_condition = lambda num: num >= limit
slice_idx = next((idx for idx, num in enumerate(l) if slice_condition(num)), len(l))
slice = l[:slice_idx]

Of course, a binary search would find slice_idx in O(log(n)) time, but slicing is a linear operation anyway, so the complexity of the whole operation is still O(n).
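The point about the two steps can be illustrated with a small sketch. The helper names `scan_index` and `bisect_index` are hypothetical and only serve to compare the two ways of locating the cut point. Both return the same index, and the slice that follows copies k items in either case.

```python
import bisect

def scan_index(sorted_list, limit):
    # Linear scan: index of the first element >= limit, or len() if none. O(k).
    return next(
        (idx for idx, num in enumerate(sorted_list) if num >= limit),
        len(sorted_list),
    )

def bisect_index(sorted_list, limit):
    # Binary search for the same index. O(log n).
    return bisect.bisect_left(sorted_list, limit)

l = list(range(0, 1_000_000, 2))
limit = 1001
k = bisect_index(l, limit)
prefix = l[:k]  # copying the first k items is O(k) regardless of how k was found
```

When k is small relative to the length of the list, the linear scan and the copy cost about the same. When k is large, the scan adds a second linear pass that the binary search avoids.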

Fastest way to slice a list with an upper limit that is not in the list
