Stars and bars with (individual) maximum partition sizes

Time: 2018-12-21 10:50:19

Tags: python performance

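I am using the "stars and bars" algorithm to select items from multiple lists: the number of stars between bar k and bar k+1 is the index into the k-th list. The problem is that a partition (i.e. the number of stars between two bars) can be larger than the size of its list, which produces many invalid combinations.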

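For example: if I have two lists of length 8 each, then (14, 0) is a valid star distribution with sum = 14, but it obviously exceeds the capacity of the first list. (7, 7) is the highest valid index pair, so I get a large number of invalid indices, especially when the list sizes are unequal.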
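To put a number on this example, here is a minimal sketch. The helper names count_unbounded and count_valid are made up for illustration, and math.comb assumes Python 3.8 or later. It compares how many tuples plain stars and bars generates for a given sum with how many of them actually fit into the lists:

```python
from itertools import product
from math import comb  # assumes Python 3.8+


def count_unbounded(stars, partitions):
    """Number of tuples plain stars and bars yields for this sum."""
    return comb(stars + partitions - 1, partitions - 1)


def count_valid(stars, sizes):
    """Brute-force count of index tuples with this sum that fit the list sizes."""
    return sum(1 for idx in product(*(range(n) for n in sizes)) if sum(idx) == stars)


sizes = (8, 8)
print(count_unbounded(14, len(sizes)))  # 15 tuples: (0, 14), (1, 13), ..., (14, 0)
print(count_valid(14, sizes))           # 1 tuple: only (7, 7) fits
```

The brute-force count is only meant for small inputs; it is there to illustrate the gap, not to replace the generator.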

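For performance reasons I need a variant of the algorithm with limited partition sizes. How can I do that? The stars-and-bars implementation I am using right now is this one, but I can easily change it. The lists usually have similar lengths, but not necessarily the same length. Limiting the partition size to the length of the longest list would be acceptable, but individual limits would of course be better.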

import itertools

def stars_and_bars(stars, partitions):
    for c in itertools.combinations(range(stars+partitions-1), partitions-1):
        yield tuple(right-left-1 for left,right in zip((-1,) + c, c + (stars+partitions-1,)))

def get_items(*args):
    hits = 0
    misses = 0
    tries = 0
    max_idx = sum(len(a) - 1 for a in args)
    for dist in range(max_idx):
        for indices in stars_and_bars(dist, len(args)):
            try:
                tries += 1
                [arg[i] for arg,i in zip(args,indices)]
                hits += 1
            except IndexError:
                misses += 1
                continue
    print('hits/misses/tries: {}/{}/{}'.format(hits, misses, tries))

# Generate 4 lists of length 1..4
lists = [[None]*(r+1) for r in range(4)]
get_items(*lists)
# hits/misses/tries: 23/103/126

Edit: I found two related questions on mathexchange, but I have not been able to translate them into code yet:

1 answer:

Answer 0: (score: 1)

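Based on this post, here is some code that generates the solutions efficiently. The main difference from the other post is that the buckets now have different limits and the number of buckets is fixed, so the number of solutions is not infinite.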

def find_partitions(x, lims):
    # partition the number x in a list of buckets;
    # the number of elements of each bucket i is strictly smaller than lims[i];
    # the sum of all buckets is x;
    # output the lists of buckets one by one

    a = [x] + [0 for l in lims[1:]]  # create an output array of the same length as lims, set a[0] to x

    while True:

        # step 1: while a[i] is too large: redistribute to a[i+1]
        i = 0
        while a[i] >= lims[i] and i < len(lims) - 1:
            a[i + 1] += a[i] - (lims[i] - 1)
            a[i] = (lims[i] - 1)
            i += 1
        if a[-1] >= lims[-1]:
            return # the last bucket has too many elements: we've reached the last partition;
                   # this only happens when x is too large

        yield list(a)  # yield a copy, because a itself is mutated in place below

        # step 2:  add one to group 1;
        #    while a group i is already full: set to 0 and increment group i+1;
        #    while the surplus is too large (because a[0] is too small): repeat incrementing
        i0 = 1
        surplus = 0
        while True:
            for i in range(i0, len(lims)):  # increment a[i] by 1, which can carry to the left
                if a[i] < lims[i]-1:
                    a[i] += 1
                    surplus += 1
                    break
                else:  # a[i] would become too full if 1 were added, therefore clear a[i] and increment a[i+1]
                    surplus -= a[i]
                    a[i] = 0
            else:  # the for-loop didn't find a small enough a[i]
                return

            if a[0] >= surplus:   # if a[0] is large enough to absorb the surplus, this step is done
                break
            else:  # a[0] would become negative when absorbing the surplus: set a[i0] to 0 and start incrementing a[i0+1]
                surplus -= a[i0]
                a[i0] = 0
                i0 += 1
                if i0 == len(lims):
                    return

        # step 3: a[0] absorbs the surplus created in step 2; a[0] can become too large, which step 1 fixes on the next pass
        a[0] -= surplus


x = 10
lims = [5, 4, 3, 5]

for i, p in enumerate(find_partitions(x, lims)):
    print(f"partition {i+1}: {p} sums to {sum(p)}  lex: { ''.join([str(d) for d in p[::-1]]) }")

The 19 solutions of 0 <= a[0] < 5, 0 <= a[1] < 4, 0 <= a[2] < 3, 0 <= a[3] < 5 with a[0] + a[1] + a[2] + a[3] == 10 (written from right to left, they are in increasing lexicographic order):

[4, 3, 2, 1]
[4, 3, 1, 2]
[4, 2, 2, 2]
[3, 3, 2, 2]
[4, 3, 0, 3]
[4, 2, 1, 3]
[3, 3, 1, 3]
[4, 1, 2, 3]
[3, 2, 2, 3]
[2, 3, 2, 3]
[4, 2, 0, 4]
[3, 3, 0, 4]
[4, 1, 1, 4]
[3, 2, 1, 4]
[2, 3, 1, 4]
[4, 0, 2, 4]
[3, 1, 2, 4]
[2, 2, 2, 4]
[1, 3, 2, 4]
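A listing like this can be cross-checked by brute force. The sketch below is not the algorithm from the answer: brute_force_partitions is a hypothetical helper that filters itertools.product over all in-range index tuples, which is only practical for small limits. Each tuple listed above sums to 10, so that is the total used here:

```python
from itertools import product


def brute_force_partitions(total, lims):
    """Return every tuple a with 0 <= a[i] < lims[i] and sum(a) == total."""
    return [a for a in product(*(range(lim) for lim in lims)) if sum(a) == total]


lims = [5, 4, 3, 5]
solutions = brute_force_partitions(10, lims)
print(len(solutions))  # 19

# Sort by the reversed tuple to reproduce the "written right to left" order.
solutions.sort(key=lambda a: a[::-1])
print(solutions[0], solutions[-1])  # (4, 3, 2, 1) (1, 3, 2, 4)
```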

In your test code, you can replace for indices in stars_and_bars(dist, len(args)): with for indices in find_partitions(dist, limits):, where limits = [len(a) for a in args]. You will then get hits/misses/tries: 23/0/23. To get all 24 solutions, the for loop over dist should also include the last value: for dist in range(max_idx+1):
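Put together, the modified test harness could look roughly like the sketch below. To keep the snippet self-contained it uses a simple recursive generator, bounded_partitions, as a stand-in for find_partitions. That name and the count_hits wrapper are assumptions made here, not part of the original code, and the stand-in may yield the tuples for a given dist in a different order.

```python
def bounded_partitions(total, lims):
    """Yield all tuples a with 0 <= a[i] < lims[i] and sum(a) == total.

    Hypothetical stand-in for find_partitions; plain recursion, not tuned for speed.
    """
    if not lims:
        if total == 0:
            yield ()
        return
    # Largest total the remaining buckets can still absorb.
    rest_max = sum(lim - 1 for lim in lims[1:])
    for first in range(max(0, total - rest_max), min(total, lims[0] - 1) + 1):
        for rest in bounded_partitions(total - first, lims[1:]):
            yield (first,) + rest


def count_hits(*args):
    """Same bookkeeping as get_items, but only in-range indices are generated."""
    hits = misses = tries = 0
    limits = [len(a) for a in args]
    max_idx = sum(len(a) - 1 for a in args)
    for dist in range(max_idx + 1):  # +1 so the largest index sum is included
        for indices in bounded_partitions(dist, limits):
            tries += 1
            try:
                [arg[i] for arg, i in zip(args, indices)]
                hits += 1
            except IndexError:
                misses += 1
    return hits, misses, tries


# Generate 4 lists of length 1..4, as in the question
lists = [[None] * (r + 1) for r in range(4)]
print('hits/misses/tries: {}/{}/{}'.format(*count_hits(*lists)))
# hits/misses/tries: 24/0/24
```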

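PS: If you just want all possible combinations of the elements of the lists and do not care about getting the smallest indices first, itertools.product generates them: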

import itertools

lists = [['a'], ['b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i', 'j']]
for i, p in enumerate(itertools.product(*lists)):
    print(i+1, p)
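If the lists are small enough that all index tuples fit in memory, a possible middle ground is to let itertools.product generate the indices and sort them by their sum afterwards. This is only a sketch under that assumption: it gives up lazy generation, and the variable names are illustrative.

```python
import itertools

lists = [['a'], ['b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i', 'j']]

# All in-range index tuples, ordered by index sum (smallest first).
# sorted() is stable, so ties keep the order in which product emitted them.
index_tuples = sorted(
    itertools.product(*(range(len(lst)) for lst in lists)), key=sum
)

for indices in index_tuples[:3]:
    items = tuple(lst[i] for lst, i in zip(lists, indices))
    print(sum(indices), indices, items)
```

For large inputs the sort defeats the purpose of generating partitions incrementally, so this only makes sense when the full product is cheap to hold.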