Question

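I am running a word segmentation experiment like the following.

lst is a list of characters, and output is every possible way of grouping them into words.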

lst = ['a', 'b', 'c', 'd']

def foo(lst):
    ...
    return output

output = [['a', 'b', 'c', 'd'],
          ['ab', 'c', 'd'],
          ['a', 'bc', 'd'],
          ['a', 'b', 'cd'],
          ['ab', 'cd'],
          ['abc', 'd'],
          ['a', 'bcd'],
          ['abcd']]

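I have looked at combinations and permutations in the itertools library, and also tried combinatorics. However, it seems I am looking in the wrong place, because this is not pure permutation or combination...

It seems I could achieve this with a lot of loops, but that would probably be inefficient.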

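EDIT

Word order matters, so combinations such as ['ba', 'dc'] or ['cd', 'ab'] are not valid.

The order should always be left to right.

EDIT

@Stuart's solution doesn't work in Python 2.7.6

EDIT

@Stuart's solution does work in Python 2.7.6, see the comments below.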

Answer 1

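itertools.product should indeed be able to help you.

The idea is this: consider A1, A2, ..., AN separated by slabs. There will be N-1 slabs. If a slab is present, there is a segmentation at that point. If there is no slab, the two neighbours are joined. Each slab is independently present or absent, so for a given sequence of length N you should get 2^(N-1) such combinations.

Like the following: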

import itertools
lst = ['a', 'b', 'c', 'd']
combinatorics = itertools.product([True, False], repeat=len(lst) - 1)

solution = []
for combination in combinatorics:
    i = 0
    one_such_combination = [lst[i]]
    for slab in combination:
        i += 1
        if not slab: # there is a join
            one_such_combination[-1] += lst[i]
        else:
            one_such_combination += [lst[i]]
    solution.append(one_such_combination)

print(solution)  # parenthesised form works in both Python 2.7 and Python 3
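
The 2^(N-1) count can be checked directly. This is a minimal sketch that wraps the same itertools.product idea in a function; the name `segmentations` is made up for illustration and does not come from the question.

```python
import itertools

def segmentations(lst):
    """Return every left-to-right segmentation of lst (hypothetical helper)."""
    if not lst:
        return []  # nothing to segment; also avoids a negative repeat below
    result = []
    # One boolean per gap between neighbours: True = cut here, False = join.
    for cuts in itertools.product([True, False], repeat=len(lst) - 1):
        words = [lst[0]]
        for item, cut in zip(lst[1:], cuts):
            if cut:
                words.append(item)
            else:
                words[-1] += item
        result.append(words)
    return result

lst = ['a', 'b', 'c', 'd']
result = segmentations(lst)
print(len(result) == 2 ** (len(lst) - 1))
```

Since every segmentation keeps the characters in their original order, joining the words of any result back together should give 'abcd' again.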

Answer 2

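There are 8 options, each corresponding to one of the binary numbers 0 through 7 (000 to 111).

Each 0 or 1 indicates whether the 2 letters at that index are "glued" together: 0 means no, 1 means yes.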

>>> lst = ['a', 'b', 'c', 'd']
... output = []
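... width = len(lst) - 1  # number of gaps between adjacent letters
... for i in range(2**width):
...     output.append([])
...     # Zero-padded binary form of i, one digit per gap (no float round-trip,
...     # which would lose precision for long lists).
...     s = "{:0{}b}".format(i, width)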
...     lstcopy = lst[:]
...     for j, c in enumerate(s):
...         if c == "1":
...             lstcopy[j+1] = lstcopy[j] + lstcopy[j+1]
...         else:
...             output[-1].append(lstcopy[j])
...     output[-1].append(lstcopy[-1])
... output
[['a', 'b', 'c', 'd'],
 ['a', 'b', 'cd'],
 ['a', 'bc', 'd'],
 ['a', 'bcd'],
 ['ab', 'c', 'd'],
 ['ab', 'cd'],
 ['abc', 'd'],
 ['abcd']]
>>>
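
The same 0/1 idea can also be written with integer bit operations instead of string formatting. This is a rough sketch under that assumption; `segment_by_bitmask` is a hypothetical name, and the bits are read from the most significant end so that the results come out in the same order as the listing above.

```python
def segment_by_bitmask(lst):
    """Yield one segmentation per integer mask (hypothetical helper)."""
    if not lst:
        return
    gaps = len(lst) - 1  # one bit per gap between adjacent items
    for mask in range(2 ** gaps):
        words = [lst[0]]
        for j in range(gaps):
            # Bit for gap j, counted from the most significant bit.
            glued = (mask >> (gaps - 1 - j)) & 1
            if glued:
                words[-1] += lst[j + 1]
            else:
                words.append(lst[j + 1])
        yield words

output = list(segment_by_bitmask(['a', 'b', 'c', 'd']))
print(output)
```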

Answer 3

#!/usr/bin/env python
from itertools import combinations
a = ['a', 'b', 'c', 'd']
a = "".join(a)
cuts = []
for i in range(0,len(a)):
    cuts.extend(combinations(range(1,len(a)),i))
for i in cuts:
    last = 0
    output = []
    for j in i:
        output.append(a[last:j])
        last = j
    output.append(a[last:])
    print(output)

Output:

zsh 2419 % ./words.py  
['abcd']
['a', 'bcd']
['ab', 'cd']
['abc', 'd']
['a', 'b', 'cd']
['a', 'bc', 'd']
['ab', 'c', 'd']
['a', 'b', 'c', 'd']
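
One thing to watch: the script joins the list into a single string before cutting, which is fine for single characters but would lose the item boundaries if the list held longer tokens. A possible variant, sketched here under the assumption that the input may contain multi-character items, cuts the list itself and joins each slice afterwards. The name `segment_tokens` is invented for this example.

```python
from itertools import combinations

def segment_tokens(tokens):
    """Yield segmentations that only cut between list items (hypothetical helper)."""
    n = len(tokens)
    for k in range(n):  # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            # Join the items between each pair of consecutive boundaries.
            yield [''.join(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]

segs = list(segment_tokens(['New', 'York', 'City']))
print(segs)
```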

Answer 4

You could use a recursive generator:

def split_combinations(L):
    for split in range(1, len(L)):
        for combination in split_combinations(L[split:]):
            yield [L[:split]] + combination
    yield [L]

print (list(split_combinations('abcd')))

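Edit. I'm not sure how well this scales up for long strings, or at what point it will hit Python's recursion limit. As in some of the other answers, you could also use combinations from itertools to work through every possible combination of split points.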

from itertools import combinations

def split_string(s, t):
    return [s[start:finish] for start, finish in zip((None, ) + t, t + (None, ))]

def split_combinations(s):
    for i in range(len(s)):
        for split_points in combinations(range(1, len(s)), i):
            yield split_string(s, split_points)

These both seem to work as intended in Python 2.7 and Python 3.2. As @twasbrillig says, make sure you indent it as shown.
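
A quick way to gain some confidence that the recursive version and the split-point version agree is to compare their results on a small input. The sketch below restates both under different names (`split_recursive` and `split_by_points`, chosen here only to keep them apart). On the recursion concern: each recursive call works on a strictly shorter suffix, so the depth is at most the length of the string, and the 2^(n-1) results are likely to become the practical limit long before the recursion limit does.

```python
from itertools import combinations

def split_recursive(s):
    # Choose the first word, then recurse on the remaining suffix.
    for split in range(1, len(s)):
        for rest in split_recursive(s[split:]):
            yield [s[:split]] + rest
    yield [s]

def split_by_points(s):
    # Choose i cut positions among the len(s) - 1 gaps.
    for i in range(len(s)):
        for points in combinations(range(1, len(s)), i):
            bounds = (0,) + points + (len(s),)
            yield [s[a:b] for a, b in zip(bounds, bounds[1:])]

s = 'abcd'
rec = sorted(split_recursive(s))
pts = sorted(split_by_points(s))
print(rec == pts, len(rec))
```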

Python: find all possible word combinations with a sequence of characters (word segmentation)