Question

I have a sample list, sorted by element length, that is generated from an input string:

tst_str =  "crystalapplehatcat"

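I have helper functions that collect all contiguous substrings of the string and drop the ones that are too short (with the default size_floor=2, only strings longer than 2 characters are kept).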

# Return a list of all contiguous substrings of input_string.
def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]


# Return the elements of input_list that are strictly longer than size_floor.
def get_pruned_list(input_list, size_floor=2):
    return [element for element in input_list if len(element) > size_floor]
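
As a quick illustration of what these two helpers return, here is a self-contained run on the short string "cat" (a made-up input, not one from the question). The helpers are restated in condensed form so the snippet runs on its own.

```python
def get_all_substrings(input_string):
    # Every slice input_string[i:j + 1] with i <= j, in order of start index.
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]


def get_pruned_list(input_list, size_floor=2):
    # Keep only elements strictly longer than size_floor.
    return [element for element in input_list if len(element) > size_floor]


all_subs = get_all_substrings("cat")
print(all_subs)                   # ['c', 'ca', 'cat', 'a', 'at', 't']
print(get_pruned_list(all_subs))  # ['cat']
```

Note that the comparison is strict, so two-character substrings such as 'ca' are dropped as well.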

The resulting list to process looks like this:

tst_list = ['crystal', 'cryst', 'apple', 'tala', 'alap', 'lapp', 'appl', 'cry', 'sta', 'tal', 'ala', 'lap', 'app', 'ppl', 'hat', 'cat']

I need to return only the largest unique strings from the list:

result_list = ['crystal','apple','hat', 'cat']

I have been trying to use any() to check the current element against the rest of the list, but the values returned are incorrect.
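
The any() attempt itself is not shown, so the following is only a guess at what it might have looked like: a filter that drops every element contained in some other element of the list. The function name drop_contained is hypothetical. On the sample list it keeps strings such as 'tala' that straddle two words, which is one way such an approach can return wrong values.

```python
def drop_contained(words):
    # Assumed reconstruction of an any()-based filter (not the asker's code):
    # keep a word only if it is not a substring of any *other* word in the list.
    return [
        w for w in words
        if not any(w != other and w in other for other in words)
    ]


tst_list = ['crystal', 'cryst', 'apple', 'tala', 'alap', 'lapp', 'appl',
            'cry', 'sta', 'tal', 'ala', 'lap', 'app', 'ppl', 'hat', 'cat']

filtered = drop_contained(tst_list)
print(filtered)
# ['crystal', 'apple', 'tala', 'alap', 'lapp', 'hat', 'cat']
```

'tala', 'alap' and 'lapp' survive because none of them is a substring of another list element, even though each overlaps 'crystal' or 'apple' in the input string. A pairwise containment test cannot see that overlap; it has to be checked against the input string itself.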

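Additional: the input string can be up to 128 characters long. Some input lists can contain overlapping strings, e.g. 'generalisedeep' from root -> generalize deep, in which case only one string is returned: ['generalised'], when it should be ['generalise', 'deep'].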

Answer 1

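A recursive approach would work, keeping track of the word combination with the longest coverage (the found word is removed before recursing on the rest of the string):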

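def findWords(S,words):
    result = []
    for i,word in enumerate(words):
        # skip words that no longer occur in what is left of the string
        if word not in S: continue
        # blank out the word so later matches cannot overlap it or span the gap,
        # then recurse using only the words that come after it in the list
        match = [word]+findWords(S.replace(word," "),words[i+1:])
        # keep the combination that covers the most characters
        if sum(map(len,match)) > sum(map(len,result)):
            result = match
    return result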

Output:

tst_str =  "crystalapplehatcat"
tst_list = ['crystal', 'cryst', 'apple', 'tala', 'alap', 'lapp', 'appl', 'cry', 'sta', 'tal', 'ala', 'lap', 'app', 'ppl', 'hat', 'cat']
print(findWords(tst_str,tst_list))
['crystal', 'apple', 'hat', 'cat']

tst_str  = "generalisedeep"
tst_list = ["generalised","generalise","deep"]
print(findWords(tst_str,tst_list))
['generalise', 'deep']
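
As a rough sanity check (not part of the original answer), one can confirm that the returned words fit into the input without overlapping by blanking each one out in turn and seeing what is left. The helper name leftover is hypothetical, and findWords is restated so the snippet runs on its own.

```python
def findWords(S, words):
    # Same recursive search as in the answer above.
    result = []
    for i, word in enumerate(words):
        if word not in S:
            continue
        match = [word] + findWords(S.replace(word, " "), words[i + 1:])
        if sum(map(len, match)) > sum(map(len, result)):
            result = match
    return result


def leftover(S, found):
    # Hypothetical helper: blank out each found word and return the
    # characters of S that no word accounts for.
    for word in found:
        S = S.replace(word, " ")
    return S.strip()


found = findWords("generalisedeep", ["generalised", "generalise", "deep"])
print(found)                              # ['generalise', 'deep']
print(repr(leftover("generalisedeep", found)))  # ''
```

An empty leftover means the chosen words account for the whole string. With ['generalised'] alone, the leftover would be 'eep'.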
