Find every match of n different character sequences in a list of m characters

Time: 2017-02-28 19:47:39

Tags: python python-3.x pattern-matching

For example:

I have 3 lists:

seq_to_find = ['abc','de'] (length = n)

main_list = ['a','b','c','ghi','d','e','far','last ','a','b','c'] (length = m)

transaction_nums = [1,3,6,8,10,15,16,17,19,20,22] (note: always sorted, length = m)

How do I find the start and end index numbers of every occurrence of each sequence in main_list?

In other words, I want to write a function, say:

def findTheMasks(seq_to_find, main_list, transaction_nums):
    """Return a list of sublists, each holding the "start" and "end" transaction_nums of one match."""

For the example given above, the result should be: [[1, 6], [10, 15], [19, 22]]
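To make the mapping from list positions to transaction numbers concrete, here is a minimal sketch of how the first expected pair arises. The names `start`, `end`, `joined` and `first_mask` are illustrative only and are not part of the question.

```python
seq_to_find = ['abc', 'de']
main_list = ['a', 'b', 'c', 'ghi', 'd', 'e', 'far', 'last ', 'a', 'b', 'c']
transaction_nums = [1, 3, 6, 8, 10, 15, 16, 17, 19, 20, 22]

# Positions 0..2 of main_list concatenate to the first sequence, 'abc'.
start, end = 0, 2
joined = ''.join(main_list[start:end + 1])

# The mask is read from transaction_nums at the same two positions.
first_mask = [transaction_nums[start], transaction_nums[end]]

print(joined, first_mask)  # abc [1, 6]
```

The other two pairs follow the same way from positions 4..5 ('de') and 8..10 ('abc').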

Please help. Thanks in advance.

1 answer:

Answer 0: (score: 0)

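I assume that partially matched sequences should not appear in the result. I also assume that seq_to_find contains no empty ('') sequence. Here is a sample solution.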

seq_to_find = ['abc', 'de']
main_list = ['a', 'b', 'c', 'ghi', 'd', 'e', 'far', 'last', 'a', 'b', 'c']
transaction_nums = [1, 3, 6, 8, 10, 15, 16, 17, 19, 20, 22]

def findTheMasks(seq_to_find, main_list, transaction_nums):
    ret = []

    # try every position in main_list as a possible start of a match
    for i in range(len(main_list)):
        # try to match each sequence from this start
        for seq in seq_to_find:
            remain = seq

            # consume seq item by item until nothing remains or an item fails
            for j in range(i, len(main_list)):
                x = main_list[j]

                # the next item is a prefix of what is left: strip it off
                if remain.startswith(x):
                    remain = remain[len(x):]
                    # everything matched
                    if not remain:
                        break
                # mismatch: abandon this start position
                else:
                    break

            # fully matched: record the start and end transaction numbers
            if not remain:
                ret.append([transaction_nums[i], transaction_nums[j]])
    return ret

print(findTheMasks(seq_to_find, main_list, transaction_nums))

The output is:

[[1, 6], [10, 15], [19, 22]]
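As a possible variation on the solution above, the sketch below grows a running concatenation from each start position and looks it up in a set of targets, instead of stripping prefixes per sequence. The name `find_masks_by_prefix` is hypothetical. It assumes that main_list contains no empty strings and that duplicates in seq_to_find should be reported only once, neither of which is stated in the question.

```python
def find_masks_by_prefix(seq_to_find, main_list, transaction_nums):
    # Assumption: no empty strings in main_list; duplicate targets collapse.
    targets = set(seq_to_find)
    # No match can be longer than the longest target sequence.
    max_len = max((len(s) for s in targets), default=0)
    result = []

    for i in range(len(main_list)):
        joined = ''
        for j in range(i, len(main_list)):
            joined += main_list[j]
            # Too long to equal any target: stop extending from this start.
            if len(joined) > max_len:
                break
            # The items i..j concatenate exactly to a target sequence.
            if joined in targets:
                result.append([transaction_nums[i], transaction_nums[j]])
    return result


seq_to_find = ['abc', 'de']
main_list = ['a', 'b', 'c', 'ghi', 'd', 'e', 'far', 'last', 'a', 'b', 'c']
transaction_nums = [1, 3, 6, 8, 10, 15, 16, 17, 19, 20, 22]

print(find_masks_by_prefix(seq_to_find, main_list, transaction_nums))
# [[1, 6], [10, 15], [19, 22]]
```

When several targets match from the same start position, this version reports them in order of end position rather than in seq_to_find order, so the two functions can order such results differently.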