Searching for substrings in a given order

Time: 2015-03-09 16:10:03

Tags: python-3.x

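I want to write a function that searches one or more strings for a given set of substrings, in the order the substrings are given. For example, if the substrings are

"the" "new" "store"

and the strings are

"the new store is in san francisco", "the boy enters a new and cool store", "a new boy enters the store", "there is newton in the store"

then the function should match only the 1st, 2nd and 4th sentences, because they contain the substrings in the specified order, while the 3rd sentence has them in the wrong order.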

2 answers:

Answer 0: (score: 1)

You can use regular expressions. .+ matches one or more characters other than \n. .* matches zero or more characters.

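import re

sentences = ["the new store is in san francisco", "the boy enters a new and cool store", "a new boy enters the store",
             "there is newton in the store"]

for sentence in sentences:
    # 'the', 'new' and 'store' must appear in this order,
    # each pair separated by at least one character
    m = re.search(r'the.+new.+store.*', sentence)
    if m:
        # print the matched text, from the first 'the' to the end of the line
        print(m.group())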
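The pattern above is written by hand for three fixed words. As a minimal sketch of how it might be generalized, the helper below (build_ordered_pattern is a hypothetical name, not part of the answer) builds the pattern from any sequence of substrings. It makes two assumptions that differ from the answer's pattern: each substring is passed through re.escape so that characters such as . or + are treated literally, and the separator is .*? rather than .+, so two substrings may sit directly next to each other.

```python
import re

def build_ordered_pattern(subs):
    """Compile a regex that finds the substrings in order (hypothetical helper)."""
    # re.escape keeps regex metacharacters inside a substring literal.
    # '.*?' permits zero or more characters (other than a newline) between substrings.
    return re.compile(".*?".join(re.escape(sub) for sub in subs))

sentences = ["the new store is in san francisco",
             "the boy enters a new and cool store",
             "a new boy enters the store",
             "there is newton in the store"]

pattern = build_ordered_pattern(["the", "new", "store"])
matches = [s for s in sentences if pattern.search(s)]
for s in matches:
    print(s)
```

Because a regex match consumes characters left to right, the substrings found this way cannot overlap each other.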

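Answer 1: (score: 1)

The key idea is to call string.index(sub, start) for each substring, with start initially 0 and moved forward past each substring as it is found. If this is homework, and perhaps even if it is not, you should try writing subs_in_strs(subs, strings) yourself before reading my answer.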
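To illustrate just the start argument without giving away the full function, here is a small sketch using the 3rd sentence from the question. The variable names are chosen for illustration only.

```python
text = "a new boy enters the store"

first = text.index("the")      # offset where 'the' begins
resume = first + len("the")    # the next search starts just past 'the'

try:
    text.index("new", resume)
    found_in_order = True
except ValueError:
    # str.index raises ValueError when the substring is absent from text[resume:]
    found_in_order = False

print(first, resume, found_in_order)
```

Here 'new' does occur in the sentence, but only before 'the', so the search that resumes after 'the' fails.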

...

def subs_in_strs(subs, strings):
    '''Yield strings that contain subs in order, without overlap.

    strings: iterable of strings
    subs: reiterable sequence of substrings
    '''
    for string in strings:
        dex = 0
        try:
            for sub in subs:
                dex = string.index(sub, dex) + len(sub)
            yield string
        except ValueError:
            pass

Your test:

for s in subs_in_strs(('the', 'new', 'store'),
        ("the new store is in san francisco",
         "the boy enters a new and cool store",
         "a new boy enters the store",
         "there is newton in the store",)):
    print(s)

The 3rd sentence is not printed. Testing the overlap condition:

for s in subs_in_strs(('sent', 'tense'),
        ('sent a tense note',
         'mispelled sentense',
         'another senttense')):
    print(s)

The 2nd is omitted and the 3rd is not, as specified.
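If a plain yes/no answer per string is more convenient than a generator, the same scan can be sketched as a predicate. This is a variant under my own assumptions, not the answer's code: contains_in_order is a hypothetical name, and it uses str.find, which returns -1 instead of raising ValueError.

```python
def contains_in_order(string, subs):
    """Return True if every item of subs occurs in string, in order, without overlap."""
    pos = 0
    for sub in subs:
        found = string.find(sub, pos)  # -1 when sub is absent from string[pos:]
        if found == -1:
            return False
        pos = found + len(sub)         # resume just past the substring found
    return True

notes = ['sent a tense note', 'mispelled sentense', 'another senttense']
kept = [s for s in notes if contains_in_order(s, ('sent', 'tense'))]
print(kept)
```

An empty subs sequence makes the predicate return True for any string, since there is nothing left to find.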