Question

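I'm writing a program in Python. The user enters a text message, and I need to check whether a given sequence of words occurs in it. Example: the message is "Hello world, my friend." Checking it for the two-word sequence "Hello", "world" should give True. But checking the same words against the message "Hello, beautiful world" should give False. When I only need to check that two words are present, I can do it as in my code, but with combinations of 5 or more words it gets difficult. Is there a simple way to solve this?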

Answer 1

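The straightforward way is to use a loop: split your message into individual words, then check for each of them in the sentence as a whole.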

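message = "Hello world"                 # the words to look for
message2 = "Hello, world, my friend."   # the sentence to search in

word_list = message.split()     # this gives you a list of words to find
word_found = True
for word in word_list:
    if word not in message2:
        word_found = False
        break                   # one missing word is enough to fail

print(word_found)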

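If every word is found in the sentence, word_found stays True. There are many ways to make this shorter and faster, in particular using the built-in all() function with the words supplied as an inline generator expression.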

word_found = all(word in message2 for word in message.split())

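Now, if you need to restrict the "found" property to exact word matches, you'll need more preprocessing. The code above is too forgiving of substrings, such as reporting the word "are" as found in the sentence "your joke is only barely funny", because "are" appears inside "barely". For the stricter case, you should split message2 into words, strip punctuation from those words, lowercase them (to make matching easier), and then look up each word from message in the list of words from message2.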

Can you take it from there?
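A minimal sketch of that stricter check, under a few assumptions: the function name contains_all_words is hypothetical, punctuation is stripped only from the ends of each word, and a set is used instead of a list so each lookup is fast. Like the loop above, it checks only that the words are present, not their order.

```python
import string


def contains_all_words(message, message2):
    """Return True if every word of `message` appears as a whole word in `message2`.

    Comparison is case-insensitive; leading/trailing punctuation is ignored.
    """
    def normalize(text):
        # Split on whitespace, strip punctuation from both ends, lowercase.
        words = (w.strip(string.punctuation).lower() for w in text.split())
        return [w for w in words if w]  # drop tokens that were pure punctuation

    # A set makes each membership test O(1) on average.
    target_words = set(normalize(message2))
    return all(word in target_words for word in normalize(message))


print(contains_all_words("Hello world", "Hello, world, my friend."))  # True
print(contains_all_words("are", "your joke is only barely funny"))   # False: "are" is only inside "barely"
```

With the plain substring test, the second call would have returned True.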

Answer 2

I don't know whether this is really what you need, but you can test it:

message = 'hello world'
message2 = ' hello beautiful world'
if 'hello' in message and 'world' in message:
    print('yes')
else:
    print('no')
if 'hello' in message2 and 'world' in message2:
    print('yes')

Output: yes yes
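Both tests print yes because `in` on a string only checks whether each word occurs somewhere in it, not whether the words are adjacent and in the given order. If order and adjacency matter, as in the question's examples, one possible sketch (the helper name contains_sequence is hypothetical) compares every run of consecutive words in the text against the target list:

```python
import re


def contains_sequence(text, words):
    """Return True if `words` occur in `text` as consecutive words, in the given order.

    Matching is case-insensitive and ignores punctuation between words.
    """
    text_words = re.findall(r"[\w']+", text.lower())
    target = [w.lower() for w in words]
    k = len(target)
    # Compare every slice of k consecutive words with the target sequence.
    return any(text_words[i:i + k] == target for i in range(len(text_words) - k + 1))


print(contains_sequence("Hello world, my friend.", ["Hello", "world"]))  # True
print(contains_sequence("Hello, beautiful world", ["Hello", "world"]))   # False
```

This works the same for 5 or more words; each check costs O(n*k) for n words of text and k words to find.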

Answer 3

I would first clarify your requirements:

Ignore case
Contiguous sequence
Match in any order, e.g. permutations or anagrams
Support repeated words

If the number of words is not too large, you can try this simple and easy-to-understand, but not the fastest, approach:

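Split the text message into words
Join them with ' '
List all the permutations of the words you are looking for and join each of them with ' ' as well. For example, if you want to check the sequence ['Hello', 'beautiful', 'world'], the permutations will be 'Hello beautiful world', 'Hello world beautiful', 'beautiful Hello world', and so on.
Then check whether any of these permutations, e.g. 'hello beautiful world', occurs in the joined text.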

Example code:

import itertools
import re

# permutations brute-force, O(n*k!)
def checkWords(text, word_list):
    # split the text into words, dropping spaces and punctuation
    text_words = re.findall(r"[\w']+", text.lower())
    # pad with spaces so a match must start and end on word boundaries
    # (otherwise 'hello world' would match inside 'othello world')
    joined_text = ' ' + ' '.join(text_words) + ' '

    # list all the permutations of word_list, and match each one
    for words in itertools.permutations(word_list):
        if ' ' + ' '.join(words).lower() + ' ' in joined_text:
            return True
    return False

    # or use any(), in one line:
    # return any(' ' + ' '.join(words).lower() + ' ' in joined_text for words in itertools.permutations(word_list))

def test():
    # True
    print(checkWords('Hello world, my friend.', ['Hello', 'world', 'my']))
    # False
    print(checkWords('Hello, beautiful world', ['Hello', 'world']))
    # True
    print(checkWords('Hello, beautiful world Hello World', ['Hello', 'world', 'beautiful']))
    # True
    print(checkWords('Hello, beautiful world Hello World', ['Hello', 'world', 'world']))

test()

But when the number of words is large, this becomes expensive: k words generate k! permutations, so the time complexity is O(n*k!).
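To make the k! growth concrete, here is a small check that is not part of the original answer: it counts the orderings itertools.permutations produces for k words and prints them next to math.factorial(k). The helper name count_permutations and the placeholder words are illustrative assumptions.

```python
import itertools
import math


def count_permutations(k):
    """Count the orderings itertools.permutations yields for k distinct words."""
    words = [f"w{i}" for i in range(k)]  # placeholder words, purely illustrative
    return sum(1 for _ in itertools.permutations(words))


for k in range(1, 9):
    # Each ordering is one joined phrase the brute-force version may search for.
    print(k, count_permutations(k), math.factorial(k))
```

At k = 8 that is already 40320 candidate phrases, and 10 words give 10! = 3628800.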

I think a sliding window is a more efficient solution. The time complexity drops to O(n):

import collections
import re

# sliding window, O(n)
def checkWords(text, word_list):
    # split the text into words, dropping spaces and punctuation
    text_words = re.findall(r"[\w']+", text.lower())
    counter = collections.Counter(map(str.lower, word_list))
    start, end, count, all_indexes = 0, 0, len(word_list), []

    while end < len(text_words):
        counter[text_words[end]] -= 1
        if counter[text_words[end]] >= 0:
            count -= 1
        end += 1

        # if you want all the index of match, you can change here
        if count == 0:
            # all_indexes.append(start)
            return True

        if end - start == len(word_list):
            counter[text_words[start]] += 1
            if counter[text_words[start]] > 0:
                count += 1
            start += 1

    # return all_indexes
    return False
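The comments in the code above mention collecting every match instead of returning on the first one. A sketch of that variant follows; the name find_all_matches is hypothetical. It keeps the same counter idea, with `missing` counting how many required words the current window still lacks, and returns the index (in the list of words of the text) at which each matching window starts.

```python
import collections
import re


def find_all_matches(text, word_list):
    """Return the start index of every window of len(word_list) consecutive
    words in `text` that contains exactly the words of word_list, in any order."""
    text_words = re.findall(r"[\w']+", text.lower())
    k = len(word_list)
    if k == 0:
        return []
    counter = collections.Counter(w.lower() for w in word_list)
    missing = k  # required words the current window still lacks
    matches = []
    for end, word in enumerate(text_words):
        # Add the word entering the window on the right.
        counter[word] -= 1
        if counter[word] >= 0:
            missing -= 1
        # Once the window holds k words: record a match, then drop the leftmost word.
        if end >= k - 1:
            start = end - k + 1
            if missing == 0:
                matches.append(start)
            left = text_words[start]
            counter[left] += 1
            if counter[left] > 0:
                missing += 1
    return matches


print(find_all_matches('Hello, beautiful world Hello World', ['Hello', 'world']))  # [2, 3]
```

Here both "world hello" (starting at word 2) and "hello world" (starting at word 3) contain the two words, so both start indexes are reported.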

Check words in a sentence
