Question

I have the following script to check whether a string contains an item from a list:

word = ['one',
        'two',
        'three']
string = 'my favorite number is two'
if any(word_item in string.split() for word_item in word):
    print 'string contains a word from the word list: %s' % (word_item)

This works, but I'm also trying to print the list item that the string contains. What am I doing wrong?

Answer 1

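The problem is that you're using an if statement instead of a for statement, so your print runs at most once (if at least one word matches), and by that point any has already finished consuming the loop. The loop variable word_item only exists inside the generator expression, so it is not available to the print.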
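To make that concrete, here is a small sketch using the names from the question. The variables `result` and `leaked` are illustrative additions, and the behavior shown assumes Python 3:

```python
word = ['one', 'two', 'three']
string = 'my favorite number is two'

# any() collapses the whole generator into a single bool.
result = any(word_item in string.split() for word_item in word)
print(result)  # says that something matched, not which item

# The generator's loop variable is scoped to the generator expression.
try:
    word_item
except NameError:
    leaked = False
else:
    leaked = True
print(leaked)
```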

This is the simplest way to do what you want:

words = ['one',
         'two',
         'three']
string = 'my favorite number is two'
for word in words:
    if word in string.split():
        print('string contains a word from the word list: %s' % (word))

If you want this in a functional style for some reason, you could do it like this:

for word in filter(string.split().__contains__, words):
    print('string contains a word from the word list: %s' % (word))

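Since someone is bound to post a performance-related answer even though this question has nothing to do with performance: splitting the string once would be more efficient, and depending on how many words you want to check, converting the result to a set might also help.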
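A minimal sketch of that optimization, assuming whitespace-separated words as in the examples above. The names `tokens` and `matches` are illustrative and not part of the original answer:

```python
words = ['one', 'two', 'three']
string = 'my favorite number is two'

# Split once, and keep the tokens in a set for fast membership tests.
tokens = set(string.split())

# Iterate over the list (not the set) so matches keep the list's order.
matches = [word for word in words if word in tokens]

for word in matches:
    print('string contains a word from the word list: %s' % word)
```

Whether the set is worth it depends on the sizes involved; for a handful of words, a plain list is likely just as good.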

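As for the question in the comments: if you want multi-word "words", there are two easy options: add whitespace and then search for the phrase in the full string, or use a regular expression with word boundaries.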

The simplest way is to add a space character before and after the text to search, and then search for ' ' + word + ' ':

phrases = ['one',
           'two',
           'two words']
text = "this has two words in it"

# Pad the text so phrases at its very start or end can match as well.
padded = " %s " % text

for phrase in phrases:
    if " %s " % phrase in padded:
        print("text '%s' contains phrase '%s'" % (text, phrase))

For the regular expression, just use the \b word boundary:

import re

for phrase in phrases:
    if re.search(r"\b%s\b" % re.escape(phrase), text):
        print("text '%s' contains phrase '%s'" % (text, phrase))

Which one is "better" is hard to say, but the regular expression is probably significantly less efficient (if that matters to you).
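One behavioral difference may matter more than speed: the two options treat punctuation differently. The sketch below uses a made-up text with a comma after the phrase; `space_hits` and `regex_hits` are illustrative names:

```python
import re

phrases = ['one', 'two', 'two words']
text = "it has two words, and that is all"

# Space padding only matches phrases delimited by literal spaces.
padded = " %s " % text
space_hits = [p for p in phrases if " %s " % p in padded]

# \b also treats the boundary between a letter and punctuation as a word edge.
regex_hits = [p for p in phrases
              if re.search(r"\b%s\b" % re.escape(p), text)]

print(space_hits)
print(regex_hits)
```

Here the padded search misses 'two words' because a comma follows it, while the regular expression finds it.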

And if you don't care about word boundaries, you can just do:

phrases = ['one',
           'two',
           'two words']
text = "the word 'tone' will be matched, but so will 'two words'"

for phrase in phrases:
    if phrase in text:
        print("text '%s' contains phrase '%s'" % (text, phrase))

Answer 2

set(word).intersection(string.split())
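For context, a short sketch of how that expression could be used with the names from the question. The variable `found` is hypothetical. Note that `word` has to be a list here, since calling `set()` on a bare string would produce a set of its characters:

```python
word = ['one', 'two', 'three']
string = 'my favorite number is two'

# Every list item that also appears as a whole word in the string.
found = set(word).intersection(string.split())

# sorted() is used only to make the output order stable; sets are unordered.
for word_item in sorted(found):
    print('string contains a word from the word list: %s' % word_item)
```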

Answer 3

If you have a multi-word entry like 'ninety five', you can split it and check that all of its words are in the set of words from the string:

words = ['one',
         'two',
         'three',
         'fifty ninety']
string = set('my favorite number is two fifty five'.split())

for word in words:
    spl = word.split()
    if len(spl) > 1:
        if all(string.intersection([w]) for w in spl):
            print(word)
    elif string.intersection([word]):
        print(word)

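Because only set membership is checked, word order and adjacency are ignored: it would also return True for 'ninety five' when the string contains those two words in a different order or apart from each other, so you need to decide whether that is acceptable. For single words, intersection is efficient. Make sure to wrap the word in a list or tuple before passing it to intersection, or a string such as "foo" will be treated as the set of its characters, {"f", "o"}.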
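Both caveats can be seen in a small sketch. The sentence and the names `tokens`, `as_chars`, `as_word`, and `order_ignored` are made up for illustration:

```python
tokens = set('my favorite number is five ninety'.split())

# A bare string is iterated character by character: 'i' and 's' are tested.
as_chars = tokens.intersection('is')
# Wrapped in a list, the whole word 'is' is tested.
as_word = tokens.intersection(['is'])
print(as_chars)
print(as_word)

# The string says "five ninety", yet the phrase 'ninety five' still passes,
# because only membership of each word is checked.
phrase = 'ninety five'
order_ignored = all(tokens.intersection([w]) for w in phrase.split())
print(order_ignored)
```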

You could also use set.issuperset instead of all:

for word in words:
    spl = word.split()
    if len(spl) > 1:
        if string.issuperset(spl):
            print(word)
    elif string.intersection([word]):
        print(word)

Answer 4

You can use a set intersection:

word = ['one', 'two', 'three']
string = 'my favorite number is two'
co_occuring_words = set(word) & set(string.split())
for word_item in co_occuring_words:
    print('string contains a word from the word list: %s' % (word_item))

Check whether a string contains a list item
