Matching specific words against strings in a list: exact match, not partial match

Time: 2018-10-17 03:53:08

Tags: python

delete = ["man", "eat"]

item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']

My code:

lst = []
for x in item_list:
    if not any(y in x for y in delete):
        lst.append([x, x])

print(lst)

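However, this approach gives me the wrong output. For example, with delete = ["man", "eat"], the word "eat" is not the same word as "eater" in item_list, yet it still matches. That is because I used if not any(y in x ...): the in test returns True since "eat" is contained in "eater". What I want is a match on the whole word, not on something contained inside a word: "eater" should match only "eater" and "man" only "man", while "eat" should not match "eater".

Is there a way to match exactly rather than partially? My current code matches partially, which gives wrong results when delete contains many word fragments.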
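
To illustrate the difference described above, here is a minimal sketch of substring containment versus whole-element comparison. The variable name tokens is introduced here only for illustration and is not part of the original code.

```python
# Substring test on a string: "eat" is found inside "eater" (a partial match).
print("eat" in "eater")    # True

# Membership test on a list compares whole elements only.
tokens = "sharper_task|eater_venue|todo".split("|")
print(tokens)              # ['sharper_task', 'eater_venue', 'todo']
print("eat" in tokens)     # False
print("todo" in tokens)    # True
```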

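4 answers:

Answer 0 (score: 1)

You can compare the strings for exact equality (note that this compares each delete word with the whole item string):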

    delete = ["man", "eat"]

    item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']


    lst = []
    for x in item_list:
        if not any(y == x for y in delete):
            lst.append([x, x])

    print(lst)


#  [['sharper_task|$none_venue|man', 'sharper_task|$none_venue|man'], ['sharper_task|man_venue|king', 'sharper_task|man_venue|king'], ['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,']]

Note: the | (or) operator has no special meaning inside a string such as 'sharper_task|eater_venue|todo'.

Answer 1 (score: 1)

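You can first split each string into substrings on |, split each substring again on _, and then use the in operator to test whether any item of delete equals one of the resulting pieces:

lst = []
for x in item_list:
    if not any(y in s.split('_') for s in x.split('|') for y in delete):
        lst.append([x, x])
print(lst)

With the sample data this removes the two items that contain man as a separate piece and keeps both items that contain eater.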
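
The same idea can also be written with a single split on both delimiters. This is only a sketch, not the answerer's code: the helper name has_exact_token is made up for this example, and it assumes that | and _ are the only separators in the data.

```python
import re

delete = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king',
             'sharper_task|king_venue|world', 'sharper_task|world_venue|dont',
             'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo',
             'sharper_task|todo_venue|,']


def has_exact_token(item, words):
    """Return True if any of `words` equals a whole token of `item`.

    Tokens are the pieces obtained by splitting on '|' or '_'
    (an assumption about the data format).
    """
    tokens = set(re.split(r"[|_]", item))
    return not tokens.isdisjoint(words)


# Keep only the items that contain none of the delete words as a whole token.
lst = [[x, x] for x in item_list if not has_exact_token(x, delete)]
print(lst)
```

Using a set for the tokens keeps the membership test cheap when delete grows long.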

Answer 2 (score: 0)

Assuming you want to split on the pipe character:

delete = ["man", "eat"]

item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']

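# Note: this keeps the items that have a delete word as a whole |-separated
# field; use "if not any(...)" to drop those items instead.
lst = [item
       for item in item_list
       if any(word in item.split('|') for word in delete)]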
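
As a rough sketch of what splitting on | alone does with the sample data, the snippet below separates the items into kept and removed. The names delete_set, kept and removed are introduced here for illustration. Because only whole |-separated fields are compared, a field such as man_venue does not count as a match for man.

```python
delete = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king',
             'sharper_task|king_venue|world', 'sharper_task|world_venue|dont',
             'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo',
             'sharper_task|todo_venue|,']

delete_set = set(delete)

# An item is removed when one of its |-separated fields equals a delete word.
removed = [item for item in item_list
           if not delete_set.isdisjoint(item.split('|'))]
kept = [item for item in item_list
        if delete_set.isdisjoint(item.split('|'))]

print(removed)    # ['sharper_task|$none_venue|man']
print(len(kept))  # 6
```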

Answer 3 (score: 0)

Try the following:

import re

del_list = ["man", "eat"]
regex = '|'.join([r'\b' + y + r'\b' for y in del_list])

item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']

lst = []
for x in item_list:
    if not re.search(regex, x):
        lst.append([x, x])

print(lst)

This outputs:

[['sharper_task|man_venue|king', 'sharper_task|man_venue|king'], ['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,']]

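Using a single regular expression instead of looping over the list ensures that an item_list element is not added to the output merely because it fails to match one "to delete" word when another "to delete" word would have removed it.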

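regex = '|'.join(...): this builds the regular expression from raw (r'') strings, with \b around each word so that it matches only at word boundaries. A word boundary is a position between a word character (letter, digit, or underscore) and a non-word character, which is why man is not matched inside man_venue.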
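
The boundary behaviour can be checked directly. The sketch below is a variation on the code above, not a replacement for it: the name pattern and the use of re.escape are additions, the latter so that delete words containing regex metacharacters are treated literally.

```python
import re

del_list = ["man", "eat"]

# Each word is wrapped in \b...\b; re.escape is an added safeguard.
pattern = re.compile("|".join(r"\b" + re.escape(w) + r"\b" for w in del_list))

# 'man' stands between '|' and the end of the string: both are boundaries.
print(bool(pattern.search("sharper_task|$none_venue|man")))   # True

# 'man' is followed by '_', which is a word character: no boundary there.
print(bool(pattern.search("sharper_task|man_venue|king")))    # False

# 'eat' is followed by 'e' inside 'eater': no boundary there either.
print(bool(pattern.search("sharper_task|を_venue|eater")))    # False
```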

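If we used two loops, one over del_list and another over item_list, the output would look like the following, which I think is incorrect: the "man" item still appears once because it does not match "eat", and the remaining items, which match none of the del_list words, appear twice: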

[['sharper_task|$none_venue|man', 'sharper_task|$none_venue|man'], ['sharper_task|man_venue|king', 'sharper_task|man_venue|king'], ['sharper_task|man_venue|king', 'sharper_task|man_venue|king'], ['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,']]
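
For comparison, here is a sketch of the nested-loop version next to a version that makes one decision per item. The names nested and single are chosen for this example; the nested form is a guess at what such a two-loop version would look like, and it yields the same 13 entries as the output above.

```python
import re

del_list = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king',
             'sharper_task|king_venue|world', 'sharper_task|world_venue|dont',
             'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo',
             'sharper_task|todo_venue|,']

# Nested loops: one append per (item, word) pair that does not match,
# so items are duplicated and a partly matching item still slips through.
nested = []
for x in item_list:
    for y in del_list:
        if not re.search(r"\b" + re.escape(y) + r"\b", x):
            nested.append([x, x])

# One decision per item: keep it only if none of the words matches.
single = [[x, x] for x in item_list
          if not any(re.search(r"\b" + re.escape(y) + r"\b", x)
                     for y in del_list)]

print(len(nested))  # 13
print(len(single))  # 6
```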