How to find whether a sentence contains a specific word

Time: 2020-01-16 18:04:01

Tags: python-3.x fuzzy-logic fuzzywuzzy

How can I find whether a sentence contains a specific word in Python?

I have two files:

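Players [file 1]

Joe doesn't like to play football
Kumar favourite game is hockey
Mohit likes soccer game
Naveen doesn't like cricket
Sachin was a cricket player
Savan likes cricket
Vinod likes basket
Andy likes volleyball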

Games [file 2]

hockey
football
soccer
rick
cricket
basketball

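Expected output:

Player                              Game         Score [%]
Sachin was a cricket player         cricket      100
Joe doesn't like to play football
Naveen doesn't like cricket         cricket      100
Savan likes cricket                 cricket      100
Vinod likes Basketb                 basketball   160
Kumar favourite game is hockey      hockey       100
Andy likes volleyball               null         No match
Mohit likes soccer game             Soccer       100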

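The score is defined as len(game) / len(matched word).

If the same player matches 2 games, the one with the highest score should be used.

I have more than 10,000 records like this.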

1 Answer:

Answer 0: (score: 1)

First, you need to read the players file and split it into sentences, one per line:

>>> with open('testfiles/player.txt') as f:
...     # One sentence per line; strip() removes the trailing newline
...     sentences = [line.strip() for line in f]
...
>>> sentences
['Sachin was a cricket player', 'Mohit likes soccer game', 'Kumar favourite game is hockey', "Joe doesn't like to play football"]

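Do the same for the games file, with one difference: store the names in a set, which guarantees uniqueness and gives fast membership tests: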

>>> with open('testfiles/games.txt') as f:
...     games = {line.strip() for line in f}  # set comprehension: one entry per game name
...
>>> games
{'hockey', 'crick', 'soccer', 'volleyball', 'badminton'}

Now we only need to look for each game name in each sentence to arrive at the output below.

>>> game_score = {}
>>> game_found = set()
>>> for sentence in sentences:
...     for game in games:
...         if game in sentence:  # case-sensitive substring test
...             # Key: game name; value: [sentence, match percentage].
...             # setdefault keeps only the first sentence found for each game.
...             game_score.setdefault(game, [sentence, '100%'])
...             # Remember matched sentences so unmatched ones can be reported later
...             game_found.add(sentence)
...
>>> game_score
{'hockey': ['Kumar favourite game is hockey', '100%'], 'crick': ['Sachin was a cricket player', '100%'], 'soccer': ['Mohit likes soccer game', '100%']}
>>> game_found
{'Mohit likes soccer game', 'Kumar favourite game is hockey', 'Sachin was a cricket player'}
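
Two limitations of the loop above are worth noting: `game_score` is keyed by game name, so only the first sentence found for each game is kept, and `game in sentence` is a case-sensitive substring test (which is why 'crick' matches inside 'cricket'). A minimal sketch of one alternative, keyed by sentence and matching whole words only, follows. The helper name `match_games` and the sample data are illustrative assumptions, not part of the original answer.

```python
import re

def match_games(sentences, games):
    """Map each sentence to the sorted list of games it contains as whole words."""
    # One compiled pattern per game; \b keeps 'crick' from matching inside 'cricket'.
    patterns = {
        game: re.compile(r'\b' + re.escape(game) + r'\b', re.IGNORECASE)
        for game in games
    }
    return {
        sentence: sorted(g for g, pattern in patterns.items() if pattern.search(sentence))
        for sentence in sentences
    }

# Hypothetical sample data, for illustration only.
sentences = [
    'Sachin was a cricket player',
    'Naveen does not like Cricket',
    'Andy likes volleyball',
]
games = {'cricket', 'crick', 'hockey'}
result = match_games(sentences, games)
for sentence, found in result.items():
    print(sentence, '->', found or 'No match')
```

Keying by sentence means several players can share the same game, and unmatched sentences simply map to an empty list, so the separate `game_found` bookkeeping is not needed in this variant.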

Compare game_found against the player sentences, and add every sentence that had no match to game_score:

>>> for i, sentence in enumerate(sentences):
...     if sentence not in game_found:
...         game_name = 'null-%d' % i  # dictionary keys must be unique, so append the sentence index
...         game_score.setdefault(game_name, [sentence, 'No match'])
...
>>> game_score
{'hockey': ['Kumar favourite game is hockey', '100%'], 'crick': ['Sachin was a cricket player', '100%'], 'soccer': ['Mohit likes soccer game', '100%'], 'null-3': ["Joe doesn't like to play football", 'No match']}
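
Since the question is tagged fuzzywuzzy, sentences that end up as 'No match' could be retried with approximate matching. The sketch below uses the standard library's `difflib` rather than fuzzywuzzy itself. The helper name `fuzzy_game`, the 0.7 cutoff, and the sample data are assumptions chosen for illustration.

```python
import difflib

def fuzzy_game(sentence, games, cutoff=0.7):
    """Return the game closest to any word of the sentence, or None if nothing is close enough."""
    best_game, best_ratio = None, 0.0
    for word in sentence.lower().split():
        # get_close_matches returns candidates at or above the cutoff, best first.
        for game in difflib.get_close_matches(word, games, n=1, cutoff=cutoff):
            ratio = difflib.SequenceMatcher(None, word, game).ratio()
            if ratio > best_ratio:
                best_game, best_ratio = game, ratio
    return best_game

# Hypothetical sample data, for illustration only.
games = ['hockey', 'cricket', 'soccer', 'basketball']
print(fuzzy_game('Vinod likes Basketb', games))
print(fuzzy_game('Andy likes volleyball', games))
```

With this sample data the truncated word 'Basketb' should resolve to 'basketball'. The cutoff is a trade-off: lower values catch more typos but also produce more false matches, so it is worth tuning on real records.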

Finally, print the result:

>>> row = '%-41s%-14s%s'  # left-justified columns: sentence, game, score
>>> print(row % ('Output', 'Game', 'Matching Score'))
Output                                   Game          Matching Score
>>> for game, (sentence, score) in game_score.items():
...     print(row % (sentence, game, score))
...
Kumar favourite game is hockey           hockey        100%
Sachin was a cricket player              crick         100%
Mohit likes soccer game                  soccer        100%
Joe doesn't like to play football        null-3        No match

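You should also come up with some logic to handle sentences that mention more than one sport, for example "Jane plays both hockey and football".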
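
One possible way to handle such sentences, following the question's rule that the highest-scoring game wins, is to collect every game that matches a word in the sentence and keep the best one. This is only a sketch: the prefix-based matching rule, the helper name `best_game`, and the sample data are assumptions, and the score is the question's len(game) / len(matched word) expressed as a percentage.

```python
def best_game(sentence, games):
    """Return (game, score) for the highest-scoring match, or (None, None) if nothing matches."""
    candidates = []
    for word in sentence.lower().split():
        word = word.strip('.,!?"\'')  # drop surrounding punctuation
        if not word:
            continue
        for game in games:
            # Assumed matching rule: the word is a prefix of the game name, or vice versa.
            if game.startswith(word) or word.startswith(game):
                # Score from the question: len(game) / len(matched word), as a percentage.
                score = round(100 * len(game) / len(word))
                candidates.append((score, game))
    if not candidates:
        return None, None
    # Highest score wins; on a tie, max() picks the alphabetically later game name.
    score, game = max(candidates)
    return game, score

# Hypothetical sample data, for illustration only.
games = ['crick', 'cricket', 'hockey', 'football']
print(best_game('Sachin was a cricket player', games))
print(best_game('Jane plays both hockey and football.', games))
print(best_game('Andy likes volleyball', games))
```

If every matching game should be reported rather than only the best one, the function could return the full `candidates` list instead. Very short words can match by prefix accidentally, so a minimum word length may be worth adding for real data.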