Python找到最接近的匹配句子

时间:2013-01-27 11:26:32

标签: python

我正在尝试从专辑中获取曲目列表(歌曲),对于给定的曲目,我希望得到所有匹配相似的曲目。我已经提到了下面的例子,关于如何在python中继续这个的任何想法?看起来像difflib.get_close_matches只适用于单个单词而不是句子。

示例:(找到包含字符串'环游世界'的任何内容

tracks = ['Around The World (La La La La La) (Radio Version)', 'Around The World (La La La La La) (Alternative Radio Version)', 'Around The World (La La La La La) (Acoustic Mix)', 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)', 'World In Motion','My Heart Beats Like A Drum (Dam Dam Dam)','Thinking Of You','Why Oh Why','Mistake No. 2','With You','Love Is Blind','Lonesome Suite','Let Me Come & Let Me Go']

输出:

 Around The World (La La La La La) (Radio Version)
 Around The World (La La La La La) (Alternative Radio Version)
 Around The World (La La La La La) (Acoustic Mix)
 Around The World (La La La La La) (Rüegsegger#Wittwer Club Mix)

4 个答案:

答案 0 :(得分:6)

difflib.get_close_matches可以使用字符串(单个单词除外)。在这种情况下,您需要降低截止值(默认值为0.6),并提高n,即最大匹配数:

In [19]: import difflib

In [20]: tracks = ['Around The World (La La La La La) (Radio Version)', 'Around The World (La La La La La) (Alternative Radio Version)', 'Around The World (La La La La La) (Acoustic Mix)', 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)', 'World In Motion','My Heart Beats Like A Drum (Dam Dam Dam)','Thinking Of You','Why Oh Why','Mistake No. 2','With You','Love Is Blind','Lonesome Suite','Let Me Come & Let Me Go']

In [21]: difflib.get_close_matches('Around the world', tracks, n = 4,cutoff = 0.3)
Out[21]: 
['Around The World (La La La La La) (Acoustic Mix)',
 'Around The World (La La La La La) (Radio Version)',
 'Around The World (La La La La La) (Alternative Radio Version)',
 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)']

答案 1 :(得分:2)

filter(lambda x: 'Around The World' in x, tracks)

这会为您提供名称中包含'Around The World'的歌曲列表。如果您使用的是Python 3,请将其强制转换为列表(list(filter(...))),因为它会返回filter个对象。

如果可能存在拼写错误,那么我无法帮助你。

答案 2 :(得分:0)

您可以为此目的提出get_matching_blocks SequenceMatcher

>>> from pprint import PrettyPrinter
>>> from difflib import SequenceMatcher
>>> pp = PrettyPrinter(indent = 4)
>>> pp.pprint(tracks)
[   'World In Motion',
    'With You',
    'Why Oh Why',
    'Thinking Of You',
    'My Heart Beats Like A Drum (Dam Dam Dam)',
    'Mistake No. 2',
    'Love Is Blind',
    'Lonesome Suite',
    'Let Me Come & Let Me Go',
    'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Alternative Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)']
>>> seq = ((e, SequenceMatcher(None, 'Around the world', e).get_matching_blocks()[0]) for e in tracks)
>>> seq = [k for k, _ in sorted(seq, key = lambda e:e[-1].size, reverse = True)]
>>> pp.pprint(seq)
[   'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Alternative Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)',
    'World In Motion',
    'With You',
    'Thinking Of You',
    'Why Oh Why',
    'My Heart Beats Like A Drum (Dam Dam Dam)',
    'Mistake No. 2',
    'Love Is Blind',
    'Lonesome Suite',
    'Let Me Come & Let Me Go']
>>> 

答案 3 :(得分:-1)

你可以这样做。

temp = "Around The World (La La La La La)"

for string in fh.readlines():
    if temp in string:
       print temp

如果它与您正在阅读的文件中的临时值匹配,则会打印出来。

或者您可以使用正则表达式进行匹配。