Python - Intersecting a list of dictionaries

Posted: 2017-09-03 03:53:44

Tags: python dictionary intersection

I have this list of dictionaries:

artist_and_tags = [{u'Yo La Tengo': ['indie', 'indie rock', 'seen live', 'alternative', 'indie pop', 'rock', 'post-rock', 'dream pop', 'shoegaze', 'noise pop', 'folk', 'experimental', 'alternative rock', 'american', 'lo-fi', 'pop', 'new jersey', 'yo la tengo', 'usa', 'noise rock', '90s', 'noise', '00s', 'ambient', 'post-punk', '80s', 'mellow', 'psychedelic', 'hoboken', 'experimental rock', 'singer-songwriter', 'post rock', 'electronic', 'female vocalists', 'alt-country', 'dreamy', 'matador', 'chillout', 'instrumental', 'favorites', 'punk', 'electronica', 'slowcore', 'folk rock', 'new wave', 'jazz', 'eclectic', 'new york', 'emo']}, {u'Radiohead': ['alternative', 'alternative rock', 'rock', 'indie', 'electronic', 'seen live', 'british', 'britpop', 'indie rock', 'experimental', 'radiohead', 'progressive rock', '90s', 'electronica', 'art rock', 'experimental rock', 'post-rock', 'psychedelic', 'uk', 'male vocalists', 'pop', '00s', 'ambient', 'chillout', 'progressive', 'favorites', 'melancholic', 'awesome', 'overrated', 'english', 'beautiful', 'classic rock', 'genius', 'melancholy', 'better than radiohead', 'trip-hop', 'idm', 'indie pop', 'emo']}, {u'Portishead': ['trip-hop', 'electronic', 'female vocalists', 'chillout', 'trip hop', 'alternative', 'electronica', 'seen live', 'downtempo', 'british', 'indie', 'portishead', 'experimental', 'ambient', 'female vocalist', 'alternative rock', '90s', 'lounge', 'mellow', 'bristol', 'jazz', 'psychedelic', 'chill', 'melancholic', 'triphop', 'uk', 'rock', 'bristol sound', 'acid jazz', 'lo-fi']}]

I use it to work out how closely related the artists are.

To do this, I am doing:

tags0 = set(list(artist_and_tags[0].values())[0])
tags1 = set(list(artist_and_tags[1].values())[0])
tags2 = set(list(artist_and_tags[2].values())[0])
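
Instead of one variable per artist, the tag sets can be collected into a single dictionary keyed by artist name, which scales to any number of artists and avoids indexing `.values()` (not possible on Python 3, where it returns a view). This is a minimal sketch assuming each dictionary maps artist names to tag lists, as `artist_and_tags` does; the helper name `tags_by_artist` and the sample data are hypothetical.

```python
def tags_by_artist(artist_dicts):
    """Merge a list of {artist: [tags]} dicts into one {artist: set(tags)} dict.

    Iterates over .items(), so it works the same on Python 2 and 3.
    """
    merged = {}
    for artist_dict in artist_dicts:
        for artist, tags in artist_dict.items():
            # Assumption: if an artist appears twice, union its tag lists
            merged.setdefault(artist, set()).update(tags)
    return merged


# Hypothetical, shortened data in the same shape as artist_and_tags
sample = [{'A': ['rock', 'indie']}, {'B': ['rock', 'jazz']}]
print(tags_by_artist(sample))
```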

Then:

intersection1 = tags0 & tags1
intersection2 = tags0 & tags2
intersection3 = tags1 & tags2

And this:

print(intersection1, len(intersection1), intersection2, len(intersection2), intersection3, len(intersection3))

tells me that "Yo La Tengo" is closer to "Radiohead" (20 shared tags) than to "Portishead" (16).
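
To answer the "which artist is closest" question directly for any number of artists, one option is to pick the other artist whose tag set has the largest intersection with the target's. This is a hedged sketch that assumes the tags are already in a dict mapping artist name to a set of tags; `closest_artist` and the toy data are hypothetical.

```python
def closest_artist(target, tag_sets):
    """Return (other_artist, shared_tag_count) for the artist sharing most tags with target.

    tag_sets is assumed to map artist name -> set of tags and to contain at
    least two artists. Ties go to whichever tied artist max() sees first.
    """
    others = (name for name in tag_sets if name != target)
    best = max(others, key=lambda name: len(tag_sets[target] & tag_sets[name]))
    return best, len(tag_sets[target] & tag_sets[best])


# Hypothetical toy data
tag_sets = {
    'A': {'rock', 'indie', 'pop'},
    'B': {'rock', 'indie', 'jazz'},
    'C': {'rock', 'ambient'},
}
print(closest_artist('A', tag_sets))  # ('B', 2)
```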

This code seems a bit repetitive, though...

Question:

Is there a way to put this logic in a for loop (or wrap it in a simple function) so that it works for a list with n artists (dictionary keys)?

1 answer:

Answer 0 (score: 1)

You can use itertools.combinations:

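import itertools
import collections

# Pair each artist name with its tags as a set, so intersections are cheap
ArtistTags = collections.namedtuple('ArtistTags', ('name', 'tags'))
artists = (ArtistTags(artist, set(tags))
           for artists_dict in artist_and_tags
           for artist, tags in artists_dict.items())
# Every unordered pair of artists, each pair exactly once
artist_pairings = itertools.combinations(artists, 2)
# (number of shared tags, artist a, artist b) for each pair
intersections = ((len(a.tags & b.tags), a, b) for a, b in artist_pairings)
# Print the most similar pairs first
for n, a, b in sorted(intersections, reverse=True):
    print(n, a.name, b.name)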

Output:

20 Yo La Tengo Radiohead
16 Yo La Tengo Portishead
16 Radiohead Portishead
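
The same approach can be wrapped in a reusable function, as the question asked, returning the pairs instead of printing them. This is a minimal sketch that takes the question's list-of-dicts format; the function name `pairwise_shared_tags` and the sample data are hypothetical.

```python
import itertools


def pairwise_shared_tags(artist_dicts):
    """Return [(shared_count, artist_a, artist_b), ...], most-shared pairs first.

    Accepts a list of {artist: [tags]} dicts with any number n of artists,
    producing n * (n - 1) / 2 pairs.
    """
    # Flatten to (name, set_of_tags) tuples
    artists = [(name, set(tags))
               for artist_dict in artist_dicts
               for name, tags in artist_dict.items()]
    pairs = [(len(tags_a & tags_b), name_a, name_b)
             for (name_a, tags_a), (name_b, tags_b)
             in itertools.combinations(artists, 2)]
    # Sort on the count only, highest first; ties keep combinations() order
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data
sample = [{'A': ['rock', 'indie']}, {'B': ['rock', 'indie', 'jazz']}, {'C': ['jazz']}]
for count, a, b in pairwise_shared_tags(sample):
    print(count, a, b)
```

Sorting with `key` on the count alone means ties never fall back to comparing names or sets. Because `sorted()` stays stable with `reverse=True`, tied pairs keep their `combinations()` order, so on the question's data this should print the same three lines as the output above.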