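I'm working on code that takes a dictionary mapping page numbers to the words on each page and inverts it. The result should be a new dictionary that maps each unique word to an ordered list of all the pages on which that word appears.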
For example, given the input:
words_on_page = {1: ['hi', 'there', 'fred'], 2: ['there', 'we', 'go'], 3: ['fred', 'was', 'there']}
...it should return:
{'hi': [1], 'fred': [1, 3], 'there': [1, 2, 3], 'we': [2], 'go': [2], 'was': [3]}
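My solution so far inverts the dictionary, but each key ends up being the entire list of words on a page (converted to a string), mapped to that page number. I need some way to split the words in the key apart and map each one to a list of all the pages it appears on.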
def make_index(words_on_page):
    """Return the inverse dictionary mapping a word (key) to an
    ordered list of pages on which that word appears."""
    inverted = {}
    for page, word in words_on_page.items():
        word = str(word)  # word is the page's whole list, so this makes one string key
        if word in inverted:
            inverted[word].append(page)
        else:
            inverted[word] = [page]
    return inverted
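To see why this happens: iterating `words_on_page.items()` yields `(page, list_of_words)` pairs, so `word` is the whole list and `str(word)` turns it into a single string key. A minimal reproduction of the loop above, renamed to a hypothetical `broken_index` so it can run on its own:

```python
def broken_index(words_on_page):
    """Reproduce the question's loop: each dict value is a whole list of words."""
    inverted = {}
    for page, word in words_on_page.items():
        word = str(word)  # the entire list becomes one string key
        if word in inverted:
            inverted[word].append(page)
        else:
            inverted[word] = [page]
    return inverted

words_on_page = {1: ['hi', 'there', 'fred'], 2: ['there', 'we', 'go'], 3: ['fred', 'was', 'there']}
print(broken_index(words_on_page))
# {"['hi', 'there', 'fred']": [1], "['there', 'we', 'go']": [2], "['fred', 'was', 'there']": [3]}
```

Each page produces exactly one key, so no individual word such as `'there'` ever becomes a key.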
Answer 0 (score: 1)
I solved it with the following (it just needed one more level of iteration):
def make_index(words_on_page):
    inverted = {}
    for page, words in words_on_page.items():
        for word in words:  # iterate over each individual word on the page
            if word in inverted:
                inverted[word].append(page)
            else:
                inverted[word] = [page]
    return inverted
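The nested loop appends pages in whatever order the outer dict happens to be iterated, and it appends the same page twice if a word repeats on one page. If the docstring's "ordered list of pages" should hold regardless of input order, one sketch (assuming page numbers are sortable and each page should be listed once per word; `make_index_sorted` is a hypothetical name) visits the pages in sorted order and skips repeats:

```python
def make_index_sorted(words_on_page):
    """Map each word to a sorted list of the distinct pages it appears on."""
    inverted = {}
    for page in sorted(words_on_page):  # visit pages in ascending order
        for word in words_on_page[page]:
            pages = inverted.setdefault(word, [])
            # pages arrive in ascending order, so a repeat can only be the last entry
            if not pages or pages[-1] != page:
                pages.append(page)
    return inverted

print(make_index_sorted({3: ['fred', 'fred'], 1: ['hi', 'fred']}))
# {'hi': [1], 'fred': [1, 3]}
```

Because pages are processed in ascending order, comparing against `pages[-1]` is enough to drop duplicates without a separate set.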
Answer 1 (score: 1)
You can use dict.setdefault to remove the if check:
o = dict()
for k, v in words_on_page.items():
    for i in v:
        o.setdefault(i, []).append(k)
print(o)
{'fred': [1, 3],
'go': [2],
'hi': [1],
'there': [1, 2, 3],
'was': [3],
'we': [2]}
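This works because `dict.setdefault(key, default)` returns `d[key]` when the key exists, and otherwise inserts `default` and returns it, so `.append` always acts on the list actually stored in the dict. A tiny illustration:

```python
index = {}
index.setdefault('fred', []).append(1)  # key missing: inserts [] and appends to it
index.setdefault('fred', []).append(3)  # key present: returns the existing list
print(index)
# {'fred': [1, 3]}
```

One small cost: the `[]` argument is built on every call, even when the key already exists and it goes unused.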
You can also use a defaultdict:
from collections import defaultdict
o = defaultdict(list)
for k, v in words_on_page.items():
    o.update({y: o[y] + [x] for x, y in zip([k] * len(v), v)})
print(dict(o))
{'fred': [1, 3],
'go': [2],
'hi': [1],
'there': [1, 2, 3],
'was': [3],
'we': [2]}
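The `update` call above builds a new list (`o[y] + [x]`) for every word on every page. A more conventional defaultdict version simply appends in place; a sketch (key order shown assumes Python 3.7+ insertion-ordered dicts):

```python
from collections import defaultdict

words_on_page = {1: ['hi', 'there', 'fred'], 2: ['there', 'we', 'go'], 3: ['fred', 'was', 'there']}

o = defaultdict(list)  # a missing key starts out as an empty list
for page, words in words_on_page.items():
    for word in words:
        o[word].append(page)
print(dict(o))
# {'hi': [1], 'there': [1, 2, 3], 'fred': [1, 3], 'we': [2], 'go': [2], 'was': [3]}
```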
Answer 2 (score: 1)
Just for fun, a pandas "one-liner" solution:
import pandas as pd
words_on_page = {1: ['hi', 'there', 'fred'],
                 2: ['there', 'we', 'go'], 3: ['fred', 'was', 'there']}
def make_index(words_on_page):
    return pd.DataFrame(list(words_on_page.items()), columns=["page", "word"]) \
        .set_index("page")["word"].apply(pd.Series).stack().reset_index() \
        .drop(columns="level_1").groupby(0)["page"].unique().apply(list).to_dict()
print(make_index(words_on_page))
Output:
{'we': [2], 'there': [1, 2, 3], 'fred': [1, 3], 'hi': [1], 'go': [2], 'was': [3]}
Answer 3 (score: 0)
You can try this:
from itertools import chain
words_on_page = {1: ['hi', 'there', 'fred'], 2: ['there', 'we', 'go'], 3: ['fred', 'was', 'there']}
final_dict = {i: [a for a, b in words_on_page.items() if i in b]
              for i in chain.from_iterable(words_on_page.values())}
Output:
{'we': [2], 'there': [1, 2, 3], 'fred': [1, 3], 'hi': [1], 'go': [2], 'was': [3]}
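This comprehension rescans every page for every word occurrence (including repeats), and `i in b` is a linear search through each page's list, so it does noticeably more work than the single-pass loops. A sketch that keeps the same idea but evaluates each distinct word once and uses sets for the membership test (`page_sets` and `unique_words` are illustrative names):

```python
from itertools import chain

words_on_page = {1: ['hi', 'there', 'fred'], 2: ['there', 'we', 'go'], 3: ['fred', 'was', 'there']}

# Convert each page's word list to a set once, so 'w in s' is an average O(1) lookup
page_sets = {page: set(words) for page, words in words_on_page.items()}
# dict.fromkeys de-duplicates the words while keeping first-seen order
unique_words = dict.fromkeys(chain.from_iterable(words_on_page.values()))
final_dict = {w: [p for p, s in page_sets.items() if w in s] for w in unique_words}
print(final_dict)
# {'hi': [1], 'there': [1, 2, 3], 'fred': [1, 3], 'we': [2], 'go': [2], 'was': [3]}
```

It still checks every page for every distinct word, so the append-based loops remain the simpler choice for large inputs.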