Question

How do I use fd.items() from FreqDist to sum up word frequencies?

>>> from nltk import FreqDist
>>> fd = FreqDist(text)
>>> most_freq_w = [w for w, _ in fd.most_common(10)]  # the 10 most frequent words in the text
>>> # here I should sum up how many times each of these 10 words appears in the text

E.g., if each word in most_freq_w appears 10 times, the result should be 100.

I don't need the count of all the words in the text, only of the 10 most frequent ones.

Answer 1

I'm not familiar with nltk, but since FreqDist derives from dict, the following should work:

v = sorted(fd.values())  # all counts, in ascending order
count = sum(v[-10:])     # sum of the 10 largest counts
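As a check on Answer 1, here is a minimal sketch that swaps in collections.Counter for FreqDist (in NLTK 3, FreqDist subclasses Counter, so the same calls apply). The sample sentence and the helper names are hypothetical. Sorting in descending order and taking the first n also behaves sensibly for n = 0, where v[-0:] would return every count; heapq.nlargest gives the same total without sorting all the counts.

```python
import heapq
from collections import Counter

# Hypothetical stand-in for nltk.FreqDist (a Counter subclass in NLTK 3)
fd = Counter("the cat sat on the mat the end".split())

def top_n_total_sorted(freq, n=10):
    """Sort every count in descending order and sum the first n."""
    return sum(sorted(freq.values(), reverse=True)[:n])

def top_n_total_heap(freq, n=10):
    """Same total via heapq.nlargest, which avoids a full sort when n is small."""
    return sum(heapq.nlargest(n, freq.values()))

print(top_n_total_sorted(fd, 2), top_n_total_heap(fd, 2))  # 4 4
```

Here "the" occurs 3 times and every other word once, so the two largest counts add up to 4.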

Answer 2

To find how many times a word appears in the corpus (your text):

import nltk
from nltk import FreqDist

raw = open("<your file>").read()  # the raw text of your file
tokens = nltk.word_tokenize(raw)
fd = FreqDist(tokens)
print(fd['<your word here>'])
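A minimal sketch of the single-word lookup from Answer 2, again using collections.Counter as a stand-in for FreqDist (its parent class in NLTK 3); the token list and the count_of helper are hypothetical. Indexing with a word that never occurs returns 0 instead of raising KeyError, and does not add the word to the distribution.

```python
from collections import Counter

# Stand-in for nltk.FreqDist, which subclasses collections.Counter in NLTK 3.
# The token list is a made-up example, not real corpus data.
tokens = "to be or not to be".split()
fd = Counter(tokens)

def count_of(freq, word):
    """Return how often `word` occurs; words never seen give 0, not a KeyError."""
    return freq[word]

print(count_of(fd, "be"))        # 2
print(count_of(fd, "question"))  # 0
print("question" in fd)          # False: the lookup did not add the word
```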

Answer 3

FreqDist has a nice pretty-printing function,

which will do it.

Answer 4

If FreqDist is a mapping from words to their frequencies:

sum(map(fd.get, most_freq_w))
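To see Answer 4 end to end, here is a minimal sketch with collections.Counter standing in for FreqDist (NLTK 3's FreqDist is a Counter subclass); the sample text is hypothetical. It builds most_freq_w with most_common(10), sums with map(fd.get, ...) as in the answer, and compares that with summing the counts that most_common already returns, which avoids a second lookup per word.

```python
from collections import Counter

# Stand-in for nltk.FreqDist (a Counter subclass in NLTK 3); the text is hypothetical
text = "a b a c b a d e a b".split()
fd = Counter(text)

# The 10 most frequent words (fewer if the text has fewer distinct words)
most_freq_w = [w for w, _ in fd.most_common(10)]

# Answer 4: look each word up again and sum the counts.
# fd.get returns None for a missing word, but every word here came from fd itself.
total_lookup = sum(map(fd.get, most_freq_w))

# Equivalent without a second lookup: most_common already carries the counts
total_direct = sum(count for _, count in fd.most_common(10))

print(total_lookup, total_direct)  # 10 10
```

With only five distinct words, the "top 10" covers the whole text, so both totals equal the number of tokens.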
