How do I find relative word frequencies?

Time: 2018-03-26 19:08:26

Tags: python

Here is my code:

for question in questions:
    print('Processing ' + str(question))
    counts = Counter(dataset_final[str(question)])
    print(counts)

This prints something like the following:

Processing 1
Counter({'would': 18, 'think': 12, 'patient': 11, 'condition': 11, 'might': 10, 'increased': 1})

Processing 2
Counter({'cancer': 32, 'condition': 22, 'prostate': 20, 'educational': 1})

I want relative word frequencies, so I tried something like this, but it raises an error:

for question in questions:
    print("Processing " + str(question))
    counts = Counter(dataset_final[str(question)])
    length = len(dataset_final[str(question)])
    print(counts/length)

How can I do this?

Edit: I mean relative word frequencies, not normalization.

1 answer:

Answer 0: (score: 0)

Let:

 import collections
 count = collections.Counter({'would': 18, 'think': 12, 'patient': 11, 'condition': 11, 'might': 10, 'increased': 1})

You can use a dict comprehension to normalize the values:

 total = sum(count.values())  # compute the total once instead of once per word
 normalized_count = {w: c / total for w, c in count.items()}

Each word's count is divided by the total number of words.

Output:

{'would': 0.2857142857142857, 'think': 0.19047619047619047, 'patient': 0.1746031746031746, 'condition': 0.1746031746031746, 'might': 0.15873015873015872, 'increased': 0.015873015873015872}
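
To tie this back to the loop in the question, here is a minimal sketch of the same idea wrapped in a helper. The helper name `relative_frequencies` and the sample contents of `dataset_final` are made up for illustration, and it assumes Python 3 (where `/` is true division) and that each entry of `dataset_final` is a list of words.

```python
from collections import Counter


def relative_frequencies(words):
    """Map each distinct word to its share of all words in `words`."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        # No words at all: avoid dividing by zero.
        return {}
    return {word: n / total for word, n in counts.items()}


# Hypothetical stand-in for the question's dataset_final.
dataset_final = {
    "1": ["would", "think", "would", "patient"],
    "2": ["cancer", "condition", "cancer", "cancer"],
}
questions = [1, 2]

for question in questions:
    print("Processing " + str(question))
    print(relative_frequencies(dataset_final[str(question)]))
```

A `Counter` does not support division by a number, which is why `counts/length` fails; dividing each value individually, as above, sidesteps that.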