Here is my code:
for question in questions:
    print('Processing ' + str(question))
    counts = Counter(dataset_final[str(question)])
    print(counts)
This prints the following:
Processing 1
Counter({'would': 18, 'think': 12, 'patient': 11, 'condition': 11, 'might': 10, 'increased': 1})
Processing 2
Counter({'cancer': 32, 'condition': 22, 'prostate': 20, 'educational': 1})
I want relative word frequencies, so I would like to do something like this:
for question in questions:
    print("Processing " + str(question))
    counts = Counter(dataset_final[str(question)])
    length = len(dataset_final[str(question)])
    print(counts/length)
But I get an error.
How can I do this?
Edit: I mean relative word frequency, not normalization.
Answer 0: (score: 0)
Let:
import collections
count = collections.Counter({'would': 18, 'think': 12, 'patient': 11, 'condition': 11, 'might': 10, 'increased': 1})
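You can use a dict comprehension to turn the counts into relative frequencies:
total = sum(count.values())  # total number of words, computed once
normalized_count = {w: c/total for w, c in count.items()}
Each word's count is divided by the total number of words.
Output: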
{'would': 0.2857142857142857, 'think': 0.19047619047619047, 'patient': 0.1746031746031746, 'condition': 0.1746031746031746, 'might': 0.15873015873015872, 'increased': 0.015873015873015872}
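To tie this back to the loop in the question, a minimal sketch might look like the following. It assumes Python 3 (so that `/` is true division) and that each dataset_final[str(question)] is a list of word tokens. The helper name relative_frequencies and the sample data are hypothetical, made up only for illustration. Counter does not define a division operator, which is why counts/length raises a TypeError; dividing each count individually avoids that.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Return a dict mapping each token to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        # Empty input: nothing to divide, so return an empty mapping.
        return {}
    return {word: n / total for word, n in counts.items()}


# Hypothetical stand-in for the question's data (assumed shape:
# question id as a string -> list of word tokens).
dataset_final = {
    "1": ["would", "think", "would", "patient"],
    "2": ["cancer", "prostate", "cancer", "cancer"],
}
questions = [1, 2]

for question in questions:
    print("Processing " + str(question))
    print(relative_frequencies(dataset_final[str(question)]))
```

Computing the total once per question keeps the work linear in the number of distinct words, and the empty-input guard avoids a ZeroDivisionError for questions with no tokens.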