Counting the frequency of words and strings

Date: 2019-02-11 13:15:30

Tags: python

I need to count how often each word occurs in a sentence. I use

word_matrix[i][j] = sentences[i].count([*words_dict][j])

But when one word is contained in another (for example, "in" inside "interactive"), it is counted as well. How can I avoid that?
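To see the problem concretely: str.count counts substring occurrences, so "in" is also found inside "interactive", whereas list.count on the tokens produced by split compares whole tokens. A small illustration with a made-up sentence:

```python
s = "in interactive mode"

substring_hits = s.count("in")           # str.count also finds "in" inside "interactive"
whole_word_hits = s.split().count("in")  # list.count compares whole tokens only

print(substring_hits, whole_word_hits)   # 2 1
```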

4 answers:

Answer 0 (score: 1)

You can use collections.Counter for this:

from collections import Counter
s = 'This is a sentence'

Counter(s.lower().split())

# Counter({'this': 1, 'is': 1, 'a': 1, 'sentence': 1})
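Applied to the question's setup, one Counter per sentence can fill word_matrix. This is a sketch: the sentences and words_dict below are made-up stand-ins for the asker's data, and lowercasing the sentence assumes the dictionary keys are lowercase:

```python
from collections import Counter

# Hypothetical inputs standing in for the question's `sentences` and `words_dict`
sentences = ["This is an interactive session", "Log in and stay in"]
words_dict = {"in": 0, "interactive": 1, "stay": 2}

words = list(words_dict)  # same key order as [*words_dict]
word_matrix = []
for sentence in sentences:
    counts = Counter(sentence.lower().split())  # counts whole tokens only
    # A Counter returns 0 for missing keys, so absent words raise no KeyError
    word_matrix.append([counts[w] for w in words])

print(word_matrix)  # [[0, 1, 0], [2, 0, 1]]
```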

Answer 1 (score: 1)

You can do it like this:

sentence = 'this is a test sentence'
word_count = len(sentence.split(' '))

In this case, word_count is 5.
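One caveat: split(' ') splits on every single space, so repeated spaces produce empty strings that inflate the count, while split() with no argument treats any run of whitespace as one separator. Note also that this gives the total number of words, not the count of a particular word. With a made-up sentence containing a double space:

```python
sentence = 'this is  a test sentence'  # note the double space between "is" and "a"

by_single_space = sentence.split(' ')  # ['this', 'is', '', 'a', 'test', 'sentence']
by_whitespace = sentence.split()       # runs of whitespace count as one separator

print(len(by_single_space))  # 6, because the double space yields an empty string
print(len(by_whitespace))    # 5
```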

Answer 2 (score: 0)

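Depending on the situation, the most efficient solution is collections.Counter, but you will miss every word that has a symbol attached to it: "in" is different from "interactive" (as you want), but it is also different from "in:".
An alternative that takes this into account is to count the matches of a regex pattern: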

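import re

# re.escape stops characters such as '.' or '+' in the word from acting as regex syntax.
# The lookahead (?=...) checks the next character without consuming it, and $ sits
# outside the brackets so it means "end of string" rather than a literal '$'.
my_count = re.findall(r"(?:\s|^)({0})(?=[\s.,;:]|$)".format(re.escape([*words_dict][j])), sentences[i])
print(len(my_count))

What is the regex doing?
For a given word, you match:
the same word, preceded by whitespace or the start of the string (\s|^)
then followed by whitespace, a dot, comma, semicolon or colon ([\s.,;:]), or by the end of the string ($); because this is a lookahead, back-to-back occurrences such as "in in" are both counted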
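A more compact variant, not taken from the answer, uses \b word boundaries: \b matches between a word character and a non-word character (or the edge of the string), so "in" followed by ":" or ")" counts, while "in" inside "interactive" or "window" does not. The helper name count_word is hypothetical, and the sketch assumes each searched word starts and ends with a letter, digit or underscore (otherwise \b will not line up as expected):

```python
import re

def count_word(sentence: str, word: str) -> int:
    """Count whole-word occurrences of `word` using regex word boundaries."""
    # re.escape guards against words that contain regex metacharacters
    pattern = r"\b{0}\b".format(re.escape(word))
    return len(re.findall(pattern, sentence))

s = "Log in: the interactive shell opens (in a new window)."
print(count_word(s, "in"))           # 2
print(count_word(s, "interactive"))  # 1
```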

Answer 3 (score: 0)

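Use split to tokenize the sentence into words, then for each word: if it already exists in the dict, increment its value by 1; otherwise add it with a count of 1.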

paragraph='Nory was a Catholic because her mother was a Catholic, and Nory’s mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been' 
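words = paragraph.split()  # splits on whitespace; punctuation stays attached, e.g. 'Catholic,'
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1  # seen before: increment its count
    else:
        word_count[word] = 1   # first occurrence

print(word_count)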
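The same tally can be written more compactly with dict.get. Since split() leaves punctuation attached, the loop above counts "Catholic," separately from "Catholic"; the sketch below strips leading and trailing punctuation first, which is an assumption about what should count as the same word. It uses a shortened, made-up version of the paragraph:

```python
import string

paragraph = "Nory was a Catholic because her mother was a Catholic, and her father was a Catholic."

word_count = {}
for token in paragraph.split():
    # Drop leading/trailing punctuation such as ',' or '.'
    # (string.punctuation covers ASCII punctuation only, not curly quotes)
    word = token.strip(string.punctuation)
    if word:  # skip tokens that were pure punctuation
        word_count[word] = word_count.get(word, 0) + 1

print(word_count["Catholic"])  # 3
```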