Calculating the average number of words per sentence

Time: 2017-02-09 18:17:02

Tags: python math split nlp counting

I'm having some trouble calculating the number of words per sentence. In my case, I assume that sentences end only with "!", "?", or ".".

I have a list that looks like this:

["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]

For the example above, the calculation would be (1 + 3 + 5) / 3 = 3. However, I'm having a hard time implementing this! Any ideas?
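To make the target concrete, here is a small sketch of the arithmetic for the example list, with the per-sentence counts written out by hand (the names `counts` and `average` are only illustrative). Note that the sum needs parentheses, since division binds tighter than addition in Python.

```python
# Word counts of the three sentences: "Hey", "How are you", "I would like a sandwich"
counts = [1, 3, 5]

# Intended calculation: (1 + 3 + 5) / 3
average = sum(counts) / len(counts)
print(average)  # 3.0

# Without parentheses the expression means 1 + 3 + (5 / 3), which is a different value
unparenthesized = 1 + 3 + 5 / 3
print(unparenthesized == average)  # False
```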

3 Answers:

Answer 0: (score: 3)

words = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]

sentences = [[]]
ends = set(".?!")
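for word in words:
    if word in ends:
        sentences.append([])        # a terminator starts a new sentence
    else:
        sentences[-1].append(word)  # otherwise the word belongs to the current sentence

# drop empty sentences (e.g. the one opened by the final terminator)
sentences = [s for s in sentences if s]
if sentences:
    print("average sentence length:", sum(len(s) for s in sentences) / len(sentences))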

Answer 1: (score: 3)

A simple solution:

mylist = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
terminals = set([".", "?", "!"]) # sets are efficient for "membership" tests
terminal_count = 0

for item in mylist:
    if item in terminals: # here is our membership test
        terminal_count += 1

avg = (len(mylist) - terminal_count)  / float(terminal_count)

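This assumes you only care about the average, not the word count of each individual sentence.

If you want to get a bit fancy, you can replace the for loop with: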

terminal_count = sum(1 for item in mylist if item in terminals)
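If the per-sentence counts are needed as well, one possible variation is to keep a running counter and record it at each terminator. This is only a sketch: the names `counts` and `current` are illustrative, and it assumes every sentence ends with one of the three terminators, so trailing words without a terminator are not counted.

```python
mylist = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
terminals = {".", "?", "!"}

counts = []   # number of words in each completed sentence
current = 0   # words seen so far in the sentence being read

for item in mylist:
    if item in terminals:
        counts.append(current)  # sentence finished: record its length
        current = 0
    else:
        current += 1

print(counts)                     # [1, 3, 5]
print(sum(counts) / len(counts))  # 3.0
```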

Answer 2: (score: 1)

A short solution using the re.split() and sum() functions:

import re
s = "Hey ! How are you ? I would like a sandwich ."
parts = [len(l.split()) for l in re.split(r'[?!.]', s) if l.strip()]

print(sum(parts)/len(parts))

Output:

3.0

If you only have a list of words as input:

import re
s = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
parts = [len(l.split()) for l in re.split(r'[?!.]', ' '.join(s)) if l.strip()]

print(sum(parts)/len(parts))   # 3.0
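One thing to keep in mind with the join-and-split approach: the pattern [?!.] matches those characters anywhere in the joined string, including inside a token. The sketch below uses a made-up token list containing "3.14" and compares it with a token-level count built on itertools.groupby. The groupby version is an alternative added for illustration, not part of the answer above.

```python
import re
from itertools import groupby

tokens = ["Pi", "is", "3.14", "."]  # hypothetical input with a period inside a token
terminals = {".", "?", "!"}

# Join-and-split: the "." inside "3.14" also acts as a separator
joined_parts = [len(p.split()) for p in re.split(r"[?!.]", " ".join(tokens)) if p.strip()]

# Token-level: only tokens that are themselves terminators end a sentence
token_counts = [
    len(list(group))
    for is_terminal, group in groupby(tokens, key=lambda t: t in terminals)
    if not is_terminal
]

print(joined_parts)  # [3, 1]
print(token_counts)  # [3]
```

For input like the list in the question, where punctuation only ever appears as separate tokens, both approaches give the same counts.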