I'm trying to write a program that reads a paragraph and counts its special characters and words.
My input:
list_words = "'He came,"
words = list_words.partition("'")
list_1 = []
for i in words:
    list_1.extend(i.split())
print(list_1)
My output is:
["'", 'He', 'came,']
But I want:
["'", 'He', 'came', ',']
Can anyone help me solve this?
Answer 0: (score: 0)
"I'm trying to write a program that reads a paragraph and counts its special characters and words."
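Let's focus on the goal rather than on your approach. Your approach can probably be made to work, but it would take a whole series of splits, so let's set it aside for now. Using re.findall with a verbose regular expression works much better. Something like
lst = re.findall(r"\w+|[^\w\s]", some_sentence)
would make sense. Broken down, the pattern does this: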
import re

pat = re.compile(r"""
\w+ # one or more word characters
| # OR
[^\w\s] # exactly one character that's neither a word character nor whitespace
""", re.X)
results = pat.findall('"Why, hello there, Martha!"')
# ['"', 'Why', ',', 'hello', 'there', ',', 'Martha', '!', '"']
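Applied to the input from the question, the same pattern gives the list the asker wanted. This is a minimal sketch; the names token_pat and tokenize are made up for illustration and are not part of the original code.

```python
import re

# Same pattern as above: a run of word characters, or a single
# character that is neither a word character nor whitespace.
# (token_pat and tokenize are hypothetical names.)
token_pat = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into words and individual punctuation marks."""
    return token_pat.findall(text)


tokens = tokenize("'He came,")
print(tokens)  # ["'", 'He', 'came', ',']
```

Keep in mind that \w also matches digits and underscores, and that an apostrophe inside a word is split out as its own token, so "don't" becomes 'don', "'", 't'.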
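However, you would then have to iterate over the list a second time to count the special characters. So let's keep the two kinds of token separate instead. Fortunately that is easy: just add capturing parentheses.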
new_pat = re.compile(r"""
( # begin capture group
\w+ # one or more word characters
) # end capturing group
| # OR
( # begin capture group
[^\w\s] # exactly one character that's neither a word character nor whitespace
) # end capturing group
""", re.X)
results = new_pat.findall('"Why, hello there, Martha!"')
# [('', '"'), ('Why', ''), ('', ','), ('hello', ''), ('there', ''), ('', ','), ('Martha', ''), ('', '!'), ('', '"')]
grouped_results = {"words": [], "punctuations": []}
for word, punctuation in results:
    if word:
        grouped_results['words'].append(word)
    if punctuation:
        grouped_results['punctuations'].append(punctuation)
# grouped_results = {'punctuations': ['"', ',', ',', '!', '"'],
#                    'words': ['Why', 'hello', 'there', 'Martha']}
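The same grouping can probably be done without unpacking tuples of empty strings. The sketch below is a variation, not the approach above: it uses named groups with re.finditer and reads match.lastgroup to find out which alternative matched. The names named_pat and group_tokens, and the dictionary keys, are chosen here for illustration.

```python
import re

# Named groups instead of positional ones; because the two alternatives
# are mutually exclusive, exactly one group matches per token.
# (named_pat and group_tokens are hypothetical names.)
named_pat = re.compile(r"(?P<word>\w+)|(?P<punctuation>[^\w\s])")


def group_tokens(text):
    """Return a dict mapping each group name to the tokens it matched."""
    grouped = {"word": [], "punctuation": []}
    for match in named_pat.finditer(text):
        # lastgroup is the name of the group that matched this token
        grouped[match.lastgroup].append(match.group())
    return grouped


grouped = group_tokens('"Why, hello there, Martha!"')
print(len(grouped["word"]), len(grouped["punctuation"]))  # 4 5
```

The trade-off is that the dictionary keys are tied to the group names in the pattern, which keeps the loop body to a single line but makes renaming a group a change in two places.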
Then just count the items stored under each key of your dictionary.
>>> for key in sorted(grouped_results):
...     print("There are {} items in {}".format(
...         len(grouped_results[key]),
...         key))
...
There are 5 items in punctuations
There are 4 items in words
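If you also want to know how often each individual word or punctuation mark occurs, collections.Counter can do the tallying during the same pass over the matches. This is only a sketch under that assumption; count_tokens is a made-up helper name, and the counts are case-sensitive as written.

```python
import re
from collections import Counter


def count_tokens(text):
    """Return (word_counts, punctuation_counts) for the given text.

    count_tokens is a hypothetical helper, not part of the code above.
    """
    word_counts = Counter()
    punctuation_counts = Counter()
    # Each match is a (word, punctuation) tuple with exactly one
    # non-empty element.
    for word, punctuation in re.findall(r"(\w+)|([^\w\s])", text):
        if word:
            word_counts[word] += 1
        else:
            punctuation_counts[punctuation] += 1
    return word_counts, punctuation_counts


words, punctuations = count_tokens('"Why, hello there, Martha!"')
print(sum(words.values()), sum(punctuations.values()))  # 4 5
print(punctuations[","])  # 2
```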