Partitioning on multiple special characters in Python

Time: 2014-10-21 14:00:59

Tags: python python-3.x

I am trying to write a program that reads a paragraph and counts the special characters and words in it.

My input:

list_words = "'He came,"
words = list_words.partition("'")

list_1 = []
for i in words:
    list_1.extend(i.split())

print(list_1)

My output is:

["'", 'He', 'came,']

But I want:

["'", 'He', 'came', ',']

Can anyone help me solve this?

1 answer:

Answer 0 (score: 0)

  

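I am trying to write a program that reads a paragraph and counts the special characters and words in it.

Let's focus on your goal rather than your approach. Your approach can probably be made to work, but it would take a bunch of splits, so let's set it aside for now. Using re.findall with a regular expression, written out in verbose form further down, should work much better.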

import re
lst = re.findall(r"\w+|[^\w\s]", some_sentence)

is a sensible starting point. Broken down, the pattern reads:

pat = re.compile(r"""
    \w+        # one or more word characters
    |          #   OR
    [^\w\s]    # exactly one character that's neither a word character nor whitespace
    """, re.X)

results = pat.findall('"Why, hello there, Martha!"')
# ['"', 'Why', ',', 'hello', 'there', ',', 'Martha', '!', '"']
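
Applied to the string from the question, the same pattern produces the list the asker wanted. This is a minimal sketch; the variable names `sentence` and `tokens` are only illustrative.

```python
import re

# Same pattern as above: a run of word characters,
# or exactly one character that is neither a word character nor whitespace.
pat = re.compile(r"\w+|[^\w\s]")

sentence = "'He came,"  # the input string from the question
tokens = pat.findall(sentence)
print(tokens)  # ["'", 'He', 'came', ',']
```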

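However, you would then have to iterate over the list a second time to count the special characters! So let's keep words and punctuation separate from the start. Fortunately, that's easy: just add capturing parentheses. With two groups in the pattern, findall returns one (word, punctuation) tuple per match, with an empty string for the group that did not match.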

new_pat = re.compile(r"""
    (          # begin capture group
        \w+        # one or more word characters
    )          # end capturing group
    |          #   OR
    (          # begin capture group
        [^\w\s]    # exactly one character that's neither a word character nor whitespace
    )          # end capturing group
    """, re.X)

results = new_pat.findall('"Why, hello there, Martha!"')
# [('', '"'), ('Why', ''), ('', ','), ('hello', ''), ('there', ''), ('', ','), ('Martha', ''), ('', '!'), ('', '"')]

grouped_results = {"words":[], "punctuations":[]}

for word,punctuation in results:
    if word:
        grouped_results['words'].append(word)
    if punctuation:
        grouped_results['punctuations'].append(punctuation)
# grouped_results = {'punctuations': ['"', ',', ',', '!', '"'],
#                    'words': ['Why', 'hello', 'there', 'Martha']}

Then just count the items stored under each dictionary key.

>>> for key in grouped_results:
...     print("There are {} items in {}".format(
...         len(grouped_results[key]),
...         key))
...
There are 5 items in punctuations
There are 4 items in words
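
If only the counts are needed, the intermediate lists can be skipped. The sketch below is one possible variant, not part of the original answer: it swaps the positional groups for named groups and tallies them with `collections.Counter`. The function name `count_tokens` and the group names are hypothetical.

```python
import re
from collections import Counter

# Named groups record which alternative matched (the names are illustrative).
TOKEN = re.compile(r"(?P<words>\w+)|(?P<punctuations>[^\w\s])")

def count_tokens(text):
    """Count word tokens and punctuation characters in a single pass."""
    # For each match, m.lastgroup is the name of the group that matched.
    return Counter(m.lastgroup for m in TOKEN.finditer(text))

counts = count_tokens('"Why, hello there, Martha!"')
print(counts["words"], counts["punctuations"])  # 4 5
```

Because exactly one of the two alternatives matches at each position, `lastgroup` always names the category of the current token.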