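Joining split words and punctuation, with punctuation in the right place

Date: 2013-04-11 13:55:39

Tags: python join split punctuation

I tried using join() after splitting a string into words and punctuation, but it joins everything with a space, so a space ends up between each word and the punctuation that follows it.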

b = ['Hello', ',', 'who', 'are', 'you', '?']
c = " ".join(b)

But this returns:
c = 'Hello , who are you ?'

I want:
c = 'Hello, who are you?'

4 answers:

Answer 0 (score: 1)

You could join on the punctuation first:

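def join_punctuation(seq, characters='.,;?!'):
    """Yield the items of seq, attaching each punctuation token to the item before it."""
    characters = set(characters)
    seq = iter(seq)
    current = next(seq, None)
    if current is None:
        # Empty input: nothing to yield.
        return

    for nxt in seq:
        if nxt in characters:
            # Punctuation: glue it onto the pending word.
            current += nxt
        else:
            yield current
            current = nxt

    yield current

c = ' '.join(join_punctuation(b))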

The join_punctuation generator yields strings with any following punctuation already attached:

>>> b = ['Hello', ',', 'who', 'are', 'you', '?']
>>> list(join_punctuation(b))
['Hello,', 'who', 'are', 'you?']
>>> ' '.join(join_punctuation(b))
'Hello, who are you?'
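
As a further illustration of how the generator behaves, here is a minimal sketch showing that runs of consecutive punctuation tokens all attach to the preceding word, and that the characters argument can be widened. The sample token lists and the extended character set are assumptions made up for this example, not part of the question.

```python
def join_punctuation(seq, characters='.,;?!'):
    """Attach each punctuation token to the token before it."""
    characters = set(characters)
    seq = iter(seq)
    current = next(seq, None)
    if current is None:
        return  # empty input yields nothing
    for nxt in seq:
        if nxt in characters:
            current += nxt
        else:
            yield current
            current = nxt
    yield current

# Consecutive punctuation tokens are appended one after another.
tokens = ['Wait', '.', '.', '.', 'what', '?', '!']
ellipsis_result = ' '.join(join_punctuation(tokens))

# Hypothetical wider character set that also treats ':' and ')' as trailing marks.
custom_tokens = ['Note', ':', 'done', ')']
custom_result = ' '.join(join_punctuation(custom_tokens, characters='.,;?!:)'))

print(ellipsis_result)  # Wait... what?!
print(custom_result)    # Note: done)
```

Only tokens that exactly equal one of the listed characters are attached, so a multi-character token such as '...' would be treated as a word unless the membership test is changed.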

Answer 1 (score: 1)

Do this after you get the joined result. It's not a complete solution, but it works...

import re
c = re.sub(r' ([^A-Za-z0-9])', r'\1', c)

Output:

>>> c = 'Hello , who are you ?'
>>> c = re.sub(r' ([^A-Za-z0-9])', r'\1', c)
>>> c
'Hello, who are you?'
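
One reason this is "not complete": the pattern removes the space before any non-alphanumeric character, so an opening bracket gets glued to the previous word as well. A narrower sketch, assuming only the marks . , ; : ? ! should hug the preceding word (that set and the name tighten are assumptions for illustration):

```python
import re

def tighten(text):
    """Remove whitespace that comes directly before a closing punctuation mark."""
    # The character class lists the assumed closing marks explicitly.
    return re.sub(r'\s+([.,;:?!])', r'\1', text)

sample = 'see ( this ) , ok ?'

broad = re.sub(r' ([^A-Za-z0-9])', r'\1', sample)   # pattern from the answer
narrow = tighten(sample)

print(broad)    # see( this), ok?
print(narrow)   # see ( this ), ok?
print(tighten('Hello , who are you ?'))  # Hello, who are you?
```

Neither version knows how to remove the spaces inside the brackets; that would need separate rules for opening and closing marks.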

Answer 2 (score: 1)

Maybe something like:

>>> from string import punctuation
>>> punc = set(punctuation) # or whatever special chars you want
>>> b = ['Hello', ',', 'who', 'are', 'you', '?']
>>> ''.join(w if set(w) <= punc else ' '+w for w in b).lstrip()
'Hello, who are you?'

This adds a space before each word in b that is not made up entirely of punctuation.
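
The key step is set(w) <= punc, a subset test that is true when every character of w is a punctuation character. Unrolled into a loop, the same idea might look like the sketch below; the function name join_tokens is hypothetical and only used here.

```python
from string import punctuation

PUNC = set(punctuation)

def join_tokens(tokens):
    """Join tokens, putting a space only before tokens that contain a non-punctuation character."""
    parts = []
    for w in tokens:
        if set(w) <= PUNC:
            # Every character of w is punctuation: attach it directly.
            parts.append(w)
        else:
            # A regular word: separate it from what came before.
            parts.append(' ' + w)
    # The first word picked up a leading space; strip it.
    return ''.join(parts).lstrip()

print(join_tokens(['Hello', ',', 'who', 'are', 'you', '?']))  # Hello, who are you?
print(join_tokens(['Wait', '...', 'what', '?!']))             # Wait... what?!
```

Because the test looks at whole tokens, multi-character tokens such as '...' or '?!' are handled too, while a word like "don't" still gets its space since it contains letters.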

Answer 3 (score: 0)

How about:

c = " ".join(b).replace(" ,", ",")
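
Note that this only fixes commas: for the list in the question it leaves the space before the question mark. A minimal sketch that extends the same replace idea to several marks, assuming the set '.,;?!' is the one that matters:

```python
b = ['Hello', ',', 'who', 'are', 'you', '?']

# The single replace handles commas only.
comma_only = " ".join(b).replace(" ,", ",")

# Repeat the replace for each mark in an assumed set of trailing punctuation.
c = " ".join(b)
for mark in '.,;?!':
    c = c.replace(' ' + mark, mark)

print(comma_only)  # Hello, who are you ?
print(c)           # Hello, who are you?
```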