Replace words in a string with words from a dictionary

Time: 2015-02-06 10:35:40

Tags: python string replace

I have a dictionary like this:

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}

And I have a string like this:

data = "It's winter not summer. Have a nice day"

What I want to do is replace, in data, the word winter by cold, a by a1, and so on. I did try the following code:

for word in word_dict:
    data = data.replace(word, word_dict[word])

But it fails because it replaces substrings of data rather than whole words. For example, the word Have is replaced by Ha1ve.

The result should be:

data = "It's cold not hot. Have a1 nice day"

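4 Answers:

Answer 0: (score: 4)

You can use re.sub. \b is a word boundary: it matches between a word character and a non-word character. We need word boundaries to match the exact word; otherwise the pattern would also match the a in day.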

>>> import re
>>> word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
>>> data = "It's winter not summer. Have a nice day"
>>> for word in word_dict:
...     data = re.sub(r'\b'+word+r'\b', word_dict[word], data)
...
>>> data
"It's cold not hot. Have a1 nice day"

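As a variation on the loop above, the same word-boundary idea can be applied in a single pass. This is only a sketch: the helper name replace_words is made up for illustration, and it additionally assumes that keys may contain regex metacharacters (hence re.escape) and that every key starts and ends with a word character, so that \b behaves as expected.

```python
import re

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

def replace_words(text, mapping):
    # Hypothetical helper, not part of the original answer.
    # Try longer keys first so a long key is preferred over a shorter prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # One alternation of all keys, wrapped in word boundaries.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # Each match is looked up once, so replacement text is never re-scanned.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

result = replace_words(data, word_dict)
print(result)  # It's cold not hot. Have a1 nice day
```

Because the text is scanned only once, a replacement value such as a1 cannot be matched again by a later key, which can happen when re.sub is called in a loop.
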
Answer 1: (score: 1)

Besides regular expressions, there are several other ways to achieve this:

ldata = data.split(' ')  # split on single space characters
res = []
for i in ldata:
    if i in word_dict:
        res.append(word_dict[i])
    else:
        res.append(i)
final = ' '.join(res)

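The regex solution is more practical and better suited to your needs, but the str.split() and str.join() methods sometimes come in handy. :) Note that this version compares the raw tokens, so a word followed by punctuation, such as summer., is not replaced.

Answer 2: (score: 1)

Use split with dict.get, splitting on " " to keep the original spacing: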

from string import punctuation

print(" ".join([word_dict.get(x.rstrip(punctuation), x) for x in data.split(" ")]))
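It's cold not hot Have a1 nice day

We also need to strip the punctuation so that summer. matches summer, and so on. Note that, as written, the stripped punctuation is not re-attached, so the period after summer is lost when that word is replaced.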
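
If the trailing punctuation should survive the replacement, one possible variation is to split each token into a core word and a punctuation suffix and put the suffix back afterwards. This is a sketch rather than part of the original answer; replace_token is a hypothetical helper name, and only trailing punctuation is handled, as in the code above.

```python
from string import punctuation

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

def replace_token(token, mapping):
    # Hypothetical helper: separate the word from its trailing punctuation.
    core = token.rstrip(punctuation)
    suffix = token[len(core):]
    # Look up the bare word, then re-attach whatever was stripped.
    return mapping.get(core, core) + suffix

result = " ".join(replace_token(t, word_dict) for t in data.split(" "))
print(result)  # It's cold not hot. Have a1 nice day
```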

Some timings show that, even with the splitting and stripping, the non-regex approach is still about twice as fast:

In [18]: %%timeit
data = "It's winter not summer. Have a nice day"
for word in word_dict:
    data = re.sub(r'\b'+word+r'\b', word_dict[word], data)
   ....: 
100000 loops, best of 3: 12.2 µs per loop

In [19]: timeit " ".join([word_dict.get(x.rstrip(punctuation), x) for x in data.split(" ")])
100000 loops, best of 3: 5.52 µs per loop
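
Outside IPython, a comparison like this can be approximated with the standard timeit module. The following is a minimal sketch; the function names with_regex and with_split and the repetition count are arbitrary choices for illustration, and the absolute numbers will depend on the machine and Python version.

```python
import re
import timeit
from string import punctuation

word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
data = "It's winter not summer. Have a nice day"

def with_regex(text):
    # The re.sub loop from Answer 0.
    for word in word_dict:
        text = re.sub(r'\b' + word + r'\b', word_dict[word], text)
    return text

def with_split(text):
    # The split/rstrip/join expression from this answer.
    return " ".join([word_dict.get(x.rstrip(punctuation), x) for x in text.split(" ")])

n = 10000  # arbitrary number of repetitions
regex_time = timeit.timeit(lambda: with_regex(data), number=n)
split_time = timeit.timeit(lambda: with_split(data), number=n)
print("regex: %.2f us per call" % (regex_time / n * 1e6))
print("split: %.2f us per call" % (split_time / n * 1e6))
```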

Answer 3: (score: 0)

You can use a generator expression inside join():

>>> word_dict = {'a': 'a1', 'winter': 'cold', 'summer': 'hot'}
>>> data = "It's winter not summer. Have a nice day"
>>> ' '.join(word_dict[j] if j in word_dict else j for j in data.split())
"It's cold not summer. Have a1 nice day"

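Split the data into words so that each one can be looked up, then use a simple generator expression to replace the words found in the dictionary. Because punctuation is not stripped from the tokens, summer. is left unchanged, as the output above shows.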