Replacing words that match a regular expression in Python

Time: 2015-01-08 17:13:00

Tags: python regex output nltk ontology

import re
replacement_patterns = [
(r'won\'t', 'will not'),
(r'can\'t', 'cannot'),
(r'i\'m', 'i am'),
(r'ain\'t', 'is not'),
(r'(\w+)\'ll', '\g<1> will'),
(r'(\w+)n\'t', '\g<1> not'),
(r'(\w+)\'ve', '\g<1> have'),
(r'(\w+)\'s', '\g<1> is'),
(r'(\w+)\'re', '\g<1> are'),
(r'(\w+)\'d', '\g<1> would')
 ]
class RegexpReplacer(object):

   def __init__(self, patterns=replacement_patterns):
      self.patterns = [(re.compile(regex), repl) for (regex, repl)          
                      in pattern]
   def replace(self, text):
      s = text
      for (pattern, repl) in self.patterns:
          (s, count) = re.subn(pattern, repl, s)
   return s


 rep=RegexpReplacer()
 print rep.replace("can't is a contradicton")

I copied this code from Python Text Processing with NLTK 2.0 Cookbook by Jacob Perkins.

But my expected output is: cannot is a contradicton

The actual output is: can not is a contradicton

I am unable to find the error in the code.

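2 Answers:

Answer 0: (score: 2)

Your code has some indentation problems and a typo; I'm not quite sure how the interpreter gave you any output at all. After fixing them, I get your expected output.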

import re
replacement_patterns = [
    (r'won\'t', 'will not'),
    (r'can\'t', 'cannot'),
    (r'i\'m', 'i am'),
    (r'ain\'t', 'is not'),
    (r'(\w+)\'ll', r'\g<1> will'),
    (r'(\w+)n\'t', r'\g<1> not'),
    (r'(\w+)\'ve', r'\g<1> have'),
    (r'(\w+)\'s', r'\g<1> is'),
    (r'(\w+)\'re', r'\g<1> are'),
    (r'(\w+)\'d', r'\g<1> would')
]
class RegexpReplacer(object):

   def __init__(self, patterns=replacement_patterns):

      # Fixed this line - "patterns", not "pattern"
      self.patterns = [(re.compile(regex), repl) for (regex, repl) in patterns]

   def replace(self, text):
      s = text
      for (pattern, repl) in self.patterns:
          (s, count) = re.subn(pattern, repl, s)

      # Fixed indentation here
      return s


rep = RegexpReplacer()
print(rep.replace("can't is a contradicton"))
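
One detail the fixed code relies on is the order of the patterns: the specific contractions are tried before the generic (\w+)n't rule. The following is a minimal sketch of that point, not the book's code. It assumes a shortened pattern list, is written so that it also runs on Python 3, and the names specific_first and generic_first are made up for illustration.

```python
import re

# Shortened list for illustration: specific contractions first, generic rule last.
replacement_patterns = [
    (r"won't", "will not"),
    (r"can't", "cannot"),
    (r"(\w+)n't", r"\g<1> not"),
]


class RegexpReplacer(object):
    def __init__(self, patterns=replacement_patterns):
        # Compile every regex once, keeping its replacement string alongside it.
        self.patterns = [(re.compile(regex), repl) for (regex, repl) in patterns]

    def replace(self, text):
        s = text
        # Apply the patterns in list order; earlier patterns win.
        for (pattern, repl) in self.patterns:
            s = pattern.sub(repl, s)
        return s


specific_first = RegexpReplacer()
# Hypothetical variant: the same patterns in reverse order, generic rule first.
generic_first = RegexpReplacer(patterns=list(reversed(replacement_patterns)))

result_specific = specific_first.replace("can't is a contradicton")
result_generic = generic_first.replace("can't is a contradicton")
print(result_specific)  # cannot is a contradicton
print(result_generic)   # ca not is a contradicton
```

With the generic rule first, (\w+) backtracks to "ca" so that n't can match, which is why the specific entries need to stay at the top of the list.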

Answer 1: (score: 0)

Either use a raw string or escape the quote, but not both.

>>> print(r'won\'t')
won\'t
>>> print('won\'t')
won't

Or, if you prefer raw strings:

>>> print(r"won't")
won't
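
As a rough check of what the three spellings above produce, here is a small sketch, assuming only the standard re module. The variable names are made up for illustration. It suggests that the extra backslash changes the string itself, while the regex engine still treats \' as a plain quote, so the substitution result is the same in this case.

```python
import re

raw_escaped = r'won\'t'    # raw string keeps the backslash: 6 characters
plain_escaped = 'won\'t'   # the literal consumes the backslash: 5 characters
raw_double = r"won't"      # double quotes need no escaping: 5 characters

print(len(raw_escaped), len(plain_escaped), len(raw_double))

# In a regex, \' is read as an escaped quote and matches a plain ',
# so all three patterns perform the same substitution here.
results = [re.sub(p, 'will not', "I won't go")
           for p in (raw_escaped, plain_escaped, raw_double)]
print(results)
```

So the raw-and-escaped form is redundant rather than wrong for these patterns; picking one style mainly keeps the pattern list easier to read.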