Replace a wildcard number in a pattern with other text + the same number

Time: 2015-04-17 19:32:49

Tags: python regex

I need to find every part of a large text string that matches this specific pattern:

"\t\t" + number (between 1-999) + "\t\t" 

Then replace each match with:

TEXT+"\t\t"+same number+"\t\t" 

So the final result would be:

'TEXT\t\t24\t\tblah blah blahTEXT\t\ttt\t\t\t'... and so on.

The number can be anything from 1 to 999, so the pattern needs some kind of wildcard.

Can someone tell me how to do this? Thanks!

1 answer:

Answer 0: (score: 0)

You need Python's re library, specifically the re.sub function:

import re  # re is Python's regex library
SAMPLE_TEXT = "\t\t45\t\tbsadfd\t\t839\t\tds532\t\t0\t\t"  # Test text to run the regex on

# Run the regex using re.sub (for substitute)
# re.sub takes three arguments here: the regex pattern,
# the replacement (in this case a function that returns the substituted text),
# and the text you're running the regex on.

# The regex looks for substrings of the form:
# Two tabs ("\t\t"), followed by one to three digits 0-9 ("[0-9]{1,3}"),
# followed by two more tabs.

# The lambda function takes in a match object x,
# and returns the full text of that object (x.group(0))
# with "TEXT" prepended.
output = re.sub("\t\t[0-9]{1,3}\t\t",
                lambda x: "TEXT" + x.group(0),
                SAMPLE_TEXT)

print(output)  # Print the resulting string (the parenthesized form also runs on Python 3).
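
One caveat about the pattern above: "[0-9]{1,3}" also accepts 0 and zero-padded values such as 007, and because each match consumes its trailing tabs, two numbers that share a tab pair (as in "\t\t1\t\t2\t\t") yield only one match. The following is a minimal sketch of a stricter variant, assuming the number must be 1-999 with no leading zeros. The names PATTERN and prepend_text are hypothetical and not part of the original answer.

```python
import re

# Hypothetical helper for illustration; not part of the original answer.
# Two tabs, then a number from 1 to 999 without leading zeros,
# then a lookahead that requires two more tabs without consuming them.
PATTERN = re.compile(r"\t\t[1-9][0-9]{0,2}(?=\t\t)")


def prepend_text(text, prefix="TEXT"):
    """Insert prefix in front of every tab-delimited number from 1 to 999."""
    # m.group(0) is the matched "\t\t<number>"; the trailing tabs stay in
    # place because the lookahead does not make them part of the match.
    return PATTERN.sub(lambda m: prefix + m.group(0), text)


if __name__ == "__main__":
    sample = "\t\t45\t\tbsadfd\t\t839\t\tds532\t\t0\t\t"
    # repr() makes the tab characters visible in the printed result.
    print(repr(prepend_text(sample)))
```

Since the trailing tabs are only checked and not consumed, they remain available as the leading tabs of the next match. If 0 or zero-padded numbers should count as matches, the original "[0-9]{1,3}" is the better fit.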