Question

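I have a script that does search and replace. It is based on the script here. I modified it to accept a file as input, but it does not seem to handle regular expressions properly.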

The script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
import re
import glob

_replacements = {
    '[B]': '**',
    '[/B]': '**',
    '[I]': '//',
    '[/I]': '//',

}

def _do_replace(match):
    return _replacements.get(match.group(0))

def replace_tags(text, _re=re.compile('|'.join((r) for r in _replacements))): 
    return _re.sub(_do_replace, text)

def getfilecont(FN):
    if not glob.glob(FN): return -1 # No such file
    text = open(FN, 'rt').read()
    text = replace_tags(text, re.compile('|'.join(re.escape(r) for r in _replacements)))
    return replace_tags(text)

scriptName = os.path.basename(sys.argv[0])
if sys.argv[1:]:
    srcfile = glob.glob(sys.argv[1])[0]
else:
    print """%s: Error you must specify file, to convert forum tages to wiki tags!
            Type %s FILENAME """ % (scriptName, scriptName)
    exit(1)
dstfile = os.path.join('.' , os.path.basename(srcfile)+'_wiki.txt')
converted = getfilecont(srcfile)
try:
    open(dstfile, 'wt+').write(converted)
    print 'Done.'
except:
    print 'Error saving file %s' % dstfile

print converted
#print replace_tags("This is an [[example]] sentence. It is [[{{awesome}}]].")

What I want is to replace

'[B]': '**',
'[/B]': '**',

with a single regex-style line such as

\[B\](.*?)\[\/B\] : **\1**

This would be very helpful for BBCode tags like this:

[FONT=Arial]Hello, how are you?[/FONT]

Then I could use something like this:

\[FONT=(.*?)\](.*?)\[\/FONT\] : ''\2''

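But I cannot seem to get that to work with this script. The original source of this script has another way to do regex search and replace, but it uses re.sub on one tag at a time. The other advantage of this script is that I can add as many lines as I like, so I can update it later.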

Answer 1

For starters, you are escaping the patterns on this line:

text = replace_tags(text, re.compile('|'.join(re.escape(r) for r in _replacements)))

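re.escape takes a string and escapes it so that, when the new string is used as a regex, it matches the input string exactly. Any regex syntax in your patterns is therefore treated as literal text.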
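To illustrate the effect, here is a minimal sketch. The variable names and the sample string are made up for this example; only re.escape and re.search come from the standard library.

```python
import re

pattern = r'\[B\](.*?)\[/B\]'   # intended to be used as a regex
escaped = re.escape(pattern)    # every special character is now literal
sample = '[B]bold[/B]'

# Used as a regex, the pattern matches the BBCode sample.
matches_as_regex = re.search(pattern, sample) is not None

# Once escaped, it no longer matches the sample...
matches_escaped = re.search(escaped, sample) is not None

# ...it only matches the literal text of the pattern itself.
matches_itself = re.search(escaped, pattern) is not None
```

Here matches_as_regex and matches_itself are True, while matches_escaped is False, which is why the regex-style entries never fire in the original script.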

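Removing re.escape will not completely solve your problem, however, because you find the replacement by simply looking up the matched text in the dict on this line: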

return _replacements.get(match.group(0))

To fix this, you can make each pattern its own capture group:

text = replace_tags(text, re.compile('|'.join('(%s)' % r for r in _replacements)))

You also need to know which pattern goes with which replacement. Something like this might work:

# Keys are now regex patterns, so literal brackets must be escaped.
_replacements_dict = {
    r'\[B\]': '**',
    r'\[/B\]': '**',
    r'\[I\]': '//',
    r'\[/I\]': '//',
}
_replacements, _subs = zip(*_replacements_dict.items())

def _do_replace(match):
    # Return the substitution for whichever alternative's group matched.
    for i, group in enumerate(match.groups()):
        if group is not None:
            return _subs[i]

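Note that this changes _replacements into a list of patterns and creates a parallel array, _subs, for the actual replacements. (I would have named them regexes and replacements, but did not want to re-edit every occurrence of "_replacements".)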
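One caveat: counting groups by position only works while the patterns contain no capture groups of their own, and patterns such as the FONT rule from the question do. The following is a sketch of one way around that, not the answerer's code. It assumes Python 3.4+ (for fullmatch), and the names RULES, _COMPILED and _COMBINED are hypothetical. Each pattern is wrapped in a named group, match.lastgroup identifies which rule fired, and the matched text is re-matched against that rule alone so that \1 and \2 in the replacement refer to the rule's own groups.

```python
import re

# Hypothetical rule table: (regex pattern, replacement template) pairs.
RULES = [
    (r'\[B\](.*?)\[/B\]', r'**\1**'),
    (r'\[I\](.*?)\[/I\]', r'//\1//'),
    (r'\[FONT=(.*?)\](.*?)\[/FONT\]', r"''\2''"),
]

_FLAGS = re.I | re.S
_COMPILED = [re.compile(pattern, _FLAGS) for pattern, _ in RULES]

# One alternation; each rule is wrapped in a named group rule0, rule1, ...
_COMBINED = re.compile(
    '|'.join('(?P<rule%d>%s)' % (i, pattern)
             for i, (pattern, _) in enumerate(RULES)),
    _FLAGS,
)

def _do_replace(match):
    # The outer named group closes last, so lastgroup names the rule that fired.
    index = int(match.lastgroup[len('rule'):])
    # Re-match with the single rule so group numbers are local to it.
    inner = _COMPILED[index].fullmatch(match.group(0))
    return inner.expand(RULES[index][1])

def replace_tags(text):
    return _COMBINED.sub(_do_replace, text)

result = replace_tags('[B]bold[/B] and [FONT=Arial]Hello[/FONT]')
```

With this input, result is "**bold** and ''Hello''". Because everything happens in a single pass, a tag nested inside another matched tag (for example [I] inside [B]) is left untouched; handling nesting would need repeated passes.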

Answer 2

Someone has done this here.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
import re
import glob

# Regex pattern -> replacement. Raw strings keep the backslashes intact.
_replacements_dict = {
    r'\[B\]': '**',
    r'\[\/B\]': '**',
    r'\[I\]': '//',
    r'\[\/I\]': '//',
    r'\[IMG\]': '{{',
    r'\[\/IMG\]': '}}',
    r'\[URL=(.*?)\]\s*(.*?)\s*\[\/URL\]': r'[[\1|\2]]',
    r'\[URL\]\s*(.*?)\s*\[\/URL\]': r'[[\1]]',
    r'\[FONT=(.*?)\]': '',
    r'\[color=(.*?)\]': '',
    r'\[SIZE=(.*?)\]': '',
    r'\[CENTER\]': '',
    r'\[\/CENTER\]': '',
    r'\[\/FONT\]': '',
    r'\[\/color\]': '',
    r'\[\/size\]': '',
}
_replacements, _subs = zip(*_replacements_dict.items())

def replace_tags(text):
    # Apply each pattern in turn, case-insensitively.
    for pattern, sub in zip(_replacements, _subs):
        text = re.compile(pattern, re.I).sub(sub, text)
    return text


def getfilecont(FN):
    if not glob.glob(FN): return -1 # No such file
    text = open(FN, 'rt').read()
    return replace_tags(text)

scriptName = os.path.basename(sys.argv[0])
if sys.argv[1:]:
    srcfile = glob.glob(sys.argv[1])[0]
else:
    print """%s: Error you must specify file, to convert forum tages to wiki tags!
            Type %s FILENAME """ % (scriptName, scriptName)
    exit(1)
dstfile = os.path.join('.' , os.path.basename(srcfile)+'_wiki.txt')
converted = getfilecont(srcfile)
try:
    open(dstfile, 'wt+').write(converted)
    print 'Done.'
except:
    print 'Error saving file %s' % dstfile

#print converted
#print replace_tags("This is an [[example]] sentence. It is [[{{awesome}}]].")

http://pastie.org/1447448
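As a variation on the loop above, here is a minimal sketch that keeps the rules in an ordered list, compiles them once, and uses the paired open/close form the question asked for. The names RULES and _COMPILED and the specific patterns are assumptions made for illustration; they are not taken from the linked script.

```python
import re

# Hypothetical ordered rule list: (regex pattern, replacement template).
RULES = [
    (r'\[B\](.*?)\[/B\]', r'**\1**'),
    (r'\[I\](.*?)\[/I\]', r'//\1//'),
    (r'\[URL=(.*?)\]\s*(.*?)\s*\[/URL\]', r'[[\1|\2]]'),
    # Drop FONT/COLOR/SIZE tags, opening or closing, with or without a value.
    (r'\[/?(?:FONT|COLOR|SIZE)(?:=[^\]]*)?\]', ''),
]

# Compile once at import time instead of on every call.
_COMPILED = [(re.compile(pattern, re.I | re.S), sub) for pattern, sub in RULES]

def replace_tags(text):
    # One pass per rule, so tags nested inside other tags are still converted.
    for tag_re, sub in _COMPILED:
        text = tag_re.sub(sub, text)
    return text

nested = replace_tags('[B][I]x[/I][/B]')
stripped = replace_tags('[FONT=Arial]Hi[/FONT]')
```

Here nested is '**//x//**' and stripped is 'Hi'. A list is used rather than a dict so that the order in which rules run is explicit, which matters if one rule's output could be matched by a later rule.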

Python: replace tags but keep inner text V2