Question

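I want to censor words using Python regular expressions.

My words are defined as alphanumeric [a-zA-Z0-9] and are separated by non-alphanumeric characters [^a-zA-Z0-9].

The inner characters of a word that should be censored are replaced with *, while all other words are left unchanged.

For example: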

test=y
tes't
test'
test-y
tes-ty
    test  Test    
test
abcdefg  Test ... test are the best... some thing words @@$: HAHA TEST ONE REAL PLAYER!!! EXCELLENT! It's testy night

The result should be

t**t=y
tes't
t**t'
t**t-y
tes-ty
    t**t  T**t    
t**t
abcdefg  T**t ... t**t are the best... some thing words @@$: HAHA T**T ONE REAL PLAYER!!! EXCELLENT! It's testy night

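I tried to do this with regular expressions, using the re module in Python 3.

1. Match the pattern.

2. Capture the parts of the match around the censored word as groups.

3. Join the groups back together.

For example: I am trying to censor the word "test".

Since I don't yet know how to replace it with *, I first substituted 'SUB' to check whether my pattern is right.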

inputStr = re.sub(r'([^a-zA-z0-9]+)test([^a-zA-z0-9]+)', r'\1SUB\2', inputStr, flags=re.IGNORECASE)
inputStr = re.sub(r'^test([^a-zA-z0-9]+)', r'SUB\1', inputStr, flags=re.IGNORECASE)
replacedStr = re.sub(r'([^a-zA-z0-9]+)test$', r'\1SUB', inputStr, flags=re.IGNORECASE)
print(replacedStr)

Also, is it possible to do the above in a single line? I don't know how to use the groups in a single-line pattern.

replacedStr = re.sub('[^a-zA-z0-9]+test[^a-zA-z0-9]+|^test[^a-zA-z0-9]+|[^a-zA-z0-9]+test$', 'SUB', inputStr, flags=re.IGNORECASE)

But it didn't work well.

re.sub('[^a-zA-z0-9]+test[^a-zA-z0-9]+|^test[^a-zA-z0-9]+|[^a-zA-z0-9]+test$', 'SUB', inputStr, flags=re.IGNORECASE)

My result

SUB=y
tes't
SUB'
test-y
tes-ty
    SUB  Test    
SUB
abcdefg  SUB ... test are the best... some thing words @@$: HAHA SUB ONE REAL PLAYER!!! EXCELLENT! It's testy night

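It seems my pattern matches only some of the occurrences of "test", and I don't know why. https://regexr.com/3nk9l

So, my questions are

1. What is wrong with my pattern?

2. How do I replace the inner characters of a matched word with *?

THX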

Answer 1

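Rather than explicitly matching ^test, ^test$ and test$ (which is where I think your regex is breaking down), you would probably do better with lookahead and lookbehind assertions to delimit the word, and then replace the inner letters.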

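import re

# Lookarounds assert the word boundaries without consuming the separators.
# Note the range is A-Z; A-z would also match [ \ ] ^ _ and the backtick.
pattern = r'(?<!{0})(t)es(t)(?!{0})'.format(r'[a-zA-Z0-9]')

for line in """test=y
tes't
test'
test-y
tes-ty
    test  Test
test
abcdefg  Test ... test are the best... some thing words @@$: HAHA TEST ONE REAL PLAYER!!! EXCELLENT! It's testy night
""".splitlines():
    print(line)  # original line
    print(re.sub(pattern, r'\1**\2', line, flags=re.IGNORECASE))  # censored line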

Result:

test=y
t**t=y
tes't
tes't
test'
t**t'
test-y
t**t-y
tes-ty
tes-ty
    test  Test
    t**t  T**t
test
t**t
abcdefg  Test ... test are the best... some thing words @@$: HAHA TEST ONE REAL PLAYER!!! EXCELLENT! It's testy night
abcdefg  T**t ... t**t are the best... some thing words @@$: HAHA T**T ONE REAL PLAYER!!! EXCELLENT! It's testy night
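
As for why the pattern in the question skips some words, one likely reason is that [^a-zA-z0-9]+ consumes the separator characters, so two target words that share a separator cannot both match, whereas lookarounds only assert and consume nothing. Separately, the range A-z also covers [ \ ] ^ _ and the backtick, so it should be A-Z. A minimal sketch of both effects, using a simplified one-line variant of the question's pattern and made-up sample strings:

```python
import re

text = "test test"  # made-up sample: two words sharing one separator

# Consuming variant: the separators are part of the match, so the space
# after the first "test" is no longer available to the second one.
consuming = r'(^|[^a-zA-Z0-9]+)test([^a-zA-Z0-9]+|$)'
consumed = re.sub(consuming, r'\1SUB\2', text, flags=re.IGNORECASE)
print(consumed)  # SUB test

# Lookaround variant: the boundaries are checked but not consumed.
asserting = r'(?<![a-zA-Z0-9])test(?![a-zA-Z0-9])'
asserted = re.sub(asserting, 'SUB', text, flags=re.IGNORECASE)
print(asserted)  # SUB SUB

# The A-z range treats "_" as if it were alphanumeric, so "test_1" is skipped.
wrong_range = re.sub(r'(?<![a-zA-z0-9])test(?![a-zA-z0-9])', 'SUB', 'test_1')
right_range = re.sub(r'(?<![a-zA-Z0-9])test(?![a-zA-Z0-9])', 'SUB', 'test_1')
print(wrong_range)  # test_1
print(right_range)  # SUB_1
```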

Answer 2

You can try this:

(?<![a-zA-Z0-9])(t)es(t)(?![a-zA-Z0-9])

and replace with:

\1**\2

Python demo:

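import re

regex = r"(?<![a-zA-Z0-9])(t)es(t)(?![a-zA-Z0-9])"
subst = r"\1**\2"
inputStr = "test=y Test tes't"  # sample input (assumed); use your own text here
# Pass flags by keyword so it cannot be mistaken for the count argument.
result = re.sub(regex, subst, inputStr, flags=re.IGNORECASE)
print(result)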
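
If you need this for words other than "test", the same lookaround idea might be generalized by passing a function as the replacement, so the number of stars follows the word length. This is only a sketch: the names censor and mask are hypothetical, and leaving words of one or two characters unchanged is an assumption, not something the question specifies.

```python
import re

def censor(word, text):
    """Star out the inner characters of each whole-word occurrence of word."""
    # re.escape guards against regex metacharacters in the word itself.
    pattern = r'(?<![a-zA-Z0-9])' + re.escape(word) + r'(?![a-zA-Z0-9])'

    def mask(match):
        found = match.group(0)
        # Assumption: words with no inner characters are left as they are.
        if len(found) <= 2:
            return found
        # Keep the first and last character as matched, so case is preserved.
        return found[0] + '*' * (len(found) - 2) + found[-1]

    return re.sub(pattern, mask, text, flags=re.IGNORECASE)

censored = censor("test", "HAHA TEST ONE, It's testy night; test=y")
print(censored)  # HAHA T**T ONE, It's testy night; t**t=y
```

Because the replacement is built from the matched text rather than from capture groups, the original casing (TEST, Test, test) is kept without extra groups in the pattern.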
