Question

I want to strip all unwanted [A-Z] characters (and others), except for certain words. For example, given the following string:

get 5 and 9

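I want to get rid of every word that is not 'and' or 'or', so the final result is 5 and 9. I also want to remove every character that is not in '[0-9].+-*()<>\s'.

My current regex strips all of those characters, but I don't want it to strip 'and'. In this example, the result would be ' 5 9'.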

import re

string = 'get 5 and 9'
pattern = re.compile(r'[^0-9\.\+\-\/\*\(\)<>\s]')
string = re.sub(pattern, '', string)

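I'm no expert in regular expressions and I'm having a hard time finding a solution for this. I'm a bit lost.

Is this possible, or should I look for a different approach?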

Answer 1

Revised version

import re

test = "get 6 AND 9 or 3 for 6"
keywords = ['and', 'or']
print(' '.join(t for t in test.split() if t.lower() in keywords or t.isdigit()))

$ python test.py
6 AND 9 or 3 6

This rejects longer words that merely contain 'and' or 'or' (for example, 'for').
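The revised version keeps only whole-number tokens and the keywords, whereas the question also wants to keep characters such as + - * / ( ) < >. A minimal sketch of one way to extend the token filter follows; the names KEYWORDS, ALLOWED, and filter_tokens are made up for illustration, and the allowed character set is an assumption read off the question's regex.

```python
import re

KEYWORDS = {'and', 'or'}  # words to keep, compared case-insensitively
# Assumed allowed characters, taken from the question: digits . + - / * ( ) < >
ALLOWED = re.compile(r'[0-9.+\-/*()<>]+')

def filter_tokens(text):
    # Keep a token if it is a keyword or consists only of allowed characters.
    kept = [t for t in text.split()
            if t.lower() in KEYWORDS or ALLOWED.fullmatch(t)]
    return ' '.join(kept)

print(filter_tokens("get (5 + 3.5) and 9 for x < 10"))
# (5 + 3.5) and 9 < 10
```

Note that this drops a mixed token such as 'x5' entirely instead of reducing it to '5', which may or may not be what is wanted.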

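Previous version. I thought this was a very simple solution, but unfortunately it doesn't work properly, because it also picks up 'and' and 'or' inside longer words.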

import re

test = "get 6 AND 9 or 3"
pattern = re.compile(r"(?i)(and|or|\d|\s)")
result = re.findall(pattern, test)
print(''.join(result).strip())

$ python test.py
6 AND 9 or 3

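Because of (?i), the match is case-insensitive. Whitespace is kept by \s but stripped from the beginning and end in the print statement. Digits are kept by \d. The parentheses around and|or|\d|\s form the capturing group whose matches findall returns as a list, and that list is then joined back together in the print call.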
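To make the weakness of the previous version concrete, here is a small sketch comparing that pattern with a variant that anchors the keywords with \b word boundaries. The \b variant is not part of the original answer; it is just one possible way to keep the regex approach, and the variable names are made up for illustration.

```python
import re

test = "get 6 AND 9 or 3 for 6"

# Without word boundaries, the "or" inside "for" is matched too.
loose = re.compile(r"(?i)(and|or|\d|\s)")
loose_result = ''.join(loose.findall(test)).strip()

# \b anchors the keywords so they only match as whole words.
strict = re.compile(r"(?i)(\band\b|\bor\b|\d|\s)")
# split()/join collapses the double space left where "for" was dropped.
strict_result = ' '.join(''.join(strict.findall(test)).split())

print(loose_result)   # 6 AND 9 or 3 or 6
print(strict_result)  # 6 AND 9 or 3 6
```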

Answer 2

An approach that doesn't use regular expressions

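text = 'get 5 and 9'

accept_list = ['and', 'or']

output = []
for x in text.split():
    try:
        # Keep the token if it parses as an integer
        output.append(str(int(x)))
    except ValueError:
        # Otherwise keep it only if it is an accepted word
        if x in accept_list:
            output.append(x)

print(' '.join(output))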

Output

5 and 9
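A few edge cases of the int()-based approach are worth seeing. The sketch below wraps the same loop in a function (the name keep_ints_and_keywords is made up for illustration) and runs it on a string with a leading zero, a negative number, a decimal, and an upper-case keyword.

```python
def keep_ints_and_keywords(text, accept_list=('and', 'or')):
    output = []
    for x in text.split():
        try:
            # int() normalizes the token: "05" becomes "5", "-3" stays "-3"
            output.append(str(int(x)))
        except ValueError:
            # Non-integers (including "3.5") survive only if accepted;
            # the comparison is case-sensitive, so "AND" is dropped
            if x in accept_list:
                output.append(x)
    return ' '.join(output)

print(keep_ints_and_keywords('get 05 and -3 or 3.5 AND 7'))
# 5 and -3 or 7
```

So this approach keeps integers only, rewrites them in canonical form, and matches the keywords case-sensitively unless the comparison is changed to use x.lower().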

How do I strip unwanted characters and strings?
