I want to print the 10 words before and after a matched word in a string.
For example, I have:
string = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"
In the string above, I want to search for the word "experience" and get output like this:
"Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language"
I tried, but my attempt only returns one word before the match.
Answer 0 (score: 2)
Splitting the string into words on one or more whitespace characters is probably the best approach:
import re
s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"
words = re.split(r'\s+', s)
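try:
    index = words.index('experience')
except ValueError:
    # list.index raises ValueError when the word is not present
    pass
else:
    # keep up to 5 words on each side, clamped to the list bounds
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))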
Prints:
-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
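The question asks for 10 words on each side, while the snippet above hard-codes 5. A minimal sketch of the same split-and-slice idea with the window size as a parameter follows. The function name `words_around` and its parameters are made up for illustration and are not part of the original answer.

```python
def words_around(text, target, n=10):
    """Return up to n words before and after the first exact occurrence of target.

    Returns None when target is not one of the whitespace-separated words.
    """
    words = text.split()  # no argument: split on runs of whitespace
    try:
        index = words.index(target)
    except ValueError:  # raised by list.index when target is absent
        return None
    start = max(index - n, 0)              # clamp at the start of the list
    end = min(index + n + 1, len(words))   # +1 so the slice includes n words after
    return ' '.join(words[start:end])


s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"
print(words_around(s, 'experience', n=10))
```

Because the match is an exact comparison, a word with punctuation attached (for example "experience,") would not be found by this sketch.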
However, if you would rather use a regular expression, the following should print up to 5 words before "experience" and up to 5 words after it:
import re
s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"
m = re.search(r'([\w,;!.+-]+\s+){0,5}experience(\s+[\w,;!.+-]+){0,5}', s)
if m:
    print(m[0])
Prints:
-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
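The regular expression can be parameterized in the same way. The sketch below is one possible variant, not the answer's own code: `regex_context` is a hypothetical helper, tokens are taken to be runs of non-whitespace (`\S+`), and it assumes the target begins and ends with a word character so that the `\b` boundaries apply.

```python
import re


def regex_context(text, target, n=5):
    """Return target plus up to n whitespace-delimited tokens on each side, or None."""
    pattern = (
        rf'(?:\S+\s+){{0,{n}}}'          # up to n tokens before the target
        rf'\b{re.escape(target)}\b\S*'   # target as a whole word, plus attached punctuation
        rf'(?:\s+\S+){{0,{n}}}'          # up to n tokens after the target
    )
    m = re.search(pattern, text)
    return m[0] if m else None


s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"
print(regex_context(s, 'experience', n=10))
```

`re.escape` keeps characters such as `+` or `.` in the search word from being read as regex syntax, and the doubled braces in the f-string produce a literal `{0,n}` quantifier.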
Updated to handle "experience" or "Experience"
I have also simplified the regular expression:
import re
s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"
# By splitting on one or more whitespace characters:
words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    try:
        index = words.index('Experience')
    except ValueError:
        index = None
# compare against None explicitly: index 0 is a valid position but is falsy
if index is not None:
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))
# Using a regular expression:
m = re.search(r'(\S+\s+){0,5}[eE]xperience(\s+\S+){0,5}', s)
if m:
    print(m[0])
Prints:
-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
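Nesting one try/except per spelling gets unwieldy as more variants appear. Assuming fully case-insensitive matching is acceptable, a sketch of both approaches could look like this: the split version compares case-folded words, and the regex version passes `re.IGNORECASE` instead of using a `[eE]` character class. The variable name `split_result` is only there for illustration.

```python
import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

# Split approach: find the first word that equals the target once case is folded.
words = s.split()
index = next((i for i, w in enumerate(words) if w.casefold() == 'experience'), None)
split_result = None
if index is not None:  # index 0 is valid, so do not rely on truthiness
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    split_result = ' '.join(words[start:end])
    print(split_result)

# Regex approach: the IGNORECASE flag covers every capitalization of the word.
m = re.search(r'(\S+\s+){0,5}experience(\s+\S+){0,5}', s, re.IGNORECASE)
if m:
    print(m[0])
```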
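Answer 1 (score: 1)
You can first split the string into words on spaces, then take the words from the 10th position to the end of the list, and finally join the list back into a string: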
ts=string.split(' ')[10:]
print(" ".join(ts))
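To make the behaviour of that slice concrete, here is a small sketch on a made-up sample string (the string and variable names are assumptions for illustration). It shows that `[10:]` keeps everything from the 11th word onward regardless of any search word, and that `split(' ')` and `split()` differ when the text contains consecutive spaces.

```python
string = "one two  three four five six seven eight nine ten eleven twelve"

# split(' ') splits on every single space, so the double space yields an empty string.
print(string.split(' ')[:4])   # ['one', 'two', '', 'three']

# split() with no argument splits on runs of whitespace and drops empty strings.
words = string.split()
print(" ".join(words[10:]))    # eleven twelve
```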
Answer 2 (score: 1)