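Question

I have a file containing a number of lines. Of these, I only want the lines that start with xxx. The lines that start with xxx follow this pattern:

xxx:(12:"pqrs",223,"rst",-90)
xxx:(23:"abc",111,"def",-80)

I want to extract only the string inside the first pair of double quotes, i.e. "pqrs" and "abc".

Any help using a regular expression is appreciated.

My code is as follows:

[code block not preserved]

This code gives me an error.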

Answer 1

Your code is indented incorrectly. The line f = f.readlines() is preceded by 9 spaces, while the line for line in f: is preceded by 4. It should look like this:

import re

list_of_prefixes = ["xxx", "aaa"]
resulting_list = []
with open("raw.txt", "r") as f:
    f = f.readlines()
    for line in f:
        line = line.rstrip()
        for phrase in list_of_prefixes:
            # Raw string so \( and \d reach the regex engine unchanged;
            # group 1 is the word right after the first double quote.
            m = re.match(re.escape(phrase) + r':\(\d+:"(\w+)', line)
            if m is not None:
                resulting_list.append(m.group(1))
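
To try the pattern without a file, here is a minimal sketch that runs the same regex over the two sample lines from the question. The function name extract_first_quoted, the in-memory list of lines, and the extra yyy line are my own assumptions for illustration, not part of the original code.

```python
import re


def extract_first_quoted(lines, prefixes):
    """Return the first quoted word from each line that starts with a known prefix."""
    found = []
    for line in lines:
        line = line.rstrip()
        for prefix in prefixes:
            # Same pattern as above: prefix, ":(", digits, ":", quote, then a word.
            m = re.match(re.escape(prefix) + r':\(\d+:"(\w+)', line)
            if m is not None:
                found.append(m.group(1))
    return found


sample = [
    'xxx:(12:"pqrs",223,"rst",-90)',
    'xxx:(23:"abc",111,"def",-80)',
    'yyy:(99:"skip",1,"me",0)',  # hypothetical line with an unlisted prefix
]
print(extract_first_quoted(sample, ["xxx", "aaa"]))  # ['pqrs', 'abc']
```

Note that \w+ only matches letters, digits and underscores, so a quoted string containing spaces or punctuation would be cut short at the first such character.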

Answer 2

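results = []
with open("log.txt", "r") as f:
    for line in f:
        if line.startswith("xxx"):
            # The sample lines contain two colons, e.g. xxx:(12:"pqrs",223,...
            # so the quoted string begins the third piece (index 2).
            fields = line.split(":")
            result = fields[2].split(",")[0][1:-1]  # drop the quotes: pqrs
            results.append(result)

You want to find the lines that start with xxx and then split on the colon: the piece after the second colon, up to the first comma, is what you want. Your result is that string with the surrounding quotes removed. No regular expression is needed. Python's string functions are fine for this.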
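
As a quick check of the split approach, this sketch prints the pieces produced for one sample line from the question. It shows that the quoted string sits in the third piece (index 2), because the sample format contains two colons. The helper name first_quoted_by_split is my own, for illustration only.

```python
def first_quoted_by_split(line):
    """Return the first quoted field of a line like xxx:(12:"pqrs",223,...)."""
    pieces = line.split(":")            # ['xxx', '(12', '"pqrs",223,...']
    first_field = pieces[2].split(",")[0]  # '"pqrs"'
    return first_field[1:-1]            # strip the surrounding quotes


line = 'xxx:(12:"pqrs",223,"rst",-90)\n'
print(line.split(":"))              # ['xxx', '(12', '"pqrs",223,"rst",-90)\n']
print(first_quoted_by_split(line))  # pqrs
```

This relies on the quoted string itself containing no colon or comma; if it might, a regular expression is the safer choice.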

Answer 3

You are heading in the right direction.

If the input is simple, you can use regular expression groups.

import re

with open("log.txt", "r") as f:
    for line in f:
        line = line.rstrip()
        # Group 1 captures the first quoted string, quotes included.
        m = re.match(r'^xxx:\(\d*:("[^"]*")', line)
        if m is not None:
            print(m.group(1))

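All the magic is in the regular expression.

^xxx:\(\d*:("[^"]*") means:

Starting at the beginning of the line, match xxx:(<any number of digits>:"<anything but ">"

Because the sequence "<anything but ">" is enclosed in parentheses, it is available as a group (by calling m.group(1)).

PS: Next time, be sure to include the exact error you get.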
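
To make the grouping concrete, here is a small sketch that applies the pattern to one sample line from the question and prints the whole match next to group 1. The second pattern, with the parentheses moved inside the quotes, is my own variation for the case where the quotes should not be part of the result.

```python
import re

line = 'xxx:(12:"pqrs",223,"rst",-90)'

# Pattern from the answer: the group includes the double quotes.
m = re.match(r'^xxx:\(\d*:("[^"]*")', line)
print(m.group(0))  # xxx:(12:"pqrs"
print(m.group(1))  # "pqrs"

# Variation (assumption): parentheses inside the quotes, so the group
# holds only the text between them.
inner_m = re.match(r'^xxx:\(\d*:"([^"]*)"', line)
print(inner_m.group(1))  # pqrs
```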

Answer 4

To check whether a line starts with xxx, use

line.startswith('xxx')

To find the text inside the first pair of double quotes, use

re.search(r'"(.*?)"', line).group(1)

(because match.group(1) is the first parenthesized subgroup)

So the code would be

import re

with open("file") as f:
    for line in f:
        if line.startswith('xxx'):
            m = re.search(r'"(.*?)"', line)
            if m:  # skip xxx lines that contain no quoted text
                print(m.group(1))
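
The ? after .* matters here. As a rough illustration on one sample line from the question, this sketch compares the non-greedy pattern with a greedy one; the variable names lazy and greedy are mine.

```python
import re

line = 'xxx:(12:"pqrs",223,"rst",-90)'

# Non-greedy: stops at the first closing quote.
lazy = re.search(r'"(.*?)"', line).group(1)
print(lazy)    # pqrs

# Greedy: runs on to the last double quote in the line.
greedy = re.search(r'"(.*)"', line).group(1)
print(greedy)  # pqrs",223,"rst
```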

re module docs

Extracting certain strings from a file using Python

4 answers: