Python: parse a file and extract paragraphs

Date: 2018-07-05 10:38:23

Tags: python regex python-3.x parsing

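I am using an in-house tool that analyzes computer configurations to verify that certain baseline settings have been applied. If they have not, it writes an alert to a text file on the host where the tool runs.

The tool does not create one file per misconfigured computer; it writes a single file covering all of them.

I want to parse this text file and extract the paragraph that corresponds to each computer, so that I can email the IT person responsible for that computer and tell them what they need to do.

For example: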

---- mycomputerone ---- 

 Hello

 During Test of mycomputerone following misconfiguration were detected
 - bad ip adress
 - bad name

 please could take the action to correct it and come back to us?

 ---- mycomputertwo ---- 

 Hello

 During Test of mycomputertwo following misconfiguration were detected
 - bad ip adress
 - bad name
 - administrative share available

 please could take the action to correct it and come back to us?

 ---- mycomputerthree ---- 
.....

I want to get the text between Hello and ?, but I cannot work out how to do it.

I tried

re.search(r'hello(\S*\w+)\?', text)

It did not work. I read the file with

d = open(file, 'r', encoding="UTF-8")
text = d.read()

1 answer:

Answer 0 (score: 1)

What you are asking for is

re.findall(r'(?m)^\s*Hello\s*[^?]+', d)

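where d is the whole file read as a single string. Because [^?]+ stops at the first ?, the match does not include the closing ?, and it is cut short if the message body contains a ? anywhere before the end.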
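To make that limitation concrete, here is a small sketch that runs the same pattern against an inline string instead of the real file. The `sample` text, and in particular the extra question in its second record, are made up for illustration.

```python
import re

# Hypothetical sample: the second record contains a '?' inside the body.
sample = (
    "---- mycomputerone ----\n"
    "\n"
    " Hello\n"
    "\n"
    " - bad name\n"
    "\n"
    " please could take the action to correct it and come back to us?\n"
    "\n"
    "---- mycomputertwo ----\n"
    "\n"
    " Hello\n"
    "\n"
    " Is the share needed? If not, disable it.\n"
    " please could take the action to correct it and come back to us?\n"
)

matches = re.findall(r'(?m)^\s*Hello\s*[^?]+', sample)
first = matches[0].strip()
second = matches[1].strip()

print(repr(first))   # ends just before the closing '?'
print(repr(second))  # cut short at the '?' inside the body
```

The first record is captured up to, but not including, its final `?`; the second stops at "Is the share needed", which is why the regex is fragile for free-form messages.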

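I suggest reading the file line by line instead: when a line starts with ---, close the current record and start a new one; otherwise, append the line to the current record.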

See the following Python demo:

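items = []
tmp = ''
with open(file, 'r', encoding="UTF-8") as d:
    for line in d:
        # A line starting with '---' is a header that begins a new record
        if line.strip().startswith('---'):
            if tmp:
                items.append(tmp.strip())
                tmp = ''
        else:
            # Lines read from a file already end with their own newline
            tmp += line
# Flush the last record, which has no header after it
if tmp.strip():
    items.append(tmp.strip())

print(items)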

Output:

['Hello\n\n During Test of mycomputerone following misconfiguration were detected\n - bad ip adress\n - bad name\n\n please could take the action to correct it and come back to us?', 
 'Hello\n\n During Test of mycomputertwo following misconfiguration were detected\n - bad ip adress\n - bad name\n - administrative share available\n\n please could take the action to correct it and come back to us?']
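
Since the goal is to email the person responsible for each computer, it may help to keep the computer name with its message. The following is a minimal sketch of that idea, not part of the original answer: the `HEADER` pattern and the `split_by_host` helper are hypothetical names, and the pattern assumes headers always look like the sample, with the name separated from the dashes by spaces.

```python
import re

# Assumed header shape, based only on the sample: "---- mycomputerone ----"
HEADER = re.compile(r'^\s*-{3,}\s*(\S+)\s*-{3,}\s*$')

def split_by_host(lines):
    """Return {computer name: message text} from an iterable of lines."""
    records = {}
    host = None
    for line in lines:
        m = HEADER.match(line)
        if m:
            # Header line: start collecting a new record under this name
            host = m.group(1)
            records[host] = []
        elif host is not None:
            records[host].append(line)
    return {h: ''.join(body).strip() for h, body in records.items()}

sample = (
    "---- mycomputerone ---- \n"
    "\n"
    " Hello\n"
    "\n"
    " During Test of mycomputerone following misconfiguration were detected\n"
    " - bad ip adress\n"
    " - bad name\n"
    "\n"
    " please could take the action to correct it and come back to us?\n"
    "\n"
    " ---- mycomputertwo ---- \n"
    "\n"
    " Hello\n"
    "\n"
    " - administrative share available\n"
)

records = split_by_host(sample.splitlines(keepends=True))
for host, message in records.items():
    print(host, '->', repr(message))
```

With a real file, the open file object can be passed directly, for example `with open(file, encoding="UTF-8") as d: records = split_by_host(d)`, because iterating over a file yields lines with their newlines intact.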