Regex to capture everything between double quotes

Time: 2015-01-27 18:55:06

Tags: python regex

I am trying to use a regular expression to process a block of multi-line text. It needs to work in Python.

Sample text:

description : "4.10 TCP Wrappers - not installed"
info        : "If some of the services running in /etc/inetd.conf are 

required, then it is recommended that TCP Wrappers are installed and configured to limit access to any active TCP and UDP services.

TCP Wrappers allow the administrator to control who has access to various inetd network services via source IP address controls. TCP Wrappers also provide logging information via syslog about both successful and unsuccessful connections.

TCP Wrappers are generally triggered via /etc/inetd.conf, but other options exist for \"wrappering\" non-inetd based software.

The configuration of TCP Wrappers to suit a particular environment is outside the scope of this benchmark; however the following links will provide the necessary documentation to plan an appropriate implementation:

ftp://ftp.porcupine.org/pub/security/index.html

The website contains source code for both IPv4 and IPv6 versions."

expect      : "^[\\s]*[A-Za-z0-9]+:[\\s]+[^A][^L][^L]"
required        : YES

I came up with this:

[(a-zA-Z_ \t#)]*[:][ ]*\"[^\"]*.*\"

but the problem is that it stops at the second \" and does not select the rest of the line.

My goal is to get the entire string that starts at info and runs up to the closing double quote that belongs to the info line.

The same regex should also work for the 'expect' line, starting at expect and ending at the double quote that closes the expect string.

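Once I have the whole string, I will split it on the first ":", because I want to store these strings in a DB with "description", "info" and "expect" as columns and the strings as the values of those columns.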

Thanks for the help!

2 answers:

Answer 0: (score: 1)

Another approach is to use the tools provided in the shlex module:

>>> import shlex
>>> s = """tester : "this is a long string
that
is multiline, contains \\" double quotes \\" and .
this line is finished\""""
>>> shlex.split(s[s.find('"'):])[0]
'this is a long string\nthat\nis multiline, contains " double quotes " and .\nthis line is finished'

It also strips the backslashes from the escaped double quotes inside the string.

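The code finds the first double quote in the string and only looks at the string from there on. It then uses shlex.split() to tokenize the rest of the string and takes the first token from the returned list.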
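To apply the same idea to a file with several fields, something like the following might work. It is only a sketch: the helper name quoted_value and the shortened sample are made up for illustration, and it reads just the first token with a shlex.shlex lexer instead of calling shlex.split(), so that an unbalanced quote further down the text cannot raise an error.

```python
import re
import shlex


def quoted_value(text, key):
    """Return the unescaped double-quoted value after `key :` (hypothetical helper)."""
    # Find `key : "` at the start of a line; m.end() - 1 is the opening quote.
    m = re.search(r'^[ \t]*' + re.escape(key) + r'[ \t]*:[ \t]*"', text, re.MULTILINE)
    if m is None:
        return None
    lexer = shlex.shlex(text[m.end() - 1:], posix=True)
    lexer.whitespace_split = True  # same tokenizing rules as shlex.split()
    lexer.commenters = ''          # do not treat '#' as a comment marker
    return lexer.get_token()       # only the first token is read


# Shortened, made-up sample in the same layout as the question's text.
sample = '''description : "4.10 TCP Wrappers - not installed"
info        : "options exist for \\"wrappering\\"
non-inetd based software."
required        : YES
'''

print(quoted_value(sample, "info"))
```

Note that shlex also collapses \\ into a single backslash inside double quotes, which matters for the 'expect' line if its value has to be stored exactly as written.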

Answer 1: (score: 0)

Update 1: I got this to work:

[(a-zA-Z_ \t#)]*[:][ ]*\"([^\"]|(?<=\\\\)[\"])*\"
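For Python, a rough sketch of how a pattern in this spirit could be used is shown below. It is not the exact expression above: it is written as a raw string and swaps the lookbehind for the common `(?:[^"\\]|\\.)*` idiom, which treats any backslash-escaped character as part of the value. The names FIELD_RE and parse_fields and the shortened sample are assumptions made for illustration.

```python
import re

# key, colon, then a double-quoted value in which \" does not end the match.
# [^"\\] also matches newlines, so multi-line values are covered.
FIELD_RE = re.compile(
    r'^[ \t]*([A-Za-z_#]+)[ \t]*:[ \t]*"((?:[^"\\]|\\.)*)"',
    re.MULTILINE | re.DOTALL,
)


def parse_fields(text):
    """Return {key: raw value} for every `key : "..."` entry (hypothetical helper)."""
    return dict(FIELD_RE.findall(text))


# Shortened, made-up sample in the same layout as the question's text.
sample = r'''description : "4.10 TCP Wrappers - not installed"
info        : "other options exist for \"wrappering\"
non-inetd based software."
expect      : "^[\\s]*[A-Za-z0-9]+:[\\s]+[^A][^L][^L]"
required        : YES
'''

fields = parse_fields(sample)
for key, value in fields.items():
    print(key, '=>', repr(value))
```

The two capture groups already separate the key from the value, so no extra split on the first ":" is needed. Values are returned exactly as written in the file, backslashes included; unquoted lines such as `required : YES` are skipped.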

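Update 2: If you can't modify the file to add the quotes that the expression above requires, then as long as lines like this

group : "@GROUP@" || "test"

exist only as single lines, I think this will grab those along with the ones that have longer quoted values: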

[(a-zA-Z_ \t#)]*[:][ ]*(?:\"([^\"]|(?<=\\\\)[\"])*\"|.*)(?=(?:\r\n|$))

Give it a try, and if it works I'll update again to explain it.
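In Python the same fallback idea could be sketched roughly like this. Again this is not the expression above verbatim: LINE_RE and parse_line_values are hypothetical names, the pattern is a raw string, and it assumes a quoted value is only accepted when its closing quote ends the line; anything else falls back to the rest of the line.

```python
import re

LINE_RE = re.compile(
    r'^[ \t]*([A-Za-z_#]+)[ \t]*:[ \t]*'
    r'(?:"((?:[^"\\]|\\.)*)"[ \t\r]*$'  # quoted value (may span lines) that ends its line
    r'|([^\r\n]*))',                     # otherwise: whatever is left on the line
    re.MULTILINE | re.DOTALL,
)


def parse_line_values(text):
    """Return {key: value}, using the rest of the line when the value is not one quoted string."""
    fields = {}
    for m in LINE_RE.finditer(text):
        key, quoted, rest = m.group(1), m.group(2), m.group(3)
        fields[key] = quoted if quoted is not None else rest.strip()
    return fields


# Made-up sample mixing a multi-line quoted value, a compound line and a bare value.
sample = r'''info        : "first line
second line with \"escaped\" quotes"
group : "@GROUP@" || "test"
required        : YES
'''

fields = parse_line_values(sample)
for key, value in fields.items():
    print(key, '=>', repr(value))
```

One limitation of this sketch: a multi-line quoted value whose closing quote is followed by more text on the same line falls through to the second branch, so only its first line is captured.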