Question

I'm new to regular expressions.

I'm trying to get the list of services that are up or down from the output of the svstat command.

Sample svstat output:

/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up

So currently I need 2 regular expressions to filter the services that are UP or DOWN.

Sample regex-1 for UP:

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)\s\(pid\s(?P<pid>\d+)\)\s(?P<seconds>\d+)

Output of regex-1:

Match 1
status -> up
service_name -> worker-test-1
pid -> 1234
seconds -> 97381

Match 2
status -> up
service_name -> worker-test-2
pid -> 4567
seconds -> 92233

Match 3
status -> up
service_name -> worker-test-3
pid -> 8910
seconds -> 97381

Sample regex-2 for DOWN:

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)\s(?P<seconds>\d+)

The question is: how can I get both UP and DOWN services with just 1 regular expression?

By the way, I use http://pythex.org/ to build and test these regexes.
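To see why the question needed two patterns, a quick check is to run both over the sample lines. The sketch below uses Python's standard `re` module; the names `REGEX_1`, `REGEX_2`, `SAMPLE` and `matched_services` are hypothetical. Regex-1 only matches up lines because it requires the `(pid N)` part, and regex-2 only matches down lines because it expects digits right after the status word.

```python
import re

# The two patterns from the question, unchanged (split across raw strings).
REGEX_1 = re.compile(
    r"/etc/service/(?P<service_name>.+):\s(?P<status>up|down)"
    r"\s\(pid\s(?P<pid>\d+)\)\s(?P<seconds>\d+)"
)
REGEX_2 = re.compile(
    r"/etc/service/(?P<service_name>.+):\s(?P<status>up|down)\s(?P<seconds>\d+)"
)

# Sample svstat output copied from the question.
SAMPLE = """\
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up
"""


def matched_services(pattern, text):
    """Return the service names the pattern matches, in order."""
    return [m.group("service_name") for m in pattern.finditer(text)]


print(matched_services(REGEX_1, SAMPLE))  # only the up services
print(matched_services(REGEX_2, SAMPLE))  # only the down services
```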

Answer 1

You can wrap the \s\(pid\s(?P<pid>\d+)\) part in an optional non-capturing group:

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)(?:\s\(pid\s(?P<pid>\d+)\))?\s(?P<seconds>\d+)

If the service is down, this leaves pid as None. See the Regex101 demo.
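A minimal Python sketch of this pattern in use, assuming the standard `re` module (the names `COMBINED`, `SAMPLE` and `parse_svstat` are hypothetical). When a named group sits inside an optional group that did not take part in the match, `groupdict()` reports it as `None`, which is where the `None` pid comes from.

```python
import re

# Combined pattern: "(pid N)" is wrapped in an optional non-capturing
# group (?:...)?, so both up and down lines match.
COMBINED = re.compile(
    r"/etc/service/(?P<service_name>.+):\s(?P<status>up|down)"
    r"(?:\s\(pid\s(?P<pid>\d+)\))?\s(?P<seconds>\d+)"
)

SAMPLE = """\
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
"""


def parse_svstat(text):
    """Return one groupdict per matching line (values are str or None)."""
    return [m.groupdict() for m in COMBINED.finditer(text)]


for entry in parse_svstat(SAMPLE):
    print(entry)
```

This prints `{'service_name': 'worker-test-1', 'status': 'up', 'pid': '1234', 'seconds': '97381'}` followed by `{'service_name': 'worker-test-4', 'status': 'down', 'pid': None, 'seconds': '9'}`. Note that all captured values are strings; convert them with `int()` if you need numbers.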

Answer 2

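As promised, here is my lunch-break alternative. (I did not want to go on about fixed-token split parsing, but it might come in handy when considering other use cases that only the OP knows ;)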

#! /usr/bin/env python
from __future__ import print_function

d = """
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up
"""


def service_state_parser_gen(text_lines):
    """Parse the lines from service monitor by splitting
    on well known binary condition (either up or down)
    and parse the rest of the fields based on fixed
    position split on sanitized data (in the up case).
    yield tuple of key and dictionary as result or of
    None, None when neither up nor down is detected."""

    token_up = ': up '
    token_down = ': down '
    path_sep = '/'

    # Iterate over the argument, not the module-level sample string d.
    for line in text_lines.split('\n'):
        if token_up in line:
            chunks = line.split(token_up)
            status = token_up.strip(': ')
            service = chunks[0].split(path_sep)[-1]
            _, pid, seconds, _ = chunks[1].replace(
                '(', '').replace(')', '').split()
            yield service, {'name': service,
                            'status': status,
                            'pid': int(pid),
                            'seconds': int(seconds)}
        elif token_down in line:
            chunks = line.split(token_down)
            status = token_down.strip(': ')
            service = chunks[0].split(path_sep)[-1]
            pid = None
            seconds, _, _, _ = chunks[1].split()
            yield service, {'name': service,
                            'status': status,
                            'pid': None,
                            'seconds': int(seconds)}
        else:
            yield None, None


def main():
    """Sample driver for parser generator function."""

    services = {}
    for key, status_map in service_state_parser_gen(d):
        if key is None:
            print("Non-Status line ignored.")
        else:
            services[key] = status_map

    print(services)

if __name__ == '__main__':
    main()

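When run, it produces the following on the given sample input (the key order shown comes from an older Python version and may differ on yours):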

Non-Status line ignored.
Non-Status line ignored.
{'worker-test-1': {'status': 'up', 'seconds': 97381, 'pid': 1234, 'name': 'worker-test-1'}, 'worker-test-3': {'status': 'up', 'seconds': 97381, 'pid': 8910, 'name': 'worker-test-3'}, 'worker-test-2': {'status': 'up', 'seconds': 92233, 'pid': 4567, 'name': 'worker-test-2'}, 'worker-test-5': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-5'}, 'worker-test-4': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-4'}, 'worker-test-6': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-6'}}

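So the same information the named groups would capture is stored, already converted to typed values under the matching keys of a dict. If a service is down there is of course no process id, so pid is mapped to None, which makes it easy to code against it robustly (if you stored all down services in a separate structure instead, it would only be implicit that accessing a pid field is not advisable...).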
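For comparison, here is a sketch that builds the same name-to-dict mapping with the single combined regex from Answer 1 instead of token splitting (the names `PATTERN`, `SAMPLE` and `parse_services` are hypothetical). One difference in behavior: lines that do not match are simply skipped rather than producing `None, None`.

```python
import re

# Single combined pattern from Answer 1 (optional "(pid N)" group).
PATTERN = re.compile(
    r"/etc/service/(?P<service_name>.+):\s(?P<status>up|down)"
    r"(?:\s\(pid\s(?P<pid>\d+)\))?\s(?P<seconds>\d+)"
)

SAMPLE = """
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up
"""


def parse_services(text):
    """Map service name -> status dict; pid is an int, or None when down."""
    services = {}
    for m in PATTERN.finditer(text):
        name = m.group("service_name")
        pid = m.group("pid")  # None if the optional (pid N) group did not match
        services[name] = {
            "name": name,
            "status": m.group("status"),
            "pid": int(pid) if pid is not None else None,
            "seconds": int(m.group("seconds")),
        }
    return services


print(parse_services(SAMPLE))
```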

Hope it helps. PS: Yes, the parameter name text_lines of the showcase function is not ideally named for what it contains, but you should get the parsing idea.

Answer 3

I don't know whether you are required to use a regex, but if not, you could do something like this:

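# linetext holds one line of svstat output. Test for ": down" rather than a
# bare "down", so an up line ending in ", normally down" is not misread.
if ": down" in linetext:
    print("is down")
else:
    print("is up")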

It's easier to read and faster.
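Extending the same idea from a single line to the whole svstat output, here is a sketch that sorts service names into up and down lists without any regex. The function name `split_by_status` is hypothetical, and it assumes every status line has the `path: status ...` shape seen in the question; it looks only at the first word after the colon, so a trailing "normally up" or "normally down" does not affect the result.

```python
def split_by_status(svstat_output):
    """Return (up, down) lists of service names using plain string tests."""
    up, down = [], []
    for line in svstat_output.splitlines():
        if ": " not in line:
            continue  # skip blank or unrelated lines
        path, rest = line.split(": ", 1)
        name = path.rsplit("/", 1)[-1]  # last path component is the service
        if rest.startswith("down"):
            down.append(name)
        elif rest.startswith("up"):
            up.append(name)
    return up, down


SAMPLE = """\
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up
"""

up, down = split_by_status(SAMPLE)
print("up:", up)
print("down:", down)
```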

Combining regular expressions in Python
