What is a good, user-friendly way to filter strings using concise logic rules?

Asked: 2014-07-05 14:34:41

Tags: string list filter

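I have a list of strings that I want to filter using some rules. Any string that passes the filter is appended to a new list of strings. An example rule might be to pass a string if it contains X and, if it contains Y, Z. I know I could code these rules with Python if statements and so on, but is there a more concise, user-friendly way to do this kind of filtering? Is there some (perhaps SQL-like) language for this sort of thing?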

# Accept or filter specified datasets.
filterDatasets = False
if filterDatasets:
    # Filter specified datasets.
    logger.info('filtering specified datasets')
    # If data was specified, then skip a specified dataset if its name
    # does not contain every required data substring. If data was not
    # specified, then skip a specified dataset if its name does not
    # contain "mc12".
    if isData:
        requiredSubstrings = [
            #'data12',
            'Egamma',
            'Muons',
            #'merge',
        ]
    else:
        requiredSubstrings = [
            'mc12'
        ]
    # Skip any dataset whose name contains one of these substrings.
    excludedSubstrings = [
        '#'
    ]
    datasets = []
    # Cycle over all datasets specified.
    for dataset in datasetsSpecified:
        missing = [s for s in requiredSubstrings if s not in dataset]
        unwanted = [s for s in excludedSubstrings if s in dataset]
        for substring in missing:
            logger.debug("substring {substring} not in dataset name {dataset}".format(substring = substring, dataset = dataset))
        for substring in unwanted:
            logger.debug("substring {substring} in dataset name {dataset}".format(substring = substring, dataset = dataset))
        # Accept the dataset once, and only if no required substring is
        # missing and no excluded substring is present.
        if not missing and not unwanted:
            datasets.append(dataset)
else:
    datasets = datasetsSpecified
logger.info('datasets accepted: {datasets}'.format(datasets = datasets))

1 Answer:

Answer 0 (score: 0)

I think regular expressions are the "SQL" of text strings. Almost any text processing you can think of can be done with them:

http://rick.measham.id.au/paste/explain.pl?regex=hello|[a-z]%2Bbye

Examples that match this:

  • hello
  • goodbye
  • hellobye
  • bbye

Examples that do not match this:

  • hell
  • ello
  • bye
  • goodby

Just run each item in the list through the regular expression and keep it if it matches.
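As a rough sketch of that idea in Python, assuming the pattern from the link above (`hello|[a-z]+bye`, with `%2B` decoded to `+`) and "matches" meaning the pattern is found anywhere in the string: the function name `filter_strings` and the sample list are made up for illustration.

```python
import re

# Pattern from the explain.pl link above; "%2B" in the URL decodes to "+".
PATTERN = re.compile(r"hello|[a-z]+bye")


def filter_strings(strings, pattern=PATTERN):
    """Return the strings in which the pattern is found anywhere."""
    # pattern.search() scans the whole string; use fullmatch() instead
    # if the entire string has to match the pattern.
    return [s for s in strings if pattern.search(s)]


# Illustrative sample strings (hypothetical input list).
candidates = ["hello", "goodbye", "hellobye", "bbye",
              "hell", "ello", "bye", "goodby"]
kept = filter_strings(candidates)
print(kept)
```

Whether to use `search` or `fullmatch` depends on whether a partial hit should count as passing the filter; for substring-style rules like the ones in the question, `search` is probably closer to what is wanted.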

I am not a heavy Python user myself (only an occasional one), but according to the documentation, regular expressions are supported: https://docs.python.org/2/library/re.html
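For the "must contain these, must not contain those" kind of rule in the question, one possible approach (a sketch, not the only way) is to build a single pattern from lookaheads with the `re` module. The helper `build_rule` and the dataset names below are hypothetical and only serve the illustration.

```python
import re


def build_rule(required=(), excluded=()):
    """Compile one regex that accepts a string only if it contains every
    `required` substring and none of the `excluded` substrings."""
    # (?=.*X) asserts that X occurs somewhere after the current position;
    # (?!.*X) asserts that it does not. re.escape() treats X literally.
    parts = ["(?=.*{})".format(re.escape(s)) for s in required]
    parts += ["(?!.*{})".format(re.escape(s)) for s in excluded]
    return re.compile("^" + "".join(parts), re.DOTALL)


# Hypothetical dataset names, made up for this example.
names = [
    "mc12_8TeV.sample.merge",
    "#mc12_8TeV.commented",
    "data12_8TeV.Egamma",
]

rule = build_rule(required=["mc12"], excluded=["#"])
accepted = [name for name in names if rule.match(name)]
print(accepted)
```

The rule is then just data (two lists of substrings), which keeps the filtering loop itself down to one line. For plain substring checks, `all(...)` and `any(...)` with the `in` operator would work equally well without regular expressions.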