Question

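I'm trying to parse a string like this: aa bb first item ee ff

I need the prefix 'aa bb', the keyword 'first item', and the suffix 'ee ff' separately.

The prefix and suffix can be several words, or may not be present at all. The keyword comes from a predefined list of values.

This is what I tried, but it doesn't work: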

from pyparsing import ZeroOrMore, Word, alphas, oneOf

a = ZeroOrMore(Word(alphas)('prefix')) & oneOf(['first item', 'second item'])('word') & ZeroOrMore(Word(alphas)('suffix'))

Answer 1

The first problem is your use of the '&' operator. In pyparsing, '&' creates an Each expression, which is like And but accepts its subexpressions in any order:

Word('a') & Word('b') & Word('c')

matches 'aaa bbb ccc', but also 'bbb aaa ccc', 'ccc bbb aaa', and so on.

In your parser you want the '+' operator instead, which creates And expressions. And matches multiple subexpressions, but only in the order given.
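To see the difference concretely, here is a minimal sketch (assuming pyparsing is installed; each_expr and and_expr are illustrative names, not part of the original answer) that runs the same three words through both operators. ParserElement.matches returns True or False instead of raising ParseException.

```python
from pyparsing import Word

# '&' builds an Each: all three words are required, in any order
each_expr = Word('a') & Word('b') & Word('c')
# '+' builds an And: all three words are required, in exactly this order
and_expr = Word('a') + Word('b') + Word('c')

print(each_expr.parseString('bbb aaa ccc').asList())  # ['bbb', 'aaa', 'ccc']
print(each_expr.matches('ccc bbb aaa'))  # True
print(and_expr.matches('aaa bbb ccc'))   # True
print(and_expr.matches('bbb aaa ccc'))   # False
```

Each returns the tokens in the order they appeared in the input, which is why it suits option-like syntax; for a fixed prefix/keyword/suffix layout, And is the right tool.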

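Second, one of the reasons to use pyparsing is that it tolerates variable whitespace. Whitespace is a nuisance for hand-written parsers, especially ones built on str.find or regular expressions, where it usually shows up as many \s+ fragments scattered throughout the match expression. In your pyparsing parser, if the input string contained 'first  item' (with two spaces between 'first' and 'item'), trying to match the literal string 'first item' would fail. Instead, you should match the words separately, possibly using pyparsing's Keyword class, and let pyparsing skip any whitespace between them. To simplify this, I wrote a short method, wordphrase: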

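from pyparsing import And, Keyword

def wordphrase(s):
    # one Keyword per word; the parse action rejoins the matched words with single spaces
    return And(map(Keyword, s.split())).addParseAction(' '.join)

keywords = wordphrase('first item') | wordphrase('second item')
print(keywords)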

Prints:

{{"first" "item"} | {"second" "item"}}

This shows that each word is parsed separately, with any amount of whitespace allowed between the words.
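A quick check of that claim, as a sketch assuming pyparsing is installed: the wordphrase expression accepts extra spaces between the words and its parse action normalizes them, while a single Literal for the whole phrase (the name literal is just for illustration) does not match.

```python
from pyparsing import And, Keyword, Literal

def wordphrase(s):
    # one Keyword per word; pyparsing skips any whitespace between them,
    # and the parse action rejoins the matched words with single spaces
    return And(map(Keyword, s.split())).addParseAction(' '.join)

keywords = wordphrase('first item') | wordphrase('second item')
literal = Literal('first item')  # the whole phrase as one exact string

print(keywords.parseString('first   item').asList())  # ['first item']
print(literal.matches('first   item'))               # False
```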

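Finally, you have to write pyparsing parsers knowing that pyparsing does not do any lookahead on its own: ZeroOrMore is greedy and never gives back a word it has already matched. In your parser, the prefix expression ZeroOrMore(Word(alphas)) will match all of the words in "aa bb first item ee ff", and then nothing is left for the keyword expression to match, so the parser fails. To encode this in pyparsing, the expression for the prefix words inside ZeroOrMore has to say "match any word of alphas, but first make sure we are not about to parse a keyword expression". In pyparsing, this kind of negative lookahead is implemented with NotAny, which you can create with the unary ~ operator. For readability, we'll use the keywords expression from above: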

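from pyparsing import Word, ZeroOrMore, alphas

# ~keywords (NotAny) rejects a word if a keyword phrase starts at that position
non_keyword = ~keywords + Word(alphas)
a = ZeroOrMore(non_keyword)('prefix') + keywords('word') + ZeroOrMore(Word(alphas))('suffix')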
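A small sketch contrasting the two prefix expressions on the same input, assuming a pyparsing version that still provides the camelCase names (parseString, asList) used elsewhere in this answer. The names greedy and fixed are made up for illustration.

```python
from pyparsing import And, Keyword, Word, ZeroOrMore, alphas

def wordphrase(s):
    return And(map(Keyword, s.split())).addParseAction(' '.join)

keywords = wordphrase('first item') | wordphrase('second item')

# Greedy version: the prefix ZeroOrMore consumes every word, including
# "first" and "item", so nothing is left for `keywords` and the parse fails
greedy = ZeroOrMore(Word(alphas)) + keywords + ZeroOrMore(Word(alphas))

# Lookahead version: before taking each prefix word, ~keywords checks
# that a keyword phrase does not start at the current position
non_keyword = ~keywords + Word(alphas)
fixed = ZeroOrMore(non_keyword) + keywords + ZeroOrMore(Word(alphas))

print(greedy.matches('aa bb first item ee ff'))  # False
print(fixed.parseString('aa bb first item ee ff').asList())
# ['aa', 'bb', 'first item', 'ee', 'ff']
```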

Here is the complete parser, using runTests to show the results for several different sample strings:

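from pyparsing import And, Keyword, Word, ZeroOrMore, alphas

def wordphrase(s):
    # match each word as a separate Keyword, then rejoin with single spaces
    return And(map(Keyword, s.split())).addParseAction(' '.join)

keywords = wordphrase('first item') | wordphrase('second item')

# a prefix word is any alpha word that does not start a keyword phrase
non_keyword = ~keywords + Word(alphas)
a = ZeroOrMore(non_keyword)('prefix') + keywords('word') + ZeroOrMore(Word(alphas))('suffix')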

text = """
    # prefix and suffix
    aa bb first item ee ff

    # suffix only
    first item ee ff

    # prefix only
    aa bb first item

    # no prefix or suffix
    first item

    # multiple spaces in item, replaced with single spaces by parse action
    first   item
    """

a.runTests(text)

Gives:

# prefix and suffix
aa bb first item ee ff
['aa', 'bb', 'first item', 'ee', 'ff']
- prefix: ['aa', 'bb']
- suffix: ['ee', 'ff']
- word: 'first item'

# suffix only
first item ee ff
['first item', 'ee', 'ff']
- suffix: ['ee', 'ff']
- word: 'first item'

# prefix only
aa bb first item
['aa', 'bb', 'first item']
- prefix: ['aa', 'bb']
- word: 'first item'

# no prefix or suffix
first item
['first item']
- word: 'first item'

# multiple spaces in item, replaced with single spaces by parse action
first   item
['first item']
- word: 'first item'
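Since the question says the keywords come from a predefined list, one possible sketch builds the keywords expression from that list with MatchFirst instead of chaining | by hand, and reads the named results directly rather than through runTests. KEYWORD_PHRASES is a hypothetical stand-in for the real list.

```python
from pyparsing import And, Keyword, MatchFirst, Word, ZeroOrMore, alphas

# Hypothetical predefined keyword list, standing in for the asker's real one
KEYWORD_PHRASES = ['first item', 'second item', 'third item']

def wordphrase(s):
    return And(map(Keyword, s.split())).addParseAction(' '.join)

# Equivalent to wordphrase(...) | wordphrase(...) | ..., driven by the list
keywords = MatchFirst(wordphrase(k) for k in KEYWORD_PHRASES)

non_keyword = ~keywords + Word(alphas)
a = (ZeroOrMore(non_keyword)('prefix') + keywords('word')
     + ZeroOrMore(Word(alphas))('suffix'))

result = a.parseString('aa bb third item ee ff')
print(result.asList())              # ['aa', 'bb', 'third item', 'ee', 'ff']
print(result['prefix'].asList())    # ['aa', 'bb']
print(result['suffix'].asList())    # ['ee', 'ff']
print(result.dump())                # all tokens plus the named results
```

MatchFirst tries the phrases in list order, so if one keyword could be the leading part of another (say 'first' and 'first item'), putting the longer phrase first, or using Or, which picks the longest alternative, would likely be needed.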

Answer 2

If I understand your question correctly, this should do the trick:

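toParse = 'aa bb first item ee ff'
keywords = ['test 1', 'first item', 'test two']
for x in keywords:
    res = toParse.find(x)  # index of the keyword, or -1 if it is absent
    if res >= 0:
        print('prefix=' + toParse[0:res])
        print('keyword=' + x)
        # the +1 skips the single space assumed to follow the keyword
        print('suffix=' + toParse[res+len(x)+1:])
        break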

This gives the following result:

prefix=aa bb 
keyword=first item
suffix=ee ff
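The slicing above assumes exactly one space after the keyword and keeps the trailing space in the prefix. A hedged variation on the same loop, using the \s+ idea mentioned in the first answer, is sketched here; split_on_keyword is a hypothetical helper, and it assumes every keyword begins and ends with a word character so that the \b boundaries make sense.

```python
import re

# Same keyword list as in the answer above
keywords = ['test 1', 'first item', 'test two']

def split_on_keyword(text, keywords):
    """Return (prefix, keyword, suffix) for the first keyword found, else None."""
    for kw in keywords:
        # \s+ between the keyword's words tolerates runs of whitespace;
        # \b keeps 'item' from matching inside a longer word like 'itemized'
        pattern = r'\b' + r'\s+'.join(map(re.escape, kw.split())) + r'\b'
        m = re.search(pattern, text)
        if m:
            return text[:m.start()].strip(), kw, text[m.end():].strip()
    return None

print(split_on_keyword('aa bb first item ee ff', keywords))
# ('aa bb', 'first item', 'ee ff')
print(split_on_keyword('first   item', keywords))
# ('', 'first item', '')
```

Like the original loop, this returns the first keyword in list order that occurs anywhere in the text, not the leftmost match in the text.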

Using pyparsing to find the prefix and suffix of a keyword
