使用pyparsing作为查询解析器

时间:2012-10-31 13:53:29

标签: pyparsing

我刚刚了解了优秀的pyparsing模块,我想用它来创建一个查询解析器。

基本上我希望能够解析以下类型的表达式:

'b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)'

其中b_coherent,symbol和nucleon是数据库的关键字。

我仔细阅读了pyparsing(searchparser.py)附带的一个示例,我认为(我希望!),让我非常接近我的目标,但仍然有问题。

这是我的代码:

from pyparsing import *

logical_operator    = oneOf(['and','&','or','|'], caseless=True) 
not_operator        = oneOf(['not','^'], caseless=True) 
db_keyword          = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])

value = Word(alphanums+'_')
quote = Combine('"' + value + '"') | value

selection = db_keyword + arithmetic_operator + (value|quote)
selection = selection + ZeroOrMore(logical_operator+selection)

parenthesis = Forward()
parenthesis << ((selection + parenthesis) | selection)
parenthesis = Combine('(' + parenthesis + ')') | selection

grammar = parenthesis + lineEnd

res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)')

我有一些问题要完全理解Forward对象。也许这是我的解析器无法正常工作的一个原因。你知道我的语法有什么问题吗?

非常感谢你的帮助

埃里克

2 个答案:

答案 0 :(得分:1)

你可以使用Forward在括号内手工制作自己的表达式,但是pyparsing的operatorPrecedence简化了整个过程。请参阅下面我原始代码的更新形式,并附上评论:

from pyparsing import *

# break these up so we can represent higher precedence for 'and' over 'or'
#~ logical_operator    = oneOf(['and','&','or','|'], caseless=True) 
not_operator        = oneOf(['not','^'], caseless=True) 
and_operator        = oneOf(['and','&'], caseless=True) 
or_operator         = oneOf(['or' ,'|'], caseless=True) 

# db_keyword is okay, but you might just want to use a general 'identifier' expression,
# you won't have to keep updating as you add other terms to your query language
db_keyword          = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
ident = Word(alphas+'_', alphanums+'_')

# these aren't really arithmetic operators, they are comparison operators
#~ arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])

# instead of generic 'value', define specific value types 
#~ value = Word(alphanums+'_')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))

# use pyparsing's QuotedString class for this, it gives you quote escaping, and
# automatically strips quotes from the parsed text
#~ quote = Combine('"' + value + '"') | value
quote = QuotedString('"')

# when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
literal_true = Keyword('true', caseless=True)
literal_false = Keyword('false', caseless=True)
boolean_literal = literal_true | literal_false

# in future, you can expand comparison_operand to be its own operatorPrecedence 
# term, so that you can do things like "nucleon != 1+2" - but this is fine for now
comparison_operand = quote | db_keyword | ident | float_ | integer
comparison_expr = Group(comparison_operand + comparison_operator + comparison_operand)

# all this business is taken of for you by operatorPrecedence
#~ selection = db_keyword + arithmetic_operator + (value|quote)
#~ selection = selection + ZeroOrMore(logical_operator+selection)
#~ parenthesis = Forward()
#~ parenthesis << ((selection + parenthesis) | selection)
#~ parenthesis = Combine('(' + parenthesis + ')') | selection
#~ grammar = parenthesis + lineEnd

boolean_expr = operatorPrecedence(comparison_expr | boolean_literal, 
    [
    (not_operator, 1, opAssoc.RIGHT),
    (and_operator, 2, opAssoc.LEFT),
    (or_operator,  2, opAssoc.LEFT),
    ])
grammar = boolean_expr

res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)', parseAll=True)

print res.asList()

打印

[[['b_coherent', '==', '1_2'], 'or', [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]

从这里,我建议您研究如何创建可以实际评估的内容的下一步,查看simpleBool.py example中的pyparsing wiki,了解使用{{1}时如何完成此操作}}

我很高兴听到你正在享受pyparsing,欢迎!

答案 1 :(得分:0)

  

后面定义的表达式的前向声明 - 用于   递归语法,例如代数中缀表示法。当。。。的时候   表达式是已知的,它被赋值给Forward变量使用   '&LT;&LT;'操作