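pyparsing: how to parse similar delimited and non-delimited strings

Date: 2017-03-25 02:22:38

Tags: python parsing pyparsing

How can I parse the two kinds of strings below using two separate parsers, one for each pattern?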

from pyparsing import *
dd = """
  wire         c_f_g;
  wire         cl_3_f_g4;

   x_y abc_d
      (.c_l (cl_dclk_001l),
       .c_h (cl_m1dh_ff),
       .ck     (b_f_1g));
"""

I am able to parse them independently using the parsers below:

# For the lines containing wire
printables_less_semicolon = printables.replace(';','')
wireDef = Literal("wire") + Word( printables)

# For the nested pattern
instanceStart = Word( printables ) + Word( printables_less_semicolon )
u = nestedExpr(opener="(", closer=")", ignoreExpr=dblSlashComment)
t = OneOrMore(instanceStart + u + Word( ";" ) + LineEnd())
print(instanceStart.parseString(dd))

If I run the code above, the instanceStart parser also matches the wire lines. How can I reliably tell the two apart?
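The overlap can be reproduced in isolation. The sketch below is a minimal reproduction, assuming pyparsing is installed; it reuses the instanceStart definition from the code above and feeds it one wire line and one shortened instance line (the shortened input string is made up for illustration). Because Word(printables) accepts any run of non-whitespace characters, the keyword wire is accepted as if it were a module name.

```python
from pyparsing import Word, printables

# Same definition as instanceStart in the question.
printables_less_semicolon = printables.replace(';', '')
instanceStart = Word(printables) + Word(printables_less_semicolon)

# Word(printables) accepts any run of non-whitespace characters,
# so "wire" is matched exactly like a module name would be.
wire_tokens = instanceStart.parseString("wire         c_f_g;").asList()

# Shortened, illustrative instance line (not the full input from the question).
inst_tokens = instanceStart.parseString("x_y abc_d (.c_l (cl_dclk_001l));").asList()

print(wire_tokens)
print(inst_tokens)
```

Both calls succeed, which is why instanceStart on its own cannot separate the two kinds of lines.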

1 answer:

Answer 0: (score: 0)

I have a working solution (definitely not the best one).

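printables_less_semicolon = printables.replace(';','')

# Text between "(" and ");", e.g. the module port list
bracketStuff    = Group(QuotedString("(", escChar=None, multiline=True, endQuoteChar=");"))
# Everything between `ifdef and `endif
ifDef           = Group(QuotedString("`ifdef", endQuoteChar="`endif", multiline=True))
# Keyword matches the whole word only; Word("endmodule") would accept any run of those letters
theEnd          = Keyword( "endmodule" )
# Parenthesised connection list of an instance, keeping // comments as tokens
nestedConns     = Group(nestedExpr(opener="(", closer=")", ignoreExpr=dblSlashComment))
# Raw string so the backslashes reach the regex engine unchanged (not used below)
instance        = Regex(r'[\s?|\r\n?].*\(')
# Two words followed by a semicolon, e.g. "wire hello;"
othersWithSc    = Group(Word (printables) + Word (printables_less_semicolon) + Literal(";"))
# Two words NOT followed by a semicolon, e.g. "the quick" before "("
othersWithoutSc = Word (printables) + Word (printables_less_semicolon) + NotAny(Literal(";"))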

Combining the parsers above lets me parse files in the format I am working with. Sample input:

ts2 = """
module storyOfFox ( andDog, 
        JLT);
  input andDog;
  output JLT;

   `ifdef quickFox
 `include "gatorade"
 `include "chicken" 
`endif 

   wire         hello;
   wire         and;
   wire         welcome;


   the quick
      (.brown (fox),
       .jumps (over),
       .the (lazy),
       .dog    (and),
       .the (dog),
       .didNot    (likeIt));

    theDog thenWent
      (// Waiver unused
       .on (),
       // Waiver unused
       .to (),
       .sueThe (foxFor),
       .jumping (andBeingTooQuick),
       .TheDog    (wasHailedAsAHero),
       .endOf (Story));


  endmodule
"""

The parser used to parse the input above:

try:
    # Earlier definition, superseded by the reassignment of tp below:
    # tp  = othersWithoutSc + Optional(bracketStuff) + Optional(ZeroOrMore(othersWithSc)) + Optional( Group( ZeroOrMore( othersWithoutSc + nestedConns ) ) ) + theEnd
    tpI = Group( ZeroOrMore( othersWithoutSc + nestedConns +  Word( ";" ) ) )
    tpO = Each( [Optional(ZeroOrMore(othersWithSc)), Optional(ifDef)] )
    tp  = othersWithoutSc + Optional(bracketStuff) + tpO + Group(tpI) + theEnd
    #print(othersWithoutSc.parseString("input xyz;"))
    print(tp.parseString(ts2))
except ParseException as x:
    print("Line {e.lineno}, column {e.col}:\n'{e.line}'".format(e=x))

Output obtained:

module
storyOfFox
[' andDog, \n        JLT']
['input', 'andDog', ';']
['output', 'JLT', ';']
[' quickFox\n `include "gatorade"\n `include "chicken" \n']
['wire', 'hello', ';']
['wire', 'and', ';']
['wire', 'welcome', ';']
[['the', 'quick', [['.brown', ['fox'], ',', '.jumps', ['over'], ',', '.the', ['lazy'], ',', '.dog', ['and'], ',', '.the', ['dog'], ',', '.didNot', ['likeIt']]], ';', 'theDog', 'thenWent', [['// Waiver unused', '.on', [], ',', '// Waiver unused', '.to', [], ',', '.sueThe', ['foxFor'], ',', '.jumping', ['andBeingTooQuick'], ',', '.TheDog', ['wasHailedAsAHero'], ',', '.endOf', ['Story']]], ';']]
endmodule

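I would rather not accept this answer, because I may not have solved the real problem I ran into earlier. I only found a way around it and got the output I needed.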
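For the original question of telling wire lines apart from instances, one possible approach is to treat wire as a keyword and exclude it from the instance parser with a negative lookahead. The following is only a sketch under stated assumptions, not a tested replacement for the parsers above: it assumes names are plain identifiers (letters, digits, underscores), and the names identifier, WIRE, SEMI, connections, instanceDef and statements are introduced here for illustration.

```python
from pyparsing import (Group, Keyword, OneOrMore, Suppress, Word,
                       alphas, alphanums, dblSlashComment, nestedExpr)

# Assumption: names are plain identifiers (letters, digits, underscore).
identifier = Word(alphas + "_", alphanums + "_")

# Keyword matches "wire" only as a whole word, so "wire_x" stays an identifier.
WIRE = Keyword("wire")
SEMI = Suppress(";")

# wire <name> ;
wireDef = Group(WIRE + identifier + SEMI)

# <module> <instance> ( ... ) ;
# The ~WIRE negative lookahead makes this parser reject wire lines by itself.
connections = nestedExpr(opener="(", closer=")", ignoreExpr=dblSlashComment)
instanceDef = Group(~WIRE + identifier + identifier + connections + SEMI)

statements = OneOrMore(wireDef | instanceDef)

dd = """
  wire         c_f_g;
  wire         cl_3_f_g4;

   x_y abc_d
      (.c_l (cl_dclk_001l),
       .c_h (cl_m1dh_ff),
       .ck     (b_f_1g));
"""

parsed = statements.parseString(dd, parseAll=True).asList()
for statement in parsed:
    print(statement)
```

With the lookahead in place, wireDef and instanceDef can also be used as two separate parsers, since instanceDef fails immediately on a line that starts with the wire keyword. Other declaration keywords such as input or output would need the same treatment.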