pyparsing with identical starting and ending strings

Date: 2015-04-08 18:40:03

Tags: python regex pyparsing

Related: Python parsing bracketed blocks

I have a file with the following format:

#
here
are
some
strings
#
and
some
others
 #
 with
 different
 levels
 #
 of
  #
  indentation
  #
 #
#

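So a block is delimited by a starting # and a trailing #. However, the trailing # of the (n-1)th block is also the starting # of the nth block.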

I am trying to write a function that, given this format, retrieves the content of each block, where blocks can also be nested recursively.

I started with regular expressions but gave up quite quickly (I think you can guess why), so I tried pyparsing. However, I cannot simply write

print(nestedExpr('#','#').parseString(my_string).asList())

because it raises a ValueError exception (ValueError: opening and closing strings cannot be the same).

Given that I cannot change the input format, is there a better option than pyparsing for this format?

I also tried this answer: https://stackoverflow.com/a/1652856/740316, replacing the { / } with # / #, but it fails to parse the string.

1 answer:

Answer 0 (score: 1)

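Unfortunately (for you), your grouping depends not only on the separating '#' characters but also on the indentation level (otherwise, ['with','different','levels'] would be at the same level as the previous group ['and','some','others']). Parsing indentation-sensitive grammars is not pyparsing's strong suit: it can be done, but it is not pleasant. To do this, we will use the pyparsing helper macro indentedBlock, which also requires that we define a list variable that indentedBlock can use as its indentation stack.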
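To make the point about indentation concrete, here is a minimal sketch (plain Python, no pyparsing) of what happens if the '#' lines are treated as the only delimiters and indentation is ignored. The names `SAMPLE` and `flat_groups` are hypothetical and used only for this illustration.

```python
# Sample input from the question; leading spaces are significant.
SAMPLE = """\
#
here
are
some
strings
#
and
some
others
 #
 with
 different
 levels
 #
 of
  #
  indentation
  #
 #
#"""


def flat_groups(text):
    """Split words into groups at every '#' line, ignoring indentation."""
    groups, current = [], []
    for raw in text.splitlines():
        word = raw.strip()
        if word == "#":
            # A '#' ends the current group (empty groups are skipped).
            if current:
                groups.append(current)
            current = []
        elif word:
            current.append(word)
    if current:
        groups.append(current)
    return groups


for group in flat_groups(SAMPLE):
    print(group)
```

Every group comes out at the same level, so ['with', 'different', 'levels'] ends up as a sibling of ['and', 'some', 'others'] rather than nested inside it. The nesting can only be recovered by looking at how far each '#' is indented.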

See the embedded comments in the code below for one way to approach this with pyparsing and indentedBlock:

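from pyparsing import (ParserElement, LineEnd, Suppress, Optional, Word,
                       alphas, Forward, Group, OneOrMore, ungroup,
                       indentedBlock, delimitedList)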

test = """\
#
here
are
some
strings
#
and
some
others
 #
 with
 different
 levels
 #
 of
  #
  indentation
  #
 #
#"""

# newlines are significant for line separators, so redefine 
# the default whitespace characters for whitespace skipping
ParserElement.setDefaultWhitespaceChars(' ')

NL = LineEnd().suppress()
HASH = '#'
HASH_SEP = Suppress(HASH + Optional(NL))

# a normal line contains a single word
word_line = Word(alphas) + NL


indent_stack = [1]

# word_block is recursive, since word_blocks can contain word_blocks
word_block = Forward()
word_group = Group(OneOrMore(word_line | ungroup(indentedBlock(word_block, indent_stack))) )

# now define a word_block, as a '#'-delimited list of word_groups, with 
# leading and trailing '#' characters
word_block <<= (HASH_SEP + 
                 delimitedList(word_group, delim=HASH_SEP) + 
                 HASH_SEP)

# the overall expression is one large word_block
parser = word_block

# parse the test string
parser.parseString(test).pprint()
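Since the question also asks whether there is an option other than pyparsing, the following is a rough sketch of a hand-written alternative. It is not part of the answer above: it swaps the pyparsing grammar for a small stack-based scan over the lines. It assumes that indentation uses spaces only, that a '#' indented deeper than the current block opens a nested block, that a '#' at the same indentation starts the next group, and that a '#' at a shallower indentation closes the deeper blocks. The names `parse_blocks` and `_drop_empty_tail` are hypothetical.

```python
SAMPLE = """\
#
here
are
some
strings
#
and
some
others
 #
 with
 different
 levels
 #
 of
  #
  indentation
  #
 #
#"""


def _drop_empty_tail(block):
    """Remove the empty group left behind by a block's closing '#'."""
    if block and not block[-1]:
        block.pop()


def parse_blocks(text):
    """Return the list of top-level blocks; each block is a list of groups."""
    top = [[]]                # virtual container for the top-level blocks
    stack = [(-1, top)]       # (indent of the block's '#', block), innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line != "#":
            # An ordinary word goes into the current group of the innermost block.
            stack[-1][1][-1].append(line)
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # A shallower '#' closes every block that is indented deeper.
        while stack[-1][0] > indent:
            _drop_empty_tail(stack.pop()[1])
        if stack[-1][0] == indent:
            # Same indentation: this '#' separates two groups.
            stack[-1][1].append([])
        else:
            # Deeper indentation: open a nested block inside the current group.
            block = [[]]
            stack[-1][1][-1].append(block)
            stack.append((indent, block))
    while len(stack) > 1:
        _drop_empty_tail(stack.pop()[1])
    return top[0]


blocks = parse_blocks(SAMPLE)
print(blocks[0])
```

For the sample input, `blocks[0]` is a nested list in which ['with', 'different', 'levels'] and ['of', ...] sit inside the ['and', 'some', 'others', ...] group. The sketch does no error reporting for malformed input, which a grammar-based parser handles more gracefully.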