使用Scala解析器组合器解析方案

时间:2010-12-06 06:34:23

标签: parsing scala

我正在Scala中编写一个小型的解释器,我在解析Scheme中的列表时遇到了问题。我的代码解析包含多个数字,标识符和布尔值的列表,但如果我尝试解析包含多个字符串或列表的列表,它会发生窒息。我错过了什么?

这是我的解析器:

class SchemeParsers extends RegexParsers {

// Scheme boolean #t and #f translate to Scala's true and false
def bool : Parser[Boolean] = 
    ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}

// A Scheme identifier allows alphanumeric chars, some symbols, and 
// can't start with a digit
def id : Parser[String] = 
    """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r ^^ {case s => s}

// This interpreter only accepts numbers as integers
def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}

// A string can have any character except ", and is wrapped in "
def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' ^^ {case s => s}

// A Scheme list is a series of expressions wrapped in ()
def list : Parser[List[Any]] = 
    '(' ~> rep(expr) <~ ')' ^^ {s: List[Any] => s}

// A Scheme expression contains any of the other constructions
def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
} 

2 个答案:

答案 0 :(得分:3)

正如@Gabe正确指出的那样,你留下了一些未处理的白色空间:

scala> object SchemeParsers extends RegexParsers {
     |
     | private def space  = regex("[ \\n]*".r)
     |
     | // Scheme boolean #t and #f translate to Scala's true and false
     | private def bool : Parser[Boolean] =
     |     ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}
     |
     | // A Scheme identifier allows alphanumeric chars, some symbols, and
     | // can't start with a digit
     | private def id : Parser[String] =
     |     """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r
     |
     | // This interpreter only accepts numbers as integers
     | private def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}
     |
     | // A string can have any character except ", and is wrapped in "
     | private def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' <~ space ^^ {case s => s}
     |
     | // A Scheme list is a series of expressions wrapped in ()
     | private def list : Parser[List[Any]] =
     |     '(' ~> space  ~> rep(expr) <~ ')' <~ space ^^ {s: List[Any] => s}
     |
     | // A Scheme expression contains any of the other constructions
     | private def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
     |
     | def parseExpr(str: String) = parse(expr, str)
     | }
defined module SchemeParsers   

scala> SchemeParsers.parseExpr("""(("a" "b") ("a" "b"))""")
res12: SchemeParsers.ParseResult[Any] = [1.22] parsed: List(List(a, b), List(a, b))

scala> SchemeParsers.parseExpr("""("a" "b" "c")""")
res13: SchemeParsers.ParseResult[Any] = [1.14] parsed: List(a, b, c)

scala> SchemeParsers.parseExpr("""((1) (1 2) (1 2 3))""")
res14: SchemeParsers.ParseResult[Any] = [1.20] parsed: List(List(1), List(1, 2), List(1, 2, 3))

答案 1 :(得分:1)

代码的唯一问题是您使用字符而不是字符串。下面,我删除了多余的^^ { case s => s }并用字符串替换了所有字符。我将在下面进一步讨论这个问题。

class SchemeParsers extends RegexParsers {

// Scheme boolean #t and #f translate to Scala's true and false
def bool : Parser[Boolean] = 
    ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}

// A Scheme identifier allows alphanumeric chars, some symbols, and 
// can't start with a digit
def id : Parser[String] = 
    """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r ^^ {case s => s}

// This interpreter only accepts numbers as integers
def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}

// A string can have any character except ", and is wrapped in "
def str : Parser[String] = "\"" ~> """[^""]*""".r <~ "\""

// A Scheme list is a series of expressions wrapped in ()
def list : Parser[List[Any]] = 
    "(" ~> rep(expr) <~ ")" ^^ {s: List[Any] => s}

// A Scheme expression contains any of the other constructions
def expr : Parser[Any] = id | str | num | bool | list
}

所有Parsers的{​​{1}}类型都有隐式accept。因此,如果基本元素是Elem,例如Char,则会对它们进行隐式接受操作,这就是符号RegexParsers,{{1} }和(,它们是代码中的字符。

)自动执行的操作是在任何"RegexParsers的开头自动跳过空格(定义为protected val whiteSpace = """\s+""".r,以便您可以覆盖)。它还负责在出现错误信息时将定位光标移过空白区域。

您似乎没有意识到的一个结果是String将从解析的输出中删除其前缀空间,这不太可能是您想要的。 : - )

此外,由于Regex包含新行,因此在任何标识符之前可以接受新行,这可能是您想要的也可能不是。

您可以通过覆盖" a string beginning with a space"来禁用整个正则表达式中的空格跳过。另一方面,默认\s测试skipWhiteSpace的长度,因此您可以通过在整个解析过程中操纵skipWhiteSpace的值来打开和关闭它。