Parsing a simple query syntax

Time: 2017-02-10 14:02:16

Tags: regex parsing search lucene

Suppose I have query strings like the following:

#some terms! "phrase query" in:"my container" in:group_3

#some terms!

in:"my container" in:group_3 terms! "phrase query"

in:"my container" test in:group_3 terms!

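What is the best way to parse these correctly?

I looked at Lucene's SimpleQueryParser, but it seems rather complicated for my use case. I have been trying to parse the query with regular expressions, but so far without real success, mainly because spaces can appear inside quotes.

Any simple ideas?

All I need as output is a list of elements; I can easily handle the rest from there: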

[
  "#some",
  "terms!",
  "phrase query",
  "in:\"my container\"",
  "in:group_3"
]

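2 answers:

Answer 0: (score: 2)

The following regular expression matches each element of the query string, keeping a quoted phrase together with any prefix attached to it: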

(?:\S*"(?:[^"]+)"|\S+)
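
As a rough illustration of how this pattern could be applied, here is a minimal Java sketch that collects every match into a list. The class name QueryTokenizer and the method tokenize are made up for this example and are not part of the answer; only the pattern itself comes from above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this sketch only.
class QueryTokenizer {
    // The pattern from this answer: either an optional run of non-space
    // characters followed by a double-quoted phrase, or a plain run of
    // non-space characters.
    private static final Pattern TOKEN =
        Pattern.compile("(?:\\S*\"(?:[^\"]+)\"|\\S+)");

    // Returns every match of the pattern, in the order it appears in the query.
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "#some terms! \"phrase query\" in:\"my container\" in:group_3";
        for (String token : tokenize(query)) {
            System.out.println(token);
        }
    }
}
```

Note that with this pattern a bare phrase such as "phrase query" comes back with its surrounding quotes still attached, so removing them would be a separate step if the output has to match the list in the question exactly.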


Answer 1: (score: 0)

For anyone interested, here is the final Scala / Java parser I used to solve the problem, inspired by the answers to this question:

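// A parsed element: an optional prefix (e.g. "in") plus the term or phrase text.
case class QueryTerm(prefix: Option[String], text: String)

def testMatcher(query: String): Unit = {
  // Optional alphabetic "name:" prefix, captured in the named group `groupName`.
  def optionalPrefix(groupName: String) = s"(?:(?:(?<$groupName>[a-zA-Z]+)[:])?)"
  // A double-quoted phrase; spaces are allowed inside the quotes.
  val quoted = optionalPrefix("prefixQuoted") + "\"(?<textQuoted>[^\"]*)\""
  // A bare term: any run of characters other than whitespace and quotes.
  val unquoted = optionalPrefix("prefixUnquoted") + "(?<textUnquoted>[^\\s\"]+)"
  val regex = quoted + "|" + unquoted
  val matcher = regex.r.pattern.matcher(query)
  var results: List[QueryTerm] = Nil
  while (matcher.find()) {
    // Only one alternative matches per find(); the other's groups are null.
    val quotedResult = Option(matcher.group("textQuoted")).map(textQuoted =>
      (Option(matcher.group("prefixQuoted")), textQuoted)
    )
    val unquotedResult = Option(matcher.group("textUnquoted")).map(textUnquoted =>
      (Option(matcher.group("prefixUnquoted")), textUnquoted)
    )
    val (prefix, text) = quotedResult.orElse(unquotedResult).get
    results = QueryTerm(prefix, text) :: results
  }
  // Terms were prepended, so reverse the list to restore query order.
  println("results=" + results.reverse.mkString("\n"))
}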
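
For readers working in plain Java, the same idea might look like the sketch below. It is a port under assumptions, not the code from this answer: the names PrefixedQueryParser, Term and parse are hypothetical, the prefix pattern is written in a slightly shorter but equivalent form, and the method returns the terms in query order instead of printing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used for this sketch only.
class PrefixedQueryParser {
    // Hypothetical holder type; prefix is null when the term has no prefix.
    static final class Term {
        final String prefix;
        final String text;
        Term(String prefix, String text) { this.prefix = prefix; this.text = text; }
        @Override public String toString() { return "Term(" + prefix + ", " + text + ")"; }
    }

    // Two alternatives, as in the Scala version: a quoted phrase or a bare
    // term, each with an optional alphabetic "name:" prefix.
    private static final Pattern TERM = Pattern.compile(
        "(?:(?<prefixQuoted>[a-zA-Z]+):)?\"(?<textQuoted>[^\"]*)\""
        + "|(?:(?<prefixUnquoted>[a-zA-Z]+):)?(?<textUnquoted>[^\\s\"]+)");

    static List<Term> parse(String query) {
        List<Term> terms = new ArrayList<>();
        Matcher m = TERM.matcher(query);
        while (m.find()) {
            // The groups of the alternative that did not match are null.
            if (m.group("textQuoted") != null) {
                terms.add(new Term(m.group("prefixQuoted"), m.group("textQuoted")));
            } else {
                terms.add(new Term(m.group("prefixUnquoted"), m.group("textUnquoted")));
            }
        }
        return terms;
    }
}
```

Unlike the single-pattern approach, this separates the prefix from the text and drops the quotes around phrases, so in:"my container" yields the prefix in and the text my container.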