Scala: split a string on whitespace, except inside certain parts

Time: 2019-04-08 17:56:42

Tags: string scala list

I want to split this string "158.106.201.22 '-' '-' [08/Apr/2019:15:19:48 +0000] 'GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1' 200 3293" into seven separate tokens, so that I end up with a list like this:

List("158.106.201.22", "-", "-", "08/Apr/2019:15:19:48 +0000]", "GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1", "200", "3293"). 

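I tried Scala's split() method with a space as the delimiter, but that also breaks "[08/Apr/2019:15:19:48 +0000]" and "GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1" into separate tokens, because they contain whitespace too, so I end up with the following result: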

List("158.106.201.22", "-", "-", "[08/Apr/2019:15:19:48", "+0000]", "GET", "/media/2tSodgDfwCjIMCBY8h/200w_d.gif", "HTTP/1.1", "200", "3293")

What is the best way to do this? Thanks!

2 Answers:

Answer 0: (score: 2)

If you'd rather avoid a complicated regular expression:

val str = "158.106.201.22 '-' '-' [08/Apr/2019:15:19:48 +0000] 'GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1' 200 3293"

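// Split on '[', ']' and single quotes, then drop the whitespace-only pieces in between.
val stage1 = str.split("[\\[\\]\\']")
                .map(_.trim)
                .filterNot(_.isEmpty)

// For this input only the trailing "200 3293" piece still holds two fields, so split it on the space.
val result = stage1.dropRight(1) ++ stage1.last.split(" ")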

println(result.toList)

//List(158.106.201.22, -, -, 08/Apr/2019:15:19:48 +0000, GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1, 200, 3293)
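The split above relies on the last piece being the only one that still contains several fields. As a rough alternative, here is a minimal sketch that matches tokens instead of splitting on delimiters. It assumes fields are separated by whitespace and that bracketed or quoted fields never contain their own closing delimiter; the names TokenizeLogLine and tokenize are made up for illustration.

```scala
object TokenizeLogLine {
  // Three alternatives, tried in order: [bracketed], 'quoted', or a bare run of non-whitespace.
  // The capture groups sit inside the delimiters, so brackets and quotes are dropped.
  private val Token = """\[([^\]]*)\]|'([^']*)'|(\S+)""".r

  def tokenize(line: String): List[String] =
    Token.findAllMatchIn(line).map { m =>
      // Only the alternative that matched has a non-null group.
      m.subgroups.find(_ != null).getOrElse("")
    }.toList

  def main(args: Array[String]): Unit = {
    val str = "158.106.201.22 '-' '-' [08/Apr/2019:15:19:48 +0000] 'GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1' 200 3293"
    println(tokenize(str))
  }
}
```

Because every token is matched on its own, this should keep working if plain fields are added or removed anywhere in the line, not only at the end.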

Answer 1: (score: 0)

Maybe just use a regular expression?

val str = "158.106.201.22 '-' '-' [08/Apr/2019:15:19:48 +0000] 'GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1' 200 3293" 

val pattern = "([\\d\\.]+) ('-') ('-') (\\[.+\\]) ('.*') (\\d+) (\\d+)".r

// findFirstMatchIn yields an Option[Match]; subgroups holds capture groups 1 to n.
val values = pattern.findFirstMatchIn(str) match {
   case Some(m) => m.subgroups.toArray
   case None    => Array.empty[String]
}

values //Array("158.106.201.22", "'-'", "'-'", "[08/Apr/2019:15:19:48 +0000]", "'GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1'", "200", "3293")
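Note that the result above still carries the quotes and brackets, which the question wanted removed. One possible variation, sketched here under the assumption that every line has exactly this seven-field layout, moves the capture groups inside the delimiters and uses the regex as an extractor. The object name and the field names (ip, ident, user and so on) are hypothetical labels, not something the question specifies.

```scala
object ExtractLogFields {
  // Same layout as the pattern above, but each group captures only what is
  // inside the quotes or brackets.
  private val LogLine =
    """(\S+) '([^']*)' '([^']*)' \[([^\]]*)\] '([^']*)' (\d+) (\d+)""".r

  // Returns None when the line does not match the expected layout as a whole.
  def fields(line: String): Option[List[String]] = line match {
    case LogLine(ip, ident, user, time, request, status, bytes) =>
      Some(List(ip, ident, user, time, request, status, bytes))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val str = "158.106.201.22 '-' '-' [08/Apr/2019:15:19:48 +0000] 'GET /media/2tSodgDfwCjIMCBY8h/200w_d.gif HTTP/1.1' 200 3293"
    println(fields(str))
  }
}
```

A pattern match on a Regex only succeeds when the entire string matches, so malformed lines fall through to None instead of throwing.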