Error when using Spark to filter lines in a log file that match a word

Date: 2018-10-12 08:49:06

Tags: apache-spark apache-spark-sql

My goal is to get the lines of a log file that contain error messages as an RDD. I am reading the log file and filtering for lines that match the word "ERROR"; I need the result as an RDD so that I can write the error messages to a database.

I am new to this.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.text("hdfs://10.90.3.78:9000/user/centuryuidt-3-1-1.out")
val patt: String = "ERROR"
val rdd=df.filter(line => line.contains(patt)).collect()
df.foreach(println)

When I execute this code I get the following errors:

<console>:40: error: value contains is not a member of org.apache.spark.sql.Row
       val rdd=df.filter(line => line.contains(patt)).collect()
                                      ^
<console>:43: error: overloaded method value foreach with alternatives:
  (func: org.apache.spark.api.java.function.ForeachFunction[org.apache.spark.sql.Row])Unit <and>
  (f: org.apache.spark.sql.Row => Unit)Unit
 cannot be applied to (Unit)
       df.foreach(println)
          ^

Screenshot:

[error screenshot]

After adding a few changes:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val lines = sc.textFile("hdfs://10.90.3.78:9000/user/centuryuidt-3-1-1.out")
val error = lines.filter(_.contains("ERROR"))
val df = error.toDF()
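As a plain-Scala sketch (no Spark required) of what the per-line filter above does: each element of `lines` is one line of the file, and `filter(_.contains("ERROR"))` keeps only the matching elements. The sample log text here is made up for illustration.

```scala
object ErrorFilterDemo {
  // Keep only the lines containing the marker word; this mirrors
  // lines.filter(_.contains("ERROR")) on an RDD[String].
  def errorLines(log: Seq[String], patt: String = "ERROR"): Seq[String] =
    log.filter(_.contains(patt))

  def main(args: Array[String]): Unit = {
    val log = Seq(
      "INFO starting up",
      "ERROR disk full",
      "WARN slow response",
      "ERROR timeout")
    errorLines(log).foreach(println)
    // prints:
    // ERROR disk full
    // ERROR timeout
  }
}
```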

This works for me, but I need the DataFrame to have one line per row; it gives me all the error lines in a single row. Can anyone help me split the lines into separate rows?

1 Answer:

Answer 0 (score: 0):

Here is my complete example:
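A minimal sketch of such an example, assuming Spark 2.x with a `SparkSession` named `spark` (the HDFS path is the one from the question; `errors` matches the variable used in the REPL output below):

```scala
import org.apache.spark.sql.functions.col

// spark.read.text yields a DataFrame with one string column named "value",
// one row per line of the file.
val df = spark.read.text("hdfs://10.90.3.78:9000/user/centuryuidt-3-1-1.out")

// Filter on the column rather than calling .contains on a Row
val errors = df.filter(col("value").contains("ERROR"))

errors.show(false) // one matching log line per row
```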

If you really need the errors as an RDD, note that this is an RDD[Row]:

scala> errors.rdd
res7: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[13] at rdd at <console>:34

If you really need them as an RDD[String]:

scala> errors.map(_.getString(0)).rdd
res9: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[19] at rdd at <console>:34
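The `_.getString(0)` step just extracts the single string column from each `Row`. A plain-Scala analogue, using a hypothetical one-column `Row1` case class as a stand-in for Spark's `Row`:

```scala
object RowToStringDemo {
  // Stand-in for a one-column Row; getString(0) returns the line text
  // (the index is ignored here since there is only one column).
  final case class Row1(value: String) {
    def getString(i: Int): String = value
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row1("ERROR disk full"), Row1("ERROR timeout"))
    // Mirrors errors.map(_.getString(0)): each Row becomes its String column
    val strings: Seq[String] = rows.map(_.getString(0))
    strings.foreach(println)
    // prints:
    // ERROR disk full
    // ERROR timeout
  }
}
```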
