Replace empty strings with 0.0 in an RDD

Date: 2017-02-11 04:22:45

Tags: scala apache-spark

I have an RDD called doctor, like:

age,part,day,val
9,elbow,Mon Aug 15 00:00:00 EDT 3399,1.0
9,elbow,Mon Aug 15 00:00:00 EDT 3399,
9,neck,Mon Aug 18 00:00:00 EDT 3499,1.0

Some rows have a blank in the val column. Is there a way to keep this RDD but replace all the empty strings with 0.0?

I tried a map with the condition .isEmpty(), but isEmpty() can't be called on a Double. I also tried a condition inside the map like if (doctor.val == "") 0.0 else doctor.val, but that didn't work either.

1 Answer:

Answer 0 (score: 1)

I think spark-csv would help here, but this is a pure Scala approach.

When you say "blank", I assume you literally mean there is some whitespace there, i.e. the line does not end with a bare comma.

case class Doctor(age: Int, part: String, day: String, value: Double)

val line = "9,elbow,Mon Aug 15 00:00:00 EDT 3399, "
val data = line.split(",").map(_.trim).map {
  case "" => "0.0"
  case x  => x
}
val doc = Doctor(data(0).toInt, data(1), data(2), data(3).toDouble)

Output:

data: Array[String] = Array(9, elbow, Mon Aug 15 00:00:00 EDT 3399, 0.0)
doc: Doctor = Doctor(9,elbow,Mon Aug 15 00:00:00 EDT 3399,0.0)

As far as Spark goes... this produces an RDD[Doctor]:

case class Doctor(age: Int, part: String, day: String, value: Double)

sc.textFile(fileName).map { line =>
  val data = line.split(",").map(_.trim).map {
    case "" => "0.0"
    case x  => x
  }
  Doctor(data(0).toInt, data(1), data(2), data(3).toDouble)
}
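One caveat worth noting: Java's String.split(",") (with the default limit) drops trailing empty strings, so a row like the second one in the question, which ends in a bare comma with no trailing space, would yield only three fields and data(3).toDouble would throw an ArrayIndexOutOfBoundsException. Passing a limit of -1 keeps the trailing empty field. A small sketch of the same parsing logic with that fix (the parse helper name is just for illustration):

```scala
case class Doctor(age: Int, part: String, day: String, value: Double)

// split with limit -1 preserves trailing empty fields, so a line ending
// in a bare comma still produces four entries.
def parse(line: String): Doctor = {
  val data = line.split(",", -1).map(_.trim).map {
    case "" => "0.0"
    case x  => x
  }
  Doctor(data(0).toInt, data(1), data(2), data(3).toDouble)
}

parse("9,elbow,Mon Aug 15 00:00:00 EDT 3399,")
// Doctor(9,elbow,Mon Aug 15 00:00:00 EDT 3399,0.0)
```

With the default limit, that same line would only split into three fields, which is why the original answer had to assume there was whitespace after the final comma.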