I am receiving the following error while testing a Spark app written in Scala. I am submitting the job in Spark local mode. My intention is to process sensor data with a Spark DataFrame and group the data by week of the year. This is just a prototype app.
16/04/04 23:49:06 WARN memory.TaskMemoryManager: leak 16.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@70fbb930
16/04/04 23:49:06 ERROR executor.Executor: Managed memory leak detected; size = 17039360 bytes, TID = 1
16/04/04 23:49:06 ERROR executor.Executor: Managed memory leak detected; size = 17039360 bytes, TID = 0
16/04/04 23:49:06 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NumberFormatException: multiple points
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1890)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.text.DigitList.getDouble(DigitList.java:169)
at java.text.DecimalFormat.parse(DecimalFormat.java:2056)
at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1869)
at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514)
at java.text.DateFormat.parse(DateFormat.java:364)
at SensorStreaming$.to_date(SensorsStreaming.scala:24)
...
I am running the following piece of code, written in Scala, on Apache Spark 1.6.0. While the grouping query fails, a simple select query (with no grouping) on the same temp table works just fine (see the sketch after the code). I am using the org.apache.spark.sql.functions.weekofyear function, which was introduced in Spark 1.5.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext, functions}
import java.text.SimpleDateFormat
import java.sql.Date

case class Sensor(resid: String,
                  date: java.sql.Date,
                  time: String,
                  hz: Double,
                  disp: Double,
                  flo: Double,
                  sedPPM: Double,
                  psi: Double,
                  chlPPM: Double)

object SensorStreaming {

  private val formatter = new SimpleDateFormat("M/d/y")

  def to_date(s: String): java.sql.Date = {
    new java.sql.Date(formatter.parse(s).getTime)
  }

  def parse(splits: Array[String]): Sensor = {
    Sensor(splits(0),
           to_date(splits(1)),
           splits(2),
           splits(3).toDouble,
           splits(4).toDouble,
           splits(5).toDouble,
           splits(6).toDouble,
           splits(7).toDouble,
           splits(8).toDouble)
  }

  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("SensorStreamingApp")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    import sqlContext._
    import sqlContext.implicits._

    val file = "/Volumes/SONY/Data/sensor_data/sensordata.csv"
    val rdd = sc.textFile(file).map(_.split(","))
    val df = rdd.map(parse).toDF

    df.registerTempTable("sensors")
    sqlContext.sql("select weekofyear(date) from sensors group by weekofyear(date)").show
  }
}
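For comparison, this is the kind of simple select (no grouping) on the same temp table that works fine for me; it is only a sketch of what I mean, and the selected expression is illustrative:

// Same "sensors" temp table, same weekofyear call, but no group by:
// this shows rows without hitting the NumberFormatException above.
sqlContext.sql("select weekofyear(date) from sensors").show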