I am getting a ClassCastException in Spark

Time: 2016-02-20 12:57:48

Tags: apache-spark apache-spark-sql spark-streaming

I have created a JSONArray and built an RDD from it. When I try to map it with sqlContext.jsonRDD(rdd), I get the following error:

Error: application failed with exception
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 5, esu3v148.federated.fds): java.lang.ClassCastException: org.json.simple.JSONObject cannot be cast to java.lang.String
        at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:307)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:885)
        at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:884)
        at org.apache.spark.SparkContext$$anonfun$32.apply(SparkContext.scala:1534)
        at org.apache.spark.SparkContext$$anonfun$32.apply(SparkContext.scala:1534)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The JSONArray is created and used in Spark as follows:

JSONArray jsonResultArray = new JSONArray();

SparkConf sparkConf = new SparkConf().setAppName("HBaseTest");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(60));
SQLContext sqlContext = new SQLContext(sc);

if (!jsonResultArray.isEmpty()) {

    //JavaRDD<String> rdd = sc.parallelize(jsonResultArray);

    // This is the call that throws the ClassCastException above:
    @SuppressWarnings("unchecked")
    DataFrame input = sqlContext.jsonRDD(sc.parallelize(jsonResultArray));
}

Please help me figure out how to resolve this. Thanks.

1 Answer:

Answer 0 (score: 1)

sqlContext.jsonRDD expects an argument of type JavaRDD<java.lang.String>.

JSONArray is a list of org.json.simple.JSONObject, so sc.parallelize(jsonResultArray) creates a JavaRDD<JSONObject>; hence the exception when this RDD is passed to jsonRDD, which casts each element to String. This would normally be a compile-time error, but the compiler is misled by the fact that JSONArray extends the raw List type (with no explicit type parameter), so the mismatch is only detected at run time.
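To see why javac lets this through, here is a minimal, self-contained sketch (the class name, the local setMaster("local[*]") setup and the variable usage are my own illustration, not from the question):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;

public class RawListPitfall {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("RawListPitfall").setMaster("local[*]"));

        JSONArray jsonResultArray = new JSONArray();
        jsonResultArray.add(new JSONObject()); // elements are JSONObjects

        // JSONArray extends the raw ArrayList, so this is an unchecked
        // raw-type call: it compiles with only an "unchecked" warning
        // even though the declared element type is wrong.
        JavaRDD<String> rdd = sc.parallelize(jsonResultArray);

        // The cast to String is deferred to run time and fails here:
        String first = rdd.first(); // ClassCastException

        sc.stop();
    }
}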

If you really have to use JSONArray, you have to map it to strings either before or after creating the RDD, e.g.:

import java.util.List;
import org.apache.spark.api.java.function.Function;

final JavaRDD<JSONObject> jsonObjectRDD =
        sc.parallelize((List<JSONObject>) jsonResultArray);

final JavaRDD<String> jsonStringRDD =
        jsonObjectRDD.map(new Function<JSONObject, String>() {
            @Override
            public String call(JSONObject v) throws Exception {
                return v.toJSONString();
            }
        });

DataFrame input = sqlContext.jsonRDD(jsonStringRDD);
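If you would rather convert before creating the RDD (the other option mentioned above), a sketch could look like this; it reuses sc, sqlContext and jsonResultArray from the question, and the loop and variable names are mine:

import java.util.ArrayList;
import java.util.List;
import org.json.simple.JSONObject;

// Convert each JSONObject to its JSON string before parallelizing,
// so the RDD has the element type jsonRDD expects:
List<String> jsonStrings = new ArrayList<String>();
for (Object o : jsonResultArray) {
    jsonStrings.add(((JSONObject) o).toJSONString());
}
DataFrame input = sqlContext.jsonRDD(sc.parallelize(jsonStrings));

On Spark 1.4+, the deprecated jsonRDD can also be replaced by sqlContext.read().json(...), which likewise expects an RDD of JSON strings.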