Using flatMap in Java with Spark 2.1.0

Asked: 2017-07-21 14:20:34

Tags: java apache-spark spark-streaming

I am trying to use flatMap with Spark 2.1.0 in Java 8.

The 2.2.0 documentation shows this example:

JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(x.split(" ")).iterator());

When I try this against 2.1.0, I get the following compile error:

Error:(31, 25) java: method flatMap in class org.apache.spark.rdd.RDD<T> cannot be applied to given types;
required: scala.Function1<java.lang.String,scala.collection.TraversableOnce<U>>,scala.reflect.ClassTag<U>
found: (x)->Array[...]tor()
reason: cannot infer type-variable(s) U
(actual and formal argument lists differ in length)

What is the correct way to use flatMap with these versions?

1 Answer:

Answer 0 (score: 0)

The code below works on Spark 2.1.0:

// SPACE is a precompiled pattern, defined elsewhere in the class as:
// private static final Pattern SPACE = Pattern.compile(" ");
JavaDStream<String> lines = messages.map(tuple -> tuple._2());
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(x)).iterator());
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(s -> new Tuple2<>(s, 1))
    .reduceByKey((i1, i2) -> i1 + i2);

Check the versions of the Spark dependencies in your pom.xml. Note that your error message references org.apache.spark.rdd.RDD, which is the Scala API class, so the stream or RDD may not have been created through the Java API (JavaDStream/JavaRDD) in the first place. For examples matching the Spark 2.1.0 release, see https://github.com/apache/spark/tree/branch-2.1/examples/src/main/java/org/apache/spark/examples/streaming
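For reference, in the Spark 2.x Java API the lambda passed to flatMap must return a java.util.Iterator (in 1.x it returned an Iterable). A minimal plain-Java sketch of just the tokenizing logic, independent of Spark, with a hypothetical class name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

public class SplitToIterator {
    // The same precompiled pattern the answer's snippet assumes for SPACE
    private static final Pattern SPACE = Pattern.compile(" ");

    // The Spark 2.x FlatMapFunction shape: one input element,
    // an Iterator of zero or more output elements
    static Iterator<String> words(String line) {
        return Arrays.asList(SPACE.split(line)).iterator();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        words("to be or not to be").forEachRemaining(out::add);
        System.out.println(out); // [to, be, or, not, to, be]
    }
}
```

The same lambda body can then be dropped into lines.flatMap(...) once the surrounding stream is a JavaDStream.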
