DataType for Vector in Spark (Java)

Date: 2018-11-27 17:24:59

Tags: java apache-spark apache-spark-mllib apache-spark-ml

I am trying to test a UDF (a Spark Java function). The code works fine against a Dataset in the application, but fails in the JUnit test. It looks like a type mismatch in the vector schema; the error is:

Caused by: java.lang.ClassCastException: org.apache.spark.mllib.linalg.DenseVector cannot be cast to org.apache.spark.ml.linalg.Vector

Which Vector classes should I import instead of VectorUDT()? I cannot find them.

The UDF header:

public class CalculateM implements UDF2<Vector,Vector, Double> {
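The question only shows the class declaration. A minimal sketch of what such a UDF might look like, assuming it should use the newer `org.apache.spark.ml.linalg.Vector` (the type the stack trace says the cast target is); the `call` body below is a hypothetical placeholder, since the actual computation is not shown in the question:

```java
import org.apache.spark.ml.linalg.Vector;      // note: ml package, not mllib
import org.apache.spark.sql.api.java.UDF2;

public class CalculateM implements UDF2<Vector, Vector, Double> {
    @Override
    public Double call(Vector v1, Vector v2) {
        // Hypothetical body: the real implementation is not shown in the question.
        double sum = 0.0;
        for (int i = 0; i < v1.size(); i++) {
            sum += v1.apply(i) * v2.apply(i);  // element-wise product, summed
        }
        return sum;
    }
}
```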

The test:

@Test
public void udfCalculateMTest() {
    List<Row> data = Arrays.asList(
            RowFactory.create(
                    Vectors.dense(new double[]{4.0, 5.0}),
                    Vectors.dense(new double[]{4.0, 7.0})
            )
    );
    StructType schema = new StructType(new StructField[]{
            new StructField("v1", new VectorUDT(), false, Metadata.empty()),
            new StructField("v2", new VectorUDT(), false, Metadata.empty())
    });
    spark.createDataFrame(data, schema).createOrReplaceTempView("df");

    spark.sqlContext().udf().registerJava("corr", CalculateM.class.getName(), DataTypes.DoubleType);
    Row result = spark.sql("SELECT corr(v1,v2) from df").head();
    Assert.assertEquals(2, result.getDouble(0), 1.0e-6);
}
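The `ClassCastException` names both packages: the rows hold an `org.apache.spark.mllib.linalg.DenseVector`, while the UDF expects `org.apache.spark.ml.linalg.Vector`. A plausible cause (an assumption from the stack trace, since the test's import block is not shown) is that the test imports `Vectors` and `VectorUDT` from the legacy `mllib` package. A sketch of the corrected schema fragment, using the `ml` package consistently; `SQLDataTypes.VectorType()` is the public accessor for the vector `DataType` in Spark 2.x:

```java
// Assumed culprit imports (mllib is the legacy linear-algebra package):
//   import org.apache.spark.mllib.linalg.Vectors;
//   import org.apache.spark.mllib.linalg.VectorUDT;

// Use the ml package throughout instead:
import org.apache.spark.ml.linalg.SQLDataTypes;  // VectorType() returns the DataType
import org.apache.spark.ml.linalg.Vectors;       // ml DenseVector factory
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

StructType schema = new StructType(new StructField[]{
        new StructField("v1", SQLDataTypes.VectorType(), false, Metadata.empty()),
        new StructField("v2", SQLDataTypes.VectorType(), false, Metadata.empty())
});
```

With both the row data (`Vectors.dense`) and the schema built from the `ml` package, the runtime vector type should match what the UDF's signature declares.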

0 Answers:
