Getting elements from the output of VectorAssembler

Date: 2017-07-12 07:35:17

Tags: apache-spark apache-spark-ml

Using the Java API, I need to get the elements of the VectorAssembler output as separate columns.

VectorAssembler assembler3 = new VectorAssembler()
              .setInputCols(new String[]{"res1", "res2"})
              .setOutputCol("res3");

DataFrame output = assembler3.transform(sensordataDF);

res1 and res2 are both vectors of doubles. Can anyone tell me how to do this?

1 answer:

Answer 0 (score: 1)

The output DataFrame will be sensordataDF with a new column, res3, added; it will still contain the original columns res1 and res2.

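Edit: This could perhaps be done with the split function from the Spark SQL functions (org.apache.spark.sql.functions in Java, pyspark.sql.functions in Python): cast the vector column to a string, split it, and cast each piece back to DoubleType.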

I use Spark with Python, but in Java it should be nearly the same.

Example:

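from pyspark.sql.functions import regexp_replace, split

# A dense vector's string form looks like "[1.0,2.0,3.0]": strip the brackets, then split on commas.
# Sparse vectors print as "(3,[0],[1.0])" and will not split correctly this way.
split_col = split(regexp_replace(output['res3'].cast('string'), r'[\[\]]', ''), ',')

df = output.withColumn('first_data', split_col.getItem(0).cast('double'))
df = df.withColumn('second_data', split_col.getItem(1).cast('double'))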