Predicting single vectors with the new spark.ml API, to avoid groupByKey() after flatMap()?

Asked: 2016-10-20 17:17:57

Tags: apache-spark machine-learning

Is there a way to predict a single vector with the new spark.ml API? I would like to do this inside map() so I can avoid calling groupByKey() after flatMap():

Current code (pyspark):

# Given 'model', 'rdd', and a function 'split_element' that splits an
# element of the RDD into a list of elements (and assuming each element
# has a key and a value so that groupByKey will work to merge them later)

split_rdd = rdd.flatMap(split_element)
split_results = model.transform(split_rdd.toDF()).rdd
return split_results.groupByKey()
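
For reference, here is a minimal, self-contained sketch of the same flatMap() → transform() → groupByKey() pattern. The model (a small LogisticRegression), the input rdd, and the split_element function are hypothetical stand-ins for the asker's objects, not the actual ones:

# Minimal sketch only; 'model', 'rdd' and 'split_element' below are made-up stand-ins.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("flatmap-transform-groupbykey").getOrCreate()
sc = spark.sparkContext

# Hypothetical model: a LogisticRegression fit on a tiny labeled DataFrame.
train_df = spark.createDataFrame(
    [(0.0, Vectors.dense([0.0, 1.0])),
     (1.0, Vectors.dense([1.0, 0.0]))],
    ["label", "features"])
model = LogisticRegression().fit(train_df)

# Hypothetical input: each element carries a key and a list of raw feature rows.
rdd = sc.parallelize([
    ("a", [[0.1, 0.9], [0.2, 0.8]]),
    ("b", [[0.9, 0.1]]),
])

def split_element(elem):
    # Explode one element into (key, features) pairs so transform() can score each row.
    key, rows = elem
    return [(key, Vectors.dense(row)) for row in rows]

# Current pattern: explode, score the whole exploded DataFrame at once, then regroup by key.
split_rdd = rdd.flatMap(split_element)
split_df = split_rdd.toDF(["key", "features"])
scored = model.transform(split_df).rdd.map(lambda row: (row["key"], row["prediction"]))
print(scored.groupByKey().mapValues(list).collect())

The groupByKey() shuffle in the last step is exactly what the question hopes to avoid by scoring each element where it already sits.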

Desired code:

split_rdd = rdd.map(split_element)
split_results = split_rdd.map(lambda elem_list: [model.transformOne(elem) for elem in elem_list])
return split_results

0 answers:

No answers yet.