如何将Spark模型保存为文件

时间:2017-05-07 01:41:31

标签: apache-spark pyspark

我正在https://spark.apache.org/docs/1.6.2/mllib-ensembles.html#random-forests测试文档代码。出于某种原因,myRandomForestClassificationModel已保存为目录。如何将其另存为文件?我是Spark的新手,所以我不确定代码中是否有任何错误。

from pyspark import SparkContext

from pyspark.mllib.tree import RandomForest, RandomForestModel
from pyspark.mllib.util import MLUtils

sc = SparkContext(appName="rf")
# Load and parse the data file into an RDD of LabeledPoint.
data = MLUtils.loadLibSVMFile(sc, '/sample_libsvm_data.txt')
# Split the data into training and test sets (30% held out for testing)
(trainingData, testData) = data.randomSplit([0.7, 0.3])

# Train a RandomForest model.
#  Empty categoricalFeaturesInfo indicates all features are continuous.
#  Note: Use larger numTrees in practice.
#  Setting featureSubsetStrategy="auto" lets the algorithm choose.
model = RandomForest.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={},
                                     numTrees=100, featureSubsetStrategy="auto",
                                     impurity='gini', maxDepth=4, maxBins=32)

# Evaluate model on test instances and compute test error
predictions = model.predict(testData.map(lambda x: x.features))
labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
testErr = labelsAndPredictions.filter(lambda (v, p): v != p).count() / float(testData.count())
print('Test Error = ' + str(testErr))
print('Learned classification forest model:')
print(model.toDebugString())

# Save and load model
model.save(sc, "/rf/myRandomForestClassificationModel")
sameModel = RandomForestModel.load(sc, "/rf/myRandomForestClassificationModel")

0 个答案:

没有答案