Unbound method createDataFrame()

Date: 2016-09-15 08:03:02

Tags: apache-spark pyspark

I get an error when trying to create a DataFrame from an RDD. My code:

from pyspark import SparkConf, SparkContext
from pyspark import sql


conf = SparkConf()
conf.setMaster('local')
conf.setAppName('Test')
sc = SparkContext(conf = conf)
print sc.version

rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)])

df = sql.SQLContext.createDataFrame(rdd, ["id", "score"]).collect()

print df

The error:

df = sql.SQLContext.createDataFrame(rdd, ["id", "score"]).collect()
TypeError: unbound method createDataFrame() must be called with SQLContext 
           instance as first argument (got RDD instance instead)

I have done the same task in the spark shell, where three lines of code directly print the value. I mainly suspect the import statements, because of the differences between the IDE and the shell.

1 Answer:

Answer 0 (score: 4)

You need to use an instance of SQLContext, not the class itself. So you can try the following:

sqlContext = sql.SQLContext(sc)
df = sqlContext.createDataFrame(rdd, ["id", "score"]).collect()

More details in the pyspark documentation.
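
For reference, a minimal sketch of the full script with that fix applied, using the same data and column names as in the question:

from pyspark import SparkConf, SparkContext
from pyspark import sql

conf = SparkConf()
conf.setMaster('local')
conf.setAppName('Test')
sc = SparkContext(conf=conf)

# Build an SQLContext instance from the SparkContext, then call
# createDataFrame on that instance rather than on the SQLContext class.
sqlContext = sql.SQLContext(sc)

rdd = sc.parallelize([(0, 1), (0, 1), (0, 2), (1, 2), (1, 10), (1, 20),
                      (3, 18), (3, 18), (3, 18)])

df = sqlContext.createDataFrame(rdd, ["id", "score"])
print(df.collect())

In the spark shell this step is hidden because a ready-made sqlContext instance is created for you at startup; in a standalone script you have to construct it yourself, as above.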