Error when running my first Spark Python program

Time: 2017-08-07 10:37:42

Tags: python hadoop apache-spark pyspark

I have been working with Spark on Python (on top of Hadoop 2.7), and I am trying to run the "word count" example. Here is my code:

# Imports
# Note: unused imports (and unused variables) must all be commented out,
# otherwise you will get errors at execution time.
# Note: neither the "@PydevCodeAnalysisIgnore" nor the "@UnusedImport"
# directive will solve that issue.
#from pyspark.mllib.clustering import KMeans
from pyspark import SparkConf, SparkContext
import os

# Configure the Spark environment
sparkConf = SparkConf().setAppName("WordCounts").setMaster("local")
sc = SparkContext(conf = sparkConf)

# The WordCounts Spark program
textFile = sc.textFile(os.environ["SPARK_HOME"] + "/README.md")
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect(): print wc

Then I get the following error:

17/08/07 12:28:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/07 12:28:16 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
File "/home/hduser/eclipse-workspace/PythonSpark/src/WordCounts.py", line  12, in <module>
sc = SparkContext(conf = sparkConf)
File "/usr/local/spark/python/pyspark/context.py", line 118, in __init__
conf, jsc, profiler_cls)
File "/usr/local/spark/python/pyspark/context.py", line 186, in _do_init
self._accumulatorServer = accumulators._start_update_server()
File "/usr/local/spark/python/pyspark/accumulators.py", line 259, in  _start_update_server
server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
File "/usr/lib/python2.7/SocketServer.py", line 417, in __init__
self.server_bind()
File "/usr/lib/python2.7/SocketServer.py", line 431, in server_bind
self.socket.bind(self.server_address)
File "/usr/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -3] Temporary failure in name resolution

Any help? I can run any Scala project with spark-shell, and any (non-Spark) Python program, in Eclipse without errors. Could my problem be something related to pyspark?

4 Answers:

Answer 0 (score: 0)

You can try this: just create the SparkContext like below, and it works fine.

sc = SparkContext()
# The WordCounts Spark program
textFile = sc.textFile("/home/your/path/Test.txt")  # Or: right-click the file, copy its path, and paste it here
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect():
    print wc
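
If you run this as a standalone script rather than typing it into the PySpark shell, one common way to launch it (a sketch, assuming SPARK_HOME is set and using the WordCounts.py file name from the traceback above) is spark-submit:

$SPARK_HOME/bin/spark-submit --master local WordCounts.py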

Answer 1 (score: 0)

Try it this way...

After you start your Spark shell, sc is already shown as the SparkContext at the command prompt.

If it is not, you can create it as follows:

>>> from pyspark import SparkContext
>>> sc = SparkContext()
Now you can use sc.
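
To double-check that sc is really there (a quick sanity check at the PySpark prompt; the exact output may differ by version):

>>> type(sc)
<class 'pyspark.context.SparkContext'>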

Answer 2 (score: 0)

This is enough to run your program, because sc is available in your shell.

First try it in shell mode...

line by line...

textFile = sc.textFile("/home/your/path/Test.txt")  # Or: right-click the file, copy its path, and paste it here
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect():
    print wc

Answer 3 (score: 0)

From my understanding, if Spark is installed correctly, the code below should work fine.

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("WordCount")
sc = SparkContext(conf = conf)

input = sc.textFile("file:///sparkcourse/PATH_NAME")
words = input.flatMap(lambda x: x.split())
wordCounts = words.countByValue()

for word, count in wordCounts.items():
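    # Drop any non-ASCII characters so the word can be printed without encoding errors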
    cleanWord = word.encode('ascii', 'ignore')
    if (cleanWord):
        print(cleanWord.decode() + " " + str(count))
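
For comparison, here is a minimal sketch of the same count done with reduceByKey (the approach used in the question and the other answers) instead of countByValue; countByValue is an action that returns the counts to the driver as a dict, while reduceByKey keeps them distributed as an RDD of (word, count) pairs:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("WordCount")
sc = SparkContext(conf = conf)

# Same example path as above
lines = sc.textFile("file:///sparkcourse/PATH_NAME")
words = lines.flatMap(lambda x: x.split())

# Pair every word with 1, then sum the 1s per word
wordCounts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

for word, count in wordCounts.collect():
    print(word + " " + str(count))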