File does not exist - spark-submit

Date: 2016-01-27 14:47:09

Tags: python apache-spark pyspark


I am trying to launch a Spark application with this command:

time spark-submit --master "local[4]" optimize-spark.py

But I get these errors:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/27 15:43:32 INFO SparkContext: Running Spark version 1.6.0
16/01/27 15:43:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/27 15:43:32 INFO SecurityManager: Changing view acls to: DamianFox
16/01/27 15:43:32 INFO SecurityManager: Changing modify acls to: DamianFox
16/01/27 15:43:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(DamianFox); users with modify permissions: Set(DamianFox)
16/01/27 15:43:33 INFO Utils: Successfully started service 'sparkDriver' on port 51613.
16/01/27 15:43:33 INFO Slf4jLogger: Slf4jLogger started
16/01/27 15:43:33 INFO Remoting: Starting remoting
16/01/27 15:43:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.102:51614]
16/01/27 15:43:33 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 51614.
16/01/27 15:43:33 INFO SparkEnv: Registering MapOutputTracker
16/01/27 15:43:33 INFO SparkEnv: Registering BlockManagerMaster
16/01/27 15:43:33 INFO DiskBlockManager: Created local directory at /private/var/folders/8m/h5qcvjrn1bs6pv0c0_nyqrlm0000gn/T/blockmgr-defb91b0-50f9-45a7-8e92-6d15041c01bc
16/01/27 15:43:33 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/01/27 15:43:33 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/27 15:43:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/27 15:43:33 INFO SparkUI: Started SparkUI at http://192.168.0.102:4040
16/01/27 15:43:33 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/Project/MinimumFunction/optimize-spark.py does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
16/01/27 15:43:34 INFO SparkUI: Stopped Spark web UI at http://192.168.0.102:4040
16/01/27 15:43:34 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/27 15:43:34 INFO MemoryStore: MemoryStore cleared
16/01/27 15:43:34 INFO BlockManager: BlockManager stopped
16/01/27 15:43:34 INFO BlockManagerMaster: BlockManagerMaster stopped
16/01/27 15:43:34 WARN MetricsSystem: Stopping a MetricsSystem that is not running
16/01/27 15:43:34 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/27 15:43:34 INFO SparkContext: Successfully stopped SparkContext
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/01/27 15:43:34 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
ERROR - failed to write data to stream: <open file '<stdout>', mode 'w' at 0x10bb6e150>

16/01/27 15:43:34 INFO ShutdownHookManager: Shutdown hook called
16/01/27 15:43:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/8m/h5qcvjrn1bs6pv0c0_nyqrlm0000gn/T/spark-c00170ca-0e05-4ece-a962-f9303bce4f9f
spark-submit --master "local[4]" optimize-spark.py  6.12s user 0.52s system 187% cpu 3.539 total

How can I fix this? Is there something wrong with the variables? I have been searching for a long time but I cannot find a solution. Thanks!

3 Answers:

Answer 0 (score: 4)

I moved the project folder to the Desktop folder and now it is working.
It probably wasn't working before because I had the project in a folder whose name contains a space, so the command most likely could not find the file.
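
A quick way to confirm this kind of path problem is to check whether the path that Spark reports actually resolves on the local file system. A minimal sketch (the folder name with a space is only a hypothetical example, not from the question):

import os

# Path taken from the error message; if the real folder is e.g. "My Project",
# an unquoted command line can end up pointing at a path that does not exist.
reported_path = "/Project/MinimumFunction/optimize-spark.py"
print(os.path.exists(reported_path))  # False when the resolved path is wrong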

Answer 1 (score: 0)

Apologies for the confusion. --py-files is meant for providing additional dependent Python files required by the program, so that they are placed on the PYTHONPATH. I tried again on Windows / Spark 1.6 and the following command works:

bin\spark-submit --master "local[4]" testingpyfiles.py

testingpyfiles.py is a simple Python file that prints some sample data to the console and is stored in the same directory from which I execute the command above. Here is the code of testingpyfiles.py:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Python App")
sc = SparkContext(conf=conf)

data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)
print("Now it will print the data")
print(distData)
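
Note that print(distData) only prints the RDD object itself (something like ParallelCollectionRDD[0] at parallelize ...), not its contents. To see the actual elements, a collect() call would do it; this extra line is an addition for illustration, not part of the original answer:

print(distData.collect())  # prints [1, 2, 3, 4, 5]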

In your case it seems that either the path is incorrect, or there may be some problem with the permissions of the file being executed. Also make sure that optimize-spark.py is in the same directory from which spark-submit is executed.

Answer 2 (score: 0)

You can solve this in two ways:

  1. Pass the file as an argument to --py-files, like this (here filepath is a path on the local file system):

    spark-submit --master "local[4]" --py-files="<filepath>/optimize-spark.py" optimize-spark.py

  2. Dump the optimize-spark.py file to HDFS and add it from the code:

    sc.addFile("hdfs:<filepath_on_hdfs>/optimize-spark.py")