Unable to run the Spark v2.0.0 example on a cluster

Asked: 2016-09-18 09:39:15

Tags: apache-spark port bind

So I have set up a Spark cluster, but I can't actually get it to work. When I submit the SparkPi example with:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://x.y.129.163:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 2 \
  examples/jars/spark-examples_2.11-2.0.0.jar 1000

I get the following from the worker logs:

Spark Command: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64/jre/bin/java -cp /opt/spark/spark-2.0.0-bin-hadoop2.7/conf/:/opt/spark/spark-2.0.0-bin-hadoop2.7/jars/* -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://mesos-master:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/09/18 09:20:56 INFO Worker: Started daemon with process name: 23949@mesos-slave-4.novalocal
16/09/18 09:20:56 INFO SignalUtils: Registered signal handler for TERM
16/09/18 09:20:56 INFO SignalUtils: Registered signal handler for HUP
16/09/18 09:20:56 INFO SignalUtils: Registered signal handler for INT
16/09/18 09:20:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/18 09:20:56 INFO SecurityManager: Changing view acls to: root
16/09/18 09:20:56 INFO SecurityManager: Changing modify acls to: root
16/09/18 09:20:56 INFO SecurityManager: Changing view acls groups to:
16/09/18 09:20:56 INFO SecurityManager: Changing modify acls groups to:
16/09/18 09:20:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/09/18 09:21:00 WARN ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
16/09/18 09:21:00 INFO Utils: Successfully started service 'sparkWorker' on port 55256.
16/09/18 09:21:00 INFO Worker: Starting Spark worker x.y.129.162:55256 with 4 cores, 6.6 GB RAM
16/09/18 09:21:00 INFO Worker: Running Spark version 2.0.0
16/09/18 09:21:00 INFO Worker: Spark home: /opt/spark/spark-2.0.0-bin-hadoop2.7
16/09/18 09:21:00 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/09/18 09:21:00 INFO WorkerWebUI: Bound WorkerWebUI to x.y.129.162, and started at http://x.y.129.162:8081
16/09/18 09:21:00 INFO Worker: Connecting to master mesos-master:7077...
16/09/18 09:21:00 INFO TransportClientFactory: Successfully created connection to mesos-master/x.y.129.163:7077 after 33 ms (0 ms spent in bootstraps)
16/09/18 09:21:00 INFO Worker: Successfully registered with master spark://x.y.129.163:7077
16/09/18 09:21:00 INFO Worker: Asked to launch driver driver-20160918090435-0001
16/09/18 09:21:01 INFO DriverRunner: Launch Command: "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64/jre/bin/java" "-cp" "/opt/spark/spark-2.0.0-bin-hadoop2.7/conf/:/opt/spark/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.executor.memory=20G" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.cores.max=2" "-Dspark.rpc.askTimeout=10" "-Dspark.driver.supervise=true" "-Dspark.jars=file:/opt/spark/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.0.jar" "-Dspark.master=spark://x.y.129.163:7077" "-XX:MaxPermSize=256m" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@x.y.129.162:55256" "/opt/spark/spark-2.0.0-bin-hadoop2.7/work/driver-20160918090435-0001/spark-examples_2.11-2.0.0.jar" "org.apache.spark.examples.SparkPi" "1000"
16/09/18 09:21:06 INFO DriverRunner: Command exited with status 1, re-launching after 1 s.
16/09/18 09:21:07 INFO DriverRunner: Launch Command: "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64/jre/bin/java" "-cp" "/opt/spark/spark-2.0.0-bin-hadoop2.7/conf/:/opt/spark/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.executor.memory=20G" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.cores.max=2" "-Dspark.rpc.askTimeout=10" "-Dspark.driver.supervise=true" "-Dspark.jars=file:/opt/spark/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.0.jar" "-Dspark.master=spark://x.y.129.163:7077" "-XX:MaxPermSize=256m" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@x.y.129.162:55256" "/opt/spark/spark-2.0.0-bin-hadoop2.7/work/driver-20160918090435-0001/spark-examples_2.11-2.0.0.jar" "org.apache.spark.examples.SparkPi" "1000"
16/09/18 09:21:12 INFO DriverRunner: Command exited with status 1, re-launching after 1 s.

I.e. the job/driver appears to fail and is then retried indefinitely.
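Since I submitted with --supervise, the master keeps relaunching the failed driver, which explains the endless retries. As a side note, here is a sketch of how I could stop the loop while debugging, by killing the driver via its submission ID (the ID taken from the worker log above):

./bin/spark-submit --kill driver-20160918090435-0001 \
  --master spark://x.y.129.163:7077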

And when I look at the launch log of the driver on the worker node, I see:

Launch Command: "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64/jre/bin/java" "-cp" "/opt/spark/spark-2.0.0-bin-hadoop2.7/conf/:/opt/spark/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.executor.memory=20G" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.cores.max=2" "-Dspark.rpc.askTimeout=10" "-Dspark.driver.supervise=true" "-Dspark.jars=file:/opt/spark/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.0.jar" "-Dspark.master=spark://x.y.129.163:7077" "-XX:MaxPermSize=256m" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@x.y.129.162:33364" "/opt/spark/spark-2.0.0-bin-hadoop2.7/work/driver-20160918090435-0001/spark-examples_2.11-2.0.0.jar" "org.apache.spark.examples.SparkPi" "1000"
========================================

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/09/18 09:13:18 INFO SecurityManager: Changing view acls to: root
16/09/18 09:13:18 INFO SecurityManager: Changing modify acls to: root
16/09/18 09:13:18 INFO SecurityManager: Changing view acls groups to: 
16/09/18 09:13:18 INFO SecurityManager: Changing modify acls groups to: 
16/09/18 09:13:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/09/18 09:13:21 WARN ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'Driver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'Driver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:463)
    at sun.nio.ch.Net.bind(Net.java:455)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

(Excuse the timestamps; the logs from the later retries look the same.)
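Following the exception's own suggestion, a possible workaround to sketch (assuming the 'Driver' service honors spark.driver.port in standalone cluster mode; 7078 is just an arbitrary free port for illustration) would be pinning the driver port explicitly:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://x.y.129.163:7077 \
  --deploy-mode cluster \
  --conf spark.driver.port=7078 \
  examples/jars/spark-examples_2.11-2.0.0.jar 1000

(Note that the exception says "Cannot assign requested address", which points at the bind address rather than the port, so this may not help.)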

On the master I have:

# /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 mesos-master
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

x.y.129.155 mesos-slave-1
x.y.129.161 mesos-slave-2
x.y.129.160 mesos-slave-3
x.y.129.162 mesos-slave-4

# conf/spark-env.sh
#!/usr/bin/env bash
SPARK_MASTER_HOST=x.y.129.163
SPARK_LOCAL_IP=x.y.129.163

And for my workers:

# /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 mesos-slave-4 mesos-slave-4.novalocal
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

x.y.129.163 mesos-master

# conf/spark-env.sh
#!/usr/bin/env bash
SPARK_MASTER_HOST=x.y.129.163
SPARK_LOCAL_IP=x.y.129.162

I have also disabled all IPv6 in /etc/sysctl.conf.
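As a sanity check (my assumption being that the driver tries to bind to whatever address the worker's own hostname resolves to), this is what name resolution looks like on mesos-slave-4, given the /etc/hosts above:

# run on mesos-slave-4 (hypothetical check, not part of the original setup)
hostname -f                   # mesos-slave-4.novalocal
getent hosts mesos-slave-4    # 127.0.0.1, per the loopback entry above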

All daemons are started with sbin/start-master.sh (on the master) and sbin/start-slave.sh spark://x.y.129.163:7077 (on the workers).
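Spelled out per host, that is:

# on the master (x.y.129.163)
sbin/start-master.sh

# on each worker
sbin/start-slave.sh spark://x.y.129.163:7077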

UPDATE: So I tried the spark-submit again, but without --deploy-mode cluster .... and it works! Any idea why it doesn't work in cluster mode?
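For reference, the working invocation is the same command minus --deploy-mode cluster (and minus --supervise, which only applies to cluster mode):

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://x.y.129.163:7077 \
  --executor-memory 20G \
  --total-executor-cores 2 \
  examples/jars/spark-examples_2.11-2.0.0.jar 1000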

0 Answers