Py4JJavaError: SparkException: Job aborted due to stage failure

Asked: 2019-10-17 15:39:46

Tags: python python-3.x apache-spark pyspark

I am using Spark through pyspark. I am running the following toy example (in a Jupyter Notebook):

import findspark
findspark.init()

import pyspark
import random

sc = pyspark.SparkContext(appName="Pi")
num_samples = 10000

def inside(p):
    # p is unused; each call draws one random point in the unit square
    x, y = random.random(), random.random()
    return x*x + y*y < 1

count = sc.parallelize(range(0, num_samples)).filter(inside).count()

pi = 4 * count / num_samples
print(pi)

sc.stop()

With num_samples = 100 or similar it runs fine, but with the value above it returns an error about the Python workers:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost, executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
        [...]
    Caused by: org.apache.spark.SparkException: Python worker failed to connect back.
        [...]
    Caused by: java.net.SocketTimeoutException: Accept timed out
        [...]
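
For reference, this failure pattern ("Python worker failed to connect back" followed by "Accept timed out") is often reported when the driver and the worker processes resolve different Python interpreters. Below is a minimal sketch of one frequently suggested mitigation, assuming an interpreter mismatch is the cause; PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are standard PySpark environment variables, and they must be set before the SparkContext is created:

import os
import sys

# Assumption: pin both the driver and the workers to the interpreter
# running this notebook. This has no effect on an already-running context.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

import findspark
findspark.init()

import pyspark
sc = pyspark.SparkContext(appName="Pi")

Whether this applies here depends on the local setup; it is only one common cause of this exception.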

0 Answers