PySpark master node file location configuration

Date: 2019-02-27 07:47:52

Tags: pyspark bigdata data-science

These are my Spark configuration properties. My master node runs on a Linux operating system.

spark = SparkSession.builder \
    .master("spark://ip:7077") \
    .appName("users mobile related information analysis") \
    .config("spark.submit.deployMode", "client") \
    .config("spark.executor.memory", "2g") \
    .config("spark.driver.maxResultSize", "2g") \
    .config("spark.executor.pyspark.memory", "2g") \
    .config("spark.driver.memory", "2g") \
    .enableHiveSupport() \
    .getOrCreate()
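
For context, with spark.submit.deployMode set to client, the driver runs on the machine that builds this SparkSession (the local PC), while the executors run on the cluster nodes. A minimal sanity check, assuming the session above was created successfully:

print(spark.sparkContext.master)                                    # spark://ip:7077
print(spark.sparkContext.getConf().get("spark.submit.deployMode"))  # client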

But when I try to read a CSV file from a local directory on my PC with the following code,

sep_1_customer_all_info_df = spark.read \
    .format("csv") \
    .option("header", "true") \
    .option("mode", "PERMISSIVE") \
    .load('report/info.csv')
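
For what it's worth, Spark resolves the relative path 'report/info.csv' against the driver's working directory, which is how it becomes the absolute Windows path in the error below. A sketch of the same read with the scheme spelled out explicitly (the absolute path is the one from the error message):

# file:// pins the read to the local filesystem; the path is the one
# Spark resolved in the error message below.
local_uri = "file:///C:/Users/taimur.islam/Desktop/banglalink/Data Science/High Value Prediction/report/info.csv"
df = spark.read.format("csv").option("header", "true").load(local_uri)

Note that an explicit URI alone does not help if the executors on the Linux cluster cannot see that Windows path.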

I get the following error. What is causing it, and how can I handle it?

Py4JJavaError: An error occurred while calling o672.load.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 19, ip, executor 0): java.io.FileNotFoundException: File file:/C:/Users/taimur.islam/Desktop/banglalink/Data Science/High Value Prediction/report/info.csv does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
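
One plausible reading of the failure, for context: the read is executed by executors on the Linux cluster, and they cannot open a path that exists only on the local Windows PC. A minimal workaround sketch under that assumption is to load the file on the driver with pandas and hand the rows to Spark, so no executor touches the local path:

import pandas as pd

# In client mode the driver runs on the local PC, so pandas can read the file
# even though the executors on the Linux cluster cannot.
pdf = pd.read_csv("report/info.csv")

# createDataFrame ships the rows from the driver to the cluster, so the
# executors never need to open the local Windows path.
sep_1_customer_all_info_df = spark.createDataFrame(pdf)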
