spark-submit with pyspark -- ModuleNotFoundError: No module named 'CommonPackage'

Asked: 2021-06-25 17:12:05

Tags: python apache-spark pyspark spark-submit

I am using the following project structure and keep all the reusable classes in sparkCommonLib.py inside the CommonPackage module (a packaging sketch follows the listing below).

 - README.rst
 - LICENSE
 - setup.py
 - requirements.txt
 - CommonPackage/__init__.py
 - CommonPackage/sparkCommonLib.py
 - CommonPackage/config.xml
 - pyspark/__init__.py
 - pyspark/SparkTableInsert.py
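
For context, a minimal setup.py for this layout might look like the sketch below. This is not the actual file: the project name and version are assumed from the sdist file name SparkTableInsert-0.0.0.zip, and the package_data and install_requires entries are assumptions based on the files and imports shown in this question.

# setup.py -- minimal sketch for the layout above, not the actual file
from setuptools import setup

setup(
    name="SparkTableInsert",                                      # assumed from SparkTableInsert-0.0.0.zip
    version="0.0.0",
    packages=["CommonPackage"],
    package_data={"CommonPackage": ["config.xml", "jars/*.jar"]},  # ship the non-Python files too
    install_requires=["pandas", "numpy"],                          # assumed from the application imports
)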

Below is the common class in sparkCommonLib.py that sets up the Spark session for the different Spark applications in the pyspark module.

import os
from pyspark.sql import SparkSession

class set_spark_session():
    """Builds a SparkSession configured with the JDBC driver jars."""
    def __init__(self, appname, master=None):
        sparkenv = set_spark()  # reads environment/config settings (defined elsewhere in this module)
        appName = appname
        if master is None:
            master = sparkenv.MASTER
        self.batchsize = sparkenv.BATCH_SIZE
        self.maxpartition = sparkenv.MAX_PART
        drivers_path = os.path.normpath(sparkenv.DRIVER_PATH)
        jars = os.path.join(drivers_path, "mssql-jdbc-8.4.1.jre8.jar") \
               + "," + os.path.join(drivers_path, "jconn4.jar")

        self.spark = SparkSession \
            .builder \
            .config("spark.driver.extraClassPath", drivers_path) \
            .config("spark.jars", jars) \
            .appName(appName) \
            .master(master).getOrCreate()

Below is the application code in pyspark/SparkTableInsert.py inside the pyspark module.

import numpy as np, time, sys, os, pandas as pd
from CommonPackage import sparkCommonLib as sparkCommon

if __name__ == '__main__':
    spark = sparkCommon.set_spark_session(appname="SparkTableInsert").spark
    spark.sparkContext.setLogLevel("ERROR")
    log = sparkCommon.common.PrintLogInfo()
    ## Some Spark processing steps
    spark.stop()

The application runs fine when I install the package with pip install, but when I run the same application as a spark-submit job, I get the following error.

spark-submit --jars ".\CommonPackage\jars\jconn4.jar,.\CommonPackage\jars\mssql-jdbc-8.4.1.jre8.jar" --py-files "dependencies.zip" ".\pyspark\SparkTableInsert.py" <Arg1> <Arg2> 

Traceback (most recent call last):
  File "./pyspark/SparkTableInsert.py", line 2, in <module>
    from CommonPackage import sparkCommonLib as sparkCommon
ModuleNotFoundError: No module named 'CommonPackage'
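
As I understand it, --py-files adds the zip archive itself to the Python search path on the driver and the executors, so the import can only resolve if CommonPackage/__init__.py sits at the top level inside dependencies.zip. For reference, a hypothetical helper for building such a zip from the project root (just a sketch, not part of the project):

# build_deps.py -- hypothetical sketch: zip CommonPackage so it sits at the archive root
import zipfile
from pathlib import Path

def build_dependencies_zip(zip_name="dependencies.zip", package_dir="CommonPackage"):
    # Store entries as CommonPackage/... so "from CommonPackage import sparkCommonLib"
    # can be resolved directly from the archive once it is on sys.path.
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in Path(package_dir).rglob("*"):
            if path.is_file():
                zf.write(path, arcname=path.as_posix())

if __name__ == "__main__":
    build_dependencies_zip()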

I also tried adding the full application egg or zip file created with "python setup.py sdist", but got the same result.

spark-submit --jars ".\CommonPackage\jars\jconn4.jar,.\CommonPackage\jars\mssql-jdbc-8.4.1.jre8.jar" --py-files "SparkTableInsert-0.0.0.zip" ".\pyspark\SparkTableInsert.py" <arg1> <arg2>                                                                                                                                      Traceback (most recent call last):
  File "./pyspark/SparkTableInsert.py", line 2, in <module>
    **from CommonPackage import sparkCommonLib as sparkCommon**
ModuleNotFoundError: No module named 'CommonPackage'

spark-submit --jars ".\CommonPackage\jars\jconn4.jar,.\CommonPackage\jars\mssql-jdbc-8.4.1.jre8.jar" --py-files "dependencies.zip" --archives "SparkTableInsert-0.0.0.zip" ".\pyspark\SparkTableInsert.py" <arg1> <arg2>                                                                                                        Traceback (most recent call last):
  File "./pyspark/SparkTableInsert.py", line 2, in <module>
    from CommonPackage import sparkCommonLib as sparkCommon
ModuleNotFoundError: No module named 'CommonPackage'

spark-submit --jars ".\CommonPackage\jars\jconn4.jar,.\CommonPackage\jars\mssql-jdbc-8.4.1.jre8.jar" --archives "SparkTableInsert-0.0.0.zip"  --py-files "dependencies.zip" ".\pyspark\SparkTableInsert.py" <arg1> <arg2>                                                                                                      Traceback (most recent call last):
  File "./pyspark/SparkTableInsert.py", line 2, in <module>
    from CommonPackage import sparkCommonLib as sparkCommon
ModuleNotFoundError: No module named 'CommonPackage'

I also tried adding the following lines in ./pyspark/SparkTableInsert.py, but I get the same error.

spark.sparkContext.addPyFile("SparkTableInsert-0.0.0.zip")
spark.sparkContext.addPyFile("dependencies.zip")

I am testing spark-submit on my local machine before pushing it as a Google Spark job to run on a Google cluster.
