I am trying to submit the following test.py Spark application on a YARN cluster with this command:
PYSPARK_PYTHON=./venv/venv/bin/python spark-submit --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/venv/bin/python --master yarn --deploy-mode cluster --archives venv#venv test.py
Note: I am not using local mode. I am trying to use the Python 3.7 site-packages from the virtualenv that I use to build the code in PyCharm, because the custom packages my application depends on are not provided as cluster services.
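For reference, a packaging sketch of how such an environment is usually shipped to YARN (this assumes the virtualenv is relocatable and that the archive, rather than the bare `venv` directory, is what `--archives` should reference; it is a sketch, not the poster's exact setup, and it requires a live YARN cluster to actually run):

```shell
# Build the archive so it contains a top-level venv/ directory
tar -czf venv.tar.gz venv/

# Ship the archive; '#venv' is the link name YARN creates in each
# container's working directory, so ./venv/venv/bin/python resolves there
PYSPARK_PYTHON=./venv/venv/bin/python spark-submit \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/venv/bin/python \
  --master yarn \
  --deploy-mode cluster \
  --archives venv.tar.gz#venv \
  test.py
```

Note that the original command passes `--archives venv#venv` (the directory) even though a `venv.tar.gz` exists in the listing below; `--archives` expects an archive file.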
This is what the Python project structure and the venv directory contents look like:
-rw-r--r-- 1 schakrabarti nobody 225908565 Feb 26 13:07 venv.tar.gz
-rw-r--r-- 1 schakrabarti nobody 1313 Feb 26 13:07 test.py
drwxr-xr-x 6 schakrabarti nobody 4096 Feb 26 13:07 venv
drwxr-xr-x 3 schakrabarti nobody 4096 Feb 26 13:07 venv/bin
drwxr-xr-x 3 schakrabarti nobody 4096 Feb 26 13:07 venv/share
-rw-r--r-- 1 schakrabarti nobody 75 Feb 26 13:07 venv/pyvenv.cfg
drwxr-xr-x 2 schakrabarti nobody 4096 Feb 26 13:07 venv/include
drwxr-xr-x 3 schakrabarti nobody 4096 Feb 26 13:07 venv/lib
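For reference, each `--archives` entry is interpreted as `<path>#<alias>`: YARN's distributed cache unpacks the archive and links it into the container working directory under the alias, which is why `PYSPARK_PYTHON=./venv/venv/bin/python` can resolve inside the container. A rough illustration of how the fragment is split (the helper name is mine, not Spark's API):

```python
def split_archive_spec(spec):
    """Split a spark-submit --archives entry of the form '<path>#<alias>'.

    Illustrative only: the part after '#' is the link name created in the
    container's working directory; without a '#', the file's basename is used.
    """
    path, sep, alias = spec.partition("#")
    if not sep:  # no alias given: fall back to the basename of the path
        alias = path.rsplit("/", 1)[-1]
    return path, alias

print(split_archive_spec("venv.tar.gz#venv"))
```

So `venv.tar.gz#venv` ships `venv.tar.gz` and exposes its contents under `./venv/` in each container.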
I keep getting the same "File does not exist" error for pyspark.zip (shown below):
java.io.FileNotFoundException: File does not exist: hdfs://hostname-nn1.cluster.domain.com:8020/user/schakrabarti/.sparkStaging/application_1571868585150_999337/pyspark.zip
Please refer to the comment I added on SPARK-10795: https://issues.apache.org/jira/browse/SPARK-10795
Answer 0 (score: 0)
Apologies if I have misunderstood the question, but according to your spark-submit command
PYSPARK_PYTHON=./venv/venv/bin/python spark-submit --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/venv/bin/python --master yarn --deploy-mode cluster --archives venv#venv test.py
you are submitting to a YARN cluster, but in your test.py you are trying to connect to a Spark standalone cluster:
# test.py
import json
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder \
        .appName("Test_App") \
        .master("spark://gwrd352n36.red.ygrid.yahoo.com:41767") \
        .config("spark.ui.port", "4057") \
        .config("spark.executor.memory", "4g") \
        .getOrCreate()
    print(json.dumps(spark.sparkContext.getConf().getAll(), indent=4))
    spark.stop()
So this could be the problem.
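A minimal sketch of the suggested fix: drop the hard-coded `.master(...)` call so that the `--master yarn` flag from spark-submit determines where the application runs (this is an illustrative rewrite of the poster's test.py; it needs a real YARN cluster to execute, so it cannot be run standalone):

```python
# test.py — sketch: no .master() call, so --master yarn from spark-submit applies
import json
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder \
        .appName("Test_App") \
        .config("spark.executor.memory", "4g") \
        .getOrCreate()
    # Dump the effective configuration to verify which master was picked up
    print(json.dumps(spark.sparkContext.getConf().getAll(), indent=4))
    spark.stop()
```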