py4j.protocol.Py4JJavaError when running a Python script through PySpark

Asked: 2018-10-14 18:03:40

Tags: python apache-spark pyspark apache-spark-sql

I am new to Spark and PySpark. I am trying to run a Python script that reads data from a MySQL database, as shown in the following code:

from pyspark.sql import SparkSession
from pyspark.sql import SQLContext


sc = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

def mysql_connection():

    sql = SQLContext(sc)

    dataframe = sql.read.format("jdbc").options(
        url="jdbc:mysql://localhost/evidencia",
        driver="com.mysql.cj.jdbc.Driver",
        dbtable="estados",
        user="root",
        password="").load()

    output = dataframe.collect()

    print ("_____________ OUTPUT _____________")
    print (output)

mysql_connection()
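
As a side note (not the cause of the error below): in Spark 2.x the SparkSession returned by the builder can run the JDBC read directly, so the extra SQLContext wrapper is not strictly needed. A minimal sketch using the same connection options as above, assuming the same local MySQL database and table:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

# Read the "estados" table over JDBC straight from the SparkSession
dataframe = spark.read.format("jdbc").options(
    url="jdbc:mysql://localhost/evidencia",
    driver="com.mysql.cj.jdbc.Driver",
    dbtable="estados",
    user="root",
    password="").load()

print(dataframe.collect())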

The loading part works fine, but when I run collect() or any other action on the dataframe, the following error appears:

  

Traceback (most recent call last):
  File "/home/gustavo/Documentos/TCC/prototipo/connections/MysqlConnection.py", line 27, in <module>
    mysql_connection()
  File "/home/gustavo/Documentos/TCC/prototipo/connections/MysqlConnection.py", line 22, in mysql_connection
    output = dataframe.collect()
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 466, in collect
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o51.collectToPython.
: java.lang.IllegalArgumentException
  at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
  at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
  at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
  at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
  at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
  at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
  at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
  at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
  at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
  at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
  at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
  at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
  at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
  at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
  at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
  at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
  at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:297)
  at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3200)
  at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3197)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3197)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:564)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:282)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:238)
  at java.base/java.lang.Thread.run(Thread.java:844)

I have searched for this error but could not find a solution.

I am using an Anaconda virtual environment with Python 3.6.6 and Spark 2.3.2.

I run the script with the following command (on Ubuntu 18.04, by the way):

$SPARK_HOME/bin/spark-submit --jars /usr/share/java/mysql-connector-java-8.0.12.jar ~/Documentos/TCC/prototipo/connections/MysqlConnection.py

If you need more information to understand the problem, just ask.

Thank you.

1 Answer:

Answer 0 (score: 0)

So, apparently it was the Java version that was causing the problem.

I was using openjdk-11.0.2; after switching to Oracle Java 8, the script ran just fine.
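
For anyone hitting the same error: spark-submit picks the JVM from JAVA_HOME, so one way to apply the fix without changing the system default is to point JAVA_HOME at a Java 8 installation just for that command. The path below is only an example and depends on where Java 8 is installed on your machine:

JAVA_HOME=/usr/lib/jvm/java-8-oracle $SPARK_HOME/bin/spark-submit --jars /usr/share/java/mysql-connector-java-8.0.12.jar ~/Documentos/TCC/prototipo/connections/MysqlConnection.py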
