spark.read is not a member of package org.apache.spark

Date: 2019-06-13 11:22:48

Tags: scala apache-spark intellij-idea jdbc

I am trying to connect PostgreSQL and Spark in IntelliJ. However, even though I have included the JDBC driver in build.sbt, I still get the error object read is not a member of package org.apache.spark.

I am trying to follow this tutorial: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html. Here is my Scala code:

import org.apache.spark

object DBConn {

  def main(args: Array[String]): Unit = {

    // Note: JDBC loading and saving can be achieved via either the load/save or jdbc methods
    // Loading data from a JDBC source
    val jdbcDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://host/db")
      .option("dbtable", "chroniker_log")
      .option("user", "username")
      .option("password", "password")
      .load()

    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF2 = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)
    // Specifying the custom data types of the read schema
    connectionProperties.put("customSchema", "id DECIMAL(38, 0), name STRING")
    val jdbcDF3 = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)
  }
}

build.sbt:

name := "DBConnect"

version := "0.1"

scalaVersion := "2.11.12"

val sparkVersion = "2.4.3"

resolvers ++= Seq(
  "apache-snapshots" at "http://repository.apache.org/snapshots/"
)

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.postgresql" % "postgresql" % "42.2.5"
)

I tried to simplify the problem by running spark-shell from the console, but the following command throws the same warning:

spark-shell --driver-class-path postgresql-42.2.5.jar --jars postgresql-42.2.5.jar -i src/main/scala/DBConn.scala

Interestingly, when I drop into the spark-shell prompt after the above code fails, it starts recognizing spark.read and connects to the database successfully.

1 Answer:

Answer 0 (score: 1)

You need a SparkSession instance, which is conventionally named spark (spark-shell creates one for you under that name). See this tutorial:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

So read is not a method on the package object org.apache.spark; it is a method of the class SparkSession.
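
For completeness, here is a minimal sketch of how the program in the question could be adapted. It reuses the placeholder URL, table name, and credentials from the question; the master("local[*]") setting and the spark-sql dependency ("org.apache.spark" %% "spark-sql" % sparkVersion in build.sbt) are my assumptions, since SparkSession and DataFrameReader live in the spark-sql module rather than in spark-core:

import java.util.Properties

import org.apache.spark.sql.SparkSession

object DBConn {

  def main(args: Array[String]): Unit = {

    // Create (or reuse) a SparkSession; spark-shell predefines one named
    // "spark", which is why the same code works there interactively.
    val spark = SparkSession
      .builder()
      .appName("DBConn")
      .master("local[*]") // assumption: running locally from IntelliJ
      .getOrCreate()

    // spark.read now resolves to SparkSession.read, which returns a DataFrameReader
    val jdbcDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://host/db")
      .option("dbtable", "chroniker_log")
      .option("user", "username")
      .option("password", "password")
      .load()

    // Equivalent read via the jdbc() method and a Properties object
    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF2 = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

    spark.stop()
  }
}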