What are all the ways to run Scala code in Apache Spark?

Asked: 2015-04-13 07:54:36

Tags: scala apache-spark

I know of two ways to run Scala code in Apache Spark:

1- Using spark-shell
2- Making a jar file from our project and using spark-submit to run it

Are there any other ways to run Scala code in Apache Spark? For example, can I run a Scala object (e.g. object.scala) directly in Apache Spark?

Thanks

1 answer:

Answer 0 (score: 2)

1. Using spark-shell

2. Building a jar file from your project and running it with spark-submit

3. Running a Spark job programmatically (Java example):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class App {
    public static void main(String[] args) {
        String sourcePath = "hdfs://hdfs-server:54310/input/*";

        // Configure the job and point it at a standalone Spark master.
        SparkConf conf = new SparkConf().setAppName("TestLineCount");
        // Ship the jar containing this class to the executors.
        conf.setJars(new String[] { App.class.getProtectionDomain()
                .getCodeSource().getLocation().getPath() });
        conf.setMaster("spark://spark-server:7077");
        conf.set("spark.driver.allowMultipleContexts", "true");

        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> log = sc.textFile(sourcePath);

        // Placeholder filter that keeps every line.
        JavaRDD<String> lines = log.filter(x -> {
            return true;
        });

        System.out.println(lines.count());
    }
}

Scala version:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Silence the verbose Spark and Akka logging.
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)
    val logFile = "/tmp/logs.txt"

    // Run locally; replace "local" with a cluster master URL to run on a cluster.
    val conf = new SparkConf()
        .setAppName("Simple Application")
        .setMaster("local")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()

    println("line count: " + logData.count())
  }
}
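
To run the Scala version outside the shell, one common option is to build it with sbt. Below is a minimal build.sbt sketch; the project name and the Scala and Spark versions are assumptions, so adjust them to match your installation:

// Minimal build.sbt sketch (name and versions are assumptions; match them to your setup).
name := "simple-app"

version := "0.1"

scalaVersion := "2.10.4"

// Spark core dependency; pick the version your cluster runs.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

With this in place, "sbt run" executes the local example above, while "sbt package" produces a jar you can pass to spark-submit (option 2).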

For more details, see this blog post.