How to execute a Spark 2 action with Oozie 4.3 on AWS EMR

Date: 2017-08-15 03:37:43

Tags: oozie emr apache-spark-2.0

I am using AWS EMR 5.7.0 with Oozie 4.3.0 and Spark 2.1.1.

I wrote a simple Spark program in Scala. It runs fine when executed from the shell with spark-submit.
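For reference, the working spark-submit invocation has roughly this shape (the class name and jar path match the workflow below; the exact flags I used are not shown in the original post, so treat this as a sketch):

```shell
# Hypothetical spark-submit command matching the workflow's <class> and <jar>;
# the actual options are not given in the original post.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class TestPackage.TestObj \
  /home/hadoop/oozie-test.jar
```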

But when I try to execute the same program through an Oozie Spark action, it fails with an error.

Job.properties:

nameNode=hdfs://ip-xx-xx-xx-xx.ec2.internal:8020
jobTracker=ip-xx-xx-xx-xx.ec2.internal:8032
master=local
oozie.use.system.libpath=true   
oozie.wf.application.path=hdfs://ip-xx-xx-xx-xx.ec2.internal:8020/test-artifacts/
oozie.action.sharelib.for.spark = spark2

Workflow.xml:

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="Test program">
    <start to="spark-node" />
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>Spark on Oozie - test job</name>
            <class>TestPackage.TestObj</class>
            <jar>/home/hadoop/oozie-test.jar</jar>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message</message>
    </kill>
    <end name="end" />
</workflow-app>

Workflow.xml is stored in HDFS, and job.properties is on the master node. When the Oozie job is launched with the command "oozie job -oozie http://ip-xx-xx-xx-xx.ec2.internal:11000/oozie -config job.properties -run", a map-reduce launcher job starts. No Spark job is ever launched, and the MapReduce job fails with an error.

1) With Spark master = yarn-cluster and mode = cluster, I get the following exception:

Log file: /mnt/yarn/usercache/hadoop/appcache/application_1502719828530_0011/container_1502719828530_0011_01_000001/spark-oozie-job_1502719828530_0011.log  not present. Therefore no Hadoop job IDs found.
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, null
java.lang.NullPointerException
    at java.io.File.<init>(File.java:277)
    at org.apache.spark.deploy.yarn.Client.addDistributedUri$1(Client.scala:416)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:454)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:580)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:579)
    at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:579)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:578)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:578)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:814)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:169)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1091)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:340)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:259)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:60)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:80)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

2) With master = yarn, local[*], local[1], or local, and mode = 'client', I get the following error:

Log file: /mnt/yarn/usercache/hadoop/appcache/application_1502719828530_0013/container_1502719828530_0013_01_000001/spark-oozie-job_1502719828530_0013.log  not present. Therefore no Hadoop job IDs found.
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, No FileSystem for scheme: org.apache.spark
java.io.IOException: No FileSystem for scheme: org.apache.spark
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2708)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2715)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2751)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2733)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:377)
    at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$downloadFileList$2.apply(SparkSubmit.scala:850)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$downloadFileList$2.apply(SparkSubmit.scala:850)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.deploy.SparkSubmit$.downloadFileList(SparkSubmit.scala:850)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$2.apply(SparkSubmit.scala:317)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$2.apply(SparkSubmit.scala:317)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:317)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:340)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:259)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:60)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:80)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

===============================

According to the link https://issues.apache.org/jira/plugins/servlet/mobile#issue/OOZIE-2767, Oozie does not yet support Spark 2 actions.

However, the link https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html suggests there is a workaround.

In addition to the Hortonworks link, I also followed all the steps mentioned in https://aws.amazon.com/blogs/big-data/use-apache-oozie-workflows-to-automate-apache-spark-jobs-and-more-on-amazon-emr/

But no luck so far. I could not find any documentation that confirms whether Oozie + Spark 2 is supported or not. If anyone has gotten this working, please provide detailed steps on how to make Oozie + Spark 2 work on AWS EMR.
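For context, the Hortonworks workaround linked above boils down to publishing the Spark 2 jars as a separate `spark2` sharelib and pointing the action at it, which the `oozie.action.sharelib.for.spark=spark2` line in job.properties already does. A hedged sketch of those steps on EMR, assuming the default sharelib location (the `lib_<TIMESTAMP>` directory name varies per cluster and is a placeholder here):

```shell
# List the current Oozie sharelib to find the lib_<TIMESTAMP> directory
oozie admin -oozie http://ip-xx-xx-xx-xx.ec2.internal:11000/oozie -shareliblist

# Create a spark2 sharelib directory next to the stock spark one
# (replace lib_<TIMESTAMP> with the actual directory from the listing)
hdfs dfs -mkdir -p /user/oozie/share/lib/lib_<TIMESTAMP>/spark2

# Copy the Spark 2 jars and the Oozie Spark sharelib jar into it
hdfs dfs -put /usr/lib/spark/jars/* /user/oozie/share/lib/lib_<TIMESTAMP>/spark2/
hdfs dfs -cp /user/oozie/share/lib/lib_<TIMESTAMP>/spark/oozie-sharelib-spark*.jar \
  /user/oozie/share/lib/lib_<TIMESTAMP>/spark2/

# Make Oozie pick up the new sharelib without a restart
oozie admin -oozie http://ip-xx-xx-xx-xx.ec2.internal:11000/oozie -sharelibupdate
```

This is not something I can confirm works on EMR 5.7.0; it is a transcription of the HDP-oriented procedure into EMR paths.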

0 Answers:

No answers yet