The correct way to exit a Structured Streaming process in Spark 2.3.0

Time: 2018-03-21 13:11:49

Tags: apache-spark spark-structured-streaming

I am learning Structured Streaming from the book Spark: The Definitive Guide. The first example reads from a stream of JSON files, then exits after polling an in-memory table 5 times, using the following code snippet:

val streaming = spark.readStream.schema(dataSchema)
  .option("maxFilesPerTrigger", 1)
  .json("/data/activity-data/")

val activityCounts = streaming.groupBy("gt").count()

val activityQuery = activityCounts.writeStream
.queryName("activity_counts")
.option("checkpointLocation","/runtime/checkpoint/spark_sstreaming/app01")
.format("memory").outputMode("complete")
.start()

 // ...
for (i <- 1 to 5) {
  spark.sql("SELECT * FROM activity_counts").collect()
  Thread.sleep(1000)
}

// ... 
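// stop the query, wait for it to terminate, then exit the JVM explicitly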
activityQuery.stop()
activityQuery.awaitTermination()
logger.info(s"quiting program now")
System.exit(0)

I submit this application to a 3-node YARN cluster (a sketch of the submit command follows the log excerpt below). The problem I get is that these lines appear in the application's stdout:

27516 [Driver] INFO sparkAppDriverLogger  - quiting program now
27524 [task-result-getter-2] WARN org.apache.spark.scheduler.TaskSetManager  - Lost task 90.0 in stage 9.0 (TID 791, data02, executor 4): TaskKilled (Stage cancelled)
27527 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager  - Lost task 70.0 in stage 9.0 (TID 787, data01, executor 3): TaskKilled (Stage cancelled)
27537 [task-result-getter-1] WARN org.apache.spark.scheduler.TaskSetManager  - Lost task 80.0 in stage 9.0 (TID 792, data03, executor 2): TaskKilled (Stage cancelled)
27538 [task-result-getter-3] WARN org.apache.spark.scheduler.TaskSetManager  - Lost task 81.0 in stage 9.0 (TID 793, data03, executor 2): TaskKilled (Stage cancelled)
27542 [task-result-getter-2] WARN org.apache.spark.scheduler.TaskSetManager  - Lost task 102.0 in stage 9.0 (TID 795, data02, executor 1): TaskKilled (Stage cancelled)
27543 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager  - Lost task 101.0 in stage 9.0 (TID 794, data02, executor 1): TaskKilled (Stage cancelled)
27546 [task-result-getter-1] WARN org.apache.spark.scheduler.TaskSetManager  - Lost task 62.0 in stage 9.0 (TID 776, data01, executor 3): TaskKilled (Stage cancelled)
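For context, the submit command looks roughly like this; the class and jar names are placeholders, not the real ones:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.ActivityCountsApp \
  activity-counts-app.jar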

Somehow YARN decides the application attempt has failed and retries the application. It makes no difference whether I use the "checkpointLocation" option or not. The retried attempt runs from the beginning of the stream. This problem repeats for every submission, so I end up running the application twice per submit. In the Spark History Server I can see some killed stages and messages like this:

Job 5 cancelled part of cancelled job group 1ff5bc87-6c56-4e5f-abb1-2becf2bd9ac0 

Any suggestions on what I should look into?

[UPDATE] I found the solution right after posting this question: it is the call to System.exit(0) that causes the problem. If I take it out and let the Scala application exit normally, the YARN application attempt is marked SUCCEEDED. Lines like this one also turn out not to matter:

27546 [task-result-getter-1] WARN org.apache.spark.scheduler.TaskSetManager  - Lost task 62.0 in stage 9.0 (TID 776, data01, executor 3): TaskKilled (Stage cancelled)
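For completeness, a minimal sketch of the working exit sequence, assuming the same activityQuery as above; the only change is dropping System.exit(0):

// stop the streaming query and block until it has fully terminated
activityQuery.stop()
activityQuery.awaitTermination()
// no System.exit(0): returning from main lets Spark shut down cleanly,
// and YARN marks the application SUCCEEDED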

1 Answer:

Answer 0 (score: 0):

How about setting this configuration to true? It defaults to false, at least in 2.3.0:

spark.streaming.stopGracefullyOnShutdown

You can set the configuration through the SparkSession builder:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .config("spark.streaming.stopGracefullyOnShutdown", true)
  .getOrCreate()
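The same property can also be supplied at submit time with --conf spark.streaming.stopGracefullyOnShutdown=true, which leaves the application code unchanged.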