Missing shuffle files

Date: 2015-12-29 23:51:45

Tags: apache-spark yarn apache-spark-1.5.2

I'm getting random occurrences of shuffle files not being written by Spark.

15/12/29 17:30:26 ERROR server.TransportRequestHandler: Error sending result 
    ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=347837678000, chunkIndex=0}, 
    buffer=FileSegmentManagedBuffer{file=/data/24/hadoop/yarn/local/usercache/root/appcache
    /application_1451416375261_0032/blockmgr-c2e951bb-856d-487f-a5be-2b3194fdfba6/1a/
    shuffle_0_35_0.data, offset=1088736267, length=8082368}} 
    to /10.7.230.74:42318; closing connection
java.io.FileNotFoundException: 
    /data/24/hadoop/yarn/local/usercache/root/appcache/application_1451416375261_0032/
    blockmgr-c2e951bb-856d-487f-a5be-2b3194fdfba6/1a/shuffle_0_35_0.data 
    (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    ...

It seems that most of the shuffle files are written successfully, just not all of them.

This happens in the shuffle stage, i.e. the stage that reads the shuffle files.

At first, all executors are able to read the files. Eventually, and inevitably, one of the executors throws the exception above and gets removed. All the others then start failing because they can no longer fetch that executor's shuffle files.
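As far as I understand, unless the YARN external shuffle service is enabled, shuffle output is served by the executor that wrote it, which would explain why losing one executor makes its blocks unreachable for everyone else. A minimal sketch of enabling that service from the application side (assuming the spark_shuffle aux-service is also configured on the NodeManagers, which I have not verified here) would look roughly like this:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: with the external shuffle service, shuffle files are served by the
    // NodeManager instead of the executor, so an executor being removed does not by
    // itself make its shuffle output unreachable. The app name is a placeholder.
    val conf = new SparkConf()
      .setAppName("shuffle-debug")
      .set("spark.shuffle.service.enabled", "true")
    val sc = new SparkContext(conf)

I don't know whether that is relevant here, though, since in my case the data file itself appears to be gone from disk.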

(screenshot of the executors list from the Spark UI)

I have 40 GB of RAM on each executor, and I have 8 executors. The extra one in this list is there because an executor was removed after it failed. My data is large, but I haven't seen any out-of-memory problems.
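For reference, the executor sizing described above corresponds roughly to settings like the following (a sketch; the app name and anything beyond the instance count and memory are placeholders):

    import org.apache.spark.SparkConf

    // Sketch of the resources described above: 8 executors with 40 GB each.
    // The application name is a placeholder.
    val conf = new SparkConf()
      .setAppName("my-job")
      .set("spark.executor.instances", "8")
      .set("spark.executor.memory", "40g")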

Any ideas?

Update: after changing my repartition call from 1000 partitions to 100000 partitions, I'm now getting a new stack trace.

Job aborted due to stage failure: Task 71 in stage 9.0 failed 4 times, most recent failure: Lost task 71.3 in stage 9.0 (TID 2831, dev-node1): java.io.IOException: FAILED_TO_UNCOMPRESS(5)
    at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
    at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
    at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:444)
    at org.xerial.snappy.Snappy.uncompress(Snappy.java:480)
    at org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:135)
    at org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:92)
    at org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
    at org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:159)
    at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1179)
    at org.apache.spark.shuffle.hash.HashShuffleReader$$anonfun$3.apply(HashShuffleReader.scala:53)
    at org.apache.spark.shuffle.hash.HashShuffleReader$$anonfun$3.apply(HashShuffleReader.scala:52)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:173)
    at org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$executePartition$1(sort.scala:160)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$4.apply(sort.scala:169)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$4.apply(sort.scala:169)
    at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
    ...
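For what it's worth, the change mentioned above is nothing more exotic than this (a hypothetical sketch; `df` stands in for my actual DataFrame, which isn't shown here):

    import org.apache.spark.sql.DataFrame

    // Hypothetical sketch of the repartition change described above.
    // Only the partition count changed.
    def repartitionStep(df: DataFrame): DataFrame =
      df.repartition(100000)   // previously df.repartition(1000)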

0 Answers