Flume file channel data file is corrupted and the file channel fails to start

Date: 2020-07-31 08:33:27

Tags: hdfs flume flume-ng

We use Flume 1.9.0 to ingest data from Kafka into HDFS; our configuration file is at the end of this post. It ran smoothly for a while, but it has now stopped ingesting data and keeps throwing errors; the most important log lines are quoted at the end of the post. According to the logs, the file channel fails to start because a data file is corrupted; Flume retries relentlessly but always fails. The Flume instance is hosted in Kubernetes with 4 replicas, each with its own separate persistent volume for the file channel, and about 95% of the disk space was still free when the problem occurred.

So I have two questions:

  1. What causes the data file corruption? This is a production application and we rely on Flume's robustness, so we did not expect to see a corrupted data file. How can we avoid this kind of corruption?
  2. How can we recover from this situation without losing any data? Deleting the checkpointDir and dataDirs is not allowed.
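For reference on question 2, Flume ships a File Channel Integrity Tool that scans the channel's data files and removes events it cannot parse, which is one possible way to get the channel starting again without wiping the directories (though the unparseable events themselves are dropped). A sketch of the invocation, with placeholder paths standing in for our `${FILE_CHANNEL_BASEDIR}`:

```shell
# Stop the Flume agent first, then run the integrity tool against the
# channel's data directories (comma-separated if there are several).
# /path/to/file-channel/data is a placeholder for the real dataDirs.
$FLUME_HOME/bin/flume-ng tool fcintegritytool \
    -l /path/to/file-channel/data
```

After the tool finishes, restarting the agent should let the file channel replay the remaining (valid) events; whether that counts as "no data loss" depends on how many events were corrupted.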

Thanks a lot.

Our Flume configuration:

flume-agent-1.sources = source1
flume-agent-1.sinks = HDFSSink1
flume-agent-1.channels = channel2HDFS1

flume-agent-1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
flume-agent-1.sources.source1.kafka.bootstrap.servers = ${KAFKA_BOOTSTRAP_SERVERS}
flume-agent-1.sources.source1.kafka.consumer.group.id = ${GROUP_ID} 
flume-agent-1.sources.source1.kafka.topics.regex = ${KAFKA_TOPIC_PATTERN}
flume-agent-1.sources.source1.setTopicHeader = true
flume-agent-1.sources.source1.batchSize = ${KAFKA_BATCH_SIZE}
flume-agent-1.sources.source1.batchDurationMillis = 1000
flume-agent-1.sources.source1.channels = channel2HDFS1

flume-agent-1.sources.source1.interceptors = int-1
flume-agent-1.sources.source1.interceptors.int-1.type = com.nvidia.gpuwa.kafka2file.interceptors.MainInterceptor$Builder

flume-agent-1.channels.channel2HDFS1.type = file
flume-agent-1.channels.channel2HDFS1.checkpointDir = ${FILE_CHANNEL_BASEDIR}/checkpoint
flume-agent-1.channels.channel2HDFS1.dataDirs = ${FILE_CHANNEL_BASEDIR}/data
flume-agent-1.channels.channel2HDFS1.transactionCapacity = 10000

flume-agent-1.sinks.HDFSSink1.channel = channel2HDFS1
flume-agent-1.sinks.HDFSSink1.type = hdfs
flume-agent-1.sinks.HDFSSink1.hdfs.path = ${HADOOP_URL}/%{projectdir}
flume-agent-1.sinks.HDFSSink1.hdfs.fileType = CompressedStream
flume-agent-1.sinks.HDFSSink1.hdfs.codeC = gzip
flume-agent-1.sinks.HDFSSink1.hdfs.filePrefix = %{projectsubdir}-%Y%m%d-%[localhost]
flume-agent-1.sinks.HDFSSink1.hdfs.useLocalTimeStamp = true
flume-agent-1.sinks.HDFSSink1.hdfs.rollCount= 0
flume-agent-1.sinks.HDFSSink1.hdfs.rollSize= 134217728
flume-agent-1.sinks.HDFSSink1.hdfs.rollInterval= 3600
flume-agent-1.sinks.HDFSSink1.hdfs.batchSize= ${HDFS_BATCH_SIZE}
flume-agent-1.sinks.HDFSSink1.hdfs.threadsPoolSize= ${HDFS_THREAD_COUNT}
flume-agent-1.sinks.HDFSSink1.hdfs.timeZone=America/Los_Angeles

Here are some of the most critical log lines; the full log can be seen in the attachment.

org.apache.flume.channel.file.FileChannel.start(FileChannel.java:295)] Failed to start the file channel [channel=channel2HDFS1]
2020-07-29T07:15:31.640949847Z java.lang.RuntimeException: org.apache.flume.channel.file.CorruptEventException: Could not parse event from data file. 

2020-07-29T07:15:31.638860323Z at org.apache.flume.channel.file.TransactionEventRecord.fromByteArray(TransactionEventRecord.java:212)

...

2020-07-29T07:15:31.64750767Z 2020-07-29 00:15:31,646 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:158)] Unable to deliver event. Exception follows.
2020-07-29T07:15:31.647539686Z java.lang.IllegalStateException: Channel closed [channel=channel2HDFS1]. Due to java.lang.RuntimeException: org.apache.flume.channel.file.CorruptEventException: Could not parse event from data file.
2020-07-29T07:15:31.647552984Z at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:358)
