Flume "not enough space" error when data flows from Kafka to HDFS

Asked: 2017-08-17 12:49:59

Tags: hadoop apache-kafka hdfs flume flume-ng

We are struggling with a data flow from Kafka to HDFS managed by Flume. The data is not transferred to HDFS completely because of the exception described below. The error is misleading to us, though: there is plenty of space both in the channel's data directory and in HDFS. We thought it might be a channel configuration problem, but we have a similar configuration for other sources and there it works correctly. If anyone has had to deal with this issue, I would be grateful for hints.

17 Aug 2017 14:15:24,335 ERROR [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.Log$BackgroundWorker.run:1204)  - Error doing checkpoint
java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288000 bytes
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1003)
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:986)
        at org.apache.flume.channel.file.Log.access$200(Log.java:75)
        at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:1201)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
17 Aug 2017 14:15:27,552 ERROR [PollableSourceRunner-KafkaSource-kafkaSource] (org.apache.flume.source.kafka.KafkaSource.doProcess:305)  - KafkaSource EXCEPTION, {}
org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel1]
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:639)
        at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
        at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
        at org.apache.flume.source.kafka.KafkaSource.doProcess(KafkaSource.java:286)
        at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:58)
        at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288026 bytes
        at org.apache.flume.channel.file.Log.rollback(Log.java:722)
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:637)
        ... 6 more

Flume configuration:

agent2.sources = kafkaSource

#sources defined
agent2.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent2.sources.kafkaSource.kafka.bootstrap.servers = …
agent2.sources.kafkaSource.kafka.topics = pega-campaign-response
agent2.sources.kafkaSource.channels = channel1

# channels defined
agent2.channels = channel1

agent2.channels.channel1.type = file
agent2.channels.channel1.checkpointDir = /data/cloudera/.flume/filechannel/checkpointdirs/pega
agent2.channels.channel1.dataDirs = /data/cloudera/.flume/filechannel/datadirs/pega
agent2.channels.channel1.capacity = 10000
agent2.channels.channel1.transactionCapacity = 10000

#hdfs sinks

agent2.sinks = sink

agent2.sinks.sink.type = hdfs
agent2.sinks.sink.hdfs.fileType = DataStream
agent2.sinks.sink.hdfs.path = hdfs://bigdata-cls:8020/stage/data/pega/campaign-response/%d%m%Y
agent2.sinks.sink.hdfs.batchSize = 1000
agent2.sinks.sink.hdfs.rollCount = 0
agent2.sinks.sink.hdfs.rollSize = 0
agent2.sinks.sink.hdfs.rollInterval = 120
agent2.sinks.sink.hdfs.useLocalTimeStamp = true
agent2.sinks.sink.hdfs.filePrefix = pega-

Output of df -h:

Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root   26G  6.8G   18G  28% /
devtmpfs               126G     0  126G   0% /dev
tmpfs                  126G  6.3M  126G   1% /dev/shm
tmpfs                  126G  2.9G  123G   3% /run
tmpfs                  126G     0  126G   0% /sys/fs/cgroup
/dev/sda1              477M  133M  315M  30% /boot
tmpfs                   26G     0   26G   0% /run/user/0
cm_processes           126G  1.9G  124G   2% /run/cloudera-scm-agent/process
/dev/scinib            2.0T   53G  1.9T   3% /data
tmpfs                   26G   20K   26G   1% /run/user/2000
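
For completeness, a df check against the exact channel directories from the config (both resolve under /data above; if either path is a symlink, df reports the real mount it points to):

df -h /data/cloudera/.flume/filechannel/checkpointdirs/pega /data/cloudera/.flume/filechannel/datadirs/pega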

2 answers:

Answer 0 (score: 1):

Change the channel type to a memory channel and test with it, to isolate whether the problem really is disk space; a sketch follows below.
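
A minimal sketch of that swap, assuming the capacities from the question's file-channel block are kept (they may need retuning for an in-memory channel, and the checkpointDir/dataDirs lines must be dropped):

# memory channel: events are buffered in heap, no local disk involved
agent2.channels.channel1.type = memory
agent2.channels.channel1.capacity = 10000
agent2.channels.channel1.transactionCapacity = 10000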

Also, since Kafka is already part of your setup, you could use it as the Flume channel itself (see the sketch after the link below).

https://flume.apache.org/FlumeUserGuide.html#kafka-channel
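
A hedged sketch of such a channel, following the guide linked above; the topic and consumer group names here are hypothetical, and the bootstrap servers are elided just as in the question's config:

# Kafka channel: the channel buffer lives in a Kafka topic instead of local files
agent2.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agent2.channels.channel1.kafka.bootstrap.servers = …
agent2.channels.channel1.kafka.topic = flume-channel-pega
agent2.channels.channel1.kafka.consumer.group.id = flume-pega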

Answer 1 (score: 0):

The error does not point to free space in HDFS; it is about free space on the local disk that backs the files of your channel. If you look at the file-channel section of the Flume User Guide, you will see that the default for minimumRequiredSpace is 524288000 bytes (500 MB), which matches the "required 524288000 bytes" in your log. Check whether that much local space is actually available to the channel directories (according to your error it appears to be 0). You can also change the minimumRequiredSpace property.
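
A minimal sketch of that tweak, added to the file-channel block from the question (the 100 MB value is an arbitrary example, not a recommendation):

# Lower the free-space floor the file channel enforces before refusing writes.
# Value is in bytes; the default is 524288000 (500 MB).
agent2.channels.channel1.minimumRequiredSpace = 104857600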
