Flume script gives the warning: No configuration directory set! Use --conf <dir> to override

Date: 2016-10-14 21:23:45

Tags: hadoop cloudera flume

This is my configuration file; it worked before, but it has suddenly started producing errors. What I'm actually trying to do is move all my logs from the local filesystem to HDFS, and each log should land in HDFS as a single file rather than being split into parts:

#create source, channels, and sink

agent1.sources=S1
agent1.sinks=H1
agent1.channels=C1

#bind the source and sink to the channel

agent1.sources.S1.channels=C1
agent1.sinks.H1.channel=C1

#Specify the source type and directory
agent1.sources.S1.type=spooldir
agent1.sources.S1.spoolDir=/tmp/spooldir

#Specify the Sink type, directory, and parameters
agent1.sinks.H1.type=HDFS
agent1.sinks.H1.hdfs.path=/user/hive/warehouse
agent1.sinks.H1.hdfs.filePrefix=events
agent1.sinks.H1.hdfs.fileSuffix=.log
agent1.sinks.H1.hdfs.inUsePrefix=processing
A1.sinks.H1.hdfs.fileType=DataStream

#Specify the channel type (Memory vs File)
agent1.channels.C1.type=file

I run my agent with this command:

flume-ng agent --conf-file /usr/local/flume/conf/spoolingToHDFS.conf --name agent1

Then I get this warning:

Warning: No configuration directory set! Use --conf <dir> to override.

16/10/14 16:22:37 WARN conf.FlumeConfiguration: Agent configuration for 'A1' does not contain any channels. Marking it as invalid.
16/10/14 16:22:37 WARN conf.FlumeConfiguration: Agent configuration invalid for agent 'A1'. It will be removed.

And then it just keeps creating, closing, and renaming the same log into HDFS like this:

16/10/14 16:22:41 INFO node.Application: Starting Sink H1
16/10/14 16:22:41 INFO node.Application: Starting Source S1
16/10/14 16:22:41 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /tmp/spooldir
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: H1: Successfully registered new MBean.
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: H1 started
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: S1: Successfully registered new MBean.
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: S1 started
16/10/14 16:22:41 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
16/10/14 16:22:42 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561961.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561961.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561961.log.tmp to /user/hive/warehouse/events.1476476561961.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561962.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561962.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561962.log.tmp to /user/hive/warehouse/events.1476476561962.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561963.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561963.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561963.log.tmp to /user/hive/warehouse/events.1476476561963.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561964.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561964.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561964.log.tmp to /user/hive/warehouse/events.1476476561964.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561965.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561965.log.tmp
:
:
:

Why does Flume keep writing the same file to HDFS over and over, and how can I move a log from local to HDFS without splitting it into several parts? My logs are usually between 50 KB and 300 KB in size.

Updated warning:

16/10/18 10:10:05 INFO tools.DirectMemoryUtils: Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)

16/10/18 10:10:05 WARN file.ReplayHandler: Ignoring /home/USER/.flume/file-channel/data/log-18 due to EOF
java.io.EOFException
    at java.io.RandomAccessFile.readInt(RandomAccessFile.java:827)
    at org.apache.flume.channel.file.LogFileFactory.getSequentialReader(LogFileFactory.java:169)
    at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:264)
    at org.apache.flume.channel.file.Log.doReplay(Log.java:529)
    at org.apache.flume.channel.file.Log.replay(Log.java:455)
    at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:295)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 0)

Flume uses the conf folder to pick up the JRE and logging properties, so you can fix the error message with the --conf argument, like this:

flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/spoolingToHDFS.conf --name agent1
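For reference, in a standard Apache Flume distribution the conf directory typically holds files along these lines (exact names can vary by packaging, e.g. the env file may still be a .template until copied):

flume-env.sh        # JAVA_HOME and JVM options (heap size, etc.)
log4j.properties    # logging configuration used by the agent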

The warning about A1 is caused by what is probably a typo near the end of your agent configuration file:

A1.sinks.H1.hdfs.fileType=DataStream

should read

agent1.sinks.H1.hdfs.fileType=DataStream

As for the files – you haven't configured a deserializer for the spoolDir source, and the default is LINE, so you get an HDFS file for each line in the files in your spoolDir. If you want Flume to treat an entire file as a single event, you need to use the BlobDeserializer (https://flume.apache.org/FlumeUserGuide.html#blobdeserializer):

agent1.sources.S1.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
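Note that the HDFS sink also rolls files by size, event count, and time on its own, which can still split the output. A sketch of the settings that are commonly combined with the BlobDeserializer so each spooled file ends up as one HDFS file – the maxBlobLength value here is just an example sized for 50 KB–300 KB logs, not a required value:

#treat each spooled file as a single event (up to ~1 MB in this example)
agent1.sources.S1.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent1.sources.S1.deserializer.maxBlobLength=1000000

#disable size-, count-, and time-based rolling so the sink does not split files further
agent1.sinks.H1.hdfs.rollSize=0
agent1.sinks.H1.hdfs.rollCount=0
agent1.sinks.H1.hdfs.rollInterval=0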