Using Flume to push Twitter tweets into Hadoop

Time: 2018-04-17 19:34:38

Tags: hadoop twitter flume

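When I try to fetch Twitter tweets, I get a sink error. I have added the Twitter API configuration and created a directory in HDFS, but I'm not sure what I'm doing wrong. I am using Hadoop 2.0.0-cdh4.2.1.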

ERROR flume.SinkRunner: Unable to deliver event. 
ERROR hdfs.HDFSEventSink: process failed

The exception is below.

java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.

flume.conf

# Naming the components on the current agent. 
TwitterAgent.sources = twitter
TwitterAgent.channels = memoryChannel
TwitterAgent.sinks = HDFS
  
# Describing/Configuring the source 
TwitterAgent.sources.twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.twitter.consumerKey = Xxx
TwitterAgent.sources.twitter.consumerSecret = Xxx 
TwitterAgent.sources.twitter.accessToken = Xxx
TwitterAgent.sources.twitter.accessTokenSecret = Xxx
TwitterAgent.sources.twitter.maxBatchDurationMillis = 200 
TwitterAgent.sources.twitter.channels = memoryChannel
TwitterAgent.sources.twitter.keywords = lsu
  
TwitterAgent.channels.memoryChannel.type = memory
TwitterAgent.channels.memoryChannel.capacity = 10000
TwitterAgent.channels.memoryChannel.transactionCapacity = 1000
 
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = memoryChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs:/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
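
One quick sanity check for a configuration like the one above is to confirm that every source and sink is wired to a channel the agent actually declares. The sketch below is only an illustration: the helper names `parse_properties` and `check_wiring` and the `SAMPLE` string are made up for this example and are not part of Flume. It relies on the Flume conventions that component lists are space-separated, sources use the plural `channels` key, and sinks use the singular `channel` key.

```python
# Hypothetical helper, not part of Flume: checks channel wiring in a flume.conf.

SAMPLE = """
# Trimmed copy of the configuration from the question.
TwitterAgent.sources = twitter
TwitterAgent.channels = memoryChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.twitter.channels = memoryChannel
TwitterAgent.channels.memoryChannel.type = memory
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = memoryChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs:/user/flume/tweets/
"""


def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without an '=' separator
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of wiring problems for one agent (empty if none found)."""
    problems = []
    declared = set(props.get(f"{agent}.channels", "").split())

    # Sources may feed several channels, listed under the plural 'channels' key.
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channels")
        for channel in used:
            if channel not in declared:
                problems.append(f"source {source} uses undeclared channel {channel}")

    # Each sink drains exactly one channel, named under the singular 'channel' key.
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel is None:
            problems.append(f"sink {sink} has no channel")
        elif channel not in declared:
            problems.append(f"sink {sink} uses undeclared channel {channel}")
    return problems


if __name__ == "__main__":
    found = check_wiring(parse_properties(SAMPLE), "TwitterAgent")
    print(found if found else "channel wiring looks consistent")
```

This only checks that the names line up. It says nothing about classpath or library problems, so a clean result does not by itself explain the UnsupportedOperationException shown earlier.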

0 answers:

No answers