Fetching data from Twitter and loading it into HDFS using Flume

Time: 2016-09-28 10:16:10

Tags: flume flume-twitter

I get an error when running the following command in Hadoop:

bin/flume-ng agent  -c /usr/local/hadoop/flume/conf -f usr/local/hadoop/flume/conf/flume-twitter.conf -n TwitterAgent - flume.root.logger=INFO,console
Executing the Flume command displays the following error:

2016-09-28 17:38:23,508 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: HDFS, type: hdfs                                       
2016-09-28 17:38:23,546 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:114)] Channel MemChannel connected to [Twitter, HDFS]  
2016-09-28 17:38:23,565 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{Twitter=EventDrivenSourceRunner: { source:org.apache.flume.source.twitter.TwitterSource{name:Twitter,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@9238ca counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }                                                                                                                                        
2016-09-28 17:38:23,594 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel MemChannel                                                        
2016-09-28 17:38:23,653 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.                                                                                                                                                                              
2016-09-28 17:38:23,654 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: CHANNEL, name: MemChannel started            
2016-09-28 17:38:23,654 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink HDFS                                                                 
2016-09-28 17:38:23,654 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source Twitter                                                            
2016-09-28 17:38:23,655 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.twitter.TwitterSource.start(TwitterSource.java:131)] Starting twitter source org.apache.flume.source.twitter.TwitterSource{name:Twitter,state:IDLE} ...                                                                                                                                                                                               
2016-09-28 17:38:23,659 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.                                                                                                                                                                                       
2016-09-28 17:38:23,659 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SINK, name: HDFS started                     
2016-09-28 17:38:23,660 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.twitter.TwitterSource.start(TwitterSource.java:139)] Twitter source Twitter started.                                              
2016-09-28 17:38:23,660 (Twitter Stream consumer-1[initializing]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Establishing connection.                                                
2016-09-28 17:38:25,544 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] 404:The URI requested is invalid or the resource requested, such as a user, does not exist.                                                                                                                                                                                      
Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com/pages/streaming_api                                                                                                                  

2016-09-28 17:38:25,545 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Waiting for 10000 milliseconds
2016-09-28 17:38:35,547 (Twitter Stream consumer-1[Waiting for 10000 milliseconds]) [ERROR - org.apache.flume.source.twitter.TwitterSource.onException(TwitterSource.java:331)] Exception while streaming tweets
404:The URI requested is invalid or the resource requested, such as a user, does not exist.                                                                                                                                                  
Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com/pages/streaming_api                                                                                                                                           
Relevant discussions can be found on the Internet at:                                                                                                                                                                                        
        http://www.google.co.jp/search?q=ec814753 or                                                                                                                                                                                         
        http://www.google.co.jp/search?q=0a74cca1                                                                                                                                                                                            
TwitterException{exceptionCode=[ec814753-0a74cca1], statusCode=404, retryAfter=-1, rateLimitStatus=null, featureSpecificRateLimitStatus=null, version=2.2.6}                                                                                 
        at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:185)                                                                                                                                                           
        at twitter4j.internal.http.HttpClientWrapper.request(HttpClientWrapper.java:65)                                                                                                                                                      
        at twitter4j.internal.http.HttpClientWrapper.get(HttpClientWrapper.java:93)                                                                                                                                                          
        at twitter4j.TwitterStreamImpl.getSampleStream(TwitterStreamImpl.java:160)                                                                                                                                                           
        at twitter4j.TwitterStreamImpl$4.getStream(TwitterStreamImpl.java:149)                                                                                                                                                               
        at twitter4j.TwitterStreamImpl$4.getStream(TwitterStreamImpl.java:147)                                                                                                                                                               
        at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:426)                                                                                                                                                 
2016-09-28 17:38:35,571 (Twitter Stream consumer-1[Waiting for 10000 milliseconds]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Establishing connection.                                                       
2016-09-28 17:38:37,049 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] 404:The URI requested is invalid or the resource requested, such as a user, does not exist.                                                                                                                                                                                                                                        
Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com/pages/streaming_api   
I have added the configuration file; it looks like the following:

            TwitterAgent.sources = Twitter
            TwitterAgent.channels = MemChannel
            TwitterAgent.sinks = HDFS

            TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
            TwitterAgent.sources.Twitter.channels = MemChannel
            TwitterAgent.sources.Twitter.consumerKey = xxxxx
            TwitterAgent.sources.Twitter.consumerSecret =  xxxx
            TwitterAgent.sources.Twitter.accessToken = xxxx
            TwitterAgent.sources.Twitter.accessTokenSecret = xxxx
            TwitterAgent.sources.Twitter.keywords = INDIA VS NEWZELAND, apache spark, spark, flume, apache mahout, kafka


            TwitterAgent.sinks.HDFS.channel = MemChannel
            TwitterAgent.sinks.HDFS.type = hdfs
            TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:54310/Flume_twitter_data/
            TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
            TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
            TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
            TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
            TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

            TwitterAgent.channels.MemChannel.type = memory
            TwitterAgent.channels.MemChannel.capacity = 10000
            TwitterAgent.channels.MemChannel.transactionCapacity = 100
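
A Flume agent only picks up properties whose prefix matches the name passed with `-n`, and every source and sink has to be bound to a declared channel. A minimal sketch of a sanity check for a file like the one above follows. It is not part of Flume: `parse_properties` and `check_agent` are hypothetical helper names, and the parser assumes plain `key = value` lines only.

```python
# Hypothetical helper, not part of Flume: checks a Flume properties file for
# keys whose agent prefix differs from the name passed to `flume-ng -n`,
# and for sources/sinks that are not bound to a declared channel.


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of human-readable problems found for `agent`."""
    problems = []
    # Keys with a different prefix are treated as another agent's settings.
    for key in props:
        if key.split(".", 1)[0] != agent:
            problems.append(f"key '{key}' does not belong to agent '{agent}'")
    channels = set(props.get(f"{agent}.channels", "").split())
    # Sources use the plural `channels`; sinks use the singular `channel`.
    for name in props.get(f"{agent}.sources", "").split():
        bound = set(props.get(f"{agent}.sources.{name}.channels", "").split())
        if not bound or not bound <= channels:
            problems.append(f"source '{name}' is not bound to a declared channel")
    for name in props.get(f"{agent}.sinks", "").split():
        bound = props.get(f"{agent}.sinks.{name}.channel", "")
        if bound not in channels:
            problems.append(f"sink '{name}' is not bound to a declared channel")
    return problems


sample = """
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
"""
print(check_agent(parse_properties(sample), "TwitterAgent"))  # prints []
```

An empty list means the naming and channel bindings are consistent; it says nothing about whether the Twitter credentials or the streaming endpoint are valid.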

If anyone knows, please help me.

Thank you.

0 answers:

No answers