Spring Hadoop Mapper configuration

Date: 2013-12-02 15:23:03

Tags: spring hadoop

I am using Hadoop 1.2.1 and Spring Hadoop 1.0.2.

I want to test Spring autowiring inside a Hadoop Mapper. I wrote this configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:p="http://www.springframework.org/schema/p"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
    http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
    http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <context:property-placeholder location="configuration.properties"/>

    <context:component-scan base-package="it.test"/>

    <hdp:configuration id="hadoopConfiguration">
      fs.default.name=${hd.fs}
    </hdp:configuration>
    <hdp:job id="my-job" 
    mapper="hadoop.mapper.MyMapper" 
    reducer="hadoop.mapper.MyReducer" 
    output-path="/root/Scrivania/outputSpring/out" 
    input-path="/root/Scrivania/outputSpring/in" jar="" />
    <hdp:job-runner id="my-job-runner" job-ref="my-job" run-at-startup="true"/>

    <hdp:hbase-configuration configuration-ref="hadoopConfiguration" zk-quorum="${hbase.host}" zk-port="${hbase.port}"/>

    <bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
        <property name="configuration" ref="hbaseConfiguration"/>
    </bean>

</beans>
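The placeholders above (`${hd.fs}`, `${hbase.host}`, `${hbase.port}`) are resolved from `configuration.properties`. The question does not show that file; a minimal example, with values assumed for a local single-node setup, might look like:

```properties
# Assumed values for a local pseudo-distributed setup
hd.fs=hdfs://localhost:9000
hbase.host=localhost
hbase.port=2181
```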

Then I created this Mapper:

import java.io.IOException;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.springframework.beans.factory.annotation.Autowired;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final Log logger = LogFactory.getLog(MyMapper.class);

    @Autowired
    private IHistoricalDataService hbaseService;

    private List<HistoricalDataModel> data;

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
    }

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        try {
            data = hbaseService.findAllHistoricalData();
            logger.warn("Data " + data);
        } catch (Exception e) {
            String message = "Error during context setup; error message: " + e.getMessage();
            logger.fatal(message, e);
            throw new InterruptedException(message);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        super.map(key, value, context);
    }
}

As you can see, MyMapper does essentially nothing; the only thing I want it to print is the data variable, nothing more.

When I launch it via a JUnit test in my IDE (Eclipse Luna), all I see printed is:

16:19:11,902 INFO  [XmlBeanDefinitionReader] Loading XML bean definitions from class path resource [application-context.xml]
16:19:12,540 INFO  [GenericApplicationContext] Refreshing org.springframework.context.support.GenericApplicationContext@150e804: startup date [Mon Dec 02 16:19:12 CET 2013]; root of context hierarchy
16:19:12,693 INFO  [PropertySourcesPlaceholderConfigurer] Loading properties file from class path resource [configuration.properties]
16:19:12,722 INFO  [DefaultListableBeanFactory] Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@109f81a: defining beans [org.springframework.context.support.PropertySourcesPlaceholderConfigurer#0,pinfClusteringHistoricalDataDao,historicalDataServiceImpl,clusterAnalysisSvcImpl,org.springframework.context.annotation.internalConfigurationAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor,org.springframework.context.annotation.internalCommonAnnotationProcessor,hadoopConfiguration,clusterAnalysisJob,clusterAnalysisJobRunner,hbaseConfiguration,hbaseTemplate,org.springframework.context.annotation.ConfigurationClassPostProcessor.importAwareProcessor]; root of factory hierarchy
16:19:13,516 INFO  [JobRunner] Starting job [clusterAnalysisJob]
16:19:13,568 WARN  [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16:19:13,584 WARN  [JobClient] No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
16:19:13,619 INFO  [FileInputFormat] Total input paths to process : 0
16:19:13,998 INFO  [JobClient] Running job: job_local265750426_0001
16:19:14,065 INFO  [LocalJobRunner] Waiting for map tasks
16:19:14,065 INFO  [LocalJobRunner] Map task executor complete.
16:19:14,127 INFO  [ProcessTree] setsid exited with exit code 0
16:19:14,134 INFO  [Task]  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@b1258d
16:19:14,144 INFO  [LocalJobRunner] 
16:19:14,148 INFO  [Merger] Merging 0 sorted segments
16:19:14,149 INFO  [Merger] Down to the last merge-pass, with 0 segments left of total size: 0 bytes
16:19:14,149 INFO  [LocalJobRunner] 
16:19:14,219 INFO  [Task] Task:attempt_local265750426_0001_r_000000_0 is done. And is in the process of commiting
16:19:14,226 INFO  [LocalJobRunner] 
16:19:14,226 INFO  [Task] Task attempt_local265750426_0001_r_000000_0 is allowed to commit now
16:19:14,251 INFO  [FileOutputCommitter] Saved output of task 'attempt_local265750426_0001_r_000000_0' to /root/Scrivania/outputSpring/out
16:19:14,254 INFO  [LocalJobRunner] reduce > reduce
16:19:14,255 INFO  [Task] Task 'attempt_local265750426_0001_r_000000_0' done.
16:19:15,001 INFO  [JobClient]  map 0% reduce 100%
16:19:15,005 INFO  [JobClient] Job complete: job_local265750426_0001
16:19:15,007 INFO  [JobClient] Counters: 13
16:19:15,007 INFO  [JobClient]   File Output Format Counters 
16:19:15,007 INFO  [JobClient]     Bytes Written=0
16:19:15,007 INFO  [JobClient]   FileSystemCounters
16:19:15,007 INFO  [JobClient]     FILE_BYTES_READ=22
16:19:15,007 INFO  [JobClient]     FILE_BYTES_WRITTEN=67630
16:19:15,007 INFO  [JobClient]   Map-Reduce Framework
16:19:15,008 INFO  [JobClient]     Reduce input groups=0
16:19:15,008 INFO  [JobClient]     Combine output records=0
16:19:15,008 INFO  [JobClient]     Reduce shuffle bytes=0
16:19:15,008 INFO  [JobClient]     Physical memory (bytes) snapshot=0
16:19:15,008 INFO  [JobClient]     Reduce output records=0
16:19:15,008 INFO  [JobClient]     Spilled Records=0
16:19:15,008 INFO  [JobClient]     CPU time spent (ms)=0
16:19:15,009 INFO  [JobClient]     Total committed heap usage (bytes)=111935488
16:19:15,009 INFO  [JobClient]     Virtual memory (bytes) snapshot=0
16:19:15,009 INFO  [JobClient]     Reduce input records=0
16:19:15,009 INFO  [JobRunner] Completed job [clusterAnalysisJob]
16:19:15,028 WARN  [SpringHadoopTest] Scrivo............ OOOOOOO

It seems the job starts, but my Mapper is never executed; can anyone suggest where I am going wrong?

2 answers:

Answer 0 (score: 2)

Mappers and reducers are not autowired. Those classes are loaded by Hadoop, so at runtime there is no application context associated with them. The application context is only available as part of the workflow orchestration of the job.

I don't know why your setup method doesn't log any message, though; are you sure you specified the correct class and package for the mapper?

-Thomas
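A common workaround, given that Hadoop instantiates the mapper itself, is to bootstrap a Spring context manually inside setup() instead of relying on @Autowired. This is a sketch, not part of the original answer; it assumes application-context.xml is on the task classpath and that an IHistoricalDataService bean is defined in it:

```java
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    super.setup(context);
    // Hadoop creates this mapper via reflection, so no Spring context exists here.
    // Build one by hand; in real code you would cache it in a static field so it
    // is not re-created for every task attempt.
    ApplicationContext ctx =
        new ClassPathXmlApplicationContext("application-context.xml");
    hbaseService = ctx.getBean(IHistoricalDataService.class);
    data = hbaseService.findAllHistoricalData();
}
```

Note that in a real cluster the XML file and all Spring jars must be shipped with the job (for example inside the job jar), or the context creation will fail on the task nodes.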

Answer 1 (score: 0)

Is it possible that your input file exists but is empty? Without input splits, no mapper tasks are created. Just a guess...
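This matches the log above ("Total input paths to process : 0" and "map 0% reduce 100%"). Before launching the job it can be worth checking the input directory with a small helper; this is a plain-Java sketch (class and method names are made up for illustration) of the check Hadoop's FileInputFormat effectively performs:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class InputCheck {

    // Returns true if the file exists and contains at least one non-empty line,
    // i.e. the job would get at least one input record to map.
    static boolean hasInput(Path p) throws IOException {
        if (!Files.exists(p)) {
            return false;
        }
        for (String line : Files.readAllLines(p)) {
            if (!line.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("job-input", ".txt"); // stands in for the real input path
        System.out.println(hasInput(in));  // empty file: no map tasks would run
        Files.write(in, Arrays.asList("some record"));
        System.out.println(hasInput(in));  // non-empty: the mapper would be invoked
    }
}
```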