Question

I have already read the earlier posts related to this, but I could not get anything useful out of them.

My use case is:

Aggregate impression and click data.
Separate the clicked and non-clicked data into different files.

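I have written the mapper and reducer for this, but the reducer's output contains both clicked and non-clicked data in the same file. I want to separate this data so that clicked data ends up in one file and non-clicked data in another.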

Error:

java.lang.IllegalStateException: Reducer has been already set
    at org.apache.hadoop.mapreduce.lib.chain.Chain.checkReducerAlreadySet(Chain.java:662)

Code:

    Configuration conf = new Configuration();
    conf.set("mapreduce.output.fileoutputformat.compress", "true");
    conf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.GzipCodec");
    conf.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");
    conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
    Job job = Job.getInstance(conf, "IMPRESSION_CLICK_COMBINE_JOB");
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    job.setReducerClass(ImpressionClickReducer.class);

    FileInputFormat.setInputDirRecursive(job, true);

    // FileInputFormat.addInputPath(job, new Path(args[0]));
    // job.setMapperClass(ImpressionMapper.class);

    Path p = new Path(args[2]);
    FileSystem fs = FileSystem.get(conf);
    fs.exists(p);
    fs.delete(p, true);

    /**
     * Here directory of impressions will be present
     */
    MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, ImpressionMapper.class);
    /**
     * Here directory of clicks will be present
     */
    MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, ClickMapper.class);

    FileOutputFormat.setOutputPath(job, new Path(args[2]));

    job.setNumReduceTasks(10);

    job.setPartitionerClass(TrackerPartitioner.class);

    ChainReducer.setReducer(job, ImpressionClickReducer.class,  Text.class, Text.class, Text.class, Text.class, new Configuration(false));

    ChainReducer.addMapper(job, ImpressionClickMapper.class, Text.class, Text.class, Text.class, Text.class, new Configuration(false));

    //Below mentioned line is giving Error
    //ChainReducer.setReducer(job, ImpressionAndClickReducer.class,  Text.class, Text.class, Text.class, Text.class, new Configuration(false));

    job.waitForCompletion(true);

Answer 1

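ChainReducer is for chaining Map tasks after a Reducer, and you can call setReducer() only once. The second call fails in Chain.checkReducerAlreadySet, which is the frame shown in your stack trace.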

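From the Javadocs:

The ChainReducer class allows to chain multiple Mapper classes after a Reducer within the Reducer task.

Using the ChainMapper and the ChainReducer classes is possible to compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. And immediate benefit of this pattern is a dramatic reduction in disk IO.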

So the idea is that you set exactly one Reducer and then chain Map operations after it. A chain cannot contain a second Reducer.
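To make the [MAP+ / REDUCE MAP*] constraint concrete, here is a toy model of the rule in plain Java. It is not Hadoop's Chain implementation: the class ToyChain, its methods, and the message thrown by addMapper are invented for illustration. Only the "Reducer has been already set" message is taken from the stack trace in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical class, not part of Hadoop).
// It mimics the rule: one reducer per chain, then zero or more mappers.
final class ToyChain {
    private String reducer; // null until setReducer() is called
    private final List<String> mappersAfterReduce = new ArrayList<>();

    void setReducer(String reducerName) {
        if (reducer != null) {
            // Same message as the exception in the question.
            throw new IllegalStateException("Reducer has been already set");
        }
        reducer = reducerName;
    }

    void addMapper(String mapperName) {
        if (reducer == null) {
            // Message invented for this sketch.
            throw new IllegalStateException("Set the reducer before adding mappers");
        }
        mappersAfterReduce.add(mapperName);
    }

    // Returns the stages of the reduce side in execution order.
    String describe() {
        List<String> stages = new ArrayList<>();
        if (reducer != null) {
            stages.add(reducer);
        }
        stages.addAll(mappersAfterReduce);
        return String.join(" -> ", stages);
    }
}
```

In this model, calling setReducer a second time fails in the same way as the commented-out line in the question, while any number of addMapper calls after the first setReducer succeed.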

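It sounds like what you actually want is MultipleOutputs. The Hadoop Javadocs include an example of how to use it. With it you can define more than one output and choose, for each key/value pair, which output it is written to.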
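A minimal sketch of a reducer that splits its output with MultipleOutputs might look like the following. The question does not show the record format, so this sketch assumes that ClickMapper prefixes its values with a hypothetical tag "CLICK". The class name SplitByClickReducer and the named outputs "clicked" and "nonclicked" are also made up for illustration. It needs the Hadoop MapReduce client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: joins all records for a key and routes the result
// to one of two named outputs, depending on whether a click was seen.
public class SplitByClickReducer extends Reducer<Text, Text, Text, Text> {

    // Assumption: ClickMapper tags its values with this prefix.
    private static final String CLICK_TAG = "CLICK";

    private MultipleOutputs<Text, Text> outputs;

    @Override
    protected void setup(Context context) {
        outputs = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        boolean clicked = false;
        StringBuilder joined = new StringBuilder();
        for (Text value : values) {
            String record = value.toString();
            if (record.startsWith(CLICK_TAG)) {
                clicked = true;
            }
            if (joined.length() > 0) {
                joined.append('\t');
            }
            joined.append(record);
        }
        // Named outputs may contain only letters and digits.
        String target = clicked ? "clicked" : "nonclicked";
        outputs.write(target, key, new Text(joined.toString()));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closing is required, otherwise the named outputs may not be flushed.
        outputs.close();
    }
}
```

In the driver, the two ChainReducer calls would be replaced by job.setReducerClass(SplitByClickReducer.class), and each named output has to be registered before the job is submitted, for example MultipleOutputs.addNamedOutput(job, "clicked", TextOutputFormat.class, Text.class, Text.class) and the same call for "nonclicked". The output directory should then contain files prefixed with the named output, such as clicked-r-00000. Wrapping the job's output format with LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) avoids creating empty default part files.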

Adding multiple Reducers using ChainReducer throws an exception
