Hadoop KeyComposite and Combiner

Time: 2015-10-04 08:35:46

Tags: hadoop hadoop-streaming hadoop2 hadoop-partitioning hadoop-plugins

I am doing a secondary sort in Hadoop 2.6.0, following this tutorial: https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/

I have exactly the same code, but I am now trying to improve performance, so I decided to add a combiner. I made two changes:

Main file:

job.setCombinerClass(CombinerK.class);

Combiner file:

public class CombinerK extends Reducer<KeyWritable, KeyWritable, KeyWritable, KeyWritable> {

    public void reduce(KeyWritable key, Iterator<KeyWritable> values, Context context) throws IOException, InterruptedException {
        Iterator<KeyWritable> it = values;

        // Debug output, expected to show up in the task's stderr log
        System.err.println("combiner " + key);

        // The first value is reused as the output record
        KeyWritable first_value = it.next();
        System.err.println("va: " + first_value);

        // Add up the remaining values (the declaration of sum is not shown in the snippet as posted)
        while (it.hasNext()) {
            sum += it.next().getSs();
        }
        first_value.setS(sum);
        context.write(key, first_value);
    }
}

But it does not seem to run, because I cannot find any log file containing the word "combiner". When I look at the counters after the job finishes, I see:

    Combine input records=4040000
    Combine output records=4040000

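So the combiner does appear to be executed, but it seems to be invoked once for every key, which would explain why the number of output records equals the number of input records.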
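One possible explanation for both symptoms, offered as an assumption rather than a confirmed diagnosis: in the new MapReduce API, `Reducer.reduce` takes an `Iterable` of values, and its default implementation writes every value through unchanged. A method declared with an `Iterator` parameter is an overload, not an override, so the framework would never call it. The sketch below reproduces that effect in plain Java. `BaseReducer`, `IteratorReducer`, `IterableReducer` and `runWith` are hypothetical stand-ins written for illustration; they are not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverloadDemo {

    // Hypothetical stand-in for a reducer base class (NOT the Hadoop class).
    // Its default reduce() forwards every value unchanged.
    static class BaseReducer {
        final List<String> out = new ArrayList<>();

        protected void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) {
                out.add(key + "=" + v);
            }
        }

        // Stand-in for the framework: it only ever calls the Iterable version.
        final void run(String key, List<Integer> values) {
            reduce(key, values);
        }
    }

    // Declares reduce() with an Iterator: an overload that run() never calls.
    static class IteratorReducer extends BaseReducer {
        protected void reduce(String key, Iterator<Integer> values) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            out.add(key + "=" + sum);
        }
    }

    // Declares reduce() with an Iterable: a real override, checked by @Override.
    static class IterableReducer extends BaseReducer {
        @Override
        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.add(key + "=" + sum);
        }
    }

    static List<String> runWith(BaseReducer reducer) {
        reducer.run("k", Arrays.asList(1, 2, 3));
        return reducer.out;
    }

    public static void main(String[] args) {
        System.out.println(runWith(new IteratorReducer())); // [k=1, k=2, k=3]
        System.out.println(runWith(new IterableReducer())); // [k=6]
    }
}
```

If this is what is happening in the job, adding `@Override` to the combiner's `reduce` method would turn the mismatch into a compile error, which is a cheap way to check.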

0 answers:

No answers