Question

Suppose I have two datasets:

hello world
bye world

and

hello earth
new earth

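I want to run a MapReduce job that does not specify a mapper class or a reducer class, so the default mapper and reducer are used; both are identity functions. When I run the job, the output is: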

0       hello world
0       hello earth
12      new earth
12      bye world

I'm confused: why are the keys 0 and 12? I only used the default mapper and reducer, because I commented out these lines in main():

//    job.setMapperClass(Map.class);
//    job.setCombinerClass(Reduce.class);
//    job.setReducerClass(Reduce.class);

So my question is: what are the output keys here, and why do they come out as 0, 0, 12, 12?

Answer 1

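0, 0, 12 and 12 are file offsets in the input data. With text input (the default TextInputFormat), the mapper's input key is the byte offset at which each line starts in its file, and the value is the line itself. In both of your files the first line is 11 characters plus a newline, so the first line of each file starts at offset 0 and the second at offset 12. The identity mapper and reducer pass these keys through unchanged, and the shuffle sorts the records by key, which gives 0, 0, 12, 12. See this for details.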
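To make the offsets concrete, here is a minimal plain-Java sketch, with no Hadoop dependency, that imitates the pipeline under some simplifying assumptions: each file is one split, lines end with '\n' only, and the class and method names (OffsetDemo, readRecords, identityJob) are hypothetical. It is not Hadoop's actual LineRecordReader, just an illustration of the (byte offset, line) pairs it produces, followed by an identity map, a sort by key, and an identity reduce.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OffsetDemo {
    // Hypothetical helper: splits file contents on '\n' and pairs each line with
    // the byte offset where it starts, like the (key, value) input TextInputFormat gives a mapper.
    static List<Map.Entry<Long, String>> readRecords(String contents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        byte[] bytes = contents.getBytes(StandardCharsets.UTF_8);
        int start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                String line = new String(bytes, start, i - start, StandardCharsets.UTF_8);
                records.add(Map.entry((long) start, line));
                start = i + 1; // next line begins right after the newline byte
            }
        }
        if (start < bytes.length) { // last line without a trailing newline
            String line = new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8);
            records.add(Map.entry((long) start, line));
        }
        return records;
    }

    // Identity map, then a "shuffle" that groups and sorts by key, then identity reduce.
    static List<String> identityJob(List<String> files) {
        TreeMap<Long, List<String>> grouped = new TreeMap<>();
        for (String file : files) {
            for (Map.Entry<Long, String> rec : readRecords(file)) {
                grouped.computeIfAbsent(rec.getKey(), k -> new ArrayList<>()).add(rec.getValue());
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : grouped.entrySet()) {
            for (String value : e.getValue()) {
                output.add(e.getKey() + "\t" + value); // key TAB value, one pair per line
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> out = identityJob(List.of("hello world\nbye world\n", "hello earth\nnew earth\n"));
        out.forEach(System.out::println);
    }
}
```

This sketch keeps values with the same key in file order, so it prints "12 bye world" before "12 new earth". In a real Hadoop job the order of values within one key is not guaranteed, which is why the output in the question lists them the other way round.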