Defining one Writable for the whole Mapper/Reducer

Asked: 2015-11-12 10:18:31

Tags: hadoop

I read somewhere that we should define the output Writable once, when the Mapper/Reducer is created, and inside the Mapper/Reducer only set its value, rather than creating a new Writable for every output record.

For example (pseudocode):

IntWritable idWritable = new IntWritable();

map(){
     idWritable.set(outputValue);
     emit(idWritable);
}

is superior to:

map(){
     IntWritable idWritable = new IntWritable(outputValue);
     emit(idWritable);
}

Is this true? Is it really good practice to define the output Writable once, when the Mapper/Reducer is created, and reuse it for all output records?

1 Answer:

Answer 0 (score: 1)

Yes, this is true. In your second example you're creating a brand-new IntWritable every time you process a record. That incurs the overhead of a fresh memory allocation, and it also means the old IntWritable has to be garbage collected at some point. If you're processing millions of records with a complex Writable (say, one holding several ints and Strings), the heap can fill up very quickly.
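As a sketch of the kind of compound Writable described above (a hypothetical type; the name EventWritable and its fields are invented for illustration):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical compound Writable; allocating one of these per record
// puts far more pressure on the heap than a single IntWritable would.
public class EventWritable implements Writable {

    private int userId;
    private int eventType;
    private String payload = "";

    // Re-set all fields in place instead of constructing a new instance.
    public void set(int userId, int eventType, String payload) {
        this.userId = userId;
        this.eventType = eventType;
        this.payload = payload;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(userId);
        out.writeInt(eventType);
        out.writeUTF(payload);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        userId = in.readInt();
        eventType = in.readInt();
        payload = in.readUTF();
    }
}

With a set method like this, a single instance can be refilled for every record instead of allocated anew.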

Alternatively, by just re-setting the value within the same object, no new memory needs to be allocated and no garbage collection needs to take place. It's much faster, but I'd recommend running your own experiments to confirm this.
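To make the pattern concrete, here is a minimal word-count-style Mapper using the Hadoop Java API; the class name TokenCountMapper and the tokenizing logic are illustrative, not from the question:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Both output Writables are fields: created once per Mapper instance,
// only their values change from record to record.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text word = new Text();
    private final IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);            // re-set the value, no new allocation
            context.write(word, one);   // key and value are serialized here
        }
    }
}

Reusing the objects this way is safe because the framework serializes the key and value as soon as context.write is called, so overwriting them on the next record cannot corrupt earlier output.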