Question

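In many MapReduce programs, I see the reducer also being used as the combiner. I know this is because of the specific nature of those programs. But I am wondering whether the two can be different.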

Answer 1

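Yes, a Combiner can be different from the Reducer, although your Combiner is still written as a Reducer: it extends the Reducer class in the new org.apache.hadoop.mapreduce API (or implements the Reducer interface in the old org.apache.hadoop.mapred API). A Combiner can only be used in specific cases, depending on the job. The Combiner behaves like a Reducer, but only on a subset of the key/value pairs output by each Mapper.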

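One constraint that applies to the Combiner but not to the Reducer is that its input and output key/value types must both match the Mapper's output types.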
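In Hadoop this constraint shows up in the generic parameters: a combiner set with Job.setCombinerClass is a Reducer subclass whose output key/value types must equal the map output types, because its output goes back into the shuffle as if the mapper had produced it. The plain-Java sketch below (no Hadoop; CombineFn, ReduceFn and TypeShapeDemo are hypothetical names, not Hadoop's API) encodes the same shape: the combiner turns a List<V> into a V, while the reducer may return any type R.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Hadoop-free stand-ins for the two type shapes (not Hadoop's API).
interface CombineFn<K, V> {
    V combine(K key, List<V> values);   // same value type in and out
}

interface ReduceFn<K, V, R> {
    R reduce(K key, List<V> values);    // output type R may differ from V
}

class TypeShapeDemo {
    // Combiner: Integer counts in, one partial Integer count out.
    static final CombineFn<String, Integer> SUM_COUNTS = (key, values) -> {
        int total = 0;
        for (int v : values) total += v;
        return total;
    };

    // Reducer: Integer counts in, a String summary out.
    static final ReduceFn<String, Integer, String> SUMMARIZE = (key, values) -> {
        int total = 0;
        for (int v : values) total += v;
        return key + "=" + total;
    };

    // Groups (key, value) pairs by key, standing in for the framework's sort-and-group step.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Runs one simulated map task's output through an optional combiner, then the reducer.
    static List<String> run(List<Map.Entry<String, Integer>> mapOutput, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = mapOutput;
        if (useCombiner) {
            // The combiner's output re-enters the shuffle, so it keeps the (String, Integer) shape.
            shuffled = new ArrayList<>();
            for (Map.Entry<String, List<Integer>> e : group(mapOutput).entrySet()) {
                shuffled.add(Map.entry(e.getKey(), SUM_COUNTS.combine(e.getKey(), e.getValue())));
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : group(shuffled).entrySet()) {
            out.add(SUMMARIZE.reduce(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

Trying to make SUM_COUNTS return a String would not compile against CombineFn<String, Integer>, which is the point of the sketch.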

Answer 2

Yes, they can certainly be different, but I don't think you want to use a different class, because in most cases you will get unexpected results.

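A combiner can only be used for functions that are commutative ($a \cdot b = b \cdot a$) and associative ($a \cdot (b \cdot c) = (a \cdot b) \cdot c$). This matters because the combiner may run on only a subset of your keys and values, or may not run at all, and you still want the program's output to stay the same.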
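One way to get a feel for this requirement is to compare reducing all values directly with "combining" two chunks first and then reducing the partial results. The sketch below is plain Java with no Hadoop; CombineSafetyCheck and its methods are hypothetical, and passing the check on one input is evidence, not a proof of associativity or commutativity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BinaryOperator;

class CombineSafetyCheck {
    // Left fold of a non-empty list with the given operator.
    static long fold(List<Long> values, BinaryOperator<Long> op) {
        long acc = values.get(0);
        for (int i = 1; i < values.size(); i++) acc = op.apply(acc, values.get(i));
        return acc;
    }

    // True if reducing all values directly gives the same result as first combining
    // each of two chunks and then reducing the two partial results. Every split point
    // is tried, in both the original and the reversed order (needs at least 2 values).
    static boolean sameWithCombiner(List<Long> values, BinaryOperator<Long> op) {
        long expected = fold(values, op);                  // combiner never ran
        List<Long> reversed = new ArrayList<>(values);
        Collections.reverse(reversed);
        for (List<Long> order : List.of(values, reversed)) {
            for (int cut = 1; cut < order.size(); cut++) {
                long left = fold(order.subList(0, cut), op);             // combiner on chunk 1
                long right = fold(order.subList(cut, order.size()), op); // combiner on chunk 2
                long combined = op.apply(left, right);                   // reducer on partials
                if (combined != expected) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> values = List.of(5L, 3L, 2L, 1L);
        System.out.println("sum:      " + sameWithCombiner(values, Long::sum));
        System.out.println("max:      " + sameWithCombiner(values, Long::max));
        System.out.println("subtract: " + sameWithCombiner(values, (a, b) -> a - b));
    }
}
```

Sum and max survive any split and order, so they are safe to pre-aggregate; subtraction, which is neither commutative nor associative, does not.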

Choosing a different class with different logic may not give you a logically correct output.

Answer 3

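Here is an implementation that you can run with or without the combiner, and both runs give exactly the same answer. Here the Reducer and the Combiner have different purposes and different implementations.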

package combiner;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (name, Average(value, 1)) for every input line of the form "<name> <value>".
public class Map extends Mapper<LongWritable, Text, Text, Average> {

    private final Text name = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] row = line.toString().trim().split("\\s+");
        if (row.length < 2) {
            return; // skip blank or malformed lines
        }
        name.set(row[0]);
        context.write(name, new Average(Long.parseLong(row[1]), 1));
    }
}

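Reduce Class

package combiner;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Adds up the partial sums and counts (straight from the mappers or from the combiner)
// and divides only once, at the end.
public class Reduce extends Reducer<Text, Average, Text, LongWritable> {

    private final LongWritable avg = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<Average> values, Context context)
            throws IOException, InterruptedException {
        long total = 0;
        long count = 0;
        for (Average value : values) {
            total += value.number; // number holds a partial sum, not an average
            count += value.count;
        }
        avg.set(total / count); // integer average, matching the LongWritable output type
        context.write(key, avg);
    }
}

MapObject Class

package combiner;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

// Intermediate value: a partial sum of the numbers seen so far and how many there were.
public class Average implements Writable {

    long number; // partial SUM of the values (not an average)
    int count;   // how many values went into number

    public Average() {} // no-arg constructor needed so Hadoop can deserialize instances

    public Average(long number, int count) {
        this.number = number;
        this.count = count;
    }

    public long getNumber() { return number; }
    public void setNumber(long number) { this.number = number; }
    public int getCount() { return count; }
    public void setCount(int count) { this.count = count; }

    @Override
    public void readFields(DataInput in) throws IOException {
        number = WritableUtils.readVLong(in);
        count = WritableUtils.readVInt(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        WritableUtils.writeVLong(out, number);
        WritableUtils.writeVInt(out, count);
    }
}

Combiner Class

package combiner;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Pre-aggregates one map task's values per key. Its input and output types both match
// the map output (Text, Average), and its output can safely be combined again.
public class Combine extends Reducer<Text, Average, Text, Average> {

    @Override
    protected void reduce(Text name, Iterable<Average> values, Context context)
            throws IOException, InterruptedException {
        long total = 0;
        int count = 0;
        for (Average value : values) {
            total += value.number; // add the partial sums...
            count += value.count;  // ...and their counts, instead of assuming 1 per input
        }
        context.write(name, new Average(total, count)); // no division here
    }
}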

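Driver Class

package combiner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver1 {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: Driver1 <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "CustomCombiner");
        job.setJarByClass(Driver1.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Combine.class); // remove this line to run the same job without a combiner
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Average.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class); // must match Reduce's output value type
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}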

Get the code from here.

Leave your suggestions.
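The "same answer with or without the combiner" property can be checked without a cluster. The plain-Java sketch below (no Hadoop; AverageCombinerSim, Pair and the method names are hypothetical, and the input values are made up) replays one key's values through two simulated map tasks. It contrasts a combiner that counts each input as 1 and emits a truncated per-task average with one that carries a partial sum and count and never divides, which is what makes it safe to run zero, one, or several times.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no Hadoop) of one key's values flowing through two map tasks.
class AverageCombinerSim {
    // Stand-in for the Average writable: a number plus a count.
    static final class Pair {
        final long number;
        final int count;
        Pair(long number, int count) { this.number = number; this.count = count; }
    }

    // Average-emitting combiner: counts each input as 1 and truncates the per-task average.
    static Pair originalCombine(List<Pair> values) {
        long total = 0; int count = 0;
        for (Pair v : values) { total += v.number; count += 1; }
        return new Pair(total / count, count);
    }

    // Matching reducer: rebuilds a total from (average * count).
    static long originalReduce(List<Pair> values) {
        long total = 0; int count = 0;
        for (Pair v : values) { total += v.number * v.count; count += v.count; }
        return total / count;
    }

    // Sum-and-count combiner: number holds a partial SUM, so the output can be re-combined.
    static Pair fixedCombine(List<Pair> values) {
        long sum = 0; int count = 0;
        for (Pair v : values) { sum += v.number; count += v.count; }
        return new Pair(sum, count);
    }

    // Sum-and-count reducer: same accumulation, dividing only once at the end.
    static long fixedReduce(List<Pair> values) {
        long sum = 0; int count = 0;
        for (Pair v : values) { sum += v.number; count += v.count; }
        return sum / count;
    }

    // What the mapper emits for one key: (x, 1) per input value.
    static List<Pair> mapOutput(long... xs) {
        List<Pair> out = new ArrayList<>();
        for (long x : xs) out.add(new Pair(x, 1));
        return out;
    }

    public static void main(String[] args) {
        List<Pair> task1 = mapOutput(1, 2), task2 = mapOutput(3, 6);
        List<Pair> all = new ArrayList<>(task1);
        all.addAll(task2);
        System.out.println("no combiner:       " + fixedReduce(all));
        System.out.println("average combiner:  " + originalReduce(
                List.of(originalCombine(task1), originalCombine(task2))));
        System.out.println("sum/count combiner: " + fixedReduce(
                List.of(fixedCombine(task1), fixedCombine(task2))));
        // A second combiner pass over the first pass's output must not change the answer.
        System.out.println("sum/count, 2 passes: " + fixedReduce(
                List.of(fixedCombine(List.of(fixedCombine(task1), fixedCombine(task2))))));
    }
}
```

With these inputs the true integer average is 12 / 4 = 3. The average-emitting path yields 2, because task 1's average of 1 and 2 is truncated to 1, while the sum-and-count path yields 3 after one or two combiner passes.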

Answer 4

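The main goal of a combiner is to minimize the number of key-value pairs that are shuffled across the network between the mappers and the reducers, saving as much bandwidth as possible.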
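A rough illustration of the bandwidth point, assuming a word-count job and a single map task: the plain-Java sketch below (no Hadoop; ShuffleVolumeDemo and its methods are hypothetical) counts how many (key, value) pairs would leave the map task with and without a local combine step. Pair counts are only a proxy for bytes on the wire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleVolumeDemo {
    // Map output for word count: one (word, 1) pair per word occurrence.
    static List<Map.Entry<String, Integer>> mapWords(String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) out.add(Map.entry(w, 1));
        return out;
    }

    // Local combine within one map task: one (word, partialCount) pair per distinct word.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            partial.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return new ArrayList<>(partial.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> raw = mapWords("a b a c a b");
        List<Map.Entry<String, Integer>> combined = combine(raw);
        System.out.println("pairs shuffled without combiner: " + raw.size());
        System.out.println("pairs shuffled with combiner:    " + combined.size());
        System.out.println(combined);
    }
}
```

Here six (word, 1) pairs shrink to three partial counts; the saving grows with how often keys repeat within a single map task.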

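The rule of thumb for a combiner is that it must have the same input and output key/value types. The reason is that the combiner is not guaranteed to run: it may or may not be applied, depending on the data volume and the number of spills.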

A reducer can be used as the combiner when it satisfies this rule, that is, when its input and output key/value types are the same.

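Another very important rule is that a combiner can only be used when the function you want to apply is both commutative and associative, such as adding numbers, but not something like computing an average (if you reuse the reducer's code as is).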
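To see why averaging breaks when the reducer's code is reused unchanged as the combiner, the plain-Java sketch below (hypothetical class ReducerAsCombinerAverage, made-up input values, no Hadoop) computes a mean directly and as a mean of per-task means. The two differ whenever map tasks contribute different numbers of values for the same key.

```java
import java.util.List;

class ReducerAsCombinerAverage {
    // The reducer's logic: the arithmetic mean of whatever values it receives.
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.size();
    }

    public static void main(String[] args) {
        List<Double> task1 = List.of(1.0, 2.0, 3.0); // values for one key from map task 1
        List<Double> task2 = List.of(10.0);          // values for the same key from map task 2

        // Without a combiner, the reducer sees all four values: 16 / 4.
        double direct = mean(List.of(1.0, 2.0, 3.0, 10.0));
        // With the reducer reused as the combiner, it only sees each task's local mean: (2 + 10) / 2.
        double viaCombiner = mean(List.of(mean(task1), mean(task2)));
        System.out.println(direct + " vs " + viaCombiner);
    }
}
```

This prints "4.0 vs 6.0". Carrying a (sum, count) pair through the combiner and dividing only in the reducer avoids the problem.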

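Now, to answer your question: yes, of course they can be different. When your reducer's input and output types differ, you have no choice but to make a separate copy of your reducer code and modify it into a combiner.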

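Even when you are concerned about the reducer's logic, you can implement the combiner differently. For example, a combiner could keep a collection object as a local buffer of all the values that reach it. This is less risky in a combiner than in a reducer, because a reducer, which receives every value for a key, is more likely to run out of memory than a combiner. Other logical differences can, and do, exist as well.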
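A minimal sketch of the local-buffer idea, assuming a hypothetical "distinct users per page" job (all names and values are made up; plain Java, no Hadoop). The combiner buffers one task's values for a key in a set and re-emits each distinct value once, keeping the map output type, while the reducer counts distinct values and returns a different type. The combiner's set only ever holds one task's values; the reducer's set holds every value for the key, which is where the memory risk described above comes from.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

class DistinctCombinerSketch {
    // Combiner for one key: buffers this task's values in a set and re-emits each distinct one.
    // Its output type (String) matches the map output type, as a combiner's must.
    static List<String> combine(List<String> userIds) {
        LinkedHashSet<String> seen = new LinkedHashSet<>(userIds); // local buffer, one task only
        return new ArrayList<>(seen);
    }

    // Reducer for one key: counts distinct values across all tasks (output type differs: int).
    static int reduce(List<String> userIds) {
        return new HashSet<>(userIds).size();
    }

    public static void main(String[] args) {
        List<String> task1 = List.of("u1", "u2", "u1", "u1");
        List<String> task2 = List.of("u2", "u3", "u3");
        List<String> shuffled = new ArrayList<>(combine(task1));
        shuffled.addAll(combine(task2));
        List<String> raw = new ArrayList<>(task1);
        raw.addAll(task2);
        System.out.println(shuffled.size() + " values shuffled instead of " + raw.size());
        System.out.println("distinct users with combiner:    " + reduce(shuffled));
        System.out.println("distinct users without combiner: " + reduce(raw));
    }
}
```

Deduplication is idempotent and order-independent, so running this combiner zero, one, or several times leaves the reducer's count unchanged.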
