I tried to implement the word count example myself. This is my implementation of the mapper:
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        Text word = new Text();
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, new IntWritable(1));
        }
    }
}
and the reducer:
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        while (values.hasNext())
            sum += values.next().get();
        context.write(key, new IntWritable(sum));
    }
}
But the output I get when running this code looks like just the mapper's output. For example, if the input is "hello world hello", the output is
hello 1
hello 1
world 1
I am also using a combiner between the map and reduce phases. Can anyone explain what is wrong with this code?
Thanks a lot!
Answer 0 (score: 3)
Replace your reduce method with this:
@Override
protected void reduce(Text key, java.lang.Iterable<IntWritable> values,
        org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable>.Context context)
        throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
        sum += value.get();
    }
    context.write(key, new IntWritable(sum));
}
So the bottom line is that you are not overriding the correct method: your version takes an Iterator<IntWritable> while the framework calls reduce with an Iterable<IntWritable>, so the default pass-through implementation runs instead. Adding @Override makes the compiler catch this kind of error.
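The failure mode can be shown without Hadoop at all. In this minimal sketch (the class names BaseReducer/MyReducer are made up for illustration), the subclass method takes an Iterator where the superclass takes an Iterable, so it is an overload, not an override, and calls through the base type still reach the superclass default:

```java
import java.util.Arrays;
import java.util.Iterator;

class BaseReducer {
    // Stand-in for the framework's default reduce: it is what actually runs.
    protected String reduce(Iterable<Integer> values) {
        return "default";
    }
}

class MyReducer extends BaseReducer {
    // Iterator instead of Iterable: this does NOT override the method above.
    // Writing @Override here would turn the mistake into a compile error.
    protected String reduce(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) sum += values.next();
        return "sum=" + sum;
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        BaseReducer r = new MyReducer();
        // Invoked through the base type, the way a framework would call it:
        System.out.println(r.reduce(Arrays.asList(1, 2, 3))); // prints "default"
    }
}
```

This mirrors why the job's output was just the mapper's output: Hadoop invoked the inherited identity reduce, never the overloaded one.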
Also make sure you set Reduce.class as the reduce class, not Reducer.class!
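For completeness, a sketch of the job setup this refers to (the WordCount driver class name and the input/output argument handling are assumptions; Map and Reduce are the classes from the question). The key lines are setReducerClass(Reduce.class) and, since a combiner is used, setCombinerClass(Reduce.class):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);        // the Map class from the question
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);    // Reduce.class, NOT the base Reducer.class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```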
HTH, Johannes
Answer 1 (score: 0)
An alternative solution, if you don't want to use generic type parameters in the reduce method's arguments:
@Override
protected void reduce(Object key, Iterable values, Context context)
        throws IOException, InterruptedException {
    int sum = 0;
    Iterable<IntWritable> v = values;
    Iterator<IntWritable> itr = v.iterator();
    while (itr.hasNext()) {
        sum += itr.next().get();
    }
    context.write(key, new IntWritable(sum));
}