I am trying to get the total number of words in a file so that I can calculate the percentage of each word. I want to do this in a single MapReduce job.
I tried reading the "MAP_OUTPUT_RECORDS" counter in both the combiner and the reducer, but the results are wrong. My code is below:
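For reference, once the total is known, the goal is simple arithmetic: each word's percentage is its count divided by the total number of words, times 100. Below is a minimal plain-Java sketch of that computation. It runs outside Hadoop, and the class and method names are hypothetical. It needs two passes because the total must be known before any percentage can be emitted, and that ordering is what makes a single MapReduce job awkward here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Hadoop code: computes each word's share of the total.
class WordPercent {
    static Map<String, Double> percentages(Map<String, Long> counts) {
        // Pass 1: the total number of words.
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        // Pass 2: each word's count as a percentage of that total.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            result.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("the", 3L);
        counts.put("cat", 1L);
        System.out.println(percentages(counts)); // {the=75.0, cat=25.0}
    }
}
```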
Reducer:
// `result` is a Text field of the enclosing Reducer class (not shown).
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    int sum = 0;
    int count = 0;
    // Each value is "count sum", as written by the combiner below.
    for (Text val : values) {
        String[] str = val.toString().split(" ");
        count += Integer.parseInt(str[0]);
        sum += Integer.parseInt(str[1]);
    }
    result.set(sum + " " + count);
    System.out.println("All value " + context.getConfiguration().getLong("All", 0));
    context.write(key, result);
    // Looks up the counter by group and name strings; this is what prints 0.
    System.out.println("Setup value " + context.getCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_RECORDS").getValue());
}
The values I get have nothing to do with the total:
Output:
Setup value 0
Setup value 0
Setup value 0
Setup value 0
Expected value: Map output records = 50338
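One commonly cited single-job alternative does not use counters at all. It is a different technique from the one in the code above, often called "order inversion". Alongside each (word, 1) pair, the mapper also emits a pair under a special marker key that sorts before every real word. The reducer therefore receives the marker first, caches its sum as the total, and then computes percentages for the words that follow. The sketch below simulates this in plain Java: a TreeMap stands in for the shuffle and sort, and there is a single reducer. The marker string and all names are hypothetical, and none of this is Hadoop API code. With more than one reducer, a real job would also need a custom Partitioner that sends a copy of the marker's partial count to every reducer. The simulation skips that step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the "order inversion" pattern; not Hadoop API code.
class OrderInversionSketch {
    // Hypothetical marker key: "\u0000" sorts before any ordinary word.
    static final String TOTAL_KEY = "\u0000TOTAL";

    static Map<String, Double> run(List<String> words) {
        // "Map" phase: emit (word, 1) and (TOTAL_KEY, 1) for every word.
        // The TreeMap plays the shuffle: it groups values by key and sorts the keys.
        TreeMap<String, List<Long>> shuffled = new TreeMap<>();
        for (String w : words) {
            shuffled.computeIfAbsent(w, k -> new ArrayList<>()).add(1L);
            shuffled.computeIfAbsent(TOTAL_KEY, k -> new ArrayList<>()).add(1L);
        }

        // "Reduce" phase: keys arrive sorted, so TOTAL_KEY is seen first and
        // its sum is cached before any real word is processed.
        long total = 0;
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : shuffled.entrySet()) {
            long sum = 0;
            for (long v : e.getValue()) {
                sum += v;
            }
            if (e.getKey().equals(TOTAL_KEY)) {
                total = sum; // in a real Reducer this would be an instance field
            } else {
                out.put(e.getKey(), 100.0 * sum / total);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "rose", "is", "a", "rose");
        System.out.println(run(words)); // {a=40.0, is=20.0, rose=40.0}
    }
}
```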
Combiner:
public static class IntSumCombiner
        extends Reducer<Text, Text, Text, Text> {
    private Text result = new Text();

    // A combiner runs inside the map task, on that task's map output.
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        int sum = 0;
        // Each value is "count sum"; add up both fields for this key.
        for (Text val : values) {
            String[] str = val.toString().split(" ");
            count += Integer.parseInt(str[0]);
            sum += Integer.parseInt(str[1]);
        }
        result.set(count + " " + sum);
        context.write(key, result);
        System.out.println("Setup value" + context.getCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_RECORDS").getValue());
    }
}
Output:
Setup value14296
Setup value14296
Setup value14296
Setup value14296
Expected value: Map output records = 50338
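A likely explanation for both outputs, offered as a hedged reading of how Hadoop scopes counters: the Context in a combiner or reducer exposes only the counters of the task it is running in. A combiner runs inside a map task, so it sees only that task's own MAP_OUTPUT_RECORDS, which is 14296 here. The reduce task produces no map output records itself, so it sees 0. The job-wide 50338 only exists after the framework aggregates the counters of all map tasks. The toy model below is plain Java, with hypothetical names and made-up input splits. It shows that each task-local count is partial and that only their sum matches the job total.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of task-scoped counters; not Hadoop code.
class TaskCounterSketch {
    // Each "map task" keeps its own local count of output records (one per word).
    static long[] localCounters(List<List<String>> splits) {
        long[] counters = new long[splits.size()];
        for (int t = 0; t < splits.size(); t++) {
            counters[t] = splits.get(t).size(); // visible only inside task t
        }
        return counters;
    }

    // The job-wide value exists only after every task's counter is summed.
    static long jobTotal(long[] counters) {
        long total = 0;
        for (long c : counters) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"));
        long[] local = localCounters(splits);
        // A combiner in task 0 could only see local[0], never the job total.
        System.out.println(Arrays.toString(local) + " -> " + jobTotal(local)); // [3, 2] -> 5
    }
}
```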