Finding the maximum value for a key

Date: 2015-07-09 18:49:11

Tags: hadoop mapreduce

I want to find the country with the largest area.

My dataset is as follows

Afghanistan 648
Albania 29
Algeria 2388
Andorra 0
Austria 84
Bahrain 1
Bangladesh  143
Belgium 31
Benin   113
Bhutan  47
Brunei  6
Bulgaria    111
Burma   678
Cameroon    474
Central-African-Republic    623
Chad    1284
China   9561
Cyprus  9
Czechoslovakia  128
Denmark 43
Djibouti    22
Egypt   1001
Equatorial-Guinea   28
Ethiopia    1222
Finland 337
France  547
Germany-DDR 108
Germany-FRG 249
Greece  132
Guam    0
Hong-Kong   1
Hungary 93
India   3268

Can anyone help me write the MapReduce program?

This is my mapper and reducer code

Mapper

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{
        String[] tokens = value.toString().split(",");
        if(Integer.parseInt(tokens[2]) == 1){
            context.write(new Text(tokens[0]), new IntWritable(Integer.parseInt(tokens[3])));
        }
    }

Reducer

public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{
        int max = 0;
        for(IntWritable x : values){
            if(max < Integer.parseInt(String.valueOf(x))){
                max = Integer.parseInt(String.valueOf(x));
            }
        }
        context.write(key, new IntWritable(max));
    }

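1 answer:

Answer 0 (score: 1)

The algorithm is simple: in the mapper, keep track of the maximum you have seen and write it out once, at the end of the mapper, using cleanup.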

int max = Integer.MIN_VALUE;
String token;

@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] tokens = value.toString().split(",");
        if(Integer.parseInt(tokens[2]) == 1){
            int val = Integer.parseInt(tokens[3]);
            // remember the largest value seen by this mapper and its country
            if(val > max){
                max = val;
                token = tokens[0];
            }
        }
}

@Override
public void cleanup(Context context) throws IOException, InterruptedException {
    // emit once per mapper; skip if no record passed the filter
    if(token != null){
        context.write(new LongWritable(max), new Text(token));
    }
}
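
To try the per-mapper idea without a cluster, here is a minimal plain-Java sketch of the same "track the maximum, emit once at the end" logic. It is not Hadoop code: the class name `LocalMaxSketch` and the method `localMax` are hypothetical, and it assumes the two-column, whitespace-separated layout shown in the question's sample rather than the comma-separated columns the mapper above parses.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class LocalMaxSketch {

    // Scans the lines of one (hypothetical) input split and returns the
    // (country, area) pair with the largest area, or null if no line parses.
    static Map.Entry<String, Integer> localMax(List<String> lines) {
        String bestCountry = null;
        int bestArea = Integer.MIN_VALUE;
        for (String line : lines) {
            // Assumption: "<country> <area>" separated by one or more spaces/tabs.
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length != 2) {
                continue; // skip blank or malformed lines
            }
            int area;
            try {
                area = Integer.parseInt(tokens[1]);
            } catch (NumberFormatException e) {
                continue; // skip lines whose area is not an integer
            }
            if (bestCountry == null || area > bestArea) {
                bestArea = area;
                bestCountry = tokens[0];
            }
        }
        // This single return plays the role of the one write in cleanup().
        return bestCountry == null ? null : new SimpleEntry<>(bestCountry, bestArea);
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("Chad    1284", "China   9561", "Cyprus  9");
        System.out.println(localMax(split)); // prints China=9561
    }
}
```

Each mapper then contributes at most one record to the shuffle, so the reducer only has to compare one candidate per input split.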

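Everything is now shuffled and sorted on the max value, which means that if we sort in descending order, the maximum arrives as the first record in the reducer. So you need to set this in your job: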

job.setSortComparatorClass(LongWritable.DecreasingComparator.class);

The reducer is a simple found/not-found switch that outputs every country only if it has the max value (the first record).

boolean foundMax = false;

@Override
public void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException{
        if(!foundMax){
            for(Text t : values){
                context.write(t, key);
            }
            foundMax = true;
        }              
}
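
The combination of descending sort and the found/not-found switch can be checked locally with a plain-Java sketch. This is only a simulation under stated assumptions: the class `DescendingFirstGroupSketch` and the method `countriesWithMax` are hypothetical, a `TreeMap` with a reversed comparator stands in for the shuffle sorted by `LongWritable.DecreasingComparator`, and all mapper outputs are assumed to reach a single reducer (with several reducers, each one would emit its own first group).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DescendingFirstGroupSketch {

    // Takes one (country, area) candidate per mapper and returns the
    // countries that share the overall maximum area.
    static List<String> countriesWithMax(List<Map.Entry<String, Integer>> mapperOutputs) {
        // Group candidates by area, largest area first (stand-in for the sorted shuffle).
        TreeMap<Integer, List<String>> grouped = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : mapperOutputs) {
            grouped.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }

        List<String> out = new ArrayList<>();
        boolean foundMax = false; // same switch as in the reducer above
        for (Map.Entry<Integer, List<String>> group : grouped.entrySet()) {
            if (!foundMax) {
                out.addAll(group.getValue()); // first group holds the maximum
                foundMax = true;
            }
            // later groups are smaller areas and are ignored
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> candidates = Arrays.asList(
                new SimpleEntry<>("Algeria", 2388),
                new SimpleEntry<>("China", 9561),
                new SimpleEntry<>("India", 3268));
        System.out.println(countriesWithMax(candidates)); // prints [China]
    }
}
```

Because the first group contains every value for the largest key, countries tied for the maximum area are all written out.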