Counter is not working in reducer code

Question

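I am working on a big Hadoop project, and there is a small KPI for which I need to write only the top 10 values to the reducer output. To meet this requirement I used a counter and break out of the loop when the counter equals 11, but the reducer still writes all the values to HDFS.

This is pretty simple Java code, but I am stuck :(

For testing, I created a standalone class (a plain Java application) that does the same thing, and it works there; I am wondering why it does not work in the reducer code.

Could someone please help me and point out whether I am missing something?

MapReduce code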

package comparableTest;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.IntWritable.Comparator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ValueSortExp2 {
    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration(true);

        String arguments[] = new GenericOptionsParser(conf, args).getRemainingArgs();

        Job job = new Job(conf, "Test commond");
        job.setJarByClass(ValueSortExp2.class);

        // Setup MapReduce
        job.setMapperClass(MapTask2.class);
        job.setReducerClass(ReduceTask2.class);
        job.setNumReduceTasks(1);

        // Specify key / value
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        job.setSortComparatorClass(IntComparator2.class);
        // Input
        FileInputFormat.addInputPath(job, new Path(arguments[0]));
        job.setInputFormatClass(TextInputFormat.class);

        // Output
        FileOutputFormat.setOutputPath(job, new Path(arguments[1]));
        job.setOutputFormatClass(TextOutputFormat.class);


        int code = job.waitForCompletion(true) ? 0 : 1;
        System.exit(code);

    }

    public static class IntComparator2 extends WritableComparator {

        public IntComparator2() {
            super(IntWritable.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {

            Integer v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
            Integer v2 = ByteBuffer.wrap(b2, s2, l2).getInt();

            return v1.compareTo(v2) * (-1);
        }
    }

    public static class MapTask2 extends Mapper<LongWritable, Text, IntWritable, Text> {

            public void  map(LongWritable key,Text value, Context context) throws IOException, InterruptedException {

                String tokens[]= value.toString().split("\\t");

            //    int empId = Integer.parseInt(tokens[0])    ;    
                int count = Integer.parseInt(tokens[2])    ;

                context.write(new IntWritable(count), new Text(value));

            }    

        }


    public static class ReduceTask2 extends Reducer<IntWritable, Text, IntWritable, Text> {
        int cnt=0;
        public void reduce(IntWritable key, Iterable<Text> list, Context context)
                throws java.io.IOException, InterruptedException {


            for (Text value : list ) {
                cnt ++;

                if (cnt==11)
                {
                    break;    
                }

                context.write(new IntWritable(cnt), value);




            }

        }
}
}

Simple Java code, working fine

package comparableTest;

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer.Context;

public class TestData {

    //static int cnt=0;


    public static void main(String args[]) throws IOException, InterruptedException {

        ArrayList<String> list = new ArrayList<String>() {{
            add("A");
            add("B");
            add("C");
            add("D");
        }};


        reduce(list);


    }

    public static void reduce(Iterable<String> list)
            throws java.io.IOException, InterruptedException {


        int cnt=0;
        for (String value : list ) {
            cnt ++;

            if (cnt==3)
            {
                break;    
            }

            System.out.println(value);    


        }

    }
}

Sample data: the header is only for information; the actual data starts from the second row

` ID NAME COUNT (need to display the top 10, descending)
1 Toy Story (1995) 2077
10 GoldenEye (1995) 888
100 City Hall (1996) 128
1000 Curdled (1996) 20
1001 Associate, The (L'Associe) (1982) 0
1002 Ed's Next Move (1996) 8
1003 Extreme Measures (1996) 121
1004 Glimmer Man, The (1996) 101
1005 D3: The Mighty Ducks (1996) 142
1006 Chamber, The (1996) 78
1007 Apple Dumpling Gang, The (1975) 232
1008 Davy Crockett, The Wild Frontier (1955) 97
1009 Escape to Witch Mountain (1975) 291
101 Bottle Rocket (1996) 253
1010 Love Bug, The (1969) 242
1011 Herbie Rides Again (1974) 135
1012 Old Yeller (1957) 301
1013 Parent Trap, The (1961) 258
1014 Pollyanna (1960) 136
1015 Homeward Bound: The Incredible Journey (1993) 234
1016 Shaggy Dog, The (1959) 156
1017 Swiss Family Robinson (1960) 276
1018 That Darn Cat! (1965) 123
1019 20,000 Leagues Under the Sea (1954) 575
102 Mr. Wrong (1996) 60
1020 Cool Runnings (1993) 392
1021 Angels in the Outfield (1994) 247
1022 Cinderella (1950) 577
1023 Winnie the Pooh and the Blustery Day (1968) 221
1024 Three Caballeros, The (1945) 126
1025 Sword in the Stone, The (1963) 293
1026 So Dear to My Heart (1949) 8
1027 Robin Hood: Prince of Thieves (1991) 344
1028 Mary Poppins (1964) 1011
1029 Dumbo (1941) 568
103 Unforgettable (1996) 33
1030 Pete's Dragon (1977) 323
1031 Bedknobs and Broomsticks (1971) 319`

Answer 1

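If you move int cnt=0; inside the reduce method (as the first statement of that method), you will get the top 10 values for each key (I guess this is what you want).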
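
To make the per-key behaviour concrete, here is a minimal plain-Java sketch that needs no Hadoop on the classpath. The class name PerKeyLimitDemo, the run helper and the output list (standing in for context.write) are assumptions made for illustration, not part of the original job.

```java
import java.util.ArrayList;
import java.util.List;

public class PerKeyLimitDemo {
    // Stand-in for context.write: collects what the reducer would emit.
    private final List<String> output = new ArrayList<>();

    // Same loop as in the question, but with the counter declared locally.
    void reduce(int key, List<String> values) {
        int cnt = 0; // local variable: restarts at zero for every key
        for (String value : values) {
            cnt++;
            if (cnt == 11) {
                break; // stop after 10 values of this key
            }
            output.add(key + "\t" + value);
        }
    }

    // Hypothetical driver: feeds numberOfKeys keys, each with valuesPerKey values.
    public static List<String> run(int numberOfKeys, int valuesPerKey) {
        PerKeyLimitDemo reducer = new PerKeyLimitDemo();
        for (int k = 1; k <= numberOfKeys; k++) {
            List<String> values = new ArrayList<>();
            for (int v = 1; v <= valuesPerKey; v++) {
                values.add("record-" + k + "-" + v);
            }
            reducer.reduce(k, values);
        }
        return reducer.output;
    }

    public static void main(String[] args) {
        // Two keys with 15 values each: 10 records survive per key.
        System.out.println(run(2, 15).size() + " records written"); // prints "20 records written"
    }
}
```

Keep in mind that in the question's job the key is the count itself, so most keys are likely to carry a single value and a per-key limit would rarely cut anything off.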

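Otherwise, as it is now, cnt is an instance field of the reducer, so it is not reset between reduce calls: your counter keeps increasing, and you only skip the 11th value (regardless of the key), then continue with the 12th.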
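
The effect can be reproduced without a cluster. The following is a small simulation, assuming one value per key as in the sample data; SharedCounterDemo, run and the output list (a stand-in for context.write) are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SharedCounterDemo {
    // Instance field, as in the question: it survives across reduce() calls.
    private int cnt = 0;
    // Stand-in for context.write.
    private final List<String> output = new ArrayList<>();

    // Mirrors the loop from the question's ReduceTask2.
    void reduce(List<String> values) {
        for (String value : values) {
            cnt++;
            if (cnt == 11) {
                break; // leaves only this call's loop; later calls see cnt >= 12
            }
            output.add(cnt + "\t" + value);
        }
    }

    // Hypothetical driver: one reduce() call per key, one value per key.
    public static List<String> run(int numberOfKeys) {
        SharedCounterDemo reducer = new SharedCounterDemo();
        for (int k = 1; k <= numberOfKeys; k++) {
            reducer.reduce(Collections.singletonList("record-" + k));
        }
        return reducer.output;
    }

    public static void main(String[] args) {
        List<String> written = run(15);
        // Only the 11th record is dropped; records 12 to 15 are written again.
        System.out.println(written.size() + " records written"); // prints "14 records written"
    }
}
```

The standalone test in the question does not show this because its counter is a local variable and the loop runs only once.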

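If you want to print only 10 values (regardless of the key), leave the cnt initialization where it is and change the if condition to if (cnt > 10) ... However, this is not good practice, so you may want to reconsider your algorithm. (Assuming you don't want 10 random values: when you have multiple reducers and a hash partitioner, how do you know which key will be processed first in a distributed environment?)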
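
A minimal sketch of the cnt > 10 variant, again in plain Java with a list standing in for context.write. GlobalTopTenDemo, LIMIT and run are hypothetical names, and the sketch assumes a single reducer that receives keys already sorted in descending order, which is what job.setNumReduceTasks(1) and IntComparator2 arrange in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GlobalTopTenDemo {
    private static final int LIMIT = 10;
    // Shared across reduce() calls, as in the question.
    private int cnt = 0;
    // Stand-in for context.write.
    private final List<String> output = new ArrayList<>();

    void reduce(List<String> values) {
        for (String value : values) {
            cnt++;
            if (cnt > LIMIT) {
                break; // true for the 11th record and for every later one
            }
            output.add(cnt + "\t" + value);
        }
    }

    // Hypothetical driver: one reduce() call per key, one value per key.
    public static List<String> run(int numberOfKeys) {
        GlobalTopTenDemo reducer = new GlobalTopTenDemo();
        for (int k = 1; k <= numberOfKeys; k++) {
            reducer.reduce(Collections.singletonList("record-" + k));
        }
        return reducer.output;
    }

    public static void main(String[] args) {
        System.out.println(run(15).size() + " records written"); // prints "10 records written"
    }
}
```

With more than one reducer, each reducer instance would keep its own counter and emit its own 10 records, which is the reason for the caution above.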
