Java heap space error when executing MapReduce

Date: 2016-03-02 09:12:26

Tags: java hadoop mapreduce

I am trying to find medians in Hadoop. The job fails with the following error:

16/03/02 02:46:13 INFO mapreduce.Job: Task Id : attempt_1456904182817_0001_r_000412_0, Status : FAILED
Error: Java heap space

I have gone through many posts on similar problems, but none of them worked. I also took help from this post:

Error: Java heap space

I tried the following possible solutions:

  1. Increasing the Java heap size, as suggested in the post above.
  2. Increasing the container size by changing the following property (see the sketch after this list):

    In yarn-site.xml:

    yarn.scheduler.minimum-allocation-mb to 1024

  3. Increasing the number of reducers to a bigger value like this:

    job.setNumReduceTasks(1000);

However, none of the above made any difference for me, hence this post. I know that computing a median is not a natural fit for Hadoop, but can anyone suggest a solution that might help?
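For reference, the yarn-site.xml change from step 2 looked like this (a sketch; 1024 MB is simply the value that was tried, not a tuned recommendation):

    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>
    </property>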

        java version "1.8.0_60"
        Hadoop version is 2.x
    

    I have a 10-node cluster with 8 GB of RAM and an 80 GB hard disk on each node.

    Here is the entire code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    
    import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
    import org.apache.commons.math3.stat.descriptive.rank.Median;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.DoubleWritable;
    
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    
    
    
    public class median_all_keys {
    
    
        //Mapper
        public static class map1 extends Mapper<LongWritable,Text,Text,DoubleWritable>{
            public void map(LongWritable key, Text value, Context context)
                throws IOException,InterruptedException{
            String[] line= value.toString().split(",");
            double col1=Double.parseDouble(line[6]);
            double col2=Double.parseDouble(line[7]);
            context.write(new Text("Key0"+"_"+line[0]+"_"+"Index:6"), new DoubleWritable(col1));
            context.write(new Text("Key0"+"_"+line[0]+"_"+"Index:7"), new DoubleWritable(col2));
            context.write(new Text("Key1"+"_"+line[1]+"_"+"Index:6"), new DoubleWritable(col1));
            context.write(new Text("Key1"+"_"+line[1]+"_"+"Index:7"), new DoubleWritable(col2));
            context.write(new Text("Key2"+"_"+line[2]+"_"+"Index:6"), new DoubleWritable(col1));
            context.write(new Text("Key2"+"_"+line[2]+"_"+"Index:7"), new DoubleWritable(col2));
            context.write(new Text("Key0"+"_"+line[0] +","+"key1"+"_"+ line[1]+"_"+"Index:6"), new DoubleWritable(col1));
            context.write(new Text("Key0"+"_"+line[0] +","+"key1"+"_"+ line[1]+"_"+"Index:7"), new DoubleWritable(col2));
            context.write(new Text("Key1"+"_"+line[1] +","+"key2"+"_"+ line[2]+"_"+"Index:6"), new DoubleWritable(col1));
            context.write(new Text("Key1"+"_"+line[1] +","+"key2"+"_"+ line[2]+"_"+"Index:7"), new DoubleWritable(col2));
            context.write(new Text("Key0"+"_"+line[0] +","+"key2"+"_"+ line[2]+"_"+"Index:6"), new DoubleWritable(col1));
            context.write(new Text("Key0"+"_"+line[0] +","+"key2"+"_"+ line[2]+"_"+"Index:7"), new DoubleWritable(col2));
            context.write(new Text("Key0"+"_"+line[0] +","+"key1"+"_"+ line[1]+","+"key2"+"_"+line[2]+"_"+"Index:6"),new DoubleWritable(col1));
            context.write(new Text("Key0"+"_"+line[0] +","+"key1"+"_"+ line[1]+","+"key2"+"_"+line[2]+"_"+"Index:7"),new DoubleWritable(col2));         
        }
    }
    
    //Reducer
        public static class sum_reduce extends Reducer<Text,DoubleWritable,Text,DoubleWritable>{
            @Override
            public void reduce(Text key,Iterable<DoubleWritable> value, Context context)
            throws IOException,InterruptedException{
                List<Double> values = new ArrayList<>();
                for (DoubleWritable val: value){
                    values.add(val.get());
                    }
                double res = calculate(values);
                context.write(key, new DoubleWritable(res));
    
    
            }
    
            public static double calculate(List<Double> values) {
                  DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics();
                  for (Double value : values) {
                   descriptiveStatistics.addValue(value);
                  }
                  return descriptiveStatistics.getPercentile(50);
                 }
        }
    
    
    
        public static void main(String[] args) throws Exception {
            Configuration conf= new Configuration();
            Job job = Job.getInstance(conf, "Sum for all keys");
            //Driver
            job.setJarByClass(median_all_keys.class);
            //Mapper
            job.setMapperClass(map1.class);
            //Reducer
            job.setReducerClass(sum_reduce.class);
            //job.setCombinerClass(TestCombiner.class);
            //Output key class for Mapper
            job.setMapOutputKeyClass(Text.class);
            //Output value class for Mapper
            job.setMapOutputValueClass(DoubleWritable.class);
            //Output key class for Reducer
            job.setOutputKeyClass(Text.class);
            job.setNumReduceTasks(1000);
            //Output value class from Reducer
            job.setOutputValueClass(DoubleWritable.class);
            //Input Format class
            job.setInputFormatClass(TextInputFormat.class);
            //Final Output Format class
            job.setOutputFormatClass(TextOutputFormat.class);
            //Path variable
            Path path = new Path(args[1]);
            //input/output path
            FileInputFormat.addInputPath(job, new Path(args[0]));
             FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
             //delete the output path if it already exists
             path.getFileSystem(conf).delete(path, true);
             //exiting the job
             System.exit(job.waitForCompletion(true) ? 0 : 1);
    
    
        }
    
    }
    

2 Answers:

Answer 0 (score: 1)

Try reusing Writables: create a DoubleWritable class variable and set its value with .set(), instead of creating a new object every time.
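For the mapper in the question, that reuse would look roughly like this (a sketch; only the first write is shown, the rest follow the same pattern):

    // Writables kept as fields of map1 and reused for every output record;
    // this is safe because context.write() serializes the pair immediately.
    private final Text outKey = new Text();
    private final DoubleWritable outVal = new DoubleWritable();

    // inside map():
    outKey.set("Key0" + "_" + line[0] + "_" + "Index:6");
    outVal.set(col1);
    context.write(outKey, outVal);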

Building up a list in the reducer is also unnecessary; send the values directly to the DescriptiveStatistics object.
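A minimal sketch of both suggestions applied to the reducer from the question. Note that DescriptiveStatistics still buffers all values internally (a percentile needs the full distribution), but this removes the duplicate ArrayList copy and the per-key output object:

    public static class sum_reduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        // Reused output Writable, set per key instead of allocated per key.
        private final DoubleWritable result = new DoubleWritable();

        @Override
        public void reduce(Text key, Iterable<DoubleWritable> value, Context context)
                throws IOException, InterruptedException {
            // Feed values straight into DescriptiveStatistics; no intermediate List.
            DescriptiveStatistics stats = new DescriptiveStatistics();
            for (DoubleWritable val : value) {
                stats.addValue(val.get());
            }
            result.set(stats.getPercentile(50));
            context.write(key, result);
        }
    }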

Answer 1 (score: 0)

Check the memory settings for YARN, Map and Reduce tasks as per this article.


Set the memory parameters according to the size of your input dataset.

The key parameters:

YARN

yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.vmem-pmem-ratio
yarn.nodemanager.resource.memory-mb

Map memory

mapreduce.map.java.opts
mapreduce.map.memory.mb

Reduce memory

mapreduce.reduce.java.opts
mapreduce.reduce.memory.mb

Application Master

yarn.app.mapreduce.am.command-opts
yarn.app.mapreduce.am.resource.mb
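The task-level parameters can also be overridden per job from the driver rather than cluster-wide; a sketch with purely illustrative values (size them to your data, and keep the -Xmx heap in java.opts below the matching memory.mb container size):

    Configuration conf = new Configuration();
    // Container sizes for map and reduce tasks, in MB (illustrative values only).
    conf.set("mapreduce.map.memory.mb", "2048");
    conf.set("mapreduce.reduce.memory.mb", "4096");
    // JVM heap inside each container, kept at roughly 80% of the container size.
    conf.set("mapreduce.map.java.opts", "-Xmx1638m");
    conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
    Job job = Job.getInstance(conf, "median_all_keys");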

Have a look at the related SE question:

What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?