Hadoop mapper and reducer value type mismatch error

Date: 2014-12-10 22:20:08

Tags: java hadoop

I am new to Hadoop and have run into this problem. I am trying to change the reducer's default Text, IntWritable output to Text, Text. My mapper emits Text, IntWritable; then in the reducer I want two counters, incremented depending on the value, and finally I write both counters as a single Text to the output collector.

public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable>
        output, Reporter reporter) throws IOException {

    String line = value.toString();
    String[] words = line.split(",");
    String[] date = words[2].split(" ");
    word.set(date[0] + " " + date[1] + " " + date[2]);
    if (words[0].contains("0"))
      one.set(0);
    else
      one.set(4);
    output.collect(word, one);
  }
}

-----------------------------------------------------------------------------------

public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, Text> {

  public void reduce(Text key,Iterator<IntWritable> values,
                  OutputCollector<Text, Text> output,
                  Reporter reporter) throws IOException {

    int sad = 0;
    int happy = 0;
    while (values.hasNext()) {
      IntWritable value = (IntWritable) values.next();
      if(value.get() == 0)
          sad++; // process value
      else
          happy++;
    }

    output.collect(key, new Text("sad:"+sad+", happy:"+happy));
  }
}
---------------------------------------------------------------------------------

public class WordCount {

  public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(WordCount.class);

    // specify output types
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // specify input and output dirs
    FileInputFormat.addInputPath(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    // specify a mapper
    conf.setMapperClass(WordCountMapper.class);

    // specify a reducer
    conf.setReducerClass(WordCountReducer.class);
    conf.setCombinerClass(WordCountReducer.class);

    client.setConf(conf);
    try {
      JobClient.runJob(conf);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

I get this error:

14/12/10 18:11:01 INFO mapred.JobClient: Task Id : attempt_201412100143_0008_m_000000_0, Status : FAILED
java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:425)
        at WordCountMapper.map(WordCountMapper.java:31)
        at WordCountMapper.map(WordCountMapper.java:1)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
Caused by: java.io.IOException: wrong value class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.io.IntWritable
        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:143)
        at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:626)
        at WordCountReducer.reduce(WordCountReducer.java:29)
        at WordCountReducer.reduce(WordCountReducer.java:1)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:904)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:785)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.java:286)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:712)

After that, the error repeats several more times. Can anyone explain why this error occurs? I searched for similar errors, but everything I found was about mismatched key-value types between the mapper and reducer, and as far as I can see my mapper and reducer key-value types do match. Thanks in advance.

2 Answers:

Answer 0 (score: 2)

Try commenting out

conf.setCombinerClass(WordCountReducer.class);

and run again.

This is because the combiner runs on the map side while the sort buffer is spilled, so its output key-value types must match the map output types. Here WordCountReducer, used as the combiner, emits Text values where IntWritable is declared, which is exactly the "wrong value class" in the stack trace.

Spill error
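The "wrong value class" line comes from a runtime check in the map-output spill path. A plain-Java sketch of that check (no Hadoop on the classpath; the class and method names here are illustrative, not Hadoop's actual internals):

```java
import java.io.IOException;

// Sketch of the check behind "wrong value class": the spill writer records
// the declared map-output value class and rejects any record whose value has
// a different class -- which is what happens when a combiner emits Text
// where IntWritable was declared.
class SpillValueCheck {
    private final Class<?> declaredValueClass;

    SpillValueCheck(Class<?> declaredValueClass) {
        this.declaredValueClass = declaredValueClass;
    }

    void append(Object value) throws IOException {
        if (value.getClass() != declaredValueClass) {
            throw new IOException("wrong value class: " + value.getClass().getName()
                    + " is not " + declaredValueClass.getName());
        }
    }
}

public class WrongValueClassDemo {
    public static void main(String[] args) {
        // Integer stands in for IntWritable, String for Text.
        SpillValueCheck writer = new SpillValueCheck(Integer.class);
        try {
            writer.append("sad:1, happy:2"); // a Text-like value from the combiner
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Removing the combiner means the reducer's Text output goes straight to the job output instead of back through this map-side check.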

Also add:

conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);

because your mapper and reducer emit different key-value data types.

If both emit the same data types, then

conf.setOutputKeyClass(...);
conf.setOutputValueClass(...);

alone is enough.
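Putting these suggestions together, the type configuration in the driver would look roughly like the fragment below (a sketch against the old `org.apache.hadoop.mapred` API used in the question; it replaces the corresponding lines in WordCount.main):

```java
JobConf conf = new JobConf(WordCount.class);

// intermediate (map output) types: Text keys, IntWritable values
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);

// final (reduce output) types: Text keys, Text values
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);

conf.setMapperClass(WordCountMapper.class);
conf.setReducerClass(WordCountReducer.class);
// no setCombinerClass: WordCountReducer emits Text values, but a combiner
// here must emit the map output value type, IntWritable
```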

Answer 1 (score: 0)

In the WordCount class, this line should be:

 conf.setOutputValueClass(Text.class);