Error when using the output of one MapReduce job as the input of another

Time: 2011-04-16 16:46:48

Tags: hadoop mapreduce

I have two pairs of Map/Reduce classes, MyMapper1/MyReducer1 and MyMapper2/MyReducer2, and I want to use the output of MyReducer1 as the input of MyMapper2 by setting the input path of job2 to the output path of job1.

The types are as follows:

    public class MyMapper1 extends Mapper<LongWritable, Text, IntWritable, IntArrayWritable>
    public class MyReducer1 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable>
    public class MyMapper2 extends Mapper<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable>
    public class MyReducer2 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntWritable>

    public class IntArrayWritable extends ArrayWritable {
        public IntArrayWritable() {
            super(IntWritable.class);
        }
    }

The code that sets the input/output paths is:

    Path temppath = new Path("temp-dir-" + temp_time);

    FileOutputFormat.setOutputPath(job1, temppath);

            ...........

    FileInputFormat.addInputPath(job2, temppath);

The code that sets the input/output formats is:

    job1.setOutputFormatClass(TextOutputFormat.class);
            ..........
    job2.setInputFormatClass(KeyValueTextInputFormat.class);

But I always get this exception when running job2:

    11/04/16 12:34:09 WARN mapred.LocalJobRunner: job_local_0002
    java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
        at ligon.MyMapper2.map(MyMapper2.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)

I tried changing the InputFormat and OutputFormat, but without success: a similar (though not identical) exception occurred in job2.

My complete code package is at: http://dl.dropbox.com/u/7361939/HW2_Q1.zip

Thanks a lot!

2 answers:

Answer 0: (score: 0)

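The problem is that in job 2, KeyValueTextInputFormat produces key-value pairs of type <Text, Text>, and you are trying to process them with a Mapper that accepts <IntWritable, IntArrayWritable>, which causes the ClassCastException. The best fix is to change the mapper to accept <Text, Text> and convert the Text values to integers yourself.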
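A minimal sketch of the conversion such a mapper would need, written in plain Java so it runs without Hadoop. It assumes each line of job1's text output has the form `key<TAB>v1 v2 v3`, which only holds if IntArrayWritable overrides toString() to print its elements separated by spaces (the class in the question does not). The class name TextRecordParser and its methods are hypothetical; inside a `Mapper<Text, Text, IntWritable, IntArrayWritable>`, the strings would come from `key.toString()` and `value.toString()`.

```java
import java.util.Arrays;

// Hypothetical helper: turns the text columns of one record back into integers.
class TextRecordParser {
    // Parses the key column, e.g. "7" -> 7.
    static int parseKey(String keyText) {
        return Integer.parseInt(keyText.trim());
    }

    // Parses the value column, ASSUMED to hold whitespace-separated integers.
    static int[] parseValues(String valueText) {
        String trimmed = valueText.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] tokens = trimmed.split("\\s+");
        int[] values = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Integer.parseInt(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // KeyValueTextInputFormat splits each line at the first tab by default.
        String line = "7\t1 2 3";
        int tab = line.indexOf('\t');
        int key = parseKey(line.substring(0, tab));
        int[] values = parseValues(line.substring(tab + 1));
        System.out.println(key + " -> " + Arrays.toString(values)); // prints "7 -> [1, 2, 3]"
    }
}
```

The parsed integers would then be wrapped in IntWritable objects before being emitted from map().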

Answer 1: (score: 0)

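I ran into the same problem and found a solution a while ago. Since you use IntArrayWritable as the reducer's output value, it is easier to write the data as a binary file and read it back later as binary, so the key and value types are preserved between the two jobs.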

First job:

    job1.setOutputFormatClass(SequenceFileOutputFormat.class);

    job1.setOutputKeyClass(IntWritable.class);
    job1.setOutputValueClass(IntArrayWritable.class);

Second job:

    job2.setInputFormatClass(SequenceFileInputFormat.class);

This should work in your case.
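To illustrate why the binary route avoids the cast problem, here is a simplified stand-in written with plain java.io rather than Hadoop's own classes. It mimics how Writable types serialize themselves: an IntWritable writes one 4-byte int, and an ArrayWritable writes its element count followed by each element. The class BinaryRoundTrip and its methods are hypothetical names for this sketch, and the real SequenceFile format adds headers and sync markers that are not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch: a typed binary round trip, no text parsing involved.
class BinaryRoundTrip {
    // Writes: key, element count, then each element (all 4-byte ints).
    static byte[] write(int key, int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(key);
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    // Reads the key back from the start of the record.
    static int readKey(byte[] record) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            return in.readInt();
        }
    }

    // Skips the key, then reads the count and the elements.
    static int[] readValues(byte[] record) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            in.readInt(); // skip the key
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] record = write(7, new int[] {1, 2, 3});
        System.out.println(readKey(record) + " -> " + Arrays.toString(readValues(record)));
    }
}
```

Because the reader gets integers back directly, the second mapper can keep its IntWritable/IntArrayWritable signature, which is what SequenceFileInputFormat makes possible in the answer above.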