Shuffle and sorting after combiner

Time: 2016-10-20 20:11:36

Tags: hadoop mapreduce hadoop-streaming

I have a mapper, a combiner and a reducer. As far as I know, the combiner runs before the shuffle and sort phase. But in my case, the mapper output arrives at the combiner already sorted.

hadoop jar hadoop_streeaming.jar \
        -input some_folder \
        -output some_folder \
        -mapper mapper.py \
        -combiner combine.py \
        -file mapper.py \
        -file combine.py

I want the mapper's results to reach the combiner unsorted.

For example:

I have this text:

mary
has
a
big
cat

This text arrives at the combiner in this form:

a
big
cat
has
mary

But I don't want the output sorted before the combiner.
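
To make the observation easier to reproduce outside the cluster, here is a minimal local sketch of what I am seeing. It assumes that the framework sorts the map output by key before it calls the combiner, which matches the behaviour described above. The function names (`mapper`, `simulate_sort_before_combiner`, `combiner`) are hypothetical and are not taken from my actual mapper.py or combine.py.

```python
def mapper(lines):
    """Emit each non-empty input line as a key, preserving input order."""
    for line in lines:
        word = line.strip()
        if word:
            yield word


def simulate_sort_before_combiner(keys):
    """Hypothetical stand-in for the framework step: sort map output by key."""
    return sorted(keys)


def combiner(keys):
    """Identity combiner: pass the keys through in the order received."""
    return list(keys)


text = ["mary", "has", "a", "big", "cat"]

map_output = list(mapper(text))
combiner_input = simulate_sort_before_combiner(map_output)
combiner_output = combiner(combiner_input)

print(map_output)       # order in which the mapper emitted the keys
print(combiner_output)  # order in which the combiner saw the keys
```

The first list keeps the input order, while the second comes out in key order, which is the same difference I see between my text and what the combiner receives.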

0 answers:

No answers