
Time: 2012-09-25 19:37:03

Tags: hadoop mapreduce distributed-caching


class reduce {

    void configure(args) {
        /* I can get a particular file from the Path[] here.
           I want to select the file corresponding to the key of the reduce method
           and pass its contents to the reduce method. I am not able to do this
           because I can't access the key of the reduce method from here. */
    }

    void reduce(args)
}


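1 answer:

Answer 0 (score: 1)

The solution is to assign the Path arrays from the DistributedCache to class variables in the configure step, as described in the DistributedCache javadocs. The key is not available in configure, but the stored paths are available in every later call, so the file can be selected once the key is known. Of course, replace the map code below with your reduce code.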


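 // Nested inside the job's driver class. Needs java.io.IOException,
 // org.apache.hadoop.fs.Path, org.apache.hadoop.filecache.DistributedCache
 // and org.apache.hadoop.mapred.*.
 public static class MapClass<K, V> extends MapReduceBase
     implements Mapper<K, V, K, V> {

   private Path[] localArchives;
   private Path[] localFiles;

   @Override
   public void configure(JobConf job) {
     // Keep the localized cache paths in fields so map() can use them later
     try {
       localArchives = DistributedCache.getLocalCacheArchives(job);
       localFiles = DistributedCache.getLocalCacheFiles(job);
     } catch (IOException e) {
       // configure() cannot throw checked exceptions
       throw new RuntimeException("Could not read the distributed cache", e);
     }
   }

   @Override
   public void map(K key, V value,
                   OutputCollector<K, V> output, Reporter reporter)
       throws IOException {
     // Use data from the cached archives/files here
     // ...
     // ...
     output.collect(key, value);
   }
 }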
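
To tie this back to the question: the key only exists inside reduce, so the file lookup has to happen there, while configure does nothing more than remember the paths. The following is a minimal sketch of that split, written without Hadoop so it can run on its own. It uses java.nio.file.Path in place of org.apache.hadoop.fs.Path, and the class name KeyedCacheLookup, the helper fileFor, and the convention that a cached file is named after its key are all assumptions made for illustration, not part of the original question or answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hadoop-free stand-in for a reducer; every name here is hypothetical.
final class KeyedCacheLookup {

    // Filled once in configure(), read on every reduce() call.
    private Path[] localFiles = new Path[0];

    // Plays the role of configure(JobConf): it only remembers the paths.
    // In a real job they would come from DistributedCache.getLocalCacheFiles(job).
    void configure(Path[] cachedFiles) {
        localFiles = cachedFiles.clone();
    }

    // Assumed convention: the cached file for a key is named after that key.
    Path fileFor(String key) {
        for (Path p : localFiles) {
            Path name = p.getFileName();
            if (name != null && name.toString().equals(key)) {
                return p;
            }
        }
        return null; // no cached file for this key
    }

    // Plays the role of reduce(): the key is known here, so pick the file now.
    String reduce(String key) {
        Path match = fileFor(key);
        if (match == null) {
            return key + "\t<no cached file>";
        }
        try {
            String contents =
                new String(Files.readAllBytes(match), StandardCharsets.UTF_8);
            return key + "\t" + contents;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In an actual reducer the same shape would apply: keep the Path[] in a field during configure, then do the name match and the file read inside reduce, where the key is passed in. If the same key's file is large or read often, caching its contents in a map keyed by file name would be a reasonable extension.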