Question

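I need your help optimizing my MapReduce code. I used the reduce-side join design pattern from the book MapReduce Design Patterns. Everything works, but I am trying to improve the code so that the join key is not duplicated during the join.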

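The join key is actually also the first value in each record of the second table, so I want to remove it from the value. That is why I split my value and try to remove the first element. But I don't think this approach is any better, and it is also expensive.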

Here is my mapper class:

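import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapTable2 extends Mapper<Object, Text, Text, Text> {

    private Text outKey = new Text();
    private Text outValue = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String[] tab = value.toString().split(";");
        // Arrays.asList returns a fixed-size list; copy it so remove() is allowed
        List<String> list = new ArrayList<>(Arrays.asList(tab));
        outKey.set(list.get(0).trim());
        list.remove(0); // drop the join key from the value
        // Rebuild the remaining columns; a local String (not a field) stops text
        // from earlier records accumulating, and ';' keeps the columns apart
        String tmp = String.join(";", list);
        outValue.set("B" + tmp);
        context.write(outKey, outValue);
    }
}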
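One detail that affects this approach: Arrays.asList returns a fixed-size list backed by the array, so calling remove(0) on it throws UnsupportedOperationException; the list has to be copied into a resizable list such as ArrayList first. The plain-Java sketch below (no Hadoop needed) shows both behaviors; the class and method names and the sample record are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsListRemoveDemo {

    // Returns true if remove(0) on the Arrays.asList view throws
    static boolean removeOnAsListFails(String[] tab) {
        List<String> fixed = Arrays.asList(tab); // fixed-size view of the array
        try {
            fixed.remove(0);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Copying into an ArrayList gives a resizable list where remove(0) works
    static List<String> dropFirst(String[] tab) {
        List<String> copy = new ArrayList<>(Arrays.asList(tab));
        copy.remove(0); // index-based removal: drops the join key
        return copy;
    }

    public static void main(String[] args) {
        String[] tab = "42;Alice;Paris".split(";"); // hypothetical record
        System.out.println(removeOnAsListFails(tab)); // true
        System.out.println(dropFirst(tab));           // [Alice, Paris]
    }
}
```

Even with the copy, this path allocates an array, a list and a new string per record, and remove(0) shifts every remaining element, which is part of why the approach feels expensive.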

The original code was:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapTable2 extends Mapper<Object, Text, Text, Text> {

    private Text outKey = new Text();
    private Text outValue = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String[] tab = value.toString().split(";");
        outKey.set(tab[0].trim());             // join key = first column
        outValue.set("B" + value.toString());  // outValue = outKey + value ("B" tags this table's records)
        context.write(outKey, outValue);
    }
}
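To make the duplication concrete, the following plain-Java sketch (no Hadoop required) reproduces the key/value pair the original mapper builds for a single line. The class name OriginalMapperDemo and the sample record "42;Alice;Paris" are hypothetical, chosen only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class OriginalMapperDemo {

    // Mirrors the original map(): the key is the first ';'-separated column,
    // and the value is "B" followed by the WHOLE line, key included.
    static Map.Entry<String, String> originalPair(String line) {
        String[] tab = line.split(";");
        return new SimpleEntry<>(tab[0].trim(), "B" + line);
    }

    public static void main(String[] args) {
        // Hypothetical record from the second table, join key first
        Map.Entry<String, String> pair = originalPair("42;Alice;Paris");
        System.out.println(pair.getKey() + " -> " + pair.getValue());
        // prints "42 -> B42;Alice;Paris": the key "42" appears twice
    }
}
```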

Do you have any suggestions for improving my code?

Thanks in advance. Angelik

Answer 1

You can use the two-argument split(String regex, int limit) method to split the string into just two parts:

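// A limit of 2 splits only at the first ';': parts[0] is the join key,
// parts[1] is the rest of the line, unchanged
String[] parts = value.toString().split(";", 2);
outKey.set(parts[0].trim());
outValue.set("B" + parts[1]);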
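The answer's idea can be checked outside Hadoop with a small plain-Java sketch. It shows two variants: split(";", 2) as in the answer, and an indexOf/substring version that locates only the first ';' instead of building an array of parts. The class and method names are hypothetical, and returning an empty remainder for a line with no ';' is an assumption: with the answer's three lines alone, such a line would make parts[1] throw ArrayIndexOutOfBoundsException.

```java
public class SplitKeyDemo {

    // Answer's approach: limit = 2 splits only at the first ';',
    // so everything after the join key stays intact (inner ';' kept).
    static String[] splitWithLimit(String line) {
        String[] parts = line.split(";", 2);
        String key = parts[0].trim();
        // Assumption: a line without any ';' gets an empty remainder
        String rest = parts.length > 1 ? parts[1] : "";
        return new String[] {key, rest};
    }

    // Alternative: find the first ';' and take substrings directly,
    // without building an intermediate array of all columns.
    static String[] splitWithIndexOf(String line) {
        int sep = line.indexOf(';');
        if (sep < 0) {
            return new String[] {line.trim(), ""};
        }
        return new String[] {line.substring(0, sep).trim(), line.substring(sep + 1)};
    }

    public static void main(String[] args) {
        String line = "42;Alice;Paris"; // hypothetical record from the second table
        String[] a = splitWithLimit(line);
        String[] b = splitWithIndexOf(line);
        System.out.println(a[0] + " -> B" + a[1]); // 42 -> BAlice;Paris
        System.out.println(b[0] + " -> B" + b[1]); // 42 -> BAlice;Paris
    }
}
```

Both variants keep the ';' separators inside the remainder, so the reducer receives the other columns exactly as they appeared in the input, minus the key. Any speed difference between them would need to be measured on real data.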
