Question

I have the following simple reducer:

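int sum = 0;
int numPurchases = 0;
IntWritable count = new IntWritable();  // reused across calls instead of reallocating

@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    // Reset the per-key accumulators; these fields persist between reduce() calls.
    sum = 0;
    numPurchases = 0;
    for (IntWritable val : values) {
        sum += val.get();  // read the int directly rather than parsing toString()
        numPurchases++;
    }
    count.set(sum / numPurchases);  // integer division, so the average is truncated
    context.write(key, count);
}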

This simply writes the following to the output:

customerId | avgPurchasePrice

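The reducer above gets its data from a file, File1. Two questions:

1) Can I add the number of purchases, numPurchases, to the output file? Any pointers on how to do this would be greatly appreciated.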

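2) I now have another file, File2. File2 basically contains the following:

customerId | customerName | customerPhone | customerAddress

Can I perform a reducer-side join so that the output file has the following format?

customerId | name | phone | avgPurchasePrice | totalPurchases

If there are any examples, could I take a look at them?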

Answer 1

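I would suggest the following:

Create two custom types, CustomerKey and PurchaseSummary.

1) CustomerKey: holds the customer ID, name, and phone number. It should implement WritableComparable.

Implement public int compareTo so that it compares by customerID.
Override the toString method.

2) PurchaseSummary: holds avgPurchasePrice and totalPurchases. It can implement Writable.

Override the toString method.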
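As a rough sketch of the first type, assuming all three fields are plain strings: the version below implements java.lang.Comparable and mirrors the write/readFields methods of Hadoop's Writable so that it compiles without Hadoop on the classpath. In a real job the class would declare implements WritableComparable<CustomerKey> instead. The equals and hashCode overrides are my own addition; they matter because the default HashPartitioner routes keys by hashCode().

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch only: in a Hadoop job, replace Comparable<CustomerKey> with
// org.apache.hadoop.io.WritableComparable<CustomerKey>.
class CustomerKey implements Comparable<CustomerKey> {
    private String customerId = "";
    private String name = "";
    private String phone = "";

    // Hadoop instantiates keys reflectively, so a no-arg constructor is required.
    CustomerKey() {}

    CustomerKey(String customerId, String name, String phone) {
        this.customerId = customerId;
        this.name = name;
        this.phone = phone;
    }

    // Same contract as Writable.write: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(customerId);
        out.writeUTF(name);
        out.writeUTF(phone);
    }

    // Same contract as Writable.readFields: read the fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        customerId = in.readUTF();
        name = in.readUTF();
        phone = in.readUTF();
    }

    // Sort and group by customer ID only.
    @Override
    public int compareTo(CustomerKey other) {
        return customerId.compareTo(other.customerId);
    }

    // Kept consistent with compareTo: equality depends on the customer ID only.
    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerKey && customerId.equals(((CustomerKey) o).customerId);
    }

    @Override
    public int hashCode() {
        return customerId.hashCode();
    }

    @Override
    public String toString() {
        return customerId + " | " + name + " | " + phone;
    }
}
```

Because toString produces the pipe-separated form, the default text output would write the key columns in the layout asked for in the question.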

I am assuming that totalPurchases is the number of entries for each customer.
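Under that assumption, PurchaseSummary could look roughly like the sketch below. It mirrors the write/readFields methods of Hadoop's Writable without depending on Hadoop; in a real job it would declare implements Writable. The static factory of(...) is a hypothetical helper of mine, and the average uses integer division as in the question's reducer.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch only: in a Hadoop job this class would implement org.apache.hadoop.io.Writable.
class PurchaseSummary {
    private int avgPurchasePrice;
    private int totalPurchases;

    // Hadoop instantiates values reflectively, so a no-arg constructor is required.
    PurchaseSummary() {}

    // Hypothetical helper: builds the summary from all purchase prices of one customer.
    static PurchaseSummary of(Iterable<Integer> prices) {
        long sum = 0;
        int n = 0;
        for (int price : prices) {
            sum += price;
            n++;
        }
        PurchaseSummary s = new PurchaseSummary();
        s.totalPurchases = n;
        // Integer division truncates the average; guard against an empty input.
        s.avgPurchasePrice = n == 0 ? 0 : (int) (sum / n);
        return s;
    }

    // Same contract as Writable.write.
    public void write(DataOutput out) throws IOException {
        out.writeInt(avgPurchasePrice);
        out.writeInt(totalPurchases);
    }

    // Same contract as Writable.readFields.
    public void readFields(DataInput in) throws IOException {
        avgPurchasePrice = in.readInt();
        totalPurchases = in.readInt();
    }

    @Override
    public String toString() {
        return avgPurchasePrice + " | " + totalPurchases;
    }
}
```

Emitting this type as the reducer's output value also covers the first question, since the purchase count is written next to the average.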

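Read the text and create an instance of CustomerKey as the key. The value should be the same as what you emit now.
Then create an instance of PurchaseSummary and populate its values accordingly.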
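One caveat for the join itself: records from File1 carry only the customerId, so the name and phone have to reach the reducer through File2 records. A common arrangement, which differs slightly from the CustomerKey approach, is a tagged-value reduce-side join: both mappers emit the customerId as the key and prefix each value with a tag naming its source (in Hadoop, one mapper per file can be registered with MultipleInputs). The sketch below shows only the per-key reducer logic in plain Java; the tags, the tab-separated value layout, and the method name are all assumptions of mine.

```java
class ReduceSideJoinSketch {
    // Hypothetical tags that each mapper would prepend to its output value.
    static final String CUSTOMER_TAG = "C";  // from File2: C <tab> name <tab> phone
    static final String PURCHASE_TAG = "P";  // from File1: P <tab> price

    // Mirrors the body of reduce(): all values for one customerId arrive together,
    // in no guaranteed order, so the customer record may appear anywhere.
    static String joinForCustomer(String customerId, Iterable<String> taggedValues) {
        String name = "";
        String phone = "";
        long sum = 0;
        int totalPurchases = 0;

        for (String value : taggedValues) {
            String[] parts = value.split("\t");
            if (CUSTOMER_TAG.equals(parts[0])) {
                // Customer details from File2.
                name = parts[1];
                phone = parts[2];
            } else if (PURCHASE_TAG.equals(parts[0])) {
                // One purchase from File1.
                sum += Integer.parseInt(parts[1]);
                totalPurchases++;
            }
        }

        // Integer division, as in the original reducer; guard against no purchases.
        long avgPurchasePrice = totalPurchases == 0 ? 0 : sum / totalPurchases;
        return customerId + " | " + name + " | " + phone
                + " | " + avgPurchasePrice + " | " + totalPurchases;
    }
}
```

In a real reducer the same loop would run over the Iterable of values passed to reduce(), and the joined line would be handed to context.write instead of being returned.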
