Question

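I have a simple use case. For my input file, I just need to compute the percentage distribution of the total word count. For example, if word1 occurs 10 times, word2 occurs 5 times, and so on, and the total number of words is 100, then I just need to display %word1 = 10%, %word2 = 5%, etc. So whenever I encounter a word in map() I simply call context.write(word, 1), and in reduce I sum up the individual counts. But to compute the percentages we need the total word count, which I am also computing.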

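So, before reduce receives the key for word1 or word2, I need to already have the total word count in order to compute each word's percentage. But in reduce, the total-count key arrives after some of the other keys, so I cannot compute the percentages.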

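I also tried setting this total count in the configuration from map using context.getConfiguration().setFloat("total count", count); but in reduce I cannot read this value back from the config. It just returns null.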

Any suggestions are welcome.

Thanks.

Answer 1

You need to digest your document first, like this:

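import java.util.HashMap;
import java.util.Map;

class WordCounter {
    // Number of occurrences of each distinct word.
    Map<String, Integer> totals = new HashMap<String, Integer>();
    // Total number of words seen across all digested documents.
    int wordCount;

    void digest(String document) {
        // Split on runs of NON-word characters ("\\W+"); "\\w+" would
        // remove the words themselves and keep only the separators.
        for (String word : document.split("\\W+")) {
            if (word.isEmpty())
                continue; // a leading separator yields an empty first token
            wordCount++;
            Integer count = totals.get(word);
            if (count == null)
                totals.put(word, 1);
            else
                totals.put(word, ++count);
        }
    }
}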

You can then make a second pass over your document using the information you have collected, perhaps calling something like this on each word:

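String decorateWithPercent(String word) {
    // Multiply by 100.0 first so the division happens in floating point;
    // a plain int / int division would truncate to 0 for almost every word.
    return word + " (" + (100.0 * totals.get(word) / wordCount) + "%)";
}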

Or print the frequencies, for example:

void printFrequencies() {
    // The loop variable is named "entry" so it does not shadow the wordCount field.
    for (Map.Entry<String, Integer> entry : totals.entrySet()) {
        System.out.println(entry.getKey() + " " + entry.getValue());
    }
}
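
To tie the two passes together, here is a minimal self-contained sketch in plain Java (no Hadoop involved). The class name WordPercentages, the method names countWords and percentages, and the sample sentence are made up for illustration. It assumes the whole input fits in memory, which may not hold for the kind of input you would normally hand to MapReduce.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class WordPercentages {

    // Pass 1: count how often each word occurs.
    static Map<String, Integer> countWords(String document) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String word : document.split("\\W+")) {
            if (word.isEmpty()) {
                continue; // a leading separator yields an empty first token
            }
            totals.merge(word, 1, Integer::sum);
        }
        return totals;
    }

    // Pass 2: the grand total is only known once pass 1 has finished,
    // so the percentages are computed afterwards.
    static Map<String, Double> percentages(Map<String, Integer> totals) {
        int wordCount = 0;
        for (int count : totals.values()) {
            wordCount += count;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            // 100.0 forces floating-point division.
            result.put(entry.getKey(), 100.0 * entry.getValue() / wordCount);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> pct = percentages(countWords("a b a c a b a d"));
        for (Map.Entry<String, Double> entry : pct.entrySet()) {
            System.out.println(String.format(Locale.ROOT, "%%%s = %.1f%%",
                    entry.getKey(), entry.getValue()));
        }
    }
}
```

The point of the split into two methods is the same as in the answer above: a word's percentage cannot be emitted until every word has been counted, so the total has to be complete before the second pass starts.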
