Compare two HashMaps and count the number of duplicate values

Time: 2016-11-20 21:50:33

Tags: java arraylist hashmap

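I created two HashMaps that hold the strings from two separate txt files.

Now I'm trying to compare the two HashMaps and count how many duplicate values the files contain. For example, if file1 and file2 both contain the string "hello" twice, my console should print: hello 2 occurrences.

Here is my first HashMap: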

 List<String> word_list = new ArrayList<>();
        //Load your words to the word_list here


         while (INPUT_TEXT1.hasNext()) {
            String input_word = INPUT_TEXT1.next();

            word_list.add(input_word);

        }

        INPUT_TEXT1.close();

        String regexPattern = "[^a-zA-Z]";

        // Strip non-letter characters and lowercase every word in place
        word_list.replaceAll(s -> s.replaceAll(regexPattern, "").toLowerCase());

        //Find the unique words now from list
        String[] uniqueWords = word_list.stream().distinct().
                                       toArray(size -> new String[size]);
        Map<String, Integer> wordsMap = new HashMap<>();
        int frequency = 0;

        //Load the words to Map with each uniqueword as Key and frequency as Value
        for (String uniqueWord : uniqueWords) {
            frequency = Collections.frequency(word_list, uniqueWord);
            System.out.println(uniqueWord+" occurred "+frequency+" times");
            wordsMap.put(uniqueWord, frequency);
        }

       //Now, sort the words in descending order of frequency (value of HashMap)
       Stream<Entry<String, Integer>> topWords = wordsMap.entrySet().stream().
         sorted(Map.Entry.<String,Integer>comparingByValue().reversed()).limit(5);

        //Now print the Top 5 words to console
        System.out.println("Top 5 Words:::");
        topWords.forEach(System.out::println);


        System.out.println("\n\n");

Here is my second HashMap:

List<String> wordList = new ArrayList<>();
        //Load your words to the word_list here


         while (INPUT_TEXT2.hasNext()) {
            String input_word1 = INPUT_TEXT2.next();

            wordList.add(input_word1);

        }

        INPUT_TEXT2.close();

        String regex = "[^a-zA-Z]";

        // Strip non-letter characters and lowercase every word in place
        wordList.replaceAll(s -> s.replaceAll(regex, "").toLowerCase());

        String[] uniqueWords1 = wordList.stream().distinct().
                                       toArray(size -> new String[size]);
        Map<String, Integer> wordsMap1 = new HashMap<>();

         //Load the words to Map with each uniqueword as Key and frequency as Value
        for (String uniqueWord : uniqueWords1) {
            frequency = Collections.frequency(wordList, uniqueWord);
            System.out.println(uniqueWord+" occurred "+frequency+" times");
            // Store into the second map (the original wrote to wordsMap, leaving wordsMap1 empty)
            wordsMap1.put(uniqueWord, frequency);
        }

       //Now, sort the words in descending order of frequency (value of HashMap)
       Stream<Entry<String, Integer>> topWords1 = wordsMap1.entrySet().stream().
         sorted(Map.Entry.<String,Integer>comparingByValue().reversed()).limit(5);
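
The two blocks above repeat the same steps, and `Collections.frequency` rescans the whole list for every unique word. A minimal sketch of one shared helper that counts in a single pass with `Map.merge` might look like this. `WordCounter` and `countWords` are hypothetical names, and skipping tokens that contain no letters is an assumption the original code does not make.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCounter {

    // Hypothetical helper: normalizes each token and counts it in a single pass.
    static Map<String, Integer> countWords(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            // Same normalization as the question: strip non-letters, then lowercase.
            String word = token.replaceAll("[^a-zA-Z]", "").toLowerCase();
            if (word.isEmpty()) {
                continue; // assumption: tokens without any letters are skipped
            }
            // Insert 1 for a new word, otherwise add 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(Arrays.asList("Hello,", "hello", "World!", "42"));
        System.out.println(counts.get("hello"));    // 2
        System.out.println(counts.get("world"));    // 1
        System.out.println(counts.containsKey("")); // false
    }
}
```

Calling the helper once per file would produce `wordsMap` and `wordsMap1` without duplicating the loop.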

Here is my original approach to finding the duplicate values:

 boolean val = wordsMap.keySet().containsAll(wordsMap1.keySet());

    for (Entry<String, Integer> str : wordsMap.entrySet()) {
        System.out.println("================= " + str.getKey());


        if(wordsMap1.containsKey(str.getKey())){
            System.out.println("Map2 Contains Map 1 Key");
        }
    }

    System.out.println("================= " + val);

Does anyone have any other suggestions for achieving this? Thanks

Edit: How can I count the number of occurrences of each individual value?

1 Answer:

Answer 0: (score: 3)

I think your code would work as well. If your goal is to find a better way to implement that last check, you could try this:

Set<String> keySetMap1 = new HashSet<String>(wordsMap.keySet());
Set<String> keySet2 = wordsMap1.keySet();
keySetMap1.retainAll(keySet2);
keySetMap1.stream().forEach(x -> System.out.println("Map2 Contains Map 1 Key: "+x));
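
For the follow-up about counting occurrences, the counts are already stored as the map values, so they can be read back for each shared key. The following is only a sketch: `SharedWords` and `sharedCounts` are hypothetical names, and reporting the smaller of the two counts when they differ is an assumption, since the question only describes the case where both files agree.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SharedWords {

    // Hypothetical helper: for every word present in both maps, keeps the smaller count.
    static Map<String, Integer> sharedCounts(Map<String, Integer> first,
                                             Map<String, Integer> second) {
        Map<String, Integer> shared = new TreeMap<>(); // sorted keys for stable output
        for (Map.Entry<String, Integer> entry : first.entrySet()) {
            Integer other = second.get(entry.getKey());
            if (other != null) {
                // Assumption: when the counts differ, report the minimum.
                shared.put(entry.getKey(), Math.min(entry.getValue(), other));
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordsMap = new HashMap<>();
        wordsMap.put("hello", 2);
        wordsMap.put("world", 1);

        Map<String, Integer> wordsMap1 = new HashMap<>();
        wordsMap1.put("hello", 2);
        wordsMap1.put("java", 4);

        sharedCounts(wordsMap, wordsMap1)
                .forEach((word, count) -> System.out.println(word + " " + count + " occurrences"));
        // prints: hello 2 occurrences
    }
}
```

If both counts should be shown instead, the loop can print `entry.getValue()` and `other` side by side rather than taking the minimum.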