Question

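I am iterating over a HashMap with more than 20 million entries. In each iteration, I iterate again over another HashMap that also has more than 20 million entries.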

HashMap<String, BitSet> data_1 = new HashMap<String, BitSet>();
HashMap<String, BitSet> data_2 = new HashMap<String, BitSet>();

I split data_1 into chunks based on the number of threads (threads = cores; I have a four-core processor).

My code takes more than 20 hours to execute, and that does not include storing the results in a file.

1) If I want to store the results of each thread in a file without them overlapping, how can I do that?

2) How can I make the following task faster?

3) How do I create the chunks dynamically, based on the number of cores?

  int cores = Runtime.getRuntime().availableProcessors();
  int threads = cores;

  // Number of entries per chunk
  int Chunks = data_1.size() / threads;


      // I don't trust the chunks created by the line below, which is why I created chunk1, chunk2, chunk3, chunk4 separately and validated them.
      Map<Integer, BitSet>[] Chunk= (Map<Integer, BitSet>[]) new HashMap<?,?>[threads];

4) How do I create the threads using a for loop? Is what I am doing correct?

ClassName thread1 = new ClassName(data2, chunk1);
ClassName thread2 = new ClassName(data2, chunk2);
ClassName thread3 = new ClassName(data2, chunk3);
ClassName thread4 = new ClassName(data2, chunk4);

 thread1.start();
 thread2.start();
 thread3.start();
 thread4.start();

 thread1.join();
 thread2.join();
 thread3.join();
 thread4.join();

A representation of my code

public class ClassName extends Thread {
Integer nSimilarEntities = 30;

    public void run() {


            for (String kNonRepeater : data_1.keySet()) {

                    // Extract the feature vector
                      BitSet vFeaturesNonRepeater = data_1.get(kNonRepeater);


                    // Calculate the sum of 1s (L2 norm is the sqrt of this)
                    double nNormNonRepeater = Math.sqrt(vFeaturesNonRepeater.cardinality());

            // Loop through the repeater set
                    double nMinSimilarity = 100;
                    int nMinSimIndex = 0;

                    // Maintain the list of top similar repeaters and the similarity values


                    long dpind = 0;
                    ArrayList<String> vSimilarKeys = new ArrayList<String>();
                    ArrayList<Double> vSimilarValues = new ArrayList<Double>();

                    for (String kRepeater : data_2.keySet()) {
                        // Status output at regular intervals
                        dpind++;
                        if (Math.floorMod(dpind, pct) == 0) {
                            System.out.println(dpind + " dot products (" + Math.round(dpind / pct) + "%) out of "
                                    + nNumSimilaritiesToCompute + " completed!");
                        }

                        // Calculate the norm of repeater, and the dot product

                        BitSet vFeaturesRepeater = data_2.get(kRepeater);


                        double nNormRepeater = Math.sqrt(vFeaturesRepeater.cardinality());
                        BitSet vTemp = (BitSet) vFeaturesNonRepeater.clone();
                        vTemp.and(vFeaturesRepeater);
                        double nCosineDistance = vTemp.cardinality() / (nNormNonRepeater * nNormRepeater);



                    //  queue.add(new MyClass(kRepeater,kNonRepeater,nCosineDistance));

                    //  if(queue.size() > YOUR_LIMIT)
                    //          queue.remove();

                        // Don't bother if the similarity is 0, obviously
                        if ((vSimilarKeys.size() < nSimilarEntities) && (nCosineDistance > 0)) {

                            vSimilarKeys.add(kRepeater);
                            vSimilarValues.add(nCosineDistance);

                            nMinSimilarity = vSimilarValues.get(0);
                            nMinSimIndex = 0;
                            for (int j = 0; j < vSimilarValues.size(); j++) {
                                if (vSimilarValues.get(j) < nMinSimilarity) {
                                    nMinSimilarity = vSimilarValues.get(j);
                                    nMinSimIndex = j;
                                }
                            }
                        } else { // If there are more, keep only the best
                            // If this is better than the smallest distance, then remove the smallest
                            if (nCosineDistance > nMinSimilarity) {
                                // Remove the lowest similarity value
                                vSimilarKeys.remove(nMinSimIndex);
                                vSimilarValues.remove(nMinSimIndex);
                                // Add this one
                                vSimilarKeys.add(kRepeater);
                                vSimilarValues.add(nCosineDistance);
                                // Refresh the index of lowest similarity value
                                nMinSimilarity = vSimilarValues.get(0);
                                nMinSimIndex = 0;
                                for (int j = 0; j < vSimilarValues.size(); j++) {
                                    if (vSimilarValues.get(j) < nMinSimilarity) {
                                        nMinSimilarity = vSimilarValues.get(j);
                                        nMinSimIndex = j;
                                    }
                                }
                            }
                        } // End loop for maintaining list of similar entries

                    }// End iteration through repeaters

            for (int i = 0; i < vSimilarValues.size(); i++) {
                    System.out.println(Thread.currentThread().getName() + kNonRepeater + "|" + vSimilarKeys.get(i) + "|" + vSimilarValues.get(i));
          }
       }
   }
}

Finally, apart from multithreading, are there any other approaches in Java to reduce the time complexity?

Answer 1

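A computer works much the way you would if you had to do it by hand (it processes more digits/bits at a time, but the problem is the same).

If you do addition, the time is proportional to the size of the numbers.

If you do multiplication or division, the time is proportional to the square of the size of the numbers.

For a computer, the size is measured in multiples of 32 or 64 significant bits, depending on the implementation.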

Answer 2

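I would say this task is a good fit for parallel streams. If you have the time, don't hesitate to look into the concept. Parallel streams use multithreading seamlessly and at full speed.

The top-level processing looks like this: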

data_1.entrySet()
      .parallelStream()
      .flatMap(nonRepeaterEntry -> processOne(nonRepeaterEntry.getKey(), nonRepeaterEntry.getValue(), data2))
      .forEach(System.out::println);

You should give the processOne function a prototype like this:

Stream<String> processOne(String nonRepeaterKey, BitSet nonRepeaterBitSet, Map<String, BitSet> data2);

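It returns the prepared list of strings, that is, what you currently print and want to write to the file.

To create the stream inside it, you can build a List first and then turn it into a stream in the return statement: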

return list.stream();

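Even though the inner loop could also be processed as a stream, using a parallel stream inside is discouraged - you already have enough parallelism.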
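To make the outline above concrete, here is a minimal sketch of how processOne and the top-level stream could fit together. The class name ParallelSimilarity, the helper bits, and the "key|key|similarity" line format are assumptions for illustration, and the sketch keeps every non-zero match instead of only the top 30, so treat it as a starting point rather than a drop-in replacement.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelSimilarity {

    // Hypothetical helper: compares one data_1 entry against all of data2 and
    // returns one "nonRepeater|repeater|similarity" line per non-zero match.
    static Stream<String> processOne(String nonRepeaterKey, BitSet nonRepeaterBits,
                                     Map<String, BitSet> data2) {
        List<String> lines = new ArrayList<>();
        double normA = Math.sqrt(nonRepeaterBits.cardinality());
        for (Map.Entry<String, BitSet> e : data2.entrySet()) {
            BitSet common = (BitSet) nonRepeaterBits.clone();
            common.and(e.getValue());
            int shared = common.cardinality();
            if (shared > 0) { // shared > 0 implies both norms are non-zero
                double sim = shared / (normA * Math.sqrt(e.getValue().cardinality()));
                lines.add(nonRepeaterKey + "|" + e.getKey() + "|" + sim);
            }
        }
        return lines.stream();
    }

    // The outer loop runs in parallel; the inner loop stays sequential.
    static List<String> run(Map<String, BitSet> data1, Map<String, BitSet> data2) {
        return data1.entrySet()
                .parallelStream()
                .flatMap(e -> processOne(e.getKey(), e.getValue(), data2))
                .collect(Collectors.toList());
    }

    // Small helper to build a BitSet from bit indices (illustration only).
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) b.set(i);
        return b;
    }

    public static void main(String[] args) {
        Map<String, BitSet> data1 = new HashMap<>();
        data1.put("a", bits(0));
        Map<String, BitSet> data2 = new HashMap<>();
        data2.put("x", bits(0));
        data2.put("y", bits(5));
        run(data1, data2).forEach(System.out::println);
    }
}
```

Collecting into a list is only practical for small inputs; with 20 million entries you would more likely hand each line to a writer as it is produced.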

As for your questions:

1) If I want to store the results of each thread in a file without them overlapping, how can I do that?

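Any logging framework (logback, log4j) can handle this. Parallel streams can handle it too. You can also put the prepared lines into some queue/array and print them from a separate thread. That takes some care, though; ready-made solutions do this more easily and more efficiently.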
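If you want to try the queue-plus-writer-thread option, it could look roughly like the sketch below. All names here (QueueWriterDemo, writeAll, the END sentinel, the "worker-N|n" lines) are made up for the example; the worker threads only stand in for your similarity computation. Because a single thread owns the file, lines from different workers cannot interleave.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueWriterDemo {
    // Unique sentinel object, compared by identity, meaning "no more lines".
    private static final String END = new String("END");

    static void writeAll(Path out, int workers, int linesPerWorker) {
        // Bounded queue so producers slow down if the writer falls behind.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

        // The only thread that touches the file.
        Thread writer = new Thread(() -> {
            try (BufferedWriter w = Files.newBufferedWriter(out)) {
                for (String line = queue.take(); line != END; line = queue.take()) {
                    w.write(line);
                    w.newLine();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Placeholder producers; real workers would put result lines here.
        Thread[] producers = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final int id = i;
            producers[i] = new Thread(() -> {
                try {
                    for (int n = 0; n < linesPerWorker; n++) {
                        queue.put("worker-" + id + "|" + n);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producers[i].start();
        }

        try {
            for (Thread p : producers) p.join();
            queue.put(END); // tell the writer that everything has been queued
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Writes to a temporary file and reads it back, for demonstration.
    static List<String> writeAndReadBack(int workers, int linesPerWorker) {
        try {
            Path tmp = Files.createTempFile("similarities", ".txt");
            writeAll(tmp, workers, linesPerWorker);
            List<String> lines = Files.readAllLines(tmp);
            Files.delete(tmp);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack(4, 1000).size() + " lines written");
    }
}
```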

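2) How can I make the following task faster?

Optimize and parallelize. Under normal conditions, and assuming you have hyperthreading, you can expect processing to be between number_of_threads / 1.5 and number_of_threads times faster, but this depends on how much of your work is not really parallel and on the underlying implementation.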

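3) How do I create the chunks dynamically, based on the number of cores?

You don't have to. Make a list of tasks (one task per data_1 entry) and hand them to an executor service - each task is already big enough. You can use a FixedThreadPool with the number of threads as its parameter, and it will take care of distributing the tasks evenly.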

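Note that you should create a task class, get a Future for each task from threadpool.submit, and at the end run a loop that calls .get on each Future. This throttles the main thread to the executor's processing speed and implicitly gives you fork-join-like behavior.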
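A rough sketch of that executor approach follows. The names ExecutorSimilarity and bestMatch are hypothetical, and the task is deliberately simplified to "find the data2 entry sharing the most bits" so the submit/get pattern stays visible. Keeping one Future per entry in a list may itself be too much for 20 million entries, so in practice you would probably submit and drain in batches.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorSimilarity {

    // Hypothetical task body: the data2 key sharing the most set bits with `bits`.
    static String bestMatch(String key, BitSet bits, Map<String, BitSet> data2) {
        String bestKey = null;
        int bestShared = 0;
        for (Map.Entry<String, BitSet> e : data2.entrySet()) {
            BitSet common = (BitSet) bits.clone();
            common.and(e.getValue());
            int shared = common.cardinality();
            if (shared > bestShared) {
                bestShared = shared;
                bestKey = e.getKey();
            }
        }
        return key + "|" + bestKey + "|" + bestShared;
    }

    static List<String> run(Map<String, BitSet> data1, Map<String, BitSet> data2) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per data1 entry; the pool spreads them over its threads.
            List<Future<String>> futures = new ArrayList<>();
            for (Map.Entry<String, BitSet> e : data1.entrySet()) {
                Callable<String> task = () -> bestMatch(e.getKey(), e.getValue(), data2);
                futures.add(pool.submit(task));
            }
            // get() blocks until each task is done, so this loop acts as the join.
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Small helper to build a BitSet from bit indices (illustration only).
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) b.set(i);
        return b;
    }

    public static void main(String[] args) {
        Map<String, BitSet> data1 = new HashMap<>();
        data1.put("a", bits(0, 1));
        Map<String, BitSet> data2 = new HashMap<>();
        data2.put("x", bits(0));
        data2.put("y", bits(0, 1));
        run(data1, data2).forEach(System.out::println);
    }
}
```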

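4) Creating threads directly is an outdated technique. It is recommended to use some kind of executor service, parallel streams, etc. If you do want to do it in a loop, you need to create a list of chunks, then create the threads in a loop and add each one to a list of threads. In another loop, join every thread in that list.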
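If you still want plain threads, the two loops described above could be sketched like this. ChunkedThreads, split and totalBits are names invented for the example, and the per-chunk work (summing cardinalities) is only a placeholder for the real comparison. The round-robin split also shows one way to build the chunks from the core count.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

class ChunkedThreads {

    // Distributes the entries round-robin over `parts` maps, so chunk sizes
    // differ by at most one regardless of the number of cores.
    static <K, V> List<Map<K, V>> split(Map<K, V> source, int parts) {
        List<Map<K, V>> chunks = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            chunks.add(new HashMap<>());
        }
        int i = 0;
        for (Map.Entry<K, V> e : source.entrySet()) {
            chunks.get(i++ % parts).put(e.getKey(), e.getValue());
        }
        return chunks;
    }

    // Placeholder workload: sums the cardinality of every BitSet, one thread per chunk.
    static long totalBits(Map<String, BitSet> data) {
        int cores = Runtime.getRuntime().availableProcessors();
        List<Map<String, BitSet>> chunks = split(data, cores);
        AtomicLong total = new AtomicLong();

        // First loop: create and start one thread per chunk.
        List<Thread> threads = new ArrayList<>();
        for (Map<String, BitSet> chunk : chunks) {
            Thread t = new Thread(() -> {
                long local = 0;
                for (BitSet b : chunk.values()) {
                    local += b.cardinality();
                }
                total.addAndGet(local);
            });
            threads.add(t);
            t.start();
        }

        // Second loop: wait for all of them.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        Map<String, BitSet> data = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            BitSet b = new BitSet();
            b.set(i);
            data.put("key" + i, b);
        }
        System.out.println("total set bits: " + totalBits(data));
    }
}
```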

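Ad hoc optimizations:

1) Make a Repeater class that stores the key, the bitset, and the cardinality. Preprocess your hash maps, converting them into Repeater instances, and compute the cardinality once (i.e. not on every run of the inner loop). This saves you 20 million * (20 million - 1) calls to .cardinality(). You still need to call it on the result of the and.

2) Replace similarKeys and similarValues with a bounded-size PriorityQueue. It handles 30 elements faster.

For more information on PriorityQueue, see the question: Java PriorityQueue with fixed size

3) If the cardinality of a nonRepeater is already 0, you can skip processing that nonRepeater entirely: and-ing bitsets never increases the resulting cardinality, and you filter out all the 0-distance values anyway.

4) You can skip (remove from the temporary list created in optimization 1) every Repeater with zero cardinality. As in point 3, it will never produce anything useful.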
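Put together, the four optimizations might look like the sketch below. The Repeater class follows the description in point 1; Match, preprocess, topK and bits are names assumed for the example, and the bounded queue is implemented here as a min-heap whose smallest element is evicted once it holds k entries, which is one common way to do it but not the only one.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopSimilar {

    // Point 1: key, bitset and the norm (sqrt of cardinality) computed once.
    static final class Repeater {
        final String key;
        final BitSet bits;
        final double norm;

        Repeater(String key, BitSet bits) {
            this.key = key;
            this.bits = bits;
            this.norm = Math.sqrt(bits.cardinality());
        }
    }

    static final class Match {
        final String key;
        final double similarity;

        Match(String key, double similarity) {
            this.key = key;
            this.similarity = similarity;
        }
    }

    static final Comparator<Match> BY_SIMILARITY =
            Comparator.comparingDouble((Match m) -> m.similarity);

    // Points 3 and 4: entries with no bits set can never match, so drop them up front.
    static List<Repeater> preprocess(Map<String, BitSet> data) {
        List<Repeater> out = new ArrayList<>(data.size());
        for (Map.Entry<String, BitSet> e : data.entrySet()) {
            if (!e.getValue().isEmpty()) {
                out.add(new Repeater(e.getKey(), e.getValue()));
            }
        }
        return out;
    }

    // Point 2: min-heap of at most k matches; the weakest match sits at the head.
    static List<Match> topK(Repeater query, List<Repeater> candidates, int k) {
        List<Match> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        PriorityQueue<Match> heap = new PriorityQueue<>(BY_SIMILARITY);
        for (Repeater r : candidates) {
            BitSet common = (BitSet) query.bits.clone();
            common.and(r.bits);
            int shared = common.cardinality(); // the one cardinality call still needed
            if (shared == 0) {
                continue;
            }
            double sim = shared / (query.norm * r.norm);
            if (heap.size() < k) {
                heap.add(new Match(r.key, sim));
            } else if (sim > heap.peek().similarity) {
                heap.poll();
                heap.add(new Match(r.key, sim));
            }
        }
        result.addAll(heap);
        result.sort(BY_SIMILARITY.reversed()); // best match first
        return result;
    }

    // Small helper to build a BitSet from bit indices (illustration only).
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) b.set(i);
        return b;
    }

    public static void main(String[] args) {
        Map<String, BitSet> data2 = new HashMap<>();
        data2.put("x", bits(0));
        data2.put("y", bits(0, 1, 2, 3));
        data2.put("z", bits(9));
        List<Repeater> candidates = preprocess(data2);
        for (Match m : topK(new Repeater("q", bits(0)), candidates, 2)) {
            System.out.println("q|" + m.key + "|" + m.similarity);
        }
    }
}
```

With a heap, replacing the weakest of the 30 kept matches costs O(log 30) instead of the full rescan of vSimilarValues that the original code does after every insertion.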
