Question

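I am trying to improve the performance of existing code in a project that currently runs in a single thread. The code does something like this:

1. Get a first list of 10,000,000 objects.
2. Get a second list of 10,000,000 objects.
3. Merge the two (after some changes) into a third list.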

    Instant s = Instant.now();
    List<Integer> l1 = getFirstList();
    List<Integer> l2 = getSecondList();
    List<Integer> l3 = new ArrayList<>();
    l3.addAll(l1);
    l3.addAll(l2);
    Instant e = Instant.now();
    System.out.println("Execution time: " + Duration.between(s, e).toMillis());

Here are sample methods for getting and merging the lists:

    private static List<Integer> getFirstList() {
        System.out.println("First list is being created by: " + Thread.currentThread().getName());
        List<Integer> l = new ArrayList<>();
        for (int i = 0; i < 10000000; i++) {
            l.add(i);
        }
        return l;
    }

    private static List<Integer> getSecondList() {
        System.out.println("Second list is being created by: " + Thread.currentThread().getName());
        List<Integer> l = new ArrayList<>();
        for (int i = 10000000; i < 20000000; i++) {
            l.add(i);
        }
        return l;
    }

    private static List<Integer> combine(List<Integer> l1, List<Integer> l2) {
        System.out.println("Third list is being created by: " + Thread.currentThread().getName());
        ArrayList<Integer> l3 = new ArrayList<>();
        l3.addAll(l1);
        l3.addAll(l2);
        return l3;
    }

I am trying to rewrite the code above as follows:

    ExecutorService executor = Executors.newFixedThreadPool(10);
    Instant start = Instant.now();
    CompletableFuture<List<Integer>> cf1 = CompletableFuture.supplyAsync(() -> getFirstList(), executor);
    CompletableFuture<List<Integer>> cf2 = CompletableFuture.supplyAsync(() -> getSecondList(), executor);

    CompletableFuture<Void> cf3 = cf1.thenAcceptBothAsync(cf2, (l1, l2) -> combine(l1, l2), executor);
    try {
        cf3.get();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
    Instant end = Instant.now();
    System.out.println("Execution time: " + Duration.between(start, end).toMillis());

    executor.shutdown();

The single-threaded code runs in 4-5 seconds, while the multithreaded code takes 6+ seconds. Am I doing something wrong?

Answer 1

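You are executing these methods for the first time, so they start in interpreted mode. To speed up that first execution, the optimizer has to replace them while they are still running (this is called on-stack replacement), which does not always perform as well as re-entering the method once the optimized version is in place. Doing this concurrently seems to be even worse, at least on Java 8; I got entirely different results with Java 11.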

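So the first step would be to insert an explicit call, e.g. getFirstList(); getSecondList();, before the measurement, to see how the code performs when it is not being invoked for the first time.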
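A minimal sketch of such a warm-up comparison follows. The class name WarmUpDemo and the helpers buildList and timeMillis are hypothetical names introduced for this illustration; buildList has the same shape as getFirstList() from the question but takes its bounds as parameters. The timings it prints depend on the JVM and machine.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class WarmUpDemo {
    // Same loop as getFirstList() in the question; the bounds are parameters
    // here (an assumption of this sketch) so it can also run at small sizes.
    static List<Integer> buildList(int from, int to) {
        List<Integer> l = new ArrayList<>();
        for (int i = from; i < to; i++) {
            l.add(i);
        }
        return l;
    }

    // Times a single invocation of buildList in milliseconds.
    static long timeMillis(int from, int to) {
        Instant s = Instant.now();
        buildList(from, to);
        return Duration.between(s, Instant.now()).toMillis();
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        // First call: starts in interpreted mode and is compiled while running.
        System.out.println("Cold run: " + timeMillis(0, n) + " ms");
        // Second call: can enter the already compiled code directly.
        System.out.println("Warm run: " + timeMillis(0, n) + " ms");
    }
}
```

Comparing the two printed values gives a rough idea of how much of the measured time is first-invocation overhead rather than the work itself.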

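Another aspect is garbage collection. Some JVMs start with a small initial heap and perform a full GC every time the heap is expanded, which affects all threads.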

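So the second step would be to start the JVM with -Xms1G (or even better, -Xms2G), so that it begins with a heap size that is reasonable for the number of objects you are going to create.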
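To check that the option took effect, the JVM can report its heap sizes through the standard Runtime API. This is only a sketch; the class name HeapInfo and the helper toMiB are hypothetical, and the reported values are approximate.

```java
public class HeapInfo {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory(): memory currently reserved by the JVM; right after
        // startup this should be close to the value passed via -Xms.
        System.out.println("Current heap: " + toMiB(rt.totalMemory()) + " MiB");
        // maxMemory(): upper limit the JVM will attempt to use (-Xmx or default).
        System.out.println("Maximum heap: " + toMiB(rt.maxMemory()) + " MiB");
    }
}
```

Running it as `java -Xms2G HeapInfo` and again without the option shows the difference in the initial heap.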

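But note that the third step, adding the intermediate result lists to the final result list (which happens sequentially in both variants), has a significant impact on performance.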

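So the third step would be to replace the construction of the final list with l3 = new ArrayList<>(l1.size() + l2.size()) in both variants, to ensure that the list has an appropriate initial capacity and does not have to grow repeatedly while the elements are added.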
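Applied to the combine method from the question, the change could look like the sketch below. The class name PresizedCombine and the small sample lists in main are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PresizedCombine {
    // Like combine() in the question, but the result list is created with
    // room for all elements, so addAll never has to enlarge the backing array.
    static List<Integer> combine(List<Integer> l1, List<Integer> l2) {
        List<Integer> l3 = new ArrayList<>(l1.size() + l2.size());
        l3.addAll(l1);
        l3.addAll(l2);
        return l3;
    }

    public static void main(String[] args) {
        List<Integer> merged = combine(Arrays.asList(1, 2), Arrays.asList(3, 4));
        System.out.println(merged); // prints [1, 2, 3, 4]
    }
}
```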

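The combination of these steps resulted in less than a second for the sequential execution and less than half a second for the multithreaded execution under Java 8.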
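For reference, here is one way the three steps might be put together. It is a sketch under stated assumptions, not the exact code that was measured: the names CombinedSteps, range and runParallel are hypothetical, the pool uses two threads because only two tasks run concurrently, and thenCombine is used instead of the question's thenAcceptBothAsync so that the merged list is actually returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CombinedSteps {
    // Builds the list [from, to), like getFirstList()/getSecondList().
    static List<Integer> range(int from, int to) {
        List<Integer> l = new ArrayList<>();
        for (int i = from; i < to; i++) {
            l.add(i);
        }
        return l;
    }

    // Step 3: the result list gets its final capacity up front.
    static List<Integer> combine(List<Integer> l1, List<Integer> l2) {
        List<Integer> l3 = new ArrayList<>(l1.size() + l2.size());
        l3.addAll(l1);
        l3.addAll(l2);
        return l3;
    }

    // Builds both lists concurrently and returns the merged result.
    static List<Integer> runParallel(int n) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<List<Integer>> cf1 =
                    CompletableFuture.supplyAsync(() -> range(0, n), executor);
            CompletableFuture<List<Integer>> cf2 =
                    CompletableFuture.supplyAsync(() -> range(n, 2 * n), executor);
            return cf1.thenCombine(cf2, CombinedSteps::combine).join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        runParallel(n); // Step 1: warm-up call, not measured
        long start = System.nanoTime();
        List<Integer> result = runParallel(n);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(result.size() + " elements in " + ms + " ms");
    }
}
```

Step 2 is applied on the command line, e.g. `java -Xms2G CombinedSteps`. The measured time will vary with the JVM version and hardware.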

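Java 11 had a far better starting point, needing only about a second out of the box, so these improvements gave a less dramatic speedup there. It also seems to have a much higher memory consumption for this code.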

Answer 2

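In the single-threaded variant, l3.addAll(l1); l3.addAll(l2); takes the elements of l1 and l2 from the processor cache (they were placed there while getFirstList and getSecondList were executing).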

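In the parallel variant, the method combine() runs on a different processor core with an empty cache and has to fetch all elements from main memory, which is much slower.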

Why is multithreaded code slower than single-threaded code with CompletableFuture?
