Question

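Background: I first learned C++ and Java in school several years ago, but I haven't done much programming in the last 9 years or so because my previous career didn't require it.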

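I decided to look into Project Euler to brush up on my programming, and solved Problem 14, which asks for the integer between 1 and one million that has the longest Collatz sequence. (A Collatz sequence proceeds from a starting number by multiplying the number by 3 and adding 1 if it is odd, or halving it if it is even. The process continues until the number reaches 1.)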

I first solved the problem using brute force, as shown in the code below.

int n;
long temp; // long is necessary since some Collatz sequences go outside scope of int
int[] n_length = new int[1000000];
for (n = 0; n < 1000000; n++) {
    temp = n + 1;
    n_length[n] = 1;
    while (temp > 1) {
        n_length[n]++;
        if (temp % 2 == 0) temp = temp / 2;
        else temp = 3 * temp + 1;
    }
}
int max = 0;
int max_index = 0;
for (int i = 0; i < 1000000; i++) {
    if (n_length[i] > max) {
        max = n_length[i];
        max_index = i;
    }
}
System.out.println("The number with the longest Collatz sequence is " + (max_index + 1));

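I figured this approach was inefficient because it runs the algorithm far more often than necessary. Any number that appears in the Collatz sequence of an earlier number has effectively already had its own sequence determined, so you end up recomputing the sequence of every number each time it comes up in another Collatz sequence.

I decided it would be better to store each number in a map as soon as it comes up in a Collatz sequence, so you only have to compute it once. To do this, I used a TreeMap with the numbers as keys and the associated Collatz sequence length as the value, and a recursive function that inserts each number into the map as soon as it comes up in a Collatz sequence. (See the code below.)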

public static TreeMap<Long, Integer> tm = new TreeMap<Long, Integer>();
public static void main(String[] args) {

    tm.put((long)1, 1);
    int maxVal = 1;
    long keyWithMaxVal = 1;
    int maybeMax;
    for (long i = 2; i <= 1000000; i++){
        if(!(tm.containsKey(i))){
            maybeMax = addKey(i);
            if (maybeMax >= maxVal){
                maxVal = maybeMax;
                keyWithMaxVal = i;
            }
        }
    }
    System.out.println("The number with the longest Collatz sequence is " + keyWithMaxVal + " with length " + maxVal);
}
public static int addKey(long key){

    while (!(tm.containsKey(key))){
        if (key % 2 == 0){
            tm.put(key, 1 +addKey(key/2));
        }
        else{
            tm.put(key, 1 + addKey(3*key + 1));
        }
    }
    return tm.get(key);
}

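I used a TreeMap because it automatically sorts keys on insertion, so that as I iterate through the for loop I can quickly check whether a key has already been inserted and avoid calling the addKey method unless I have to. I thought this algorithm would be much faster.

However, when I actually ran the code, I was surprised to find that the brute-force algorithm came up with the answer instantly, while the recursive TreeMap algorithm took much longer, about 6 seconds. When I modified my programs to go up to 5 million instead of one million, the difference became even more pronounced. I added some code to each program to verify that the second program was doing less work than the first, and indeed I determined that the addKey method is called only once per key, whereas the number of times the while loop iterates in the first program equals the sum of the lengths of all the numbers' Collatz sequences (i.e. far more than the number of method calls in the second algorithm).

So why is the first algorithm so much faster than the second? Is it because the array of primitives in the first algorithm requires fewer resources than the TreeMap of wrapper objects in the second? Is searching the map to check whether a key already exists slower than I expected (shouldn't it take logarithmic time)? Are recursive methods that require a large number of method calls inherently slower? Or is there something else I'm overlooking?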

Answer 1

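In addition to the reasons already mentioned in other answers, the main reason the array-based implementation is faster is probably that it benefits heavily from CPU caching effects:

Your two independent small, tight loops will fit entirely in the L0 instruction cache of modern CPUs (which can hold 1,536 decoded micro-ops on Sandy Bridge). Running these two loops sequentially will be much faster than running a single loop with more instructions that does not fit in that cache. Given that the second loop is very small, its instructions have most likely been prefetched and decoded into micro-ops already, and it fits in the Loop Block Buffer (28 micro-ops).

(Source: hardwaresecrets.com)

There is good locality of reference for data access. In both the first and second loops you perform sequential access. The prefetcher also helps, because your access pattern is fully predictable.

Related to both of these topics, I'd recommend watching this excellent "Skills Cast": 95% of performance is about clean representative models, by Martin Thompson, which discusses these and other topics in more detail.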

Answer 2

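This code measures how long it takes to find the longest Collatz sequence for a number between 1 and 5 million. It uses three different approaches: iterative, recursive, and storing results in a hash map.

The output looks like this: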

iterative
time = 2013ms
max n: 3732423, length: 597
number of iterations: 745438133

recursive
time = 2184ms
max n: 3732423, length: 597
number of iterations: 745438133

with hash map
time = 7463ms
max n: 3732423, length: 597
number of iterations: 15865083

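So, for the hash map solution, the number of steps the program has to take is almost 50 times smaller. Despite that, it is more than 3 times slower, and I think the main reason is that simple arithmetic on numbers, such as addition and multiplication, is much faster than operations on a hash map.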

import java.util.function.LongUnaryOperator;
import java.util.HashMap;

public class Collatz {
  static int iterations = 0;
  static HashMap<Long, Long> map = new HashMap<>();

  static long nextColl(long n) {
    if(n % 2 == 0) return n / 2;
    else return n * 3 + 1;
  }

  static long collatzLength(long n) {
    iterations++;
    int length = 1;
    while(n > 1) {
      iterations++;
      n = nextColl(n);
      length++;
    }
    return length;
  }

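  static long collatzLengthMap(long n) {
    iterations++;
    if(n == 1) return 1;
    // Explicit get/put instead of a recursive computeIfAbsent: the recursive
    // call would modify the map while an entry is still being computed, which
    // HashMap rejects with ConcurrentModificationException on Java 9 and later.
    Long cached = map.get(n);
    if(cached != null) return cached;
    long length = collatzLengthMap(nextColl(n)) + 1;
    map.put(n, length);
    return length;
  }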

  static long collatzLengthRec(long n) {
    iterations++;
    if(n == 1) return 1;
    else return collatzLengthRec(nextColl(n)) + 1;
  }

  static void test(String msg, LongUnaryOperator f) {
    iterations = 0;
    long max = 0, maxN = 0;
    long start = System.nanoTime();
    for(long i = 1; i <= 5000000; i++) {
      long length = f.applyAsLong(i);
      if(length > max) {
        max = length;
        maxN = i;
      }
    }
    long end = System.nanoTime();
    System.out.println(msg);
    System.out.println("time = " + ((end - start)/1000000) + "ms");
    System.out.println("max n: " + maxN + ", length: " + max);
    System.out.println("number of iterations: " + iterations);
    System.out.println();
  }

  public static void main(String[] args) {
    test("iterative", Collatz::collatzLength);
    test("recursive", Collatz::collatzLengthRec);
    test("with hash map", Collatz::collatzLengthMap);
  }
}

Answer 3

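I made a few changes to your code, and it seems to be faster, though still not instantaneous.

In general, I tried to get rid of unnecessary, repeated map accesses.

Replacing the TreeMap with a HashMap changes some O(log n) operations to O(1). You never actually use the sorted property of the TreeMap, only its containsKey method.

Iterating backwards in the main loop reduces the number of times the condition maybeMax >= maxVal is true.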

import java.util.HashMap;

public class Test {
  public static HashMap<Long, Integer> tm = new HashMap<Long, Integer>();

  public static void main(String[] args) {
    tm.put((long) 1, 1);
    int maxVal = 1;
    long keyWithMaxVal = 1;
    int maybeMax;
    for (long i = 1000000; i >= 2; i--) {
      if (!(tm.containsKey(i))) {
        maybeMax = addKey(i);
        if (maybeMax >= maxVal) {
          maxVal = maybeMax;
          keyWithMaxVal = i;
        }
      }
    }
    System.out.println("The number with the longest Collatz sequence is "
        + keyWithMaxVal + " with length " + maxVal);
  }

  public static int addKey(long key) {
    Integer boxedValue = tm.get(key);
    if (boxedValue == null) {
      if (key % 2 == 0) {
        int value = 1 + addKey(key / 2);
        tm.put(key, value);
        return value;
      } else {
        int value = 1 + addKey(3 * key + 1);
        tm.put(key, value);
        return value;
      }
    }
    return boxedValue.intValue();
  }
}

Answer 4

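I think auto(un)boxing is the root of the problem. Even the Java SE 8 Programming Guide mentions it:

The performance of the resulting list is likely to be poor, as it boxes or unboxes on every get or set operation. It is plenty fast enough for occasional use, but it would be folly to use it in a performance critical inner loop.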
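As a rough illustration of what avoiding boxing could look like here, the sketch below caches chain lengths in a primitive int[] instead of a map of wrapper objects. The class and method names (CollatzArrayCache, longestStart) are made up for this example, and it assumes it is enough to cache only start values up to the limit; intermediate terms above the limit are simply recomputed.

```java
public class CollatzArrayCache {
    /** Returns the start value in [1, limit] with the longest Collatz chain. */
    static int longestStart(int limit) {
        // cache[i] = number of terms in the chain starting at i; 0 means "not yet known".
        int[] cache = new int[limit + 1];
        cache[1] = 1;
        int best = 1;
        for (int start = 2; start <= limit; start++) {
            long n = start; // long, because intermediate terms can exceed the int range
            int steps = 0;
            // Walk the chain until it reaches a value whose length is already cached.
            while (n > limit || cache[(int) n] == 0) {
                n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
                steps++;
            }
            cache[start] = steps + cache[(int) n];
            if (cache[start] > cache[best]) {
                best = start;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("Longest chain starts at " + longestStart(1_000_000));
    }
}
```

No Long or Integer objects are created in the inner loop, and the lookups are plain array reads, so this keeps the "compute each start only once" idea without the per-access boxing cost.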

Answer 5

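As others have pointed out, you should switch to a HashMap instead of a TreeMap to reduce the complexity of insert and lookup operations.

However, getting the best out of a HashMap depends on setting its initial capacity. If you don't, once your insertions exceed the default capacity the HashMap will allocate a larger table, and your entries will end up being rehashed into the new table. This slows down the execution of your program.

The smallest change is: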

public static HashMap<Long, Integer> tm = new HashMap<Long, Integer>(1000000, 1.0f);

HashMap(int initialCapacity, float loadFactor)
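  Constructs an empty HashMap with the specified initial capacity and load factor.
  (Java documentation)

Here we declare that we want the HashMap to have a capacity of 1000000 (able to hold that many elements) and a load factor of 1.0 (insertions must exceed 100% of the capacity before rehashing).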
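An alternative to raising the load factor is to keep the default of 0.75 and scale the requested capacity instead. The sketch below shows that arithmetic; PresizedMap and capacityFor are hypothetical names, and the expected entry count of 1,000,000 is only an assumption (the memoized search also stores intermediate values above one million, so the real count is larger).

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    /** Smallest capacity at which expectedEntries fit without exceeding the load factor. */
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expectedEntries = 1_000_000; // assumed upper bound on the number of keys
        float loadFactor = 0.75f;        // HashMap's default load factor
        Map<Long, Integer> tm =
            new HashMap<>(capacityFor(expectedEntries, loadFactor), loadFactor);
        tm.put(1L, 1);
        System.out.println("Requested capacity: " + capacityFor(expectedEntries, loadFactor));
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so the table actually allocated may be larger than the value passed in.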

Answer 6

Hi, I think containsKey is responsible for this result.

TreeMap containsKey is O(log(n)):

https://github.com/benblack86/java-snippets/blob/master/resources/java_collections.pdf

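According to http://en.wikipedia.org/wiki/Collatz_conjecture:

The longest progression for any initial starting number less than 100 million is 63,728,127, which has 949 steps.

We will take the complexity of a Collatz sequence to be C.

So, in your first case, you have:

O(n * C + n) = O(n * (C + 1)) = O(k * n)

And in the recursive solution:

O(n * (log(n) + C * log(n))) = O(k * n * log(n))

(I'm not entirely sure about the recursive part, but I am sure it is more than 1, because inside the recursive function you call containsKey again.)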
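To make the two operation counts concrete, here is a small sketch that counts the brute-force loop iterations and the number of distinct keys a memoized search stores. The names (CollatzWorkCount, bruteForceSteps, memoizedKeys) are invented for illustration, and it uses a HashMap only to count keys; it is not a timing benchmark.

```java
import java.util.HashMap;
import java.util.Map;

public class CollatzWorkCount {
    /** Total loop iterations the brute-force approach performs for starts 1..limit. */
    static long bruteForceSteps(int limit) {
        long steps = 0;
        for (int start = 1; start <= limit; start++) {
            for (long n = start; n > 1; n = (n % 2 == 0) ? n / 2 : 3 * n + 1) {
                steps++;
            }
        }
        return steps;
    }

    /** Number of distinct keys a memoized search stores for starts 1..limit. */
    static int memoizedKeys(int limit) {
        Map<Long, Integer> lengths = new HashMap<>();
        lengths.put(1L, 1);
        for (int start = 2; start <= limit; start++) {
            fill(start, lengths);
        }
        return lengths.size();
    }

    // Recursively records the chain length of n and of every unseen term after it.
    private static int fill(long n, Map<Long, Integer> lengths) {
        Integer known = lengths.get(n);
        if (known != null) {
            return known;
        }
        int length = 1 + fill((n % 2 == 0) ? n / 2 : 3 * n + 1, lengths);
        lengths.put(n, length);
        return length;
    }

    public static void main(String[] args) {
        int limit = 1_000_000;
        System.out.println("brute-force steps: " + bruteForceSteps(limit));
        System.out.println("memoized keys:     " + memoizedKeys(limit));
    }
}
```

For a limit of 10 this gives 67 brute-force steps against 22 stored keys. The memoized version does fewer operations, but each one is a map lookup or insertion rather than a single arithmetic step, which is where the extra log(n) factor in the estimate above comes from when the map is a TreeMap.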

Project Euler #14: Why is my TreeMap algorithm slower than brute force?