Question

I am looking at the following C++ program:

#include <iostream>
#include <limits>


int main(int argc, char **argv) {
   unsigned int sum = 0;
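   // Round-trip every i through double and back, then count the odd results.
   for (unsigned int i = 1; i < std::numeric_limits<unsigned int>::max(); ++i) {
      double f = static_cast<double>(i);              // unsigned int -> double
      unsigned int t = static_cast<unsigned int>(f);  // double -> unsigned int
      sum += (t % 2);                                 // 1 if t is odd
   }
   std::cout << sum << std::endl;
   return 0;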
}

I am using the gcc/g++ compiler; g++ -v gives gcc version 4.7.2 20130108 [gcc-4_7-branch revision 195012] (SUSE Linux). I am running openSUSE 12.3 (x86_64) on an Intel(R) Core(TM) i7-3520M CPU.

Running

g++ -O3 test.C -o test_64_opt
g++ -O0 test.C -o test_64_no_opt
g++ -m32 -O3 test.C -o test_32_opt
g++ -m32 -O0 test.C -o test_32_no_opt

time ./test_64_opt
time ./test_64_no_opt
time ./test_32_opt
time ./test_32_no_opt

yields

2147483647

real    0m4.920s
user    0m4.904s
sys     0m0.001s

2147483647

real    0m16.918s
user    0m16.851s
sys     0m0.019s

2147483647

real    0m37.422s
user    0m37.308s
sys     0m0.000s

2147483647

real    0m57.973s
user    0m57.790s
sys     0m0.011s

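Using float instead of double, the optimized 64-bit variant finishes in as little as 2.4 seconds, while the other run times stay roughly the same. However, with float I get different output depending on the optimization level, probably because of the processor's higher internal precision.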

I know 64-bit may have faster math, but here the optimized builds differ by a factor of about 7 (nearly 15 with floats).

I would appreciate an explanation of these run time differences.

Answer 1

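The problem is not 32-bit versus 64-bit, but the lack of SSE and SSE2. When compiling for 64-bit, gcc assumes it can use SSE and SSE2, because every x86_64 processor supports them.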

Compile the 32-bit version with -msse -msse2 and the run time difference almost disappears.

My benchmark results, for completeness:

-O3 -m32 -msse -msse2     4.678s
-O3 (64bit)               4.524s

32-bit vs 64-bit: massive run time difference
