Question

In ANSI C, which of the following is faster, and why? Or does it make no difference because both compile to the same thing?

int main(void) {
    double width = 4.5678;
    double height = 6.7890;

    double perimeter = width + width + height + height;

    return 0;
}

Or the following:

int main(void) {
    double width = 4.5678;
    double height = 6.7890;

    double perimeter = width * 2 + height * 2;

    return 0;
}

Answer 1

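The compiler will figure it out and use whatever is fastest. It may even compute perimeter at compile time.

You should focus on writing the most readable code. That helps both humans and the compiler understand your intent.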

Answer 2

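If you want to see what the compiler will do with something, don't give it compile-time-constant data. Also, don't do it in main, because gcc applies some optimizations to "cold" functions, and main is automatically marked as one.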

I tried it on godbolt to see whether different compiler versions made a difference.

double f1(double width, double height) {
    return width + width + height + height;
    // compiles to ((width+width) + height) + height
    // 3 instructions, but can't happen in parallel. latency=12c(Skylake), 9c(Haswell)
    // with -march=haswell (implying -mfma),
    // compiles to fma(2.0*width + height) + height, with 2.0 as a memory operand from rodata.
}

double f2(double width, double height) {
    return width * 2 + height * 2;
    // compiles to (width+width) + (height+height)
    // 3 instructions, with the first two independent. Latency=8c(Skylake), 7c(Haswell)
    // with -mfma: compiles to width+=width; fma(2.0*height + width)
}

double f3(double width, double height) {
    return (height + width) * 2;
    // compiles to 2 instructions: tmp=w+h. tmp+=tmp
    // latency=8(Skylake), 6(Haswell)
}

double f4(double width, double height) {
    return (width + height) * 2;
    // compiles to 3 instructions, including a move because gcc (even 5.2) is dumb
    // and generates the temporary in the wrong register.
    // clang is fine and does the same as for f3()
}

With -ffast-math, they all compile to 2 instructions: tmp=(width+height); tmp+=tmp;. gcc 4.9.2, 5.1, and 5.2 all generate an extra move instruction in many of the sequences, even with -ffast-math. Of course, they don't have that problem with the 3-operand AVX version, but AVX is too new to use without checking that it's supported. (Even Silvermont doesn't support it.)
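The differences between f1 through f4 come down to two properties of IEEE 754 arithmetic. Doubling is exact, so a compiler may turn width * 2 into width + width without changing the result. Addition is not associative, so regrouping f1 into the shape of f3 is only allowed under -ffast-math. Here is a minimal sketch of both properties. The helper names are hypothetical, and it assumes IEEE 754 double arithmetic evaluated at double precision (as on x86-64 with SSE2) and compiled without -ffast-math.

```c
#include <stdbool.h>

/* x * 2 and x + x both yield the exact value 2x (or the same infinity on
   overflow), so they compare equal for every non-NaN x. This is why a
   compiler can replace the multiply with an add even without -ffast-math. */
bool doubling_matches_self_add(double x) {
    return x * 2 == x + x;
}

/* Returns true when (a + b) + c and a + (b + c) round to different doubles.
   Because this can happen, a compiler that follows IEEE 754 strictly must
   keep the grouping written in the source. */
bool regrouping_changes_sum(double a, double b, double c) {
    double left_first = (a + b) + c;
    double right_first = a + (b + c);
    return left_first != right_first;
}
```

For instance, regrouping_changes_sum(1e16, 1.0, 1.0) is true under these assumptions: 1e16 + 1.0 rounds back to 1e16 each time, while 1.0 + 1.0 survives as 2.0 and then adds exactly.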

Performance of 4 addition operations versus 2 multiplication operations plus 1 addition in C
