Question

I wrote a function template to measure time:

#include <ctime>

template <typename FUNCTION, typename INPUT, int N>
double measureTime(FUNCTION f, INPUT inp) {
  // double x;
  clock_t begin = clock();
  for (int i = 0; i < N; i++) {
    // x = f(inp);
    f(inp);
  }
  clock_t end = clock();
  // std::cout << x << std::endl;
  return double(end - begin) / CLOCKS_PER_SEC;
}

I use it like this:

#include <cstddef>
#include <iostream>
#include <vector>

typedef std::vector<double> DVect;

// Takes the vector by value, so every call copies it.
double passValue(DVect a) {
    double sum = 0;
    for (std::size_t i = 0; i < a.size(); i++) { sum += a[i]; }
    return sum;
}
typedef double (*passValue_type)(DVect);

int main(int argc, char *argv[]) {
    const int N = 1000;
    const int size = 10000;
    DVect v(size, 0);
    std::cout << measureTime<passValue_type, DVect, N>(passValue, v) << std::endl;
}

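The goal is to reliably measure the CPU time of different functions, for example pass by value versus pass by reference. In practice it seems to work well, but sometimes the time is too short to measure and all I get is 0. To make sure the function is actually called, I printed the result of the call (see the comments in the code above). I would like to avoid that, and I want to keep the template as simple as possible, so my question is: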

How can I make sure the function is really called and not optimized away (since its return value is not used)?

Answer 1

I usually do something like this:

#include <ctime>
#include <iostream>

template <typename FUNCTION, typename INPUT, int N>
double measureTime(FUNCTION f, INPUT inp) {
  double x = 0;
  clock_t begin = clock();
  for (int i = 0; i < N; i++) {
    x += f(inp);  // accumulate so that every result is used
  }
  clock_t end = clock();
  std::cout << x << std::endl;
  // or if (x < 0) cout << x; or similar,
  // such that it doesn't ACTUALLY print anything.
  return double(end - begin) / CLOCKS_PER_SEC;
}
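Applied to the question's goal, a minimal sketch comparing pass by value with pass by reference might look like this. It assumes the summation in passValue is meant to be sum += a[i]; passRef is a hypothetical counterpart that is not in the original code, and the accumulating timer is repeated so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <iostream>
#include <vector>

typedef std::vector<double> DVect;

// Accumulating timer: summing every result into x and printing it gives the
// calls an observable use, so their results cannot simply be discarded.
template <typename FUNCTION, typename INPUT, int N>
double measureTime(FUNCTION f, INPUT inp) {
  double x = 0;
  clock_t begin = clock();
  for (int i = 0; i < N; i++) {
    x += f(inp);
  }
  clock_t end = clock();
  std::cout << x << std::endl;
  return double(end - begin) / CLOCKS_PER_SEC;
}

// Pass by value: the vector is copied on every call.
double passValue(DVect a) {
  double sum = 0;
  for (std::size_t i = 0; i < a.size(); i++) sum += a[i];
  return sum;
}

// Hypothetical (not in the question): pass by const reference, no copy.
double passRef(const DVect& a) {
  double sum = 0;
  for (std::size_t i = 0; i < a.size(); i++) sum += a[i];
  return sum;
}
```

Both variants go through the same template; only the function-pointer type differs, for example measureTime<double (*)(DVect), DVect, 1000>(passValue, v) versus measureTime<double (*)(const DVect&), DVect, 1000>(passRef, v).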

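The above assumes that f actually does something non-trivial that the compiler cannot figure out how to simplify. If f is return 6;, the compiler can turn the loop into x = 6 * N; and you will indeed get a very short run time.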

If you want to be able to use "any" function, you will need to do something a bit cleverer:

#include <ctime>
#include <iostream>
#include <type_traits>

// Requires C++17 (if constexpr). RET is deduced from the call instead of
// being passed explicitly, and void functions take a separate branch.
template <typename FUNCTION, typename INPUT, int N>
double measureTime(FUNCTION f, INPUT inp) {
  using RET = std::decay_t<decltype(f(inp))>;
  clock_t begin = clock();
  if constexpr (std::is_void_v<RET>) {
    // Nothing to accumulate: rely on the function's own side effects.
    for (int i = 0; i < N; i++) {
      f(inp);
    }
  } else {
    RET x{};
    for (int i = 0; i < N; i++) {
      x += f(inp);
    }
    std::cout << x << std::endl;
  }
  clock_t end = clock();
  return double(end - begin) / CLOCKS_PER_SEC;
}

[I haven't actually compiled the code above, so it may have some minor flaws, but as a concept it should work.]

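Since any meaningful void function has to do something that affects the world around it (write to a stream, change a global variable, or make some system call), it will not be eliminated. Of course, calling an empty function or something similar can cause trouble.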
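A minimal sketch of the void case, assuming the side effect is a store to a global variable. The global is declared volatile here (an addition not in the answer), because repeated stores to an ordinary global could in principle be merged across iterations, whereas every access to a volatile object must actually be performed. g_sink and storeSum are hypothetical names.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical global. Reads and writes of a volatile object are observable
// side effects, so the compiler must keep each store below.
volatile double g_sink = 0;

// Hypothetical void function under test: its only effect is the global store.
void storeSum(double a) {
  g_sink = g_sink + a;  // plain assignment: compound assignment on volatile is deprecated in C++20
}

// Timer for void functions: there is no result to accumulate, so the loop
// is kept alive only by the side effects of f.
template <typename FUNCTION, typename INPUT, int N>
double measureTime(FUNCTION f, INPUT inp) {
  clock_t begin = clock();
  for (int i = 0; i < N; i++) {
    f(inp);
  }
  clock_t end = clock();
  return double(end - begin) / CLOCKS_PER_SEC;
}
```

After measureTime<void (*)(double), double, 1000>(storeSum, 2.0), g_sink holds 2000, which shows that every call was made.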

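Another approach, assuming you don't mind that the call is not inlined, is to put the function under test in a separate source file, so that the compiler cannot "see" the function from the code that measures the time [and not use -flto, which would let it inline the function at link time]. That way the compiler cannot know what the function under test does and cannot eliminate the call.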

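It should be noted that there is really no way to guarantee that the compiler won't eliminate the call, other than "make it impossible for the compiler to know what the function's result will be" (for example by using random or external input) or "don't let the compiler know what the function does".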
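One way to apply the "random or external input" idea is to build the test data at run time from a non-deterministic seed, so neither the values nor the function's result can be known while compiling. makeRuntimeInput is a hypothetical helper; std::random_device, std::mt19937 and std::uniform_real_distribution are standard facilities from the <random> header.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fills a vector with values drawn at run time, seeded
// from std::random_device, so the compiler cannot precompute anything
// derived from them.
std::vector<double> makeRuntimeInput(std::size_t size) {
  std::random_device rd;                                  // external entropy source
  std::mt19937 gen(rd());                                 // seeded pseudo-random engine
  std::uniform_real_distribution<double> dist(0.0, 1.0);  // values in [0, 1)
  std::vector<double> v(size);
  for (std::size_t i = 0; i < size; i++) {
    v[i] = dist(gen);
  }
  return v;
}
```

The resulting vector can be passed to measureTime in place of the all-zero vector used in the question.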

Answer 2

Many compilers have extensions to disable inlining of a function. For gcc it is __attribute__((noinline)), for example:

__attribute__((noinline)) void foo() { ... }

Boost provides a portable BOOST_NOINLINE macro.
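A minimal sketch of a portable noinline macro in the spirit of BOOST_NOINLINE, assuming GCC/Clang and MSVC are the compilers of interest. MY_NOINLINE, square and timeCalls are hypothetical names; __attribute__((noinline)) and __declspec(noinline) are the real GCC/Clang and MSVC spellings. The results are still accumulated, because noinline only prevents inlining: GCC's documentation notes that calls to a noinline function without side effects can still be optimized away, and GCC 8 and later offer the broader noipa attribute for that case.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical portable macro: MSVC uses __declspec(noinline), GCC and Clang
// (which also defines __GNUC__) use __attribute__((noinline)).
#if defined(_MSC_VER)
#define MY_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define MY_NOINLINE __attribute__((noinline))
#else
#define MY_NOINLINE
#endif

// Hypothetical function under test, kept out of line by the attribute.
MY_NOINLINE double square(double a) { return a * a; }

// Calls square n times, accumulating the results so that they are used,
// and stores the elapsed CPU time in *seconds.
double timeCalls(int n, double arg, double* seconds) {
  double x = 0;
  clock_t begin = clock();
  for (int i = 0; i < n; i++) {
    x += square(arg);
  }
  clock_t end = clock();
  *seconds = double(end - begin) / CLOCKS_PER_SEC;
  return x;
}
```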
