Why is "fast inverse square root" slower than 1 / sqrt() for extremely large floats?

Time: 2018-07-24 01:40:04

Tags: performance math square-root

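The following complete program compares the speed of the fast inverse square root with 1 / sqrt(). According to a sentence on Wikipedia, the algorithm is approximately four times faster than computing the square root with another method and then calculating the reciprocal via floating-point division.

But here is why I am asking: in my test it is slower than 1 / sqrt(). Is there something wrong with my code? Please help.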

    #include <stdio.h>
    #include <time.h>
    #include <math.h>

    float FastInvSqrt (float number);

    int
    main ()
    {
      float x = 1.0e+100;

      int N = 100000000;
      int i = 0;

      clock_t start2 = clock (); 
      do  
        {   
          float z = 1.0 / sqrt (x);
          i++;
        }   
      while (i < N); 
      clock_t end2 = clock (); 

      double time2 = (end2 - start2) / (double) CLOCKS_PER_SEC;

      printf ("1/sqrt() spends %13f sec.\n\n", time2);

      i = 0;
      clock_t start1 = clock (); 
      do  
        {   
          float y = FastInvSqrt (x);
          i++;
        }   
      while (i < N); 
      clock_t end1 = clock (); 

      double time1 = (end1 - start1) / (double) CLOCKS_PER_SEC;



      printf ("FastInvSqrt() spends %f sec.\n\n", time1);


      printf ("fast inverse square root is faster %f times than 1/sqrt().\n", time2/time1);

      return 0;
    }

    float
    FastInvSqrt (float x)
    {
      float xhalf = 0.5F * x;
      int i = *(int *) &x;  // store floating-point bits in integer
      i = 0x5f3759df - (i >> 1);    // initial guess for Newton's method
      x = *(float *) &i;            // convert new bits into float
      x = x * (1.5 - xhalf * x * x);        // One round of Newton's method
      //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
      //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
      //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
      return x;
    }

The results are as follows:

1/sqrt() spends      0.850000 sec.

FastInvSqrt() spends 0.960000 sec.

fast inverse square root is faster 0.885417 times than 1/sqrt().

2 answers:

Answer 0: (score: 0)

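I corrected the code as follows: 1. It computes on random numbers instead of a single fixed number. 2. It measures the time consumed inside the while loop and sums it up.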

    #include <stdio.h>
    #include <time.h>
    #include <math.h>
    #include <stdlib.h>

    float FastInvSqrt (float number);

    int
    main ()
    {
      float x = 0;
      time_t t;

      srand ((unsigned) time (&t));

      int N = 1000000;
      int i = 0;

      double sum_time2 = 0.0;

      do
        {
          x = (float) (rand () % 10000) * 0.22158;
          clock_t start2 = clock ();
          float z = 1.0 / sqrt (x);
          clock_t end2 = clock ();
          sum_time2 = sum_time2 + (end2 - start2);
          i++;
        }
      while (i < N);

      printf ("1/sqrt() spends %13f sec.\n\n", sum_time2 / (double) CLOCKS_PER_SEC);

      double sum_time1 = 0.0;

      i = 0;
      do
        {
          x = (float) (rand () % 10000) * 0.22158;
          clock_t start1 = clock ();
          float y = FastInvSqrt (x);
          clock_t end1 = clock ();
          sum_time1 = sum_time1 + (end1 - start1);
          i++;
        }
      while (i < N);

      printf ("FastInvSqrt() spends %f sec.\n\n", sum_time1 / (double) CLOCKS_PER_SEC);

      printf ("fast inverse square root is faster %f times than 1/sqrt().\n", sum_time2 / sum_time1);

      return 0;
    }

    float
    FastInvSqrt (float x)
    {
      float xhalf = 0.5F * x;
      int i = *(int *) &x;  // store floating-point bits in integer
      i = 0x5f3759df - (i >> 1);    // initial guess for Newton's method
      x = *(float *) &i;            // convert new bits into float
      x = x * (1.5 - xhalf * x * x);        // One round of Newton's method
      //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
      //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
      //x = x * (1.5 - xhalf * x * x);      // One round of Newton's method
      return x;
    }

But the fast inverse square root is still slower than 1 / sqrt().

1/sqrt() spends      0.530000 sec.

FastInvSqrt() spends 0.540000 sec.

fast inverse square root is faster 0.981481 times than 1/sqrt().

Answer 1: (score: 0)

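A function that restricts its domain of computation can have lower computational complexity (meaning it can be computed faster). You can think of this as optimizing the computation of the function's shape for a subset of its domain of definition, or as being like search algorithms, each of which works best for a particular kind of input (the no-free-lunch theorem).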

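So using this function for inputs outside the interval [0, 1] (which I assume it was optimized or designed for) means using it on a subset of inputs where its complexity is worse (higher) than that of other, possibly specialized, variants of functions that compute the square root.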

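The sqrt() function you are using from the library is itself (probably) optimized as well, for example with values precomputed in some kind of LUT that serve as initial guesses for further approximation. Such a more "general" function covers more of the domain and tries to stay efficient through precomputation, through eliminating redundant computation (which only goes so far), or through maximizing data reuse at run time. That generality has its own complexity limits: the more intervals there are to choose a precomputation for, the greater the decision overhead. Knowing at compile time that all of your inputs to sqrt lie in the interval [0, 1] would therefore help reduce the run-time decision overhead, because you would know ahead of time which specialized approximation function to use. Alternatively, you could generate a specialized function for each interval of interest at compile time (see metaprogramming).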