Question

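I tested three C++ programs that load data from memory into the CPU, perform a simple + or x operation (the timed part), and then report the result. The three programs have the same structure but use different data types (int, double, float).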

The test result: for all three programs, when the data size doubles, the time doubles.

However, I have the following observations.

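Observation 1: Without optimization, the time is 2x slower. This is strange, because the loading time (the bottleneck) should not be affected by the compiler.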

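Observation 2: With no compiler optimization and a fixed data size, the double program is 2x faster than the int and float programs (at 256MB, 512MB, 1024MB, 2048MB, and 4096MB). This is also strange, because double should be the slowest.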

Note on Observation 2: When I add compiler optimization (O, O2, O3), the three programs take a similar amount of time.

The code is attached here:

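#include <cstdlib>     // rand, srand
#include <ctime>       // time
#include <iostream>    // cout, endl
#include <sys/time.h>  // gettimeofday, timeval (POSIX)

using namespace std;

int main()
{
     float value=0; // initialize the accumulator; reading it uninitialized is undefined behavior
     double totalTimeDifference;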

     const int numberOFElements=178956970; //4GB for 6 arrays in total 
     float*FLOAT_Array_one=new float[numberOFElements];
     float*FLOAT_Array_two=new float[numberOFElements];
     float*FLOAT_Array_three=new float[numberOFElements];
     float*FLOAT_Array_four=new float[numberOFElements];
     float*FLOAT_Array_five=new float[numberOFElements];
     float*FLOAT_Array=new float[numberOFElements];

     srand(time(NULL));
     for(int i=0;i<numberOFElements;i++)
     {
         FLOAT_Array_one[i]=rand()% 400;
         FLOAT_Array_two[i]=rand()% 400;
         FLOAT_Array_four[i]=rand()% 400;
         FLOAT_Array_five[i]=rand()% 400;
     }

     timeval tim1;
     timeval tim2;
     gettimeofday(&tim1,NULL);

     //****************************//
     for(int i=0;i<numberOFElements;i++)
     {
         FLOAT_Array[i]=FLOAT_Array_one[i]+FLOAT_Array_two[i];
     }
     //****************************//

     //****************************//
     for(int i=0;i<numberOFElements;i++)
     {
         FLOAT_Array_three[i]=FLOAT_Array_four[i]*FLOAT_Array_five[i];
     }
     //****************************//
     gettimeofday(&tim2,NULL);

     double t1=tim1.tv_sec+(tim1.tv_usec/1000000.0);
     double t2=tim2.tv_sec+(tim2.tv_usec/1000000.0);

     for(int i=0;i<numberOFElements;i++)
     {
         if(i%2==0)
             value=value+FLOAT_Array[i]+FLOAT_Array_three[i];
         else
             value=value-FLOAT_Array[i]-FLOAT_Array_three[i];
     }

     totalTimeDifference=t2-t1; 
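     cout<<value<<endl;
     cout<<totalTimeDifference<<endl;

     // release the arrays allocated with new[]
     delete[] FLOAT_Array_one;
     delete[] FLOAT_Array_two;
     delete[] FLOAT_Array_three;
     delete[] FLOAT_Array_four;
     delete[] FLOAT_Array_five;
     delete[] FLOAT_Array;
     return 0;
}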
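
For comparison, the timed region can be isolated in a small helper. This is only a minimal sketch, not the code that produced the measurements above: the name `timeAdd`, the use of `std::vector`, and the choice of `std::chrono::steady_clock` in place of `gettimeofday` are all assumptions made for illustration. Because the output vector is filled when it is constructed, its pages have already been touched before the timer starts, which is not the case for `FLOAT_Array` and `FLOAT_Array_three` in the program above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original program):
// times c[i] = a[i] + b[i] and returns the elapsed time in seconds.
template <typename T>
double timeAdd(const std::vector<T>& a, const std::vector<T>& b, std::vector<T>& c)
{
    // All three vectors are assumed to have the same length.
    assert(a.size() == b.size() && b.size() == c.size());

    // steady_clock is monotonic, so the difference cannot go negative.
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < a.size(); ++i)
        c[i] = a[i] + b[i];
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `timeAdd(a, b, c)` once with `std::vector<float>` arguments and once with `std::vector<double>` arguments, at each optimization level, would give numbers that can be compared directly.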

Answer 1

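Some educated guesses:

Since you are doing arithmetic in the "load" between the two time checks, the compiler may be able to use SSE streaming floating-point math instructions only when optimizing. That should give a significant speedup.
If you are on a 64-bit operating system, memory access for two half-words takes longer than for one whole word. In a 64-bit program a float occupies only 32 bits, so successive accesses to half-words take longer than a single access to a whole word. The part I don't understand, however, is how optimization would be able to fix this.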
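
One more possibility, offered as a guess rather than a measured result: at a fixed data size in bytes, a double array holds half as many elements as a float or int array, so the unoptimized loops run half as many iterations. If per-iteration overhead dominates without optimization, that alone would make the double version about 2x faster. The sketch below only does the counting; `elementsFor` is a hypothetical helper, and the 4-byte float and 8-byte double sizes are assumptions that hold on common platforms but are not guaranteed by the language.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: double is twice the size of float (true on common platforms).
static_assert(sizeof(double) == 2 * sizeof(float),
              "this sketch assumes sizeof(double) == 2 * sizeof(float)");

// Hypothetical helper: number of elements of type T that fit in totalBytes.
template <typename T>
std::size_t elementsFor(std::size_t totalBytes)
{
    // The byte budget is assumed to be a whole multiple of the element size.
    assert(totalBytes % sizeof(T) == 0);
    return totalBytes / sizeof(T);
}
```

For the same byte budget, `elementsFor<float>` returns twice what `elementsFor<double>` returns, and that ratio is also the ratio of loop iterations in the timed section.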
