If condition in a loop with NEON SIMD

Time: 2015-05-04 16:21:58

Tags: arm simd neon cortex-a

I am trying to write NEON SIMD code for the scalar code below:

Scalar code:

  #include <arm_neon.h>   // provides float32_t

  int *xt = new int[50];
  float32_t input1[16] = {12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f};
  float32_t input2[16] = {13.0f,12.0f,9.0f,12.0f,12.0f,12.0f,12.0f,12.0f,13.0f,12.0f,9.0f,12.0f,12.0f,12.0f,12.0f,12.0f};
  float32_t threshq    = 13.0f;
  uint32_t corners_count = 0;
  for (uint32_t x = 0; x < 16; x++)
  {
      if ( (input1[x] == input2[x]) && (input2[x] > threshq) )
      {
          xt[corners_count] = x;   // store the index of each match
          corners_count++;         // next match goes into the next slot
      }
  }

NEON:

   float32x4_t t1,t2,t3;
   uint32x4_t rq1,rq2,rq3;
   t1 = vld1q_f32(input1);       // 12 12 12 12
   t2 = vld1q_f32(input2);       // 13 12 09 12
   t3 = vdupq_n_f32(threshq);    // 13 13 13 13
   rq1 = vceqq_f32(t1,t2);       // lane = 0xFFFFFFFF where input1 == input2, else 0
   rq2 = vcgtq_f32(t1,t3);       // lane = 0xFFFFFFFF where input1 > threshq, else 0
   rq3 = vandq_u32(rq1,rq2);     // AND the results of the two conditions
   for( int i = 0;i < 4; i++){
    corners_count = corners_count + (rq3[i] & 1);  // each true lane is all-ones, so count it as 1
   //...I can't work out how to write the xt[] store logic in NEON
   }

I can't figure out how to write this logic in NEON. Can anyone guide me? I'm completely stuck on it.

1 answer:

Answer 0 (score: 1)

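Because of the dependency in the loop (the slot each match is written to, xt[corners_count], depends on how many earlier elements matched), I think you need to refactor the code into a SIMD loop followed by a scalar loop. Pseudocode: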

    // SIMD loop
    for each set of 4 float elements
        apply SIMD threshold test
        store 4 x bool results in temp[]

    // scalar loop
    for each bool element temp[x] in temp[]
        if temp[x]
            xt[corners_count] = x
            corners_count++

This way you get the benefit of SIMD for most of the work, and you only need scalar code for the final part.