
Time: 2016-04-24 18:45:38

Tags: opencl


Is there a way to make a small array of 4 x float4 work as quickly and seamlessly as 4 separate float4 variables?


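So if a bracket is being opened then I should temporarily store the value of v into let's say v0 (to represent the current value at the bracket depth of 0), then v can be reset to 0 and perform the operations inside the bracket, and if there's yet another bracket inside that bracket I'd put v into v1 and so on with v2 and v3 as we go deeper into nested brackets. This is so that I can for instance apply a multiplication inside a bracket that would only affect the other things created inside that bracket and not the rest.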


1 answer:

Answer 0 (score: 2)

From what I've seen, compilers will usually put small arrays declared in the private address space directly in registers. This is not guaranteed, though, and several factors probably influence whether the optimization is applied, such as:

  • Array size;
  • Register pressure;
  • Cost of spilling;
  • And others.

As is usual with optimizations, the only way to be sure is to verify what the compiler is doing by checking the generated assembly.

"So if a bracket is being opened then I should temporarily store the value of v into let's say v0 (to represent the current value at the bracket depth of 0), then v can be reset to 0 and perform the operations inside the bracket, and if there's yet another bracket inside that bracket I'd put v into v1 and so on with v2 and v3 as we go deeper into nested brackets. This is so that I can for instance apply a multiplication inside a bracket that would only affect the other things created inside that bracket and not the rest."

I don't think that would help. The compiler optimizes across scopes anyway. Just do the straightforward thing and let the optimizer do its job. Then, if you notice suboptimal codegen, you may start thinking about an alternate solution, but not before.
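
To make "the straightforward thing" concrete, here is a minimal sketch of the bracket-depth idea using a small array indexed by depth instead of hand-named v0..v3. It is plain C so it can be compiled on the host. The `float4` struct is a hypothetical stand-in for OpenCL's built-in type, and `MAX_DEPTH`, `eval_brackets`, and the tiny token language (a digit adds to every lane, `*` followed by a digit multiplies, brackets save and restore) are assumptions made up for illustration, not something from the question.

```c
#include <assert.h>

/* Hypothetical stand-in for OpenCL's built-in float4 (assumption, host-side only). */
typedef struct { float x, y, z, w; } float4;

#define MAX_DEPTH 4 /* assumed maximum bracket nesting, matching v0..v3 */

static float4 f4_splat(float s) {
    float4 r = { s, s, s, s };
    return r;
}

static float4 f4_add(float4 a, float4 b) {
    float4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

static float4 f4_scale(float4 a, float s) {
    float4 r = { a.x * s, a.y * s, a.z * s, a.w * s };
    return r;
}

/* Evaluate a made-up token string. Assumes balanced brackets. */
static float4 eval_brackets(const char *ops) {
    float4 saved[MAX_DEPTH];   /* plays the role of v0, v1, v2, v3 */
    float4 v = f4_splat(0.0f); /* current value at the current depth */
    int depth = 0;

    for (const char *p = ops; *p; ++p) {
        char c = *p;
        if (c == '(') {
            /* Precondition of this sketch: nesting never exceeds MAX_DEPTH. */
            assert(depth < MAX_DEPTH);
            saved[depth++] = v;    /* park the outer value */
            v = f4_splat(0.0f);    /* start fresh inside the bracket */
        } else if (c == ')') {
            if (depth > 0) {
                v = f4_add(saved[--depth], v); /* merge inner result into outer */
            }
        } else if (c == '*' && p[1] >= '0' && p[1] <= '9') {
            ++p;                   /* consume the digit after '*' */
            v = f4_scale(v, (float)(*p - '0'));
        } else if (c >= '0' && c <= '9') {
            v = f4_add(v, f4_splat((float)(c - '0')));
        }
    }
    return v;
}
```

For example, `eval_brackets("1(2*3)4")` multiplies only the value built inside the bracket, while `eval_brackets("12*34")` multiplies everything accumulated so far. In an actual kernel the array would be a private `float4 saved[4]`, and whether it ends up in registers is exactly what the generated assembly has to confirm.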