opencl - Small array stored like variables in a kernel?

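In my OpenCL kernel I need to use what would normally be a small array of 4 entries, but because I'm worried about how that array would be stored (probably in a kind of memory much slower than regular variables), I'm instead using 4 separate variables and a switch-case statement to access the correct variable based on the index.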

Is there a way to make a small array of 4 x float4 work seamlessly and as fast as 4 separate float4 variables?

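Here's what I'm trying to do: my kernel generates a single float4 variable v by going through a list of operations to apply to v. It runs sequentially, applying one operation after another from the list to v. However, the list can contain brackets/parentheses which, just like in arithmetic, isolate a group of operations so that they can be done separately before the result of that bracket is merged back with the rest.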

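So if a bracket is being opened then I should temporarily store the value of v into let's say v0 (to represent the current value at the bracket depth of 0), then v can be reset to 0 and perform the operations inside the bracket, and if there's yet another bracket inside that bracket I'd put v into v1 and so on with v2 and v3 as we go deeper into nested brackets. This is so that I can for instance apply a multiplication inside a bracket that would only affect the other things created inside that bracket and not the rest.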

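Once a bracket is closed, I'd retrieve the value stored for that depth (for instance v1) and add the bracket's result to it. In the end all the brackets would be closed and v would represent the final desired value of the series of operations, which gets written to a global buffer. This can be done using a switch-case statement to select the correct variable based on the current bracket depth, but that's quite absurd, since this is exactly what arrays are for. So I'm not sure what the best thing to do is.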

From what I've seen, compilers will usually put small arrays declared in the private address space directly in registers. Of course, this is not guaranteed, and several factors probably influence whether that optimization is applied, such as:

Array size;
Register pressure;
Cost of spilling;
And others.

As is usual with optimizations, the only way to be sure is to verify what the compiler is doing by checking the generated assembly.

So if a bracket is being opened then I should temporarily store the value of v into let's say v0 (to represent the current value at the bracket depth of 0), then v can be reset to 0 and perform the operations inside the bracket, and if there's yet another bracket inside that bracket I'd put v into v1 and so on with v2 and v3 as we go deeper into nested brackets. This is so that I can for instance apply a multiplication inside a bracket that would only affect the other things created inside that bracket and not the rest.

I don't think that would help. The compiler optimizes across scopes anyway. Just do the straightforward thing and let the optimizer do its job. Then, if you notice suboptimal codegen, you may start thinking about an alternate solution, but not before.
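
For what it's worth, here is a minimal sketch of what "the straightforward thing" could look like, written as plain C so it can be compiled and checked on the host. The float4 struct, the op and op_kind types, MAX_DEPTH, and all function names are made up for illustration, since the question doesn't show the actual format of the operation list. In a real kernel float4 is built in and saved[] would simply be an array in the private address space.

```c
#include <assert.h>

/* Stand-in for OpenCL's built-in float4 so this compiles as plain C. */
typedef struct { float x, y, z, w; } float4;

/* Hypothetical operation encoding; the real list format is not shown in the question. */
typedef enum { OP_ADD, OP_MUL, OP_OPEN, OP_CLOSE } op_kind;

typedef struct {
    op_kind kind;
    float4 arg;   /* operand for OP_ADD and OP_MUL, ignored for brackets */
} op;

#define MAX_DEPTH 4   /* assumed limit, matching the 4 entries mentioned in the question */

static float4 splat(float s)
{
    float4 r = { s, s, s, s };
    return r;
}

static float4 f4_add(float4 a, float4 b)
{
    float4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

static float4 f4_mul(float4 a, float4 b)
{
    float4 r = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    return r;
}

/* Applies the operations in order, using saved[] as a tiny stack
   indexed by the current bracket depth. */
float4 eval_ops(const op *ops, int n_ops)
{
    float4 saved[MAX_DEPTH];   /* replaces the separate v0..v3 variables */
    float4 v = splat(0.0f);
    int depth = 0;

    for (int i = 0; i < n_ops; i++) {
        switch (ops[i].kind) {
        case OP_ADD:
            v = f4_add(v, ops[i].arg);
            break;
        case OP_MUL:
            v = f4_mul(v, ops[i].arg);
            break;
        case OP_OPEN:
            /* Host-side check only: OpenCL C has no assert, so a kernel
               would have to validate or clamp the depth itself. */
            assert(depth < MAX_DEPTH);
            saved[depth++] = v;    /* remember the value outside the bracket */
            v = splat(0.0f);       /* start the bracket from 0 */
            break;
        case OP_CLOSE:
            assert(depth > 0);
            v = f4_add(saved[--depth], v);   /* merge the bracket result back */
            break;
        }
    }
    return v;
}

/* Example list: add 1, open a bracket, add 2, multiply by 3, close the bracket. */
float4 example_result(void)
{
    const op ops[] = {
        { OP_ADD,   { 1.0f, 1.0f, 1.0f, 1.0f } },
        { OP_OPEN,  { 0.0f, 0.0f, 0.0f, 0.0f } },
        { OP_ADD,   { 2.0f, 2.0f, 2.0f, 2.0f } },
        { OP_MUL,   { 3.0f, 3.0f, 3.0f, 3.0f } },
        { OP_CLOSE, { 0.0f, 0.0f, 0.0f, 0.0f } },
    };
    return eval_ops(ops, (int)(sizeof ops / sizeof ops[0]));
}
```

With this list, example_result() gives 7 in every lane: v is 1 when the bracket opens, the bracket evaluates to (0 + 2) * 3 = 6, and closing the bracket adds that back to the saved 1. Whether saved[] really ends up in registers when it is indexed by a runtime depth is still up to the compiler and the hardware, so the advice about checking the generated assembly applies here too.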
