Question

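I want the number of particles to grow over time. I was advised to allocate a buffer larger than I currently need so that the particle count can increase later. My plan is to size the buffer for the maximum count and then, in the shader, use a struct containing an array to read the particle attributes.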

I have this in Swift:

var vectMaxCount = 10
var metalvects = [float3(0.0,0.0,0.0),float3(1.0,0.0,0.0),float3(2.0,0.0,0.0)]
var vectBuffer: MTLBuffer!

Then I create the buffer:

vectBuffer  = device!.makeBuffer(length: MemoryLayout<float3>.size * vectMaxCount, options: [])

and update the buffer accordingly:

...
command_encoder.setBuffer(vectBuffer, offset: 0, at: 2)
var bufferPointer = vectBuffer.contents()
memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.size * vectMaxCount)

let threadGroupCount = MTLSizeMake(8, 8, 1)
let threadGroups = MTLSizeMake(drawable.texture.width / threadGroupCount.width, drawable.texture.height / threadGroupCount.height, 1)
command_encoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
command_encoder.endEncoding()
command_buffer.present(drawable)
command_buffer.commit()

and try to read it in the Metal file:

struct Vects
{
    float3 position[100];
};

kernel void compute(texture2d<float, access::write> output [[texture(0)]],
                    constant Vects &vects [[buffer(2)]],
                    uint2 gid [[thread_position_in_grid]]) {
...
}

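I get an error:

validateComputeFunctionArguments:727: failed assertion `(length - offset)() must be >= 1600 at buffer binding at index 2 for vects[0].'

The error is reported at the line command_encoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount). I read a bit about buffer binding, and I suspect the problem is in how I set up threadGroupCount or threadGroups.

If I change float3 position[100]; to float3 position[7]; it still works. Anything above 7 produces a similar error.

How do I fix this?

Is there a good formula for estimating threadGroups and threadGroupCount? Or even a rule of thumb?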

Update01

Following Ken Thomases's answer, I changed my code to:

Swift:

vectBuffer  = device!.makeBuffer(length: MemoryLayout<float3>.stride * metalvects.count, options: [])
...
memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.stride * metalvects.count)
...

Metal:

struct Vects
{
    float3 position[3];
};
...

Now it works. But how do I allocate a larger buffer that I can use later in the app, as this post mentions?

Answer 1

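There are several issues here.

First, you are defining Vects with a specific size. This allows Metal to check whether the buffer at index 2 is large enough to match the size of the vects variable. It complains because the buffer is not large enough. (If vects were declared as constant float3 *vects [[buffer(2)]], for example, Metal could not perform this check.)

Second, the size of the buffer, MemoryLayout<float3>.size * vectMaxCount, is incorrect. It does not account for the alignment of float3 and therefore for the padding between elements in a [float3] array. As the documentation for MemoryLayout says, you should always use stride, not size, when computing allocation sizes.

That is why it fails when Vects::position has 8 or more elements. You would expect it to start failing at 11 elements, since vectMaxCount is 10, but your buffer is shorter than an array of vectMaxCount float3s. Specifically, your buffer is 10 * 12 == 120 bytes long. The stride of float3 is 16, and 120 / 16 == 7.5.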
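The arithmetic above can be checked with a short Swift sketch. The values 12 and 16 are the size and stride quoted in this answer for float3; they are hard-coded here as assumptions rather than queried from MemoryLayout, and the constant names are only illustrative.

```swift
// Values quoted in the answer for float3 (hard-coded assumptions, not queried).
let float3Size = 12      // bytes occupied by one value
let float3Stride = 16    // bytes from one array element to the next
let vectMaxCount = 10

// Buffer length as originally allocated with `size`.
let allocatedBytes = float3Size * vectMaxCount        // 120

// Whole float3 elements that fit when elements are `stride` bytes apart.
let elementsThatFit = allocatedBytes / float3Stride   // 7 (7.5 rounded down)

// Buffer length that really holds vectMaxCount elements.
let requiredBytes = float3Stride * vectMaxCount       // 160

print(allocatedBytes, elementsThatFit, requiredBytes) // prints "120 7 160"
```

The same reasoning accounts for the 1600 in the assertion message: 100 elements at a stride of 16 bytes each.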

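If you switch from size to stride when allocating the buffer and change the element count of Vects::position to 10 to match vectMaxCount, you will get past this immediate problem. However, other problems are lurking.

Your compute function, as written, does not know how many elements of vects.position are actually populated. You need to pass in the actual number of elements.

This line: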

memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.size * vectMaxCount)

is incorrect (even after replacing size with stride). It reads past the end of metalvects, because metalvects has fewer elements than vectMaxCount. You should use metalvects.count instead of vectMaxCount.
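As a rough sketch of how the two counts can be kept apart, the following allocates storage for vectMaxCount elements but copies only metalvects.count of them. To stay runnable without a GPU it makes several assumptions: the raw allocation stands in for vectBuffer.contents(), Vec3 is an alias for the standard library's SIMD3<Float> rather than the simd module's float3, and upload is a hypothetical helper, not part of Metal.

```swift
// Assumption: SIMD3<Float> stands in for float3 so the sketch needs no imports.
typealias Vec3 = SIMD3<Float>

/// Hypothetical helper: copies only the live elements into storage that was
/// sized for `capacity` elements, and returns the number of bytes copied.
func upload(_ vects: [Vec3], into storage: UnsafeMutableRawPointer, capacity: Int) -> Int {
    precondition(vects.count <= capacity, "more particles than the buffer can hold")
    let byteCount = MemoryLayout<Vec3>.stride * vects.count
    vects.withUnsafeBytes { (source: UnsafeRawBufferPointer) -> Void in
        if let base = source.baseAddress {
            storage.copyMemory(from: base, byteCount: byteCount)
        }
    }
    return byteCount
}

let vectMaxCount = 10   // capacity reserved up front
let metalvects = [Vec3(0, 0, 0), Vec3(1, 0, 0), Vec3(2, 0, 0)]

// Stand-in for vectBuffer.contents(): room for vectMaxCount elements.
let storage = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<Vec3>.stride * vectMaxCount,
    alignment: MemoryLayout<Vec3>.alignment)

let copied = upload(metalvects, into: storage, capacity: vectMaxCount)

// Read one element back to confirm the layout, then release the stand-in storage.
let second = storage.load(fromByteOffset: MemoryLayout<Vec3>.stride, as: Vec3.self)
storage.deallocate()

// The live count is what the compute function would also need to be told.
let liveCount = UInt32(metalvects.count)
print(copied, second, liveCount)
```

With a real MTLBuffer the allocation would use the same stride * vectMaxCount length, so the array can grow up to the capacity without reallocating, and the live count would have to reach the kernel separately.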
