Dispatching
By now the compute pipeline is compiled, a descriptor set is created and a buffer is filled with initial values. The remaining step is to invoke the compute shader for each particle every frame, so that each particle advances by one discrete time step, and then to render the result.
Instead of recording draw calls, you record a dispatch with a chosen number of invocations. A support function calculates the number of work groups required to launch at least one thread per particle for a given work group size. As with draw calls, the descriptor sets in use have to be provided together with the indices they are bound to.
auto cmdStream = core.createCommandStream(vkcv::QueueType::Graphics);
/* Requesting a graphics queue for the command stream here is fine because
 * most devices expose at least one queue that supports both graphics and
 * compute tasks.
 *
 * Using such a queue avoids any synchronization between multiple queues,
 * because compute dispatches and draw calls are recorded in one command buffer.
 */
vkcv::PushConstants pushConstants = vkcv::pushConstants<float>();
pushConstants.appendDrawcall(dt); // append delta time for the whole dispatch
core.recordComputeDispatchToCmdStream(
    cmdStream,       // command stream
    computePipeline, // compute pipeline
    vkcv::dispatchInvocations(
        particles.size(), // amount of global invocations targeted
        256               // size of work group (local size in shader)
    ),
    {
        // use the written descriptor for all the invocations
        vkcv::useDescriptorSet(0 /* binding = 0 */, descriptorSet)
    },
    pushConstants    // push constants
);
/* Record a memory barrier to ensure no thread is writing
* to the storage buffer while rendering.
*/
core.recordBufferMemoryBarrier(cmdStream, particleBuffer.getHandle());
// actual rendering of the particles...
core.prepareSwapchainImageForPresent(cmdStream);
core.submitCommandStream(cmdStream);
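The work group count needed for the dispatch above can be thought of as a rounded-up division of the particle count by the work group size. The following is a minimal sketch of that arithmetic only. The helpers `workGroupCount` and `launchedInvocations` are hypothetical names for illustration and not part of VkCV, and the actual implementation of `vkcv::dispatchInvocations` may differ.

```cpp
#include <cstdint>

// Hypothetical helper (not a VkCV function): number of work groups needed
// so that at least `invocations` threads are launched when every work group
// contains `groupSize` threads. `groupSize` must not be zero.
constexpr std::uint32_t workGroupCount(std::uint32_t invocations,
                                       std::uint32_t groupSize) {
    // Round up without risking overflow in the addition.
    return invocations / groupSize + (invocations % groupSize != 0 ? 1u : 0u);
}

// Hypothetical helper: total number of threads that actually get launched.
constexpr std::uint32_t launchedInvocations(std::uint32_t invocations,
                                            std::uint32_t groupSize) {
    return workGroupCount(invocations, groupSize) * groupSize;
}

// 1000 particles with a local size of 256 need 4 work groups (1024 threads).
static_assert(workGroupCount(1000, 256) == 4, "rounds up to a full group");
static_assert(launchedInvocations(1000, 256) == 1024, "24 threads are spare");
// An exact multiple needs no extra group.
static_assert(workGroupCount(512, 256) == 2, "no rounding needed");
```

Because the count is rounded up, more threads than particles may be launched whenever the particle count is not a multiple of the work group size. A compute shader would typically compare its global invocation index against the particle count and return early for the spare threads.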
A memory barrier should be placed between the compute pass and the rendering if your graphics shaders read from the buffers that the compute shader writes to. Otherwise you get unreliable results, because the reads may overlap with writes that have not finished yet.
To end this guide, here is an idea for how to render the simulated particles. You can reuse parts of the setup for rendering a single triangle. Instead of rendering one instance of the triangle, you could render as many instances as there are simulated particles. In the vertex shader you could then read the positions from the storage buffer and translate each triangle instance to a particle's position. Happy coding!