Dispatching

October 09, 2022

The compute pipeline is compiled, a descriptor set is created and a buffer is filled with initial values. The remaining step is to invoke the compute shader for each particle every frame, so that the simulation advances in discrete steps, and then to render the result.

Instead of recording draw calls, you record a dispatch with a chosen number of invocations. A support function calculates the number of work groups required to launch a thread for each particle, depending on the work group size. As with draw calls, the descriptor sets in use have to be provided together with the indices they are bound at.
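To make the work group calculation concrete, here is a minimal sketch of the arithmetic such a support function presumably performs: a division that rounds up. The name `workGroupCount` is hypothetical and only serves this example; it is not VkCV's implementation.

```cpp
#include <cstdint>

// Hypothetical helper (not part of VkCV): the smallest number of work groups
// such that groupCount * groupSize >= invocations.
// groupSize must be greater than zero.
constexpr std::uint32_t workGroupCount(std::uint32_t invocations,
                                       std::uint32_t groupSize) {
    // Integer division rounds down, so add one group for any remainder.
    return invocations / groupSize + (invocations % groupSize != 0 ? 1u : 0u);
}
```

With 1000 particles and a work group size of 256 this gives 4 work groups, so 1024 threads are launched and the last 24 of them have no particle to work on.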

auto cmdStream = core.createCommandStream(vkcv::QueueType::Graphics);
/* Requesting a graphics queue for the command stream here is fine because
 * most devices expose at least one queue that supports both graphics and
 * compute tasks.
 *
 * Using such a queue avoids any synchronization between multiple queues,
 * because compute dispatches and draw calls are recorded in just one
 * command buffer.
 */

vkcv::PushConstants pushConstants = vkcv::pushConstants<float>();
pushConstants.appendDrawcall(dt); // append delta time for the whole dispatch

core.recordComputeDispatchToCmdStream(
  cmdStream,       // command stream
  computePipeline, // compute pipeline
  
  vkcv::dispatchInvocations(
    particles.size(), // amount of global invocations targeted
    256               // size of work group (local size in shader)
  ),
  
  {
    // use the written descriptor for all the invocations
    vkcv::useDescriptorSet(0 /* set = 0 */, descriptorSet)
  },
  
  pushConstants       // push constants
);

/* Record a memory barrier to ensure no thread is writing 
 * to the storage buffer while rendering.
 */
core.recordBufferMemoryBarrier(cmdStream, particleBuffer.getHandle());

// actual rendering of the particles...

core.prepareSwapchainImageForPresent(cmdStream);
core.submitCommandStream(cmdStream);
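In the listing above, `particles.size()` will rarely be an exact multiple of 256, so the last work group usually contains invocations without a particle. The following sketch emulates a one-dimensional dispatch on the CPU to show why the shader body needs a bounds check. The `Particle` layout and the names `simulateInvocation` and `emulateDispatch` are assumptions made for this example; they are not taken from the tutorial's shader or from VkCV.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical particle layout for this sketch; the real struct may differ.
struct Particle {
    float position[3];
    float velocity[3];
};

// Stand-in for the compute shader body, called once per invocation.
// globalId plays the role of gl_GlobalInvocationID.x, dt is the push constant.
void simulateInvocation(std::vector<Particle>& particles,
                        std::uint32_t globalId, float dt) {
    if (globalId >= particles.size()) {
        return; // surplus invocation in the last work group: no particle to update
    }
    Particle& p = particles[globalId];
    for (int i = 0; i < 3; ++i) {
        p.position[i] += p.velocity[i] * dt; // one discrete integration step
    }
}

// CPU emulation of a 1D dispatch. A GPU runs these invocations in parallel;
// the loops only reproduce which global ids exist.
// Returns the total number of invocations that were launched.
std::uint32_t emulateDispatch(std::vector<Particle>& particles,
                              std::uint32_t groupCount,
                              std::uint32_t groupSize, float dt) {
    std::uint32_t launched = 0;
    for (std::uint32_t group = 0; group < groupCount; ++group) {
        for (std::uint32_t local = 0; local < groupSize; ++local) {
            simulateInvocation(particles, group * groupSize + local, dt);
            ++launched;
        }
    }
    return launched;
}
```

Without the early return, the surplus invocations would read and write past the end of the buffer.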

Between the compute pass and the rendering there should be a memory barrier if your graphics shaders read from the buffers your compute shader writes to. Otherwise the results are unreliable, because rendering may read a buffer while the compute shader is still writing to it.

To end this guide, here is an idea for how to render the simulated particles. You can reuse parts of the setup for rendering a single triangle. Instead of rendering one instance of the triangle, you could render as many instances as there are particles. In the vertex shader you could then read the positions from the storage buffer and translate each triangle instance to its particle's position. Happy coding!
