Compute pipeline

Compute shader

In comparison to a graphics pipeline, a compute pipeline only needs a single shader. This does mean, however, that you have to specify a few more things inside that shader so certain aspects can be optimized. For example, compute shaders are dispatched in so-called work groups, which distribute their invocations onto the threads of your GPU. To optimize throughput and synchronization, the pipeline still expects you to define the size of your work groups inside the compute shader. The application then only dispatches a number of work groups afterwards, but more about that later in the part about dispatching.

shaders/shader.comp

#version 450 core

// work group size (x = 256, y = 1, z = 1)
layout(local_size_x = 256) in;

// particle structure
struct Particle {
  vec3 position;
  float mass;
  vec3 velocity;
  float lifetime;
};

// particles via shared storage buffer
layout(std430, set = 0, binding = 0) buffer particleBuffer {
  Particle particles[];
};

// relative time difference via push constants
layout( push_constant ) uniform constants{
  float deltaTime;
};

// defining points of mass at different positions in space
const int n = 4;
vec4 gravityPoints[n] = vec4[n](
  vec4(-0.8, -0.5,  0.0, 3),
  vec4(-0.4,  0.5,  0.8, 2),
  vec4( 0.8,  0.8, -0.3, 4),
  vec4( 0.5, -0.7, -0.5, 1)
);

// gravitational constant with adjusted magnitude
const float G = 6.6743015e-3;

void main() {
  // each global invocation accesses a different index
  uint index = gl_GlobalInvocationID.x;
  
  // check bounds of particle buffer for safe access
  if (index >= particles.length()) {
    return;
  }
  
  // read required data from particle buffer
  vec3 position = particles[index].position;
  vec3 velocity = particles[index].velocity;
  
  for (uint i = 0; i < n; i++) {
    vec3 d = (gravityPoints[i].xyz - position);
    float r = length(d);
    
    if (r > 0) {
      float g = G * gravityPoints[i].w / (r * r);
      vec3 deltaVelocity = deltaTime * g * (d / r);
      
      // update the velocity of each particle
      velocity += deltaVelocity;
    }
  }
  
  // update the position of each particle
  position += deltaTime * velocity;
  
  // write changes into particle buffer
  particles[index].position = position;
  particles[index].velocity = velocity;
}

Similar to the application code, you need to define the structure for the particles. Then you can declare a shared storage buffer by providing its layout rules (in this case std430), the index of its descriptor set and its binding for this particular pipeline. The buffer can then be used as an array of your own structure, as shown in the code. You don't have to specify the actual number of entries for such an array as long as it is the last member of the buffer block. For convenient usage it's recommended to access storage buffers this way when working with the VkCV framework.
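
On the application side, you need a matching structure and a storage buffer filled with your particles. The following sketch shows what this could look like; the GLM vector types, the name particleCount and the exact buffer-creation call are assumptions here and may differ depending on your setup and framework version:

// matches the std430 layout of the Particle struct in the shader
// (assuming GLM is used for the vector types)
struct Particle {
  glm::vec3 position;
  float mass;
  glm::vec3 velocity;
  float lifetime;
};

// hypothetical creation of the storage buffer bound as "particleBuffer" later on
auto particleBuffer = core.createBuffer<Particle>(
  vkcv::BufferType::STORAGE, // usage as storage buffer
  particleCount              // number of particles to simulate
);

Interleaving the scalar members mass and lifetime directly behind the vec3 members keeps the C++ layout identical to the std430 layout of the shader, so the particle data can be uploaded without any extra padding.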

The main function starts by using the global invocation ID as an index, so each compute shader invocation accesses a different particle. The global invocation ID is unique per invocation because it is calculated from the work group ID, the work group size and the local invocation ID, which differs per thread within a group.
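
Expressed as a formula, the global invocation ID is derived like this (as defined by the GLSL specification):

gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID;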

The result is that each particle gets processed by exactly one shader invocation. The advantage is that you don't need to synchronize parallel memory accesses, and since no invocation reads or writes the data of another one, you don't even need a memory barrier within the shader. This also means the example shown here is not a particularly complex compute shader and won't explain how to solve more difficult problems with compute pipelines on the GPU (other papers or guides are better suited for that topic). But because of these coherent memory accesses, the performance we can expect from this shader is pretty good. The work group size is set to 256, which should be fine for most GPU hardware; for details on optimal work group sizes depending on the architecture used, you should research elsewhere.

The algorithm in the compute shader calculates the gravity between each particle and 4 constant points in space, each with an individual mass. Notice that this calculation only gives discrete results, using the Euler method for integration as well as a gravitational constant with adjusted magnitude to prevent "explosions" of the simulation.
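
To spell out the math behind the loop above: for every gravity point the acceleration follows Newton's law of gravitation,

a = G * m / r²

directed from the particle towards the gravity point, with m being the mass stored in the w component and r the distance. This acceleration is then integrated once per frame, first into the velocity via velocity += deltaTime * a and afterwards into the position via position += deltaTime * velocity.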


Shader compilation

Because the compute pipeline contains just one stage, written as a single compute shader, you can compile the whole program by compiling only this one shader stage. That is why the example code uses a different method than other guides and compiles the single shader stage individually. You could still use the compileProgram() method as well though.
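
The compiler object used below is assumed to be a GLSL compiler instance as set up in an earlier part of this guide, for example:

vkcv::shader::GLSLCompiler compiler;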

vkcv::ShaderProgram computeShaderProgram;

// compile compute shader stage from GLSL to SPIR-V
compiler.compile(
  vkcv::ShaderStage::COMPUTE, // shader stage
  "shaders/shader.comp",      // path of the shader
  
  // callback to receive the compiled SPIR-V output as a file
  [&computeShaderProgram](vkcv::ShaderStage shaderStage, 
                          const std::filesystem::path& path) {
    // add compiled SPIR-V shader to the shader program
    computeShaderProgram.addShader(shaderStage, path);
  }
);

Descriptor set

Similar to the materials from another guide, you need a descriptor set to pass the storage buffer when recording the pipeline. For this you first need its descriptor set layout. Fortunately, you can create the needed descriptor set layout by passing the reflected descriptor bindings from the compiled shader program to a method provided by the framework. Creating the descriptor set itself isn't much more difficult either.

// get descriptor bindings from the shader via reflection
const vkcv::DescriptorBindings& descriptorBindings = (
  computeShaderProgram.getReflectedDescriptors().at(0 /* set = 0 */)
);

// create descriptor set layout and descriptor set
auto descriptorSetLayout = core.createDescriptorSetLayout(descriptorBindings);
auto descriptorSet = core.createDescriptorSet(descriptorSetLayout);

Once the descriptor set is created with the proper layout, it still needs to be connected to your storage buffer. This is done by writing to the descriptor set using a dedicated structure from the framework, which can contain multiple descriptor write commands at once. In this example you only need to write the handle of the storage buffer containing the particles to the binding with index zero. But of course it's possible to append further writes later on.

vkcv::DescriptorWrites descriptorWrites;

descriptorWrites.writeStorageBuffer(
  0 /* binding = 0 */,
  particleBuffer.getHandle()
);

core.writeDescriptorSet(descriptorSet, descriptorWrites);

Pipeline creation

In comparison to a graphics pipeline, the creation of a compute pipeline is far less complicated. It is only necessary to provide the compiled shader program and the list of descriptor set layouts used in the shader. The last remaining step is then recording the actual dispatch, which is covered in the next part.

vkcv::ComputePipelineHandle computePipeline = core.createComputePipeline(
  vkcv::ComputePipelineConfig(
    computeShaderProgram,          // shader program
    { descriptorSetLayout }        // list of descriptor set layouts
  )
);
