Compute Shader
For certain types of calculations, compute shaders on the GPU can be thousands of times faster than on the CPU alone.
In this tutorial, we will simulate a star field using an ‘N-Body simulation’. Each star is affected by the gravity of every other star. For 1,000 stars, this means we have 1,000 x 1,000 = 1,000,000 million calculations to perform for each frame. The video has 65,000 stars, requiring 4.2 billion gravity force calculations per frame. On high-end hardware it can still run at 60 fps!
How does this work? There are three major parts to this program:
The Python code, which allocates buffers & glues everything together
The visualization shaders, which let us see the data in the buffers
The compute shader, which moves everything
Buffers
We need a place to store the data we’ll visualize. To do so, we’ll create two Shader Storage Buffer Objects (SSBOs) of floating point numbers from within our Python code. One will hold the previous frame’s star positions, and the other will be used to store calculate the next frame’s positions.
Each buffer must be able to store the following for each star:
The x, y, and radius of each star stored
The velocity of the star, which will be unused by the visualization
The floating point RGBA color of the star
Generating Aligned Data
To avoid issues with GPU memory alignment quirks, we’ll use the function below to generate well-aligned data ready to load into the SSBO. The docstrings & comments explain why in greater detail:
def gen_initial_data(
screen_size: Tuple[int, int],
num_stars: int = NUM_STARS,
use_color: bool = False
) -> array:
"""
Generate an :py:class:`~array.array` of randomly positioned star data.
Some of this data is wasted as padding because:
1. GPUs expect SSBO data to be aligned to multiples of 4
2. GLSL's vec3 is actually a vec4 with compiler-side restrictions,
so we have to use 4-length vectors anyway.
Args:
screen_size: A (width, height) of the area to generate stars in
num_stars: How many stars to generate
use_color: Whether to generate white or randomized pastel stars
Returns:
An array of star position data
"""
width, height = screen_size
color_channel_min = 0.5 if use_color else 1.0
def _data_generator() -> Generator[float, None, None]:
"""Inner generator function used to illustrate memory layout"""
for i in range(num_stars):
# Position/radius
yield random.randrange(0, width)
yield random.randrange(0, height)
yield 0.0 # z (padding, unused by shaders)
yield 6.0
# Velocity (unused by visualization shaders)
yield 0.0
yield 0.0
yield 0.0 # vz (padding, unused by shaders)
yield 0.0 # vw (padding, unused by shaders)
# Color
yield random.uniform(color_channel_min, 1.0) # r
yield random.uniform(color_channel_min, 1.0) # g
yield random.uniform(color_channel_min, 1.0) # b
yield 1.0 # a
Allocating the Buffers
self.center_window()
# --- Create buffers
# Create pairs of buffers for the compute & visualization shaders.
# We will swap which buffer instance is the initial value and
# which is used as the current value to write to.
# ssbo = shader storage buffer object
initial_data = gen_initial_data(self.get_size(), use_color=USE_COLORED_STARS)
self.ssbo_previous = self.ctx.buffer(data=initial_data)
self.ssbo_current = self.ctx.buffer(data=initial_data)
# vao = vertex array object
# Format string describing how to interpret the SSBO buffer data.
# 4f = position and size -> x, y, z, radius
# 4x4 = Four floats used for calculating velocity. Not needed for visualization.
# 4f = color -> rgba
buffer_format = "4f 4x4 4f"
# Attribute variable names for the vertex shader
attributes = ["in_vertex", "in_color"]
self.vao_previous = self.ctx.geometry(
[BufferDescription(self.ssbo_previous, buffer_format, attributes)],
mode=self.ctx.POINTS,
)
self.vao_current = self.ctx.geometry(
[BufferDescription(self.ssbo_current, buffer_format, attributes)],
Visualization Shaders
Now that we have the data, we need to be able to visualize it. We’ll do it by applying vertex, geometry, and fragment shaders to convert the data in the SSBO into pixels. For each star’s 12 floats in the array, the following flow of data will take place:
Vertex Shader
In this tutorial, the vertex shader will be run for each star’s 12 float
long stretch of raw padded data in self.ssbo_current
. Each execution
will output clean typed data to an instance of the geometry shader.
Data is read in as follows:
The x, y, and radius of each star are accessed via
in_vertex
The floating point RGBA color of the star, via
in_color
1#version 330
2
3in vec4 in_vertex;
4in vec4 in_color;
5
6out vec2 vertex_pos;
7out float vertex_radius;
8out vec4 vertex_color;
9
10void main()
11{
12 vertex_pos = in_vertex.xy;
13 vertex_radius = in_vertex.w;
14 vertex_color = in_color;
15}
The variables below are then passed as inputs to the geometry shader:
vertex_pos
vertex_radius
vertex_color
Geometry Shader
The geometry shader converts a single point into a quad, in this case a square, which the GPU can render. It does this by emitting four points centered on the input point.
1#version 330
2
3layout (points) in;
4layout (triangle_strip, max_vertices = 4) out;
5
6// Use Arcade's global projection UBO
7uniform Projection {
8 uniform mat4 matrix;
9} proj;
10
11
12// The outputs from the vertex shader are used as inputs
13in vec2 vertex_pos[];
14in float vertex_radius[];
15in vec4 vertex_color[];
16
17// These are used with EmitVertex to generate four points of
18// a quad centered around vertex_pos[0].
19out vec2 g_uv;
20out vec3 g_color;
21
22void main() {
23 vec2 center = vertex_pos[0];
24 vec2 hsize = vec2(vertex_radius[0]);
25
26 g_color = vertex_color[0].rgb;
27
28 gl_Position = proj.matrix * vec4(vec2(-hsize.x, hsize.y) + center, 0.0, 1.0);
29 g_uv = vec2(0, 1);
30 EmitVertex();
31
32 gl_Position = proj.matrix * vec4(vec2(-hsize.x, -hsize.y) + center, 0.0, 1.0);
33 g_uv = vec2(0, 0);
34 EmitVertex();
35
36 gl_Position = proj.matrix * vec4(vec2(hsize.x, hsize.y) + center, 0.0, 1.0);
37 g_uv = vec2(1, 1);
38 EmitVertex();
39
40 gl_Position = proj.matrix * vec4(vec2(hsize.x, -hsize.y) + center, 0.0, 1.0);
41 g_uv = vec2(1, 0);
42 EmitVertex();
43
44 // End geometry emmission
45 EndPrimitive();
46}
Fragment Shader
A fragment shader runs for each pixel in a quad. It converts a UV coordinate within the quad to a float RGBA value. In this tutorial’s case, the shader produces the soft glowing circle on the surface of each star’s quad.
1#version 330
2
3in vec2 g_uv;
4in vec3 g_color;
5
6out vec4 out_color;
7
8void main()
9{
10 float l = length(vec2(0.5, 0.5) - g_uv.xy);
11 if ( l > 0.5)
12 {
13 discard;
14 }
15 float alpha;
16 if (l == 0.0)
17 alpha = 1.0;
18 else
19 alpha = min(1.0, .60-l * 2);
20
21 vec3 c = g_color.rgb;
22 // c.xy += v_uv.xy * 0.05;
23 // c.xy += v_pos.xy * 0.75;
24 out_color = vec4(c, alpha);
25}
Compute Shader
Now that we have a way to display data, we should update it.
We created pairs of buffers earlier. We will use one SSBO as an input buffer holding the previous frame’s data, and another as our output buffer to write results to.
We then swap our buffers each frame after drawing, using the output as the input of the next frame, and repeat the process until the program stops running.
1#version 430
2
3// Set up our compute groups.
4// The COMPUTE_SIZE_X and COMPUTE_SIZE_Y values will be replaced
5// by the Python code with actual values. This does not happen
6// automatically, and must be called manually.
7layout(local_size_x=COMPUTE_SIZE_X, local_size_y=COMPUTE_SIZE_Y) in;
8
9// Input uniforms would go here if you need them.
10// The examples below match the ones commented out in main.py
11//uniform vec2 screen_size;
12//uniform float frame_time;
13
14// Structure of the star data
15struct Star
16{
17 vec4 pos;
18 vec4 vel;
19 vec4 color;
20};
21
22// Input buffer
23layout(std430, binding=0) buffer stars_in
24{
25 Star stars[];
26} In;
27
28// Output buffer
29layout(std430, binding=1) buffer stars_out
30{
31 Star stars[];
32} Out;
33
34void main()
35{
36 int curStarIndex = int(gl_GlobalInvocationID);
37
38 Star in_star = In.stars[curStarIndex];
39
40 vec4 p = in_star.pos.xyzw;
41 vec4 v = in_star.vel.xyzw;
42
43 // Move the star according to the current force
44 p.xy += v.xy;
45
46 // Calculate the new force based on all the other bodies
47 for (int i=0; i < In.stars.length(); i++) {
48 // If enabled, this will keep the star from calculating gravity on itself
49 // However, it does slow down the calcluations do do this check.
50 // if (i == x)
51 // continue;
52
53 // Calculate distance squared
54 float dist = distance(In.stars[i].pos.xyzw.xy, p.xy);
55 float distanceSquared = dist * dist;
56
57 // If distance is too small, extremely high forces can result and
58 // fling the star into escape velocity and forever off the screen.
59 // Using a reasonable minimum distance to prevents this.
60 float minDistance = 0.02;
61 float gravityStrength = 0.3;
62 float simulationSpeed = 0.002;
63 float force = min(minDistance, gravityStrength / distanceSquared) * -simulationSpeed;
64
65 vec2 diff = p.xy - In.stars[i].pos.xyzw.xy;
66 // We should normalize this I think, but it doesn't work.
67 // diff = normalize(diff);
68 vec2 delta_v = diff * force;
69 v.xy += delta_v;
70 }
71
72
73 Star out_star;
74 out_star.pos.xyzw = p.xyzw;
75 out_star.vel.xyzw = v.xyzw;
76
77 vec4 c = in_star.color.xyzw;
78 out_star.color.xyzw = c.xyzw;
79
80 Out.stars[curStarIndex] = out_star;
81}
The Finished Python Program
The code includes thorough docstrings and annotations explaining how it works.
1"""
2N-Body Gravity with Compute Shaders & Buffers
3"""
4import random
5from array import array
6from pathlib import Path
7from typing import Generator, Tuple
8
9import arcade
10from arcade.gl import BufferDescription
11
12# Window dimensions in pixels
13WINDOW_WIDTH = 800
14WINDOW_HEIGHT = 600
15
16# Size of performance graphs in pixels
17GRAPH_WIDTH = 200
18GRAPH_HEIGHT = 120
19GRAPH_MARGIN = 5
20
21NUM_STARS: int = 4000
22USE_COLORED_STARS: bool = True
23
24
25def gen_initial_data(
26 screen_size: Tuple[int, int],
27 num_stars: int = NUM_STARS,
28 use_color: bool = False
29) -> array:
30 """
31 Generate an :py:class:`~array.array` of randomly positioned star data.
32
33 Some of this data is wasted as padding because:
34
35 1. GPUs expect SSBO data to be aligned to multiples of 4
36 2. GLSL's vec3 is actually a vec4 with compiler-side restrictions,
37 so we have to use 4-length vectors anyway.
38
39 Args:
40 screen_size: A (width, height) of the area to generate stars in
41 num_stars: How many stars to generate
42 use_color: Whether to generate white or randomized pastel stars
43 Returns:
44 An array of star position data
45 """
46 width, height = screen_size
47 color_channel_min = 0.5 if use_color else 1.0
48
49 def _data_generator() -> Generator[float, None, None]:
50 """Inner generator function used to illustrate memory layout"""
51
52 for i in range(num_stars):
53 # Position/radius
54 yield random.randrange(0, width)
55 yield random.randrange(0, height)
56 yield 0.0 # z (padding, unused by shaders)
57 yield 6.0
58
59 # Velocity (unused by visualization shaders)
60 yield 0.0
61 yield 0.0
62 yield 0.0 # vz (padding, unused by shaders)
63 yield 0.0 # vw (padding, unused by shaders)
64
65 # Color
66 yield random.uniform(color_channel_min, 1.0) # r
67 yield random.uniform(color_channel_min, 1.0) # g
68 yield random.uniform(color_channel_min, 1.0) # b
69 yield 1.0 # a
70
71 # Use the generator function to fill an array in RAM
72 return array('f', _data_generator())
73
74
75class NBodyGravityWindow(arcade.Window):
76
77 def __init__(self):
78 # Ask for OpenGL context supporting version 4.3 or greater when
79 # calling the parent initializer to make sure we have compute shader
80 # support.
81 super().__init__(
82 WINDOW_WIDTH, WINDOW_HEIGHT,
83 "N-Body Gravity with Compute Shaders & Buffers",
84 gl_version=(4, 3),
85 resizable=False
86 )
87 # Attempt to put the window in the center of the screen.
88 self.center_window()
89
90 # --- Create buffers
91
92 # Create pairs of buffers for the compute & visualization shaders.
93 # We will swap which buffer instance is the initial value and
94 # which is used as the current value to write to.
95
96 # ssbo = shader storage buffer object
97 initial_data = gen_initial_data(self.get_size(), use_color=USE_COLORED_STARS)
98 self.ssbo_previous = self.ctx.buffer(data=initial_data)
99 self.ssbo_current = self.ctx.buffer(data=initial_data)
100
101 # vao = vertex array object
102 # Format string describing how to interpret the SSBO buffer data.
103 # 4f = position and size -> x, y, z, radius
104 # 4x4 = Four floats used for calculating velocity. Not needed for visualization.
105 # 4f = color -> rgba
106 buffer_format = "4f 4x4 4f"
107
108 # Attribute variable names for the vertex shader
109 attributes = ["in_vertex", "in_color"]
110
111 self.vao_previous = self.ctx.geometry(
112 [BufferDescription(self.ssbo_previous, buffer_format, attributes)],
113 mode=self.ctx.POINTS,
114 )
115 self.vao_current = self.ctx.geometry(
116 [BufferDescription(self.ssbo_current, buffer_format, attributes)],
117 mode=self.ctx.POINTS,
118 )
119
120 # --- Create the visualization shaders
121
122 vertex_shader_source = Path("shaders/vertex_shader.glsl").read_text()
123 fragment_shader_source = Path("shaders/fragment_shader.glsl").read_text()
124 geometry_shader_source = Path("shaders/geometry_shader.glsl").read_text()
125
126 # Create the complete shader program which will draw the stars
127 self.program = self.ctx.program(
128 vertex_shader=vertex_shader_source,
129 geometry_shader=geometry_shader_source,
130 fragment_shader=fragment_shader_source,
131 )
132
133 # --- Create our compute shader
134
135 # Load in the raw source code safely & auto-close the file
136 compute_shader_source = Path("shaders/compute_shader.glsl").read_text()
137
138 # Compute shaders use groups to parallelize execution.
139 # You don't need to understand how this works yet, but the
140 # values below should serve as reasonable defaults. Later, we'll
141 # preprocess the shader source by replacing the templating token
142 # with its corresponding value.
143 self.group_x = 256
144 self.group_y = 1
145
146 self.compute_shader_defines = {
147 "COMPUTE_SIZE_X": self.group_x,
148 "COMPUTE_SIZE_Y": self.group_y
149 }
150
151 # Preprocess the source by replacing each define with its value as a string
152 for templating_token, value in self.compute_shader_defines.items():
153 compute_shader_source = compute_shader_source.replace(templating_token, str(value))
154
155 self.compute_shader = self.ctx.compute_shader(source=compute_shader_source)
156
157 # --- Create the FPS graph
158
159 # Enable timings for the performance graph
160 arcade.enable_timings()
161
162 # Create a sprite list to put the performance graph into
163 self.perf_graph_list = arcade.SpriteList()
164
165 # Create the FPS performance graph
166 graph = arcade.PerfGraph(GRAPH_WIDTH, GRAPH_HEIGHT, graph_data="FPS")
167 graph.position = GRAPH_WIDTH / 2, self.height - GRAPH_HEIGHT / 2
168 self.perf_graph_list.append(graph)
169
170 def on_draw(self):
171 # Clear the screen
172 self.clear()
173 # Enable blending so our alpha channel works
174 self.ctx.enable(self.ctx.BLEND)
175
176 # Bind buffers
177 self.ssbo_previous.bind_to_storage_buffer(binding=0)
178 self.ssbo_current.bind_to_storage_buffer(binding=1)
179
180 # If you wanted, you could set input variables for compute shader
181 # as in the lines commented out below. You would have to add or
182 # uncomment corresponding lines in compute_shader.glsl
183 # self.compute_shader["screen_size"] = self.get_size()
184 # self.compute_shader["frame_time"] = self.frame_time
185
186 # Run compute shader to calculate new positions for this frame
187 self.compute_shader.run(group_x=self.group_x, group_y=self.group_y)
188
189 # Draw the current star positions
190 self.vao_current.render(self.program)
191
192 # Swap the buffer pairs.
193 # The buffers for the current state become the initial state,
194 # and the data of this frame's initial state will be overwritten.
195 self.ssbo_previous, self.ssbo_current = self.ssbo_current, self.ssbo_previous
196 self.vao_previous, self.vao_current = self.vao_current, self.vao_previous
197
198 # Draw the graphs
199 self.perf_graph_list.draw()
200
201
202
203if __name__ == "__main__":
204 app = NBodyGravityWindow()
205 arcade.run()
An expanded version of this tutorial whith support for 3D is available at: https://github.com/pvcraven/n-body