Compute Shader#
For certain types of calculations, compute shaders on the GPU can be thousands of times faster than on the CPU alone.
In this tutorial, we will simulate a star field using an ‘N-Body simulation’. Each star is affected by the gravity of every other star. For 1,000 stars, this means we have 1,000 x 1,000 = 1,000,000 million calculations to perform for each frame. The video has 65,000 stars, requiring 4.2 billion gravity force calculations per frame. On high-end hardware it can still run at 60 fps!
How does this work? There are three major parts to this program:
The Python code, which allocates buffers & glues everything together
The visualization shaders, which let us see the data in the buffers
The compute shader, which moves everything
Buffers#
We need a place to store the data we’ll visualize. To do so, we’ll create two Shader Storage Buffer Objects (SSBOs) of floating point numbers from within our Python code. One will hold the previous frame’s star positions, and the other will be used to store calculate the next frame’s positions.
Each buffer must be able to store the following for each star:
The x, y, and radius of each star stored
The velocity of the star, which will be unused by the visualization
The floating point RGBA color of the star
Generating Aligned Data#
To avoid issues with GPU memory alignment quirks, we’ll use the function below to generate well-aligned data ready to load into the SSBO. The docstrings & comments explain why in greater detail:
def gen_initial_data(
screen_size: Tuple[int, int],
num_stars: int = NUM_STARS,
use_color: bool = False
) -> array:
"""
Generate an :py:class:`~array.array` of randomly positioned star data.
Some of this data is wasted as padding because:
1. GPUs expect SSBO data to be aligned to multiples of 4
2. GLSL's vec3 is actually a vec4 with compiler-side restrictions,
so we have to use 4-length vectors anyway.
:param screen_size: A (width, height) of the area to generate stars in
:param num_stars: How many stars to generate
:param use_color: Whether to generate white or randomized pastel stars
:return: an array of star position data
"""
width, height = screen_size
color_channel_min = 0.5 if use_color else 1.0
def _data_generator() -> Generator[float, None, None]:
"""Inner generator function used to illustrate memory layout"""
for i in range(num_stars):
# Position/radius
yield random.randrange(0, width)
yield random.randrange(0, height)
yield 0.0 # z (padding, unused by shaders)
yield 6.0
# Velocity (unused by visualization shaders)
yield 0.0
yield 0.0
yield 0.0 # vz (padding, unused by shaders)
yield 0.0 # vw (padding, unused by shaders)
# Color
yield random.uniform(color_channel_min, 1.0) # r
yield random.uniform(color_channel_min, 1.0) # g
yield random.uniform(color_channel_min, 1.0) # b
yield 1.0 # a
# Use the generator function to fill an array in RAM
return array('f', _data_generator())
Allocating the Buffers#
# --- Create buffers
# Create pairs of buffers for the compute & visualization shaders.
# We will swap which buffer instance is the initial value and
# which is used as the current value to write to.
# ssbo = shader storage buffer object
initial_data = gen_initial_data(self.get_size(), use_color=USE_COLORED_STARS)
self.ssbo_previous = self.ctx.buffer(data=initial_data)
self.ssbo_current = self.ctx.buffer(data=initial_data)
# vao = vertex array object
# Format string describing how to interpret the SSBO buffer data.
# 4f = position and size -> x, y, z, radius
# 4x4 = Four floats used for calculating velocity. Not needed for visualization.
# 4f = color -> rgba
buffer_format = "4f 4x4 4f"
# Attribute variable names for the vertex shader
attributes = ["in_vertex", "in_color"]
self.vao_previous = self.ctx.geometry(
[BufferDescription(self.ssbo_previous, buffer_format, attributes)],
mode=self.ctx.POINTS,
)
self.vao_current = self.ctx.geometry(
[BufferDescription(self.ssbo_current, buffer_format, attributes)],
mode=self.ctx.POINTS,
)
Visualization Shaders#
Now that we have the data, we need to be able to visualize it. We’ll do it by applying vertex, geometry, and fragment shaders to convert the data in the SSBO into pixels. For each star’s 12 floats in the array, the following flow of data will take place:
Vertex Shader#
In this tutorial, the vertex shader will be run for each star’s 12 float
long stretch of raw padded data in self.ssbo_current
. Each execution
will output clean typed data to an instance of the geometry shader.
Data is read in as follows:
The x, y, and radius of each star are accessed via
in_vertex
The floating point RGBA color of the star, via
in_color
1#version 330
2
3in vec4 in_vertex;
4in vec4 in_color;
5
6out vec2 vertex_pos;
7out float vertex_radius;
8out vec4 vertex_color;
9
10void main()
11{
12 vertex_pos = in_vertex.xy;
13 vertex_radius = in_vertex.w;
14 vertex_color = in_color;
15}
The variables below are then passed as inputs to the geometry shader:
vertex_pos
vertex_radius
vertex_color
Geometry Shader#
The geometry shader converts a single point into a quad, in this case a square, which the GPU can render. It does this by emitting four points centered on the input point.
1#version 330
2
3layout (points) in;
4layout (triangle_strip, max_vertices = 4) out;
5
6// Use arcade's global projection UBO
7uniform Projection {
8 uniform mat4 matrix;
9} proj;
10
11
12// The outputs from the vertex shader are used as inputs
13in vec2 vertex_pos[];
14in float vertex_radius[];
15in vec4 vertex_color[];
16
17// These are used with EmitVertex to generate four points of
18// a quad centered around vertex_pos[0].
19out vec2 g_uv;
20out vec3 g_color;
21
22void main() {
23 vec2 center = vertex_pos[0];
24 vec2 hsize = vec2(vertex_radius[0]);
25
26 g_color = vertex_color[0].rgb;
27
28 gl_Position = proj.matrix * vec4(vec2(-hsize.x, hsize.y) + center, 0.0, 1.0);
29 g_uv = vec2(0, 1);
30 EmitVertex();
31
32 gl_Position = proj.matrix * vec4(vec2(-hsize.x, -hsize.y) + center, 0.0, 1.0);
33 g_uv = vec2(0, 0);
34 EmitVertex();
35
36 gl_Position = proj.matrix * vec4(vec2(hsize.x, hsize.y) + center, 0.0, 1.0);
37 g_uv = vec2(1, 1);
38 EmitVertex();
39
40 gl_Position = proj.matrix * vec4(vec2(hsize.x, -hsize.y) + center, 0.0, 1.0);
41 g_uv = vec2(1, 0);
42 EmitVertex();
43
44 // End geometry emmission
45 EndPrimitive();
46}
Fragment Shader#
A fragment shader runs for each pixel in a quad. It converts a UV coordinate within the quad to a float RGBA value. In this tutorial’s case, the shader produces the soft glowing circle on the surface of each star’s quad.
1#version 330
2
3in vec2 g_uv;
4in vec3 g_color;
5
6out vec4 out_color;
7
8void main()
9{
10 float l = length(vec2(0.5, 0.5) - g_uv.xy);
11 if ( l > 0.5)
12 {
13 discard;
14 }
15 float alpha;
16 if (l == 0.0)
17 alpha = 1.0;
18 else
19 alpha = min(1.0, .60-l * 2);
20
21 vec3 c = g_color.rgb;
22 // c.xy += v_uv.xy * 0.05;
23 // c.xy += v_pos.xy * 0.75;
24 out_color = vec4(c, alpha);
25}
Compute Shader#
Now that we have a way to display data, we should update it.
We created pairs of buffers earlier. We will use one SSBO as an input buffer holding the previous frame’s data, and another as our output buffer to write results to.
We then swap our buffers each frame after drawing, using the output as the input of the next frame, and repeat the process until the program stops running.
1#version 430
2
3// Set up our compute groups.
4// The COMPUTE_SIZE_X and COMPUTE_SIZE_Y values will be replaced
5// by the Python code with actual values. This does not happen
6// automatically, and must be called manually.
7layout(local_size_x=COMPUTE_SIZE_X, local_size_y=COMPUTE_SIZE_Y) in;
8
9// Input uniforms would go here if you need them.
10// The examples below match the ones commented out in main.py
11//uniform vec2 screen_size;
12//uniform float frame_time;
13
14// Structure of the star data
15struct Star
16{
17 vec4 pos;
18 vec4 vel;
19 vec4 color;
20};
21
22// Input buffer
23layout(std430, binding=0) buffer stars_in
24{
25 Star stars[];
26} In;
27
28// Output buffer
29layout(std430, binding=1) buffer stars_out
30{
31 Star stars[];
32} Out;
33
34void main()
35{
36 int curStarIndex = int(gl_GlobalInvocationID);
37
38 Star in_star = In.stars[curStarIndex];
39
40 vec4 p = in_star.pos.xyzw;
41 vec4 v = in_star.vel.xyzw;
42
43 // Move the star according to the current force
44 p.xy += v.xy;
45
46 // Calculate the new force based on all the other bodies
47 for (int i=0; i < In.stars.length(); i++) {
48 // If enabled, this will keep the star from calculating gravity on itself
49 // However, it does slow down the calcluations do do this check.
50 // if (i == x)
51 // continue;
52
53 // Calculate distance squared
54 float dist = distance(In.stars[i].pos.xyzw.xy, p.xy);
55 float distanceSquared = dist * dist;
56
57 // If distance is too small, extremely high forces can result and
58 // fling the star into escape velocity and forever off the screen.
59 // Using a reasonable minimum distance to prevents this.
60 float minDistance = 0.02;
61 float gravityStrength = 0.3;
62 float simulationSpeed = 0.002;
63 float force = min(minDistance, gravityStrength / distanceSquared) * -simulationSpeed;
64
65 vec2 diff = p.xy - In.stars[i].pos.xyzw.xy;
66 // We should normalize this I think, but it doesn't work.
67 // diff = normalize(diff);
68 vec2 delta_v = diff * force;
69 v.xy += delta_v;
70 }
71
72
73 Star out_star;
74 out_star.pos.xyzw = p.xyzw;
75 out_star.vel.xyzw = v.xyzw;
76
77 vec4 c = in_star.color.xyzw;
78 out_star.color.xyzw = c.xyzw;
79
80 Out.stars[curStarIndex] = out_star;
81}
The Finished Python Program#
The code includes thorough docstrings and annotations explaining how it works.
1"""
2N-Body Gravity with Compute Shaders & Buffers
3"""
4import random
5from array import array
6from pathlib import Path
7from typing import Generator, Tuple
8
9import arcade
10from arcade.gl import BufferDescription
11
12# Window dimensions in pixels
13WINDOW_WIDTH = 800
14WINDOW_HEIGHT = 600
15
16# Size of performance graphs in pixels
17GRAPH_WIDTH = 200
18GRAPH_HEIGHT = 120
19GRAPH_MARGIN = 5
20
21NUM_STARS: int = 4000
22USE_COLORED_STARS: bool = True
23
24
25def gen_initial_data(
26 screen_size: Tuple[int, int],
27 num_stars: int = NUM_STARS,
28 use_color: bool = False
29) -> array:
30 """
31 Generate an :py:class:`~array.array` of randomly positioned star data.
32
33 Some of this data is wasted as padding because:
34
35 1. GPUs expect SSBO data to be aligned to multiples of 4
36 2. GLSL's vec3 is actually a vec4 with compiler-side restrictions,
37 so we have to use 4-length vectors anyway.
38
39 :param screen_size: A (width, height) of the area to generate stars in
40 :param num_stars: How many stars to generate
41 :param use_color: Whether to generate white or randomized pastel stars
42 :return: an array of star position data
43 """
44 width, height = screen_size
45 color_channel_min = 0.5 if use_color else 1.0
46
47 def _data_generator() -> Generator[float, None, None]:
48 """Inner generator function used to illustrate memory layout"""
49
50 for i in range(num_stars):
51 # Position/radius
52 yield random.randrange(0, width)
53 yield random.randrange(0, height)
54 yield 0.0 # z (padding, unused by shaders)
55 yield 6.0
56
57 # Velocity (unused by visualization shaders)
58 yield 0.0
59 yield 0.0
60 yield 0.0 # vz (padding, unused by shaders)
61 yield 0.0 # vw (padding, unused by shaders)
62
63 # Color
64 yield random.uniform(color_channel_min, 1.0) # r
65 yield random.uniform(color_channel_min, 1.0) # g
66 yield random.uniform(color_channel_min, 1.0) # b
67 yield 1.0 # a
68
69 # Use the generator function to fill an array in RAM
70 return array('f', _data_generator())
71
72
73class NBodyGravityWindow(arcade.Window):
74
75 def __init__(self):
76 # Ask for OpenGL context supporting version 4.3 or greater when
77 # calling the parent initializer to make sure we have compute shader
78 # support.
79 super().__init__(
80 WINDOW_WIDTH, WINDOW_HEIGHT,
81 "N-Body Gravity with Compute Shaders & Buffers",
82 gl_version=(4, 3),
83 resizable=False
84 )
85 # Attempt to put the window in the center of the screen.
86 self.center_window()
87
88 # --- Create buffers
89
90 # Create pairs of buffers for the compute & visualization shaders.
91 # We will swap which buffer instance is the initial value and
92 # which is used as the current value to write to.
93
94 # ssbo = shader storage buffer object
95 initial_data = gen_initial_data(self.get_size(), use_color=USE_COLORED_STARS)
96 self.ssbo_previous = self.ctx.buffer(data=initial_data)
97 self.ssbo_current = self.ctx.buffer(data=initial_data)
98
99 # vao = vertex array object
100 # Format string describing how to interpret the SSBO buffer data.
101 # 4f = position and size -> x, y, z, radius
102 # 4x4 = Four floats used for calculating velocity. Not needed for visualization.
103 # 4f = color -> rgba
104 buffer_format = "4f 4x4 4f"
105
106 # Attribute variable names for the vertex shader
107 attributes = ["in_vertex", "in_color"]
108
109 self.vao_previous = self.ctx.geometry(
110 [BufferDescription(self.ssbo_previous, buffer_format, attributes)],
111 mode=self.ctx.POINTS,
112 )
113 self.vao_current = self.ctx.geometry(
114 [BufferDescription(self.ssbo_current, buffer_format, attributes)],
115 mode=self.ctx.POINTS,
116 )
117
118 # --- Create the visualization shaders
119
120 vertex_shader_source = Path("shaders/vertex_shader.glsl").read_text()
121 fragment_shader_source = Path("shaders/fragment_shader.glsl").read_text()
122 geometry_shader_source = Path("shaders/geometry_shader.glsl").read_text()
123
124 # Create the complete shader program which will draw the stars
125 self.program = self.ctx.program(
126 vertex_shader=vertex_shader_source,
127 geometry_shader=geometry_shader_source,
128 fragment_shader=fragment_shader_source,
129 )
130
131 # --- Create our compute shader
132
133 # Load in the raw source code safely & auto-close the file
134 compute_shader_source = Path("shaders/compute_shader.glsl").read_text()
135
136 # Compute shaders use groups to parallelize execution.
137 # You don't need to understand how this works yet, but the
138 # values below should serve as reasonable defaults. Later, we'll
139 # preprocess the shader source by replacing the templating token
140 # with its corresponding value.
141 self.group_x = 256
142 self.group_y = 1
143
144 self.compute_shader_defines = {
145 "COMPUTE_SIZE_X": self.group_x,
146 "COMPUTE_SIZE_Y": self.group_y
147 }
148
149 # Preprocess the source by replacing each define with its value as a string
150 for templating_token, value in self.compute_shader_defines.items():
151 compute_shader_source = compute_shader_source.replace(templating_token, str(value))
152
153 self.compute_shader = self.ctx.compute_shader(source=compute_shader_source)
154
155 # --- Create the FPS graph
156
157 # Enable timings for the performance graph
158 arcade.enable_timings()
159
160 # Create a sprite list to put the performance graph into
161 self.perf_graph_list = arcade.SpriteList()
162
163 # Create the FPS performance graph
164 graph = arcade.PerfGraph(GRAPH_WIDTH, GRAPH_HEIGHT, graph_data="FPS")
165 graph.position = GRAPH_WIDTH / 2, self.height - GRAPH_HEIGHT / 2
166 self.perf_graph_list.append(graph)
167
168 def on_draw(self):
169 # Clear the screen
170 self.clear()
171 # Enable blending so our alpha channel works
172 self.ctx.enable(self.ctx.BLEND)
173
174 # Bind buffers
175 self.ssbo_previous.bind_to_storage_buffer(binding=0)
176 self.ssbo_current.bind_to_storage_buffer(binding=1)
177
178 # If you wanted, you could set input variables for compute shader
179 # as in the lines commented out below. You would have to add or
180 # uncomment corresponding lines in compute_shader.glsl
181 # self.compute_shader["screen_size"] = self.get_size()
182 # self.compute_shader["frame_time"] = self.frame_time
183
184 # Run compute shader to calculate new positions for this frame
185 self.compute_shader.run(group_x=self.group_x, group_y=self.group_y)
186
187 # Draw the current star positions
188 self.vao_current.render(self.program)
189
190 # Swap the buffer pairs.
191 # The buffers for the current state become the initial state,
192 # and the data of this frame's initial state will be overwritten.
193 self.ssbo_previous, self.ssbo_current = self.ssbo_current, self.ssbo_previous
194 self.vao_previous, self.vao_current = self.vao_current, self.vao_previous
195
196 # Draw the graphs
197 self.perf_graph_list.draw()
198
199
200
201if __name__ == "__main__":
202 app = NBodyGravityWindow()
203 arcade.run()
An expanded version of this tutorial whith support for 3D is available at: https://github.com/pvcraven/n-body