For certain types of calculations, compute shaders on the GPU can be thousands of times faster than on the CPU alone.

In this tutorial, we will simulate a star field using an ‘N-Body simulation’. Each star is affected by the gravity of every other star. For 1,000 stars, this means we have 1,000 x 1,000 = 1,000,000 million calculations to perform for each frame. The video has 65,000 stars, requiring 4.2 billion gravity force calculations per frame. On high-end hardware it can still run at 60 fps!

How does this work? There are three major parts to this program:

• The Python code, which allocates buffers & glues everything together

• The visualization shaders, which let us see the data in the buffers

• The compute shader, which moves everything

## Buffers#

We need a place to store the data we’ll visualize. To do so, we’ll create two Shader Storage Buffer Objects (SSBOs) of floating point numbers from within our Python code. One will hold the previous frame’s star positions, and the other will be used to store calculate the next frame’s positions.

Each buffer must be able to store the following for each star:

1. The x, y, and radius of each star stored

2. The velocity of the star, which will be unused by the visualization

3. The floating point RGBA color of the star

### Generating Aligned Data#

To avoid issues with GPU memory alignment quirks, we’ll use the function below to generate well-aligned data ready to load into the SSBO. The docstrings & comments explain why in greater detail:

Generating Well-Aligned Data to Load onto the GPU#
```def gen_initial_data(
screen_size: Tuple[int, int],
num_stars: int = NUM_STARS,
use_color: bool = False
) -> array:
"""
Generate an :py:class:`~array.array` of randomly positioned star data.

Some of this data is wasted as padding because:

1. GPUs expect SSBO data to be aligned to multiples of 4
2. GLSL's vec3 is actually a vec4 with compiler-side restrictions,
so we have to use 4-length vectors anyway.

:param screen_size: A (width, height) of the area to generate stars in
:param num_stars: How many stars to generate
:param use_color: Whether to generate white or randomized pastel stars
:return: an array of star position data
"""
width, height = screen_size
color_channel_min = 0.5 if use_color else 1.0

def _data_generator() -> Generator[float, None, None]:
"""Inner generator function used to illustrate memory layout"""

for i in range(num_stars):
yield random.randrange(0, width)
yield random.randrange(0, height)
yield 0.0  # z (padding, unused by shaders)
yield 6.0

# Velocity (unused by visualization shaders)
yield 0.0
yield 0.0
yield 0.0  # vz (padding, unused by shaders)
yield 0.0  # vw (padding, unused by shaders)

# Color
yield random.uniform(color_channel_min, 1.0)  # r
yield random.uniform(color_channel_min, 1.0)  # g
yield random.uniform(color_channel_min, 1.0)  # b
yield 1.0  # a

# Use the generator function to fill an array in RAM
return array('f', _data_generator())
```

### Allocating the Buffers#

Allocating the Buffers & Loading the Data onto the GPU#
```        # --- Create buffers

# Create pairs of buffers for the compute & visualization shaders.
# We will swap which buffer instance is the initial value and
# which is used as the current value to write to.

# ssbo = shader storage buffer object
initial_data = gen_initial_data(self.get_size(), use_color=USE_COLORED_STARS)
self.ssbo_previous = self.ctx.buffer(data=initial_data)
self.ssbo_current = self.ctx.buffer(data=initial_data)

# vao = vertex array object
# Format string describing how to interpret the SSBO buffer data.
# 4f = position and size -> x, y, z, radius
# 4x4 = Four floats used for calculating velocity. Not needed for visualization.
# 4f = color -> rgba
buffer_format = "4f 4x4 4f"

# Attribute variable names for the vertex shader
attributes = ["in_vertex", "in_color"]

self.vao_previous = self.ctx.geometry(
[BufferDescription(self.ssbo_previous, buffer_format, attributes)],
mode=self.ctx.POINTS,
)
self.vao_current = self.ctx.geometry(
[BufferDescription(self.ssbo_current, buffer_format, attributes)],
mode=self.ctx.POINTS,
)
```

Now that we have the data, we need to be able to visualize it. We’ll do it by applying vertex, geometry, and fragment shaders to convert the data in the SSBO into pixels. For each star’s 12 floats in the array, the following flow of data will take place:

In this tutorial, the vertex shader will be run for each star’s 12 float long stretch of raw padded data in `self.ssbo_current`. Each execution will output clean typed data to an instance of the geometry shader.

Data is read in as follows:

• The x, y, and radius of each star are accessed via `in_vertex`

• The floating point RGBA color of the star, via `in_color`

``` 1#version 330
2
3in vec4 in_vertex;
4in vec4 in_color;
5
6out vec2 vertex_pos;
8out vec4 vertex_color;
9
10void main()
11{
12    vertex_pos = in_vertex.xy;
13    vertex_radius = in_vertex.w;
14    vertex_color = in_color;
15}
```

The variables below are then passed as inputs to the geometry shader:

• `vertex_pos`

• `vertex_radius`

• `vertex_color`

The geometry shader converts a single point into a quad, in this case a square, which the GPU can render. It does this by emitting four points centered on the input point.

``` 1#version 330
2
3layout (points) in;
4layout (triangle_strip, max_vertices = 4) out;
5
6// Use arcade's global projection UBO
7uniform Projection {
8    uniform mat4 matrix;
9} proj;
10
11
12// The outputs from the vertex shader are used as inputs
13in vec2 vertex_pos[];
15in vec4 vertex_color[];
16
17// These are used with EmitVertex to generate four points of
18// a quad centered around vertex_pos[0].
19out vec2 g_uv;
20out vec3 g_color;
21
22void main() {
23    vec2 center = vertex_pos[0];
24    vec2 hsize = vec2(vertex_radius[0]);
25
26    g_color = vertex_color[0].rgb;
27
28    gl_Position = proj.matrix * vec4(vec2(-hsize.x, hsize.y) + center, 0.0, 1.0);
29    g_uv = vec2(0, 1);
30    EmitVertex();
31
32    gl_Position = proj.matrix * vec4(vec2(-hsize.x, -hsize.y) + center, 0.0, 1.0);
33    g_uv = vec2(0, 0);
34    EmitVertex();
35
36    gl_Position = proj.matrix * vec4(vec2(hsize.x, hsize.y) + center, 0.0, 1.0);
37    g_uv = vec2(1, 1);
38    EmitVertex();
39
40    gl_Position = proj.matrix * vec4(vec2(hsize.x, -hsize.y) + center, 0.0, 1.0);
41    g_uv = vec2(1, 0);
42    EmitVertex();
43
44    // End geometry emmission
45    EndPrimitive();
46}
```

A fragment shader runs for each pixel in a quad. It converts a UV coordinate within the quad to a float RGBA value. In this tutorial’s case, the shader produces the soft glowing circle on the surface of each star’s quad.

``` 1#version 330
2
3in vec2 g_uv;
4in vec3 g_color;
5
6out vec4 out_color;
7
8void main()
9{
10    float l = length(vec2(0.5, 0.5) - g_uv.xy);
11    if ( l > 0.5)
12    {
14    }
15    float alpha;
16    if (l == 0.0)
17        alpha = 1.0;
18    else
19        alpha = min(1.0, .60-l * 2);
20
21    vec3 c = g_color.rgb;
22    // c.xy += v_uv.xy * 0.05;
23    // c.xy += v_pos.xy * 0.75;
24    out_color = vec4(c, alpha);
25}
```

Now that we have a way to display data, we should update it.

We created pairs of buffers earlier. We will use one SSBO as an input buffer holding the previous frame’s data, and another as our output buffer to write results to.

We then swap our buffers each frame after drawing, using the output as the input of the next frame, and repeat the process until the program stops running.

``` 1#version 430
2
3// Set up our compute groups.
4// The COMPUTE_SIZE_X and COMPUTE_SIZE_Y values will be replaced
5// by the Python code with actual values. This does not happen
6// automatically, and must be called manually.
7layout(local_size_x=COMPUTE_SIZE_X, local_size_y=COMPUTE_SIZE_Y) in;
8
9// Input uniforms would go here if you need them.
10// The examples below match the ones commented out in main.py
11//uniform vec2 screen_size;
12//uniform float frame_time;
13
14// Structure of the star data
15struct Star
16{
17    vec4 pos;
18    vec4 vel;
19    vec4 color;
20};
21
22// Input buffer
23layout(std430, binding=0) buffer stars_in
24{
25    Star stars[];
26} In;
27
28// Output buffer
29layout(std430, binding=1) buffer stars_out
30{
31    Star stars[];
32} Out;
33
34void main()
35{
36    int curStarIndex = int(gl_GlobalInvocationID);
37
38    Star in_star = In.stars[curStarIndex];
39
40    vec4 p = in_star.pos.xyzw;
41    vec4 v = in_star.vel.xyzw;
42
43    // Move the star according to the current force
44    p.xy += v.xy;
45
46    // Calculate the new force based on all the other bodies
47    for (int i=0; i < In.stars.length(); i++) {
48        // If enabled, this will keep the star from calculating gravity on itself
49        // However, it does slow down the calcluations do do this check.
50        //  if (i == x)
51        //      continue;
52
53        // Calculate distance squared
54        float dist = distance(In.stars[i].pos.xyzw.xy, p.xy);
55        float distanceSquared = dist * dist;
56
57        // If distance is too small, extremely high forces can result and
58        // fling the star into escape velocity and forever off the screen.
59        // Using a reasonable minimum distance to prevents this.
60        float minDistance = 0.02;
61        float gravityStrength = 0.3;
62        float simulationSpeed = 0.002;
63        float force = min(minDistance, gravityStrength / distanceSquared) * -simulationSpeed;
64
65        vec2 diff = p.xy - In.stars[i].pos.xyzw.xy;
66        // We should normalize this I think, but it doesn't work.
67        //  diff = normalize(diff);
68        vec2 delta_v = diff * force;
69        v.xy += delta_v;
70    }
71
72
73    Star out_star;
74    out_star.pos.xyzw = p.xyzw;
75    out_star.vel.xyzw = v.xyzw;
76
77    vec4 c = in_star.color.xyzw;
78    out_star.color.xyzw = c.xyzw;
79
80    Out.stars[curStarIndex] = out_star;
81}
```

## The Finished Python Program#

The code includes thorough docstrings and annotations explaining how it works.

main.py#
```  1"""
2N-Body Gravity with Compute Shaders & Buffers
3"""
4import random
5from array import array
6from pathlib import Path
7from typing import Generator, Tuple
8
10from arcade.gl import BufferDescription
11
12# Window dimensions in pixels
13WINDOW_WIDTH = 800
14WINDOW_HEIGHT = 600
15
16# Size of performance graphs in pixels
17GRAPH_WIDTH = 200
18GRAPH_HEIGHT = 120
19GRAPH_MARGIN = 5
20
21NUM_STARS: int = 4000
22USE_COLORED_STARS: bool = True
23
24
25def gen_initial_data(
26        screen_size: Tuple[int, int],
27        num_stars: int = NUM_STARS,
28        use_color: bool = False
29) -> array:
30    """
31    Generate an :py:class:`~array.array` of randomly positioned star data.
32
33    Some of this data is wasted as padding because:
34
35    1. GPUs expect SSBO data to be aligned to multiples of 4
36    2. GLSL's vec3 is actually a vec4 with compiler-side restrictions,
37       so we have to use 4-length vectors anyway.
38
39    :param screen_size: A (width, height) of the area to generate stars in
40    :param num_stars: How many stars to generate
41    :param use_color: Whether to generate white or randomized pastel stars
42    :return: an array of star position data
43    """
44    width, height = screen_size
45    color_channel_min = 0.5 if use_color else 1.0
46
47    def _data_generator() -> Generator[float, None, None]:
48        """Inner generator function used to illustrate memory layout"""
49
50        for i in range(num_stars):
52            yield random.randrange(0, width)
53            yield random.randrange(0, height)
54            yield 0.0  # z (padding, unused by shaders)
55            yield 6.0
56
57            # Velocity (unused by visualization shaders)
58            yield 0.0
59            yield 0.0
60            yield 0.0  # vz (padding, unused by shaders)
61            yield 0.0  # vw (padding, unused by shaders)
62
63            # Color
64            yield random.uniform(color_channel_min, 1.0)  # r
65            yield random.uniform(color_channel_min, 1.0)  # g
66            yield random.uniform(color_channel_min, 1.0)  # b
67            yield 1.0  # a
68
69    # Use the generator function to fill an array in RAM
70    return array('f', _data_generator())
71
72
74
75    def __init__(self):
76        # Ask for OpenGL context supporting version 4.3 or greater when
77        # calling the parent initializer to make sure we have compute shader
78        # support.
79        super().__init__(
80            WINDOW_WIDTH, WINDOW_HEIGHT,
81            "N-Body Gravity with Compute Shaders & Buffers",
82            gl_version=(4, 3),
83            resizable=False
84        )
85        # Attempt to put the window in the center of the screen.
86        self.center_window()
87
88        # --- Create buffers
89
90        # Create pairs of buffers for the compute & visualization shaders.
91        # We will swap which buffer instance is the initial value and
92        # which is used as the current value to write to.
93
94        # ssbo = shader storage buffer object
95        initial_data = gen_initial_data(self.get_size(), use_color=USE_COLORED_STARS)
96        self.ssbo_previous = self.ctx.buffer(data=initial_data)
97        self.ssbo_current = self.ctx.buffer(data=initial_data)
98
99        # vao = vertex array object
100        # Format string describing how to interpret the SSBO buffer data.
101        # 4f = position and size -> x, y, z, radius
102        # 4x4 = Four floats used for calculating velocity. Not needed for visualization.
103        # 4f = color -> rgba
104        buffer_format = "4f 4x4 4f"
105
106        # Attribute variable names for the vertex shader
107        attributes = ["in_vertex", "in_color"]
108
109        self.vao_previous = self.ctx.geometry(
110            [BufferDescription(self.ssbo_previous, buffer_format, attributes)],
111            mode=self.ctx.POINTS,
112        )
113        self.vao_current = self.ctx.geometry(
114            [BufferDescription(self.ssbo_current, buffer_format, attributes)],
115            mode=self.ctx.POINTS,
116        )
117
118        # --- Create the visualization shaders
119
123
124        # Create the complete shader program which will draw the stars
125        self.program = self.ctx.program(
129        )
130
131        # --- Create our compute shader
132
133        # Load in the raw source code safely & auto-close the file
135
136        # Compute shaders use groups to parallelize execution.
137        # You don't need to understand how this works yet, but the
138        # values below should serve as reasonable defaults. Later, we'll
139        # preprocess the shader source by replacing the templating token
140        # with its corresponding value.
141        self.group_x = 256
142        self.group_y = 1
143
144        self.compute_shader_defines = {
145            "COMPUTE_SIZE_X": self.group_x,
146            "COMPUTE_SIZE_Y": self.group_y
147        }
148
149        # Preprocess the source by replacing each define with its value as a string
150        for templating_token, value in self.compute_shader_defines.items():
152
154
155        # --- Create the FPS graph
156
157        # Enable timings for the performance graph
159
160        # Create a sprite list to put the performance graph into
161        self.perf_graph_list = arcade.SpriteList()
162
163        # Create the FPS performance graph
164        graph = arcade.PerfGraph(GRAPH_WIDTH, GRAPH_HEIGHT, graph_data="FPS")
165        graph.position = GRAPH_WIDTH / 2, self.height - GRAPH_HEIGHT / 2
166        self.perf_graph_list.append(graph)
167
168    def on_draw(self):
169        # Clear the screen
170        self.clear()
171        # Enable blending so our alpha channel works
172        self.ctx.enable(self.ctx.BLEND)
173
174        # Bind buffers
175        self.ssbo_previous.bind_to_storage_buffer(binding=0)
176        self.ssbo_current.bind_to_storage_buffer(binding=1)
177
178        # If you wanted, you could set input variables for compute shader
179        # as in the lines commented out below. You would have to add or
180        # uncomment corresponding lines in compute_shader.glsl
181        # self.compute_shader["screen_size"] = self.get_size()
182        # self.compute_shader["frame_time"] = self.frame_time
183
184        # Run compute shader to calculate new positions for this frame
186
187        # Draw the current star positions
188        self.vao_current.render(self.program)
189
190        # Swap the buffer pairs.
191        # The buffers for the current state become the initial state,
192        # and the data of this frame's initial state will be overwritten.
193        self.ssbo_previous, self.ssbo_current = self.ssbo_current, self.ssbo_previous
194        self.vao_previous, self.vao_current = self.vao_current, self.vao_previous
195
196        # Draw the graphs
197        self.perf_graph_list.draw()
198
199
200
201if __name__ == "__main__":
202    app = NBodyGravityWindow()