ALIEN on AMD GPUs

Porting an artificial life simulator from CUDA to ROCm/HIP

24 Apr 2026

Mauri de Souza Meneguzzo

What is ALIEN?

Artificial LIfe ENvironment.

A 2D particle physics simulation where digital organisms evolve, replicate, and compete.

The simulation kernel runs entirely on the GPU. Every particle update, collision, and cell-function evaluation happens in a CUDA (now HIP) kernel.

Winner of the ALIFE 2024 Virtual Creatures Competition.

ALIEN in action

Watch on YouTube

The problem: CUDA is NVIDIA-only

The upstream ALIEN requires an NVIDIA GPU with compute capability 6.0+.

I have an AMD RX 7900 XTX with 24 GB of VRAM. It speaks ROCm/HIP, not CUDA.

Tools that run CUDA code on AMD GPUs without a source port exist, such as SCALE and ZLUDA.

The goal: port the ALIEN source code to HIP and compile and run it natively on AMD GPUs, with no changes to runtime behaviour.

CUDA vs HIP

HIP is AMD's portability layer. It mirrors the CUDA API almost exactly.

CUDA                      HIP
------------------------  ------------------------
cudaMalloc                hipMalloc
cudaMemcpy                hipMemcpy
__syncthreads()           __syncthreads()
cudaDeviceSynchronize     hipDeviceSynchronize
.cu / .cuh                .hip / .hip.h

The semantics are the same. Only the prefix differs: cuda becomes hip.

HIP compiles for either NVIDIA (nvcc backend) or AMD (the clang-based compiler in ROCm); hipcc is the driver for both. With __HIP_PLATFORM_AMD__ defined, you target ROCm.
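
A minimal sketch of how source code can branch on the platform macro. __HIP_PLATFORM_AMD__ and __HIP_PLATFORM_NVIDIA__ are the macros HIP uses for its two backends; hipTargetPlatform is a hypothetical helper for illustration, not part of HIP or ALIEN.

```cpp
#include <string>

// Reports which HIP backend this translation unit was compiled for.
// hipTargetPlatform() is a hypothetical name used only in this sketch.
inline std::string hipTargetPlatform() {
#if defined(__HIP_PLATFORM_AMD__)
    return "amd";     // ROCm path
#elif defined(__HIP_PLATFORM_NVIDIA__)
    return "nvidia";  // nvcc path
#else
    return "none";    // plain host compiler, no HIP platform selected
#endif
}
```

Compiled with a plain host compiler and neither macro defined, the helper returns "none"; with the compile definition used in this port it would return "amd".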

hipify-perl

AMD ships hipify-perl, a Perl script that mechanically translates CUDA source to HIP.

#!/usr/bin/env bash
find source -type f \( -name "*.cu" -o -name "*.cuh" \) | while read -r file; do
    hipify-perl --inplace "$file"
    case "$file" in
        *.cu)  mv "$file" "${file%.cu}.hip"   ;;
        *.cuh) mv "$file" "${file%.cuh}.hip.h" ;;
    esac
done

This handles the mechanical part: namespace prefixes, type names, function names.

After running it, every .cu kernel file in ALIEN had become a .hip file.
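
To make the mechanical part concrete, here is a toy sketch of the two rules involved: renaming identifiers and renaming files. It is not how hipify-perl is implemented. The lookup table holds only the pairs from the CUDA vs HIP slide, and toHipName, toHipFileName, and the file paths are hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical helper: translate one CUDA identifier to its HIP name using
// only the pairs listed on the CUDA vs HIP slide. Unknown names pass through.
std::string toHipName(const std::string& cudaName) {
    static const std::map<std::string, std::string> table = {
        {"cudaMalloc", "hipMalloc"},
        {"cudaMemcpy", "hipMemcpy"},
        {"cudaDeviceSynchronize", "hipDeviceSynchronize"},
    };
    auto it = table.find(cudaName);
    return it != table.end() ? it->second : cudaName;
}

// File rename rule from the script: .cu -> .hip, .cuh -> .hip.h
std::string toHipFileName(const std::string& path) {
    auto endsWith = [&](const std::string& suffix) {
        return path.size() >= suffix.size() &&
               path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
    if (endsWith(".cuh")) return path.substr(0, path.size() - 4) + ".hip.h";
    if (endsWith(".cu"))  return path.substr(0, path.size() - 3) + ".hip";
    return path;  // not a CUDA source file, leave untouched
}
```

Names that are identical in both APIs, such as __syncthreads, fall through the table unchanged.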

What hipify cannot do

hipify handles the easy cases. The hard cases are manual.

CMake changes

The upstream CMakeLists.txt declared CUDA in LANGUAGES. HIP needs its own language declaration (supported since CMake 3.21):

# Before
project(alien-project LANGUAGES C CXX CUDA)

# After
project(alien-project LANGUAGES C CXX HIP)

And the compile definition that tells HIP which platform we target:

add_compile_definitions(__HIP_PLATFORM_AMD__)

The build script sets up the ROCm environment and pins the architecture:

cmake \
    -S . -B build \
    -DCMAKE_MODULE_PATH=/opt/rocm/lib/cmake/hip \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_HIP_ARCHITECTURES="gfx1100"

gfx1100 is the GPU architecture target (the LLVM processor name) for the RX 7900 XTX (RDNA 3).

OpenGL interop

CUDA-OpenGL interop lets kernels write directly into an OpenGL texture on the GPU without a round-trip through CPU memory.

HIP on AMD does not support hipGraphicsGLRegisterImage, so the port goes through a Pixel Buffer Object (PBO) instead:

// CUDA: register the texture directly
cudaGraphicsGLRegisterImage(&res, texId, GL_TEXTURE_2D,
    cudaGraphicsRegisterFlagsWriteDiscard);

// HIP/AMD: register a PBO, copy rendered data into it, upload via GL
hipGraphicsGLRegisterBuffer(&res, pboId, hipGraphicsRegisterFlagsNone);
hipGraphicsMapResources(1, &res, 0);
hipGraphicsResourceGetMappedPointer(&pboPtr, &size, res);
hipMemcpy(pboPtr, imageData, size, hipMemcpyDeviceToDevice);
hipGraphicsUnmapResources(1, &res, 0);
// With the PBO bound as the unpack buffer, the last argument is a byte
// offset into the PBO rather than a CPU pointer.
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboId);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, 0);

hipGLGetDevices() reports which HIP devices are associated with the current OpenGL context, so the port can pick the one that shares it.
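
The PBO has to be large enough for one full frame. A small sketch of that arithmetic, assuming a tightly packed image with four one-byte channels (as the GL_RGBA / GL_UNSIGNED_BYTE upload suggests) and taking 4K to mean 3840 x 2160; pboSizeBytes is a hypothetical helper name.

```cpp
#include <cstddef>

// Bytes needed for a tightly packed RGBA image with one byte per channel.
// pboSizeBytes is a hypothetical helper, not an ALIEN or HIP function.
constexpr std::size_t pboSizeBytes(std::size_t width, std::size_t height) {
    return width * height * 4;  // 4 bytes per pixel: R, G, B, A
}

// Assumed 4K frame of 3840 x 2160 pixels: about 33 MB per upload.
static_assert(pboSizeBytes(3840, 2160) == 33177600, "4K RGBA8 frame size");
```

At that size every frame moves roughly 33 MB device-to-device before the texture upload, which is the extra cost of the PBO path compared with writing into the texture directly.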

Tuning for RDNA 3

RDNA 3 wavefront size is 32 (same as NVIDIA warp). GCN was 64; some legacy AMD code assumes 64.

The main tuning change was reducing numBlocks in GpuSettings.h:

// Before (tuned for NVIDIA)
int numBlocks = 16384;

// After (tuned for RDNA 3)
int numBlocks = 1024;

All physics kernels are launched as:

func<<<gpuSettings.numBlocks, 8>>>(...);

With 8 threads per block, each block occupies a single 32-lane wavefront with 8 lanes active; numBlocks sets how many blocks are launched, and with it the total thread count.
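
The launch shape can be worked out with a few lines of host code. This is a rough sketch, assuming a wavefront size of 32 and that each block is padded to whole wavefronts; LaunchShape and describeLaunch are hypothetical names, not ALIEN code.

```cpp
#include <cassert>

struct LaunchShape {
    int totalThreads;        // numBlocks * threadsPerBlock
    int wavefrontsPerBlock;  // threadsPerBlock rounded up to whole wavefronts
    int activeLanesPercent;  // share of lanes doing work in those wavefronts
};

// Hypothetical helper: derive the basic numbers of a <<<numBlocks, threads>>> launch.
LaunchShape describeLaunch(int numBlocks, int threadsPerBlock, int wavefrontSize) {
    assert(numBlocks > 0 && threadsPerBlock > 0 && wavefrontSize > 0);
    int waves = (threadsPerBlock + wavefrontSize - 1) / wavefrontSize;
    LaunchShape s;
    s.totalThreads = numBlocks * threadsPerBlock;
    s.wavefrontsPerBlock = waves;
    s.activeLanesPercent = 100 * threadsPerBlock / (waves * wavefrontSize);
    return s;
}
```

With these numbers, describeLaunch(1024, 8, 32) gives 8192 threads in total, one wavefront per block, and 25 percent of lanes active. The NVIDIA-tuned value of 16384 blocks gives 131072 threads with the same per-block shape.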

Result

ALIEN compiles and runs on an RX 7900 XTX with ROCm (CMake 3.31, ROCm 7.2).

Benchmark on the default simulation with an RX 7900 XTX:

500 time steps, 2934 ms → 170.4 TPS  (CLI)
~130 TPS, ~40 FPS @ 4K              (GUI)
~200 TPS                             (GUI, rendering disabled with ALT+I)
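
The CLI figure follows directly from the step count and the elapsed time. A minimal sketch of that arithmetic; timeStepsPerSecond and msPerStep are hypothetical helper names.

```cpp
// Time steps per second from a step count and elapsed wall time in milliseconds.
double timeStepsPerSecond(int steps, double elapsedMs) {
    return steps * 1000.0 / elapsedMs;
}

// Average cost of one time step in milliseconds.
double msPerStep(int steps, double elapsedMs) {
    return elapsedMs / steps;
}
```

For the CLI run, timeStepsPerSecond(500, 2934.0) is about 170.4 and msPerStep(500, 2934.0) is about 5.87 ms.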

Useful links

github.com/mauri870/alien (this fork)

chrxh/alien (upstream ALIEN)

ROCm HIP documentation

HIPIFY tools

My talks are written with golang.org/x/tools/present

Find this talk at talks.mauri870.com

Thank you
