ALIEN on AMD GPUs
Porting an artificial life simulator from CUDA to ROCm/HIP
24 Apr 2026
Mauri de Souza Meneguzzo
Artificial LIfe ENvironment.
A 2D particle physics simulation where digital organisms evolve, replicate, and compete.
The simulation kernel runs entirely on the GPU. Every particle update, collision, and cell-function evaluation happens in a CUDA (now HIP) kernel.
Winner of the ALIFE 2024 Virtual Creatures Competition.
The upstream ALIEN requires an NVIDIA GPU with compute capability 6.0+.
I have an AMD RX 7900 XTX with 24 GB of VRAM. It speaks ROCm/HIP, not CUDA.
Compatibility layers that run unmodified CUDA code on AMD exist, such as SCALE and ZLUDA.
The goal: port ALIEN source code to HIP and compile/run natively on AMD GPUs, with no runtime behaviour changes.
HIP is AMD's portability layer. It mirrors the CUDA API almost exactly.
CUDA                      HIP
------------------------  ------------------------
cudaMalloc                hipMalloc
cudaMemcpy                hipMemcpy
__syncthreads()           __syncthreads()
cudaDeviceSynchronize     hipDeviceSynchronize
.cu / .cuh                .hip / .hip.h
The semantics are the same. Only the name prefixes differ.
HIP compiles to either NVIDIA (nvcc backend) or AMD (hipcc via ROCm).
With __HIP_PLATFORM_AMD__ defined, you target ROCm.
AMD ships hipify-perl, a Perl script that mechanically translates CUDA source to HIP.
#!/usr/bin/env bash
find source -type f \( -name "*.cu" -o -name "*.cuh" \) | while read -r file; do
    hipify-perl --inplace "$file"
    case "$file" in
        *.cu)  mv "$file" "${file%.cu}.hip" ;;
        *.cuh) mv "$file" "${file%.cuh}.hip.h" ;;
    esac
done
This handles the mechanical part: namespace prefixes, type names, function names.
After running it, ALIEN had .hip kernel files where there were .cu files.
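To make the "mechanical part" concrete, here is a toy C++ model of the two renames involved: the cuda-to-hip prefix swap and the file-extension convention from the script. It is only an illustration under simplifying assumptions. The real hipify-perl works from a table of known identifiers rather than a blanket prefix swap, and the helper names and file paths below are hypothetical.

```cpp
#include <cassert>
#include <string>

// Toy model of the renames described above; NOT how hipify-perl is
// implemented. hipifyIdentifier and hipifyFileName are hypothetical names.

// Swap a leading "cuda" prefix for "hip"; leave everything else untouched
// (for example, __syncthreads is spelled the same in both APIs).
std::string hipifyIdentifier(const std::string& name) {
    assert(!name.empty());  // an empty identifier is a caller bug
    const std::string cudaPrefix = "cuda";
    if (name.compare(0, cudaPrefix.size(), cudaPrefix) == 0) {
        return "hip" + name.substr(cudaPrefix.size());
    }
    return name;
}

// Apply the file-extension convention used by the script:
// .cu -> .hip and .cuh -> .hip.h. Other paths are returned unchanged.
std::string hipifyFileName(const std::string& path) {
    auto endsWith = [&](const std::string& suffix) {
        return path.size() >= suffix.size() &&
               path.compare(path.size() - suffix.size(), suffix.size(),
                            suffix) == 0;
    };
    if (endsWith(".cuh")) {
        return path.substr(0, path.size() - 4) + ".hip.h";
    }
    if (endsWith(".cu")) {
        return path.substr(0, path.size() - 3) + ".hip";
    }
    return path;
}
```

For example, hipifyIdentifier("cudaMalloc") gives "hipMalloc", and hipifyFileName("source/Kernel.cu") gives "source/Kernel.hip".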
hipify handles the easy cases. The hard cases are manual.
cudaGraphicsGLRegisterImage, cudaGraphicsMapResources, etc. are NVIDIA-only extension paths. AMD has its own hipGraphicsGLRegisterImage, but the semantics and extension string differ.
__device__ / __host__ combinations behave differently under hipcc.
The upstream CMakeLists used LANGUAGES CUDA. HIP needs its own language declaration:
# Before
project(alien-project LANGUAGES C CXX CUDA)
# After
project(alien-project LANGUAGES C CXX HIP)
And the compile definition that tells HIP which platform we target:
add_compile_definitions(__HIP_PLATFORM_AMD__)
The build script sets up the ROCm environment and pins the architecture:
cmake \
-S . -B build \
-DCMAKE_MODULE_PATH=/opt/rocm/lib/cmake/hip \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_HIP_ARCHITECTURES="gfx1100"
gfx1100 is the GPU architecture target for the RX 7900 XTX (RDNA 3).
CUDA-OpenGL interop lets kernels write directly into an OpenGL texture on the GPU without a round-trip through CPU memory.
AMD does not support hipGraphicsGLRegisterImage. The port uses a Pixel Buffer Object (PBO) instead:
// CUDA: register the texture directly
cudaGraphicsGLRegisterImage(&res, texId, GL_TEXTURE_2D,
                            cudaGraphicsRegisterFlagsWriteDiscard);
// HIP/AMD: register a PBO, copy rendered data into it, upload via GL
hipGraphicsGLRegisterBuffer(&res, pboId, hipGraphicsRegisterFlagsNone);
hipGraphicsMapResources(1, &res, 0);
hipGraphicsResourceGetMappedPointer(&pboPtr, &size, res);
hipMemcpy(pboPtr, imageData, size, hipMemcpyDeviceToDevice);
hipGraphicsUnmapResources(1, &res, 0);
// Assumes the PBO is not already bound: with it bound as the unpack buffer,
// the final 0 is a byte offset into the PBO rather than a CPU pointer
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboId);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, 0);
hipGLGetDevices() reports which HIP device shares the current OpenGL context, so the port can select it.
RDNA 3 wavefront size is 32 (same as NVIDIA warp). GCN was 64; some legacy AMD code assumes 64.
The main tuning change was reducing numBlocks in GpuSettings.h:
// Before (tuned for NVIDIA)
int numBlocks = 16384;
// After (tuned for RDNA 3)
int numBlocks = 1024;
All physics kernels are launched as:
func<<<gpuSettings.numBlocks, 8>>>(...);
With 8 threads per block, each block fits in one partially filled 32-lane wavefront, which keeps scheduling flexible; numBlocks controls total occupancy.
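For a rough sense of what this launch shape means, the arithmetic can be sketched in host-only C++. This is a back-of-the-envelope model, not a measurement: it assumes each block is rounded up to whole wavefronts (blocks never share one), and LaunchShape and describeLaunch are hypothetical names that do not appear in ALIEN.

```cpp
#include <cassert>

// Back-of-the-envelope launch arithmetic; a model, not a measurement.
struct LaunchShape {
    long long totalThreads;  // numBlocks * threadsPerBlock
    int wavefrontsPerBlock;  // ceil(threadsPerBlock / waveSize)
    double laneUtilization;  // active lanes / allocated lanes, per block
};

// Hypothetical helper: describe a <<<numBlocks, threadsPerBlock>>> launch
// on hardware with the given wavefront (or warp) size.
LaunchShape describeLaunch(int numBlocks, int threadsPerBlock, int waveSize) {
    assert(numBlocks > 0 && threadsPerBlock > 0 && waveSize > 0);
    // Round up: a block always occupies a whole number of wavefronts.
    int waves = (threadsPerBlock + waveSize - 1) / waveSize;
    return LaunchShape{
        static_cast<long long>(numBlocks) * threadsPerBlock,
        waves,
        static_cast<double>(threadsPerBlock) / (waves * waveSize)};
}
```

Under this model, describeLaunch(1024, 8, 32) gives 8192 threads in total, one wavefront per block, and 8 of 32 lanes active (0.25). The same block size on a 64-wide wavefront would leave 8 of 64 lanes active, which is one reason the wavefront size matters when reading legacy AMD tuning advice.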
ALIEN compiles and runs on an RX 7900 XTX with ROCm (CMake 3.31, ROCm 7.2).
Benchmark on the default simulation with an RX 7900 XTX:
500 time steps, 2934 ms → 170.4 TPS (CLI)
~130 TPS, ~40 FPS @ 4K (GUI)
~200 TPS (GUI, rendering disabled with ALT+I)
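The CLI figure follows directly from the step count and the elapsed time. A minimal sketch of that conversion, with timeStepsPerSecond as a hypothetical helper name:

```cpp
#include <cassert>

// Time steps per second (TPS) from a step count and wall-clock milliseconds.
double timeStepsPerSecond(int steps, double elapsedMs) {
    assert(steps >= 0 && elapsedMs > 0.0);
    return steps * 1000.0 / elapsedMs;
}
```

With the CLI numbers above, timeStepsPerSecond(500, 2934.0) comes to roughly 170.4.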
github.com/mauri870/alien (this fork)
chrxh/alien (upstream ALIEN)
My talks are written with golang.org/x/tools/present
Find this talk at talks.mauri870.com