GPU Offloads for Gravity Calculations in SWIFT Cosmology Code

Tuesday, June 10, 2025 3:00 PM to Thursday, June 12, 2025 4:00 PM · 2 days 1 hr. (Europe/Berlin)
Foyer D-G - 2nd floor
Women in HPC Poster
Computational Physics · Heterogeneous System Architectures · Parallel Programming Languages

Information

The poster is on display and will be presented at the poster pitch session.
To run efficiently on modern heterogeneous HPC systems, large astronomy codes need to become GPU-compatible, either through a total redesign or by partially offloading key sections to the GPU. SWIFT (SPH With Inter-dependent Fine-grained Tasking) is a versatile, open-source astronomy code used across a range of research areas, including galaxy formation, planetary impacts, and cosmology. SWIFT uses task-based parallelism: the workload is divided into many fine-grained tasks, and tasks whose dependencies have been satisfied can execute concurrently, maximizing the use of available CPU resources. The code is optimized for large, memory-intensive, CPU-only clusters. A significant portion of SWIFT’s runtime is dedicated to gravity calculations. In a gravity N-body code, each particle (representing a celestial object) interacts gravitationally with every other particle, so a direct calculation scales as O(N²) in the number of particles N and is computationally intensive. However, these pairwise interactions are repetitive and independent of one another, which makes them ideal candidates for GPU acceleration.

In this work, I build on the existing SWIFT code by replacing specific CPU-based gravity calculation functions with new GPU kernels, minimizing disruption to the rest of the code while preserving the task-based parallelism. The result is a hybrid C and CUDA version of the code that offloads gravity calculations to the GPU, freeing the CPU to carry out other tasks.

Our GPU-accelerated gravity kernels achieve high accuracy, deviating from the CPU results by less than 1% below the Nyquist frequency. The GPU version also produces final particle distributions nearly identical to those of the CPU-only implementation. Furthermore, using GPUs allows the gravity work to be redistributed: more interactions can be computed by direct particle-particle summation, which is more accurate, leaving only the inexpensive multipole approximations to be calculated on the CPU.
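To illustrate how work can shift between the two kinds of interaction, the sketch below uses a simple geometric (Barnes-Hut-style) opening criterion: a pair of tree cells is approximated with multipoles only when the cells' combined size is small compared with their separation, as controlled by an opening parameter theta. This criterion is an assumption for illustration; SWIFT’s own criterion differs, and the cell_pair type and the classify_pair and count_direct functions are hypothetical. Lowering theta routes more pairs to direct summation, which is the kind of shift that cheap GPU direct interactions make affordable.

```c
#include <stddef.h>

/* Hypothetical description of a pair of tree cells. */
typedef struct {
    double size_a;    /* size of cell A */
    double size_b;    /* size of cell B */
    double distance;  /* distance between the cell centres */
} cell_pair;

typedef enum { INTERACT_DIRECT, INTERACT_MULTIPOLE } interaction_kind;

/* Geometric opening criterion: use the multipole approximation only if the
 * pair is well separated relative to its size; otherwise sum directly. */
interaction_kind classify_pair(const cell_pair *pair, double theta) {
    const double extent = pair->size_a + pair->size_b;
    return (extent < theta * pair->distance) ? INTERACT_MULTIPOLE
                                             : INTERACT_DIRECT;
}

/* Count how many pairs are routed to direct particle-particle summation. */
size_t count_direct(const cell_pair *pairs, size_t n, double theta) {
    size_t n_direct = 0;
    for (size_t k = 0; k < n; ++k)
        if (classify_pair(&pairs[k], theta) == INTERACT_DIRECT)
            ++n_direct;
    return n_direct;
}
```

For example, with unit-sized cells at distances 10, 3.9 and 2.5, theta = 0.7 sends one pair to direct summation, while theta = 0.5 sends two.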

The main limitation at present is a memory transfer bottleneck: the time needed to move data between the CPU and GPU reduces the overall performance benefit of offloading. Optimization efforts using CUDA atomics and streams have already shown promising improvements. Future work will focus on eliminating this bottleneck, integrating GPU offloading more fully into SWIFT’s task system, and leveraging additional GPU features to achieve an overall performance boost for the SWIFT code.
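A back-of-the-envelope model shows why streams can hide part of the transfer cost. The sketch below compares a serial offload (copy in, compute, copy out) with an idealized pipeline in which the work is split into equal chunks, for example one per CUDA stream. It assumes that host-to-device copies, kernel execution and device-to-host copies overlap perfectly, with no launch overhead or contention (in practice overlap also requires page-locked host memory and depends on the GPU's copy engines). The function names are hypothetical, and real timings should come from profiling.

```c
/* Time in seconds to move `bytes` over a link of `gb_per_s` GB/s (1 GB = 1e9 B). */
double transfer_time(double bytes, double gb_per_s) {
    return bytes / (gb_per_s * 1e9);
}

/* Largest of three stage times: the bottleneck stage of the pipeline. */
static double max3(double a, double b, double c) {
    const double m = a > b ? a : b;
    return m > c ? m : c;
}

/* Serial offload: host-to-device copy, kernel, device-to-host copy in sequence. */
double serial_time(double t_h2d, double t_kernel, double t_d2h) {
    return t_h2d + t_kernel + t_d2h;
}

/* Idealized three-stage pipeline over n_chunks equal chunks.
 * The first chunk passes through all three stages; every later chunk then
 * finishes one bottleneck-stage time after the previous one. */
double pipelined_time(double t_h2d, double t_kernel, double t_d2h, int n_chunks) {
    const double n = (double)n_chunks;
    const double first = (t_h2d + t_kernel + t_d2h) / n;
    const double rest = (n - 1.0) * max3(t_h2d, t_kernel, t_d2h) / n;
    return first + rest;
}
```

With stage times of 2, 1 and 2 ms, the serial offload takes 5 ms, while four chunks take 2.75 ms in this model. As the number of chunks grows, the total approaches the longest stage (here the transfers), which is why data movement remains the limiting factor even with streams.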
Format
On Demand · On Site