

GENE Efficient Exascale Scaling Strategies
Tuesday, June 10, 2025 3:00 PM to Thursday, June 12, 2025 4:00 PM · 2 days 1 hr. (Europe/Berlin)
Foyer D-G - 2nd floor
Research Poster
Computational Physics, Heterogeneous System Architectures, Performance Measurement
Information
The poster is on display and will be presented at the poster pitch session.
The Gyrokinetic Electromagnetic Numerical Experiment (GENE) code is a partial integro-differential equation solver that simulates microturbulence in magnetically confined plasma in nuclear fusion devices. To keep pace with the trend towards GPU-accelerated exascale supercomputers, GENE’s potential scaling challenges on GPUs must be identified. This ensures both the longevity of the code and proper utilisation of hardware resources in the age of GPU-dominated supercomputing.
GENE simulates plasma turbulence by solving the gyroaveraged Vlasov equation in combination with Ampère’s law, Poisson’s equation, and the induction equation, which results in a 5-dimensional partial integro-differential system of equations, with 3 spatial dimensions (x, k_y, z) and 2 velocity-space dimensions (velocity parallel to the background magnetic field v_∥, and magnetic moment μ). The simulation domain further becomes a 6-dimensional grid when multiple particle species are simulated. Numerical computation of the individual terms in the system of equations is done on the GPU during time-stepping. The current GPU kernel implementation works best when each MPI process is mapped to the GPU within its own NUMA domain, i.e. 4 MPI processes per JUWELS Booster node, each node hosting 4 NVIDIA A100 GPUs.
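A minimal sketch of this rank-to-GPU mapping (not GENE source code; it assumes one MPI rank per GPU and that the batch system pins each rank to the CPU cores of the matching NUMA domain) selects the device from the node-local rank:

/* Sketch: bind each MPI rank to one GPU via its node-local rank.
   Assumes 4 ranks and 4 GPUs per node, as on JUWELS Booster. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Communicator containing only the ranks on the same node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int local_rank, ndev;
    MPI_Comm_rank(node_comm, &local_rank);
    cudaGetDeviceCount(&ndev);               /* 4 A100s per Booster node */

    /* One rank per GPU; CPU/NUMA pinning is left to the job script
       (e.g. Slurm binding options), which is an assumption here. */
    cudaSetDevice(local_rank % ndev);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    printf("rank %d -> GPU %d\n", world_rank, local_rank % ndev);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

Built with an MPI compiler wrapper and the CUDA runtime library, this reproduces the 4-ranks-to-4-GPUs-per-node layout described above.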
In the current work, we have identified two aspects that pose challenges when attempting to scale the resolution of the simulation grid above 200 × 960 × 84 × 80 × 30 × 2 in x, k_y, z, v_∥, μ, and species. The first is the high demand on GPU memory due to the high dimensionality of the numerical problem: a total GPU memory footprint of more than 3 TB is already required for a modest resolution of 200 × 128 × 96 × 70 × 24 × 2, which is approximately a factor of 9 smaller than the grid mentioned above. The second challenge is the sub-optimal usage of CPU resources when the gyroaveraging matrix computation is combined with time-stepping. The matrix computation is a costly, CPU-intensive component at the start of a GENE simulation, and it also scales with both the grid resolution and the dimensionality described above. In this work, we present mitigation strategies for both challenges and demonstrate the scaling behaviour of the GENE code on the JUWELS Booster’s NVIDIA A100 GPUs.
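For scale, a rough lower bound on the memory for a single full-size distribution-function array follows from the grid sizes above, assuming one double-precision complex value (16 bytes) per grid point; this per-array estimate is our assumption, and the quoted >3 TB total additionally covers further working arrays and the gyroaveraging matrices:

200 × 128 × 96 × 70 × 24 × 2 ≈ 8.3 × 10^9 grid points → 8.3 × 10^9 × 16 B ≈ 132 GB per array
200 × 960 × 84 × 80 × 30 × 2 ≈ 7.7 × 10^10 grid points, i.e. roughly 9.4 times the smaller grid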
Contributors:
Format
On Demand, On Site

