Poster is on display.
This poster outlines RIST's user-support activities for full-system executions on the supercomputer Fugaku, a 158,976-node Arm-based system. Full-system executions allocate nearly the entire machine to a single project and are conducted only twice a year. Over the last two years, RIST has supported eight awarded projects, providing guidance from application preparation through performance tuning.
The call for full-node-scale simulations imposes strict requirements, such as prior completion of a half-node-scale run and eligibility for a future Gordon Bell Prize submission. Successful execution requires extremely high parallelism, hybrid MPI/OpenMP programming, balanced loads, optimized MPI communication, distributed I/O, uniform memory usage, and robust runtime monitoring. Of these, MPI communication optimization emerged as one of the most critical challenges at extreme scale.
To mitigate communication bottlenecks on Fugaku’s Tofu-D interconnect, RIST developed FugakuNodeMappingTools, a tool set that automatically generates optimized rank-mapping files. It arranges processes so that intensive communication stays within local 2x2x1 Tofu-unit blocks and minimizes hop counts along the Z dimension. This mapping improves the efficiency of communication-intensive kernels such as MPI_Alltoall: in real projects it has delivered 3.0–5.5x performance improvements and contributed to the successful completion of full-system executions.
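The idea of confining neighboring ranks to a small block can be illustrated with a minimal sketch. This is not the algorithm used by FugakuNodeMappingTools, which the poster does not describe in detail; the function names `block_ordered_coords` and `format_rank_map` are hypothetical, the logical grid shape is an arbitrary toy value, and the one-coordinate-per-line `(x,y,z)` output is an assumed rank-map layout that should be checked against the Fugaku job documentation. The sketch only orders coordinates so that consecutive ranks fill one 2x2x1 block before moving on, with Z varying slowest; it does not attempt the hop-count minimization along Z mentioned above.

```python
from itertools import product


def block_ordered_coords(shape, block=(2, 2, 1)):
    """Return (x, y, z) coordinates in rank order, filling one block at a time.

    `shape` is a hypothetical logical node grid (nx, ny, nz); `block` is the
    block size within which consecutive ranks are kept together.
    """
    nx, ny, nz = shape
    bx, by, bz = block
    if nx % bx or ny % by or nz % bz:
        raise ValueError("shape must be a multiple of block in every dimension")

    coords = []
    # Outer loop walks over block origins; z varies slowest, x fastest.
    for oz, oy, ox in product(range(0, nz, bz), range(0, ny, by), range(0, nx, bx)):
        # Inner loop enumerates the nodes inside the current block.
        for dz, dy, dx in product(range(bz), range(by), range(bx)):
            coords.append((ox + dx, oy + dy, oz + dz))
    return coords


def format_rank_map(coords):
    """Render one '(x,y,z)' line per rank (assumed rank-map file layout)."""
    return "\n".join(f"({x},{y},{z})" for x, y, z in coords) + "\n"


# Toy example: a 4x4x2 grid, so ranks 0-3 share the first 2x2x1 block.
rank_map = block_ordered_coords((4, 4, 2))
rank_map_text = format_rank_map(rank_map)
```

With this ordering, an application that exchanges most of its data among groups of four consecutive ranks keeps that traffic inside a single block; a real mapping would also have to account for the physical Tofu-D axes and the job's allocated node shape.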
Future work includes further enhancement of tuning methodologies and broader support for full- and half-node-scale simulations.
Contributors: