

Energy-Efficient GPU Allocation and Frequency Management in Exascale Computing Systems
Wednesday, June 11, 2025 9:25 AM to 9:50 AM · 25 min. (Europe/Berlin)
Hall F - 2nd floor
Research Paper
Energy Management · Heterogeneous System Architectures · Resource Management and Scheduling · Runtime Systems for HPC · Sustainability and Energy Efficiency
Information
Heterogeneous CPU-GPU architectures dominate high-performance computing (HPC), but their growing power demands pose significant challenges at exascale. Dynamic voltage and frequency scaling (DVFS) is the most efficient technique for managing power consumption, but its effectiveness is limited by hardware variability and aging effects, which introduce power and performance heterogeneity across nominally identical GPUs. To address this, we propose V-FORGE, a variability- and frequency-aware optimization framework for improving GPU energy efficiency in HPC environments. V-FORGE integrates power-performance variability profiling, a Random Forest classifier for GPU frequency prediction, and dynamic optimization to select the most suitable GPU node and frequency for each application. In experiments on 400 AMD MI250X GPUs across fifteen HPC applications, V-FORGE improves combined performance and energy efficiency, measured by the energy-delay product (EDP), by up to 41% compared to default execution of HPC applications, and over 73% of its solutions fall within the top 1% of configurations found via exhaustive search. Additionally, V-FORGE adapts to dynamic environments by efficiently handling scenarios where the ideal GPU node is unavailable.
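The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the V-FORGE implementation: it assumes a small hypothetical profile table mapping each (GPU, frequency) pair to a measured runtime and energy, computes EDP as energy multiplied by runtime, and picks the lowest-EDP pair among the GPUs that are currently available. The names `edp`, `best_config`, and `profile`, the GPU identifiers, and all numbers are illustrative assumptions. The Random Forest frequency predictor is not modeled here; the sketch stands in for it with a plain search over the table.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical profile table (illustrative numbers, not measured data):
# (gpu_id, frequency_mhz) -> (runtime_s, energy_j)
Config = Tuple[str, int]
Profile = Dict[Config, Tuple[float, float]]


def edp(runtime_s: float, energy_j: float) -> float:
    """Energy-delay product: energy multiplied by runtime (lower is better)."""
    return energy_j * runtime_s


def best_config(profile: Profile,
                available: Optional[Set[str]] = None) -> Optional[Config]:
    """Return the (gpu_id, frequency_mhz) pair with the lowest EDP.

    If `available` is given, only GPUs in that set are considered, which
    mimics falling back when the ideal GPU is busy.
    """
    candidates = {
        cfg: meas for cfg, meas in profile.items()
        if available is None or cfg[0] in available
    }
    if not candidates:
        return None
    return min(candidates, key=lambda cfg: edp(*candidates[cfg]))


# Two nominally identical GPUs that differ in power and performance.
profile: Profile = {
    ("gpu0", 1700): (10.0, 5000.0),
    ("gpu0", 1400): (11.0, 4000.0),
    ("gpu1", 1700): (10.5, 5200.0),
    ("gpu1", 1400): (11.5, 4100.0),
}

ideal = best_config(profile)                        # all GPUs free
fallback = best_config(profile, available={"gpu1"})  # gpu0 is busy

# Relative EDP improvement over a hypothetical default (gpu0 at 1700 MHz).
default_edp = edp(*profile[("gpu0", 1700)])
improvement = 1.0 - edp(*profile[ideal]) / default_edp
```

In this toy table the lower frequency on `gpu0` wins because the energy saving outweighs the longer runtime; when `gpu0` is excluded, the same rule selects the best remaining option on `gpu1`.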
Contributors:
Format
On Demand · On Site
Documents & Links
Read the Full Paper Open Access at IEEE Xplore!




