UoPC: A User-Based Online Framework to Predict Job Power Consumption in HPC Systems

Wednesday, June 11, 2025 1:00 PM to 1:25 PM · 25 min. (Europe/Berlin)
Hall F - 2nd floor
Research Paper
Energy Management, ML Systems and Tools, Resource Management and Scheduling, System and Performance Monitoring

Information

Accurate workload power prediction is fundamental to addressing environmental sustainability challenges in modern high-performance computing (HPC) systems. While existing Machine Learning (ML) approaches are effective, they retain some limitations in production environments.
To address these limitations, we introduce UoPC, a user-based online framework for predicting job power consumption in HPC systems. UoPC leverages ML-based predictive models tailored to individual users, removing the need for large volumes of data and extensive training. It offers a user-friendly Python implementation suitable both for direct use by end users and for integration into workload management systems.
We evaluate UoPC on more than 1.3 million jobs executed on Fugaku, a supercomputer hosted at RIKEN, demonstrating its effectiveness. UoPC achieves a prediction error of only 10%, with minimal overhead on system operations. By employing a k-nearest neighbours (KNN) prediction model augmented with Natural Language Processing (NLP), UoPC streamlines the prediction process for newly submitted jobs. It requires only limited historical data, making it practical for diverse HPC environments and workloads.
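
To make the idea of a per-user, online KNN predictor with text-derived features more concrete, here is a minimal sketch in plain Python. It is not the UoPC implementation: the class name `UserKNNPowerPredictor`, the use of the job name as the only feature, the token-count representation, cosine similarity, and the default `k=3` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split a job descriptor (here: a job name) into lowercase word and digit tokens."""
    return re.findall(r"[a-z]+|\d+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class UserKNNPowerPredictor:
    """Hypothetical per-user online KNN predictor (illustrative only)."""

    def __init__(self, k=3):
        self.k = k
        # user -> list of (token counts, measured average power in watts)
        self.history = defaultdict(list)

    def update(self, user, job_name, power_watts):
        """Record a finished job so later predictions for this user can use it."""
        self.history[user].append((Counter(tokenize(job_name)), power_watts))

    def predict(self, user, job_name):
        """Return the similarity-weighted mean power of the user's k most similar jobs."""
        past = self.history.get(user)
        if not past:
            return None  # no history for this user yet
        query = Counter(tokenize(job_name))
        scored = sorted(
            ((cosine(query, vec), power) for vec, power in past),
            key=lambda pair: pair[0],
            reverse=True,
        )[: self.k]
        total = sum(sim for sim, _ in scored)
        if total == 0:
            # No token overlap at all: fall back to an unweighted mean.
            return sum(power for _, power in scored) / len(scored)
        return sum(sim * power for sim, power in scored) / total


# Example usage with made-up job names and power values.
model = UserKNNPowerPredictor(k=2)
model.update("alice", "lammps_run_01", 100.0)
model.update("alice", "lammps_run_02", 120.0)
model.update("alice", "gromacs_md", 200.0)
estimate = model.predict("alice", "lammps_run_03")
```

Because each user has a separate history and `update` simply appends a completed job, a model of this kind needs no offline training phase; whether UoPC uses the same features, distance measure, or weighting is not stated in the abstract.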
Format
On Demand, On Site