

Bridging the Gap Between Genericity and Programmability of Dynamic Resources in HPC
Wednesday, June 11, 2025 11:10 AM to 11:35 AM · 25 min. (Europe/Berlin)
Hall F - 2nd floor
Research Paper
Resource Management and Scheduling · Runtime Systems for HPC
Information
With the increasing scale of High-Performance Computing (HPC) systems and a new awareness of the environmental impact of HPC, new strategies are required to improve resource usage efficiency on these systems.
One such strategy is Dynamic Resource Management (DRM), which allows the resources assigned to a job to change during its execution.
This flexibility can improve metrics such as productivity (completed jobs per unit of time), resource utilization rate, and energy consumption.
Despite its benefits, DRM remains complex to implement, which is one reason it has not yet been widely adopted in production HPC systems. To address this challenge, we design, implement, and evaluate a methodology that improves the programmability of DRM for applications based on iterative algorithms while maintaining a generic and flexible foundation. Specifically, we interface two components. The first is the Dynamic Management of Resources API (DMR-API), an application-level abstraction layer that simplifies the adoption of dynamic resources, particularly in classical iteration-based HPC applications. The second is the Dynamic Processes with PSets (DPP) approach, which provides a set of generic design principles for dynamic resource management in high-performance parallel programming models. This integration makes DRM more accessible for iteration-based HPC applications through the DMR-API, while preserving the generality and flexibility of DPP at the system software level to support a wider range of HPC application types.
Our results show that DRM can be effectively leveraged in HPC environments with minimal coding effort, unlocking the benefits of dynamic resource allocation for job throughput and system utilization.
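As a rough illustration of why iteration-based applications suit dynamic resources, the sketch below simulates an iterative computation that checks for a resize decision at the end of every iteration and redistributes its data when the worker count changes. It is a toy model only: it does not use the DMR-API or DPP, and every name in it (`partition`, `run_malleable`, `example_policy`) is hypothetical and chosen for illustration.

```python
from typing import Callable, List, Tuple


def partition(data: List[float], n_workers: int) -> List[List[float]]:
    """Split data into n_workers contiguous, near-equal chunks."""
    base, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def run_malleable(
    data: List[float],
    n_workers: int,
    iterations: int,
    decide: Callable[[int, int], int],
) -> Tuple[List[float], List[int]]:
    """Run an iterative computation whose worker count may change.

    `decide(iteration, current_workers)` is a hypothetical stand-in for
    the resource manager's decision; it returns the new worker count.
    """
    chunks = partition(data, n_workers)
    history = []  # worker count used in each iteration
    for it in range(iterations):
        # Compute step: each simulated worker updates its own chunk.
        chunks = [[x + 1.0 for x in chunk] for chunk in chunks]
        history.append(len(chunks))

        # Reconfiguration point: the iteration boundary is a natural,
        # consistent place to gather and redistribute the data.
        target = decide(it, len(chunks))
        if target != len(chunks):
            flat = [x for chunk in chunks for x in chunk]
            chunks = partition(flat, target)
    return [x for chunk in chunks for x in chunk], history


def example_policy(iteration: int, current: int) -> int:
    # Assumed policy: expand to 4 workers after iteration 1,
    # then shrink back to 2 workers after iteration 3.
    return {1: 4, 3: 2}.get(iteration, current)


result, history = run_malleable([0.0] * 10, 2, 5, example_policy)
print(history)
```

The point of the sketch is that the application logic only needs one well-defined hook per iteration; the decision itself and the mechanics of adding or removing processes would sit in the lower layers.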
Contributors:
Format
On Demand · On Site
Documents & Links
Read the Full Paper Open Access at IEEE Xplore!

