Sharing Experiences and Challenges in the Dynamic Use of Resources in HPC/AI

Tuesday, June 23, 2026 4:00 PM to 5:00 PM · 1 hr. (Europe/Berlin)
Hall G1 - 2nd Floor
Birds of a Feather
Community Engagement · Optimizing for Energy and Performance · Resource Management and Scheduling · Runtime Systems for HPC

Information

Static resource allocation and management, as used today in the classical batch scheduling systems prevalent on HPC/AI systems, constrains further growth in supercomputing, and breaking out of these constraints is essential. Traditional schedulers do not efficiently account for temporal variations in workload intensity among jobs running on large-scale resources. As a result, resources are underused from a system-wide perspective. In contrast, dynamic scheduling and resource management frameworks can dynamically optimize the allocation of compute, memory, network, and I/O resources. They do so through adaptive or predictive scheduling algorithms. These algorithms draw on advanced, potentially AI-based data analytics and prediction methods, which are fed by real-time system monitoring and telemetry provided by the applications.

This BoF will discuss the potential and challenges of dynamic scheduling and resource management from a practical perspective. The discussion will be driven by existing use cases in which these approaches can significantly improve utilization rates and decrease energy consumption and operating costs. They can thereby reduce the carbon footprint of HPC/AI systems and make their operation more affordable and sustainable.

The proposed BoF will consist of a short "lightning" talk (up to 3 minutes) by each speaker, followed by a discussion between the panel of presenters and the audience. The objective is to engage the audience with practical examples that resonate with the challenges attendees face in their own work. We expect this format to encourage audience participation and make the session more interactive.

The proposed BoF will address fundamental research challenges, including:

- effective and efficient dynamic scheduling;
- resource management and control methods;
- minimizing data transfers by achieving data co-location;
- most importantly, enabling applications to interact with dynamic scheduling and resource management and to realize its benefits, which in turn will require judicious co-design.

The research directions discussed will be placed in the context of leading-edge strategies, tools, and methodologies for achieving highly optimized, energy-efficient, and green, sustainable use of large-scale HPC/AI resources. The panel of presenters consists of internationally leading experts who will drive a relevant and meaningful discussion with the BoF participants.

Achieving effective and efficient dynamic scheduling and resource utilization requires changes across the entire HPC software stack, and potentially innovative hardware support. Consequently, this BoF will look for collaborative opportunities to advance research and development. This includes fostering partnerships between academia, industry, and government. Such partnerships would develop scalable open-source tools for dynamic scheduling and resource management. They would also ensure that the required support is integrated into important software and standards such as MPI, OpenMP, Flux, and AI frameworks. This will pave the way for the next generation of sustainable HPC/AI technologies.
Organizers:
Format
on-site
Targeted Audience
We bring together researchers from diverse areas of HPC, AI, and data analytics who are affected by or actively pursuing dynamic scheduling or resource management concepts. In particular, the session targets HPC/AI end users, application developers, system software researchers, system architects, and operators of HPC/AI supercomputers.
BoF Format
Birds of a Feather Presentation