Scheduling HPC and AI Workloads the Cloud-Native Way

Monday, June 22, 2026 9:00 AM to 1:00 PM · 4 hr. (Europe/Berlin)
Hall X7 - 1st Floor
Tutorial
AI Applications powered by HPC Technologies · AI Factories · HPC in the Cloud and HPC Containers · Resource Management and Scheduling · Runtime Systems for HPC

Information

High performance computing (HPC) practitioners and AI researchers have long relied on traditional schedulers such as Slurm, PBS, and LSF to manage workloads. While these systems remain foundational, they can struggle to accommodate the growing convergence of simulation, data analytics, and AI/ML workflows — particularly when heterogeneous accelerators and cloud environments come into play.

This tutorial explores cloud-native approaches to workload scheduling using Kubernetes, focusing on how modern tools can complement and extend traditional HPC schedulers. We will cover both conceptual and practical dimensions:
- The state of scheduling today: where traditional HPC tools succeed and where they fall short for modern AI/ML workflows.
- Running Slurm on Kubernetes with the Slinky project: extending familiar HPC scheduling concepts into a container-native environment.
- Using Kueue with Kubernetes: a purpose-built cloud-native job scheduler for AI and HPC-style batch workloads.
- Hybrid scheduling models: how to integrate simulation pipelines, GPU-heavy ML training, and inference workloads into a unified Kubernetes platform.
- Real-world use cases: lessons from deployments in scientific, aerospace, and space computing environments.
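
To give a flavor of the Kueue approach discussed above, the sketch below shows a minimal batch workload submitted to a Kueue-managed queue. This is an illustrative example, not tutorial material: the queue name `user-queue` and the job details are placeholders, and an administrator would first have to create the corresponding Kueue `ClusterQueue` and `LocalQueue` resources.

```yaml
# Minimal Kubernetes Job targeting a Kueue LocalQueue.
# The kueue.x-k8s.io/queue-name label tells Kueue which queue
# should admit the job; "user-queue" is a placeholder name.
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-training-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  # Kueue unsuspends the Job once quota is available in the queue.
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: busybox  # placeholder for a real training image
          command: ["sh", "-c", "echo training step && sleep 10"]
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```

Kueue admits the Job against the queue's quota and flips `suspend` to `false`, at which point the regular Kubernetes scheduler places the pods; this division of labor between quota management and pod placement is one of the hybrid scheduling patterns the tutorial covers.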
Format
on-site
Targeted Audience
HPC practitioners and researchers, System administrators and architects, Developers and data scientists
Beginner Level
100%
Prerequisites
Laptop