Online Deep Learning Training and Inference in HPC Programs with TorchFort Library

Monday, June 22, 2026 2:00 PM to 6:00 PM · 4 hr. (Europe/Berlin)

Hall X9 - 1st Floor

Tutorial

AI Applications powered by HPC Technologies · Engineering · HPC Simulations enhanced by Machine Learning · ML Systems and Frameworks · Physics

Information

Researchers are using numerical simulation data to train deep learning (DL) models for a wide variety of tasks. These include surrogate models for efficient parameter-space exploration, regression models for approximating numerics, generative models for super-resolution and reinforcement learning (RL) models for control. However, as simulations are run at increasingly high resolutions, the volume of data they produce explodes and becomes difficult to harness for deep learning. For example, data from a high-resolution direct numerical simulation (DNS) in computational fluid dynamics (CFD) can amount to hundreds of GB per time snapshot. To circumvent this, we can adopt an online training approach, in which the DL training process runs concurrently with the simulation and reads the training data directly from memory, without storing it to disk. Online training is also a natural framework for reinforcement learning applications, which require interaction between the agent and the simulation environment.

Fortran and C/C++ HPC codes underpin the majority of scientific computing applications, whereas deep learning is dominated by Python. In this tutorial, we will show how to use the TorchFort library to perform online DL training and inference within Fortran- and C++-based numerical simulation programs. The tutorial is structured as follows. First, a lecture covers the most common techniques, model architectures and applications in AI for Science. The lecture also delves deeper into the online (in-situ) learning approach and details the TorchFort library. The last two hours of the tutorial are dedicated to a series of hands-on exercises in which participants are guided through implementing online training and inference within a real Fortran-based simulation code.

Prerequisite: We will use the NVIDIA Brev platform to run the exercises. Before the tutorial, please email the teaching assistant, Benet Eiximeno (beiximeno@nvidia.com), to receive an invitation to register and join the tutorial group on Brev.

Format

on-site

Targeted Audience

Numerical simulation researchers and scientific AI researchers, in particular those interested in combining Fortran- and C++-based HPC codes with AI capabilities.

Beginner Level

50%

Intermediate Level

50%

Prerequisites

Participants should bring their own laptop. We will arrange a compute platform for the duration of the tutorial, together with a containerised environment that includes pre-built TorchFort-enabled applications for participants to modify.

Speakers

Niki Loppi

Sr. AI/HPC Solutions Architect, NVIDIA

Benet Eiximeno

Sr. AI/HPC Solutions Architect, NVIDIA

Frédéric Parienté

Solutions Architect Director, NVIDIA

Registered attendees

Benjamin Starostka Jakobsen

Compute Coordinator, Pioneer Centre for Artificial Intelligence

Christian Kracher Fischer

Project Manager, Universität Wien

Desara Papa

Student, EPCC - The University of Edinburgh