Performance Engineering for Sparse Linear Solvers

Monday, June 22, 2026 9:00 AM to 1:00 PM · 4 hr. (Europe/Berlin)

Hall X5 - 1st Floor

Tutorial

Development of HPC Skills · Mixed Precision · Optimizing for Energy and Performance · Parallel Numerical Algorithms · Performance and Resource Modeling

Information

This tutorial covers code analysis, performance modeling, and optimization for sparse linear solvers on CPU and GPU nodes. Performance engineering is often taught using simple loops as instructive examples for performance models and how they can guide optimization. Full preconditioned linear solvers, however, comprise multiple back-to-back loops enclosed in an iteration scheme that is executed until convergence is achieved. Consequently, the concept of “optimal performance” has to account for both hardware resource efficiency and iterative solver convergence. We convey a performance engineering process that is geared towards linear iterative solvers. After introducing basic notions of hardware organization and storage for dense and sparse data structures, we show how the Roofline performance model can be applied to such solvers in predictive and diagnostic ways and how it can be used to assess the hardware efficiency of a solver, covering important corner cases such as pure memory boundedness. Then we advance to the structure of preconditioned solvers, using the Conjugate Gradient (CG) method as a leading example. Hotspots and bottlenecks of the complete solver are identified, followed by the introduction of advanced algorithmic and implementation-centric optimization techniques such as preconditioners and cache blocking. Special attention is given to the interplay among solver performance, convergence, and actual time to solution. In hands-on exercises, attendees will be able to carry out experiments on a GPU cluster and study the influence of matrix data formats, preconditioners, and cache optimizations.
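To give a flavor of the Roofline reasoning mentioned above, here is a minimal sketch of a memory-bound performance estimate for a sparse matrix-vector multiplication (SpMV) in CSR format, the dominant kernel in CG. It is not taken from the tutorial material. The traffic model (8-byte values, 4-byte column indices, a factor `alpha` for traffic on the input vector, and 20 bytes per row for the row pointer plus load and store of the result) is an assumption, as are the function names and the machine numbers in the example.

```python
def csr_spmv_code_balance(nnz_per_row, alpha=0.0, value_bytes=8, index_bytes=4):
    """Estimated memory traffic in bytes per flop for y = y + A @ x (A in CSR).

    alpha is the assumed traffic on the input vector x per nonzero, in units
    of one vector element. alpha = 1 / nnz_per_row means every element of x
    is loaded from memory exactly once.
    """
    # Per nonzero: matrix value + column index + share of x traffic.
    per_nonzero = value_bytes + index_bytes + alpha * value_bytes
    # Per row: one row-pointer entry + load and store of one element of y.
    per_row = index_bytes + 2 * value_bytes
    bytes_per_row = nnz_per_row * per_nonzero + per_row
    flops_per_row = 2.0 * nnz_per_row  # one multiply and one add per nonzero
    return bytes_per_row / flops_per_row


def roofline_gflops(code_balance, mem_bw_gbs, peak_gflops):
    """Roofline limit: minimum of peak performance and bandwidth / code balance."""
    return min(peak_gflops, mem_bw_gbs / code_balance)


# Hypothetical example: 7 nonzeros per row, ideal reuse of x,
# 200 GB/s memory bandwidth, 3000 GFlop/s peak (made-up machine numbers).
balance = csr_spmv_code_balance(nnz_per_row=7, alpha=1.0 / 7.0)
limit = roofline_gflops(balance, mem_bw_gbs=200.0, peak_gflops=3000.0)
print(f"code balance: {balance:.2f} byte/flop, Roofline limit: {limit:.1f} GFlop/s")
```

Under these assumptions the kernel needs about 8 bytes per flop, so the bandwidth limit of roughly 25 GFlop/s lies far below the assumed peak. This is the "pure memory boundedness" case: reducing data traffic, for example through a more compact matrix format or lower precision, matters more than arithmetic throughput.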

Contributors:

Christie Alappat

Format

on-site

Targeted Audience

Computational scientists (students, developers, users) who want to gain deeper insight into the hardware performance and convergence properties of the solvers they learn about, develop, and employ.

Beginner Level

30%

Intermediate Level

70%

Prerequisites

Attendees should have a grasp of simple linear solvers (e.g., conjugate gradient) and of parallel programming. The tutorial contains hands-on exercises in Python. The attendees access HPC resources via Jupyter Notebooks. Some knowledge of Python programming and Jupyter Notebook usage is advantageous but not mandatory.

Speakers

Georg Hager

Head of Research, University of Erlangen-Nuremberg, Erlangen National High Performance Computing Center

Jonas Thies

Assistant Professor, TU Delft

Hartwig Anzt

Professor, University of Tennessee, Technical University of Munich

Registered attendees

Andreas Henkel

Lead HPC Operations and User Support, Johannes Gutenberg-Universität Mainz

Eun-Kyeong Kim

HPC Software Engineer, LuxProvide S.A.

Georg Hager

Head of Research, University of Erlangen-Nuremberg, Erlangen National High Performance Computing Center