Rigorous Evaluation of LLM Components in HPC Research

Monday, June 22, 2026 2:00 PM to 6:00 PM · 4 hr. (Europe/Berlin)

Hall X12 - 1st Floor

Tutorial

Large Language Models and Generative AI in HPC

Information

Large language models (LLMs) are increasingly used as components inside HPC research workflows, in tasks ranging from code generation and translation to agentic tool use for debugging, profiling, and experiment orchestration. While these systems can accelerate development, they also introduce new challenges for rigorous evaluation: outputs are stochastic, behavior is sensitive to prompts and configuration, commercial models change silently over time, and the true contribution of the LLM is often confounded with many other experimental factors. As a result, a growing number of papers report results from LLM-based studies without sufficient transparency, suitable baselines, or statistically sound measurements.

This tutorial will teach participants a practical framework for rigorous evaluation and reporting of LLM components in HPC research. We cover what must be reported for reproducibility, how to design experiments around stochastic components, and when and how to incorporate human validation. We then focus on HPC-specific evaluation and the unique challenges of using LLMs in HPC research. Throughout, we highlight common pitfalls and provide guidance and templates that participants can apply directly to their own projects and papers.
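As a rough illustration of what designing experiments around a stochastic component can look like, the sketch below scores an LLM code generator by sampling several outputs per problem and reporting an aggregate with a confidence interval rather than a single run. It assumes the widely used unbiased pass@k estimator and a percentile bootstrap over problems; these are illustrative choices, not necessarily the methods taught in the tutorial, and the `results` data is hypothetical.

```python
import random
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, of which c passed the tests."""
    if n < k:
        raise ValueError("need at least k samples per problem")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def bootstrap_ci(values, iters=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-problem scores."""
    rng = random.Random(seed)  # fixed seed so the reported interval is reproducible
    means = sorted(mean(rng.choices(values, k=len(values))) for _ in range(iters))
    lo = means[int((alpha / 2) * iters)]
    hi = means[int((1 - alpha / 2) * iters) - 1]
    return lo, hi


# Hypothetical results: (samples generated, samples that passed) for each problem.
results = [(20, 20), (20, 5), (20, 0), (20, 12)]
scores = [pass_at_k(n, c, k=1) for n, c in results]
lo, hi = bootstrap_ci(scores)
print(f"pass@1 = {mean(scores):.3f} (95% bootstrap CI {lo:.3f} to {hi:.3f})")
```

Resampling over problems rather than over individual samples reflects the assumption that problems, not repeated generations, are the unit of generalization; with only a handful of problems the resulting interval will be wide, which is itself useful to report.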

Format

on-site

Targeted Audience

The tutorial is aimed at HPC developers, researchers, and students who use or benefit from AI in their workflows, from beginners to experienced users of LLMs.

Beginner Level: 90%

Intermediate Level: 10%

Prerequisites

We will provide hands-on demos to help participants understand the concepts, but they are not essential to learning the material. Only a modern laptop with an internet connection is needed to follow along with the tutorial.

Speakers

Daniel Nichols

Postdoc, Home

Abhinav Bhatele

Associate Professor, University of Maryland

Harshitha Menon

Research Scientist, Lawrence Livermore National Laboratory

Registered attendees

Andre Sternbeck

Head of Digital Research Services, Friedrich-Schiller-University Jena

Arne Dag Fidjestøl

Head of IT Infrastructure and Operations, NTNU - Norwegian University of Science and Technology

Bryon Foster

Director, AFRL