

Agent 4 Science: Bringing AI Agents into Scientific Workflows and HPC Simulation
Thursday, June 25, 2026 10:40 AM to 11:00 AM · 20 min. (Europe/Berlin)
Hall H, Booth L01 - Ground Floor
HPC Solutions Forum
HPC Simulations enhanced by Machine Learning · Large Language Models and Generative AI in HPC · Optimizing for Energy and Performance
Information
Large language models have demonstrated remarkable capabilities in code understanding and generation, yet their integration into HPC and scientific computing workflows remains largely unexplored. The gap is not in what LLMs know about
optimization — they possess broad knowledge of compiler flags, vectorization, memory hierarchy, and algorithmic techniques — but in how to let them act reliably on real codebases with real performance constraints.
In this talk, we present our experience building and iterating on an LLM-driven agent system for automated HPC performance optimization. The agent autonomously profiles applications, forms structured hypotheses about performance bottlenecks, applies source-level and build-level optimizations, and verifies results through statistically rigorous benchmarking — all while maintaining a separation between immutable experimental facts and the agent's evolving interpretations. We describe the key architectural decisions that make this practical: a trust boundary that grants the LLM full strategic autonomy while enforcing execution safety through a minimal harness; a multi-tier memory system that prevents the agent from rewriting history; and search discipline mechanisms — including hypothesis-driven experimentation, exploration tracking, and automated narrative auditing — that keep the agent from going in circles.
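The separation between immutable experimental facts and the agent's evolving interpretations can be pictured as an append-only fact log beside a freely rewritable interpretation layer. The sketch below is purely illustrative (all class and field names are our own assumptions, not the system described in the talk):

```python
import time
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Fact:
    """One immutable experimental observation, e.g. a single benchmark run."""
    experiment_id: str
    runtime_s: float
    timestamp: float

@dataclass
class Memory:
    """Facts are append-only; interpretations may be revised at any time."""
    _facts: list = field(default_factory=list)
    interpretations: dict = field(default_factory=dict)

    def record(self, experiment_id: str, runtime_s: float) -> None:
        # Facts can only be appended, never edited or removed.
        self._facts.append(Fact(experiment_id, runtime_s, time.time()))

    def facts(self) -> tuple:
        # Read-only view: the agent cannot rewrite history.
        return tuple(self._facts)

    def interpret(self, experiment_id: str, hypothesis: str) -> None:
        # Interpretations are the agent's mutable working theory.
        self.interpretations[experiment_id] = hypothesis

mem = Memory()
mem.record("baseline", 12.4)
mem.record("O3-pgo", 10.1)
mem.interpret("O3-pgo", "PGO improved branch layout in the hot loop")
```

The design choice this illustrates: the agent may revise what an experiment *means* as its understanding evolves, but the measured numbers themselves stay fixed, which is what keeps later reasoning auditable.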
We report results on real HPC applications such as BWA (genomic sequence alignment), discussing both where the agent succeeds (identifying PGO opportunities, build-flag tuning, targeted source patches) and where it hits fundamental limits (serial data dependencies, DRAM-bound workloads). We reflect on what this approach reveals about the current capabilities and limitations of LLM agents in scientific computing, and outline directions for cross-application experience transfer and human-in-the-loop collaboration at the strategic level.
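Verifying a candidate optimization "through statistically rigorous benchmarking" amounts, at minimum, to comparing repeated timings rather than single runs. A minimal sketch under our own assumptions (the function name, threshold, and noise criterion are illustrative, not the actual acceptance rule used):

```python
import statistics

def accept_optimization(baseline_s, optimized_s, min_speedup=1.02):
    """Accept a change only if the median speedup clears a threshold AND
    the improvement exceeds the combined run-to-run noise of both sets."""
    base_med = statistics.median(baseline_s)
    opt_med = statistics.median(optimized_s)
    speedup = base_med / opt_med
    noise = statistics.stdev(baseline_s) + statistics.stdev(optimized_s)
    return speedup >= min_speedup and (base_med - opt_med) > noise

# Five repeated wall-clock timings per configuration (seconds).
baseline = [12.40, 12.55, 12.48, 12.61, 12.45]
optimized = [10.10, 10.22, 10.15, 10.31, 10.18]
accept_optimization(baseline, optimized)  # clear, consistent win -> accepted
```

A gate like this is what lets the agent discard changes whose apparent gains are within measurement noise, which matters especially for the DRAM-bound workloads mentioned above, where run-to-run variance can easily swamp small improvements.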
HPC Solutions Forum Questions
What is the best way to keep advancing HPC in an AI-driven world?
Format
on-site
Speakers

Di Wang
AIHPC Expert, KAYTUS