Wattlytics: Peak FLOPS Don’t Buy the Most Science per Euro

Wednesday, June 24, 2026 3:45 PM to 5:15 PM · 1 hr. 30 min. (Europe/Berlin)

Foyer D-G - 2nd Floor

Research Poster

Energy Efficiency and Sustainability · Extreme-scale Systems · Optimizing for Energy and Performance · Performance and Resource Modeling · Performance Tools and Simulators

Information

Poster is on display and will be presented at the poster pitch session.

##Problem Statement

Modern GPU-accelerated HPC systems are constrained by performance, energy consumption, total cost of ownership (TCO), sustainability mandates, and long system lifetimes. Peak floating-point performance (FLOPS) alone does not reflect the realities of contemporary HPC deployments such as power caps, volatile electricity prices, and fixed budgets.
Procurement often relies on isolated metrics (peak FLOPS, TDP, or TCO estimates) and ignores the effects that determine lifetime scientific output: workload-dependent performance scaling, DVFS, limited application scalability, deployment constraints, and uncertainty in economic and operational parameters. Systems optimized for peak FLOPS can suffer diminishing returns: higher per-device FLOP rates can increase power draw and costs, limit the number of deployable GPUs, and reduce total scientific work. This motivates the central question: Which GPUs and deployment strategies maximize scientific output per euro under realistic constraints?

##Research Approach

We present Wattlytics, an open, browser-based framework integrating performance, power, and cost modeling into a single, uncertainty-aware pipeline. It combines workload-driven performance benchmarks, DVFS-aware power modeling, and multi-year TCO accounting covering capital and operational costs. Sensitivity and uncertainty analysis using elasticity metrics, Sobol indices, and Monte Carlo sampling quantifies robustness to variations in prices, efficiencies, and deployment assumptions. Wattlytics identifies not only the best GPU, but why, under which constraints, and how stable the decision is, revealing decision reversals missed by single-metric approaches.
Wattlytics supports modern GPUs (GH200, H100, L40S, L40, A100, A40, L4) and real scientific workloads (GROMACS, AMBER). It is FAIR-aligned and enables reproducible, interactive what-if exploration in the browser.
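
The Monte Carlo part of such a pipeline can be sketched as follows. This is a minimal illustration, not the Wattlytics implementation: the model `work_per_euro`, the function `win_probability`, the sampling ranges, and all GPU numbers are hypothetical placeholders chosen only to make the example run.

```python
import random

HOURS_PER_YEAR = 8760

def work_per_euro(perf_per_year, capex_eur, power_w, eur_per_kwh, pue, years):
    """Lifetime work divided by per-GPU TCO (capital plus facility energy cost)."""
    energy_kwh = power_w / 1000 * HOURS_PER_YEAR * years * pue
    return perf_per_year * years / (capex_eur + energy_kwh * eur_per_kwh)

def win_probability(gpu_a, gpu_b, n_samples=10_000, seed=0):
    """Fraction of sampled scenarios in which gpu_a beats gpu_b on work per euro.

    Each GPU is a tuple (perf_per_year, capex_eur, power_w).
    The uniform ranges below are assumptions, not measured distributions.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        eur_per_kwh = rng.uniform(0.10, 0.50)  # assumed electricity price range
        pue = rng.uniform(1.1, 1.6)            # assumed PUE range
        years = rng.uniform(3.0, 7.0)          # assumed system lifetime range
        a = work_per_euro(*gpu_a, eur_per_kwh, pue, years)
        b = work_per_euro(*gpu_b, eur_per_kwh, pue, years)
        wins += a > b
    return wins / n_samples

# Hypothetical devices: (work units per year, purchase price in EUR, power in W).
flagship = (100.0, 30_000.0, 700.0)
efficient = (40.0, 7_000.0, 300.0)
print(win_probability(efficient, flagship))
```

A win probability near 0 or 1 suggests a stable decision, while values near 0.5 indicate that the ranking depends on the uncertain parameters.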

##Key-Results

We evaluate the triple optimization challenge of performance, power, and TCO under realistic deployment constraints. Wattlytics supports four deployment modes (fixed budget, fixed power envelope, fixed performance target, fixed GPU count), demonstrating that optimal GPU choices strongly depend on the dominant constraint.
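
How the dominant constraint determines the number of GPUs can be sketched as below. The `Gpu` fields, the mode names, and the function `gpus_deployable` are hypothetical and only illustrate the four modes named above; the fixed-performance case assumes perfect multi-GPU scaling, which real workloads will not reach.

```python
import math
from dataclasses import dataclass

@dataclass
class Gpu:
    name: str
    lifetime_cost_eur: float  # assumed per-GPU TCO (capital plus operation)
    power_w: float            # assumed per-GPU power draw under load
    perf: float               # assumed per-GPU performance, e.g. ns/day

def gpus_deployable(gpu, mode, limit):
    """Return the GPU count implied by one deployment mode and its limit."""
    if mode == "fixed_budget":       # limit is a budget in EUR
        return math.floor(limit / gpu.lifetime_cost_eur)
    if mode == "fixed_power":        # limit is a power envelope in W
        return math.floor(limit / gpu.power_w)
    if mode == "fixed_performance":  # limit is an aggregate performance target
        return math.ceil(limit / gpu.perf)
    if mode == "fixed_count":        # limit is the GPU count itself
        return int(limit)
    raise ValueError(f"unknown deployment mode: {mode}")

example = Gpu("hypothetical-gpu", lifetime_cost_eur=20_000.0, power_w=400.0, perf=100.0)
for mode, limit in [("fixed_budget", 1_000_000), ("fixed_power", 10_000),
                    ("fixed_performance", 950), ("fixed_count", 8)]:
    print(mode, gpus_deployable(example, mode, limit))
```

Budget and power limits cap the count from above, so they round down; a performance target sets a minimum count, so it rounds up.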
Several non-obvious insights emerge. First, energy-efficient GPUs often deliver more lifetime scientific work than flagship accelerators. For molecular dynamics workloads such as GROMACS, Wattlytics shows up to about a four-fold increase in lifetime work per TCO under a fixed €10M budget when selecting energy-efficient GPUs instead of top-end accelerators. These gains result from lower acquisition costs, reduced power draw, and the ability to deploy substantially more GPUs within fixed budgets and power envelopes, despite moderate per-GPU performance disadvantages.
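
The mechanism behind this effect can be illustrated with a small fixed-budget calculation. All device numbers, the electricity price, the PUE, and the lifetime below are invented placeholders, not Wattlytics data or results, and the sketch assumes throughput-style workloads in which total work scales linearly with GPU count.

```python
HOURS_PER_YEAR = 8760

def unit_tco_eur(capex_eur, power_w, years, eur_per_kwh, pue):
    """Per-GPU TCO: purchase price plus facility-level energy cost over the lifetime."""
    energy_kwh = power_w / 1000 * HOURS_PER_YEAR * years * pue
    return capex_eur + energy_kwh * eur_per_kwh

def lifetime_work_under_budget(perf_per_day, capex_eur, power_w, budget_eur,
                               years=5, eur_per_kwh=0.25, pue=1.2):
    """Return (GPU count, lifetime work) achievable within a fixed TCO budget."""
    n_gpus = int(budget_eur // unit_tco_eur(capex_eur, power_w, years, eur_per_kwh, pue))
    work = n_gpus * perf_per_day * 365 * years  # assumes linear scaling with GPU count
    return n_gpus, work

BUDGET_EUR = 10_000_000
# Hypothetical devices: the flagship is 2.5x faster per GPU but far more expensive.
n_flag, work_flag = lifetime_work_under_budget(100.0, 30_000.0, 700.0, BUDGET_EUR)
n_eff, work_eff = lifetime_work_under_budget(40.0, 7_000.0, 300.0, BUDGET_EUR)

print("flagship: ", n_flag, "GPUs,", work_flag / BUDGET_EUR, "work units per euro")
print("efficient:", n_eff, "GPUs,", work_eff / BUDGET_EUR, "work units per euro")
```

With these placeholder inputs the slower device yields more lifetime work because many more units fit into the same budget; different prices or efficiencies can reverse the outcome.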
Second, GPU rankings are highly sensitive to second-order system effects that are typically ignored. Even small multi-GPU parallel efficiency losses (~0.5%), modest changes in frequency, or deployment strategy changes can fully reverse procurement outcomes. Sensitivity and uncertainty analysis reveals a separation between baseline efficiency drivers and risk drivers: GPU acquisition cost dominates baseline work-per-TCO, while PUE, system lifetime and similar parameters dominate procurement risk by driving ranking reversals, despite modest impact on mean output.
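
An elasticity metric of the kind mentioned above can be approximated with a central finite difference. The sketch below is an assumption-laden illustration: the toy model `work_per_tco`, the helper `elasticity`, and the baseline values are hypothetical and much simpler than a full performance, power, and cost model.

```python
def work_per_tco(perf, capex_eur, energy_cost_eur):
    """Toy model: work delivered per euro of total cost."""
    return perf / (capex_eur + energy_cost_eur)

def elasticity(model, params, name, rel_step=0.01):
    """Relative change in output per relative change in one input.

    Uses a central difference: E = (f(x(1+h)) - f(x(1-h))) / (2 h f(x)).
    """
    base = model(**params)
    up = dict(params, **{name: params[name] * (1 + rel_step)})
    down = dict(params, **{name: params[name] * (1 - rel_step)})
    return (model(**up) - model(**down)) / (2 * rel_step * base)

# Hypothetical baseline values.
baseline = {"perf": 40.0, "capex_eur": 8_000.0, "energy_cost_eur": 2_000.0}
for name in baseline:
    print(name, round(elasticity(work_per_tco, baseline, name), 3))
```

An elasticity of 1 means the output changes in proportion to the input, while a negative value means the output falls as the input rises. In this toy model the elasticity with respect to acquisition cost is close to the negative share of capital cost in the TCO.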

##Take-Home Message and Outlook

Peak FLOPS do not buy the most science per euro. Integrated performance–power–cost modeling consistently outperforms single-metric decision making. Wattlytics turns HPC system design into a well-informed, explainable process, revealing that energy-efficient GPUs often maximize scientific output under realistic constraints.
Future work includes support for non-NVIDIA GPUs, heterogeneous clusters, CPU uncore frequency modeling, and tighter integration with schedulers and energy-aware runtime systems.

Contributors:

Format

on-demand · on-site

Speakers

Ayesha Afzal

Researcher, Friedrich-Alexander-University Erlangen-Nuremberg (FAU), Erlangen National High Performance Computing Center (NHR@FAU)

Session

Research Poster Reception

Wednesday, June 24, 2026 3:45 PM to 5:15 PM

Foyer D-G - 2nd Floor

Registered attendees

Christian Muth

Computational Scientist, Johannes Gutenberg-Universität Mainz