Hans Meuer Award Ceremony and Presentation: PICO: Performance Insights for Collective Operations

Wednesday, June 24, 2026 11:15 AM to 11:45 AM · 30 min. (Europe/Berlin)
Hall 4 - Ground Floor
Research Paper
ML Model Optimization · Networking and Interconnects · Performance Measurement · Performance Tools and Simulators · System and Performance Monitoring

Information

Collective operations are cornerstones of both HPC applications and large-scale AI training and inference, yet benchmarking them systematically and reproducibly remains difficult on modern systems because of the complexity of their hardware and software stacks. Existing suites primarily report end-to-end timings and offer limited support for controlled algorithm and configuration selection, fine-grained profiling, and capture of the runtime environment. We present PICO (Performance Insights for Collective Operations), an open-source framework that decouples portable experiment setup from platform execution, provides a backend-adaptive parameter selection interface across MPI and NCCL, supplies plain-MPI reference implementations of collectives that can optionally be instrumented, and records the system configuration for reproducible comparisons. Evaluated on three major supercomputers, PICO shows that default collective algorithms and transport settings can be up to 5x slower than the best available choice. It provides diagnostic evidence by isolating topology-sensitive algorithmic choices and, through instrumentation, reveals detailed algorithmic breakdowns. To assess the application-level impact of benchmark-informed tuning, we replay open-source LLM training traces in the ATLAHS simulator with the optimized collective profiles identified by PICO, achieving reductions in training time of up to 44%.
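For readers unfamiliar with what "algorithm selection" for a collective involves, the following is a minimal sketch of one well-known allreduce algorithm, the ring allreduce, simulated in pure Python without any MPI or NCCL calls. It is not taken from PICO; the function name `ring_allreduce` and the list-of-lists representation of per-rank buffers are assumptions made purely for illustration.

```python
def ring_allreduce(buffers):
    """Simulate a sum-allreduce over a logical ring of ranks.

    `buffers` holds one equal-length list of numbers per simulated rank
    (hypothetical representation; a real implementation would exchange
    messages between processes). Returns the per-rank result buffers.
    """
    p = len(buffers)
    if p == 0:
        return []
    n = len(buffers[0])
    if any(len(b) != n for b in buffers):
        raise ValueError("all ranks must contribute buffers of equal length")

    data = [list(b) for b in buffers]            # work on copies
    bounds = [(i * n) // p for i in range(p + 1)]  # split buffer into p chunks

    def chunk(i):
        return range(bounds[i], bounds[i + 1])

    # Phase 1, reduce-scatter: after p-1 steps, rank r holds the fully
    # reduced chunk (r + 1) % p.
    for step in range(p - 1):
        # Snapshot outgoing chunks first so all sends in a step are concurrent.
        outgoing = []
        for r in range(p):
            c = (r - step) % p
            outgoing.append((c, [data[r][j] for j in chunk(c)]))
        for r in range(p):
            dst = (r + 1) % p
            c, payload = outgoing[r]
            for j, v in zip(chunk(c), payload):
                data[dst][j] += v

    # Phase 2, allgather: circulate the reduced chunks around the ring.
    for step in range(p - 1):
        outgoing = []
        for r in range(p):
            c = (r + 1 - step) % p
            outgoing.append((c, [data[r][j] for j in chunk(c)]))
        for r in range(p):
            dst = (r + 1) % p
            c, payload = outgoing[r]
            for j, v in zip(chunk(c), payload):
                data[dst][j] = v

    return data


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, result in enumerate(ring_allreduce(ranks)):
        print(rank, result)
```

Each rank exchanges data only with its ring neighbour over 2(p-1) steps, so how well this pattern performs depends on how the ring maps onto the physical network. Other algorithms, such as tree-based or recursive-doubling variants, trade step count against message size differently, which is one reason a default choice may not be the fastest on a given system.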
Contributors:
Format
on-site
