Perceive, Predict, Perform: Architecting the Cognitive Stack for Generalist Robots

Thursday, June 25, 2026 1:32 PM to 1:59 PM · 27 min. (Europe/Berlin)

Hall Z - 3rd Floor

Invited Talk

Large Language Models and Generative AI in HPC

ML Systems and Frameworks

Information

Robotics is approaching the kind of shift that foundation models brought to language, driven by an emerging convergence of three capabilities: embodied foundation models that turn multimodal perception into action, predictive world models that learn an internal "common-sense" model of physical dynamics, and learning-based control. Brought together — still largely in research rather than in deployed systems — they point toward robots that can imagine the consequences of an action before committing to a motion.

This talk goes under the hood of that cognitive stack: how large multimodal models connect down to low-level control, and how such intelligence is made to run in real time on the robot itself, where strict latency, reliability, and energy budgets at the edge sit in tension with the compute used to train these models. The aim is a clear-eyed, hype-free view of where the field actually stands, and where open-world physical autonomy realistically leads.

Format

on-demand, on-site

Beginner Level

50%

Intermediate Level

50%