

Perceive, Predict, Perform: Architecting the Cognitive Stack for Generalist Robots
Thursday, June 25, 2026 1:32 PM to 1:59 PM · 27 min. (Europe/Berlin)
Hall Z - 3rd Floor
Invited Talk
Large Language Models and Generative AI in HPC
ML Systems and Frameworks
Information
Robotics is approaching the kind of shift that foundation models brought to language, driven by an emerging convergence of three capabilities: embodied foundation models that turn multimodal perception into action, predictive world models that learn an internal "common-sense" model of physical dynamics, and learning-based control. Brought together — still largely in research rather than in deployed systems — they point toward robots that can imagine the consequences of an action before committing to a motion.
This talk goes under the hood of that cognitive stack: how large multimodal models connect down to low-level control, and how such intelligence is made to run in real time on the robot itself, where strict latency, reliability, and energy budgets at the edge sit in tension with the compute used to train these models. The aim is a clear-eyed, hype-free view of where the field actually stands, and where open-world physical autonomy realistically leads.
Format
on-demand, on-site
Beginner Level
50%
Intermediate Level
50%


