Karol Hausman
Google DeepMind robotics lead, RT-2 co-creator
Profile
Karol Hausman spent a decade pushing the hardest problem in AI: making robots actually do things. As a Staff Research Scientist at Google DeepMind and Adjunct Professor at Stanford, he was at the center of the team that proved foundation models could control physical robots. RT-2 and SayCan — the two systems that most developers point to when explaining “LLMs meeting the real world” — both have his fingerprints on them.
In 2024 he left Google to co-found Physical Intelligence (Pi), a San Francisco startup building foundation models for robots that are hardware-agnostic and task-general. The founding team reads like a robotics dream roster: Hausman as CEO, alongside Sergey Levine, Chelsea Finn, Brian Ichter, and others from the same Google Brain/Stanford orbit. By late 2025 Pi had raised over $1B at a $5.6B valuation — one of the most closely watched bets in AI right now.
The thesis is worth understanding. Pi’s pitch is that the bottleneck in robotics is intelligence, not hardware, and that the classical pipeline of perception → planning → control is the wrong frame. Instead: one big end-to-end model, trained on heterogeneous data from many robot platforms, that takes in camera images and language instructions and outputs motor actions. Their π₀ model (October 2024) was the first serious demonstration; π₀.₅ followed with stronger open-world generalization, cleaning a kitchen it had never seen rather than a rehearsed demo environment.
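The contrast between the two framings can be sketched in a few lines of code. This is an illustrative toy, not Pi’s actual architecture or API; every name here (`perceive`, `plan`, `control`, `VLAPolicy`, the 7-DoF action shape) is a hypothetical stand-in.

```python
# Toy contrast between the classical robotics pipeline and the end-to-end
# VLA framing. All names are hypothetical illustrations, not Pi's code.
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    image: List[float]   # stand-in for camera pixels
    instruction: str     # natural-language task, e.g. "clean the counter"


# --- Classical framing: perception -> planning -> control, each stage
# --- hand-designed with its own intermediate representation.
def perceive(obs: Observation) -> dict:
    # state estimation: object detection, pose estimation, scene graph...
    return {"objects": ["cup"], "instruction": obs.instruction}


def plan(state: dict) -> List[str]:
    # symbolic planner producing a sequence of sub-goals
    return [f"grasp {obj}" for obj in state["objects"]]


def control(subgoal: str) -> List[float]:
    # low-level controller turning one sub-goal into joint commands
    return [0.1] * 7  # e.g. a 7-DoF arm action


def classical_policy(obs: Observation) -> List[float]:
    return control(plan(perceive(obs))[0])


# --- End-to-end VLA framing: one learned model maps pixels and words
# --- directly to actions, with no hand-designed intermediate stages.
class VLAPolicy:
    """Placeholder for a learned vision-language-action model."""

    def __call__(self, obs: Observation) -> List[float]:
        # A real model would run a forward pass through a vision-language
        # backbone with an action head, trained on data from many robots.
        return [0.0] * 7


obs = Observation(image=[0.0] * 64, instruction="put the cup in the sink")
print(len(classical_policy(obs)), len(VLAPolicy()(obs)))
```

The point of the sketch is the interface, not the bodies: in the classical frame, errors compound across hand-designed stage boundaries, while the VLA frame has a single learned mapping whose quality scales with data and model size.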
For developers trying to understand where AI meets the physical world, Hausman is one of the clearest voices on the shift from “clever engineering” to “scale the data, scale the model.” He explains the work without hype and is honest about what still doesn’t work. If you want to know why robot learning suddenly looks tractable after decades of false starts, start with him.
Key Articles & Papers
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan)
π₀: A Vision-Language-Action Flow Model for General Robot Control
π₀.₅: a Vision-Language-Action Model with Open-World Generalization
RT-2: New model translates vision and language into action
Why Robots Still Struggle With Simple Tasks
Karol Hausman's personal site
Spotify Podcasts