Led AlphaGo, AlphaZero — AI that masters games
David Silver
Profile
David Silver is the reinforcement learning researcher who made the world pay attention. As lead researcher on AlphaGo at Google DeepMind, he built the system that beat Lee Sedol 4-1 in 2016, a match widely described as AI's "Sputnik moment" for the public. Go had been the grand unsolved challenge of game AI for decades, with most researchers estimating superhuman play was still ten years away. Silver's team did it with deep neural networks, Monte Carlo tree search, and a brutal amount of self-play.
Then he did it again, harder. AlphaGo Zero threw out human game data entirely and learned from scratch, surpassing the version that beat Lee Sedol after just three days of training. AlphaZero generalized the method to chess and shogi, and MuZero removed the last piece of hand-coded knowledge, the rules themselves, learning a model of the environment from observations and rewards alone. If you want to understand the modern lineage of self-play, planning, and learned world models, this is the trajectory to study.
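The core idea behind this lineage, an agent improving by playing against itself and bootstrapping its own value estimates, can be sketched with a toy example. Everything below is an illustrative assumption, not anything from the AlphaZero papers: the game (Nim: take 1-3 stones, taking the last stone wins), the hyperparameters, and the use of tabular Q-learning in place of the deep networks and Monte Carlo tree search the real systems rely on.

```python
import random

def train_self_play(max_heap=10, episodes=20000, alpha=0.1, eps=0.2, seed=0):
    """Tabular self-play Q-learning for a toy Nim game (take 1-3, last stone wins).

    Both players share one value table, so every game is self-play: each
    update treats the opponent's best reply as a loss for the mover
    (a negamax-style target), the same zero-sum trick AlphaZero exploits.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(1, max_heap + 1)
         for a in (1, 2, 3) if a <= s}
    for _ in range(episodes):
        s = rng.randint(1, max_heap)
        while s > 0:
            moves = [a for a in (1, 2, 3) if a <= s]
            # epsilon-greedy: explore sometimes, otherwise play the best known move
            if rng.random() < eps:
                a = rng.choice(moves)
            else:
                a = max(moves, key=lambda m: Q[(s, m)])
            s2 = s - a
            if s2 == 0:
                target = 1.0                      # took the last stone: win
            else:
                best_reply = max(Q[(s2, b)] for b in (1, 2, 3) if b <= s2)
                target = -best_reply              # opponent's gain is our loss
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

def greedy(Q, s):
    """Best move from heap size s under the learned values."""
    return max([a for a in (1, 2, 3) if a <= s], key=lambda a: Q[(s, a)])
```

With no human examples at all, the greedy policy recovers Nim's known optimal strategy of always leaving the opponent a multiple of four stones, e.g. `greedy(Q, 5)` returns 1. That from-scratch convergence, scaled up by many orders of magnitude, is the AlphaGo Zero story.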
Silver did his PhD at the University of Alberta under Rich Sutton, the godfather of reinforcement learning, and he carries that torch. His 2021 paper "Reward Is Enough" (with Sutton and others) argues that reward maximization alone is sufficient to drive the emergence of intelligence, a philosophical bet on RL as the path to AGI that lands very differently in a world obsessed with next-token prediction. He is a Principal Research Scientist at DeepMind and teaches reinforcement learning at UCL, where his 2015 lecture series remains one of the most-watched RL courses on the internet.
For developers building with AI today, Silver matters because RL is back. RLHF powers every frontier chatbot, and the new wave of reasoning models (OpenAI's o1, Gemini's thinking modes, Anthropic's extended thinking) draws heavily on the same reinforcement learning and search ideas Silver pioneered in games. His work is the intellectual foundation for an increasingly large chunk of the frontier.
Key Articles & Papers
Mastering the game of Go with deep neural networks and tree search
Mastering the game of Go without human knowledge
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
Mastering Atari, Go, chess and shogi by planning with a learned model
Grandmaster level in StarCraft II using multi-agent reinforcement learning
Reward is enough
Deterministic Policy Gradient Algorithms
Videos
Spotify Podcasts