Transformer co-author, Character.AI founder, now back at Google
Noam Shazeer
Profile
Noam Shazeer is one of those researchers whose name shows up on an improbable number of the papers that actually matter. He’s a co-author of Attention Is All You Need — the 2017 paper that introduced the Transformer — but that’s just the most famous entry on a CV that also includes Mixture-of-Experts, Multi-Query Attention, GLU variants, Mesh-TensorFlow, T5, and the Switch Transformer. If you trace the architectural DNA of modern large language models back to its source, a surprising amount of it runs through him.
Shazeer spent roughly two decades at Google, starting in the early 2000s on search and ads before moving into Google Brain. In 2021 he left with his collaborator Daniel De Freitas, reportedly frustrated that Google wouldn't ship a conversational agent they had built internally (LaMDA-era work), and founded Character.AI, a consumer app that lets anyone chat with user-created AI personas. It became one of the stickiest consumer AI apps on the market, with tens of millions of users spending unusually long sessions talking to fictional characters. It was not a product most ML researchers predicted would take off the way it did.
In August 2024, Google paid around $2.7 billion to license Character.AI's technology and bring Shazeer, De Freitas, and a chunk of their team back in-house. The deal was widely read as a thinly disguised acqui-hire structured to sidestep merger review, and the FTC has been watching this pattern across the industry. Shazeer is now back at Google, reportedly as a technical co-lead on Gemini, arguably the single most valuable return hire in the current frontier-model race.
For developers, the reason to care about Shazeer is simple: he’s obsessive about making transformers faster and cheaper to run. Multi-Query Attention, the trick that dramatically reduces KV-cache memory during inference, is his. So is a lot of the work that made sparse mixture-of-experts actually usable at scale. When you’re running inference on a modern LLM and it feels fast, Shazeer’s fingerprints are on that.
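To make the memory claim concrete, here is a minimal numpy sketch of Multi-Query Attention in the spirit of "Fast Transformer Decoding: One Write-Head is All You Need" — not Shazeer's implementation, and all names, shapes, and sizes below are illustrative. The idea: every query head shares a single key/value head, so the per-token KV cache shrinks by a factor of the head count.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, W_q, W_k, W_v, n_heads):
    """Multi-query attention: n_heads query heads, ONE shared K/V head.

    x   : (seq, d_model)
    W_q : (d_model, n_heads * d_head)  -- per-head query projections
    W_k : (d_model, d_head)            -- single shared key head
    W_v : (d_model, d_head)            -- single shared value head
    """
    seq, _ = x.shape
    d_head = W_k.shape[1]
    q = (x @ W_q).reshape(seq, n_heads, d_head)  # (seq, heads, d_head)
    k = x @ W_k                                  # (seq, d_head) <- whole K cache
    v = x @ W_v                                  # (seq, d_head) <- whole V cache
    # Every query head attends over the same shared k/v.
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(d_head)
    probs = softmax(scores, axis=-1)
    out = np.einsum("hst,td->shd", probs, v)     # (seq, heads, d_head)
    return out.reshape(seq, n_heads * d_head)

rng = np.random.default_rng(0)
d_model, n_heads, d_head, seq = 64, 8, 8, 16
x = rng.normal(size=(seq, d_model))
out = multi_query_attention(
    x,
    rng.normal(size=(d_model, n_heads * d_head)),
    rng.normal(size=(d_model, d_head)),
    rng.normal(size=(d_model, d_head)),
    n_heads,
)
print(out.shape)                                  # (16, 64)
# Per-token cache: standard MHA stores 2 * n_heads * d_head floats,
# MQA stores just 2 * d_head -- an n_heads-fold reduction.
print("MHA cache/token:", 2 * n_heads * d_head)   # 128
print("MQA cache/token:", 2 * d_head)             # 16
```

That n_heads-fold cache reduction is exactly what makes long-context, large-batch decoding cheap enough to feel fast.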
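The mixture-of-experts side of his work can be sketched just as compactly. Below is a toy top-1 router in the style of the Switch Transformer — again an illustrative sketch, not the paper's implementation, with hypothetical names throughout. Each token is routed to exactly one expert, so per-token compute stays at one FFN's worth of FLOPs no matter how many experts (and therefore parameters) the layer holds.

```python
import numpy as np

def switch_moe(x, W_router, experts):
    """Toy Switch-Transformer-style layer: route each token to ONE expert.

    x        : (tokens, d_model)
    W_router : (d_model, n_experts) router weights
    experts  : list of (W_in, W_out) feed-forward weight pairs
    """
    logits = x @ W_router                        # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    choice = probs.argmax(-1)                    # top-1 expert per token
    gate = probs[np.arange(len(x)), choice]      # gate value scales the output
    out = np.zeros_like(x)
    for e, (W_in, W_out) in enumerate(experts):
        mask = choice == e
        if mask.any():
            h = np.maximum(x[mask] @ W_in, 0.0)  # ReLU FFN expert
            out[mask] = gate[mask, None] * (h @ W_out)
    return out

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, tokens = 32, 64, 4, 10
x = rng.normal(size=(tokens, d_model))
W_router = rng.normal(size=(d_model, n_experts))
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]
print(switch_moe(x, W_router, experts).shape)    # (10, 32)
```

Adding experts grows the parameter count while leaving the per-token FLOPs flat — the core insight behind scaling such models toward a trillion parameters.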
Key Articles & Papers
Attention Is All You Need
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Fast Transformer Decoding: One Write-Head is All You Need
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
GLU Variants Improve Transformer
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Mesh-TensorFlow: Deep Learning for Supercomputers
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Controversies
The Google / Character.AI deal (2024): The $2.7B licensing-plus-hiring structure avoided formal acquisition review and fits a pattern regulators are actively examining, alongside Microsoft/Inflection and Amazon/Adept. The Information and Bloomberg both covered the deal critically.
Character.AI safety concerns: Character.AI has faced lawsuits and journalistic scrutiny over harms to minors, including a wrongful-death suit filed in 2024 after a 14-year-old's suicide. The company shipped to a teen audience without the guardrails the frontier labs had started putting in place. Shazeer had returned to Google before the most serious incidents became public, but he was the founding CEO during the period when those product decisions set the direction.