PrometheusRoot
Researcher · Founder
Website · Wikipedia
Tags: transformer · character-ai · google


Transformer co-author, Character.AI founder, back at Google

Noam Shazeer

Researcher (returned 2024) — Google

Profile

Noam Shazeer is one of those researchers whose name shows up on an improbable number of the papers that actually matter. He’s a co-author of Attention Is All You Need — the 2017 paper that introduced the Transformer — but that’s just the most famous entry on a CV that also includes Mixture-of-Experts, Multi-Query Attention, GLU variants, Mesh-TensorFlow, T5, and the Switch Transformer. If you trace the architectural DNA of modern large language models back to its source, a surprising amount of it runs through him.

Shazeer spent roughly two decades at Google, starting in the early 2000s on search and ads before moving into Google Brain. In 2021 he left with his collaborator Daniel De Freitas — reportedly frustrated that Google wouldn’t ship a conversational agent they’d built internally (LaMDA-era work) — and founded Character.AI, a consumer product letting anyone chat with user-created AI personas. It became one of the stickiest consumer AI apps on the market, with tens of millions of users spending unusually long sessions talking to fictional characters. Not a product most ML researchers predicted would take off the way it did.

In August 2024, Google paid around $2.7 billion to license Character.AI’s technology and hire Shazeer, De Freitas, and a chunk of their team back. The deal was widely read as a thinly disguised acqui-hire structured to sidestep merger review, and the FTC has been paying attention to this pattern across the industry. Shazeer is now back at Google, reportedly as a technical co-lead on Gemini — arguably the single most valuable “return” hire in the current frontier-model race.

For developers, the reason to care about Shazeer is simple: he’s obsessive about making transformers faster and cheaper to run. Multi-Query Attention, the trick that dramatically reduces KV-cache memory during inference, is his. So is a lot of the work that made sparse mixture-of-experts actually usable at scale. When you’re running inference on a modern LLM and it feels fast, Shazeer’s fingerprints are on that.
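A back-of-the-envelope sketch of why Multi-Query Attention matters for serving: the KV cache scales with the number of key/value heads, and MQA shares a single K/V head across all query heads. The model dimensions below are hypothetical (a generic 7B-class shape), not any specific model's.

```python
# Illustrative sketch, not Google's implementation: KV-cache size for
# multi-head attention (MHA) vs. Multi-Query Attention (MQA).
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # 2x for the separate K and V tensors; fp16 assumed (2 bytes/element).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 7B-class shape: 32 layers, 32 attention heads of dim 128.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch=1)
# MQA: all 32 query heads read from a single shared K/V head.
mqa = kv_cache_bytes(n_layers=32, n_kv_heads=1, head_dim=128, seq_len=4096, batch=1)

print(f"MHA KV cache: {mha / 2**20:.0f} MiB")  # 2048 MiB
print(f"MQA KV cache: {mqa / 2**20:.0f} MiB")  # 64 MiB
```

A 32x reduction in cache size at the same sequence length, which is why MQA (and its descendant, grouped-query attention) shows up in most production inference stacks.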

Key Articles & Papers

Attention Is All You Need (2017) — The Transformer paper. Co-authored with [Ashish Vaswani](/people/ashish-vaswani/) and six others. Foundational.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017) — First serious demonstration that MoE could scale neural nets far beyond dense limits. Direct ancestor of today's MoE models.

Fast Transformer Decoding: One Write-Head is All You Need (2019) — Introduced Multi-Query Attention. The reason modern inference stacks can fit long-context KV caches in memory.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019) — The T5 paper. Its text-to-text framing shaped how we think about instruction-tuned models.

GLU Variants Improve Transformer (2020) — Short, practical paper. SwiGLU — used in LLaMA, PaLM, and most modern LLMs — comes from here.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (2021) — Made sparse MoE routing simple enough to actually train at scale. With Barret Zoph and William Fedus.

Mesh-TensorFlow: Deep Learning for Supercomputers (2018) — The model-parallelism framework that enabled training the giant models that followed.

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (2020) — Infrastructure paper that made trillion-parameter training practical on TPU pods.
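The SwiGLU feed-forward layer from the GLU-variants paper fits in a few lines. This is a minimal NumPy sketch: the formula (Swish-gated linear unit feeding a down-projection) is from the paper, but the dimensions and random weights below are illustrative assumptions, not taken from any real model.

```python
import numpy as np

def swish(x):
    # Swish / SiLU activation: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, W, V, W2):
    # FFN_SwiGLU(x) = (Swish(x @ W) * (x @ V)) @ W2
    # x @ W is the gated path, x @ V the linear path; their elementwise
    # product is projected back to the model dimension by W2.
    return (swish(x @ W) * (x @ V)) @ W2

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal((1, d_model))
W = rng.standard_normal((d_model, d_ff))
V = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
print(swiglu_ffn(x, W, V, W2).shape)  # (1, 8)
```

Note the layer carries three weight matrices instead of the usual two, which is why models using SwiGLU typically shrink d_ff to keep parameter count comparable.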

Controversies

The Google / Character.AI deal (2024): The $2.7B licensing-plus-hire structure avoided a formal acquisition review and is part of a pattern regulators are actively examining — alongside Microsoft/Inflection and Amazon/Adept. The Information and Bloomberg both covered it critically.

Character.AI safety concerns: Character.AI has faced lawsuits and journalistic scrutiny over harms to minors, including a wrongful death suit filed in 2024 after a 14-year-old’s suicide. The company had shipped to a teen audience without the guardrails the frontier labs had started putting in place. Shazeer had stepped back to Google before the most serious incidents surfaced, but he was the founding CEO during the period when product decisions set the direction.

Spotify Podcasts

Jeff Dean & Noam Shazeer — 25 years at Google: from PageRank to AGI
Bitter Lessons in Venture vs Growth: Anthropic vs OpenAI, Noam Shazeer, World Labs, Thinking Machines, Cursor, ASIC Economics — Martin Casado & Sarah Wang of a16z
EP 31 Noam Shazeer - Google veteran and AI inventor on future of AI
Ep 58: Google Researchers Noam Shazeer and Jack Rae on Scaling Test-time Compute, Reactions to Ilya & AGI
Your AI Friends Have Awoken, With Noam Shazeer

Related People

[Ashish Vaswani](/people/ashish-vaswani/) · Jeff Dean
© 2026 PrometheusRoot