
Former OpenAI alignment lead, now at Anthropic

Jan Leike

Alignment Lead — Anthropic

Profile

Jan Leike is one of the most recognizable names in AI alignment, and the person whose 2024 resignation from OpenAI forced the entire industry to have an uncomfortable conversation about whether frontier labs are actually investing in safety. A German-born researcher trained under Marcus Hutter at the Australian National University, Leike spent years at DeepMind before joining OpenAI, where he eventually co-led the Superalignment team alongside Ilya Sutskever. That team was announced in July 2023 with a headline commitment: 20% of OpenAI’s compute dedicated to solving alignment of superintelligent systems within four years.

It didn’t last. In May 2024, Leike resigned publicly, posting a thread on X saying he had “been disagreeing with OpenAI leadership about the company’s core priorities for quite some time” and that “over the past years, safety culture and processes have taken a backseat to shiny products.” Within days OpenAI dissolved the Superalignment team entirely. The episode became a defining moment for AI safety discourse — a senior researcher at the world’s most hyped AI lab publicly saying the resources simply weren’t there. He joined Anthropic two weeks later, announcing his new mission: “scalable oversight, weak-to-strong generalization, and automated alignment research.”

Before the drama, Leike built the technical foundations that much of modern alignment rests on. His 2017 paper with Paul Christiano on deep reinforcement learning from human preferences is the direct ancestor of RLHF — the technique that made ChatGPT feel usable. His 2018 “Scalable Agent Alignment via Reward Modeling” sketched recursive reward modeling as a route to scalable oversight, a close cousin of Christiano's iterated amplification and part of the lineage that Anthropic's Constitutional AI later drew on. He also led the work on summarizing books with human feedback and the 2023 weak-to-strong generalization paper, which asked whether weaker supervisors (including humans) can still align models stronger than themselves.
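The core mechanism of that 2017 paper is simple enough to sketch. Below is a minimal, illustrative reward-modeling loop in PyTorch: a small network scores two candidate segments, and a Bradley-Terry-style cross-entropy loss pushes the score gap to match the human's pairwise choice. Everything here (the architecture, the feature shapes, the synthetic "human" preference) is an assumption chosen for brevity, not the paper's actual code.

```python
# Sketch: learning a reward model from pairwise human preferences,
# in the spirit of Christiano, Leike et al. (2017). Illustrative only.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a fixed-size feature vector; stands in for a trajectory or response encoder."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # one scalar reward per item

def preference_loss(r_a, r_b, prefer_a):
    # Bradley-Terry model: P(A preferred over B) = sigmoid(r(A) - r(B)).
    # Cross-entropy between that probability and the human label.
    return nn.functional.binary_cross_entropy_with_logits(r_a - r_b, prefer_a)

dim = 16
model = RewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    a = torch.randn(32, dim)  # batch of "segment A" features
    b = torch.randn(32, dim)  # batch of "segment B" features
    # Synthetic stand-in for a human: prefers the segment with the larger first feature.
    prefer_a = (a[:, 0] > b[:, 0]).float()
    loss = preference_loss(model(a), model(b), prefer_a)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the original paper the scored items were short clips of agent behavior and the learned reward fed a reinforcement learning loop; in the RLHF pipeline behind ChatGPT, the same objective is applied to model responses.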

For developers, Leike matters because he represents a specific bet: that alignment is an engineering problem you can make progress on if you actually staff it. His blog at aligned.substack.com is one of the clearest windows into how a senior alignment researcher actually thinks about the problem — not doom prophecy, not dismissal, but work.

Key Articles & Papers

Deep Reinforcement Learning from Human Preferences (2017) — The foundational RLHF paper: training agents from pairwise human comparisons instead of hand-crafted rewards. The direct technical ancestor of ChatGPT.
Scalable Agent Alignment via Reward Modeling: A Research Direction (2018) — Leike's blueprint for aligning AI systems more capable than their supervisors, with recursive reward modeling as a path to scalable oversight.
Recursively Summarizing Books with Human Feedback (2021) — Using recursive task decomposition and human feedback to summarize entire novels; an early demo of scalable oversight in practice.
Introducing Superalignment (2023) — The announcement that defined the era: OpenAI pledged 20% of its compute to solving superintelligence alignment within four years, with Leike and Sutskever co-leading.
Weak-to-Strong Generalization (2023) — Can weaker models (or humans) successfully supervise stronger models? Leike's team ran the first empirical study of the question (a toy sketch of this setup follows the list).
Why I'm Leaving OpenAI (2024) — The resignation thread. Calm, specific, and damning; the moment AI safety concerns stopped being abstract for the industry.
Aligned — Jan Leike's Substack (2024) — His personal blog on alignment research, scalable oversight, and how to actually make progress on the problem.
Personal Research Site (2024) — Publication list and current projects; the canonical source for what Leike is actually working on.
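To make the weak-to-strong question concrete, here is a toy version of the experimental setup, using scikit-learn stand-ins chosen purely for brevity (the actual paper, from Leike's OpenAI team, used GPT-2-class supervisors and GPT-4-class students on NLP tasks, so every model and dataset below is an assumption): train a deliberately handicapped "weak" model on ground truth, let it label fresh data, train a more capable "strong" model only on those noisy labels, and check whether the student beats its supervisor on held-out ground truth.

```python
# Toy illustration of the weak-to-strong generalization setup.
# Models and data are illustrative stand-ins, not the paper's code.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# shuffle=False keeps the informative features first, so the weak model
# below can plausibly learn something from its restricted view.
X, y = make_classification(n_samples=6000, n_features=40, n_informative=10,
                           shuffle=False, random_state=0)
X_sup, X_rest, y_sup, y_rest = train_test_split(X, y, train_size=1000, random_state=0)
X_student, X_test, y_student, y_test = train_test_split(X_rest, y_rest,
                                                        test_size=2000, random_state=0)

# 1. "Weak supervisor": an underpowered model trained on ground truth,
#    restricted to the first 5 features.
weak = LogisticRegression(max_iter=200).fit(X_sup[:, :5], y_sup)

# 2. The weak model labels fresh data; these labels are noisy.
weak_labels = weak.predict(X_student[:, :5])

# 3. "Strong student": a more capable model trained ONLY on the weak labels.
strong = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)
strong.fit(X_student, weak_labels)

# The weak-to-strong question: does the student outperform its supervisor
# on held-out ground truth, despite never seeing a true label?
print("weak supervisor accuracy:", weak.score(X_test[:, :5], y_test))
print("strong student accuracy: ", strong.score(X_test, y_test))
```

The paper's headline finding was that strong students often recover a meaningful fraction of the performance gap above their weak supervisors, which is evidence (though far from proof) that human oversight of superhuman systems might not be hopeless.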

Spotify Podcasts

24 - Superalignment with Jan Leike
#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less
#23 - How to actually become an AI alignment researcher, according to Dr Jan Leike
EA - OpenAI's massive push to make superintelligence safe in 4 years or less (Jan Leike on the 80,000 Hours Podcast) by 80000 Hours
Sam Altman WRECKS OpenAI - Jan Leike joins Anthropic - Brain Drain from OpenAI | Artificial Intelligence Masterclass
LW - Ilya Sutskever and Jan Leike resign from OpenAI by Zach Stein-Perlman
Ilya Sutskever and Jan Leike leave OpenAI
AI, Robot
KI-Update kompakt: Jan Leike, OpenAI, Jameda AI Assistant, Rabbit R1

Related People

Ilya Sutskever (pioneer) · Dario Amodei (pioneer)