Jan Leike
Former OpenAI alignment lead, now at Anthropic
Profile
Jan Leike is one of the most recognizable names in AI alignment, and the person whose 2024 resignation from OpenAI forced the entire industry to have an uncomfortable conversation about whether frontier labs are actually investing in safety. A German-born researcher who did his PhD under Marcus Hutter at the Australian National University, Leike spent years at DeepMind before joining OpenAI, where he eventually co-led the Superalignment team alongside Ilya Sutskever. That team was announced in July 2023 with a headline commitment: 20% of the compute OpenAI had secured to date, dedicated to solving superintelligence alignment within four years.
It didn’t last. In May 2024, Leike resigned publicly, posting a thread on X saying he had “been disagreeing with OpenAI leadership about the company’s core priorities for quite some time” and that “over the past years, safety culture and processes have taken a backseat to shiny products.” Within days OpenAI dissolved the Superalignment team entirely. The episode became a defining moment for AI safety discourse — a senior researcher at the world’s most hyped AI lab publicly saying the resources simply weren’t there. He joined Anthropic two weeks later, announcing his new mission: “scalable oversight, weak-to-strong generalization, and automated alignment research.”
Before the drama, Leike built the technical foundations that much of modern alignment rests on. His 2017 paper with Paul Christiano, "Deep Reinforcement Learning from Human Preferences," is the direct ancestor of RLHF, the technique that made ChatGPT feel usable: instead of hand-writing a reward function, you train a reward model from human comparisons of outputs and then optimize the policy against it. His 2018 "Scalable Agent Alignment via Reward Modeling: A Research Direction" laid out recursive reward modeling, a close cousin of iterated amplification and an agenda that Anthropic's Constitutional AI later built on. He also led the work on recursively summarizing books with human feedback and the 2023 weak-to-strong generalization paper, which asked whether weak supervisors (ultimately including humans) can still elicit aligned behavior from models stronger than themselves.
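To make the preference-learning idea concrete, here is a minimal sketch (not taken from any of the papers above) of the pairwise reward-model objective that RLHF-style pipelines typically use: the reward model is trained so that the human-preferred response scores higher than the rejected one under a Bradley-Terry model. The class and tensor shapes are illustrative assumptions, not Leike's or OpenAI's actual implementation.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: pools a token sequence into a scalar score.
    In practice this would be a fine-tuned language model with a scalar head."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # crude mean-pooled embedding
        self.score = nn.Linear(dim, 1)                 # scalar reward head

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> scalar reward per response
        return self.score(self.embed(token_ids)).squeeze(-1)

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry pairwise loss: push r(preferred) above r(rejected)."""
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    return -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()

# Illustrative training step on fake token batches.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred = torch.randint(0, 1000, (8, 32))  # responses humans preferred
rejected = torch.randint(0, 1000, (8, 32))   # responses humans rejected
loss = preference_loss(model, preferred, rejected)
loss.backward()
opt.step()
```

The trained reward model then stands in for the human: a policy (the language model) is optimized, typically with PPO, to produce responses the reward model scores highly.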
For developers, Leike matters because he represents a specific bet: that alignment is an engineering problem you can make progress on if you actually staff it. His blog at aligned.substack.com is one of the clearest windows into how a senior alignment researcher actually thinks about the problem — not doom prophecy, not dismissal, but work.
Key Articles & Papers
Deep Reinforcement Learning from Human Preferences
Scalable Agent Alignment via Reward Modeling: A Research Direction
Recursively Summarizing Books with Human Feedback
Introducing Superalignment
Weak-to-Strong Generalization
Why I'm Leaving OpenAI
Aligned — Jan Leike's Substack
Personal Research Site