PrometheusRoot
Paul Christiano
rising
Researcher, Policy
rlhf, alignment, arc, openai

Recognition

TIME 100 AI 2023

Related

Jan Leike (builder), Ilya Sutskever (pioneer)

Head of AI Safety at US AI Safety Institute, NIST

Head of AI Safety, US AI Safety Institute (NIST)
Founder, Alignment Research Center
Former researcher, OpenAI

Profile

Paul Christiano is the researcher whose fingerprints are on nearly every chatbot you’ve ever talked to. In 2017, while at OpenAI, he was lead author of Deep Reinforcement Learning from Human Preferences, written with Jan Leike, Dario Amodei, and others: the paper that launched modern RLHF (reinforcement learning from human feedback). Refined over the next few years and applied in InstructGPT and then ChatGPT, that technique is the reason a raw next-token predictor became something you’d actually want to talk to. If you’ve ever wondered what turned GPT-3 from a weird autocomplete toy into a useful assistant, this is it.
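
The core idea of the 2017 paper fits in a few lines. Instead of hand-writing a reward function, you show a human pairs of short trajectory segments, record which one they prefer, and fit a reward model so that the probability of preferring segment A over B is a logistic (Bradley-Terry) function of the difference in their predicted returns; an RL agent is then trained against that learned reward. The sketch below covers only the reward-fitting step, under simplifying assumptions: a linear reward over hypothetical per-step feature vectors (the paper used neural networks over raw observations), plain stochastic gradient descent, and a simulated labeler with hidden weights standing in for a human. All function names are illustrative and come neither from the paper nor from any library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def summed_features(segment):
    """Sum the per-step feature vectors of a segment (a list of steps)."""
    return [sum(column) for column in zip(*segment)]


def segment_return(w, segment):
    """Predicted return: sum over steps of a linear reward w . phi(step)."""
    return sum(wi * fi for wi, fi in zip(w, summed_features(segment)))


def pref_prob(w, seg_a, seg_b):
    """Bradley-Terry probability that the labeler prefers seg_a over seg_b."""
    return sigmoid(segment_return(w, seg_a) - segment_return(w, seg_b))


def fit_reward(comparisons, dim, lr=0.1, epochs=100):
    """Fit reward weights by SGD on the cross-entropy of preference labels.

    Each comparison is (seg_a, seg_b, mu): mu is 1.0 if A was preferred,
    0.0 if B was preferred, and 0.5 for a tie.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, mu in comparisons:
            p = pref_prob(w, seg_a, seg_b)
            fa, fb = summed_features(seg_a), summed_features(seg_b)
            # d(loss)/dw = (p - mu) * (fa - fb) for the logistic cross-entropy.
            w = [wi - lr * (p - mu) * (a - b) for wi, a, b in zip(w, fa, fb)]
    return w


def accuracy(w, comparisons):
    """Fraction of pairs where the model's preferred segment matches the label."""
    hits = sum((pref_prob(w, a, b) > 0.5) == (mu > 0.5) for a, b, mu in comparisons)
    return hits / len(comparisons)


# --- Hypothetical simulated labeler, standing in for human feedback ---

def random_segment(rng, length=3, dim=2):
    """A random segment of `length` steps, each a `dim`-dimensional feature vector."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(length)]


def make_comparisons(rng, true_w, n, dim=2):
    """Label random pairs by which segment has the higher return under true_w."""
    data = []
    for _ in range(n):
        a, b = random_segment(rng, dim=dim), random_segment(rng, dim=dim)
        mu = 1.0 if segment_return(true_w, a) > segment_return(true_w, b) else 0.0
        data.append((a, b, mu))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_w = [1.0, -2.0]  # the labeler's true preferences, unknown to the learner
    train = make_comparisons(rng, hidden_w, 500)
    held_out = make_comparisons(rng, hidden_w, 200)
    w = fit_reward(train, dim=2)
    print("learned weights:", w)
    print("held-out agreement with labeler:", accuracy(w, held_out))
```

Running the script prints the learned weights and how often the model agrees with the simulated labeler on pairs it never saw; with noiseless labels the learned weights should point roughly in the direction of the hidden ones. In the full method the fitted reward replaces a hand-written reward during policy optimization, and the 2017 paper additionally used an ensemble of reward predictors and collected new comparisons as the policy trained.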

He left OpenAI in 2021 to found the Alignment Research Center (ARC), a small nonprofit focused on the theoretical side of making AI systems honest and controllable. ARC’s best-known output is the Eliciting Latent Knowledge (ELK) report, written with Ajeya Cotra and Mark Xu, which sets out a research program built around a deceptively simple question: how do you train a model to tell you what it actually “knows,” especially when the truth is inconvenient? ARC also ran early dangerous-capability evaluations of GPT-4 before its release.

In April 2024 Christiano was appointed Head of AI Safety at the U.S. AI Safety Institute within NIST, where he oversees evaluations of frontier models for national-security-relevant capabilities. The appointment reportedly sparked a staff revolt at NIST over his ties to effective altruism, but he took the job anyway. As of 2026 he leads AI safety at the institute, which has since been renamed the Center for AI Standards and Innovation.

For developers learning AI, Christiano is worth paying attention to for two reasons. First, RLHF, the technique behind the assistants you interact with every day, grew directly out of his research. Second, he’s one of the few safety researchers who is technically deep, publicly honest about uncertainty (he puts his own odds of an AI takeover at roughly 20%), and not doom-shouting. When he says something is a problem, it’s worth taking seriously.

Key Articles & Papers

Deep Reinforcement Learning from Human Preferences, 2017: The foundational RLHF paper. Nearly every assistant-tuned LLM since traces back to it.
Eliciting Latent Knowledge (ELK), 2021: ARC's flagship research problem. How do you get a model to honestly report what it internally believes?
My views on 'doom', 2023: Christiano's own numbers on AI risk. Sober, calibrated, and worth reading even if you disagree.
Current work in AI alignment, 2023: A working map of the alignment research landscape from someone who built much of it.
Thoughts on the impact of RLHF research, 2023: One of RLHF's originators reflects on whether it was a net good for alignment. Honest and uncomfortable.
ARC evaluations of GPT-4, 2023: The early dangerous-capability evals on GPT-4 that set the template for frontier model testing.


Spotify Podcasts

LW - Paul Christiano named as US AI Safety Institute Head of AI Safety by Joel Burget
The Nonlinear Library, 2024

Paul Christiano — Preventing an AI takeover
Dwarkesh Podcast, 2023

Paul Christiano's views on "doom" (ft. Robert Miles)
The Inside View, 2023

168 - How to Solve AI Alignment with Paul Christiano
Bankless, 2023

LessWrong: "What failure looks like" by Paul Christiano
"Artificial Intelligence" by TYPE III AUDIO, 2022

"Where I agree and disagree with Eliezer" by Paul Christiano
LessWrong (Curated & Popular), 2022

Week 3 Core Readings: What failure looks like. By Paul Christiano
AI Governance Fundamentals, 2022

#44 Classic episode - Paul Christiano on finding real solutions to the AI alignment problem
80,000 Hours Podcast, 2020

#62 – Paul Christiano on messaging the future, increasing compute, & how CO2 impacts your brain
80,000 Hours Podcast, 2019

#44 - Paul Christiano on how we'll hand the future off to AI, & solving the alignment problem
80,000 Hours Podcast, 2018

© 2026 PrometheusRoot