PrometheusRoot

MacArthur Fellow, AI common sense researcher

Yejin Choi

Professor — Stanford University

Profile

Yejin Choi is the researcher who keeps asking the question the rest of the field would rather skip: do these models actually understand anything? She’s spent the better part of two decades trying to teach machines common sense — the unglamorous, unwritten knowledge that lets a human know ice cream melts, knives cut, and you can’t push a rope. In 2022 the MacArthur Foundation gave her a “genius” grant for the work. In 2023 TIME named her one of the 100 most influential people in AI.

For most of her career she was the Wissner-Slivka Professor at the University of Washington's Paul G. Allen School of Computer Science & Engineering and a senior research manager at the Allen Institute for AI (AI2), where she led the Mosaic project on commonsense reasoning. In January 2025 she joined Stanford HAI as the Dieter Schwarz Foundation HAI Professor while also serving as Senior Director of Language and Cognition Research at NVIDIA. She’s now openly skeptical of pure scaling and is pushing the field toward smaller, more grounded models trained on human norms rather than scraped web text.

Her technical legacy is substantial. COMET and ATOMIC turned commonsense reasoning into a benchmark and a knowledge graph the field could actually work on. Delphi asked whether a neural network could make moral judgments — and produced a controversy and a pile of follow-up research when it sometimes got things absurdly wrong. And the nucleus-sampling paper she co-authored (“The Curious Case of Neural Text Degeneration”) gave us top-p sampling, which is now in basically every text generation pipeline. If you’ve tuned top_p in an API call, you’ve used her work.
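For readers who have only ever seen top_p as an API knob, here is a minimal sketch of what nucleus sampling actually does, written in plain NumPy. The function name and the 0.9 cutoff are illustrative choices for this sketch, not any particular API's implementation.

```python
# A minimal sketch of nucleus (top-p) sampling, the decoding strategy from
# "The Curious Case of Neural Text Degeneration" (Holtzman et al., 2019).
# Illustrative only; real inference stacks implement this on GPU over batches.
import numpy as np

def nucleus_sample(logits: np.ndarray, top_p: float = 0.9, rng=None) -> int:
    """Sample a token id from the smallest set of tokens whose cumulative
    probability reaches top_p (the "nucleus")."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                              # softmax over the vocabulary
    order = np.argsort(probs)[::-1]                   # tokens from most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1   # keep the smallest nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize inside it
    return int(rng.choice(nucleus, p=nucleus_probs))

# With top_p=0.9 the long tail of unlikely tokens is never sampled,
# which is what prevents the repetitive, incoherent "degeneration" the paper documents.
logits = np.log(np.array([0.5, 0.3, 0.1, 0.05, 0.05]))
print(nucleus_sample(logits, top_p=0.9))
```

The point of the cutoff is that it adapts per step: when the model is confident, the nucleus is tiny; when it is uncertain, more candidates stay in play.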

For developers learning AI, Choi is the corrective voice worth keeping in your head. Models that ace the bar exam still fail at “if I put a candle in a microwave, what happens?” Her benchmarks — HellaSwag, WinoGrande, the various follow-ups — exist specifically to find these gaps. She’s not a doomer and not a hype merchant. She’s the one reminding everyone that pattern-matching at scale is not the same as understanding, and that the gap matters when you’re building anything that has to deal with the real world.
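To make the benchmark mechanics concrete, here is a rough sketch of how HellaSwag-style multiple-choice items are typically scored: each candidate ending is appended to the context, the model's log-likelihood of that ending is computed, and the highest-scoring ending is taken as the answer. The `score_loglikelihood` helper and the example item below are hypothetical stand-ins, not the official evaluation harness or dataset.

```python
# Sketch of multiple-choice scoring for HellaSwag/WinoGrande-style benchmarks.
from typing import Callable, List

def pick_ending(context: str,
                endings: List[str],
                score_loglikelihood: Callable[[str, str], float]) -> int:
    """Return the index of the ending the model finds most plausible,
    with a crude per-word normalization so longer endings aren't penalized."""
    scores = [score_loglikelihood(context, e) / max(len(e.split()), 1) for e in endings]
    return max(range(len(endings)), key=scores.__getitem__)

# Invented example in the spirit of HellaSwag (not an actual dataset item):
item = {
    "context": "A woman slides a tray of cookie dough into the oven. She",
    "endings": [
        "sets a timer and waits for the cookies to bake.",
        "eats the oven and walks away satisfied.",
    ],
}

# Toy scorer for demonstration only; a real evaluation would sum the model's
# token log-probs for each ending conditioned on the context.
toy_logprobs = {item["endings"][0]: -12.0, item["endings"][1]: -35.0}
print(pick_ending(item["context"], item["endings"],
                  lambda ctx, end: toy_logprobs[end]))   # -> 0
```

HellaSwag's trick is that the wrong endings are adversarially filtered to look statistically plausible, so a model that only pattern-matches on surface features scores near chance even while it cruises through easier tests.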

Key Articles & Papers

The Curious Case of Neural Text Degeneration (2019) — Introduced nucleus (top-p) sampling, the decoding strategy now baked into virtually every LLM API.
COMET: Commonsense Transformers for Automatic Knowledge Graph Construction (2019) — Showed transformers could generate novel commonsense knowledge, not just retrieve it.
ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning (2019) — The 877k if-then knowledge graph that anchored a decade of commonsense research.
HellaSwag: Can a Machine Really Finish Your Sentence? (2019) — The benchmark that exposed how easily models fake comprehension; still a standard eval today.
WinoGrande: An Adversarial Winograd Schema Challenge at Scale (2019) — Large-scale commonsense benchmark designed to resist annotator artifacts and pattern shortcuts.
Can Machines Learn Morality? The Delphi Experiment (2021) — Provocative attempt to model descriptive ethics; sparked a debate the field is still having.
(Comet-)Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs (2021) — The expanded version that added social and physical commonsense at scale.
The Curious Case of Commonsense Intelligence (2022) — Daedalus essay laying out why commonsense is the bottleneck for AI, written for a general audience.


Controversies

Delphi (2021) drew sharp criticism when users posted screenshots of the system producing obviously bad moral judgments — including racially insensitive outputs and absurd context-free verdicts. Critics including Margaret Mitchell argued that framing a neural net as a moral oracle was itself the problem, regardless of accuracy. Choi and her team responded that Delphi was always an experiment intended to expose the gap between AI and human ethics, not to deploy moral judgment, and added clearer disclaimers and a paper documenting the limitations. The episode became a useful case study in how research demos get read in public.

Spotify Podcasts

Episode 5: Yejin Choi
The Evolution of Reasoning in Small Language Models with Yejin Choi - #761
Cuts Both Ways
Will AI Ever Have Common Sense?
The double-edged nature of parenting, mental health and artificial intelligence

Related People

Francois Chollet (builder)
Gary Marcus (builder)
© 2026 PrometheusRoot