MacArthur Fellow, AI common sense researcher
Yejin Choi
Profile
Yejin Choi is the researcher who keeps asking the question the rest of the field would rather skip: do these models actually understand anything? She’s spent the better part of two decades trying to teach machines common sense — the unglamorous, unwritten knowledge that lets a human know ice cream melts, knives cut, and you can’t push a rope. In 2022 the MacArthur Foundation gave her a “genius” grant for the work. In 2023 TIME named her one of the 100 most influential people in AI.
For most of her career she was the Wissner-Slivka Professor at the University of Washington's Paul G. Allen School of Computer Science & Engineering and a senior research manager at the Allen Institute for AI (AI2), where she led the Mosaic project on commonsense reasoning. In January 2025 she joined Stanford HAI as the Dieter Schwarz Foundation HAI Professor while also serving as Senior Director of Language and Cognition Research at NVIDIA. She's now openly skeptical of pure scaling and is pushing the field toward smaller, more grounded models trained on human norms rather than scraped web text.
Her technical legacy is substantial. COMET and ATOMIC turned commonsense reasoning into a benchmark and a knowledge graph the field could actually work on. Delphi asked whether a neural network could make moral judgments — and produced a controversy and a pile of follow-up research when it sometimes got things absurdly wrong. And the nucleus-sampling paper she co-authored (“The Curious Case of Neural Text Degeneration”) gave us top-p sampling, which is now in basically every text generation pipeline. If you’ve tuned top_p in an API call, you’ve used her work.
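Nucleus sampling is simple enough to sketch in a few lines. Below is a minimal, illustrative Python/NumPy version; the function name and toy vocabulary are ours rather than from the paper or any library, but the cutoff logic follows the paper's idea: sample only from the smallest set of tokens whose cumulative probability exceeds top_p, and throw away the unreliable long tail.

```python
# Minimal sketch of nucleus (top-p) sampling, after Holtzman et al. (2020).
# nucleus_sample and the toy logits below are illustrative, not a library API.
import numpy as np

def nucleus_sample(logits: np.ndarray, top_p: float = 0.9, rng=None) -> int:
    """Sample a token id from the smallest set of tokens whose
    cumulative probability exceeds top_p."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                     # softmax over the vocabulary
    order = np.argsort(probs)[::-1]          # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    # Smallest prefix whose mass exceeds top_p (always keeps >= 1 token).
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return int(rng.choice(nucleus, p=nucleus_probs))

# Toy example: a 5-token vocabulary.
logits = np.array([2.0, 1.5, 0.3, -1.0, -2.0])
print(nucleus_sample(logits, top_p=0.9))
```

Lowering top_p shrinks the nucleus toward greedy decoding; pushing it toward 1.0 recovers full-distribution sampling, degenerate tail and all.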
For developers learning AI, Choi is the corrective voice worth keeping in your head. Models that ace the bar exam still fail at “if I put a candle in a microwave, what happens?” Her benchmarks — HellaSwag, WinoGrande, the various follow-ups — exist specifically to find these gaps. She’s not a doomer and not a hype merchant. She’s the one reminding everyone that pattern-matching at scale is not the same as understanding, and that the gap matters when you’re building anything that has to deal with the real world.
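To see what one of those gaps looks like concretely, the benchmark items are easy to inspect. A minimal sketch, assuming HellaSwag is available on the Hugging Face Hub under the id "hellaswag" (some mirrors use "Rowan/hellaswag") with the ctx/endings/label fields of that copy:

```python
# Peek at one HellaSwag item: a context plus four candidate endings,
# adversarially filtered so wrong ones fool models but not humans.
# Dataset id and field names are assumptions about the public Hub copy.
from datasets import load_dataset

ds = load_dataset("hellaswag", split="validation")
item = ds[0]
print(item["ctx"])                  # the context to be continued
for i, ending in enumerate(item["endings"]):
    print(f"  ({i}) {ending}")      # four candidate continuations
print("gold:", item["label"])       # index of the human-written ending
```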
Key Articles & Papers
The Curious Case of Neural Text Degeneration
COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning
HellaSwag: Can a Machine Really Finish Your Sentence?
WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Can Machines Learn Morality? The Delphi Experiment
The Curious Case of Commonsense Intelligence
(Comet-) Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs
Controversies
Delphi (2021) drew sharp criticism when users posted screenshots of the system producing obviously bad moral judgments — including racially insensitive outputs and absurd context-free verdicts. Critics including Margaret Mitchell argued that framing a neural net as a moral oracle was itself the problem, regardless of accuracy. Choi and her team responded that Delphi was always an experiment intended to expose the gap between AI and human ethics, not to deploy moral judgment, and added clearer disclaimers and a paper documenting the limitations. The episode became a useful case study in how research demos get read in public.