Stanford HELM benchmarks, foundation model evaluator
Percy Liang
Profile
Percy Liang is the closest thing the AI field has to a standards body of one. A Stanford computer science professor and director of the Center for Research on Foundation Models (CRFM), he led the creation of HELM (Holistic Evaluation of Language Models), which became a default place to check whether a new model’s claimed capabilities hold up across dozens of scenarios and metrics. When a lab drops a model with a splashy benchmark win, HELM is where you go to see how it actually behaves on reasoning, knowledge, bias, toxicity, calibration, and robustness, scored side by side with many of the major open and closed models.
Before HELM, Liang was already a heavyweight in NLP. He co-created SQuAD, the Stanford Question Answering Dataset that defined reading comprehension evaluation for years, and he’s advised a generation of students who now populate OpenAI, Anthropic, and Google DeepMind. In 2021 he co-authored On the Opportunities and Risks of Foundation Models — the paper that coined the term “foundation model” and gave the field a shared vocabulary for what GPT-style systems actually are.
He also co-founded Together AI, a company building open-source infrastructure for training and running foundation models. It is his way of putting his money where his benchmarks are: a bet that closed labs shouldn’t be the only game in town. His lab continues to release open models, open evaluations, and open datasets at a pace that embarrasses most companies.
For developers learning AI, Liang matters because he’s the person keeping the field honest. Every time a CEO tweets “state of the art,” someone at CRFM is quietly running the numbers. If you want to understand what a model can actually do — not what a marketing page says — start with HELM and work backwards.
Key Articles & Papers
On the Opportunities and Risks of Foundation Models
Holistic Evaluation of Language Models (HELM)
SQuAD: 100,000+ Questions for Machine Comprehension of Text
HELM Leaderboard
The Stanford AI Index Report (contributor)
Percy Liang's Stanford Homepage
Spotify Podcasts