Introduction to AI Safety, Ethics, and Society
The central bet Hendrycks makes in this book is that AI safety is not one problem but four, and that conflating them is why so much safety writing goes nowhere. The four: malicious use (people weaponizing AI), AI race dynamics (competitive pressure eroding safety standards), organizational accidents (complex sociotechnical failures), and rogue AI (systems pursuing goals humans didn't intend). Each cluster has different causes, different timelines, different solutions. Getting clear on which threat you're actually discussing is more than half the work, and this four-part taxonomy is the most useful thing the book gives you.
What separates this from most AI safety writing is Hendrycks' refusal to stay in one lane. He's a machine learning researcher who created MMLU and did foundational work on robustness and out-of-distribution detection, so when he writes about technical failure modes, he's not extrapolating from philosophy. The chapters on single-agent safety and safety engineering are the book's strongest. They apply real principles from aviation and nuclear risk management — Swiss cheese models, nines of reliability, tail event analysis — to the problem of deploying ML systems, which turns out to be a more productive frame than the alignment discourse typically reaches for. The collective action chapter is also genuinely good: it uses game theory to explain why AI safety is structurally similar to other coordination failures, which is a more honest framing than "smart people will figure it out."
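To make two of those safety-engineering ideas concrete, here is a minimal sketch. It is an illustration for this review, not code from the book. The function names `nines` and `swiss_cheese_failure` are hypothetical, and the Swiss cheese calculation assumes the defensive layers fail independently, which real systems rarely guarantee.

```python
import math


def nines(reliability: float) -> float:
    """Return the number of 'nines' in a reliability figure.

    0.9 is one nine, 0.99 is two nines, and so on.
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return -math.log10(1.0 - reliability)


def swiss_cheese_failure(layer_failure_probs: list[float]) -> float:
    """Probability that a hazard slips through every defensive layer.

    ASSUMPTION: layers fail independently, so the probabilities multiply.
    Correlated failures would make the true figure higher.
    """
    p = 1.0
    for q in layer_failure_probs:
        if not 0.0 <= q <= 1.0:
            raise ValueError("each probability must be in [0, 1]")
        p *= q
    return p


# Three imperfect layers, each missing 10% of hazards (made-up numbers).
combined = swiss_cheese_failure([0.1, 0.1, 0.1])
print(f"combined failure probability: {combined:.4f}")
print(f"nines of reliability: {nines(1.0 - combined):.2f}")
```

Under the independence assumption, three layers that each catch only 90% of hazards combine to roughly three nines. The caveat is the part worth remembering: when layers share a failure cause, the holes line up and the arithmetic stops being this kind.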
Where it gets thinner is governance and machine ethics. These chapters are broader and more survey-like, spending time on ideas — moral uncertainty, social welfare functions, the economics of AI growth — that get introduced but not resolved. The book is designed as a university course textbook, and it shows in the later chapters. If you're a practitioner looking for actionable recommendations, you'll finish Chapter 8 with a solid map of the governance problem and no particular path through it. That's an honest limitation of the genre, not a flaw specific to this book.
The book is freely available at aisafetybook.com under an open-access license, which matters more than it might seem. Most of the serious AI safety discourse is siloed inside specific research communities; Hendrycks is making a genuine effort to lower the on-ramp. For a developer who thinks about AI professionally but has never read the alignment literature, this is the one book that actually earns its "introduction" label: it covers technical foundations, safety engineering principles, game-theoretic complications, and governance challenges without assuming you already know which rabbit hole you're in. Whether or not the concerns about catastrophic risk ultimately prove warranted, the book at least makes those concerns legible in a way that almost nothing else in the field does.