Probabilistic Machine Learning: Advanced Topics
Most ML books teach you how to use the tools. Murphy's *Probabilistic Machine Learning: Advanced Topics* argues there's only one tool worth knowing — probability — and that everything else is a special case. At 1,360 pages, it earns that argument.
The organizing claim is that probability theory provides a unified language for inference, prediction, generation, and decision-making. A Kalman filter and a variational autoencoder are solving the same problem at different scales. Diffusion models and score matching are the same idea from different angles. Reinforcement learning and causal inference are both, at their core, about reasoning under uncertainty toward a goal. Murphy doesn't just assert these connections — he works them out, using consistent notation across 36 chapters and six thematic parts. For readers who've encountered these fields in isolation, the synthesis is genuinely clarifying.
The book is strongest where the synthesis pays off most directly: the generative models section. VAEs, normalizing flows, diffusion models, energy-based models, and GANs get placed in the same conceptual frame, and the relationships between them become visible. Why does score matching recover the same generative process as the ELBO under certain conditions? Why do diffusion models outperform GANs in practice despite seemingly similar objectives? The book doesn't just describe the methods; it explains why the landscape looks the way it does. This is where Murphy's approach is worth the price of admission.
The weaknesses are real, though. Different chapters were written by different contributors, and it shows. Some sections read like carefully crafted exposition; others read like annotated reading lists. The causality chapter covers the do-calculus and instrumental variables correctly but at a pace that assumes you've already read the primary literature — not ideal for a first encounter. Several chapters in Part V feel undercooked, particularly on graph learning and nonparametric Bayes, which get a dozen pages each on topics that deserve ten times that. The book is wide by necessity, but readers should know going in that breadth comes at the cost of depth in specific areas.
There's also the question of who this book is actually for. The required background is significant: linear algebra, probability theory beyond the introductory level, and at minimum a working knowledge of deep learning and classical ML. And at 1,360 pages, a cover-to-cover read is something only the most committed will attempt. Most people will use it the way they use a reference manual: locate the topic, absorb the framing, follow the citations.
For that use case, it's excellent. If you're a graduate student or researcher who already has a rough map of the territory and wants to understand how the pieces connect, this is the most comprehensive single-volume treatment available. If you're working in generative modeling, the diffusion models chapter alone justifies keeping the book on your shelf. Geoff Hinton called that section a masterpiece, and I'm inclined to agree. For anyone else (the practitioner who just wants to build things, the newcomer who wants a foundation), start with the companion *Introduction* volume and come back to this one when you're ready for it.