Probabilistic Machine Learning: Advanced Topics

Most ML books teach you how to use the tools. Murphy's *Probabilistic Machine Learning: Advanced Topics* argues there's only one tool worth knowing — probability — and that everything else is a special case. At 1,360 pages, it earns that argument.

The organizing claim is that probability theory provides a unified language for inference, prediction, generation, and decision-making. A Kalman filter and a variational autoencoder are solving the same problem at different scales. Diffusion models and score matching are the same idea from different angles. Reinforcement learning and causal inference are both, at their core, about reasoning under uncertainty toward a goal. Murphy doesn't just assert these connections — he works them out, using consistent notation across 36 chapters and six thematic parts. For readers who've encountered these fields in isolation, the synthesis is genuinely clarifying.
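The Kalman filter half of that claim can be checked in a few lines. The sketch below is not taken from the book. It assumes a scalar state observed directly with Gaussian noise, and the function names and numbers are illustrative. Under those assumptions, the familiar gain-based Kalman measurement update and a plain application of Bayes' rule to a Gaussian prior and likelihood should give the same posterior.

```python
import math


def kalman_update(prior_mean, prior_var, obs, obs_var):
    """Scalar Kalman measurement update for the model obs = state + noise."""
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Bayes' rule for a Gaussian prior and Gaussian likelihood (precision form)."""
    post_precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var


# Illustrative numbers: a vague prior and a more precise observation.
kf_mean, kf_var = kalman_update(0.0, 4.0, 2.5, 1.0)
bayes_mean, bayes_var = gaussian_posterior(0.0, 4.0, 2.5, 1.0)

# The two routes agree up to floating-point rounding.
assert math.isclose(kf_mean, bayes_mean)
assert math.isclose(kf_var, bayes_var)
```

The gain form is just an algebraic rearrangement of the precision-weighted average, which is the sense in which the filter is ordinary posterior inference applied one observation at a time.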

The book is strongest where the synthesis pays off most directly: the generative models section. VAEs, normalizing flows, diffusion models, energy-based models, and GANs get placed in the same conceptual frame, and the relationships between them become visible. Why does score matching recover the same generative process as the ELBO under certain conditions? Why do diffusion models outperform GANs in practice despite seemingly similar objectives? The book doesn't just describe the methods — it explains why the landscape looks the way it does. This is where Murphy's approach earns its price of admission.

The weaknesses are real, though. Different chapters were written by different contributors, and it shows. Some sections read like carefully crafted exposition; others read like annotated reading lists. The causality chapter covers the do-calculus and instrumental variables correctly but at a pace that assumes you've already read the primary literature — not ideal for a first encounter. Several chapters in Part V feel undercooked, particularly on graph learning and nonparametric Bayes, which get a dozen pages each on topics that deserve ten times that. The book is wide by necessity, but readers should know going in that breadth comes at the cost of depth in specific areas.

There's also the question of who this book is actually for. The required background is significant: linear algebra, serious probability theory, and at minimum a working knowledge of deep learning and classical ML. And at 1,360 pages, a cover-to-cover read is something only the most committed will attempt. Most people will use it the way they use a reference manual: locate the topic, absorb the framing, follow the citations.

For that use case, it's excellent. If you're a graduate student or researcher who already has a rough map of the territory and wants to understand how the pieces connect, this is the most comprehensive single-volume treatment available. The diffusion models chapter alone is worth having on your shelf if you're working in generative modeling. Geoff Hinton called that section a masterpiece, and I'm inclined to agree. For anyone else (the practitioner who just wants to build things, the newcomer who wants a foundation), start with the companion *Introduction* volume, and come back to this one when you're ready for it.

Key takeaways

Deep learning is not a separate paradigm from probabilistic modeling — it is one implementation of it, and that unification explains phenomena that black-box accounts of neural networks cannot.
Variational inference and MCMC are not competing camps but two roads to the same posterior; grasping that connection changes how you choose inference algorithms in practice.
Diffusion models make sense only once you understand score matching and stochastic differential equations; with that context, you can reason about their strengths and failure modes rather than treat them as empirical recipes.
A model that outputs a point estimate is hiding its uncertainty; Bayesian neural networks give predictive distributions over outcomes, and conformal prediction gives prediction sets with coverage guarantees, and that difference matters whenever decisions carry real stakes.
Causality requires a different formalism than prediction — do-calculus and instrumental variables let you reason about interventions, not just correlations, and most real decisions require that distinction.
Reinforcement learning reframed as probabilistic inference (the 'control as inference' paradigm) turns policy search into a posterior computation, which connects RL directly to variational methods.
Distribution shift — when the test distribution differs from training — is the dominant failure mode for deployed ML systems, and handling it requires explicit probabilistic tools, not just more data.
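
The point about point estimates hiding uncertainty can be made concrete with split conformal prediction. What follows is a minimal sketch, not the book's code: the predictor `predict`, the synthetic data generator `sample`, and the noise level are all hypothetical choices made for illustration. It wraps a fixed point predictor in an interval whose width is set from held-out calibration residuals, then measures how often that interval covers fresh test points.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def predict(x):
    """Hypothetical fixed point predictor; any fitted model could stand in."""
    return 2.0 * x


rng = random.Random(0)


def sample(n):
    """Synthetic data (an assumption): y = 2x + Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.5)) for x in xs]


calibration = sample(500)
test_points = sample(2000)

alpha = 0.1  # target miscoverage rate
scores = [abs(y - predict(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha)

# The interval for a new x is [predict(x) - q, predict(x) + q].
covered = sum(abs(y - predict(x)) <= q for x, y in test_points)
coverage = covered / len(test_points)
print(f"half-width q = {q:.3f}, empirical coverage = {coverage:.3f}")
```

The coverage guarantee here is marginal and relies on calibration and test points being exchangeable, so it is exactly the kind of guarantee that distribution shift can break.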