Deep Learning with Python, Third Edition

Having the person who built Keras write about deep learning is a bit like asking the architect of a bridge to explain how bridges work: the explanation is unusually good and unusually practical, but it tends to stop before the physics gets hard.

That's the honest summary of *Deep Learning with Python, Third Edition*. Chollet (now co-founder of Ndea, formerly at Google) and Watson (still at Google, working on Gemini) have done a complete rewrite for the LLM era. Where previous editions got you through convnets and RNNs, this one puts transformers front and center, walks you through building a GPT-style language model, and covers diffusion models for image generation. The coverage is genuinely broad: 648 pages, 20 chapters, and examples in Keras, PyTorch, JAX, and TensorFlow. If you want to understand how the models behind today's generative AI tools actually work at the code level, this book will take you there.

What Chollet does better than almost anyone else writing about deep learning is the middle layer of explanation — the intuition between the math and the code. He doesn't just show you a training call and tell you it works; he explains what the training loop is doing, why dropout helps, what a convolution is actually computing. The ConvNet chapters remain some of the best practical explanations written anywhere. The new transformer chapter is solid: you'll end up with working code you understand, which is exactly the point. For a developer who's been watching AI rewrite every industry and wants to stop feeling like an outsider, this is the fastest honest path in.

The weakness is real, though, and the critics who point to it are right. Chollet moves fast, and he moves code-first. Some concepts (attention mechanisms, the specifics of backpropagation through complex architectures, the subtleties of what diffusion is actually doing) get enough explanation to let you use them but not enough to let you reason about them when something goes wrong. If you want the theory, *Deep Learning* by Goodfellow, Bengio, and Courville is the standard complement. Nor does the book pretend to replace a solid grip on how gradient descent works at the calculus level, or a careful reading of the original attention paper. Those gaps aren't failures (the book's title says Python, not mathematics), but they're worth knowing about before you start.

The third edition earns its existence. Generative AI has changed what practitioners need to know, and the book updates accordingly without losing what made the earlier editions work. It's the right introduction for a technical reader who wants to understand the models powering the current moment, not just call an API. The readers who'll get the most from it are working programmers with solid Python who are willing to read the code, not just copy it.

Key takeaways

Models written against Keras 3's backend-agnostic APIs run on TensorFlow, PyTorch, or JAX without code changes, so framework lock-in is now an engineering choice rather than a constraint.
The transformer architecture, introduced in 2017 and implementable in a few hundred lines of Python, is the core mechanism behind nearly every modern LLM.
Diffusion models generate images by learning to reverse a noise process, not by learning to draw — the inversion framing is what makes them work.
Deep learning's core operation is learning a continuous geometric transform from input space to output space; language, images, and time series are just different geometries.
Overfitting is the central adversary in machine learning: techniques such as dropout, weight regularization, and data augmentation are all tactics in that fight.
Foundation models trained with self-supervision can be fine-tuned for specialized tasks cheaply, replacing the need to build task-specific models from scratch.
The universal ML workflow — define the problem, develop a model, deploy and monitor — applies regardless of domain or architecture, and skipping steps is where most projects fail.
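The mechanism at the heart of the transformer takeaway is scaled dot-product attention. Below is a minimal single-head, causal (GPT-style) self-attention sketch in NumPy, written for illustration rather than taken from the book: the weight matrices are random placeholders, and there is no batching, multi-head splitting, or output projection.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head). Single head."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)  # (seq_len, seq_len) query-key similarities
    seq_len = x.shape[0]
    # True above the diagonal = "future" positions a token must not attend to.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row is a distribution over past tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))  # 5 token embeddings of width 16
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))

out, weights = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape, np.round(weights[1], 3))
```

Every output row is a weighted average of value vectors from the current and earlier positions, so the first token can only attend to itself. A full transformer block wraps this in multiple heads, residual connections, normalization, and a feed-forward layer.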
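The "reverse a noise process" idea in the diffusion takeaway can be made concrete with the forward half of a DDPM-style model. This NumPy sketch assumes the linear schedule from the original DDPM paper (beta rising from 1e-4 to 0.02 over 1,000 steps), which may not match the schedule the book uses. A real model trains a network to predict the noise `eps` from the noisy input and the step; here the true noise stands in for a perfect prediction.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)  # share of the original signal left at step t


def add_noise(x0, t, rng):
    """Jump straight to step t: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps  # a denoiser is trained to predict eps from (xt, t)


def estimate_x0(xt, t, eps_pred):
    """Invert the forward step given a noise prediction."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alpha_bars[t])


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # a toy 8x8 "image"
xt, eps = add_noise(x0, t=500, rng=rng)
x0_hat = estimate_x0(xt, 500, eps)  # exact only because eps_pred is the true noise
print(alpha_bars[[0, 500, T - 1]])
```

By the last step almost no signal survives, so sampling can start from pure noise and repeatedly apply a learned denoiser. The model never learns to "draw"; it learns to undo one small corruption at a time.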
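To see what "a continuous geometric transform" means for a single layer, here is a NumPy sketch of one dense layer as an affine map followed by a ReLU. The rotation matrix is a hand-picked illustration, not a learned weight: the affine part turns the plane by 90 degrees, and the ReLU then folds every point with a negative coordinate onto the axes.

```python
import numpy as np


def dense(x, w, b, activation=None):
    """One layer on row vectors: affine map x @ w + b, then an optional ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if activation == "relu" else y


theta = np.pi / 2
# Row-vector convention: x @ rotate turns the plane counterclockwise by theta.
rotate = np.array([[np.cos(theta), np.sin(theta)],
                   [-np.sin(theta), np.cos(theta)]])
no_bias = np.zeros(2)

points = np.array([[1.0, 0.0],   # rotates to (0, 1); ReLU leaves it alone
                   [0.0, 1.0],   # rotates to (-1, 0); ReLU folds it to (0, 0)
                   [1.0, 1.0]])  # rotates to (-1, 1); ReLU folds it to (0, 1)

rotated = dense(points, rotate, no_bias)
folded = dense(points, rotate, no_bias, activation="relu")
print(np.round(folded, 6))
```

A trained network stacks many such steps with learned weights, and each one bends and folds the space a little, until the classes that were tangled in input space become easy to separate.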
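Dropout, the first tactic named in the overfitting takeaway, is simple enough to write by hand. This NumPy sketch uses the "inverted dropout" convention, which is also how Keras's `Dropout` layer behaves. During training, each unit is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, which keeps the expected activation unchanged and lets the layer be a no-op at inference.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return x  # inference: identity, no rescaling needed
    keep = rng.random(x.shape) >= rate  # each unit survives with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones(100_000)

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)
print(train_out[:8], train_out.mean())
```

Because a different random subset of units disappears on every step, no unit can rely on a specific partner being present. That makes it harder for the network to memorize coincidental patterns in the training data.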
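The cheapest way to specialize a pretrained model is feature extraction: freeze the pretrained network and train only a small new head on its outputs. Fuller fine-tuning also unfreezes some of the top layers. This NumPy sketch fakes the "foundation model" with a fixed random projection, a hypothetical stand-in for real pretrained weights, and trains a logistic-regression head on a synthetic task. Only 65 parameters are learned, and the 640 backbone weights never change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained backbone: fixed weights plus a tanh.
backbone_w = rng.normal(scale=0.5, size=(10, 64))


def backbone(inputs):
    return np.tanh(inputs @ backbone_w)  # frozen: backbone_w is never updated


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Small synthetic labeled dataset for the downstream task.
x = rng.normal(size=(200, 10))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

feats = backbone(x)  # computed once, because the backbone is frozen
head_w = np.zeros(64)  # 64 weights + 1 bias: the only trainable parameters
head_b = 0.0


def log_loss(w, b):
    p = np.clip(sigmoid(feats @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


initial_loss = log_loss(head_w, head_b)
lr = 0.05
for _ in range(500):
    err = sigmoid(feats @ head_w + head_b) - y  # d(log loss)/d(logits)
    head_w -= lr * feats.T @ err / len(y)
    head_b -= lr * err.mean()
final_loss = log_loss(head_w, head_b)
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}")
```

The saving comes from the fact that the expensive part (here, `backbone`) runs once per example and is never backpropagated through. With a real foundation model, that difference can be orders of magnitude in compute.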