The Hundred-Page Machine Learning Book

The premise is audacious: everything a working engineer needs to know about machine learning, in a hundred pages. Burkov wrote this in 2019 as a solo project and released it under a "read first, buy later" model, which tells you something about his confidence in the argument. That model — and the book's translation into eleven languages and adoption across thousands of universities — suggests the bet paid off.

The book covers supervised learning, unsupervised learning, neural networks, feature engineering, model evaluation, and a handful of topics that rarely appear in introductory texts: metric learning, learning to rank, recommendation systems. The editorial choice that makes all this possible is a particular stance on math — present but not dominant. Burkov explains algorithms at the level where understanding clicks, not at the level where the derivation is technically complete. The intuition for why gradient descent works is more useful to most readers than a full proof, and that's what you get. This isn't sloppiness. It reflects genuine practitioner judgment about where understanding actually lives versus where textbook authors signal their credentials.

Where the book earns maximum credibility is the chapter on machine learning in practice. Most introductory texts treat the gap between a working model and something deployed in production as someone else's problem. Burkov doesn't. Feature engineering, regularization, class imbalance, evaluation metrics that don't lie about your model's real behavior: this is the material that separates people who understand machine learning from people who've just read equations about it. A reader who absorbs nothing but this chapter will already be past the most common failure mode in junior ML work.

The limitation is also structural and worth naming. Deep learning gets a chapter, not a course. The treatment of neural networks is honest as a conceptual map but thin on what has actually driven results since 2017: attention mechanisms, transformers, and the shift from narrow task-specific models to foundation models. You won't finish this book ready to work with modern LLMs. That's not what it promises, and it's not really a fair criticism, but it is worth knowing before you pick it up. The field Burkov describes is the field as it existed in 2019, and some of what the book presents as cutting edge has since been overtaken.

The clearest way to place this book: it's the first read, not the last. A developer new to machine learning should go cover to cover in a weekend and come out with a coherent mental model of what the field is doing and why. Someone already working in the field will find value mainly in Burkov's opinionated emphasis on what matters practically — a useful corrective to the tendency in ML education to treat theoretical completeness as the goal. Neither group should stop here. Both groups should start here.

Key takeaways

Machine learning shifts programming from writing explicit rules to letting algorithms infer rules from data — the same pattern that makes spam filters work also powers facial recognition.
The bias-variance tradeoff is the central tension in every ML model: a model flexible enough to fit its training data perfectly usually has high variance and generalizes poorly, while a model that is too simple underfits.
Feature engineering — choosing and transforming input variables — matters more to practical results than picking the right algorithm.
Regularization is not optional polish; it is the primary tool for keeping models from memorizing noise and failing on new data.
Ensemble methods often outperform their individual members because combining diverse models lets their uncorrelated errors partially cancel out.
Unsupervised learning discovers structure in unlabeled data, enabling customer segmentation and topic modeling without a human ever labeling a training example.
Transfer learning lets you borrow representations a model learned on one task and apply them to a new problem, which can cut training cost dramatically, in some cases from weeks to hours.
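A small experiment makes the bias-variance and regularization takeaways concrete. The following is a minimal Python sketch that uses only the standard library. Its setup is an assumption and does not come from the book: ten noisy samples of sin(πx) on [-1, 1], a degree-9 polynomial, and an L2 (ridge) penalty with strength `lam`. The helper names (`fit_ridge`, `solve`, `mse`) are hypothetical.

```python
import math
import random

def design_row(x, degree):
    # Polynomial features [1, x, x^2, ..., x^degree].
    return [x ** p for p in range(degree + 1)]

def solve(a, b):
    # Solve a @ w = b by Gaussian elimination with partial pivoting.
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def fit_ridge(xs, ys, degree, lam):
    # Ridge regression: minimise squared error + lam * ||w||^2 by solving
    # (X^T X + lam * I) w = X^T y. The intercept is penalised too, for brevity.
    rows = [design_row(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def mse(w, xs, ys):
    preds = [sum(wj * x ** j for j, wj in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

random.seed(0)
# Hypothetical toy data: 10 noisy samples of sin(pi * x) on [-1, 1].
train_x = [-1 + 2 * i / 9 for i in range(10)]
train_y = [math.sin(math.pi * x) + random.gauss(0, 0.2) for x in train_x]
# Noise-free points between the training samples stand in for "new data".
test_x = [-1 + 2 * i / 99 for i in range(100)]
test_y = [math.sin(math.pi * x) for x in test_x]

results = {}
for lam in (0.0, 0.01, 1.0):
    # Degree 9 with 10 points: at lam = 0 the polynomial can hit every training point.
    w = fit_ridge(train_x, train_y, degree=9, lam=lam)
    results[lam] = (mse(w, train_x, train_y), mse(w, test_x, test_y),
                    math.sqrt(sum(wj * wj for wj in w)))
    print(f"lam={lam:<5} train MSE={results[lam][0]:.4f} "
          f"test MSE={results[lam][1]:.4f} ||w||={results[lam][2]:.2f}")
```

With lam = 0 the ten coefficients can interpolate all ten training points, so the training error is essentially zero. Raising lam can only increase training error and shrink the coefficient norm. In setups like this one the penalized fits usually follow the underlying curve more closely between the training points. The printed test errors still depend on the seed and the noise level.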
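The ensemble takeaway depends on the members' errors not being strongly correlated. The sketch below is a minimal simulation using only the Python standard library. The "models" are hypothetical: each one is the true value plus random error, and no real learning algorithm is involved. The names `simulate` and `shared_share` are illustrative.

```python
import random
import statistics

def simulate(n_models, n_points, shared_share, seed):
    """Return (mean member MSE, best member MSE, ensemble MSE).

    Each hypothetical "model" predicts truth + error with error variance 1.
    A fraction `shared_share` of that variance is common to every model
    (correlated mistakes); the rest is independent noise per model.
    """
    rng = random.Random(seed)
    truth = [rng.uniform(-5, 5) for _ in range(n_points)]
    shared = [rng.gauss(0, shared_share ** 0.5) for _ in range(n_points)]
    own_sd = (1 - shared_share) ** 0.5
    preds = [[t + s + rng.gauss(0, own_sd) for t, s in zip(truth, shared)]
             for _ in range(n_models)]

    def mse(p):
        return statistics.fmean((a - b) ** 2 for a, b in zip(p, truth))

    member_mses = [mse(p) for p in preds]
    # The ensemble averages the members' predictions point by point.
    averaged = [statistics.fmean(column) for column in zip(*preds)]
    return statistics.fmean(member_mses), min(member_mses), mse(averaged)

outcomes = {}
for share in (0.0, 0.6):
    outcomes[share] = simulate(n_models=10, n_points=500, shared_share=share, seed=42)
    mean_m, best_m, ens = outcomes[share]
    print(f"shared error share={share}: mean member MSE={mean_m:.3f}, "
          f"best member MSE={best_m:.3f}, ensemble MSE={ens:.3f}")
```

Squared error is convex, so the averaged prediction can never have a higher MSE than the members' mean MSE. It is not guaranteed to beat the best single member. With fully independent errors, averaging ten members brings the expected MSE down to about 1/10 of a member's. When 60% of the error variance is shared, the expected MSE stays near 0.6 + 0.4/10 = 0.64.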