Machine Learning Engineering

Most ML practitioners know how to train a model. Burkov's argument, implicit in every chapter, is that this knowledge is the easy part — the engineering that makes models actually work in production is where most projects die.

*Machine Learning Engineering* covers the full lifecycle: problem framing, data collection and labeling, feature engineering, model selection and training, testing, deployment, and monitoring. What distinguishes it from a standard ML textbook is the relentless focus on non-algorithmic work. Roughly half the book lives before you ever touch gradient descent. Burkov is explicit that getting the data right (cleaning, labeling, splitting, preventing leakage) takes longer and matters more than hyperparameter tuning. This is the material that most ML courses skip because it is unglamorous to teach, even though it is expensive to get wrong.
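To make the leakage point concrete, here is a minimal sketch of one common safeguard: splitting by entity rather than by row, so that records from the same user never land in both train and test. This is an illustration and not code from the book; the function names, the `user_id` key, and the 20% test fraction are all assumptions.

```python
import hashlib


def assign_split(group_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a group (e.g. a user ID) to 'train' or 'test'.

    Hashing the ID keeps the assignment stable across runs and machines,
    which a random shuffle without a fixed seed would not.
    """
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_by_group(rows, group_key, test_fraction=0.2):
    """Split a list of dict rows so each group appears on one side only."""
    train, test = [], []
    for row in rows:
        side = assign_split(str(row[group_key]), test_fraction)
        (test if side == "test" else train).append(row)
    return train, test


# Hypothetical data: 50 users with 4 records each.
rows = [{"user_id": f"user-{i % 50}", "value": i} for i in range(200)]
train_rows, test_rows = split_by_group(rows, "user_id")
```

A row-level random split of the same data would usually put some of each user's records on both sides, which tends to inflate offline metrics.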

The strongest sections deal with model evaluation and the gap between offline metrics and business value. Burkov doesn't pretend that AUC tells you whether your model is worth deploying. He works through whether your model is actually better than the baseline (which might be a lookup table and three if-statements), and whether the maintenance cost is justified by the improvement. The deployment chapters — model serving, versioning, A/B testing, monitoring for distribution shift — are dense but practical. There's a recurring honesty about how much of "ML engineering" is closer to software engineering than data science, and the book is better for it.
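The baseline question can be sketched in a few lines. The example below is an assumption-laden illustration, not the book's procedure: it uses accuracy as the metric, a majority-class predictor as the baseline, and a hypothetical `min_uplift` threshold standing in for "enough improvement to justify the maintenance cost".

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train):
    """The simplest possible 'model': always predict the most common label."""
    return Counter(y_train).most_common(1)[0][0]


def worth_deploying(y_true, model_pred, baseline_pred, min_uplift=0.02):
    """Return (decision, uplift); min_uplift is an assumed business threshold."""
    uplift = accuracy(y_true, model_pred) - accuracy(y_true, baseline_pred)
    return uplift >= min_uplift, uplift


# Hypothetical labels and predictions.
y_train = [0, 0, 0, 1]
y_test = [0, 0, 1, 1, 0]
baseline_pred = [majority_baseline(y_train)] * len(y_test)
model_pred = [0, 0, 1, 0, 0]

decision, uplift = worth_deploying(y_test, model_pred, baseline_pred)
```

In practice the threshold would come from an estimate of business value per unit of metric improvement, which is outside what a code sketch can supply.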

The weaknesses are mostly of timing and emphasis. MLOps tooling has moved fast since 2020; some infrastructure recommendations feel dated against today's defaults. Large language models get thin coverage, unsurprisingly given the publication date. A few technically dense topics, like distributed training, get compressed into orientation rather than instruction — enough to know the terrain exists, not enough to navigate it. These aren't failures of the book so much as the cost of scope: Burkov chose breadth across the entire lifecycle over depth in any single area.

The "read first, buy later" distribution model shapes the writing. There is no padding. The prose runs tight, sometimes to the point of terseness. For working ML engineers, or software engineers handed an ML project for the first time, this is the most honest account of what the work actually involves. It won't teach you the math behind transformers, but it will stop you from making the mistakes that kill otherwise competent ML projects before they reach production.

Key takeaways

Most ML projects fail before training a single model — they fail at problem definition, where a vague business goal gets translated into the wrong optimization target.
Data collection and labeling cost more time and money than model development, yet teams budget for them last.
Feature engineering outperforms architecture search in most production systems; the bottleneck is rarely the model.
The gap between a notebook prototype and a production system is not an engineering detail — it is the majority of the work.
Model performance degrades silently after deployment; monitoring for distribution shift is not optional, it is part of the product.
Serving constraints — latency, cost, throughput — must be defined before the model architecture is chosen, not after.
Reproducibility is an engineering discipline: experiments that cannot be reliably reproduced cannot be reliably improved.
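
As an illustration of the monitoring takeaway, the sketch below computes a Population Stability Index (PSI) between a feature's training-time values and its live values. PSI is one widely used drift statistic, chosen here as an assumption; the summary does not say which method the book recommends. The bin count, the `eps` floor, and the sample data are likewise hypothetical.

```python
import math


def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index of `live` against `reference`.

    Bins are equal-width over the reference range; live values outside
    that range are clamped into the first or last bin.
    """
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    live_p = proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


# Hypothetical feature values: live traffic shifted upward by 0.5.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]

stable_score = psi(reference, reference)
drift_score = psi(reference, shifted)
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though any alerting threshold should be tuned to the feature and the cost of a false alarm.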