Machine Learning with PyTorch and Scikit-Learn

Develop Machine Learning and Deep Learning Models with Python

If you want to understand machine learning rather than just use it, this book is the most complete single-volume answer currently available in Python. The argument Raschka and co-authors make — implicitly but consistently — is that the foundations matter: you cannot troubleshoot a failing neural network if you never built one from gradients up.

The first half earns its keep by refusing to be a scikit-learn tutorial. Yes, it covers logistic regression, SVMs, decision trees, and ensemble methods through the library's API — but only after showing you the math that makes them work. Chapter 2 implements the perceptron from scratch. Chapter 11 builds a multilayer network in pure NumPy before a single `torch.nn` module appears. This pedagogical insistence on understanding before abstracting is where the book separates itself from the average Kaggle cookbook. The data preprocessing chapter is particularly good — it doesn't assume you've already cleaned your data into submission, and it explains why certain transformations matter rather than just showing the function calls.
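Chapter 2's from-scratch perceptron is compact enough to sketch here. The following is a minimal NumPy version of the classic perceptron learning rule, not the book's listing. The class name, the zero weight initialization, the learning rate, and the toy AND dataset are assumptions chosen to keep the example short and deterministic.

```python
import numpy as np


class Perceptron:
    """Rosenblatt perceptron for binary labels in {0, 1} (illustrative sketch)."""

    def __init__(self, eta: float = 0.1, n_epochs: int = 20):
        self.eta = eta            # learning rate
        self.n_epochs = n_epochs  # passes over the training set

    def fit(self, X: np.ndarray, y: np.ndarray) -> "Perceptron":
        # Assumption: zero initialization keeps the sketch deterministic.
        self.w_ = np.zeros(X.shape[1])
        self.b_ = 0.0
        self.errors_ = []  # misclassifications per epoch
        for _ in range(self.n_epochs):
            errors = 0
            for xi, target in zip(X, y):
                # The update is zero whenever the prediction is already right,
                # so only misclassified samples move the decision boundary.
                update = self.eta * (target - self.predict(xi))
                self.w_ += update * xi
                self.b_ += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X: np.ndarray) -> np.ndarray:
        return np.dot(X, self.w_) + self.b_

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Unit step: class 1 if the net input is >= 0, else class 0.
        return np.where(self.net_input(X) >= 0.0, 1, 0)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])  # logical AND, which is linearly separable
model = Perceptron(eta=0.1, n_epochs=20).fit(X, y)
print(model.predict(X))
print(model.errors_)  # drops to 0 once every sample is classified correctly
```

Because the rule only updates on mistakes, it stops changing once the classes are separated. On data that aren't linearly separable (XOR, for instance) the per-epoch error count never settles at zero, which is the limitation that motivates multilayer networks.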

The second half moves into PyTorch, and here the book shows both its ambition and its limits. The PyTorch chapters are genuinely excellent: dynamic computation graphs, autograd mechanics, and the `torch.nn` module hierarchy are explained clearly, with working examples. The chapter on transformers does a credible job of building from attention mechanisms up through BERT fine-tuning, a progression that is harder to do well than it looks and more ambitious than what most books attempt. The GAN chapter is thorough enough to be useful rather than gestural. Where the book struggles is in the final chapters on graph neural networks and reinforcement learning, which feel rushed compared to the earlier material. You get enough to understand what these areas are and how they connect to the rest, but not enough to do serious work in either domain. That's a reasonable tradeoff for a survey at this depth; just know going in that you'll need dedicated resources if GNNs or RL are your destination.
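The transformer chapter's starting point, attention, fits in a few lines. Below is a minimal single-head sketch of scaled dot-product self-attention written in NumPy rather than PyTorch; it is not the book's code, the random projection matrices stand in for learned weights, and the sizes are arbitrary assumptions.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) input embeddings.
    W_q, W_k: (d_model, d_k); W_v: (d_model, d_v) projection matrices.
    Returns the (seq_len, d_v) outputs and the (seq_len, seq_len) weights.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Scores for every pair of positions come from one matrix product,
    # so nothing here steps through the sequence one position at a time.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
# Hypothetical random projections in place of trained parameters.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

Each output row is a weighted average of the value vectors, with weights set by how strongly that position's query matches every key. A recurrent network, by contrast, has to process positions sequentially.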

One honest observation: because it was published in early 2022, the book predates the large language model explosion. The transformer chapter covers GPT-2 text generation and fine-tuning a BERT-family model, which is solid ground, but readers coming to this book today will notice the gap between what's covered and where the field has moved. That's not a criticism of the authors, since the fundamentals they teach are exactly what you need to understand the current landscape, but it means the book functions best as a foundation rather than a guide to the frontier.

For a Python developer who wants genuine ML comprehension rather than recipe lookup, this is the book to read. It's long, it requires real math engagement, and it doesn't reward skimming — but those are features, not bugs.

Key takeaways

PyTorch's dynamic computation graph lets you debug neural networks with ordinary print statements and the Python debugger, which is a large part of why many researchers adopted it over static-graph frameworks such as TensorFlow 1.x.
Data preprocessing (handling missing values, scaling features, encoding categoricals) often contributes more to real-world model performance than algorithm selection does.
Gradient boosting (XGBoost in particular) usually outperforms deep learning on tabular datasets; the balance tends to shift toward deep learning when your data is images, text, or graphs.
Scikit-learn's unified fit/transform/predict API is the right abstraction: it forces a clean separation between learning data-dependent parameters (`fit`, on training data only) and applying those learned parameters (`transform` or `predict`) to any data, which guards against test-set leakage.
Transformers replaced RNNs not primarily because attention is inherently better at modeling sequences, but because self-attention processes all positions in parallel, which made training on orders-of-magnitude more data practical.
Building a multilayer neural network from scratch before touching PyTorch is the most reliable way to understand what backpropagation actually computes; autograd hides that math, and it stays opaque until you've worked it through by hand.
GANs work because the generator and discriminator improve only in response to each other; mode collapse happens when the generator settles on a narrow set of outputs that reliably fool the current discriminator instead of learning to cover the full variety of the training data.
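The fit/transform takeaway is easiest to see in a tiny re-implementation. `MiniStandardScaler` below is a hypothetical stand-in, not scikit-learn's `StandardScaler`, though it copies the library's conventions: learned state lives in trailing-underscore attributes, and `fit` returns `self`.

```python
import numpy as np


class MiniStandardScaler:
    """Hypothetical stand-in that mimics scikit-learn's fit/transform convention."""

    def fit(self, X: np.ndarray) -> "MiniStandardScaler":
        # All data-dependent state is learned here, from training data only.
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)  # population std (ddof=0)
        self.scale_[self.scale_ == 0.0] = 1.0  # avoid dividing by zero
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Applies the stored statistics; never recomputes them from X.
        return (X - self.mean_) / self.scale_

    def fit_transform(self, X: np.ndarray) -> np.ndarray:
        return self.fit(X).transform(X)


X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

scaler = MiniStandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn mean/std, then apply
X_test_std = scaler.transform(X_test)        # reuse the training statistics
print(X_test_std)  # each entry is 2 / sqrt(2/3) = sqrt(6), about 2.449
```

Fitting on the training split and only transforming the test split keeps test-set statistics out of preprocessing. In real code, scikit-learn's `Pipeline` enforces the same ordering across several chained steps.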
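The takeaway about building a network by hand is easy to try. Below is a minimal NumPy sketch, not the book's Chapter 11 code. It uses a two-layer network with sigmoid activations and a mean-squared-error loss; the architecture, sizes, and loss are assumptions chosen for brevity. The hand-derived backpropagation gradients are checked against central finite differences.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)   # hidden activations
    a2 = sigmoid(a1 @ W2 + b2)  # output activations
    return a1, a2


def mse_loss(params, X, y):
    _, a2 = forward(params, X)
    return np.mean((a2 - y) ** 2)


def backprop(params, X, y):
    """Gradients of the MSE loss for each parameter, via the chain rule."""
    W1, b1, W2, b2 = params
    a1, a2 = forward(params, X)
    # dL/da2 for a mean over all output entries, then through the output
    # sigmoid using sigmoid'(z) = a * (1 - a).
    d_z2 = 2.0 * (a2 - y) / y.size * a2 * (1.0 - a2)
    d_W2, d_b2 = a1.T @ d_z2, d_z2.sum(axis=0)
    # Send the error signal back through W2 into the hidden layer.
    d_z1 = (d_z2 @ W2.T) * a1 * (1.0 - a1)
    d_W1, d_b1 = X.T @ d_z1, d_z1.sum(axis=0)
    return [d_W1, d_b1, d_W2, d_b2]


def numerical_grad(params, X, y, eps=1e-6):
    """Central finite differences, perturbing one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            orig = p[idx]
            p[idx] = orig + eps
            plus = mse_loss(params, X, y)
            p[idx] = orig - eps
            minus = mse_loss(params, X, y)
            p[idx] = orig  # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=(5, 2)).astype(float)
params = [rng.normal(scale=0.5, size=(3, 4)), np.zeros(4),
          rng.normal(scale=0.5, size=(4, 2)), np.zeros(2)]

analytic = backprop(params, X, y)
numeric = numerical_grad(params, X, y)
for a, n in zip(analytic, numeric):
    print(np.max(np.abs(a - n)))  # should be very close to zero
```

If a sign or transpose in `backprop` is wrong, the mismatch shows up immediately in this check. PyTorch's autograd performs the same bookkeeping without ever showing it to you.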