Founder, Principal AI & LLM Research Engineer — RAIR Lab
LLM Research Engineer — Lightning AI
Assistant Professor (Statistics) — University of Wisconsin-Madison
Profile
Sebastian Raschka is the rare ML educator who actually makes you build the thing. His book Build a Large Language Model (From Scratch) walks you through coding a GPT-style transformer in PyTorch — tokenizer, attention heads, pretraining loop, fine-tuning — without pulling in Hugging Face or any other LLM library. The accompanying GitHub repo is one of the clearest teaching artifacts on the internet for anyone who wants to stop treating LLMs as magic boxes.
He’s a PhD statistician turned Staff Research Engineer at Lightning AI, and previously a professor at the University of Wisconsin–Madison. That dual background shows in his work: he explains the math without hiding behind it, and writes code that runs without hand-waving past the hard parts. Before the LLM book he co-authored the bestselling Machine Learning with PyTorch and Scikit-Learn, which for years has been the default “teach yourself ML properly” recommendation.
His newsletter Ahead of AI, read by 150k+ subscribers, is where he breaks down LLM research — LoRA, DPO, positional embeddings, KV caching, reasoning models — with annotated diagrams and minimum-viable code. No hype cycles, no AGI takes. Just “here’s the paper, here’s what’s actually new, here’s what it looks like in 40 lines of PyTorch.”
For a father-son duo learning AI together, Raschka is the bridge. If university gives you the theory and decades of engineering give you the instincts, Raschka’s from-scratch approach is where those meet: code-first, math-honest, no shortcuts.
Build a Large Language Model (From Scratch): a practical guide to constructing a GPT-style language model from scratch using Python and PyTorch, covering transformer architectures, attention mechanisms, dataset preparation, pretraining, fine-tuning, and instruction-following capabilities.