by Gary Marcus and Ernest Davis

Published: 2019
Publisher: Vintage
Pages: 288
ISBN-13: 9780525566045

Rebooting AI: Building Artificial Intelligence We Can Trust

The current generation of AI systems is not intelligent. That's the core provocation of *Rebooting AI*, and Gary Marcus and Ernest Davis spend 288 pages building the case that what we call artificial intelligence is better understood as a sophisticated pattern-matcher — extraordinarily capable within narrow bounds, completely lost outside them.

The argument holds up in the middle of the book better than it does at either end. Marcus and Davis are at their sharpest when they dissect specific failures: IBM Watson's oncology collaborations that were eventually quietly shelved, self-driving car programs that promised full autonomy "within years" and then kept backing off that timeline, language systems that would confidently give you a list of every Macy's in your city when asked to find a department store that *isn't* Macy's. The "gullibility gap" — our tendency to project understanding onto systems that are doing something much dumber than understanding — is a genuinely useful concept. The observation that AI performance in closed, well-defined domains like chess or Jeopardy gets mistaken for progress toward general intelligence is worth drilling into anyone who's about to build a product on the assumption that current models understand what they're saying.

If we could give computers one gift that they don't already have, it would be the gift of understanding language.
— Marcus & Davis, *Rebooting AI*, ch. 4

Where the book strains is in its prescription. Marcus and Davis want AI to incorporate cognitive models, symbolic reasoning, and an architecture closer to how the human brain actually works. They draw on cognitive science, developmental psychology, Bayesian reasoning. All well and good. But the prescription, when you try to pin it down, amounts to: build systems that can represent time, space, causality, and common sense, combine top-down and bottom-up learning, and do it at human-level flexibility. That's not a roadmap. That's a description of the destination. The authors acknowledge the difficulty, but the acknowledgment doesn't make the vagueness less frustrating. You finish the second half wondering whether it needed to be that long.

Our biggest fear is not that machines will seek to obliterate us or turn us into paper clips; it's that our aspirations for AI will exceed our grasp.
— Marcus & Davis, *Rebooting AI*, ch. 8, pp. 198–199

The bigger problem, reading this now, is that events have partially outrun the book's empirical claims without settling its deeper argument. ChatGPT handles the kind of language comprehension tasks that Marcus and Davis described as definitively beyond current AI — ask a modern LLM to find a non-Macy's department store and it will manage it. The surface failures they documented have been patched, at least enough to pass casual inspection. But this doesn't refute the book's core thesis: that statistical correlation without genuine understanding will eventually hit a wall, that failures will resurface in higher-stakes settings, that you can't hallucinate your way to trustworthy AI. Marcus has continued arguing exactly this, and the argument isn't obviously wrong. What *Rebooting AI* got wrong was the timeline and the shape of the failure; what it got right was the category.

The reason you can't count on deep learning to do inference and abstract reasoning is that it's not geared toward representing precise factual knowledge in the first place.
— Marcus & Davis, *Rebooting AI*, ch. 3

Most useful as a calibration tool for people new to the hype cycle. If you've never thought carefully about the difference between pattern recognition and comprehension, this book will do that work efficiently and readably. If you already hold that distinction and you're wondering what to do about it, the book runs out of gas. Read the first half, skim the second.

Key takeaways

Deep learning's 'depth' refers to layers of computation, not depth of conceptual understanding — a network that classifies images has not learned what images mean.
The core missing ingredient in AI is common sense: the unstated knowledge about time, space, causality, and physical objects that every five-year-old has and that no amount of training data can substitute for.
AI succeeds inside closed, rule-bound environments and fails unpredictably the moment real-world variables fall outside its training distribution.
The real danger from today's AI is not superintelligent machines but narrow, unreliable systems deployed in high-stakes domains — medicine, autonomous vehicles, weapons — where their unpredictable failures carry catastrophic consequences.
AI hype is structurally incentivized: tech companies, media, and investors all benefit from treating narrow benchmark victories as evidence of general intelligence, so the gap between perception and reality compounds.
Progress toward trustworthy AI requires combining symbolic knowledge representation — causality, compositionality, tracking individuals over time — with statistical learning; scaling data and layers alone is a dead end.
The software industry applies its lowest engineering standards to AI, yet AI is being deployed in safety-critical contexts that demand the same rigor as aerospace or medical device certification.

The book’s wager

Gary Marcus and Ernest Davis wrote Rebooting AI in 2019, sitting at what now looks like a strange inflection point. GPT-2 had just shipped. Waymo was still parking safety drivers in every car. IBM’s Watson Health was already collapsing. The book’s central wager is simple: the run of AI triumphs that produced Jeopardy-winning Watson, deep learning’s ImageNet results, and AlphaGo is not on a path to general intelligence, and pretending otherwise is going to get someone killed. Marcus and Davis call for a reboot: keep what deep learning does well, but stop treating bigger neural nets and more data as the answer to every hard problem. Add common sense. Add cognitive models. Add the kind of engineering rigor that the rest of the software industry has practiced for decades.

The book is aimed squarely at the hype cycle. The authors’ fear, stated late in the book, is sharp: “Our biggest fear is not that machines will seek to obliterate us or turn us into paper clips; it’s that our aspirations for AI will exceed our grasp.” That sentence is the spine of the project. Forget Skynet — worry about idiots savants in load-bearing positions.

The three gaps

Chapter 1, “Mind the Gap,” diagnoses why the public, the press, and even working researchers consistently overestimate how close we are to general AI. Marcus and Davis name three gaps. The gullibility gap: humans readily credit a machine with humanlike understanding, especially when it performs one specific task well. The illusory progress gap: a system that masters a closed problem like chess gets credit for progress on open-ended ones like driving. The robustness gap: a working demo in a curated setting gets mistaken for a product that works reliably in the wild.

It’s a useful taxonomy, and it’s the strongest stretch of the book. The examples land. IBM Watson winning Jeopardy and then completely failing at oncology, with MD Anderson shelving its collaboration in 2017 after some recommendations were judged “unsafe and incorrect.” Facebook’s M assistant, killed three years after launch. Geoffrey Hinton in 2016 declaring it “quite obvious we should stop training radiologists,” a claim that looks deeply silly a decade later, since radiologists are still training and still very much employed. Marcus and Davis catalog these moments without joy; they want the reader to notice the pattern.

What I think still matters here is the argument that demos are not products. Anyone who has shipped software knows this. A model that gets 95% on a benchmark is not a system you can deploy in a hospital. The Marburg rare-diseases project that shelved Watson because “the performance was unacceptable” is the kind of receipt the AI industry mostly tries to forget. Rebooting AI makes sure you don’t.

The deep learning critique

Chapters 2 and 3 are where the authors lay out their case against deep learning as a sufficient path to AGI. Their argument is not that neural networks are useless — they’re emphatic that they aren’t — but that pattern-matching at scale is doing something fundamentally different from understanding. A neural network trained on millions of cat photos can label a Roomba-riding cat dressed as a shark in some plausible way, but it doesn’t know there’s a Roomba, a costume, or an absurd situation. It knows pixels.

The book’s term for what’s missing is “deep understanding,” which Marcus and Davis define against the way AI researchers use “depth” to describe network layers. Their point: more layers do not equal more comprehension. The reason neural nets hallucinate, miss obvious things, and get fooled by adversarial inputs is that they don’t have a model of the world the data describes. They have correlations.

This is the strongest part of the technical critique, and it’s the part that has aged most interestingly. From a 2026 vantage point, frontier language models do something the 2019 Marcus would have said was impossible: they appear to reason, hold context across hundreds of pages, and pass benchmarks the book treats as deep-understanding gates. But “appear” is doing a lot of work in that sentence. The hallucination problem hasn’t gone away. The lack of grounding hasn’t gone away. The brittleness in the long tail hasn’t gone away. The book’s diagnosis was partly wrong on speed — language models got dramatically better at faking comprehension faster than the authors expected — but right on substance: you can scale a system into a much better autocomplete without scaling it into a thinking thing.

Where the book runs out of road

Chapter 4, on language, and Chapter 5, “Where’s Rosie?” on robotics, are where the book starts to feel padded. The authors stack examples — Siri can’t answer this, Alexa can’t do that, robots can’t turn doorknobs — and the rhetorical hammer gets dull. The 2019 examples have also dated badly. Modern voice assistants can in fact handle the “find the nearest department store that isn’t Macy’s” kind of query Marcus and Davis used as a gotcha. Waymo is operating driverless robotaxis in multiple cities. The “robots can’t open doors” line was never quite the slam it sounded like — Boston Dynamics’ Atlas was already opening doors when the book went to print.

This is the trap any anti-hype book sets for itself: pin your critique to specific 2019 failures and you become hostage to the next two years of engineering progress. The authors are clearly aware of this — they keep stressing that their argument is structural, not about benchmarks — but the prose doesn’t always honor that distinction. A reader in 2026 hits a chapter on language assistants and thinks “this is wrong” before catching that the underlying critique (no real comprehension, just pattern matching) might still hold. The book would have been sharper at half the length. Multiple reviewers, including admirers of the project, made the same complaint at the time.

The prescription, such as it is

The weakest stretch is the prescription side. The authors say AI needs common sense, then say we should give it common sense, then summarize their entire program with: “Start by developing systems that can represent the core frameworks of human knowledge: time, space, causality…” The full passage in the conclusion is gloriously self-aware — they tack on “It’s a tall order, but it’s what has to be done” — but it doesn’t tell you how. A blueprint that boils down to “build a hybrid neuro-symbolic system that incorporates everything we know about human cognition” is not actually a blueprint.

That said, there’s more meat in the prescription than the cynical reading allows. Chapter 6 walks through eleven clues from cognitive science — the kinds of inductive biases human infants seem to come pre-equipped with, like object permanence, agency detection, and basic physics. Marcus has spent his career arguing that minds need innate structure, not just learning algorithms, and that argument is genuinely interesting whether or not you agree with it. The implication for AI is concrete: stop trying to learn everything from raw data and start engineering knowledge frameworks the way you’d engineer any other complex system.

Chapter 7 is more technical and to my mind the best of the prescription chapters. The authors argue for combining symbolic AI’s ability to represent precise knowledge — facts, rules, relationships — with neural networks’ ability to handle messy perceptual data. This is the “neuro-symbolic” hybrid that has since become a small but real research program. Whether it ends up mattering more than scaling laws is one of the open questions of the field; Marcus has been more right than the field’s center of mass has wanted to admit, but less right than his own marketing implies.

Chapter 8, on engineering standards, is the part of the book I’d press on a working AI engineer. Marcus and Davis make the case that compared to aviation, civil engineering, or even traditional software, AI has shockingly low standards for what counts as good performance. There’s no equivalent of program verification. There’s no formal way to prove a model won’t hallucinate in a particular regime. Released systems get tested on benchmarks that don’t resemble the long-tail conditions they’ll face in deployment. This stuff matters and the book is right to bang the drum on it. Postmortems of AI-in-the-wild incidents since 2019 have kept reaffirming it.

The 2026 verdict

The strongest single argument in Rebooting AI is that statistical correlation is not understanding, and that systems built only on correlation will fail in unexpected, unbounded ways. Seven years on, that argument is still load-bearing. Frontier language models still hallucinate. Vision systems still get wrecked by adversarial patches. Self-driving systems still struggle in long-tail conditions — fog, construction zones, weird debris — because the long tail by definition isn’t well represented in training data.

Marcus and Davis also nailed the political economy problem. They argued that AI had been deployed mostly in domains where being wrong was cheap — recommendation engines, ad targeting — and that pushing into high-stakes domains without raising the engineering bar would cause serious harm. The 2023–2025 generation of AI healthcare deployments, hiring tools, and legal-research products produced enough actual harm to vindicate the warning. Lawyers cited hallucinated cases. Hiring tools amplified biases that nobody auditing the system caught. The pattern played out exactly as the book said it would.

The book’s most consistent miss is on speed. The authors assumed that the deep-learning paradigm would hit a wall and that the wall would be visible by, roughly, now. Instead the paradigm keeps absorbing new tricks — RLHF, chain-of-thought, mixture-of-experts, retrieval, tool use — and the wall keeps moving. A reasonable person reading the book in 2019 would not have predicted that 2026’s frontier models would solve graduate-level math problems, write working code, or hold context across hundreds of pages. The book treats some of these capabilities as deep-understanding-or-bust gates, and the field walked through them without solving the deeper problem. That’s both a vindication (the deeper problem is still there) and a partial refutation (the gates were less load-bearing than the book made them seem).

The other miss is tonal. Marcus, in particular, has spent the last seven years repeating these arguments on Substack with an edge that has hardened from skepticism into something closer to grievance. The book itself is gentler than the public Marcus persona that came after, but you can see the seeds. Every example is a failure. Every counterargument is an opponent’s mistake. The genuine intellectual core — that hybrid systems and engineering rigor are needed — gets harder to hear over the drumbeat of “they’re all wrong, we told you.”

Who should read it

If you’re building AI systems and you have not internalized that benchmarks are not products, that statistical correlation is not understanding, and that engineering standards in AI are an embarrassment by any other industry’s measure — read this book. The first chapter alone is worth the price of admission, and Chapter 8 is required reading for anyone shipping AI into a domain where being wrong has real-world consequences.

If you’re an AI researcher already living this stuff, you can skim. The middle chapters won’t tell you anything new and the prescription will frustrate you with its vagueness.

If you’re an AI-curious reader trying to calibrate against the hype, this is one of the better defenses available, but it should not be the only book you read. Pair it with a 2024-or-later overview that takes the surprising successes of scaling seriously, and you’ll come out with a more honest picture than either book alone provides.

The book is repetitive; multiple reviewers, including some of its admirers, said so, and they’re right. There’s a 200-page argument stretched across 270. But the 200 pages are good. Marcus and Davis read the field clearly in 2019, called the parts that would age well, and miscalled some of the parts that wouldn’t, which is roughly the best you can ask of any book about a fast-moving technology. Read it less as prophecy and more as a calibration tool. The hype is still here. So is the gap between demos and products. So is the absence of common sense. The reboot the authors called for has not happened. Whether it needs to, or whether scaling will eventually paper over the gap, is the bet the AI industry is making right now, and Rebooting AI is a useful guide to the side of that bet most of the money isn’t on.