AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference

The central claim of *AI Snake Oil* is simple and important: "AI" is not one thing, and treating it as one thing is how both hype and panic propagate. Narayanan and Kapoor, both Princeton computer scientists, spent years watching predictive AI systems fail quietly in the places that matter most — hiring, lending, criminal justice, healthcare — while everyone argued about whether chatbots would take over the world. The book is their attempt to give readers the vocabulary to tell the difference between a genuinely useful technology and a product that cannot work as advertised.

The strongest section concerns predictive AI. When a company claims its algorithm can screen resumes better than humans, or assess credit risk, or forecast recidivism, it's making a statistical claim about human futures that the underlying data simply cannot support. The problem isn't that machine learning doesn't work as a discipline; it's that the outcomes being predicted are partly products of the social structures already embedded in the training data. An algorithm trained on historical hiring decisions will encode historical biases, then reproduce them with an air of mathematical objectivity. Narayanan and Kapoor are at their sharpest here — the critique lands not because it's new but because they work through the mechanism clearly, with enough concrete examples to make it stick.

The treatment of generative AI is more measured and, frankly, less interesting. They acknowledge that systems like ChatGPT represent genuine technical progress — a harder concession than it might look, given the book's overall skeptical stance — but spend most of those pages on deepfakes, content moderation failures, and the invisible human labor behind the scenes. The chapter on existential risk makes a legitimate argument: AGI would only be dangerous if it were far more reliable than current systems. That's a fair observation about today's capabilities, but I think it sidesteps the real concern, which is about the trajectory rather than the current state. The book was written before the most recent wave of reasoning models, and that gap shows.

It lands on a governance argument: the real danger is not AI running amok but AI deployed carelessly by organizations with misaligned incentives and no accountability. That's right, and it matters. *AI Snake Oil* will do the most good in the hands of the policymakers and procurement officers who actually decide whether to buy the predictive systems the authors most want stopped. For developers, it's a useful corrective against naive deployment optimism, even if you'll find yourself arguing back more than once, especially on the generative AI side.

Key takeaways

"AI" is not one technology — bundling bikes and trucks under "vehicle" would produce the same incoherent debates people have about AI today.
Predictive AI encodes the past rather than forecasting the future: systems trained on historical data in hiring, criminal justice, and education amplify existing discrimination rather than neutralizing it.
Broken institutions adopt AI at disproportionately high rates, not because AI works better there, but because dysfunctional organizations are the most desperate for a technological fix.
AGI poses no near-term existential risk because the threat requires AI to be reliably functional — current systems fail in exactly the high-stakes situations where reliability matters most.
AI hype is a supply chain: companies exaggerate for funding, journalists amplify without scrutiny, and researchers publish unreproducible results that feed both.
The serious near-term harms from AI come from people wielding it in hiring, surveillance, and criminal justice — not from anything the AI itself will decide to do.
Content moderation cannot be automated: distinguishing harmful speech from legitimate expression requires contextual and cultural judgment that current AI systems cannot supply.

The Taxonomy Problem

When people argue about “AI,” they’re often arguing about completely different technologies. A predictive algorithm that denies your loan application and a language model that writes your emails are both called “AI” — but they work differently, fail differently, and cause harm through entirely different mechanisms. Narayanan and Kapoor’s vehicle analogy is one of the book’s sharpest moments: imagine a world where people only have the word “vehicle” to describe all forms of transportation, leading to furious debates about whether vehicles are environmentally friendly, with no one noticing that one side of the argument is pointing at bicycles and the other at trucks. That’s roughly the state of public discourse about AI.

That analogy isn’t rhetorical warmup. It’s the book’s load-bearing beam. AI Snake Oil is not an argument that AI doesn’t work — it’s an argument that a specific category of AI product, mostly in the predictive space, is systematically oversold, underperforms, and causes concrete harm while vendors cash the checks. Getting that category right is the entire task.

Predictive AI and the Self-Fulfilling Problem

The book’s most original contribution is its anatomy of predictive AI — the systems that score job applicants, flag potential criminals, predict which students will fail, price your insurance, and decide whether you qualify for a loan. Narayanan and Kapoor are not gentle about this category.

Their central claim: most predictive AI doesn’t work as advertised, and a significant share of what does “work” does so in ways that are actively harmful. The argument runs on two rails.

First, these systems excel at predicting the past. A hiring model trained on historical decisions will absorb whatever biases were embedded in those decisions. If your industry historically promoted fewer women to senior positions, your “objective” algorithmic screen will learn to prefer men — not because anyone programmed it to, but because that’s what the training data reflects. The model isn’t surfacing predictive signal about future performance; it’s encoding the preferences of whoever made decisions in the years before it was trained. This is the AI-predicts-the-past problem, and it runs through every domain where predictive AI is being deployed at scale: education, criminal justice, lending, healthcare triage.
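To make the mechanism concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the records, the 1 to 5 skill scale, and a hypothetical has_proxy flag standing in for any resume feature that happens to correlate with gender. The “model” is deliberately trivial (it just counts historical hire rates for each combination of features), but any learner fitted to these historical labels is rewarded for reproducing the same pattern.

```python
from collections import defaultdict

def build_history():
    """Invented past hiring records: (skill 1-5, has_proxy, hired).

    has_proxy is a hypothetical resume feature correlated with gender.
    In this made-up history, candidates with the proxy needed a higher
    skill score to be hired: a biased human decision rule.
    """
    history = []
    for skill in range(1, 6):
        for _ in range(10):
            history.append((skill, False, skill >= 3))  # hired from skill 3 up
            history.append((skill, True, skill >= 4))   # hired only from skill 4 up
    return history

def train(history):
    """'Learn' the historical hire rate for each (skill, has_proxy) pair."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for skill, has_proxy, was_hired in history:
        total[(skill, has_proxy)] += 1
        hired[(skill, has_proxy)] += int(was_hired)
    return {key: hired[key] / total[key] for key in total}

model = train(build_history())
for skill in range(1, 6):
    # Score two candidates who differ only in the proxy feature.
    print(f"skill {skill}: no proxy {model[(skill, False)]:.1f}, "
          f"proxy {model[(skill, True)]:.1f}")
# At skill 3 the scores are 1.0 vs 0.0: the model has learned the old bias.
```

Gender never appears as an input, yet two candidates with identical skill scores receive different scores because past decision-makers treated them differently. Dropping the sensitive attribute does not help when a correlated feature carries the same information.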

Second, the institutional context of deployment matters as much as the algorithm itself. Narayanan and Kapoor identify a striking pattern: AI products are disproportionately adopted by what they call “broken” institutions — organizations already under severe resource strain, looking for a technological shortcut through fundamentally social problems. Overwhelmed court systems buying risk assessment tools to reduce judge workload. Understaffed schools licensing automated essay graders. Insurance companies automating claim denials to cut staff. These are precisely the environments where AI failure causes the most harm, because the humans who should be providing oversight are the ones who’ve been removed from the loop to justify the purchase.

The examples are specific. Six Black individuals were falsely arrested because of facial recognition errors: not fringe cases, but documented outcomes of deployed systems. Allstate’s 2013 predictive pricing model is another documented case. So is the education sector’s reliance on AI tools that, per one review cited in the book, are more likely to flag students who speak English as an additional language (EAL) as plagiarists than native speakers. When critics of AI overreach need examples, these are the ones they should be reaching for.

Generative AI: Different Problem, Different Failure Mode

Chapter four pivots hard. Where predictive AI is mostly a known disappointment, generative AI — large language models, image generators, the ChatGPTs and Midjourneys — is genuinely novel and genuinely capable in ways that earlier AI was not. Narayanan and Kapoor acknowledge this. They give ChatGPT a positive introduction before cataloguing its failure modes.

The intellectual history here is one of the book’s better-written sections. The “Ladder of Generality” Narayanan and Kapoor construct traces the lineage from special-purpose hardware through programmable computers, stored-program machines, machine learning, deep learning, pretrained models, and finally instruction-tuned models. Each rung represents a genuine increase in flexibility and generality. The honest uncertainty they build into this framework (we don’t know how many rungs remain, or whether the ladder eventually hits a ceiling) is more intellectually rigorous than most AI commentary manages.

On harm: the book documents the Belgian man who died after developing a dependent relationship with a chatbot on the Chai app, the explosion of non-consensual deepfake pornography, and the erosion of trust in digital imagery generally. These harms are real and require a regulatory response. But Narayanan and Kapoor resist the doomer slide. Their argument on existential risk is the book’s most interesting passage: the threat of an uncontrolled superintelligence is contingent on AI working reliably, and current systems demonstrably don’t. A system capable of posing catastrophic civilizational risk would need precisely the robust, reliable general reasoning that today’s models still lack. The ELIZA effect, the human tendency to attribute more understanding to responsive machines than they actually possess, drives much of the AGI alarm. We are being fooled by fluency.

This reframes the question usefully: we should be far more worried about what specific people, companies, and governments will do with AI than about anything AI will do on its own.

The Hype Machinery

Chapter seven turns sociological. Narayanan and Kapoor identify three categories of actors who sustain the hype cycle: companies, journalists, and researchers. Companies’ motivations are obvious — capital formation, competitive positioning, stock price. Overstating AI capability is profitable, and the costs are diffuse.

The journalism critique is sharper. The economic incentives of digital media reward novelty and alarm over nuance and correction. A headline claiming that “AI can detect your cancer risk from a routine blood draw” generates traffic; the follow-up noting the study had 40 participants, was never replicated, and confounded its training and test sets generates almost none. AI hype sustains itself partly because the correction mechanism in media is broken. Narayanan and Kapoor note that company press releases increasingly flow through news outlets with minimal independent scrutiny — repackaged marketing in a journalism wrapper.

The researchers’ contribution is subtler and arguably the most valuable observation in the book. The authors document what machine learning practitioners call “leakage”: the methodological failure where information from the evaluation data seeps into training, most bluntly when a model is tested on examples it already saw during training. A model evaluated on its own training set can report spectacular accuracy numbers that largely evaporate when applied to genuinely new cases. The authors argue that this pathology accounts for a substantial fraction of the impressive benchmark results published in the AI research literature. The reproducibility crisis in machine learning is real. Benchmark scores have become a proxy for scientific progress in ways that don’t hold up under pressure, and the incentive structures driving this (publish-or-perish, conference rankings, corporate funding tied to impressive demos) haven’t been addressed.
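A minimal sketch of the bluntest form of leakage, in Python, under made-up assumptions: the labels are coin flips with no relationship to the single feature, so there is nothing real to learn. A 1-nearest-neighbour classifier, which simply memorizes its training set, still looks perfect when scored on the data it was trained on; only a held-out split exposes chance-level performance. Real-world leakage is usually subtler (for example, preprocessing or feature selection run on the full dataset before splitting), but the result is the same in kind: inflated scores.

```python
import random

def nearest_neighbor_predict(train_x, train_y, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of evaluation points whose predicted label matches the truth."""
    correct = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y
        for x, y in zip(eval_x, eval_y)
    )
    return correct / len(eval_x)

rng = random.Random(0)  # fixed seed so the run is reproducible
# Labels are coin flips independent of the feature: no real signal exists.
xs = [rng.random() for _ in range(400)]
ys = [rng.random() < 0.5 for _ in range(400)]
train_x, train_y = xs[:200], ys[:200]
test_x, test_y = xs[200:], ys[200:]

leaky = accuracy(train_x, train_y, train_x, train_y)  # scored on training data
honest = accuracy(train_x, train_y, test_x, test_y)   # scored on held-out data
print(f"leaky: {leaky:.2f}, held-out: {honest:.2f}")
# The leaky score is 1.00; the held-out score sits near chance.
```

The leaky number is perfect because every training point is its own nearest neighbour. Nothing about the model changed between the two lines; only the evaluation data did.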

Where the Argument Gets Thin

The content moderation chapter is the weakest in the book. Narayanan and Kapoor’s conclusion, that AI cannot fix social media, is correct, but the argument that gets them there is underdeveloped. They identify two failure modes for AI moderation: it can’t handle contextual nuance, and it discriminates against the same populations it’s supposed to protect. Both true. But the chapter gets stuck on the technical limitations without adequately engaging with the upstream question: what would it mean to “fix” social media in the first place? The premise that there is a legitimate moderation target that AI is failing to hit assumes away the harder problem, which is that platform incentive structures produce harmful content regardless of the moderation mechanism.

The most original observation in this chapter — that AI moderation shifts traumatic content-sorting labor onto underpaid contractors in lower-income countries — appears late and briefly. It deserved the whole chapter.

The prescriptive final chapter is thinner than the diagnostic ones. The proposed remedies (better regulation, more transparency, corporate accountability) are broadly sensible and broadly unspecific. Readers who want actionable guidance will get more from the diagnostic chapters than from the final one.

What the Book Doesn’t Cover

Alexya Martinez, reviewing in Journalism & Mass Communication Quarterly, calls out the most significant gap: the book’s focus is almost entirely on the United States and Western Europe. The regulatory environment, the deployment patterns, and the harm profiles of AI adoption look quite different in Southeast Asia, Sub-Saharan Africa, and Latin America, where AI is being applied to agricultural systems, government services, and financial inclusion at scale, often with even less oversight than in the West. A book claiming to explain what AI can and can’t do globally needs a wider sample.

The geopolitical dimension is also absent. The authors correctly warn about the risks of large, largely unaccountable tech companies controlling frontier AI. But the frame stops there. The competitive dynamics between American and Chinese AI development — what it means for state-backed actors to race toward capabilities with different safety tolerances — doesn’t appear. For a book published in 2024, when this dimension of the story was already visible, the omission is notable.

The economics of the hype cycle also get less attention than the sociology. Why does capital keep flowing to AI products that don’t work? The authors identify greed as motivation but don’t follow the money carefully. Understanding the venture funding model, the correlation between AI claims and pre-revenue valuations, and the structural incentive to claim capability well before demonstrating it would have strengthened the argument in chapter seven considerably.

The Strongest Parts

The framework distinguishing predictive from generative AI is the book’s genuine contribution to public understanding. It’s specific, it’s testable, and it gives readers a tool they can actually apply. The “predicts the past” formulation for biased predictive systems is memorable and accurate. The pattern of broken institutions adopting AI as a panacea gives readers a usable heuristic: when you see AI deployed in an overwhelmed, resource-starved organization claiming to solve a fundamentally social problem, that’s your signal to ask harder questions about the vendor’s claims.

The research critique in chapter seven is the section that will sting most for technical readers in the field — and probably should. The specificity about leakage and benchmark gaming reflects genuine insider knowledge. This is a book written by people who have read the studies and done some of them, not by commentators summarizing commentators.

The New Yorker suggested Narayanan and Kapoor may be too skeptical: that Hinton’s work and the pace of recent capability development indicate AI’s potential is growing faster than their framework accounts for. That’s a legitimate counterpoint, particularly for the existential risk sections. A second edition was published in September 2025, presumably with updates; I can only speak to the first.

Who Should Read It

Decision-makers who buy AI products for institutions — schools, hospitals, HR departments, lending operations, government agencies — should read chapters two and three before their next vendor meeting. The “predicts the past” framework is exactly the tool a procurement committee needs to pressure-test claims.

Technical readers in ML who haven’t thought carefully about leakage, benchmark gaming, and the incentive structures that produce overfit research results will find chapter seven uncomfortable in a productive way.

General readers who want more than passive audience status in AI discourse will find the vehicle analogy and the predictive/generative distinction worth their time. AI Snake Oil won’t teach you to build anything. It’ll give you enough conceptual vocabulary to ask better questions of the people who do.