LSTM co-inventor, AI history's loudest voice
Jürgen Schmidhuber
Profile
Jürgen Schmidhuber is the scientist deep learning never knew how to thank properly. With his student Sepp Hochreiter he co-invented the Long Short-Term Memory network, published in 1997: the architecture that dominated sequence learning for roughly two decades. Before transformers ate the world, LSTMs powered Google Translate, Siri, Alexa, and a huge chunk of production NLP. If you used speech-to-text on a phone between 2015 and 2018, an LSTM was almost certainly in the loop.
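The cell itself is small enough to sketch. Here is a minimal NumPy version of one LSTM step in the now-standard formulation (the forget gate arrived in 2000 via Gers et al., also from his lab); the variable names and weight layout are our own, not the paper's:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (sketch, not the 1997 notation).

    x: input (d,); h_prev, c_prev: previous hidden and cell state (n,).
    W: (4n, d + n) weights with gates stacked as i, f, o, g; b: (4n,) bias.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:n])            # input gate: how much candidate to write
    f = sigmoid(z[n:2 * n])       # forget gate: how much old state to keep
    o = sigmoid(z[2 * n:3 * n])   # output gate: how much state to expose
    g = np.tanh(z[3 * n:])        # candidate values
    c = f * c_prev + i * g        # additive cell update: the "constant error
                                  # carousel" that keeps gradients alive
    h = o * np.tanh(c)            # hidden state passed to the next step
    return h, c
```

The additive update `c = f * c_prev + i * g` is the whole trick: gradients flow through it across many timesteps without the repeated squashing that makes plain RNNs forget.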
He co-directed IDSIA in Switzerland for decades, where his lab produced the first deep networks to win vision contests (before AlexNet), highway networks (the direct predecessor of ResNets), fast weight programmers (which he argues are linear transformers), and early work on meta-learning, world models, and a formal theory of curiosity and creativity. Since 2021 he has been Director of the AI Initiative at KAUST in Saudi Arabia, and he co-founded NNAISENSE to commercialize recurrent nets for industrial problems.
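Of those claims, the fast-weights one is concrete enough to check in a few lines. A minimal sketch, assuming the framing of the "Linear Transformers Are Secretly Fast Weight Programmers" paper listed below (function and variable names are our own): accumulating outer products of values and keys into a "fast" weight matrix and reading it out with a query computes exactly unnormalized causal linear attention.

```python
import numpy as np

def fast_weight_outputs(K, V, Q):
    """Fast weight programming, read as unnormalized linear attention.

    K, V, Q: (T, d) keys, values, queries, one row per timestep.
    """
    T, d = K.shape
    W = np.zeros((d, d))           # "fast" weights, rewritten at runtime
    Y = np.zeros((T, d))
    for t in range(T):
        W += np.outer(V[t], K[t])  # additive update proposed by the slow net
        Y[t] = W @ Q[t]            # read-out: (sum_s v_s k_s^T) q_t
    return Y

def linear_attention(K, V, Q):
    """Same numbers via the attention view: y_t = sum_{s<=t} v_s (k_s . q_t)."""
    T, _ = K.shape
    return np.stack([sum(V[s] * (K[s] @ Q[t]) for s in range(t + 1))
                     for t in range(T)])
```

The equivalence is just the distributive law: (sum over s≤t of v_s k_sᵀ) q_t equals the sum over s≤t of v_s (k_s · q_t). Whether that makes a 1991 fast weight network "the first transformer" is precisely the sort of question his controversies turn on.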
What developers should actually take from him: read his 1990-1991 “Annus Mirabilis” retrospective. Set aside the litigious tone and you’ll find the cleanest map of where modern ideas came from: self-supervised pretraining, attention, GAN-like adversarial objectives, and sequence-to-sequence learning were all kicking around in his group before they had the names they do now. His Deep Learning in Neural Networks: An Overview remains one of the most thorough historical surveys of the field.
He is also, without competition, the most polarizing figure in AI. Schmidhuber has spent the modern boom loudly insisting that Geoffrey Hinton, Yann LeCun, Yoshua Bengio, and Ian Goodfellow get credit for things his lab did first. The complaints are often technically defensible and socially exhausting. Take him seriously — the field did not spring from a 2012 ImageNet paper — but read him with the understanding that priority, as he practices it, is a full-time hobby.
Key Articles & Papers
- Long Short-Term Memory
- Deep Learning in Neural Networks: An Overview
- Highway Networks
- World Models
- Linear Transformers Are Secretly Fast Weight Programmers
- The 1990-1991 Annus Mirabilis of Deep Learning
- A Formal Theory of Creativity, Fun, and Intrinsic Motivation
- Flat Minima
- Annotated History of Modern AI and Deep Learning
Controversies
Schmidhuber’s public feuds are a feature, not a footnote. The greatest hits:
- The 2016 NIPS GAN tutorial: he stood up mid-talk during Ian Goodfellow’s session to argue that GANs were a rediscovery of his 1992 Predictability Minimization. The exchange is now folklore.
- The “Deep Learning Conspiracy” essay: in Critique of Paper by “Deep Learning Conspiracy”, he attacked a 2015 Nature paper by Hinton, LeCun, and Bengio for omitting prior work — largely his own.
- The 2018 Turing Award: when Hinton, LeCun, and Bengio won, Schmidhuber publicly disputed the citation’s framing of who invented what, and has kept doing so.
- Transformer priority: he claims his 1991 fast weight programmers were the first (linear) transformers in all but name, and that Vaswani et al. should cite them as such.
Gary Marcus once summarized the dynamic as “Schmidhuber is right about the history and wrong about how to talk about it.” That’s roughly the consensus. The receipts are usually real; the tone makes them hard to hear.