LSTM co-inventor, AI history's loudest voice
Jürgen Schmidhuber
Profile
Jürgen Schmidhuber is the scientist deep learning never knew how to thank properly. With his student Sepp Hochreiter he co-invented the Long Short-Term Memory network, published in 1997: the architecture that dominated sequence learning for roughly two decades. Before transformers ate the world, LSTMs powered Google Translate, Siri, Alexa, and a huge chunk of production NLP. If you used speech-to-text on a phone between 2015 and 2018, an LSTM was almost certainly in the loop.
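The cell itself is small enough to sketch. Here is a minimal NumPy version of one LSTM step in the now-standard formulation (the forget gate arrived in 2000 via Gers et al., also from his lab); the variable names and weight layout are our own, not the paper's:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (sketch, not the 1997 notation).

    x: input (d,); h_prev, c_prev: previous hidden and cell state (n,).
    W: (4n, d + n) weights with gates stacked as i, f, o, g; b: (4n,) bias.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:n])            # input gate: how much candidate to write
    f = sigmoid(z[n:2 * n])       # forget gate: how much old state to keep
    o = sigmoid(z[2 * n:3 * n])   # output gate: how much state to expose
    g = np.tanh(z[3 * n:])        # candidate values
    c = f * c_prev + i * g        # additive cell update: the "constant error
                                  # carousel" that keeps gradients alive
    h = o * np.tanh(c)            # hidden state passed to the next step
    return h, c
```

The additive update `c = f * c_prev + i * g` is the whole trick: gradients flow through it across many timesteps without the repeated squashing that makes plain RNNs forget.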
He co-directed IDSIA in Switzerland for decades, where his lab produced the first deep networks to win vision contests (before AlexNet), highway networks (the direct predecessor of ResNets), fast weight programmers (which he argues are linear transformers), and early work on meta-learning, world models, and a formal theory of curiosity and creativity. Since 2021 he has been Director of the AI Initiative at KAUST in Saudi Arabia, and he co-founded NNAISENSE to commercialize recurrent nets for industrial problems.
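Of those claims, the fast-weights one is concrete enough to check in a few lines. A minimal sketch, assuming the framing of the "Linear Transformers Are Secretly Fast Weight Programmers" paper listed below (function and variable names are our own): accumulating outer products of values and keys into a "fast" weight matrix and reading it out with a query computes exactly unnormalized causal linear attention.

```python
import numpy as np

def fast_weight_outputs(K, V, Q):
    """Fast weight programming, read as unnormalized linear attention.

    K, V, Q: (T, d) keys, values, queries, one row per timestep.
    """
    T, d = K.shape
    W = np.zeros((d, d))           # "fast" weights, rewritten at runtime
    Y = np.zeros((T, d))
    for t in range(T):
        W += np.outer(V[t], K[t])  # additive update proposed by the slow net
        Y[t] = W @ Q[t]            # read-out: (sum_s v_s k_s^T) q_t
    return Y

def linear_attention(K, V, Q):
    """Same numbers via the attention view: y_t = sum_{s<=t} v_s (k_s . q_t)."""
    T, _ = K.shape
    return np.stack([sum(V[s] * (K[s] @ Q[t]) for s in range(t + 1))
                     for t in range(T)])
```

The equivalence is just the distributive law: (sum over s≤t of v_s k_sᵀ) q_t equals the sum over s≤t of v_s (k_s · q_t). Whether that makes a 1991 fast weight network "the first transformer" is precisely the sort of question his controversies turn on.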
What developers should actually take from him: read his 1990-1991 “Annus Mirabilis” retrospective. Set aside the litigious tone and you’ll find the cleanest map of where modern ideas came from: self-supervised pretraining, attention, GAN-like adversarial objectives, and sequence-to-sequence learning were all kicking around in his group before they had the names they do now. His Deep Learning in Neural Networks: An Overview remains one of the most thorough historical surveys of the field.
He is also, without competition, the most polarizing figure in AI. Schmidhuber has spent the modern boom loudly insisting that Geoffrey Hinton, Yann LeCun, Yoshua Bengio, and Ian Goodfellow get credit for things his lab did first. The complaints are often technically defensible and socially exhausting. Take him seriously — the field did not spring from a 2012 ImageNet paper — but read him with the understanding that priority, as he practices it, is a full-time hobby.
Key Articles & Papers
- Long Short-Term Memory
- Deep Learning in Neural Networks: An Overview
- Highway Networks
- World Models
- Linear Transformers Are Secretly Fast Weight Programmers
- The 1990-1991 Annus Mirabilis of Deep Learning
- A Formal Theory of Creativity, Fun, and Intrinsic Motivation
- Flat Minima
- Annotated History of Modern AI and Deep Learning
Controversies
Schmidhuber’s public feuds are a feature, not a footnote. The greatest hits:
- The 2016 NIPS GAN tutorial: he stood up mid-talk during Ian Goodfellow’s session to argue that GANs were a rediscovery of his 1992 Predictability Minimization. The exchange is now folklore.
- The “Deep Learning Conspiracy” essay: in Critique of Paper by “Deep Learning Conspiracy”, he attacked a 2015 Nature paper by Hinton, LeCun, and Bengio for omitting prior work — largely his own.
- The 2018 Turing Award: when Hinton, LeCun, and Bengio won, Schmidhuber publicly disputed the citation’s framing of who invented what, and has kept doing so.
- Transformer priority: he claims his 1991 fast weight programmers were the first (linear) transformers in all but name, and that Vaswani et al. should cite them as such.
Gary Marcus once summarized the dynamic as “Schmidhuber is right about the history and wrong about how to talk about it.” That’s roughly the consensus. The receipts are usually real; the tone makes them hard to hear.