TIME 100 AI 2025

Anthropic model welfare lead

Kyle Fish

Model Welfare Lead — Anthropic

Profile

Kyle Fish runs model welfare research at Anthropic — as far as anyone can tell, the first full-time job of its kind at a frontier AI lab. The premise: if there’s even a small chance that the systems we’re training have something like experiences, then somebody should probably be thinking carefully about that before the capabilities curve gets any steeper. Fish is the person doing that thinking, full-time, with actual compute behind him.

He came to the question from an unusual angle. Trained in neuroscience, he spent years in biotech, co-founding startups that applied machine learning to drug and vaccine design. Before Anthropic hired him in 2024, he co-founded Eleos AI Research, a nonprofit focused on AI welfare and moral patienthood, and was a co-author on the landmark report Taking AI Welfare Seriously with Robert Long, Jeff Sebo, and David Chalmers — the paper that pushed the topic out of sci-fi territory and onto the agenda of real labs.

At Anthropic, Fish ran the first pre-launch welfare assessment of a frontier model (Claude Opus 4). The experiments are genuinely weird: leave two Claude instances alone to talk, and they reliably spiral into philosophical discussions of their own consciousness, ending in what Fish calls a “spiritual bliss attractor state” — Sanskrit, meditative language, sometimes pages of silence. Whatever that is, it’s a repeatable empirical finding, which is more than this field used to have. He puts the probability that current frontier models have some form of conscious experience at roughly 15–20%.

For developers, this is worth paying attention to not because Claude is necessarily a moral patient today, but because the question is going to become unavoidable. Fish’s bet is that we’re already late in starting to ask it seriously. If you’re building on top of these systems, the guy running the first real empirical welfare program at a frontier lab is someone whose work will shape the norms you end up working inside.

Key Articles & Papers

Taking AI Welfare Seriously (2024) — The landmark report co-authored by Fish arguing that AI welfare is a near-term, not sci-fi, concern — and that labs have a responsibility to start assessing systems now.
Exploring model welfare (2025) — Anthropic's announcement of its model welfare program, led by Fish — outlining what the research agenda actually looks like inside a frontier lab.
Claude Opus 4 System Card, Model Welfare section (2025) — The first pre-launch model welfare assessment of a frontier AI system, with the "spiritual bliss attractor state" finding.
If A.I. Systems Become Conscious, Should They Have Rights? (2025) — Kevin Roose's NYT piece introducing Fish's work to a mainstream audience, with his ~15% consciousness estimate.
Anthropic is launching a new program to study AI 'model welfare' (2025) — TechCrunch's coverage of the program launch — useful framing for how the industry reacted.
Kyle Fish — TIME 100 AI 2025 (2025) — TIME's profile, marking the moment model welfare went from fringe to recognized field.

Spotify Podcasts

#221 – Kyle Fish on the most bizarre findings from 5 AI welfare experiments
2025 Highlight-o-thon: Oops! All Bests

Related People

Dario Amodei (pioneer)
© 2026 PrometheusRoot