NVIDIA researcher, embodied AI and foundation agents
Jim Fan
Profile
Linxi “Jim” Fan is a Senior Research Scientist and Director of AI at NVIDIA, where he co-leads the GEAR Lab (Generalist Embodied Agent Research) and Project GR00T, NVIDIA’s moonshot to build foundation models for humanoid robots. His pitch is simple and ambitious: just as language models scaled across text, agent models will scale across realities, virtual and physical. Whoever cracks that becomes the OpenAI of robotics, and he thinks NVIDIA is positioned to be that company.
His path is the kind that makes the AI bio circuit envious. Columbia valedictorian, then Stanford PhD under Fei-Fei Li, with a stop along the way as OpenAI’s very first intern, where he co-authored World of Bits — an early attempt to get an agent to operate a web browser from raw pixels, years before anyone said “computer use.” His doctoral work spanned distributed RL systems, computer vision, and robot learning, which is exactly the spread you’d want for the embodied-agents bet he’s making now.
The work that put him on the map for most developers: MineDojo (Outstanding Paper at NeurIPS 2022) turned Minecraft into a giant open-ended benchmark, backed by an internet-scale knowledge base of hundreds of thousands of YouTube gameplay videos, thousands of Minecraft Wiki pages, and Reddit posts. Voyager then set GPT-4 loose inside Minecraft: an LLM-driven agent that writes its own code, stores what works in a growing skill library, and gets better the longer it plays. It was the first really convincing demonstration that an LLM could be more than a chatbot; it could be a persistent actor in a complex world. Eureka extended the same playbook to physical control, having an LLM write the reward functions, and produced the now-famous video of a simulated five-finger robot hand spinning a pen.
Today his energy goes into GR00T (now shipping as GR00T N1.6, an open vision-language-action model on Hugging Face) and into being one of the clearest explainers in the field. His X account is required reading if you want to track what’s actually moving in robotics and embodied AI: sharp, well-diagrammed threads that strip the marketing off NVIDIA announcements and competitor papers alike. For developers trying to figure out where AI goes after chatbots, few people are more worth following than Jim Fan.
Key Articles & Papers
Voyager: An Open-Ended Embodied Agent with Large Language Models
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Eureka: Human-Level Reward Design via Coding Large Language Models
VIMA: General Robot Manipulation with Multimodal Prompts
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
World of Bits: An Open-Domain Platform for Web-Based Agents