AI & Tech

AI Researcher Claims Modern Go Bots Match AlphaGo for $3K Using LLM Coding Assistance

Dwarkesh Patel Podcast · Eric Jang – Building AlphaGo from scratch · May 15, 2026
"What took a whole team of research scientists at DeepMind and millions of dollars of research and compute can now be done for a few thousand dollars of rented compute. I spent maybe the first $4K doing exploratory research and then about $3K on the final run."
Eric Jang, former VP of AI at 1X Technologies and ex-Google DeepMind senior research scientist, says he replicated AlphaGo-level performance on a budget of approximately $7,000 (about $4,000 of exploratory research plus a final run of about $3,000) using modern compute and LLM-assisted coding. This is a dramatic reduction from the estimated millions DeepMind spent, and it suggests large efficiency gains from better hardware, simplified architectures, and AI-assisted research. Jang argues that many of DeepMind's original algorithmic tricks are now unnecessary given modern GPUs and proper initialization against existing strong bots like KataGo.

About this episode

In this technical deep dive, host Dwarkesh Patel interviews Eric Jang, former VP of AI at 1X Technologies and ex-senior research scientist at Google DeepMind Robotics, who spent his recent sabbatical rebuilding AlphaGo from scratch. Jang reports reaching AlphaGo-level performance for approximately $7,000 in compute costs, a dramatic reduction from DeepMind's original multi-million-dollar effort, using modern GPUs, LLM-assisted coding, and simplified architectures.

The conversation provides an accessible explanation of how AlphaGo works, breaking down Monte Carlo Tree Search (MCTS), policy and value networks, and the self-play training loop that lets the system improve iteratively by distilling search into neural network forward passes. Jang argues that AlphaGo represents a profound computational accomplishment: a 10-layer neural network somehow compresses what should be an intractable search problem, challenging traditional notions of computational complexity and suggesting NP-hard problems may be more tractable than theory predicts. He contrasts AlphaGo's elegant training approach, which provides improved action labels at every step via MCTS, with the far less efficient policy gradient methods used in LLM reinforcement learning, where models must randomly stumble upon correct answers before receiving any learning signal.

Jang also discusses his experience using Claude for automated research, finding it excellent for hyperparameter optimization and for executing specific experiments but incapable of the lateral thinking required to abandon unproductive research directions. The episode concludes with broader reflections on AI research methodology, the validity of the 'bitter lesson' that compute matters more than algorithmic tricks, and what Go as a research environment might teach us about automating scientific discovery itself.
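The contrast between MCTS-derived action labels and policy gradient can be made concrete with a toy sketch. This is not Jang's code. It assumes the common self-play recipe in which normalized search visit counts serve as the policy target, and every name and number in it is hypothetical, chosen only for illustration.

```python
import math

def visit_counts_to_target(visit_counts, temperature=1.0):
    """Normalize MCTS visit counts into a probability distribution over moves."""
    scaled = [c ** (1.0 / temperature) for c in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]

def cross_entropy(target, policy):
    """Supervised loss that pulls the network policy toward the search target."""
    return -sum(t * math.log(p) for t, p in zip(target, policy) if t > 0)

def reinforce_weight(reward, baseline=0.0):
    """Scalar multiplying grad log pi(action) in a basic policy-gradient update."""
    return reward - baseline

# Hypothetical position with three legal moves.
prior = [0.5, 0.3, 0.2]   # raw network policy before search
visits = [10, 80, 10]     # visit counts after search (illustrative numbers)

# Search-based training: every position yields a dense target over all moves.
target = visit_counts_to_target(visits)
loss = cross_entropy(target, prior)

# Policy gradient with a sparse reward: a failed attempt scores 0, so with a
# zero baseline the update is multiplied by 0 and carries no learning signal.
failed_attempt_weight = reinforce_weight(reward=0.0)
```

In this toy setup the search target is informative at every position, even when the network's first guess is wrong, while the policy-gradient weight is zero whenever the sampled attempt earns no reward.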
