Former DeepMind Scientist Argues AlphaGo Solved NP-Hard Problem in Disturbing Way

Dwarkesh Patel Podcast · Eric Jang – Building AlphaGo from scratch · May 15, 2026

"A 10-layer neural network can only do 10 sequential steps of thinking, right? 10 steps of neural network, parallelized, distributed representation thinking is able to amortize and approximate to a very, very high fidelity a nearly intractable search problem. It actually makes me wonder if our understanding of problems like P NP or these very fundamental computational hardness problems are incomplete."

Jang argues that AlphaGo's ability to compress what should be an intractable search into a small neural network challenges computer science's understanding of computational complexity. He suggests that problems proven NP-hard in the worst case may be tractable in practice for neural networks that identify macroscopic structure, with implications extending to protein folding and weather prediction. This pattern, also observable in AlphaFold and AlphaTensor, suggests that some problems assumed to be computationally intractable may yield to remarkably small amounts of compute.

About this episode

In this technical deep dive, host Dwarkesh Patel interviews Eric Jang, former VP of AI at 1X Technologies and ex-senior research scientist at Google DeepMind Robotics, who spent his recent sabbatical rebuilding AlphaGo from scratch. Jang achieved AlphaGo-level performance for approximately $7,000 in compute costs, a dramatic reduction from DeepMind's original multi-million-dollar effort, using modern GPUs, LLM-assisted coding, and simplified architectures. The conversation provides an accessible explanation of how AlphaGo works, breaking down Monte Carlo Tree Search (MCTS), policy and value networks, and the self-play training loop that lets the system improve iteratively by distilling search into neural network forward passes. Jang argues that AlphaGo represents a profound computational accomplishment: a 10-layer neural network somehow compresses what should be an intractable search problem, challenging traditional notions of computational complexity and suggesting NP-hard problems may be more tractable than theory predicts. He contrasts AlphaGo's elegant training approach, in which MCTS provides improved action labels at every step, with the far less efficient policy gradient methods used in LLM reinforcement learning, where models must randomly stumble upon correct answers before receiving any learning signal. Jang also discusses his experience using Claude for automated research, finding it excellent for hyperparameter optimization and executing specific experiments but incapable of the lateral thinking required to abandon unproductive research directions. The episode concludes with broader reflections on AI research methodology, the validity of the 'bitter lesson' that compute matters more than algorithmic tricks, and what Go as a research environment might teach us about automating scientific discovery itself.
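
For readers who want a concrete picture of "distilling search into forward passes", the following is a minimal sketch, not code from the episode. It assumes a PUCT-style selection rule of the kind used in the AlphaGo family and shows how root visit counts could become the training target for the policy network. The function names, the exploration constant `c_puct`, and the `+ 1` inside the square root are illustrative choices.

```python
import math

def puct_select(priors, q_values, visits, c_puct=1.5):
    """Pick the child maximizing Q + U, one common form of the PUCT rule.

    priors   -- policy-network probabilities for each move
    q_values -- mean value estimates from earlier simulations
    visits   -- how many simulations went through each move
    c_puct   -- exploration constant (assumed value, not from the episode)
    """
    total = sum(visits)
    best, best_score = 0, -math.inf
    for a, (p, q, n) in enumerate(zip(priors, q_values, visits)):
        # Exploration bonus: large for moves the prior likes but search has not tried.
        u = c_puct * p * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best, best_score = a, q + u
    return best

def visits_to_policy_target(visits, temperature=1.0):
    """Normalize root visit counts into the distribution the policy net is trained on."""
    scaled = [n ** (1.0 / temperature) for n in visits]
    z = sum(scaled)
    return [s / z for s in scaled]

# With no simulations yet, the prior alone decides which move to explore first.
first = puct_select([0.5, 0.3, 0.2], [0.0, 0.0, 0.0], [0, 0, 0])
# After search, the visit counts themselves are the "improved label".
target = visits_to_policy_target([30, 10, 10])
```

In this sketch, training the network to match `target` is what moves the result of many simulations into a single forward pass.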

Key takeaways

Jang replicated AlphaGo-level Go bot performance for approximately $7,000 in compute costs using modern GPUs and LLM-assisted coding, down from DeepMind's estimated millions in the original 2016-2017 effort.
Jang argues AlphaGo demonstrated that 10-layer neural networks can compress intractable search problems, challenging traditional computational complexity theory and suggesting NP-hard problems may yield to small amounts of compute when they exhibit macroscopic structure.
Monte Carlo Tree Search in AlphaGo provides improved action labels at every game state through forward search, enabling stable supervised learning, whereas LLM policy gradient methods suffer from sparse reward signals requiring random exploration to stumble upon correct answers.
Modern architectural choices like Transformers versus ResNets matter less for Go AI than previously thought, validating aspects of the bitter lesson, though proper initialization against strong existing models remains critical for sample efficiency.
Claude 4.6 and 4.7 excel at hyperparameter optimization and executing specific experiments but cannot perform lateral thinking to abandon unproductive research tracks or identify fundamental bugs versus bad ideas, representing a key bottleneck in automated AI research.
Many algorithmic tricks developed for Go AI, such as KataGo's auxiliary supervision objectives, are now obsolete on modern hardware, suggesting that compute multipliers from clever algorithms may be transitory and may not stack as hardware improves.
AlphaGo's training fundamentally differs from model-free RL by relabeling every action with MCTS-improved targets rather than solving credit assignment across full trajectories, maintaining low-variance learning signals throughout training.
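
The contrast between MCTS-relabeled targets and policy gradients in the takeaways above can be made concrete with a toy calculation. This is an illustrative sketch under simplifying assumptions (a three-move softmax policy, a single sampled action, a 0/1 reward, no baseline), not the training code discussed in the episode. All function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def search_target_grad(logits, target):
    """Gradient of cross-entropy(target, softmax(logits)) with respect to the logits."""
    probs = softmax(logits)
    return [p - t for p, t in zip(probs, target)]

def reinforce_grad(logits, action, reward):
    """Gradient of -reward * log pi(action) with respect to the logits (one sample)."""
    probs = softmax(logits)
    return [reward * (p - (1.0 if i == action else 0.0))
            for i, p in enumerate(probs)]

logits = [0.0, 0.0, 0.0]

# Search-improved label: every position yields a usable gradient.
dense = search_target_grad(logits, [0.6, 0.2, 0.2])

# Sparse reward: a sampled action that earned nothing gives no signal at all.
sparse_miss = reinforce_grad(logits, action=1, reward=0.0)

# Only when the policy happens to hit a rewarded action does it learn.
sparse_hit = reinforce_grad(logits, action=0, reward=1.0)
```

Under these assumptions `dense` is nonzero at every step, while `sparse_miss` is all zeros, which is the "must stumble upon the answer first" problem described for LLM reinforcement learning.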