AI & Tech

Computer Use Progress Slower Because AI Cannot Grind Against Real Websites

Dwarkesh Patel Podcast · The next big breakthrough will be AIs learning on the job · June 26, 2026
"You can't just have 1,000 agents go try the same checkout flow on Amazon to get better at using websites because Andy Jassy will find your bots and shut your ass down. You can solve this by making clones of Slack and Gmail and all the other common applications and websites. But at least currently, this is a very labor-intensive and unscalable way to build environments."
Patel argues that progress on AI computer use lags behind other domains because training requires replayable simulators, and companies like Amazon will block bots that try to train against their live websites. Labs are therefore forced to build labor-intensive clones of applications. He calls this an underrated bottleneck in AI development, one that he expects to persist until AIs can build high-fidelity application clones themselves.

About this episode

In this monologue episode, AI researcher and podcast host Dwarkesh Patel examines the fundamental strategic bet major AI labs are making: that training models on millions of verifiable tasks across thousands of reinforcement learning (RL) environments will produce artificial general intelligence. Patel notes that current models are one-millionth as sample efficient as humans during training, though labs argue this inefficiency is a one-time cost amortized across billions of deployment sessions. He identifies an underrated bottleneck in AI progress: computer use capabilities lag because training requires replayable simulators, and companies like Amazon block bot training on their real websites, forcing labs to build labor-intensive application clones. Patel argues that critical real-world skills such as building businesses, winning elections, or succeeding in markets cannot be trained with current RL methods, because they require months of real-world interaction that cannot be simulated in data centers. He cites a remark from Anthropic CEO Dario Amodei suggesting that short-horizon RL training may not generalize to long-horizon performance, which could undermine the core AGI scaling hypothesis. The episode explores why continual learning and sample efficiency are deeply connected problems, discussing architectural innovations and alternative training methods such as on-policy self-distillation and speculative "dreaming" approaches in which AIs build and train against self-generated simulations. Patel concludes with a 2027-2028 scenario in which deployed AIs learn primarily from real-world interactions across users rather than from pre-deployment training, fundamentally changing how AI capabilities improve.

