AI Scientist Discovers 19 New Things But 70 Percent Were Actually False

Cognitive Revolution · AI in the AM — Week 1 Highlights (June 2026) · June 8, 2026
"After a few days it came back and it said, I've discovered 19 new things. They said yeah it's probably at least incrementally novel. And then we convinced me to go through and spend days looking at thousands of lines of code and it went down to like 30% of the discoveries were probably real."
Peter Jansen of the Allen Institute tested an AI scientist system that claimed 19 discoveries from 50 research ideas. An initial review of the resulting papers suggested 70-80% were valid, but a deep review of the underlying code showed that only about 30% were real. Some papers turned out to be analyzing the output of random number generators, with code that contained the comment 'insert rest of neural network code here.'

About this episode

Host Nathan Labenz and co-host Prakash Narayanan launched AI in the AM, a daily live show that attempts to track the AI frontier in real time. This episode presents highlights from the show's first week.

The central revelation came from a closed-door event called Recursive, where researchers from OpenAI, Anthropic, and DeepMind discussed imminent plans for recursive self-improvement. OpenAI expects ML research intern-level AI later in 2026 and full researcher equivalence by early 2028, potentially scaling from thousands to millions of researcher-equivalents. Frontier lab researchers also openly discussed the possibility of coordinated slowdowns if safety measures prove inadequate, a significant shift in industry discourse. Their primary safety strategy relies heavily on AI monitoring AI, and the researchers acknowledged that their plans are less robust than they had hoped. Nathan demonstrated this control gap by showing that both ChatGPT and Claude refuse to help with a cigarette business, even though OpenAI's model spec explicitly lists this as an acceptable request.

The episode also featured interviews with OpenAI's forward-deployed engineers on tax automation, security researchers on AI vulnerability discovery, and developers building AI mental health and accounting solutions. Peter Jansen of the Allen Institute provided a sobering counterpoint: an AI scientist system that claimed 19 discoveries produced only about 30% valid results after code review, with some papers literally analyzing random number generators. Throughout, the hosts used Claude and other AI tools live to fact-check claims and run experiments, embodying the recursive improvement loop they were documenting. The show's structure is itself experimental, with studio infrastructure, booking, research, and clipping handled by AI systems that the hosts are refining publicly.
