Internal OpenAI Model Disproves Mathematical Conjecture at Low Budget Cost

Uploaded: 2026-06-29T14:02:00+00:00
Description: OpenAI research scientist Noam Brown said that an internal OpenAI model recently disproved the Erdős unit distance conjecture, a longstanding mathematical problem, at minimal computational cost. He noted that publicly available models like GPT-5.5 could likely achieve the same result with proper scaffolding on a budget of $1,000 to $100,000, which suggests that these models have latent capabilities not yet fully explored.

No Priors Podcast · Really Big Test-Time Compute in AI Changes Benchmarks, Safety and Research with OpenAI Research Scientist Noam Brown · June 29, 2026

"We used an internal model at OpenAI a few weeks ago to disprove the Erdős unit distance conjecture. Now, I'm not a mathematician, but this seems like it was a pretty big deal in the math community. It was like the first problem that a lot of mathematicians had really spent a lot of time on, and the model was able to do something that they weren't able to do and do it in a way that was actually interesting and useful for mathematicians. Honestly, it did it at a budget that was dirt cheap."

About this episode

On this episode of No Priors, host Sarah Guo interviews Noam Brown, an OpenAI researcher who pioneered inference-time scaling techniques, about the broken state of AI model evaluations and the implications of large-scale test-time compute. Brown argues that current benchmarking practices ignore the fact that a modern model's capabilities depend on its inference budget rather than being fixed properties of the model, which makes comparisons misleading and safety evaluations inadequate. He said that existing responsible scaling policies and preparedness frameworks, developed during the ChatGPT era, do not address how much test-time compute should be allocated when evaluating dangerous capabilities. That leaves a critical blind spot, since a model can perform dramatically differently on a $10 budget than on a $10 million one. Brown disclosed that an internal OpenAI model recently disproved the Erdős unit distance conjecture at minimal cost, and that publicly available models like GPT-5.5 contain significant unexplored capabilities because the rapid release cycle means nobody runs a model long enough to discover its limits. He also said OpenAI is deliberately discouraging internal researchers from solving open problems in mathematics and physics so that they focus on building more capable models faster. The conversation explored recursive self-improvement, with Brown arguing against fears of an overnight intelligence explosion because large-scale test-time compute creates a time bottleneck. He noted that current models lack research taste and cannot yet fully replace researchers, though they dramatically accelerate certain tasks such as code optimization. Brown predicted that within 6 to 12 months models will be capable of completing PhD-level work zero-shot, and he emphasized the need for evaluation practices that plot performance against inference budget rather than reporting single benchmark scores.

Key takeaways

Brown said that existing AI safety frameworks fail to account for test-time compute scaling, which creates evaluation blind spots because model capabilities vary dramatically across inference budgets from $10 to $10 million.
An internal OpenAI model recently disproved the Erdős unit distance conjecture at minimal cost, and Brown said GPT-5.5 could likely achieve the same with $1,000 to $100,000 in scaffolded compute.
OpenAI is actively discouraging researchers from using internal models to solve open problems in mathematics and physics, prioritizing rapid model development over demonstrating current capabilities.
Nobody knows the capability ceiling of current AI models because the 2-3 month release cycle is faster than the weeks or months required to fully test model limits.
Brown predicted that models will complete PhD-level research projects zero-shot within 6 to 12 months, a forecast he based on GPT-5.5's ability to develop a poker bot with minimal human guidance.
Brown argued against overnight intelligence explosion scenarios because large-scale test-time compute creates an inherent time bottleneck that prevents instantaneous capability jumps.
Current model benchmark grids are misleading because they don't control for test-time compute: GPT-5.5 appears only marginally better than GPT-5.4 in such grids, despite being substantially more efficient when compute is equalized.
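
The point about equalizing compute can be made concrete with a small sketch. The Python below is illustrative only: the two score curves are made-up placeholder numbers, not results from the episode, and `score_at_budget` is a hypothetical helper that interpolates a score on a log-scaled budget axis so that two models can be compared at the same spend.

```python
import math


def score_at_budget(curve, budget):
    """Interpolate a score at `budget`, linearly in log10(budget).

    `curve` is a list of (budget_usd, score) pairs sorted by ascending budget.
    Hypothetical helper for illustration; not an OpenAI evaluation method.
    """
    if len(curve) < 2:
        raise ValueError("need at least two measured points")
    if not curve[0][0] <= budget <= curve[-1][0]:
        raise ValueError("budget is outside the measured range")
    for (b0, s0), (b1, s1) in zip(curve, curve[1:]):
        if b0 <= budget <= b1:
            span = math.log10(b1) - math.log10(b0)
            t = (math.log10(budget) - math.log10(b0)) / span
            return s0 + t * (s1 - s0)


# Placeholder curves (assumed numbers, chosen only to show the effect).
older_model = [(10, 0.40), (100, 0.50), (1_000, 0.60), (10_000, 0.70)]
newer_model = [(10, 0.50), (100, 0.62), (1_000, 0.72), (10_000, 0.80)]

# A single-score grid might report each model at a different budget.
grid_gap = score_at_budget(newer_model, 1_000) - score_at_budget(older_model, 10_000)

# Comparing both models at the same budget exposes the efficiency gap.
equal_gap = score_at_budget(newer_model, 1_000) - score_at_budget(older_model, 1_000)

print(f"gap in a single-score grid: {grid_gap:.2f}")
print(f"gap at an equal $1,000 budget: {equal_gap:.2f}")
```

With these placeholder curves, the single-score gap looks small because the two models were reported at different budgets, while the gap at an equal budget is several times larger. Reporting the whole curve, rather than one point from it, avoids that ambiguity.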
