Architecture search reveals transformers need less structure at trillion-parameter scale
"Scaling laws define architecture. When we are talking about Transformers being this revolutionary thing, we are talking about maximum scale. The reason why transformer architecture and attention mechanism is such a brilliant architecture is the fact that it is unstructured. There is no structure. The larger neural networks that you make into infinite size, the larger neural networks you make, the more you want them to become less and less structured. As soon as you start adding a little bit of bias in that architecture at scale, things become completely messed up."
About this episode
Nathan Labenz interviews Ramin Hasani, CEO and co-founder of Liquid AI, in a technically deep exploration of biologically inspired neural architectures and the future of efficient AI systems. Hasani traces Liquid AI's origin to a decade of MIT research into liquid neural networks: differential-equation-based systems, inspired by the 302-neuron brain of the C. elegans worm, that can perform complex control tasks like autonomous parking with just 12 neurons. The breakthrough came in 2022, when the team solved century-old neuronal dynamics equations in closed form, enabling these nonlinear systems to scale from hundreds to potentially billions of neurons.
Today, Liquid AI ranks fifth in the US for foundation model downloads on Hugging Face, with over 1 million weekly downloads, competing against Google, Meta, Microsoft, and NVIDIA while using just 1,000 GPUs. The company developed an Automated Foundation Model Design system that searches architecture space with hardware in the loop, testing on actual downstream tasks rather than proxy metrics. This revealed a fundamental scaling principle: smaller models benefit from complex gating and architectural bias, while trillion-parameter systems require maximally unstructured computation like pure attention. Liquid's LFM models rely primarily on gated convolutions rather than attention, achieving competitive quality at dramatically lower compute and memory footprints.
The company has secured partnerships with Shopify for production deployment and with Mercedes-Benz for in-car intelligence using 600-megabyte models. Hasani argues that the trillion dollars of smartphones and laptops shipped annually represent an untapped substrate for local AI that current foundation models cannot use efficiently, and he warns semiconductor companies that they must build their own intelligence layers, like NVIDIA's Nemotron, or risk losing competitiveness.
He closes with a techno-optimist vision of curiosity-driven research enabled by AI agents, while noting current architectures likely cannot match human brain efficiency without discovering new emergent learning mechanisms beyond next-token prediction.
Key takeaways
- Liquid AI ranks fifth in the US for foundation model downloads on Hugging Face, with over 1 million weekly downloads, competing with Google, Meta, Microsoft, and NVIDIA while using only 1,000 GPUs
- MIT researchers demonstrated autonomous car parking with 12-neuron liquid neural networks and driving with 19 neurons by mimicking C. elegans worm brain dynamics
- The MIT team behind Liquid AI solved century-old neuronal dynamics equations in closed form in 2022, enabling biologically inspired networks to scale from hundreds to potentially billions of neurons
- Automated architecture search with hardware in the loop revealed larger models require less structure while smaller specialized models benefit from complex gating mechanisms
- Mercedes-Benz signed a contract for Liquid AI to power in-car voice and visual intelligence with 600-megabyte models running entirely on local automotive processors
- Annual smartphone and laptop markets totaling $1 trillion represent massive untapped substrate for local AI that current foundation models cannot efficiently utilize
- Hasani warns that semiconductor companies must build their own foundation model intelligence layers, like NVIDIA's Nemotron, or risk losing market share to vertically integrated competitors