Scaling AI via Heterogeneous Intelligence

The Shift from Homogeneous to Heterogeneous Intelligence

The prevailing paradigm of AI scaling relies on homogeneous intelligence: scaling single, massive models across identical GPU clusters. While this drove early progress, it is increasingly inefficient for complex, multi-step, real-world problems. Adrian Bertagnoli argues that intelligence should be treated as a heterogeneous system where model architectures, chip types, and workflows co-evolve. By decomposing complex tasks into sub-problems, builders can route specific subtasks to specialized, smaller models rather than relying on frontier models for every step.
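A minimal Python sketch of this decomposition-and-routing idea follows. The subtask kinds, the tier names (`small-model`, `frontier-model`), and the `ROUTES` table are hypothetical placeholders chosen for illustration; they are not real model identifiers and not a description of any particular system.

```python
from dataclasses import dataclass

# Hypothetical routing table: subtask kind -> model tier.
# Both the kinds and the tier names are illustrative assumptions.
ROUTES = {
    "extract": "small-model",
    "summarize": "small-model",
    "plan": "frontier-model",
}


@dataclass
class Subtask:
    kind: str      # what sort of work this is (key into ROUTES)
    payload: str   # the text the chosen model would receive


def route(subtask: Subtask, default: str = "frontier-model") -> str:
    """Return the model tier for a subtask.

    Unknown kinds fall back to the most capable tier, so a routing gap
    costs money rather than accuracy.
    """
    return ROUTES.get(subtask.kind, default)


def plan_assignments(subtasks: list) -> list:
    """Pair each subtask kind with the tier it would be sent to."""
    return [(s.kind, route(s)) for s in subtasks]


if __name__ == "__main__":
    steps = [
        Subtask("extract", "pull invoice dates from the document"),
        Subtask("plan", "decide which invoices to dispute"),
    ]
    for kind, tier in plan_assignments(steps):
        print(f"{kind} -> {tier}")
```

Falling back to the frontier tier for unrecognized subtasks is one possible design choice: only the steps known to be simple are offloaded.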

Optimizing Workflows and Hardware

Callosum’s approach to heterogeneous intelligence focuses on three layers of optimization:

Workflow (Heterogeneous Recursion): Building on the concept of recursive language models, Callosum treats context as an environment rather than a static prompt. By programmatically extracting sub-contexts (using tools like regex or Python REPLs), the system routes specific sub-tasks to smaller, more efficient models. In long-context reasoning tasks, this approach on Cerebras hardware achieved 7x lower costs and 5x lower latency compared to frontier models while maintaining accuracy.
Agent Layer (Multimodal Navigation): In visual web navigation, the team decomposed tasks into visual parsing, zooming, and textual reasoning. By offloading simple tasks like zooming to smaller models (e.g., Qwen3-VL-8B) and reserving frontier models for high-level reasoning, they outperformed state-of-the-art models like GPT-4o and Gemini 1.5 Pro on the Video Web Arena benchmark. For the offloaded steps specifically, execution was 11x faster and costs were 43x lower.
Automation Layer: Rather than making bespoke manual decisions for every subtask, Callosum is building an automation layer that detects task complexity and dynamically routes work to the most cost-effective and performant model-hardware combination.
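The workflow and automation layers above can be sketched together in Python. This is a rough illustration under stated assumptions, not Callosum's implementation: the backend names, relative costs, complexity thresholds, and the keyword-based complexity heuristic are all invented for the example. Only the general shape comes from the text: extract sub-contexts programmatically, estimate how hard each sub-task is, and pick the cheapest backend that can handle it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str             # hypothetical model-hardware pairing
    max_complexity: int   # highest complexity score it is assumed to handle
    cost_per_call: float  # made-up relative cost, for ranking only


# Illustrative catalog; none of these entries describe real products.
BACKENDS = [
    Backend("small-model@accelerator-a", max_complexity=3, cost_per_call=1.0),
    Backend("mid-model@accelerator-b", max_complexity=6, cost_per_call=4.0),
    Backend("frontier-model@gpu", max_complexity=10, cost_per_call=20.0),
]


def extract_subcontexts(context: str, pattern: str, window: int = 1) -> list:
    """Treat the context as data to search instead of one large prompt.

    Returns each line matching `pattern`, together with `window`
    neighboring lines on either side.
    """
    lines = context.splitlines()
    rx = re.compile(pattern, re.IGNORECASE)
    chunks = []
    for i, line in enumerate(lines):
        if rx.search(line):
            lo = max(0, i - window)
            hi = min(len(lines), i + window + 1)
            chunks.append("\n".join(lines[lo:hi]))
    return chunks


def estimate_complexity(subtask: str) -> int:
    """Toy heuristic (an assumption): longer prompts and reasoning
    keywords raise the score, which is capped at 10."""
    score = 1 + len(subtask.split()) // 50
    if re.search(r"\b(why|prove|compare|plan)\b", subtask, re.IGNORECASE):
        score += 5
    return min(score, 10)


def pick_backend(complexity: int) -> Backend:
    """Cheapest backend whose assumed capability covers the score;
    falls back to the most capable backend if none qualifies."""
    eligible = [b for b in BACKENDS if b.max_complexity >= complexity]
    if not eligible:
        return max(BACKENDS, key=lambda b: b.max_complexity)
    return min(eligible, key=lambda b: b.cost_per_call)


if __name__ == "__main__":
    log = "alpha\nerror: disk full\nbeta\ngamma\nERROR: timeout\ndelta"
    for chunk in extract_subcontexts(log, r"error"):
        question = "find the error message in: " + chunk
        backend = pick_backend(estimate_complexity(question))
        print(backend.name)
```

In a real system the complexity estimate would presumably be learned or measured rather than keyword-based; the heuristic here only stands in for that component so the routing logic can be shown end to end.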

The Future of Compute

Bertagnoli posits that we are entering the third era of compute. The first was defined by CPU speed, the second by massive parallelism (Nvidia), and the third will be defined by heterogeneous mapping, in which each workload is matched to the model and chip best suited to it. On this view, as new silicon enters the market, the ability to unify these disparate resources under a single, intelligent orchestration layer will be the primary driver of efficiency and performance gains.
