The 3D Shift in Semiconductor Scaling

IBM has announced a breakthrough in semiconductor manufacturing with its 0.7nm "nano stack" technology. According to Huiming Bu, VP of Silicon Technology R&D, the industry has reached the physical limits of traditional 2D transistor scaling. The nano stack architecture introduces a vertical, staggered design that stacks transistors in the Z-direction, a direction Bu says the industry has not explored in its 60-plus-year history. IBM says the design delivers a 50% performance increase or a 70% reduction in power consumption compared to current 2nm chips. Crucially, the top and bottom devices can be optimized independently, which gives the industry a scaling roadmap that could sustain it for the next 10–15 years.

The Rise of Multi-Model Orchestration

The panel discussed the emergence of new models like Sakana AI’s Fugu and Z.ai’s GLM-5.2, which are challenging the dominance of frontier labs like OpenAI and Anthropic. The panelists reached a consensus that these "models" are better understood as orchestration platforms. Sakana’s Fugu, for instance, functions as a router that intelligently directs tasks to various underlying models.

This shift suggests that the future of AI capability lies in the orchestration layer rather than just the raw weights of a single monolithic model. Gabe Goodhart noted that while these systems can achieve superior benchmark results when routing is perfect, they introduce significant non-determinism. Output quality depends on the routing logic, which creates a trade-off between consistent performance and the ability to use the best-in-class model for any given sub-task.
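
To make the routing trade-off concrete, here is a minimal Python sketch of a router sitting in front of several models. It is not how Fugu or any real product works: the three backends, the keyword scoring in `score_routes`, and the optional sampling step are all hypothetical stand-ins chosen for illustration. The point is only that the answer a user gets depends on the routing decision, and that any randomness in that decision makes the system harder to predict.

```python
import random
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for underlying models; real ones would be API calls.
def code_model(prompt: str) -> str:
    return f"[code-model] {prompt}"

def math_model(prompt: str) -> str:
    return f"[math-model] {prompt}"

def general_model(prompt: str) -> str:
    return f"[general-model] {prompt}"

BACKENDS: Dict[str, Callable[[str], str]] = {
    "code": code_model,
    "math": math_model,
    "general": general_model,
}

def score_routes(prompt: str) -> Dict[str, float]:
    """Toy routing logic: keyword hits stand in for a learned router (assumption)."""
    text = prompt.lower()
    code_hits = sum(word in text for word in ("function", "bug", "python"))
    math_hits = sum(word in text for word in ("integral", "prove", "equation"))
    return {
        "code": 1.0 + 2.0 * code_hits,
        "math": 1.0 + 2.0 * math_hits,
        "general": 1.5,
    }

def route(prompt: str, rng: Optional[random.Random] = None) -> str:
    """Pick a backend name for the prompt.

    Without rng: always the highest-scoring backend (repeatable).
    With rng: sample in proportion to the scores (can differ between calls).
    """
    scores = score_routes(prompt)
    if rng is None:
        return max(scores, key=scores.get)
    names = list(scores)
    weights = [scores[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

def answer(prompt: str, rng: Optional[random.Random] = None) -> str:
    """Send the prompt to whichever backend the router selects."""
    return BACKENDS[route(prompt, rng)](prompt)

if __name__ == "__main__":
    prompt = "Fix the bug in this Python function"
    print(answer(prompt))                    # same backend on every call
    print(answer(prompt, random.Random()))   # backend may vary between calls
```

With the greedy rule, the same prompt always reaches the same backend. With sampling, the same prompt can be served by different backends on different calls, which is one simple way the non-determinism the panel described can show up.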

The Future of "Tokenminning" and Efficiency

Moving away from "tokenmaxxing" (the pursuit of ever-larger context windows and output lengths), the panel highlighted a new trend: "tokenminning." This approach focuses on efficiency and precision, prioritizing the most important outcomes of AI usage rather than sheer volume. The panelists argued that the real innovation will occur when these orchestration techniques are applied to smaller models, allowing frontier-level capabilities to run on commodity hardware like smartphones or laptops. This horizontal spread of smaller, highly capable, orchestrated models is viewed as a more sustainable and impactful path than the current race for massive, centralized models.

Key Takeaways

  • Vertical Scaling Is Here: The transition from 2D to 3D (Z-direction) transistor stacking is presented as the primary answer to the slowing of Moore's Law.
  • Orchestration as Product: The most significant AI innovation is shifting from building better base models to building better routing and orchestration systems.
  • Beware of Non-Determinism: Multi-model systems offer high performance but increase unpredictability, as the user cannot always control or predict which model will service a specific request.
  • Efficiency Over Scale: The industry is pivoting from "tokenmaxxing" to "tokenminning," focusing on getting higher quality results with smaller, more efficient footprints.
  • Enterprise Resilience: Orchestration provides a buffer against model fluctuations, allowing developers to swap out underlying models without breaking the entire application pipeline.
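
The resilience point in the last takeaway can be sketched in a few lines of Python, under the assumption that the application talks to models only through a small shared interface. `TextModel`, `ModelA`, `ModelB`, and `Pipeline` are hypothetical names invented for this illustration, not part of any product discussed by the panel.

```python
from typing import Dict, Protocol, Union

class TextModel(Protocol):
    """The only contract the pipeline relies on (hypothetical interface)."""
    def generate(self, prompt: str) -> str: ...

class ModelA:
    """Stand-in for one underlying model."""
    def generate(self, prompt: str) -> str:
        return f"A:{prompt.upper()}"

class ModelB:
    """Stand-in for a replacement model with different behavior."""
    def generate(self, prompt: str) -> str:
        return f"B:{prompt.title()}"

class Pipeline:
    """Application code that never names a concrete model."""
    def __init__(self, model: TextModel) -> None:
        self.model = model

    def swap(self, model: TextModel) -> None:
        # Replacing the backend does not touch run() or its callers.
        self.model = model

    def run(self, ticket: str) -> Dict[str, Union[str, int]]:
        prompt = f"summarize: {ticket.strip()}"
        return {"summary": self.model.generate(prompt), "chars_in": len(prompt)}

if __name__ == "__main__":
    pipeline = Pipeline(ModelA())
    print(pipeline.run(" disk full "))
    pipeline.swap(ModelB())
    print(pipeline.run(" disk full "))
```

The output shape stays the same after the swap even though the text of the summary changes, which is the buffer described above: callers depend on the interface, not on a particular model.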

Notable Quotes

  • "For the first time in the history of the semiconductor industry, we are stacking the device in the vertical direction, Z-direction, which is a direction our industry has not explored in the past 60 plus years." — Huiming Bu
  • "I think we're doing it a disservice by calling it a model. Because fundamentally, that's not the innovation here... they are figuring out how to get the best results out of the models that already exist, and stitch those together." — Gabe Goodhart
  • "I don't see this as a fundamental jump in terms of a new model. I think we're gearing towards orchestration is the product as opposed to the actual model." — Abraham Daniels