Hardware-Software Co-Design for LLM Inference

OpenAI and Broadcom have unveiled 'Jalapeño,' a custom AI accelerator built from the ground up for LLM inference. Unlike general-purpose accelerators, Jalapeño is architected specifically for the kernels, memory movement, and networking patterns that frontier models require. By controlling the full stack, from chip architecture and kernels to serving systems and product deployment, OpenAI aims to bring realized utilization closer to the hardware's theoretical peak.
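To make "realized utilization" concrete, it can be read as the ratio of throughput actually achieved on a workload to the chip's theoretical peak. The sketch below illustrates that ratio only; the function name and every number in it are hypothetical placeholders, not published Jalapeño figures.

```python
def realized_utilization(achieved_flops: float, peak_flops: float) -> float:
    """Return the fraction of theoretical peak throughput actually achieved."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return achieved_flops / peak_flops


# Hypothetical numbers for illustration only (not Jalapeño specifications).
peak = 1000e12      # assumed theoretical peak: 1000 TFLOP/s
achieved = 400e12   # assumed measured throughput: 400 TFLOP/s

util = realized_utilization(achieved, peak)
print(f"Realized utilization: {util:.0%}")
```

Under this reading, closing the gap means raising the numerator through better kernels, scheduling, and data movement rather than raising the peak itself.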

Rapid Development and Efficiency Gains

Jalapeño was developed in nine months, which OpenAI claims is among the fastest ASIC development cycles for high-performance semiconductors. This speed was achieved by using OpenAI's own models to assist in design and optimization. Early lab testing indicates that the chip delivers substantially better performance per watt than current state-of-the-art accelerators. The architecture focuses on reducing data movement and balancing compute, memory, and networking resources, with the goal of supporting large-scale, gigawatt-level data center deployments starting in 2026.
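One common way to reason about balancing compute against memory is the roofline model, in which a kernel's attainable throughput is capped by either peak compute or memory bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below uses that general model, which the announcement itself does not mention; the function names and all hardware numbers are hypothetical and are not Jalapeño specifications.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float,
                     arithmetic_intensity: float) -> float:
    """Roofline bound: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_flops / mem_bandwidth


# Hypothetical accelerator for illustration only.
PEAK = 1000e12       # assumed peak compute: 1000 TFLOP/s
BANDWIDTH = 4e12     # assumed memory bandwidth: 4 TB/s

ridge = ridge_point(PEAK, BANDWIDTH)

# A low-intensity kernel (assumed 2 FLOP/byte) is limited by memory bandwidth.
low = attainable_flops(PEAK, BANDWIDTH, arithmetic_intensity=2.0)

# A high-intensity kernel (assumed 500 FLOP/byte) is limited by peak compute.
high = attainable_flops(PEAK, BANDWIDTH, arithmetic_intensity=500.0)

print(f"Ridge point: {ridge:.0f} FLOP/byte")
print(f"Low-intensity kernel:  {low / 1e12:.0f} TFLOP/s attainable")
print(f"High-intensity kernel: {high / 1e12:.0f} TFLOP/s attainable")
```

In this model, reducing data movement raises a kernel's arithmetic intensity, which moves memory-bound work closer to the compute ceiling.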

A Multi-Generation Infrastructure Strategy

Jalapeño is the first iteration of a multi-generation compute platform. The collaboration combines Broadcom’s silicon implementation and networking technologies (including Tomahawk networking silicon) with Celestica’s system integration expertise. This vertical integration is intended to lower the cost of compute. Lower latency and higher throughput for interactive applications should, in turn, make advanced AI models more accessible and reliable for developers and end users.