The Shift to Agentic Systems Engineering

Coding agents have moved beyond simple script generation into the realm of AI systems engineering. By leveraging "skills"—defined as versioned, file-based context—engineers can move tasks from zero-shot attempts to robust, few-shot workflows. The core philosophy is to expose open primitives to agents rather than hiding them behind opaque, abstracted APIs, allowing agents to interact directly with the hardware and data layers.

Three Levels of Agentic Autonomy

Ben Burtenshaw outlines three tiers of increasing complexity for agent-driven engineering:

  1. Interactive CUDA Kernel Development: Agents can now write and optimize CUDA kernels, traditionally a highly specialized task. By using the Hugging Face kernels library, agents can benchmark performance against specific hardware matrices. This approach treats kernels as versioned repositories on the Hub, allowing agents to act as publishers who can achieve significant speedups (e.g., 94% improvement in specific inference scenarios) by optimizing for memory bandwidth rather than just compute.
  2. End-to-End Model Fine-tuning: Agents can automate the training pipeline by taking a prompt and executing a full fine-tuning run. Tools like Unsloth and Hugging Face CLI skills allow agents to manage the entire lifecycle, from data preparation to model deployment, making high-level ML engineering accessible in hours rather than days.
  3. Multi-Agent Research Labs: This represents the most autonomous tier, where a team of specialized agents (Researcher, Planner, Worker, Reporter) collaborates to improve training scripts. The Researcher scouts literature via the Hub, the Planner manages job queues, Workers implement architectural changes, and the Reporter pushes metrics to an open dashboard (Trackio). This setup allows for parallel, overnight experimentation where agents iterate on training efficiency based on verifiable metrics.

The Importance of Open Primitives

For agents to be effective, they require access to open, transparent data layers. Trackio is highlighted as a superior tool for agentic workflows because it stores data in an open format (Parquet), allowing agents to query, visualize, or manipulate metrics without being restricted by a proprietary UI. The key takeaway for builders is to prioritize tools that provide this level of data accessibility, as it prevents the agent from hitting a "ceiling" where it can no longer reason about or improve the system it is managing.