AI Agents: Skills Beat MD Files for Token Efficiency
Modern models like Opus and GPT are already excellent; the leverage now is in context engineering. Skills with progressive disclosure, built iteratively from real workflows, avoid token waste and scale productivity.
Models Excel, But Context Separates Quality from Slop
Ras Mic asserts that current LLMs like Claude's Opus 4.6 and OpenAI's GPT 5.4 are "exceptionally good," shifting the battle from model choice to context engineering. "The models are good. The models are exceptionally good," he says, dismissing endless debates on which is superior for coding or UI. Instead, the differentiator is the "harness" around them: system prompts, files, tools, codebase, and conversation history stacking into a context window capped at ~250k tokens.
Every element loads cumulatively. Agent.md or Claude.md files—common for defining agent behavior—get injected on every turn, burning tokens relentlessly. Ras estimates a 1,000-line agent.md at 7,000 tokens per interaction. "95% of people don't need this," he claims, unless it's proprietary company methodology required constantly. For most, the model infers from the codebase or task; redundantly stating "this uses React" is pointless when the code is in context.
This leads to bloat: conversations open at roughly 20k tokens of preloaded context and balloon over turns until the agent "compacts" the history, degrading output. Ras advocates minimalism: strip unnecessary context to steer models toward quality, not slop.
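The arithmetic behind the bloat can be sketched with the numbers Ras cites (a ~20k-token starting context, a 7,000-token agent.md counted on every turn, a ~250k-token window). This is a deliberately crude model, and the average per-turn message size is a guess:

```python
# Crude sketch of context growth until compaction, using figures from the
# talk (20k start, 7k agent.md per turn, ~250k window). Treating the
# agent.md as retained every turn is a simplification; MESSAGE is a guess.
WINDOW = 250_000
START = 20_000      # system prompt, tools, etc.
AGENT_MD = 7_000    # ~1,000-line agent.md, paid on every turn
MESSAGE = 1_500     # hypothetical average user + assistant exchange

def turns_until_compaction(per_turn_overhead: int) -> int:
    """Count full turns that fit before the window forces compaction."""
    used, turns = START, 0
    while used + per_turn_overhead + MESSAGE <= WINDOW:
        used += per_turn_overhead + MESSAGE
        turns += 1
    return turns

print(turns_until_compaction(AGENT_MD))  # agent.md loaded every turn
print(turns_until_compaction(53))        # only a skill summary loaded
```

Under these assumptions the agent.md version hits compaction several times sooner than the skill-summary version, which is the whole argument in miniature.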
Skills Enable Progressive Disclosure and Token Savings
Skills change this via progressive disclosure: initially, only each skill's name and short description (~53 tokens in Ras's demo) load into context. The agent pulls the full instructions, the detailed steps, only when it decides the skill is relevant. An equivalent agent.md might cost 944+ tokens on every turn; skills defer that expense until it is actually needed.
"I'm a skills maxi," Ras declares. He demos a skill structure:
- Name: Notion Report Skill
- Description: Generates structured Notion reports from data.
- [Full instructions here, loaded on demand]
This keeps context lean while granting access precisely when needed, saving "thousands of tokens per conversation."
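Mechanically, progressive disclosure is a two-stage lookup: every turn carries only a one-line summary per skill, and the full body is read only when the skill fires. The sketch below is illustrative (the registry, helper names, and body text are invented, loosely modeled on the skill layout in Ras's demo):

```python
# Minimal sketch of progressive disclosure; names and layout illustrative.
# Stage 1: only name + description enter the prompt on every turn.
# Stage 2: full instructions load when the agent invokes the skill.

SKILLS = {
    "notion-report": {
        "description": "Generates structured Notion reports from data.",
        "body": "1. Pull the data.\n2. Build the Notion blocks.\n3. Publish.",
    },
}

def summaries() -> str:
    """Always-loaded index: one cheap line per skill (~tens of tokens)."""
    return "\n".join(
        f"- {name}: {meta['description']}" for name, meta in SKILLS.items()
    )

def load_skill(name: str) -> str:
    """On-demand load: the expensive full instructions."""
    return SKILLS[name]["body"]

print(summaries())                   # what every turn pays for
print(load_skill("notion-report"))   # paid only when the skill fires
```

The design point is that the always-loaded index stays flat as you add skills, while the per-skill cost is only incurred on the turns that use it.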
Ras shares the story of his sponsor-email screening agent. Initially, forwarding sponsor emails to an OpenClaw agent yielded all-positive verdicts: no rejections, shallow research. He walked it through the checks step by step: "Check Twitter, YouTube, Trustpilot, funding. Reject if two lack good standing." After corrections and a successful run (marking bad companies in Google Sheets), he prompted: "Review what you did and create the skill." The agent codified the workflow with real context, achieving reliable performance.
He warns against pre-made skills from marketplaces: they lack your workflow context and pose security risks. "I don't install skills... your agent needs the context of a successful run."
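The rejection rule Ras dictated ("reject if two lack good standing") is simple enough to state as code. The source names and threshold come from his description; everything else is a hypothetical sketch:

```python
# Hypothetical sketch of the sponsor-screening verdict Ras described:
# check several sources, reject when two or more come back negative.
SOURCES = ("twitter", "youtube", "trustpilot", "funding")

def verdict(checks: dict[str, bool]) -> str:
    """checks maps source name -> True if the company is in good standing.
    A source that wasn't checked counts as a failure."""
    failures = sum(1 for source in SOURCES if not checks.get(source, False))
    return "reject" if failures >= 2 else "accept"

print(verdict({"twitter": True, "youtube": True,
               "trustpilot": True, "funding": True}))   # accept
print(verdict({"twitter": False, "youtube": True,
               "trustpilot": False, "funding": True}))  # reject
```

Writing the rule down this explicitly is exactly what the step-by-step walkthrough forces; the agent's original "all-positive" behavior came from never being given a threshold to apply.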
Iterative Refinement and Productivity Scaling
Skills aren't set-it-and-forget-it. Ras recursively improves them: on failure, diagnose, fix live, then update the skill file to embed the lesson. For his YouTube analytics generator, five iterations across eight data sources yielded flawless 10-minute execution.
"You have to walk with it step by step," mimicking employee training. Models predict tokens via vector similarity, not true reasoning: they mimic provided examples well but flail without them. A common pitfall is jumping to skill creation without a successful run, leading to API errors and misfires.
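The refine-on-failure loop Ras describes (run, diagnose, fold the lesson back into the skill file) can be sketched as follows. `run_workflow` is a stand-in for actually executing the agent, and the lesson text is invented:

```python
# Sketch of the recursive refinement loop: each failure's diagnosis is
# appended to the skill text so the next run starts with the lesson baked
# in. `run_workflow` is a placeholder for executing the agent for real.

def run_workflow(skill_text: str) -> tuple[bool, str]:
    """Placeholder run: returns (succeeded, diagnosis of the failure)."""
    if "retry on rate limits" not in skill_text:
        return False, "Lesson: retry on rate limits before failing the run."
    return True, ""

def refine(skill_text: str, max_iterations: int = 5) -> str:
    for _ in range(max_iterations):
        ok, lesson = run_workflow(skill_text)
        if ok:
            break
        skill_text += "\n" + lesson  # embed the fix so it never repeats
    return skill_text

skill = refine("Name: YouTube Analytics\nSteps: pull data from 8 sources.")
print(skill)
```

The key property is that fixes accumulate in the artifact, not in the human's memory, which is why Ras's analytics skill converged after a handful of iterations.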
His scaling advice rejects hype: have one agent master core workflows (email, spreadsheets, research) before adding sub-agents. Ras built single-agent reliability first, then layered sub-agents for marketing, business, and personal tasks. Tools like Paperclip dazzle but prioritize flash over productivity; build custom for real gains. "Scale for productivity, not scaling for what looks cool."
Host Greg Isenberg frames it the same way: treat agents as "very new employees" needing mentorship, not omniscient oracles. Ras agrees, positioning skill-crafters as future-proof against AI displacement: "Anyone who knows how to build agents... we're in for a good run."
"The permanent underclass"—those ignoring these tools—face obsolescence, but hands-on builders thrive as models remain token predictors, not thinkers.
Key Takeaways
- Ditch agent.md files for 95% of cases; they're token sinks loaded every turn—use only for constant proprietary info.
- Build skills via progressive disclosure: name + description in context, full file on-demand, saving thousands of tokens.
- Walk workflows step by step with the agent to a successful run before codifying them as a skill; give it mimicable context.
- Recursively refine: feed failures back, fix live, update skill to prevent repeats (e.g., 5 iterations for flawless analytics).
- Scale simply: one agent + skills first, add sub-agents later; prioritize productivity over multi-agent flash.
- Minimal context wins: models like Opus/GPT infer well—don't redundantly describe obvious elements like frameworks.
- Security first: avoid marketplace skills; build custom to embed your workflows and dodge attack vectors.
- Future-proof yourself: mastering agent skills > generic prompting; models mimic, humans design harnesses.
Notable quotes:
- Ras Mic: "95% of people don't need agent.md... it's added in the context every time you go back and forth."
- Ras Mic: "Skills are used in a way that's called progressive disclosure... the agent only gets the bunch of info when it realizes it needs this skill."
- Ras Mic: "The way I've been creating skills... I actually walk with it step by step... then I tell the AI, review what you did and create the skill."
- Ras Mic: "Scale for productivity, not scaling for what looks cool... it starts with one agent and you building up the skills."
- Greg Isenberg (echoing): "Treat models and these agents like very new employees versus like these black magic boxes."