AI Agents: Skills Beat MD Files for Token Efficiency

Modern models like Opus and GPT are already excellent. The leverage now is context engineering: skills with progressive disclosure, built iteratively from real workflows, avoid token waste and scale productivity.

Models Excel, But Context Separates Quality from Slop

Ras Mic asserts that current LLMs like Claude's Opus 4.6 and OpenAI's GPT 5.4 are "exceptionally good," shifting the battle from model choice to context engineering. "The models are good. The models are exceptionally good," he says, dismissing endless debates on which is superior for coding or UI. Instead, the differentiator is the "harness" around them: system prompts, files, tools, codebase, and conversation history stacking into a context window capped at ~250k tokens.

Every element loads cumulatively. Agent.md or Claude.md files—common for defining agent behavior—get injected on every turn, burning tokens relentlessly. Ras estimates a 1,000-line agent.md at 7,000 tokens per interaction. "95% of people don't need this," he claims, unless it's proprietary company methodology required constantly. For most, the model infers from the codebase or task; redundantly stating "this uses React" is pointless when the code is in context.

This leads to bloat: conversations that open at 20k tokens balloon over successive turns until the agent "compacts" the history, degrading output. Ras advocates minimalism: strip unnecessary context to steer models toward quality, not slop.
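A toy model makes the bloat concrete. This sketch uses the two figures quoted in the episode (a ~7,000-token agent.md and a ~250k-token window); the per-turn conversation cost is an illustrative assumption, not a measured number.

```python
# Toy model of context growth when a static agent.md rides along on every
# turn. AGENT_MD_TOKENS and CONTEXT_LIMIT come from the episode's estimates;
# base_per_turn is an assumed average for the conversation itself.

AGENT_MD_TOKENS = 7_000      # ~1,000-line agent.md, per Ras's estimate
CONTEXT_LIMIT = 250_000      # approximate window cap cited in the episode

def tokens_after(turns: int, base_per_turn: int = 1_500) -> int:
    """Total context tokens after `turns` exchanges when agent.md
    is re-injected alongside normal conversation on every turn."""
    return turns * (AGENT_MD_TOKENS + base_per_turn)

def turns_until_compaction(base_per_turn: int = 1_500) -> int:
    """How many turns fit before the agent is forced to compact."""
    return CONTEXT_LIMIT // (AGENT_MD_TOKENS + base_per_turn)

print(tokens_after(10))          # → 85000 tokens after just 10 turns
print(turns_until_compaction())  # → 29 turns before compaction
```

Under these assumptions the agent.md alone accounts for over 80% of every turn's overhead, which is exactly the waste Ras is pointing at.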

Skills Enable Progressive Disclosure and Token Savings

Skills revolutionize this via progressive disclosure: only the skill's name and short description (~53 tokens) load into context initially. The agent pulls the full instructions (name, description, detailed steps) only when relevant. A full agent.md equivalent might cost 944+ tokens per turn; skills defer that expense.

"I'm a skills maxi," Ras declares. He demos a skill structure:

Name: Notion Report Skill
Description: Generates structured Notion reports from data.
[Full instructions here—loaded on-demand]

This keeps context lean while granting access precisely when needed, saving "thousands of tokens per conversation."
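The arithmetic behind that claim can be sketched directly from the two figures quoted above (~53 tokens for the always-loaded stub, 944+ for the full instructions); the turn and invocation counts are illustrative assumptions.

```python
# Compare per-conversation context cost: progressive disclosure (skill)
# vs always-loaded instructions (agent.md equivalent). Token figures are
# the episode's; turns/invocations are assumed for illustration.

SKILL_STUB_TOKENS = 53       # name + description, always in context
FULL_SKILL_TOKENS = 944      # full instructions, loaded only when invoked

def skill_cost(turns: int, invocations: int) -> int:
    """Cost with progressive disclosure: the stub on every turn,
    the full file only on turns where the skill is actually used."""
    return turns * SKILL_STUB_TOKENS + invocations * FULL_SKILL_TOKENS

def agent_md_cost(turns: int) -> int:
    """Cost when the full instructions ride along on every turn."""
    return turns * FULL_SKILL_TOKENS

turns = 50
saved = agent_md_cost(turns) - skill_cost(turns, invocations=2)
print(saved)  # → 42662 tokens saved over a 50-turn conversation
```

Even with this single modest skill, the savings land squarely in the "thousands of tokens per conversation" range Ras describes; with several skills the gap widens further.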

Ras shares his sponsor email screening agent story. Initially, forwarding sponsor emails to an OpenClaw agent yielded all-positive verdicts—no rejections, shallow research. He walked it step-by-step: "Check Twitter, YouTube, Trustpilot, funding. Reject if two lack good standing." After corrections and a successful run (marking bad companies in Google Sheets), he prompted: "Review what you did and create the skill." The agent codified the workflow with real context, achieving reliable performance.
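The rejection rule he dictated ("Reject if two lack good standing") is simple enough to sketch. This is a minimal illustration of that rule only; the signal names and dict shape are assumptions, not OpenClaw's actual API.

```python
# Minimal sketch of the screening rule Ras taught his agent: check
# several public signals and reject a sponsor when two or more lack
# good standing. Signal names and inputs are illustrative.

SIGNALS = ("twitter", "youtube", "trustpilot", "funding")

def screen_sponsor(standing: dict) -> str:
    """Return a verdict given per-signal good-standing booleans;
    a missing signal is treated as not in good standing."""
    failures = sum(1 for s in SIGNALS if not standing.get(s, False))
    return "reject" if failures >= 2 else "accept"

print(screen_sponsor({"twitter": True, "youtube": True,
                      "trustpilot": False, "funding": False}))  # → reject
print(screen_sponsor({"twitter": True, "youtube": True,
                      "trustpilot": True, "funding": False}))   # → accept
```

The point of the anecdote is that the agent only produced useful verdicts once a concrete, checkable rule like this existed in its context.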

He warns against pre-made skills from marketplaces: they lack your workflow context and pose security risks. "I don't install skills... your agent needs the context of a successful run."

Iterative Refinement and Productivity Scaling

Skills aren't set-it-and-forget-it. Ras recursively improves them: on failure, diagnose, fix live, then update the skill file to embed the lesson. For his YouTube analytics generator, five iterations across eight data sources yielded flawless 10-minute execution.

"You have to walk with it step by step," mimicking employee training. Models predict tokens via vector similarity, not true reasoning; they mimic provided examples faithfully but flail without them. A common pitfall is jumping to skill creation without a successful run, which leads to API errors or misfires.
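The refine loop described above can be sketched as a simple control flow: run the skill, and on failure have the agent diagnose the error and fold the fix back into the skill file. `run_skill` and `agent_update_skill` are hypothetical stand-ins for agent calls, not a real API.

```python
# Hedged sketch of recursive skill refinement: run -> diagnose ->
# update the skill file so the same failure can't recur. The two
# callables are hypothetical stand-ins for real agent interactions.

def refine(skill_text: str, run_skill, agent_update_skill,
           max_iterations: int = 5) -> str:
    """Iterate until a clean run or the iteration cap is hit."""
    for _ in range(max_iterations):
        ok, error = run_skill(skill_text)
        if ok:
            return skill_text  # successful run: skill is stable
        # Feed the failure back so the lesson lands in the skill itself.
        skill_text = agent_update_skill(skill_text, error)
    return skill_text
```

This mirrors the five-iteration loop behind the analytics generator: each pass documents one edge case, so the skill converges toward a reliably repeatable workflow.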

Scaling advice rejects hype: start with one agent mastering core workflows (email, spreadsheets, research) before sub-agents. Ras built single-agent reliability first, then layered sub-agents for marketing/business/personal tasks. Tools like Paperclip dazzle but prioritize flash over productivity; build custom for true gains. "Scale for productivity, not scaling for what looks cool."

Host Greg Isenberg frames it as treating agents like "very new employees" needing mentorship, not omniscient oracles. Ras agrees, positioning skill-crafters as future-proof against AI displacement: "Anyone who knows how to build agents... we're in for a good run."

"The permanent underclass"—those ignoring these tools—face obsolescence, but hands-on builders thrive as models remain token predictors, not thinkers.

Key Takeaways

  • Ditch agent.md files for 95% of cases; they're token sinks loaded every turn—use only for constant proprietary info.
  • Build skills via progressive disclosure: name + description in context, full file on-demand, saving thousands of tokens.
  • Walk workflows step-by-step with the agent to a successful run before codifying them as a skill—provide mimicable context.
  • Recursively refine: feed failures back, fix live, update skill to prevent repeats (e.g., 5 iterations for flawless analytics).
  • Scale simply: one agent + skills first, add sub-agents later; prioritize productivity over multi-agent flash.
  • Minimal context wins: models like Opus/GPT infer well—don't redundantly describe obvious elements like frameworks.
  • Security first: avoid marketplace skills; build custom to embed your workflows and dodge attack vectors.
  • Future-proof yourself: mastering agent skills > generic prompting; models mimic, humans design harnesses.

Notable quotes:

  • Ras Mic: "95% of people don't need agent.md... it's added in the context every time you go back and forth."
  • Ras Mic: "Skills are used in a way that's called progressive disclosure... the agent only gets the bunch of info when it realizes it needs this skill."
  • Ras Mic: "The way I've been creating skills... I actually walk with it step by step... then I tell the AI, review what you did and create the skill."
  • Ras Mic: "Scale for productivity, not scaling for what looks cool... it starts with one agent and you building up the skills."
  • Greg Isenberg (echoing): "Treat models and these agents like very new employees versus like these black magic boxes."
Video description
I sit down with Ras Mic to break down how AI agents actually work and why most people are using them wrong. Ras Mic explains the mechanics of context windows, makes the case that agent.md files are largely unnecessary, and shares his step-by-step methodology for building custom skills that make agents dramatically more productive. Whether you're coding with Claude Code or automating workflows with OpenClaw, this episode gives you the foundational knowledge to stop wasting tokens and start getting real results from your AI tools.

Timestamps:

  • 00:00 – Intro
  • 00:42 – The Models Are Good Now
  • 01:20 – How Context Windows Actually Work
  • 04:55 – The Power of Skills
  • 09:17 – How to Create Skills
  • 16:35 – Skill Maxxing
  • 19:05 – What You Need to Build a Project
  • 20:40 – Recursively Building and Improving Skills
  • 29:23 – Context Window Management and Token Efficiency
  • 33:02 – Closing Thoughts

Key points:

  • The models (Opus 4.6, GPT 5.4) are exceptionally good now — the differentiator is the context and harness you build around them.
  • Agent.md and claude.md files get loaded into context on every single turn, burning tokens and degrading performance as the context window fills up; 95% of users can skip them entirely.
  • Skills use progressive disclosure: only the name and description sit in context until the agent determines it needs the full file, saving thousands of tokens per conversation.
  • The best way to create a skill is to walk through the workflow with the agent step by step, achieve a successful run, and then have the agent write the skill based on that real context.
  • Recursively refine skills by feeding failures back into the agent and having it update the skill file so the same mistake is avoided going forward.
  • Scale for productivity by starting with one agent and building up workflows before adding sub-agents — start simple, then expand.

Numbered section summaries:

  1. The Models Are Good — Context Is What Matters. Ras Mic opens by declaring that the current generation of models, Opus 4.6 and GPT 5.4, are exceptionally capable. The conversation is no longer about which model is "better" in a general sense. What matters now is the quality of context you feed them — that is what separates quality output from slop.
  2. How Context Windows Work. Ras Mic walks through the anatomy of a context window: system prompt, agent.md files, skills, tools, the codebase, and the user conversation. All of these stack up as tokens, and the window has a hard limit (around 250,000 tokens). When you hit that limit, agents compact — and performance drops. Understanding this structure is the foundation for everything else in the episode.
  3. Skills and Progressive Disclosure. Skills solve the token-bloat problem. A skill file contains a name, description, and the detailed instructions — but only the name and description are loaded into context. The agent reads the full file only when it determines the skill is relevant. This means a skill costs roughly 53 tokens per turn versus 944+ for an equivalent agent.md file.
  4. Building Skills the Right Way. Ras Mic shares his methodology: identify a workflow, walk through it with the agent step by step, correct mistakes in real time, and only create the skill after you have completed a successful run. He illustrates this with his sponsor email screening agent — the first attempt returned all-positive results because the agent had no criteria for rejection.
  5. Recursively Improving Skills. Even after a skill is created, the agent will still hit edge cases and fail. Ras Mic treats each failure as an opportunity: identify the error, have the agent fix it, then tell the agent to update the skill so the failure is documented. After five iterations of this loop on his YouTube analytics report generator, the agent now executes flawlessly across eight data sources in about ten minutes.
  6. Scaling for Productivity Over Flash. Ras Mic started with a single agent handling everything — email, spreadsheets, research. Only after building reliable skills did he add sub-agents for marketing, business, and personal tasks. He argues that jumping straight to multi-agent architectures (or adopting tools like Paperclip without building foundational workflows first) optimizes for what looks cool rather than what is productive.

Links:

  • The #1 tool to find startup ideas/trends: https://www.ideabrowser.com/
  • LCA helps Fortune 500s and fast-growing startups build their future — from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products: https://latecheckout.agency/
  • The Vibe Marketer — resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/
  • Find Greg — X/Twitter: https://twitter.com/gregisenberg | Instagram: https://instagram.com/gregisenberg/ | LinkedIn: https://www.linkedin.com/in/gisenberg/
  • Find Mic — X/Twitter: https://x.com/Rasmic | YouTube: https://www.youtube.com/@rasmic


© 2026 Edge