Skills: Markdown Standard for Agentic AI Infrastructure

Anthropic's 'skills'—simple Markdown folders encoding methodologies—have evolved into agent-callable infrastructure, now standardized by Anthropic, OpenAI, and Microsoft for predictable AI workflows across tools like Claude, Copilot, and ChatGPT.

Skills as Organizational Infrastructure, Not Personal Prompts

Skills started in October as personal configurations: simple folders with a SKILL.md file containing metadata and plain-English instructions for LLMs. Today they're enterprise-grade: version-controlled and sidebar-accessible in Claude, Copilot, Excel, and PowerPoint. Teams upload them organization-wide, moving methodologies out of individual heads and into shared repos. One real estate firm, Texas Paintbrush, built 50 repos with 50,000+ lines covering rent rolls, comps analysis, cash flows, and handoffs, serving agents for automation and humans as onboarding context.
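Concretely, a skill is just a folder with one entry file. A minimal sketch (the folder name, field values, and `examples.md` split are illustrative; Anthropic's convention puts a YAML frontmatter block with `name` and `description` at the top of SKILL.md):

```markdown
rent-roll-analysis/
├── SKILL.md      <- frontmatter + methodology body
├── examples.md   <- worked examples, kept out of the core file
└── scripts/      <- optional deterministic helpers

SKILL.md:
---
name: rent-roll-analysis
description: Analyze a rent roll and produce a markdown summary with occupancy, delinquency, and lease-expiry sections; use when the user uploads a rent roll or asks about tenancy.
---
# Rent Roll Analysis
...plain-English methodology the model follows...
```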

This substrate delivers the persistent, accurate outcomes businesses need. Unlike one-off prompts, skills compound: refine them via feedback loops ("Update your skill file with X") and they improve over time. Prompts remain the basic blocks; skills assemble them into the "castle" of specialized, reusable primitives.

"Skills compound for you. Skills compound by the weight of industry investment in the ecosystem and by the weight of your own commitment to having a predictable pattern."

Nate Jones emphasizes compounding during a discussion on why skills outperform repeated prompting after six months of iteration.

Shift to Agent-Callable, Not Human-Driven

Skills were initially human-called, a few per conversation; now most calls come from agents, hundreds per run. Agents chain them predictably: specialist stacks decompose vague instructions into PRDs, GitHub issues, and tests. Cursor agents invoke them seamlessly, offloading nuance from prompts.

Orchestrator skills analyze a request and spawn sub-agents for research, coding, UI, and docs (a pattern documented on Reddit). Failures hurt more without a human in the loop to correct them, so test quantitatively: run test suites, version skills, and measure performance. Wording tweaks can trigger latent model behaviors unpredictably; expect 3-4 iterations for aesthetic outputs like PowerPoint formatting.
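A minimal sketch of what quantitative testing can look like: treat the output sections the skill guarantees as a contract, then measure how often agent runs satisfy it. The section names, helper functions, and sample runs below are illustrative, not part of any official tooling.

```python
# Minimal sketch of quantitative skill testing: treat the sections the skill
# guarantees as a contract, then measure how often agent runs satisfy it.
# Section names, helpers, and the sample runs are all illustrative.

REQUIRED_SECTIONS = ["## Market Map", "## Feature Comparison", "## Threat Assessment"]

def meets_contract(output: str) -> bool:
    """True if every guaranteed section appears in a run's output."""
    return all(section in output for section in REQUIRED_SECTIONS)

def pass_rate(outputs: list) -> float:
    """Fraction of runs whose output satisfied the contract."""
    if not outputs:
        return 0.0
    return sum(meets_contract(o) for o in outputs) / len(outputs)

# Example suite: two compliant runs and one that dropped a section.
runs = [
    "## Market Map\n...\n## Feature Comparison\n...\n## Threat Assessment\n...",
    "## Market Map\n...\n## Feature Comparison\n...\n## Threat Assessment\n...",
    "## Market Map\n...\n## Feature Comparison\n...",
]
print(f"pass rate: {pass_rate(runs):.0%}")  # 2 of 3 runs met the contract
```

Versioning the skill and re-running a suite like this after every wording tweak is what turns "it seems better" into a measured regression check.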

Cross-tool compatibility (Claude, ChatGPT, Copilot) creates ecosystem lock-in. Open-sourced skills trade like baseball cards: they signal talent for acqui-hires and accelerate community discovery of best practices.

"Agents can make hundreds of skill calls over the course of a single run. We humans were calling maybe a few skills... The math just doesn't math for humans."

Nate Jones highlights the scale advantage of agent calling, explaining why skills must be agent-first.

Building Reliable Skills: Avoid Common Pitfalls

Core structure: a single-line description plus a methodology body. Bad descriptions are vague ("helps with competitive analysis") and undertrigger. Good ones name the artifact ("analyze competitors"), likely trigger phrases ("who are the players?"), and concrete outputs (markdown sections, Excel fields), and push aggressively for invocation, per Anthropic guidance.

Gotcha: the description must stay on one line; auto-formatters that wrap it break Claude's parsing.
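As an illustration (both descriptions hypothetical), the difference between an undertriggering description and one that fires reliably:

```markdown
<!-- Vague: the agent has no routing signal, so the skill rarely fires -->
description: Helps with competitive analysis.

<!-- Specific: names the artifact, trigger phrases, and output fields, all on one line -->
description: Analyze competitors whenever the user asks "who are the players?", requests a landscape, or names a rival; always produce a markdown report with Market Map, Feature Comparison, and Threat Assessment sections, plus Excel-ready fields.
```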

Methodology needs:

  • Reasoning frameworks rather than linear steps, so the skill generalizes.
  • Exact output formats (sections and fields).
  • Explicit edge cases, since LLMs lack human common sense.
  • Examples for pattern-matching, kept in separate files.

Keep it lean: 100-150 lines max in the core file, with roughly 80% of effort on the description (so the skill triggers) and 20% on the reasoning body. Bloated folders waste context windows.
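A compact methodology body following those rules might look like this (all content illustrative):

```markdown
## How to reason
Start from the market definition, not the feature list: who pays, for what
job, against which alternatives. Apply this framing to any industry.

## Output format
Always emit exactly three sections: Market Map, Feature Comparison,
Threat Assessment.

## Edge cases
- Fewer than three competitors found: say so explicitly; never pad the list.
- Private companies: mark all financial figures as estimates.

## Examples
See examples.md for two worked analyses to pattern-match against.
```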

"A short skill that fires reliably is going to outperform a long skill with competing instructions."

Nate Jones on leanness, countering intuition to overload with details.

Agent-First Design: Contracts, Composability, Hardwiring

Agents as primary callers demand:

  • Routing descriptions matching agent goals.
  • Contract outputs like API SLAs—controllable fields, guarantees, limits.
  • Composability—outputs handoff cleanly to sub-agents (e.g., ticket workflows).

For determinism, pair skills with scripts: skills handle general reasoning, scripts hardwire the steps that must not vary. Teams of humans plus agents use skills as actionable context: agent-readable, human-legible markdown.
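An output-contract section for a hypothetical ticket-writing skill, written so a downstream sub-agent can parse the handoff without guessing (field names are illustrative):

```markdown
## Output contract
Return exactly these fields, in this order:

- ticket_id: string ("unknown" if not yet assigned)
- summary: one sentence
- acceptance_criteria: 3-7 bullet points
- confidence: one of low | medium | high

Never omit a field; if a value cannot be determined, emit "unknown".
Never add fields beyond these four.
```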

Three Tiers for Team Skills Adoption

High-performing teams tier skills:

  1. Standard: Org-wide (brand voice, templates)—admin-provisioned.
  2. Methodology: Team craft (client deliverables, senior practices)—extract from heads for new hires, alpha sharing across PM/eng/CS.
  3. Personal workflows: Day-to-day hacks—repo them for resilience (vacation/sick coverage).

Avoid siloed personal skills; systemic thinking encodes expertise at the right access levels.

"Methodology doesn't live in someone's mind anymore. It lives in a repository."

Nate Jones on the Texas Paintbrush example, showing the dual human/agent benefits.

Community-Driven Evolution and Next Steps

An Anthropic/Microsoft partnership brings skills to Copilot, and OpenAI is adopting them as an open standard. The value equation flips: open-sourced agent skills function as resumes. Still missing: domain-specific packs (e.g., rent rolls). The speaker is launching a community repo for people solving real problems, beyond the generic starter kits on GitHub.

"We're all learning together... making a lowly markdown file actually function as an agent callable context layer."

Nate Jones on collective discovery, contrasting known '90s software with emergent LLM best practices.

Key Takeaways

  • Craft pushy, single-line descriptions with triggers, artifacts, outputs to ensure reliable firing—80% effort here.
  • Embed reasoning frameworks, edge cases, exact formats, and examples; cap core file at 100-150 lines.
  • Test skills quantitatively with suites for agent reliability; iterate wording for latent behaviors.
  • Design agent-first: routing descriptions, contract outputs, composable handoffs; script for determinism.
  • Tier org skills: standards (org-wide), methodology (team craft), personal (repo'd workflows).
  • Open-source domain skills for community alpha, talent signaling; compound via iteration/ecosystem.
  • Leverage across tools (Claude, Copilot, ChatGPT) for specialist stacks/orchestrators in dev/ops.
  • Extract expertise from heads to repos—benefits agents, humans, onboarding.
Video description

My site: https://natebjones.com
Full Story w/ Prompts: https://natesnewsletter.substack.com/p/your-ai-skills-fail-10-of-the-time?r=1z4sm5&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

What's really happening inside the skills ecosystem when agents now call skills more often than humans do? The common story is that skills are just personal configuration files from October, but the reality is that skills have become organizational infrastructure, and most teams haven't updated their approach to match. In this video, I share the inside scoop on how to build agent-readable skills that actually compound:

  • Why the description field is where most skills go to die
  • How agent-first design changes handoffs and contracts
  • What three-tier skill architecture looks like for teams
  • Where community repositories fill the domain-specific gap

Builders who keep treating skills as glorified prompts will miss the compounding advantage; the practitioners who version, test, and share skills are pulling ahead every week.

Chapters
00:00 Skills launched in October, everything changed since
02:30 Four big trends reshaping the skills landscape
05:00 Skills compound, prompts evaporate
07:00 The specialist stack pattern in production
09:30 Real estate GP with 50,000 lines of skills
11:30 How to build a skill that actually works
14:00 The single-line description gotcha
16:00 Methodology body: reasoning over procedures
18:00 Agent-first skill design principles
20:30 Descriptions as routing signals, outputs as contracts
22:30 Three-tier skill architecture for teams
24:30 The community skills repository announcement
26:00 Skills are what persists

Subscribe for daily AI strategy and news. For deeper playbooks and analysis: https://natesnewsletter.substack.com/

Listen to this video as a podcast.
  • Spotify: https://open.spotify.com/show/0gkFdjd1wptEKJKLu9LbZ4
  • Apple Podcasts: https://podcasts.apple.com/us/podcast/ai-news-strategy-daily-with-nate-b-jones/id1877109372

