Skills: Markdown Standard for Agentic AI Infrastructure
Anthropic's 'skills', simple Markdown folders that encode methodologies, have evolved from personal prompt configurations into agent-callable infrastructure, now standardized across Anthropic, OpenAI, and Microsoft for predictable AI workflows in tools like Claude, Copilot, and ChatGPT.
Skills as Organizational Infrastructure, Not Personal Prompts
Skills started in October as personal configurations: simple folders with a SKILL.md file containing metadata and plain-English instructions for LLMs. Today they're enterprise-grade: version-controlled and sidebar-accessible in Claude, Copilot, Excel, and PowerPoint. Teams upload them organization-wide, shifting methodologies from individual heads to shared repos. A real estate firm, Texas Paintbrush, built 50 repos with 50,000+ lines covering rent rolls, comps analysis, cash flows, and handoffs, serving agents for automation and humans for onboarding context.
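A minimal sketch of what such a folder's core file looks like, assuming Anthropic's SKILL.md convention of YAML frontmatter (name, description) followed by the instruction body; the rent-roll content here is illustrative, not Texas Paintbrush's actual material:

```markdown
---
name: rent-roll-analysis
description: Analyze rent rolls. Trigger on "rent roll", "tenant schedule", or an uploaded rent-roll spreadsheet; outputs occupancy, in-place vs. market rent, and a risk-flag table.
---

When given a rent roll:
1. Normalize unit, tenant, lease start/end, and rent columns.
2. Compute occupancy, weighted average lease term, and in-place vs. market rent.
3. Flag month-to-month leases and below-market units in a risk table.
```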
This substrate delivers the persistent, accurate outcomes businesses need. Unlike one-off prompts, skills compound: refine them through feedback loops ("Update your skill file with X") and they improve over time. Prompts remain the basic blocks, but skills build the "castle": specialized, reusable primitives.
"Skills compound for you. Skills compound by the weight of industry investment in the ecosystem and by the weight of your own commitment to having a predictable pattern."
Nate Jones emphasizes compounding in a discussion of why, after six months of iteration, skills outperform repeated prompting.
Shift to Agent-Callable, Not Human-Driven
Skills were initially human-called, a few per conversation; now most calls come from agents, hundreds per run. Agents chain them predictably: specialist stacks decompose vague instructions into PRDs, GitHub issues, and tests. Cursor agents invoke them seamlessly, offloading nuance from prompts.
Orchestrator skills analyze requests and spawn sub-agents for research, coding, UI, and docs (as documented on Reddit; a sketch follows below). Failures hurt more without a human in the loop to correct them, so test quantitatively: run test suites, version skills, and measure performance. Wording tweaks trigger latent model behaviors unpredictably; expect 3-4 iterations for aesthetics like PowerPoint formatting.
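A hedged sketch of what an orchestrator skill's body might instruct, assuming the calling agent can spawn sub-agents; the skill name and decomposition here are illustrative, not the Reddit poster's setup:

```markdown
---
name: feature-orchestrator
description: Decompose vague build requests. Trigger on "build", "add a feature", or a one-line product idea; outputs a PRD, GitHub issues, and a test plan via sub-agents.
---

On a vague feature request:
1. Spawn a research sub-agent; require a findings summary with sources.
2. Spawn a PRD sub-agent that consumes the findings and emits the PRD format below.
3. Spawn coding, UI, and docs sub-agents in parallel, each fed the PRD.
4. Before handoff, verify each output carries the fields the next step expects.
```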
Cross-tool compatibility (Claude, ChatGPT, Copilot) makes the format itself sticky: a skill written once works everywhere. Open-sourced skills trade like baseball cards, signaling talent for acqui-hires and accelerating the community's discovery of best practices.
"Agents can make hundreds of skill calls over the course of a single run. We humans were calling maybe a few skills... The math just doesn't math for humans."
Nate Jones highlights the scale advantage of agent calling, explaining why skills must be agent-first.
Building Reliable Skills: Avoid Common Pitfalls
Core structure: a single-line description plus a methodology body. Vague descriptions ("helps with competitive analysis") undertrigger. Good ones name artifacts ("analyze competitors"), trigger phrases ("who are the players?"), and outputs (Markdown/Excel fields), and push aggressively, per Anthropic guidance.
Gotcha: descriptions must stay on one line; formatters that wrap them break Claude's parsing.
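For example, the gap between an undertriggering description and one that fires reliably (illustrative wording, not Anthropic's examples):

```markdown
<!-- Undertriggers: vague, no artifacts or trigger phrases -->
description: Helps with competitive analysis.

<!-- Fires reliably: names the artifact, trigger phrases, and outputs, and pushes -->
description: Analyze competitors. Trigger on "who are the players?", "competitive landscape", or any market or product comparison; outputs a Markdown brief plus an Excel sheet with pricing, positioning, and share fields. Use whenever competitors come up.
```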
The methodology body needs:
- Reasoning frameworks, not linear steps, so the skill generalizes.
- Exact output formats (sections, fields).
- Explicit edge cases, since LLMs lack human common sense.
- Examples for pattern-matching, kept in separate files.
Keep it lean: 100-150 lines max in the core file, with roughly 80% of the effort on the description (so it triggers) and 20% on the reasoning. Bloated folders waste context windows. A sketch of a complete lean skill file follows below.
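Putting those pieces together, a lean core file might look like the following sketch (the comps-analysis domain and section names are illustrative assumptions):

```markdown
---
name: comps-analysis
description: Run comparable-property analysis. Trigger on "comps", "comparables", or an address plus a value question; outputs a Markdown memo with a comps table.
---

## Reasoning framework (not linear steps)
Weigh recency, distance, and similarity; justify each comp's inclusion or exclusion.

## Output format (exact)
Sections: Summary, Comps Table (address, sale date, price, $/sqft, adjustments), Valuation Range.

## Edge cases
- Fewer than 3 comps: say so explicitly, widen the search radius, and note the change.
- Mixed-use properties: exclude unless the subject is also mixed-use.

Worked examples live in examples.md, not here.
```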
"A short skill that fires reliably is going to outperform a long skill with competing instructions."
Nate Jones on leanness, countering the intuition to overload skills with detail.
Agent-First Design: Contracts, Composability, Hardwiring
Agents as primary callers demand:
- Routing descriptions that match agent goals.
- Contract outputs, like API SLAs: controllable fields, guarantees, limits.
- Composability: outputs hand off cleanly to sub-agents (e.g., ticket workflows).
For determinism, pair skills with scripts: skills carry the general reasoning, scripts hardwire the steps that must not vary (see the sketch below). Mixed human-and-agent teams then use skills as actionable context: agent-readable, human-legible Markdown.
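A sketch of how a contract-style output section and a hardwired script hook could read inside a skill file; the field names and scripts/format_deck.py path are hypothetical:

```markdown
## Output contract
Guarantees: every response includes status, summary, and next_actions fields.
Limits: next_actions holds at most 5 items; summary stays under 100 words.
Handoff: the ticketing sub-agent consumes next_actions verbatim; never rename fields.

## Deterministic steps
For final formatting, run scripts/format_deck.py instead of reasoning it out;
the script hardwires slide layout so output never drifts between runs.
```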
Three Tiers for Team Skills Adoption
High-performing teams tier skills:
- Standard: org-wide (brand voice, templates), admin-provisioned.
- Methodology: team craft (client deliverables, senior practices), extracted from practitioners' heads for new hires and shared as alpha across PM/eng/CS.
- Personal workflows: day-to-day hacks, committed to a repo for resilience (vacation and sick coverage).
Avoid siloed personal skills; think systemically about which access level each piece of expertise belongs at. One possible layout follows below.
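One way a tiered skills repo could be laid out (an illustrative sketch, not a prescribed structure):

```
skills/
  org/            # Standard tier: brand voice, templates (admin-provisioned)
  teams/
    pm/           # Methodology tier: PRD frameworks, deliverable craft
    eng/          # Methodology tier: review checklists, senior practices
  personal/
    jane/         # Personal tier: day-to-day workflows, repo'd for coverage
```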
"Methodology doesn't live in someone's mind anymore. It lives in a repository."
Nate Jones on the Texas Paintbrush example, showing the dual human/agent benefits.
Community-Driven Evolution and Next Steps
An Anthropic/Microsoft partnership brings skills to Copilot, and OpenAI has adopted them as an open standard. The value equation flips: open-sourced agent skills work as resumes. Still missing are domain-specific packs (e.g., rent rolls); Jones is launching a community repo for people solving real problems, beyond the generic starter skills on GitHub.
"We're all learning together... making a lowly markdown file actually function as an agent callable context layer."
Nate Jones on collective discovery, contrasting the settled best practices of '90s software with the still-emergent ones for LLMs.
Key Takeaways
- Craft pushy, single-line descriptions with triggers, artifacts, and outputs to ensure reliable firing; spend 80% of the effort here.
- Embed reasoning frameworks, edge cases, exact formats, and examples; cap core file at 100-150 lines.
- Test skills quantitatively with suites for agent reliability; iterate wording for latent behaviors.
- Design agent-first: routing descriptions, contract outputs, composable handoffs; script for determinism.
- Tier org skills: standards (org-wide), methodology (team craft), personal (repo'd workflows).
- Open-source domain skills for community alpha and talent signaling; compound them through iteration and ecosystem investment.
- Leverage across tools (Claude, Copilot, ChatGPT) for specialist stacks/orchestrators in dev/ops.
- Extract expertise from heads to repos—benefits agents, humans, onboarding.