Agent Skills: From Playbooks to Org Libraries

Skills as Portable AI Playbooks

Nufar Gaspar positions skills as the core primitive for the agent era: simple folders containing markdown instructions, scripts, and resources that give AI agents (or humans) actionable playbooks for tasks. Unlike locked custom GPTs, skills are human-readable, editable without engineering expertise, and portable across 44+ tools such as Claude, Cursor, Windsurf, GitHub Copilot, and Notion. They operate in two modes: agents auto-discover and invoke them, or users trigger them manually via slash commands or phrases like "research this topic."

"Skills are not just for agents to read... an agent can discover the skills... automatically and invoke them on its own or us humans can trigger them manually," Gaspar explains. This portability breaks down the silos of earlier formats, letting teams share and iterate freely. But Gaspar warns that third-party skills from marketplaces like OpenClaw can run malicious scripts, so vet their sources as you would any software install.

Host NLW reinforces the point: treat downloaded skills as templates to customize, not as black boxes. Gaspar agrees, noting that Claude's new skill creator tool interviews users, runs evals, and A/B tests to extract expertise automatically.

When and Why Build Custom Skills

Build a skill when a task repeats (more than three times), when you are frustrated by copy-pasting prompts, or when outputs are inconsistent. Gaspar pushes beyond fixes: use skills to standardize team behaviors or to unlock bandwidth-intensive tasks like deep research. "Skills are not just a way for you to be more productive; it's also a way for you to unlock opportunities of things that you always wanted to do," she says.

Prefer building over marketplaces early on: navigating marketplaces wastes time, and writing your own skills hones your craft. Reuse others' skills later, but adapt them; unlike proprietary formats, skills give you full visibility, so you can tweak anything. Keep one skill per task and split monolithic ones. NLW adds that skills, as markdown templates, accelerate personalization, citing his upcoming personal context portfolio repo.

Anatomy of Skills That Deliver

Effective skills follow a rigid structure for reliability. Start with a loud trigger: explicit phrases (e.g., "prep for the meeting") ensure discovery, because models skip subtle ones. The body is a playbook of bulleted or numbered steps, as literal as possible. Balance the level of prescription: be rigid for fragile tasks (e.g., DB migrations) and looser for creative ones (e.g., strategy docs) to avoid railroading the model.

Mandate the output format with examples (tables with headers, document structures) rather than descriptions. The gotchas section is the highest-signal part: it preempts model pitfalls with instructions like "I know you want to do X but don't, here's why." Skip personas, obvious advice, and other token-wasters.

"The gotcha section... is probably the highest signal content in any skill because it's the area where you get the model to go out of its own patterns," Gaspar stresses. Keep the main file under 500 lines; offload references and examples to separate files in the folder (e.g., examples.md). Bundle skill-specific context inside the folder, and link to external files for general or company-wide context.

Skill killers: weak triggers (the skill never gets picked), over-definition, missing gotchas, and monolithic blobs. A folder structure wins: main.md plus contexts, examples, and sub-skills.
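
The structural rules above (a loud trigger, a gotchas section, a main file under 500 lines) lend themselves to a mechanical check. Below is a minimal sketch in Python. It assumes main.md sits at the root of the skill folder and that the sections use markdown headings named "Trigger" and "Gotchas"; those heading names and the `lint_skill` helper are illustrative assumptions, not a format Gaspar or any tool prescribes.

```python
from pathlib import Path
import tempfile

MAX_LINES = 500  # the main file should stay under this many lines
REQUIRED_HEADINGS = ("trigger", "gotchas")  # assumed heading names


def lint_skill(folder: Path) -> list[str]:
    """Return a list of human-readable problems found in a skill folder."""
    main = folder / "main.md"
    if not main.is_file():
        return ["missing main.md"]

    lines = main.read_text(encoding="utf-8").splitlines()
    problems = []

    if len(lines) >= MAX_LINES:
        problems.append(f"main.md has {len(lines)} lines; keep it under {MAX_LINES}")

    # Collect markdown heading text, lowercased, e.g. "## Gotchas" -> "gotchas".
    headings = {ln.lstrip("#").strip().lower() for ln in lines if ln.startswith("#")}
    for name in REQUIRED_HEADINGS:
        if name not in headings:
            problems.append(f"no '{name}' section")

    return problems


# Demo on a throwaway folder: the skill has a trigger but no gotchas section.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "meeting-prep"
    skill.mkdir()
    (skill / "main.md").write_text(
        "# Trigger\nprep for the meeting\n\n# Steps\n1. Identify attendees\n",
        encoding="utf-8",
    )
    print(lint_skill(skill))
    # ["no 'gotchas' section"]
```

A check like this only catches missing structure. Whether the trigger is loud enough or the gotchas are useful still needs a human read.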

Real-World Skill Examples

Gaspar demos a meeting prep skill. It triggers on "prep" and pulls calendar, email, and stakeholder context (bundled or linked). Its steps include attendee identification, agenda analysis, and scenario simulations (e.g., hidden agendas, tough questions). The output is a structured brief (executive summary, risks). Its gotchas: no assumed seniority, no fabricated details, no generic points.

Four knowledge-worker templates are included: Research with Confidence (source-specific, with fact-checks and confidence scores); Devil's Advocate (stress-tests proposals and flags biases, both yours and the AI's, to suggest constructive fixes); Morning Briefing (priorities, calendar, news, goals; includes an auto-prompt to build your own); and Board of Advisors (multi-archetype simulations, such as a VC or a founder, for decisions).

"Every person who does any type of research... should build or reuse Research with Confidence," Gaspar recommends. Nested sub-skills (e.g., meeting sims) and clean I/O enable composability.

Advanced Patterns for Power Users

Scale with a dispatcher meta-skill that routes requests once you have more than 10-15 skills and handles nuance between them. Chain skills sequentially, for example research → devil's advocate → summary or deck, and ensure clean handoffs between steps.
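
The dispatcher and chaining patterns can be sketched as below, assuming each skill is a plain Python function from text to text and that routing is a simple trigger-phrase match. In practice a dispatcher skill would be a markdown playbook the model reads so that it can handle nuance. The keyword match, the `SKILLS` registry, and the three stub skills are hypothetical stand-ins.

```python
from typing import Callable

Skill = Callable[[str], str]


# Stub skills: each takes text in and hands text out (the "clean handoff").
def research(topic: str) -> str:
    return f"findings on {topic}"


def devils_advocate(draft: str) -> str:
    return f"{draft} | objections raised"


def summarize(text: str) -> str:
    return f"summary: {text}"


# Registry mapping a loud trigger phrase to the skill it should invoke.
SKILLS: dict[str, Skill] = {
    "research": research,
    "challenge": devils_advocate,
    "summarize": summarize,
}


def dispatch(request: str) -> Skill:
    """Route a request to the first skill whose trigger phrase it contains."""
    lowered = request.lower()
    for trigger, skill in SKILLS.items():
        if trigger in lowered:
            return skill
    raise LookupError(f"no skill matches: {request!r}")


def chain(*skills: Skill) -> Skill:
    """Compose skills so each one's output becomes the next one's input."""
    def run(text: str) -> str:
        for skill in skills:
            text = skill(text)
        return text
    return run


pipeline = chain(research, devils_advocate, summarize)
print(pipeline("agent skills"))
# summary: findings on agent skills | objections raised
print(dispatch("Please research this topic").__name__)
# research
```

Giving every skill the same input and output type is what makes the chain trivial to build. Mismatched formats between steps are where handoffs usually break.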

Use loops for iteration: check, act, recheck (e.g., ad optimization: monitor ROAS, adjust bids, compete). For multi-agent orchestration, instruct the skill explicitly to spin up sub-agents (the research skill does this).
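
A check-act-recheck loop can be sketched as follows. This assumes a toy setting in which a metric is read, compared to a target, and acted on until the target is met or a round limit is hit. The `check` and `act` callables, the ROAS formula, and the bid rule are invented for illustration and do not model a real ad platform.

```python
from typing import Callable


def check_act_recheck(
    check: Callable[[], float],
    act: Callable[[float], None],
    target: float,
    max_rounds: int = 5,
) -> tuple[float, int]:
    """Measure, act while below target, and re-measure after every action.

    Returns the last measured value and the number of actions taken.
    """
    value = check()
    rounds = 0
    while value < target and rounds < max_rounds:
        act(value)
        rounds += 1
        value = check()  # recheck: never assume the action worked
    return value, rounds


# Toy stand-in for an ad account: ROAS rises as the bid is lowered.
state = {"bid": 4.0}


def read_roas() -> float:
    return 4.0 / state["bid"]


def lower_bid(_current_roas: float) -> None:
    state["bid"] *= 0.5


final_roas, actions = check_act_recheck(read_roas, lower_bid, target=3.0)
print(final_roas, actions)
# 4.0 2
```

The round limit is a deliberate design choice in this sketch: a loop that acts on live systems should have a stopping condition other than success.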

Test rigorously: a good skill produces ready-to-use results with no post-output iteration. Evaluate skills like products, matching rigor to the stakes (CRM updates demand more). Re-test when the model or tool changes. "If you find yourself having to iterate after... that means that your skill is not good enough," Gaspar asserts.
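
One way to treat a skill like a product is a small regression check that is re-run whenever the model or tool changes. The sketch below assumes the skill's output is text and that "ready to use" can be approximated by required section names plus a list of banned phrases. The section names, banned phrases, and sample output are hypothetical, loosely modeled on the meeting prep brief described earlier.

```python
REQUIRED_SECTIONS = ("Executive Summary", "Risks")  # assumed output format
BANNED_PHRASES = ("as an ai", "i assume", "tbd")    # assumed gotcha violations


def evaluate_output(output: str) -> list[str]:
    """Return the list of failed checks; an empty list means ready to use."""
    failures = []
    lowered = output.lower()
    for section in REQUIRED_SECTIONS:
        if section.lower() not in lowered:
            failures.append(f"missing section: {section}")
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            failures.append(f"banned phrase: {phrase}")
    return failures


sample = """## Executive Summary
Renewal call with the vendor; decision needed on pricing.

## Risks
TBD
"""
print(evaluate_output(sample))
# ['banned phrase: tbd']
```

Checks like these catch format drift, not quality. Higher-stakes skills would also need review of the content itself.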

Scaling to Organizational Libraries

Organizations stand to gain the most: standardized workflows, autonomous execution, and bundled knowledge. Gaspar envisions skill libraries as the holy grail of knowledge management, a long-standing pipe dream finally realized. The path runs from personal to team: share, iterate, enforce consistency.

"Organizations that are very AI forward already realize that skills are the future of how to streamline work," she says excitedly. Companion resources at play.brief.ai include anatomy templates and examples; there is also an Enterprise Claw cohort for agent teams.

NLW notes the evolution: the human elements persist while the technology explodes, and skills bridge the two.

Key Takeaways

Build skills for tasks repeated more than three times or for frustrating prompts; use them to unlock new opportunities, not just to fix problems.
Nail triggers: loud, explicit phrases ensure auto-discovery.
Structure bodies as bulleted playbooks; balance prescription with creative freedom.
Always include gotchas and output examples—preempt failures, show don't tell.
Use folders: a main.md under 500 lines plus separate contexts, examples, and sub-skills.
Test for zero-iteration outputs; re-eval on model changes.
Chain, dispatch, and loop for scale: add a dispatcher at 10+ skills; clean I/O is essential.
Start personal, scale to org libraries for standardization and autonomy.
Vet third-party skills like software; build first to learn, adapt templates.
Tools like Claude's skill creator accelerate the process through interviews, evals, and benchmarks.