Skill Abstractions for Modular Capabilities
Skills are the core building blocks, modeled as self-describing, versioned modules analogous to OS syscalls. Each inherits from an abstract Skill base class requiring three methods: _define_metadata() for SkillMetadata (name, description, category, tags, dependencies, etc.), _define_schema() for OpenAI tool parameters (JSON schema), and execute(**kwargs) for implementation.
Metadata uses @dataclass with SkillCategory enum (DATA, REASONING, etc.) for categorization. Execution tracks stats like call count and latency. Skills convert to OpenAI tools via to_openai_tool().
Principle: Encapsulate logic with rich introspection—skills declare dependencies (requires_skills) and costs, enabling runtime validation and optimization.
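A minimal sketch of this abstraction, assuming plausible names beyond those listed above (the version and cost_estimate fields and the __call__ stats wrapper are illustrative, not the book's exact implementation):

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum
import time

class SkillCategory(Enum):
    DATA = "data"
    REASONING = "reasoning"
    # ... other categories

@dataclass
class SkillMetadata:
    name: str
    description: str
    category: SkillCategory
    tags: list = field(default_factory=list)
    requires_skills: list = field(default_factory=list)  # declared dependencies
    version: str = "1.0"        # assumed field backing "versioned" modules
    cost_estimate: float = 0.0  # assumed field backing declared costs

class Skill(ABC):
    def __init__(self):
        self.metadata = self._define_metadata()
        self.stats = {"calls": 0, "avg_latency_ms": 0.0}

    @abstractmethod
    def _define_metadata(self) -> SkillMetadata: ...

    @abstractmethod
    def _define_schema(self) -> dict: ...

    @abstractmethod
    def execute(self, **kwargs) -> str: ...

    def __call__(self, **kwargs) -> str:
        # Wrap execute() so every invocation updates call count and running latency average
        start = time.perf_counter()
        result = self.execute(**kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.stats["calls"] += 1
        n = self.stats["calls"]
        self.stats["avg_latency_ms"] += (elapsed_ms - self.stats["avg_latency_ms"]) / n
        return result

    def to_openai_tool(self) -> dict:
        # Standard OpenAI function-tool envelope around the skill's JSON schema
        return {
            "type": "function",
            "function": {
                "name": self.metadata.name,
                "description": self.metadata.description,
                "parameters": self._define_schema(),
            },
        }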
Example: Calculator skill safely evaluates math expressions:
class CalculatorSkill(Skill):
    def _define_metadata(self):
        return SkillMetadata(
            name="calculator",
            description="Evaluate mathematical expressions...",
            category=SkillCategory.REASONING,
            tags=["math", "arithmetic"],
        )

    def _define_schema(self):
        return {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        }

    def execute(self, expression: str) -> str:
        import math
        # Sandboxed eval: no builtins, only whitelisted math helpers
        safe = {"__builtins__": {}, "sqrt": math.sqrt}  # ... plus other whitelisted math names
        try:
            return f"Result: {eval(expression, safe)}"
        except Exception as ex:
            return f"Error: {ex}"
This prevents injection attacks via restricted globals. Output: Result: 1024 for '2**10'.
Common pitfall: Unrestricted eval—always sandbox. Quality criteria: Schema must match LLM expectations; metadata descriptions guide tool selection precisely.
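For reference, to_openai_tool() on this skill yields the standard OpenAI function-tool envelope around that schema, roughly:

calculator_tool = CalculatorSkill().to_openai_tool()
# {
#   "type": "function",
#   "function": {
#     "name": "calculator",
#     "description": "Evaluate mathematical expressions...",
#     "parameters": {
#       "type": "object",
#       "properties": {"expression": {"type": "string"}},
#       "required": ["expression"]
#     }
#   }
# }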
Central Registry for Dynamic Discovery
SkillRegistry acts as a catalog: register skills by name, index by category/tags, list/filter, and expose as OpenAI tools. Supports hot-loading via SkillLoader.
registry = SkillRegistry()
registry.register(CalculatorSkill())
registry.register(TextSummarizerSkill())
# ...
console.print(registry.display()) # Rich table view
Registry methods include get_by_category() and to_openai_tools(names=None), which filters the exposed tools dynamically. Principle: decouple skill definition from invocation; the LLM sees only the relevant tools.
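A minimal registry sketch consistent with that interface (the unregister() and all() helpers are assumptions used by later examples):

class SkillRegistry:
    def __init__(self):
        self._skills: dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.metadata.name] = skill

    def unregister(self, name: str) -> None:
        self._skills.pop(name, None)

    def get(self, name: str) -> Skill:
        return self._skills[name]

    def all(self) -> list:
        return list(self._skills.values())

    def get_by_category(self, category: SkillCategory) -> list:
        return [s for s in self._skills.values() if s.metadata.category == category]

    def to_openai_tools(self, names=None) -> list:
        # Expose only the requested subset to keep the LLM's tool list small
        selected = self.all() if names is None else [self._skills[n] for n in names]
        return [s.to_openai_tool() for s in selected]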
Hot-loading example:
loader = SkillLoader(registry)
loader.load(FactCheckerSkill) # Registers instantly
Unload with loader.unload('name'). Enables runtime extensibility without restarts. Avoid overloading LLM with all tools—filter by context or query skill_introspector.
"Central catalog of all agent capabilities. Analogue: OS process/syscall table."
Implementing Specialized Skills
Extend for NLP/reasoning: TextSummarizerSkill uses LLM with mode-specific prompts (brief/standard/detailed). DataAnalystSkill ingests JSON/CSV, answers questions. CodeGeneratorSkill outputs commented Python.
Structure outputs as JSON for parseability; e.g., FactCheckerSkill returns:
{"verdict":"true|false|uncertain","confidence":0.7,"explanation":"..."}
SentimentAnalyzerSkill adds emotion scores optionally. TranslationSkill controls formality. All leverage gpt-4o-mini for cost-efficiency.
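A sketch of one such LLM-backed skill (the prompt wording and the use of response_format are assumptions; the JSON shape matches the example above):

from openai import OpenAI

client = OpenAI()

class FactCheckerSkill(Skill):
    def _define_metadata(self):
        return SkillMetadata(
            name="fact_checker",
            description="Assess factual accuracy of a claim. Returns verdict, confidence, explanation.",
            category=SkillCategory.REASONING,
            tags=["verification", "facts"],
        )

    def _define_schema(self):
        return {"type": "object",
                "properties": {"claim": {"type": "string"}},
                "required": ["claim"]}

    def execute(self, claim: str) -> str:
        # Force strict JSON so downstream skills and the agent loop can parse the verdict
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": 'Assess the claim. Reply as JSON: '
                 '{"verdict": "true|false|uncertain", "confidence": 0.0-1.0, "explanation": "..."}'},
                {"role": "user", "content": claim},
            ],
        )
        return resp.choices[0].message.content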
Meta-skill: SkillIntrospectorSkill(registry) lists and describes skills via an action parameter:
- list: bullet list of all registered skills.
- describe: full metadata and schema for a named skill.
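A sketch of this meta-skill, assuming the registry helpers above (the skill_name parameter and output formatting are illustrative):

class SkillIntrospectorSkill(Skill):
    def __init__(self, registry: SkillRegistry):
        self._registry = registry
        super().__init__()

    def _define_metadata(self):
        return SkillMetadata(
            name="skill_introspector",
            description="List available skills or describe one in detail. Use when unsure which skill fits.",
            category=SkillCategory.REASONING,
            tags=["meta", "discovery"],
        )

    def _define_schema(self):
        return {"type": "object",
                "properties": {"action": {"type": "string", "enum": ["list", "describe"]},
                               "skill_name": {"type": "string"}},
                "required": ["action"]}

    def execute(self, action: str, skill_name: str = "") -> str:
        if action == "list":
            return "\n".join(f"- {s.metadata.name}: {s.metadata.description}"
                             for s in self._registry.all())
        skill = self._registry.get(skill_name)
        return f"{skill.metadata}\nSchema: {skill._define_schema()}"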
Principle: Self-awareness prevents hallucinated tool calls. Prompt LLM to use introspector when unsure: "Use skill_introspector if unsure which skill to pick."
Pitfall: Vague descriptions lead to wrong routing—be precise, e.g., "Assess factual accuracy... Returns verdict, confidence..."
Composite Skills and Orchestration
ResearchReportSkill(registry) composes sub-skills fractally:
- Summarize data (text_summarizer, detailed mode).
- Analyze quantitatively (data_analyst).
- Generate visualization code (code_generator, optional).
def execute(self, topic: str, data: str, include_code: bool = True) -> str:
    summary = self._registry.get("text_summarizer")(text=data, mode="detailed")
    analysis = self._registry.get("data_analyst")(data=data, question=f"Key insights about {topic}")
    # ... optionally call code_generator, then assemble the sections
    return markdown_report
Logs sub-calls for observability. Dependencies declared in metadata validate composition.
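Such a declaration could be checked with a small helper like the one below (the helper and the requires_skills values shown are hypothetical illustrations):

# Hypothetical helper: verify a composite's declared dependencies before use
def validate_dependencies(registry: SkillRegistry, skill: Skill) -> None:
    registered = {s.metadata.name for s in registry.all()}
    missing = [dep for dep in skill.metadata.requires_skills if dep not in registered]
    if missing:
        raise ValueError(f"{skill.metadata.name} requires unregistered skills: {missing}")

# e.g. ResearchReportSkill's metadata might declare:
#   requires_skills=["text_summarizer", "data_analyst", "code_generator"]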
Agent Execution Loop with Tool Routing
SkillBasedAgent orchestrates via a ReAct-like loop (up to max_iterations=6):
- System prompt lists principles and loaded skills.
- LLM gets tools from the registry and calls them via tool_choice="auto".
- Dispatch: registry.get(name)(**args), append each tool result to messages.
- Repeat until finish_reason="stop" or max iterations.
def run(self, user_input: str) -> str:
    messages = [{"role": "system", "content": self.system_prompt},
                {"role": "user", "content": user_input}]
    for i in range(self.max_iterations):
        resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
        # Handle tool_calls, dispatch, append results (expanded in the sketch below)
    return final_answer
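A sketch of the elided dispatch step, using the standard OpenAI tool-calling message flow (variable names follow the loop above):

import json

# Inside the loop body of run():
msg = resp.choices[0].message
if msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that requested the tools
    for call in msg.tool_calls:
        skill = registry.get(call.function.name)
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        messages.append({"role": "tool",
                         "tool_call_id": call.id,
                         "content": skill(**args)})
else:
    return msg.content  # finish_reason == "stop": the model produced the final answer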
Verbose mode uses Rich panels/tables for traces. The agent synthesizes multi-tool outputs into a coherent response.
Example workflow: User: "Summarize this sales data and check if growth claim is true."
- Calls text_summarizer → summary, data_analyst → insights, fact_checker → verdict.
- Final: integrated report.
Principle: LLM routes dynamically—no hardcoded if/else. Trade-off: Token cost scales with iterations/tools; mitigate with targeted tools and cheap model.
"PRINCIPLES: 1. Use the most appropriate skill... 2. Chain multiple skills... 3. Use skill_introspector... 4. Synthesize..."
Pitfall: Infinite loops—cap iterations, clear tool results properly. Quality: Final answer must weave tool outputs, not dump raw.
Runtime Extensibility and Observability
SkillLoader mirrors package managers: load(skill_class, *args) instantiates/registers. Supports registry-dependent skills (e.g., pass registry to composites).
Stats via skill.stats: {"calls": 5, "avg_latency_ms": 120}. The registry display table shows usage at a glance.
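A plausible loader sketch matching this description (passing dependencies through *args is the behavior named above; the body is an assumption):

class SkillLoader:
    def __init__(self, registry: SkillRegistry):
        self._registry = registry

    def load(self, skill_class, *args) -> Skill:
        # Instantiate (passing the registry or other deps via *args) and register immediately
        skill = skill_class(*args)
        self._registry.register(skill)
        return skill

    def unload(self, name: str) -> None:
        self._registry.unregister(name)

# Usage: loader.load(ResearchReportSkill, registry)  # registry-dependent composite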
Dashboard-like: Rich tables for skills, iteration traces. Extend with LangSmith/Phoenix for production.
"Hot-loaded skill: research_report"—no restart needed.
Assumes: Python proficiency, OpenAI tool calling basics. Fits after simple function calling, before full agent frameworks like LangGraph.
Practice: Add WebSearchSkill (requires API), compose into MarketResearchSkill. Test chaining: math → plot code → sentiment on results.
"Each Skill is: self-describing · versioned · testable · composable."
Key Takeaways
- Define skills with metadata/schema/execute for LLM compatibility and introspection.
- Use SkillRegistry to index and expose tools dynamically; filter to avoid context overflow.
- Implement agent loop: LLM reasons → tool call → dispatch → synthesize, max 6 iterations.
- Compose skills hierarchically; declare dependencies for validation.
- Hot-load via SkillLoader for extensibility; track stats for optimization.
- Sandbox executions (e.g., safe eval); structure outputs as JSON for parsing.
- Prompt with principles: appropriate skill, chain, introspect, synthesize.
- Start with gpt-4o-mini for cost; upgrade for complex reasoning.
- Always add skill_introspector; it enables discovery without prompt bloat.
- Observe via console traces; productionize with external logging.