Skill Abstractions for Modular Capabilities

Skills are the core building blocks, modeled as self-describing, versioned modules analogous to OS syscalls. Each skill inherits from an abstract Skill base class that requires three methods: _define_metadata() returning SkillMetadata (name, description, category, tags, dependencies, etc.), _define_schema() returning the OpenAI tool parameters as a JSON schema, and execute(**kwargs) holding the implementation.

Metadata uses @dataclass with SkillCategory enum (DATA, REASONING, etc.) for categorization. Execution tracks stats like call count and latency. Skills convert to OpenAI tools via to_openai_tool().

Principle: Encapsulate logic with rich introspection—skills declare dependencies (requires_skills) and costs, enabling runtime validation and optimization.
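
The base class itself isn't reproduced in the excerpt; a minimal sketch consistent with the description (extra enum members, field defaults, and the exact attribute names are assumptions):

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum

class SkillCategory(Enum):
    DATA = "data"
    REASONING = "reasoning"
    # ... further categories as the skill set grows

@dataclass
class SkillMetadata:
    name: str
    description: str
    category: SkillCategory
    tags: list = field(default_factory=list)
    requires_skills: list = field(default_factory=list)  # declared dependencies

class Skill(ABC):
    def __init__(self):
        self.metadata = self._define_metadata()
        self.stats = {"calls": 0, "avg_latency_ms": 0.0}

    @abstractmethod
    def _define_metadata(self) -> SkillMetadata: ...

    @abstractmethod
    def _define_schema(self) -> dict: ...

    @abstractmethod
    def execute(self, **kwargs) -> str: ...

    def __call__(self, **kwargs) -> str:
        return self.execute(**kwargs)  # stats-tracking wrapper sketched later

    def to_openai_tool(self) -> dict:
        return {"type": "function", "function": {
            "name": self.metadata.name,
            "description": self.metadata.description,
            "parameters": self._define_schema(),
        }}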

Example: Calculator skill safely evaluates math expressions:

class CalculatorSkill(Skill):
    def _define_metadata(self):
        return SkillMetadata(
            name="calculator",
            description="Evaluate mathematical expressions...",
            category=SkillCategory.REASONING,
            tags=["math", "arithmetic"],
        )

    def _define_schema(self):
        return {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"]
        }

    def execute(self, expression: str) -> str:
        import math
        safe = {"__builtins__": {}, "sqrt": math.sqrt, ...}  # Sandboxed eval
        try:
            return f"Result: {eval(expression, safe)}"
        except Exception as ex:
            return f"Error: {ex}"

Restricted globals block access to builtins and mitigate injection, though eval is never fully safe. Output: Result: 1024 for '2**10'.

Common pitfall: unrestricted eval; always sandbox (a stricter AST-based check is sketched below). Quality criteria: the schema must match LLM expectations, and metadata descriptions must guide tool selection precisely.
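
One stricter pattern (our sketch, not the chapter's code): whitelist AST node types so only pure arithmetic ever reaches eval.

import ast

# Only arithmetic expression nodes are permitted; names, calls, attributes are all rejected.
_ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
            ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub)

def safe_eval(expression: str) -> float:
    tree = ast.parse(expression, mode="eval")
    if any(not isinstance(node, _ALLOWED) for node in ast.walk(tree)):
        raise ValueError(f"disallowed expression: {expression!r}")
    return eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})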

Central Registry for Dynamic Discovery

SkillRegistry acts as a catalog: register skills by name, index by category/tags, list/filter, and expose as OpenAI tools. Supports hot-loading via SkillLoader.

registry = SkillRegistry()
registry.register(CalculatorSkill())
registry.register(TextSummarizerSkill())
# ...
console.print(registry.display())  # Rich table view

Registry methods: get_by_category() and to_openai_tools(names=None) filter the exposed tools dynamically. Principle: decouple skill definition from invocation so the LLM sees only relevant tools.
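
The registry interface fits in a few lines; a minimal sketch (the all() and unregister() helper names are our assumptions):

class SkillRegistry:
    """Central catalog of skills: the OS syscall-table analogue."""

    def __init__(self):
        self._skills: dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.metadata.name] = skill

    def unregister(self, name: str) -> None:
        self._skills.pop(name, None)

    def get(self, name: str) -> Skill:
        return self._skills[name]

    def all(self) -> list[Skill]:
        return list(self._skills.values())

    def get_by_category(self, category: SkillCategory) -> list[Skill]:
        return [s for s in self.all() if s.metadata.category == category]

    def to_openai_tools(self, names: list[str] | None = None) -> list[dict]:
        skills = self.all() if names is None else [self._skills[n] for n in names]
        return [s.to_openai_tool() for s in skills]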

Hot-loading example:

loader = SkillLoader(registry)
loader.load(FactCheckerSkill)  # Registers instantly

Unload with loader.unload('name'). This enables runtime extensibility without restarts. Avoid overloading the LLM with every tool; filter by context or query skill_introspector.
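
The loader itself can be thin; a sketch assuming the registry sketched above:

class SkillLoader:
    """Package-manager analogue: install and remove skills at runtime."""

    def __init__(self, registry: SkillRegistry):
        self._registry = registry

    def load(self, skill_class, *args) -> None:
        # *args covers registry-dependent skills, e.g., loader.load(ResearchReportSkill, registry)
        self._registry.register(skill_class(*args))

    def unload(self, name: str) -> None:
        self._registry.unregister(name)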

"Central catalog of all agent capabilities. Analogue: OS process/syscall table."

Implementing Specialized Skills

Extend the base class for NLP and reasoning: TextSummarizerSkill calls the LLM with mode-specific prompts (brief/standard/detailed), DataAnalystSkill ingests JSON/CSV and answers questions about it, and CodeGeneratorSkill outputs commented Python.

Structure outputs as JSON for parseability; FactCheckerSkill, for example, returns:

{"verdict":"true|false|uncertain","confidence":0.7,"explanation":"..."}

SentimentAnalyzerSkill optionally adds emotion scores. TranslationSkill controls formality. All of these leverage gpt-4o-mini for cost-efficiency (FactCheckerSkill is sketched below).
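
A sketch of what FactCheckerSkill might look like; the prompt wording and client setup are ours, and response_format forces parseable JSON:

from openai import OpenAI

client = OpenAI()

class FactCheckerSkill(Skill):
    def _define_metadata(self):
        return SkillMetadata(
            name="fact_checker",
            description="Assess the factual accuracy of a claim. Returns JSON with verdict, confidence, and explanation.",
            category=SkillCategory.REASONING,
            tags=["verification", "facts"],
        )

    def _define_schema(self):
        return {"type": "object",
                "properties": {"claim": {"type": "string"}},
                "required": ["claim"]}

    def execute(self, claim: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},  # guarantees well-formed JSON
            messages=[
                {"role": "system", "content": 'Fact-check the claim. Reply as JSON: '
                 '{"verdict":"true|false|uncertain","confidence":0.0-1.0,"explanation":"..."}'},
                {"role": "user", "content": claim},
            ],
        )
        return resp.choices[0].message.content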

Meta-skill: SkillIntrospectorSkill(registry) lists or describes skills via an action parameter:

  • list: bullet list of all registered skills with descriptions.
  • describe: full metadata and schema for a named skill.

Principle: Self-awareness prevents hallucinated tool calls. Prompt the LLM to use the introspector when unsure: "Use skill_introspector if unsure which skill to pick."
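
A sketch of the meta-skill, assuming the registry helpers from the earlier sketch:

class SkillIntrospectorSkill(Skill):
    def __init__(self, registry: SkillRegistry):
        self._registry = registry
        super().__init__()

    def _define_metadata(self):
        return SkillMetadata(
            name="skill_introspector",
            description="List all available skills, or describe one skill's metadata and schema.",
            category=SkillCategory.REASONING,
            tags=["meta"],
        )

    def _define_schema(self):
        return {"type": "object",
                "properties": {"action": {"type": "string", "enum": ["list", "describe"]},
                               "skill_name": {"type": "string"}},
                "required": ["action"]}

    def execute(self, action: str, skill_name: str = "") -> str:
        if action == "list":
            return "\n".join(f"• {s.metadata.name}: {s.metadata.description}"
                             for s in self._registry.all())
        skill = self._registry.get(skill_name)
        return f"{skill.metadata}\nschema: {skill._define_schema()}"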

Pitfall: vague descriptions lead to wrong routing. Be precise, e.g., "Assess factual accuracy... Returns verdict, confidence..."

Composite Skills and Orchestration

ResearchReportSkill(registry) composes sub-skills fractally:

  1. Summarize data (text_summarizer, detailed mode).
  2. Analyze quantitatively (data_analyst).
  3. Generate visualization code (code_generator, optional).

def execute(self, topic: str, data: str, include_code: bool = True) -> str:
    summary = self._registry.get("text_summarizer")(text=data, mode="detailed")
    analysis = self._registry.get("data_analyst")(data=data, question=f"Key insights about {topic}")
    # ... if include_code, call code_generator for visualization code, then
    # assemble the sections into markdown_report
    return markdown_report

Sub-calls are logged for observability, and dependencies declared in metadata validate the composition (see the metadata sketch below).
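
For instance, the composite's metadata might declare its sub-skills so the registry can check they are loaded before execution; field values here are illustrative:

SkillMetadata(
    name="research_report",
    description="Produce a markdown research report from raw data.",
    category=SkillCategory.REASONING,
    requires_skills=["text_summarizer", "data_analyst", "code_generator"],  # validated at composition time
)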

Agent Execution Loop with Tool Routing

SkillBasedAgent orchestrates via a ReAct-like loop (up to max_iterations=6):

  1. System prompt lists principles and loaded skills.
  2. LLM gets tools from registry, calls via tool_choice="auto".
  3. Dispatch: registry.get(name)(**args), append tool result to messages.
  4. Repeat until finish_reason="stop" or max iterations.

def run(self, user_input: str) -> str:
    # Assumes a module-level OpenAI `client`, `import json`, and MODEL (e.g., "gpt-4o-mini").
    messages = [{"role": "system", "content": self.system_prompt}, {"role": "user", "content": user_input}]
    tools = self.registry.to_openai_tools()
    for _ in range(self.max_iterations):
        resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools, tool_choice="auto")
        msg = resp.choices[0].message
        if not msg.tool_calls:  # finish_reason == "stop": final answer reached
            return msg.content
        messages.append(msg)  # keep the assistant turn that requested the tools
        for call in msg.tool_calls:
            result = self.registry.get(call.function.name)(**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    return msg.content  # iteration cap reached; return the last message

Verbose mode uses Rich panels/tables for traces. The agent synthesizes multi-tool outputs into a coherent response.

Example workflow: User: "Summarize this sales data and check if growth claim is true."

  • Calls text_summarizer → summary.
  • data_analyst → insights.
  • fact_checker → verdict.
  • Final: Integrated report.

Principle: the LLM routes dynamically; no hardcoded if/else. Trade-off: token cost scales with iterations and tool count; mitigate with targeted tool lists (see the snippet below) and a cheap model.
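
For example, restrict the toolset per request, using the to_openai_tools(names=...) filter described earlier:

# Offer the LLM only the three skills this workflow needs, not the full catalog.
tools = registry.to_openai_tools(names=["text_summarizer", "data_analyst", "fact_checker"])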

"PRINCIPLES: 1. Use the most appropriate skill... 2. Chain multiple skills... 3. Use skill_introspector... 4. Synthesize..."

Pitfall: infinite loops; cap iterations and handle tool results correctly (each tool call needs a matching tool-role message). Quality: the final answer must weave tool outputs together, not dump them raw.

Runtime Extensibility and Observability

SkillLoader mirrors a package manager: load(skill_class, *args) instantiates and registers in one step, and *args supports registry-dependent skills (e.g., passing the registry to composites).

Stats are exposed via skill.stats, e.g., {"calls": 5, "avg_latency_ms": 120}; the registry's display table shows usage at a glance.
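
The bookkeeping fits naturally in the base class's __call__; a sketch extending the earlier Skill sketch:

import time

class Skill(ABC):  # extends the earlier base-class sketch
    def __call__(self, **kwargs) -> str:
        start = time.perf_counter()
        try:
            return self.execute(**kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.stats["calls"] += 1
            n = self.stats["calls"]
            # incremental mean: update avg_latency_ms without storing per-call history
            self.stats["avg_latency_ms"] += (elapsed_ms - self.stats["avg_latency_ms"]) / n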

Dashboard-like: Rich tables for skills, iteration traces. Extend with LangSmith/Phoenix for production.

"Hot-loaded skill: research_report"—no restart needed.

Assumes: Python proficiency, OpenAI tool calling basics. Fits after simple function calling, before full agent frameworks like LangGraph.

Practice: Add WebSearchSkill (requires API), compose into MarketResearchSkill. Test chaining: math → plot code → sentiment on results.

"Each Skill is: self-describing · versioned · testable · composable."

Key Takeaways

  • Define skills with metadata/schema/execute for LLM compatibility and introspection.
  • Use SkillRegistry to index and expose tools dynamically—filter to avoid context overflow.
  • Implement agent loop: LLM reasons → tool call → dispatch → synthesize, max 6 iterations.
  • Compose skills hierarchically; declare dependencies for validation.
  • Hot-load via SkillLoader for extensibility; track stats for optimization.
  • Sandbox executions (e.g., safe eval); structure outputs as JSON for parsing.
  • Prompt with principles: appropriate skill, chain, introspect, synthesize.
  • Start with gpt-4o-mini for cost; upgrade for complex reasoning.
  • Add skill_introspector always—enables discovery without prompt bloat.
  • Observe via console traces; productionize with external logging.