Skill Abstractions for Modular Capabilities

Skills are the core building blocks, modeled as self-describing, versioned modules analogous to OS syscalls. Each skill inherits from an abstract Skill base class that requires three methods: _define_metadata() returning SkillMetadata (name, description, category, tags, dependencies, etc.), _define_schema() returning the OpenAI tool parameters as a JSON schema, and execute(**kwargs) holding the implementation.

Metadata uses @dataclass with SkillCategory enum (DATA, REASONING, etc.) for categorization. Execution tracks stats like call count and latency. Skills convert to OpenAI tools via to_openai_tool().

Principle: Encapsulate logic with rich introspection—skills declare dependencies (requires_skills) and costs, enabling runtime validation and optimization.
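
The base class itself isn't reproduced in the excerpt; a minimal sketch consistent with the description (extra enum members, field defaults, and the exact attribute names are assumptions):

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum

class SkillCategory(Enum):
    DATA = "data"
    REASONING = "reasoning"
    # ... further categories as the skill set grows

@dataclass
class SkillMetadata:
    name: str
    description: str
    category: SkillCategory
    tags: list = field(default_factory=list)
    requires_skills: list = field(default_factory=list)  # declared dependencies

class Skill(ABC):
    def __init__(self):
        self.metadata = self._define_metadata()
        self.stats = {"calls": 0, "avg_latency_ms": 0.0}

    @abstractmethod
    def _define_metadata(self) -> SkillMetadata: ...

    @abstractmethod
    def _define_schema(self) -> dict: ...

    @abstractmethod
    def execute(self, **kwargs) -> str: ...

    def __call__(self, **kwargs) -> str:
        return self.execute(**kwargs)  # stats-tracking wrapper sketched later

    def to_openai_tool(self) -> dict:
        return {"type": "function", "function": {
            "name": self.metadata.name,
            "description": self.metadata.description,
            "parameters": self._define_schema(),
        }}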

Example: Calculator skill safely evaluates math expressions:

class CalculatorSkill(Skill):
    def _define_metadata(self):
        return SkillMetadata(
            name="calculator",
            description="Evaluate mathematical expressions...",
            category=SkillCategory.REASONING,
            tags=["math", "arithmetic"],
        )

    def _define_schema(self):
        return {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"]
        }

    def execute(self, expression: str) -> str:
        import math
        safe = {"__builtins__": {}, "sqrt": math.sqrt, ...}  # Sandboxed eval
        try:
            return f"Result: {eval(expression, safe)}"
        except Exception as ex:
            return f"Error: {ex}"

Restricted globals block access to builtins and mitigate injection, though eval is never fully safe. Output: Result: 1024 for '2**10'.

Common pitfall: unrestricted eval; always sandbox (a stricter AST-based check is sketched below). Quality criteria: the schema must match LLM expectations, and metadata descriptions must guide tool selection precisely.
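
One stricter pattern (our sketch, not the chapter's code): whitelist AST node types so only pure arithmetic ever reaches eval.

import ast

# Only arithmetic expression nodes are permitted; names, calls, attributes are all rejected.
_ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
            ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub)

def safe_eval(expression: str) -> float:
    tree = ast.parse(expression, mode="eval")
    if any(not isinstance(node, _ALLOWED) for node in ast.walk(tree)):
        raise ValueError(f"disallowed expression: {expression!r}")
    return eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})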

Central Registry for Dynamic Discovery

SkillRegistry acts as a catalog: register skills by name, index by category/tags, list/filter, and expose as OpenAI tools. Supports hot-loading via SkillLoader.

registry = SkillRegistry()
registry.register(CalculatorSkill())
registry.register(TextSummarizerSkill())
# ...
console.print(registry.display())  # Rich table view

Registry methods: get_by_category() and to_openai_tools(names=None) filter the exposed tools dynamically. Principle: decouple skill definition from invocation so the LLM sees only relevant tools.
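
The registry interface fits in a few lines; a minimal sketch (the all() and unregister() helper names are our assumptions):

class SkillRegistry:
    """Central catalog of skills: the OS syscall-table analogue."""

    def __init__(self):
        self._skills: dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.metadata.name] = skill

    def unregister(self, name: str) -> None:
        self._skills.pop(name, None)

    def get(self, name: str) -> Skill:
        return self._skills[name]

    def all(self) -> list[Skill]:
        return list(self._skills.values())

    def get_by_category(self, category: SkillCategory) -> list[Skill]:
        return [s for s in self.all() if s.metadata.category == category]

    def to_openai_tools(self, names: list[str] | None = None) -> list[dict]:
        skills = self.all() if names is None else [self._skills[n] for n in names]
        return [s.to_openai_tool() for s in skills]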

Hot-loading example:

loader = SkillLoader(registry)
loader.load(FactCheckerSkill)  # Registers instantly

Unload with loader.unload('name'). This enables runtime extensibility without restarts. Avoid overloading the LLM with every tool; filter by context or query skill_introspector.
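
The loader itself can be thin; a sketch assuming the registry sketched above:

class SkillLoader:
    """Package-manager analogue: install and remove skills at runtime."""

    def __init__(self, registry: SkillRegistry):
        self._registry = registry

    def load(self, skill_class, *args) -> None:
        # *args covers registry-dependent skills, e.g., loader.load(ResearchReportSkill, registry)
        self._registry.register(skill_class(*args))

    def unload(self, name: str) -> None:
        self._registry.unregister(name)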

"Central catalog of all agent capabilities. Analogue: OS process/syscall table."

Implementing Specialized Skills

Extend the base class for NLP and reasoning: TextSummarizerSkill calls the LLM with mode-specific prompts (brief/standard/detailed), DataAnalystSkill ingests JSON/CSV and answers questions about it, and CodeGeneratorSkill outputs commented Python.

Structure outputs as JSON for parseability; FactCheckerSkill, for example, returns:

{"verdict":"true|false|uncertain","confidence":0.7,"explanation":"..."}

SentimentAnalyzerSkill optionally adds emotion scores. TranslationSkill controls formality. All of these leverage gpt-4o-mini for cost-efficiency (FactCheckerSkill is sketched below).
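
A sketch of what FactCheckerSkill might look like; the prompt wording and client setup are ours, and response_format forces parseable JSON:

from openai import OpenAI

client = OpenAI()

class FactCheckerSkill(Skill):
    def _define_metadata(self):
        return SkillMetadata(
            name="fact_checker",
            description="Assess the factual accuracy of a claim. Returns JSON with verdict, confidence, and explanation.",
            category=SkillCategory.REASONING,
            tags=["verification", "facts"],
        )

    def _define_schema(self):
        return {"type": "object",
                "properties": {"claim": {"type": "string"}},
                "required": ["claim"]}

    def execute(self, claim: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},  # guarantees well-formed JSON
            messages=[
                {"role": "system", "content": 'Fact-check the claim. Reply as JSON: '
                 '{"verdict":"true|false|uncertain","confidence":0.0-1.0,"explanation":"..."}'},
                {"role": "user", "content": claim},
            ],
        )
        return resp.choices[0].message.content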

Meta-skill: SkillIntrospectorSkill(registry) lists or describes skills via an action parameter:

  • list: bullet list of all registered skills with descriptions.
  • describe: full metadata and schema for a named skill.

Principle: Self-awareness prevents hallucinated tool calls. Prompt the LLM to use the introspector when unsure: "Use skill_introspector if unsure which skill to pick."
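
A sketch of the meta-skill, assuming the registry helpers from the earlier sketch:

class SkillIntrospectorSkill(Skill):
    def __init__(self, registry: SkillRegistry):
        self._registry = registry
        super().__init__()

    def _define_metadata(self):
        return SkillMetadata(
            name="skill_introspector",
            description="List all available skills, or describe one skill's metadata and schema.",
            category=SkillCategory.REASONING,
            tags=["meta"],
        )

    def _define_schema(self):
        return {"type": "object",
                "properties": {"action": {"type": "string", "enum": ["list", "describe"]},
                               "skill_name": {"type": "string"}},
                "required": ["action"]}

    def execute(self, action: str, skill_name: str = "") -> str:
        if action == "list":
            return "\n".join(f"• {s.metadata.name}: {s.metadata.description}"
                             for s in self._registry.all())
        skill = self._registry.get(skill_name)
        return f"{skill.metadata}\nschema: {skill._define_schema()}"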

Pitfall: vague descriptions lead to wrong routing. Be precise, e.g., "Assess factual accuracy... Returns verdict, confidence..."

Composite Skills and Orchestration

ResearchReportSkill(registry) composes sub-skills fractally:

  1. Summarize data (text_summarizer, detailed mode).
  2. Analyze quantitatively (data_analyst).
  3. Generate visualization code (code_generator, optional).

def execute(self, topic: str, data: str, include_code: bool = True) -> str:
    summary = self._registry.get("text_summarizer")(text=data, mode="detailed")
    analysis = self._registry.get("data_analyst")(data=data, question=f"Key insights about {topic}")
    # ... if include_code, call code_generator for visualization code, then
    # assemble the sections into markdown_report
    return markdown_report

Sub-calls are logged for observability, and dependencies declared in metadata validate the composition (see the metadata sketch below).
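
For instance, the composite's metadata might declare its sub-skills so the registry can check they are loaded before execution; field values here are illustrative:

SkillMetadata(
    name="research_report",
    description="Produce a markdown research report from raw data.",
    category=SkillCategory.REASONING,
    requires_skills=["text_summarizer", "data_analyst", "code_generator"],  # validated at composition time
)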

Agent Execution Loop with Tool Routing

SkillBasedAgent orchestrates via a ReAct-like loop (up to max_iterations=6):

  1. System prompt lists principles and loaded skills.
  2. LLM gets tools from registry, calls via tool_choice="auto".
  3. Dispatch: registry.get(name)(**args), append tool result to messages.
  4. Repeat until finish_reason="stop" or max iterations.

def run(self, user_input: str) -> str:
    # Assumes a module-level OpenAI `client`, `import json`, and MODEL (e.g., "gpt-4o-mini").
    messages = [{"role": "system", "content": self.system_prompt}, {"role": "user", "content": user_input}]
    tools = self.registry.to_openai_tools()
    for _ in range(self.max_iterations):
        resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools, tool_choice="auto")
        msg = resp.choices[0].message
        if not msg.tool_calls:  # finish_reason == "stop": final answer reached
            return msg.content
        messages.append(msg)  # keep the assistant turn that requested the tools
        for call in msg.tool_calls:
            result = self.registry.get(call.function.name)(**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    return msg.content  # iteration cap reached; return the last message

Verbose mode uses Rich panels/tables for traces. The agent synthesizes multi-tool outputs into a coherent response.

Example workflow: User: "Summarize this sales data and check if growth claim is true."

  • Calls text_summarizer → summary.
  • data_analyst → insights.
  • fact_checker → verdict.
  • Final: Integrated report.

Principle: the LLM routes dynamically; no hardcoded if/else. Trade-off: token cost scales with iterations and tool count; mitigate with targeted tool lists (see the snippet below) and a cheap model.
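
For example, restrict the toolset per request, using the to_openai_tools(names=...) filter described earlier:

# Offer the LLM only the three skills this workflow needs, not the full catalog.
tools = registry.to_openai_tools(names=["text_summarizer", "data_analyst", "fact_checker"])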

"PRINCIPLES: 1. Use the most appropriate skill... 2. Chain multiple skills... 3. Use skill_introspector... 4. Synthesize..."

Pitfall: infinite loops; cap iterations and handle tool results correctly (each tool call needs a matching tool-role message). Quality: the final answer must weave tool outputs together, not dump them raw.

Runtime Extensibility and Observability

SkillLoader mirrors a package manager: load(skill_class, *args) instantiates and registers in one step, and *args supports registry-dependent skills (e.g., passing the registry to composites).

Stats are exposed via skill.stats, e.g., {"calls": 5, "avg_latency_ms": 120}; the registry's display table shows usage at a glance.
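
The bookkeeping fits naturally in the base class's __call__; a sketch extending the earlier Skill sketch:

import time

class Skill(ABC):  # extends the earlier base-class sketch
    def __call__(self, **kwargs) -> str:
        start = time.perf_counter()
        try:
            return self.execute(**kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.stats["calls"] += 1
            n = self.stats["calls"]
            # incremental mean: update avg_latency_ms without storing per-call history
            self.stats["avg_latency_ms"] += (elapsed_ms - self.stats["avg_latency_ms"]) / n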

Dashboard-like: Rich tables for skills, iteration traces. Extend with LangSmith/Phoenix for production.

"Hot-loaded skill: research_report"—no restart needed.

Assumes: Python proficiency, OpenAI tool calling basics. Fits after simple function calling, before full agent frameworks like LangGraph.

Practice: Add WebSearchSkill (requires API), compose into MarketResearchSkill. Test chaining: math → plot code → sentiment on results.

"Each Skill is: self-describing · versioned · testable · composable."

Key Takeaways

  • Define skills with metadata/schema/execute for LLM compatibility and introspection.
  • Use SkillRegistry to index and expose tools dynamically—filter to avoid context overflow.
  • Implement agent loop: LLM reasons → tool call → dispatch → synthesize, max 6 iterations.
  • Compose skills hierarchically; declare dependencies for validation.
  • Hot-load via SkillLoader for extensibility; track stats for optimization.
  • Sandbox executions (e.g., safe eval); structure outputs as JSON for parsing.
  • Prompt with principles: appropriate skill, chain, introspect, synthesize.
  • Start with gpt-4o-mini for cost; upgrade for complex reasoning.
  • Add skill_introspector always—enables discovery without prompt bloat.
  • Observe via console traces; productionize with external logging.