Claude Cookbook: 60+ Recipes for Agents, Tools, RAG

Agent SDK Patterns for Autonomous Multi-Agent Systems

Use Claude Agent SDK to ship research, SRE, and chief-of-staff agents in one-liners or full setups. Start with the one-liner research agent combining Claude Code SDK and WebSearch for autonomous querying. Scale to multi-agent hierarchies: chief-of-staff delegates via subagents, hooks, output styles, and plan mode; observability agent connects via MCP servers for GitHub monitoring and CI. For incident response, build SRE agents with read-write MCP tools for diagnosis, remediation, post-mortems. Migrate OpenAI Agents SDK apps by mapping primitives (tools, guardrails, sessions, handoffs) through expense-approval examples. Manage long sessions: instant memory compaction via background threading and prompt caching; build session browsers to list/read/rename/tag/fork without parsers. Trade-offs: SDK excels for persistent state but watch token costs in loops—use evaluator-optimizer patterns where one LLM critiques another's output for 20-30% accuracy gains over single-model chains.

Tool Use and Context Engineering for Low-Latency Agents

Programmatic tool calling (PTC) lets Claude write code to invoke tools in execution environments, slashing latency and tokens vs. standard calls. Scale to 1000s of tools with embedding-based semantic search for dynamic discovery. Handle context limits: automatic compaction compresses history; memory tools enable persistent recall with editing; compare strategies (memory, compaction, tool clearing) by cost—compaction cheapest for repetitive queries, memory for personalization. Parallel calls on 3.7 Sonnet via batch meta-pattern workaround; tool choice forces specific/auto selection. Crop tool boosts vision on charts/docs by zooming regions. Basic workflows: orchestrator-workers delegate to specialists; evaluator loops refine generations. Pydantic validates inputs for type-safe JSON extraction/customer service agents. Trade-offs: PTC/token savings shine in high-volume but add code exec overhead—test vs. native for <100ms needs.

RAG Pipelines and Knowledge Extraction Techniques

Build RAG from scratch: summary indexing/reranking for docs; contextual embeddings via prompt caching improve chunk accuracy 15-25%. Text-to-SQL chains natural queries to executable code with self-improvement loops. Knowledge graphs: Claude extracts entities/relations, dedups, enables multi-hop queries from unstructured text. Classification via RAG/CoT for tickets; summarization evals for legal docs. Batch API processes volumes asynchronously at 50% cost savings. Generate synthetic test data for prompt evals; tool evals run parallel independently. Haiku sub-agents extract from reports, Opus synthesizes. Trade-offs: RAG cuts hallucinations but embedding overhead—use Haiku for cheap retrieval, Opus for synthesis.

Multimodal, Skills, and Integrations for End-to-End Apps

Vision best practices: pass images for text/charts/slides analysis; tools extract nutrition labels or transcribe PDFs. Voice: ElevenLabs STT/TTS for <500ms assistants. Skills extend Claude: Excel/PowerPoint/PDF for financial dashboards; custom skills for org workflows. Integrations: Wolfram calculator; Deepgram audio transcription to interview Qs; LlamaIndex for ReAct/multi-doc agents, routers, sub-questions; Pinecone/MongoDB vector search; LangChain v1 RAG agents. Admin API tracks usage/costs. Extended thinking budgets transparent reasoning; speculative caching warms TTFT during typing. JSON mode via prompts; metaprompts beat blank-page syndrome; citations verify sources. Finetune Haiku on Bedrock for customs. Trade-offs: Multimodal tokens balloon on high-res—crop/downsample first; skills automate but lock to formats.