Gemini CLI: Context to CI/CD for Production AI Agents

Gemini CLI turns natural-language 'vibe coding' into a full ADK agent, layering context engineering, Agent Skills, hooks, tests, and an automated Cloud Run deployment to show how much end-to-end development AI can handle without manual coding.

Context Engineering Unlocks Agent Autonomy

The core challenge in AI-assisted coding is giving the model enough structured knowledge to build complex systems like Google's Agent Development Kit (ADK) agents without hallucinations or incomplete outputs. Annie Wang and Ayo Adedeji demonstrate this in their Shadowblade game agent project, starting from the 'agent-vs-developer' repo with starter files (Dockerfile, MCP server stubs, GitHub data).

They begin by analyzing the codebase: a Gemini CLI invocation reads the entire folder using the built-in read_file and read_folder tools, delegating to an 'investigator agent' for multi-agent summarization. This reveals the repo's focus: a multi-agent game system centered on Shadowblade, an LLM-powered combat agent built on Google's generative AI and ADK.

Key decision: download a blueprint, agent.design.md, via natural language ('Download this Shadowblade agent design MD file and store it locally'). This provides precise ADK specs: root agent type, model (Gemini), persona instructions, and tool imports, without requiring manual curl or git clone. Tradeoff: local files act as short-term memory (session-specific, read on demand), avoiding persistent bloat but requiring explicit invocation.

"This is the power of context engineering: essentially, now you don't know what ADK is or how to create an ADK agent, but you're giving it the correct context and the right instructions so that AI can create the ADK agent for you" – Annie Wang, emphasizing how targeted docs enable zero-knowledge agent generation.

Next, they create a project-level gemini.md with Python best practices (docstrings, type hints, modular structure). Created via shell (cat > gemini.md << EOF), it serves as long-term memory: auto-loaded at the start of every gemini session in that folder. View it with /memory show; append to it with /memory add. Why project-level rather than user-level (~/.gemini/gemini.md)? Project isolation prevents instructions from bleeding across projects in multi-project workflows.
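As an illustration (the exact contents from the episode aren't shown), a project-level gemini.md encoding those conventions might look like:

```markdown
# Project guidance for Gemini CLI

## Python style
- Every function gets a docstring describing args, returns, and raises.
- Use type hints on all public signatures.
- Keep modules small and single-purpose; prefer composition over inheritance.

## Workflow
- Run the test suite before proposing a commit.
```

Because this file is auto-loaded, anything placed here should be general enough to apply to every session in the project.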

Tradeoffs surfaced: long-term memory (gemini.md) ensures consistency across sessions but risks hitting token limits if overfilled with specifics; short-term memory (local docs, chat history) is flexible but forgotten on restart. They reject always-on globals for anything non-general, opting for a layered approach.

Agent Skills Deliver On-Demand Expertise

To avoid bloating the context window, they introduce skills via skill.md files: dynamic, conditional prompts loaded only when relevant. Stored in ~/.gemini/skills/, each is structured with YAML-style frontmatter: a name (e.g., 'adk-agent-design'), a description (the trigger), and content (principles, architecture, tools, testing).

For ADK, the skill covers agent persona, tool design (e.g., combat logic), hooks for control, and eval strategies. Invocation is automatic: the CLI matches the skill's description against the query (e.g., 'design ADK agent'). It is created via shell templating, mirroring gemini.md but namespaced per skill.
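A sketch of what the adk-agent-design skill file could look like, reconstructed from the structure described above (the frontmatter field names follow the name/description/content layout mentioned; the body bullets are illustrative):

```markdown
---
name: adk-agent-design
description: Load when the user asks to design, generate, or review a Google ADK agent.
---

When designing an ADK agent follow these principles:
- Define a root agent with a clear persona and a single Gemini model.
- Keep each tool single-purpose (e.g., one function per combat action).
- Add callbacks (hooks) to validate tool calls before execution.
- Ship an eval set alongside the agent: trajectory plus response checks.
```

The description line is what the CLI matches against the user's query, so it should name the exact situations in which the skill applies.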

"Agent skills are like on-demand expertise... You don't need a plumber all the time, but when your sink leaks, you call one" – Ayo Adedeji, contrasting persistent gemini.md with efficient, token-saving skills.

Decision chain: they evaluated gemini.md (always loaded, general) vs. local files (manual read) vs. skills (auto-triggered, specific). Skills win for ADK blueprints: laser-focused, with no performance degradation from permanently loaded context. Result: Gemini CLI generates functional Shadowblade agent code solely from context plus memory, filling the starter stubs (a2a_server.py, etc.).
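The generated tool code isn't reproduced in the summary, but a combat tool in the style the gemini.md conventions demand (docstring, type hints, modular) might look like this sketch; the strike name and damage formula are illustrative, not taken from the repo:

```python
def strike(target_hp: int, attack_power: int, defense: int = 0) -> dict:
    """Resolve a single Shadowblade strike.

    Args:
        target_hp: Hit points of the target before the strike.
        attack_power: Raw attack value of the attacker.
        defense: Flat damage reduction applied by the target.

    Returns:
        A dict with the damage dealt, remaining HP, and a defeat flag.
    """
    damage = max(attack_power - defense, 1)  # always at least chip damage
    remaining = max(target_hp - damage, 0)
    return {"damage": damage, "remaining_hp": remaining, "defeated": remaining == 0}
```

In ADK, a plain Python function like this can be passed in the agent's tools list; the framework wraps it as a tool and exposes the docstring to the model, which is why the docstring conventions in gemini.md matter for tool quality.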

Guardrails and Testing Ensure Reliability

Raw generation risks drift, so they layer on hooks: custom callbacks in ADK that intercept agent behavior (e.g., validating tool calls, enforcing protocols). Gemini CLI writes these using the skill context, embedding them in the agent logic.
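ADK's real callback signatures take richer context objects; the simplified hook below only sketches the core idea of intercepting a tool call and short-circuiting it, and the allow-list contents are hypothetical:

```python
from typing import Optional

ALLOWED_TOOLS = {"strike", "defend", "scan_arena"}  # hypothetical tool names


def before_tool_hook(tool_name: str, args: dict) -> Optional[dict]:
    """Validate a pending tool call.

    Returns None to let the real tool run, or a dict that replaces the
    tool's result, mirroring how an ADK before-tool callback can
    short-circuit execution by returning a non-None value.
    """
    if tool_name not in ALLOWED_TOOLS:
        return {"error": f"tool '{tool_name}' is not permitted"}
    if any(not isinstance(v, (int, float, str, bool)) for v in args.values()):
        return {"error": "tool arguments must be scalar values"}
    return None
```

Putting the guardrail in a hook rather than in the prompt means the constraint is enforced in code, not merely suggested to the model.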

Testing suite: full evals with trajectory analysis (step-by-step traces of tool calls) and response comparisons. ADK's eval framework generates test cases from the specs. Why? "Shipping blind is not an option" – video description. Tradeoff: evals add upfront dev time but catch edge cases autonomously before anything ships.
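Trajectory analysis boils down to comparing the sequence of tool calls the agent actually made against the expected sequence; a minimal in-order matcher (a simplification of the comparison ADK's eval framework performs, with made-up step names in the tests) could be:

```python
def trajectory_score(expected: list[str], actual: list[str]) -> float:
    """Score how well an agent's tool-call trace follows the expected one.

    Greedy in-order matching: each expected step must appear in `actual`
    after the previously matched step. Returns the matched fraction.
    """
    matched, i = 0, 0
    for step in expected:
        # Advance through the actual trace until this step is found.
        while i < len(actual) and actual[i] != step:
            i += 1
        if i < len(actual):  # found this step, in order
            matched += 1
            i += 1
    return matched / len(expected) if expected else 1.0
```

A score of 1.0 means the agent hit every expected step in order (extra steps are tolerated); anything lower flags a deviation worth inspecting in the trace.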

"Every time we end our session... Gemini is not able to remember your guidance... By saving those in memory, in the Gemini file, Gemini always knows this guidance" – Annie Wang, on why evals plus persistent context beat one-shot prompts.

CI/CD Pipeline Automates Production

Final push: Gemini CLI scripts the full pipeline: Cloud Build for CI (lint, test, build the Docker image), then deployment to Cloud Run, with hooks integrated for runtime controls. From a single vibe prompt ('Build and deploy via CI/CD') it generates the Cloud Build config, tweaks the Dockerfile, and triggers the build via gcloud.
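A Cloud Build config of the kind Gemini CLI would generate might look roughly like this; the service name, region, and test step are assumptions, while the builder images are standard Cloud Build ones:

```yaml
steps:
  # Run the test suite before building anything.
  - name: 'python:3.12-slim'
    entrypoint: 'bash'
    args: ['-c', 'pip install -r requirements.txt && pytest']
  # Build and push the agent image.
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/shadowblade-agent', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/shadowblade-agent']
  # Deploy the pushed image to Cloud Run.
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: 'gcloud'
    args: ['run', 'deploy', 'shadowblade-agent',
           '--image', 'gcr.io/$PROJECT_ID/shadowblade-agent',
           '--region', 'us-central1']
images:
  - 'gcr.io/$PROJECT_ID/shadowblade-agent'
```

Running tests as the first step means a failing eval blocks the deploy, which is exactly the "no shipping blind" guardrail the episode argues for.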

Before: manual development in a cloned repo. After: an autonomous end-to-end flow from context to agent code to tests to deployment, validated by a 'boss fight' running on Cloud Run. No hard metrics are given, but the session implies zero manually written code and a full pipeline in one sitting.

Tradeoffs: the approach relies on the Google ecosystem (Gemini API, Cloud Build, ADK), so portability is low. Wins: it scales to production multi-agent systems without an engineering team.

Key Takeaways

  • Layer contexts hierarchically: gemini.md (long-term, general), skill.md (on-demand, specific), local files (short-term, explicit).
  • Trigger skills with precise descriptions to auto-load expertise without token waste, ideal for frameworks like ADK.
  • Always pair generation with hooks and evals: use ADK trajectory analysis for reliable agent behavior.
  • Vibe-code CI/CD: natural-language prompts generate Cloud Build and Cloud Run deploys from starter files.
  • Start sessions with a prompt like 'analyze entire project' for accurate repo awareness via multi-agent tooling.
  • Prefer project-level gemini.md over global: it isolates instructions and is verifiable via /memory show.
  • Download blueprints naturally ('store this file locally'); no CLI memorization needed.
  • Balance memory types: short-term for one-offs, long-term for cross-session consistency.

"When designing an ADK agent follow these principles..." – Excerpt from adk-agent-design skill, blueprint for scalable agent arch (persona, tools, testing).

Video description
GCP credit → https://goo.gle/handson-ep6-lab1
[Lab] Vibe coding with Gemini CLI → https://goo.gle/scholar
Try Gemini CLI → https://goo.gle/4v7xUFO

Episode 2 of vibe coding with Gemini CLI pushes the boundaries of what AI-assisted development can actually do. Annie and Ayo use Agent Skills to extend CLI capabilities, generate a full ADK agent using nothing but context and memory, add hooks to control the agent's behavior, write a complete test and evaluation suite, and ship everything through an automated CI/CD pipeline. The question we kept asking: how much can Gemini CLI actually do on its own? Watch and find out. 👇

🧩 Agent Skills — what they are and how to use them
⚙️ ADK Agent — generated, structured, and functional
🪝 Hooks — because even AI needs guardrails
🧪 Tests & Evals — because shipping blind is not an option
🚀 CI/CD — because real software gets deployed

More resources:
Agent Development Kit (ADK) Docs → https://goo.gle/4tpbfTH
Gemini CLI Hooks Documentation → https://goo.gle/4siaT0m
Evaluation with ADK → https://goo.gle/4cqkNrO
Watch more Hands on AI → https://goo.gle/HowToWithGemini
🔔 Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#AIAgents #GeminiCLI #VibeCoding
Speakers: Ayo Adedeji, Annie Wang
Products Mentioned: Gemini CLI, Agent Development Kit, Gemini API, Cloud Build

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge