6 Projects to Go from AI User to Builder in 2026

Build six things: Skills (progressive-disclosure folders), RAG (vector search over your docs), MCP servers (a universal tool adapter), voice agents (Gemini Live), local models (Ollama + Gemma), and LoRA fine-tuning (behavior shaping) to own your AI workflows and stand out at work.

Use Skills and RAG for Efficient Context Handling

Start with Skills, the highest-leverage project: create a folder with a SKILL.md file containing YAML metadata (name and description fields only) followed by markdown instructions. Through progressive disclosure, Claude reads just the description first to check relevance, loading the full instructions and referenced files only when needed, so even 50 skills won't bloat the context window. To build one, pick a weekly task like status updates and prompt Claude Code or Antigravity to generate the skill from plain-English instructions. This automates repetitive context explanation without any engineering.
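A minimal sketch of what such a file might look like, following the structure the article describes (YAML frontmatter with name and description, then markdown instructions); the filename convention and the status-update content are illustrative:

```markdown
---
name: weekly-status-update
description: Drafts the weekly status update email from completed tasks,
  in-progress work, and blockers. Use when the user asks for a status update.
---

# Weekly Status Update

When asked for a status update:

1. Ask for (or pull from context) this week's completed tasks, in-progress work, and blockers.
2. Group items under the headings "Done", "In Progress", and "Blocked".
3. Keep the tone concise and factual; one line per item.
4. End with next week's top three priorities.
```

Only the two frontmatter fields are loaded up front; the body below the frontmatter is read only when the description matches the task at hand.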

Next, implement RAG to ground LLMs in your data: split documents into chunks of a few paragraphs, run each through an embedding model to get vectors where semantic similarity clusters concepts (e.g., "hypertension" lands near "high blood pressure" despite sharing no words), and store them in a vector index. At query time, embed the question, retrieve the top 5-10 matching chunks, and feed them to the LLM for grounded generation. Unlike NotebookLM (a destination tool), RAG is a reusable component you can wire into agents or apps. Use it to make proprietary data queryable, since base models lack your specifics.
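A minimal sketch of the chunk-embed-retrieve loop, assuming the sentence-transformers and numpy packages; the model name and corpus are illustrative placeholders, and a real system would swap the in-memory array for a vector database:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model

# 1. Chunk: in practice, split your docs into few-paragraph chunks.
chunks = [
    "Patients with hypertension should monitor blood pressure daily.",
    "Our refund policy allows returns within 30 days of purchase.",
    "High blood pressure is treated with lifestyle changes and medication.",
]

# 2. Embed and index: one vector per chunk (here, just an in-memory array).
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Embed the question and return the top-k most similar chunks."""
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since vectors are normalized
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

# 3. Retrieve, then feed the matches plus the question to your LLM.
print(retrieve("How do we handle hypertension?"))
```

Note that the hypertension chunks win even though the query shares no keywords with "high blood pressure"; that is the semantic clustering doing the work.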

These two deliver quick wins: Skills for agent instructions and RAG for data retrieval, together forming the base for production AI.

Expose Tools via MCP and Wire Voice Agents

Build an MCP (Model Context Protocol) server to make your tools universally accessible: decorate Python functions (e.g., your RAG retriever) with the FastMCP SDK, which handles the plumbing so any MCP-compatible client (Claude Desktop, Cursor, Gemini) can call them. MCP, released by Anthropic in late 2024, saw SDK downloads grow 970x in 18 months, was donated to the Linux Foundation in December 2025, and is now standard across ChatGPT, Cursor, and Gemini. This turns one-off scripts into shareable infrastructure: wrap your RAG retriever in a few lines and your whole team, or any agent, can use it.
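A minimal server sketch using the FastMCP SDK (pip install fastmcp); the server name and tool are illustrative, and it assumes the retrieve() function from the RAG sketch above is defined in the same module:

```python
from fastmcp import FastMCP

mcp = FastMCP("company-docs")

@mcp.tool()
def search_docs(question: str) -> str:
    """Search internal company docs and return the most relevant passages."""
    # Delegates to the RAG retriever sketched earlier; once registered,
    # any MCP-compatible client can call this as a tool.
    return "\n\n".join(retrieve(question))

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; clients like Claude Desktop connect here
```

The decorator is the whole integration: FastMCP reads the type hints and docstring to advertise the tool, so the same function serves Claude Desktop, Cursor, and your own agents without per-client glue code.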

Layer voice agents on top using the Gemini 3.1 Flash Live API (launched March 2026): it processes raw audio natively (90+ languages, barge-in interruptions, 90%+ accuracy on multi-step tool calling from audio), cutting round-trip latency from the 2-3s of the old VAD/STT/LLM/TTS pipeline to under 1s. Speak a query, Gemini calls your MCP/RAG server as a tool, and it responds aloud; for example, you can query company docs while driving. This stacks projects two and three into real-time, private voice search that was impossible two years ago.
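A minimal connection sketch using the google-genai Python SDK's Live interface (pip install google-genai). The model identifier is a placeholder for whichever Live-capable Gemini model you have access to, and it uses text turns for brevity; a real voice agent streams microphone audio in and plays audio out, with your MCP tools registered in the config:

```python
import asyncio
from google import genai

client = genai.Client()  # reads the API key from the environment

MODEL = "gemini-2.0-flash-live-001"  # placeholder Live-capable model tag
config = {"response_modalities": ["TEXT"]}  # a voice agent would use ["AUDIO"]

async def main():
    # The Live API holds an open session instead of one-shot requests,
    # which is what enables sub-second turns and barge-in interruption.
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Summarize our refund policy."}]},
            turn_complete=True,
        )
        async for msg in session.receive():
            if msg.text:
                print(msg.text, end="")

asyncio.run(main())
```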

Run Local Models and Fine-Tune for Control

Run models locally for privacy, offline use, and zero marginal cost: combine open-weights models (Gemma 4 at 2B/4B/26B/31B parameters; the smaller sizes fit in 8GB of laptop RAM), 4-bit quantization (roughly 3x memory reduction with minimal quality loss), and the Ollama runtime (Docker-like: one command pulls and runs a model and exposes an API). Point Ollama's Gemma at your RAG/MCP stack for fully local querying, trading some speed and quality for zero per-token costs.
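A sketch of the local wiring using the official ollama Python package, assuming Ollama is running and a Gemma model has already been pulled (e.g. with `ollama pull`); the model tag is a placeholder, and retrieve() is the RAG function sketched earlier:

```python
import ollama

question = "How do we handle hypertension?"
context = "\n\n".join(retrieve(question))  # reuse the local RAG retriever

# Everything below runs on your machine: no tokens leave the laptop.
response = ollama.chat(
    model="gemma3:4b",  # placeholder tag; pick a size that fits your RAM
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response["message"]["content"])
```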

Fine-tune only to shape behavior, not to add knowledge: use LoRA (low-rank adaptation) to train an adapter covering under 1% of parameters on top of a frozen base model, customizing voice and jargon (e.g., legal or medical). Skip it unless you hit a wall: mastering the first five projects covers roughly 90% of needs, and this one demands deeper investment than the others.
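A setup sketch using Hugging Face transformers plus peft (pip install transformers peft); the base model is a placeholder, and the training data and Trainer wiring are omitted:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")  # placeholder base

lora = LoraConfig(
    r=8,                                  # low adapter rank keeps trainable params tiny
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)  # base weights stay frozen; only adapters train
model.print_trainable_parameters()  # typically reports well under 1% trainable

# Train with your usual Trainer loop on style/jargon examples, then save
# just the small adapter: model.save_pretrained("my-adapter")
```

The point of the adapter split is exactly the behavior-not-knowledge rule above: you ship a few megabytes of style on top of an unchanged base model, rather than retraining billions of weights.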

Pick the one or two projects that feel scariest or sit closest to your job; building one end-to-end proves your value in a way prompting alone cannot.

© 2026 Edge