Proxy Claude Code to Free/Local LLMs via Free Claude Code

The Free Claude Code proxy routes Claude Code requests to backends such as NVIDIA NIM (40 requests/min free), OpenRouter, DeepSeek, Ollama, or LM Studio, preserving the full Claude Code workflow in the CLI, VS Code, IntelliJ, and Discord/Telegram bots without paying Anthropic.

Drop-in Proxy Setup Saves Anthropic Costs

Set up Free Claude Code (14k+ GitHub stars) as a local proxy that intercepts Claude Code's Anthropic API requests and forwards them to free or local providers. Install via UV (requires Python 3.14) with uv tool install git+https://github.com/... or by cloning the repo, then run fco-init to generate a config. Edit .env with your provider API key and model mapping (e.g., NVIDIA_NIM_API_KEY=... and OPUS_MODEL=nvidia/nim/meta-llama-3.1-70b-instruct), then start the server with uv run uvicorn server:app --host 0.0.0.0 --port 8082. Point Claude Code at the proxy via environment variables: ANTHROPIC_BASE_URL=http://localhost:8082/v1 and, optionally, ANTHROPIC_API_KEY=dummy. This enables unlimited Claude Code CLI sessions using NVIDIA NIM's 40 requests/min free tier, OpenRouter free models, cheap DeepSeek, or fully local runs, with no Anthropic key needed. Use claude-pick for interactive model selection at launch instead of editing the config each time.
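The steps above can be sketched as a shell session. The repository URL is elided in the source and the .env values are placeholders; adjust both for your setup:

```shell
# Install the proxy (repo URL elided in the source; substitute the real one)
uv tool install git+https://github.com/...
fco-init                                  # generates the initial config

# Configure a provider in .env (placeholder key and model)
cat >> .env <<'EOF'
NVIDIA_NIM_API_KEY=nvapi-...
OPUS_MODEL=nvidia/nim/meta-llama-3.1-70b-instruct
EOF

# Start the proxy server
uv run uvicorn server:app --host 0.0.0.0 --port 8082

# In another terminal: point Claude Code at the proxy and launch it
export ANTHROPIC_BASE_URL=http://localhost:8082/v1
export ANTHROPIC_API_KEY=dummy            # any non-empty value; proxy ignores it
claude
```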

The proxy also adds request optimization (trivial calls such as quota probes are answered locally), smart rate limiting (rolling-window throttling, backoff on 429 responses, concurrency caps), and streaming support for thinking tokens and tool use, which smooths out free-provider limits during agentic tasks like refactoring and debugging.
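A quick smoke test, assuming the proxy exposes Anthropic's standard /v1/messages endpoint under the base URL configured above (the model name and headers follow the Anthropic API convention; the proxy maps them to your configured backend):

```shell
# Send one Anthropic-format request through the running proxy
curl -s http://localhost:8082/v1/messages \
  -H "x-api-key: dummy" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-3-opus-20240229",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```

If the mapping works, the JSON response comes from the mapped backend model, not Anthropic.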

Flexible Model Mapping Controls Cost/Speed

Map Claude model families (Opus/Sonnet/Haiku) to provider-specific models for hybrid setups: route 'opus' to a strong NVIDIA NIM model such as meta-llama-3.1-70b, 'sonnet' to an OpenRouter free-tier model, and 'haiku' to a fast local Ollama or llama.cpp GGUF model. Prefixed model names (e.g., lmstudio://qwen2.5-coder-32b, ollama://deepseek-coder-v2) mix providers seamlessly. Local options (LM Studio, Ollama, llama.cpp) eliminate token bills and privacy risks but demand strong hardware for acceptable speed and quality; cloud free tiers like NIM excel for ease of setup, and OpenRouter for model variety.
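A hybrid mapping might look like the following .env fragment. OPUS_MODEL appears in the source; the SONNET_MODEL and HAIKU_MODEL variable names are assumptions made by analogy with it:

```shell
# .env -- hybrid model mapping (SONNET_MODEL/HAIKU_MODEL names assumed by analogy)
OPUS_MODEL=nvidia/nim/meta-llama-3.1-70b-instruct   # strong cloud model for 'opus'
SONNET_MODEL=lmstudio://qwen2.5-coder-32b           # local LM Studio model for 'sonnet'
HAIKU_MODEL=ollama://deepseek-coder-v2              # fast local Ollama model for 'haiku'
```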

IDE/Bot Integrations Unlock Remote Workflows

In the VS Code extension, add the proxy environment variables in settings.json and reload; this bypasses the login/credits prompts. In IntelliJ, edit the JetBrains AIP agent config similarly. For remote work, Discord/Telegram bots run sessions in configured workspaces with tree-threading (reply to fork a conversation), persistence, and voice notes (transcribed to prompts via Whisper/Hugging Face or NIM gRPC). Monitor live progress from your phone and manage concurrent tasks, which is ideal for kicking off coding work on the go. Restrict access via allowed channels and user IDs.

Trade-offs: Backend Quality Drives Results

The proxy preserves the Claude Code UX (CLI, agents, tools) but inherits backend limits: weak local models fail at tool calls, so test strong coding models like DeepSeek or GLM. If you expose the proxy on the network, secure it by setting a real ANTHROPIC_API_KEY (by default there is none). This is not 'free Claude'; it is a workflow choice for students and hobbyists avoiding bills, trading polish for flexibility and cost control.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge