Free Claude Code Proxy: Claude Workflow on Free/Local Models
Route Claude Code requests through a local proxy to free backends like NVIDIA NIM (40 req/min) or local Ollama, preserving the CLI/VS Code workflow without Anthropic API costs. Setup takes a few environment variables and a config file.
Drop-In Proxy Replaces Anthropic API Without Changing Claude Code
Point Claude Code at the proxy by setting ANTHROPIC_BASE_URL to http://localhost:8082 (or whatever port your proxy listens on), and optionally set ANTHROPIC_API_KEY as an auth token. Install the proxy with uv tool install from GitHub, or clone the repo (14k+ stars) directly. Edit .env with provider details: prefix model names (e.g., nvidia/nvidia--nim--qwen2-5-coder-7b-instruct) and add API keys for cloud options. Start the proxy with uv run uvicorn server:app --host 0.0.0.0 --port 8082; Claude Code then forwards every request to your backend and streams responses back seamlessly. The proxy also answers trivial calls locally (quota probes, conversation titles) and adds rate limiting (rolling-window throttling, backoff on 429s, concurrency caps) to keep you inside free-tier limits.
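The setup above can be sketched as a shell session. The repo path is a placeholder and the .env workflow is an illustrative assumption; check the project's README for the exact names:

```shell
# Point Claude Code at the local proxy instead of api.anthropic.com
export ANTHROPIC_BASE_URL="http://localhost:8082"
export ANTHROPIC_API_KEY="some-local-token"   # optional: auth token the proxy checks

# Fetch and configure the proxy (repo path is a placeholder;
# the exact .env keys are project-specific)
git clone https://github.com/<user>/<proxy-repo>.git && cd <proxy-repo>
cp .env.example .env   # then edit: provider API keys, prefixed model names

# Start the proxy
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```

With the proxy running, launching `claude` in any shell that has these variables exported routes all traffic through it.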
This preserves the full agentic coding workflow (refactors, debugging, long sessions) in the terminal, in the VS Code extension (add the env vars in settings and reload), or in IntelliJ (edit the JetBrains AI config). Use claude-pick for interactive model selection at launch instead of repeatedly editing .env.
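One way to supply the variables to VS Code, as an alternative to the extension's settings UI, is to launch the editor from a shell where they are already exported; the extension and the integrated terminal inherit the environment:

```shell
# Export the proxy variables, then launch VS Code so the
# Claude Code extension and integrated terminal inherit them
export ANTHROPIC_BASE_URL="http://localhost:8082"
export ANTHROPIC_API_KEY="some-local-token"
code .
```

This avoids storing the token in settings files that might be synced or committed.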
Provider Mapping Maximizes Cost Control and Flexibility
Map Claude's tiers (Opus/Sonnet/Haiku) to whichever backends suit them: heavy tasks to strong models, light ones to fast or free ones. Free cloud options: NVIDIA NIM (40 req/min, the easiest start), OpenRouter (many free models, prefix openrouter/), and DeepSeek (affordable coding models, prefix deepseek/). Local options: LM Studio (lmstudio/ prefix; launch the app first), Ollama (ollama/), and llama.cpp (GGUF models). You can mix providers per tier for hybrid setups, for example Opus to NIM and Haiku to a local model, balancing speed, cost, and privacy. Local inference has no per-token fees; NIM and OpenRouter free quotas keep you off Anthropic's bill.
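A hybrid tier mapping might look like this in the proxy's .env. The variable names (BIG_MODEL, etc.) and the non-NIM model identifiers are illustrative assumptions; only the prefix conventions come from the proxy's documentation:

```shell
# Illustrative .env (variable names are assumptions; consult the project README)
BIG_MODEL=nvidia/nvidia--nim--qwen2-5-coder-7b-instruct  # Opus tier -> NVIDIA NIM (free, 40 req/min)
MIDDLE_MODEL=deepseek/deepseek-chat                      # Sonnet tier -> DeepSeek (hypothetical model id)
SMALL_MODEL=ollama/qwen2.5-coder                         # Haiku tier -> local Ollama (hypothetical model id)

# API keys only for the cloud providers you actually use
NVIDIA_API_KEY=nvapi-...
DEEPSEEK_API_KEY=sk-...
```

The point of the mapping is that Claude Code keeps requesting "opus"/"sonnet"/"haiku" by name while the proxy silently substitutes the configured backend for each tier.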
Bot Integrations Enable Remote Agentic Coding
Run sessions through Discord or Telegram bots: send a task, watch thinking tokens, tool calls, and results stream back, fork a thread by replying, and resume sessions across restarts. Voice notes are transcribed via local Whisper (Hugging Face) or NIM gRPC, so you can dictate prompts from your phone for hands-off execution in configured workspaces. Restrict bots to allowed channels and user IDs, and limit accessible directories, since the agent executes code.
Model Quality and Security Define Real-World Limits
The proxy doesn't upgrade weak models: expect Claude-like tool calling and streaming only from capable backends (strong NIM or OpenRouter models generally beat local ones unless you have top-tier hardware and models). Test iteratively; if tool calls fail, the backend model is usually at fault, not the proxy. Secure any exposed server: always set an ANTHROPIC_API_KEY token, and never bind to a public 0.0.0.0 without access controls. The trade-offs: free tiers are rate-limited (the proxy smooths this), local inference needs GPU and RAM, and there's no 'real Claude' magic, just a flexible interface to better economics for hobbyists and students.
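A minimal hardening sketch for a single-machine setup, assuming the proxy validates the same ANTHROPIC_API_KEY token it receives from clients: bind to loopback so nothing outside the host can reach it, and use a non-guessable token.

```shell
# Generate a random token and require it on both sides
export ANTHROPIC_API_KEY="$(openssl rand -hex 32)"

# Bind the proxy to loopback only, not 0.0.0.0,
# so it is unreachable from the network
uv run uvicorn server:app --host 127.0.0.1 --port 8082
```

If remote access is genuinely needed, put the proxy behind a reverse proxy with TLS and authentication rather than exposing the raw port.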