Free Local LLMs for Coding: Ollama + OpenCode on Windows

Install Ollama on Windows to run Qwen 3.5-9B locally, the author's top pick for free AI coding assistance via OpenCode, with no cloud costs.

Quick Local LLM Setup Cuts Cloud Dependency

Download Ollama from ollama.com/download and install it on Windows. This gives you a local server for running open LLMs with no API fees or internet dependence, ideal for private coding sessions. After installing, open Command Prompt (search for 'cmd') and verify the setup: ollama list shows the models you have downloaded, which will be empty on a fresh install. Use ollama ps at any time to see which models are loaded and whether they are running on GPU or CPU, so you can gauge resource demands before scaling to larger models.
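A quick smoke test from Command Prompt might look like the following. The output shown is illustrative; the exact column layout depends on your Ollama version:

    C:\> ollama list
    NAME    ID    SIZE    MODIFIED
    (no rows yet on a fresh install)

    C:\> ollama ps
    NAME    ID    SIZE    PROCESSOR    UNTIL
    (empty until a model is loaded)

The PROCESSOR column reports how the loaded model is split between GPU and CPU, which is the quickest way to confirm a model actually fits in VRAM.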

Run ollama run qwen3.5:9b to start Qwen 3.5-9B; if the model is not yet on disk, Ollama downloads it automatically first. The author favors this 9B-parameter model for its balance of speed and coding capability on consumer hardware, finding it outperforms heavier options such as larger Llama models without requiring a high-end GPU. Once loaded, the model serves as the backend for tools like OpenCode, enabling autocomplete, refactoring, and debugging directly in your editor for a free, offline AI coding workflow.
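You can also confirm the server is answering requests before wiring up any editor tooling. This is a minimal sketch assuming Ollama's default port of 11434 and its standard /api/generate endpoint; curl ships with current Windows 10 and 11 builds, and the escaped quotes are required by cmd:

    C:\> curl http://localhost:11434/api/generate -d "{\"model\": \"qwen3.5:9b\", \"prompt\": \"Write a hello world in Python\", \"stream\": false}"

A JSON response containing a "response" field with generated code means the local stack is working end to end.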

Monitor and Access via App

Beyond the CLI, you can launch the Ollama desktop app by searching for 'ollama' in the Start menu or right-clicking its system tray icon. The GUI simplifies model management, switching, and usage stats, which is convenient for repeated sessions. The trade-off: initial downloads take time and disk space (Qwen 3.5-9B weighs in at several gigabytes), but inference afterwards runs quickly and locally. This stack delivers production-ready AI coding without subscriptions, though expect quantization limits on very large models without further tuning.
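To wire the model into OpenCode as mentioned above, one approach is an opencode.json provider entry pointing at Ollama's OpenAI-compatible endpoint (Ollama exposes one at /v1 alongside its native API). The field names below follow the pattern in OpenCode's documentation but should be treated as assumptions; check opencode.ai for the current schema:

    {
      "$schema": "https://opencode.ai/config.json",
      "provider": {
        "ollama": {
          "npm": "@ai-sdk/openai-compatible",
          "name": "Ollama (local)",
          "options": {
            "baseURL": "http://localhost:11434/v1"
          },
          "models": {
            "qwen3.5:9b": {
              "name": "Qwen 3.5 9B (local)"
            }
          }
        }
      }
    }

With this file saved in your project root, selecting the ollama/qwen3.5:9b model inside OpenCode should route all completions to the local server instead of a cloud API.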

