Uncensored SuperGemma-4: Local Agent Power on Any Hardware

SuperGemma-4 uncensors Gemma 4 26B for coding, tool use, and agent work. The MLX 4-bit build runs at 46.2 t/s on Apple Silicon (24GB unified memory minimum); a GGUF Q4_K_M build (16.8GB) serves llama.cpp and similar runtimes. Pairs with Hermes Agent or OpenClaw via OpenAI-compatible servers.

Build Practical Local Agents with an Uncensored Fine-Tune

SuperGemma-4 refines Google's Gemma 4 26B (the A4B instruction-tuned variant) into an uncensored model optimized for text-only work: coding, planning, tool use, browser automation, and logic, avoiding the refusals that plague the base model. It retains the native 256K context window, system-prompt support, function calling, and MoE architecture (3.8B active params of 25B total), making it agent-ready without forcing behaviors. The creator's benchmarks report a QuickBench overall score of 95.8 (vs. 91.4 baseline) and 46.2 tokens/second generation (vs. 42.5), plus gains on code, logic, Korean, and browser tasks. Use it where stock Gemma feels restricted but its architecture shines: it delivers speed and utility without chaotic role-play drift.

Trade-off: text-only (no multimodal input); plan on 24GB unified memory minimum, with 32GB+ ideal on Apple Silicon to avoid tuning sysctl memory limits.
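On a 24GB Mac the default GPU wired-memory cap can be too low for the ~17GB of 4-bit weights. A hedged sketch of the sysctl workaround the note above alludes to; the right limit depends on your machine, and the setting applies to recent macOS releases on Apple Silicon:

```shell
# Allow the GPU to wire up to ~20 GB of unified memory (value in MB).
# Assumption: macOS 14+ on Apple Silicon; the change resets on reboot.
sudo sysctl iogpu.wired_limit_mb=20480
```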

MLX Setup Delivers Fast Apple Silicon Inference

On Macs, install the runtime with pip install -U mlx-lm, then serve the MLX 4-bit v2 build: mlx_lm.server --model JunSong/SuperGemma-4-26B-Uncensored-MLX-4bit-v2 --port 8080. The server auto-detects the bundled chat template; forcing one manually corrupts outputs. Smoke-test with mlx_lm.generate and --max-tokens 512. The server exposes an OpenAI-compatible endpoint for seamless integration and hits the claimed speeds on capable hardware.
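Once the server is up, any OpenAI-compatible client can talk to it. A minimal stdlib-only sketch, assuming the server from above is listening on localhost:8080 (the base URL and model name mirror the command in the text; your server may register a different model id):

```python
import json
import urllib.request

# Assumes mlx_lm.server is listening locally on port 8080, as started above.
BASE_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "JunSong/SuperGemma-4-26B-Uncensored-MLX-4bit-v2"

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions payload for the local server."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint speaks the OpenAI dialect, the same payload shape works unchanged against the GGUF servers described below.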

Cross-Platform GGUF + Agent Tool Pairing

Non-Mac users grab the Q4_K_M GGUF v2 (16.8GB) for llama.cpp, LM Studio, Jan, or Open WebUI; it ships a neutral chat template to keep prompts from drifting into code/tool modes. Serve it locally, then plug the endpoint into agents:

  • Hermes Agent: Terminal-first with tools, memory, MCP, messaging. Set custom OpenAI endpoint to MLX/GGUF server; leverages Gemma's native function calling for reliable local workflows.
  • OpenClaw: Personal assistant/task runner. Configure custom OpenAI provider to local server for reasoning in multi-channel automation.

This stack turns SuperGemma-4 into a production-like local uncensored agent without cloud dependency, prioritizing practical tasks over edginess.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge