Tag: mlx

Summaries

M5 Max Crushes M4 in Local LLM Benchmarks via MLX

IndyDevDan

Apr 20, 2026

The M5 Max MacBook Pro outperforms the M4 Max by 15-50% across prefill, decode, and wall-clock times. MLX builds of Qwen 3.5 and Gemma 4 run at roughly twice the speed of their GGUF counterparts on Apple Silicon, enabling fast, private local inference.

Simon Willison's Weblog

Run VibeVoice STT Locally on Mac in One uv Command

Transcribe up to 59 minutes of audio with Microsoft's MIT-licensed VibeVoice model using mlx-audio: a uv one-liner on an M5 Max Mac processes a 1-hour podcast in 524 s (about 8:45), peaks at 30-61 GB of RAM, and outputs speaker-diarized JSON segments.
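
For a sense of scale, the timing above can be turned into a real-time factor. This is a minimal sketch, not part of the summarized post: the helper name `realtime_factor` is hypothetical, and it assumes the "1-hour podcast" is exactly 3600 seconds long, which the summary does not state.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Assumption: the podcast is exactly one hour long (the summary says "1hr").
AUDIO_SECONDS = 60 * 60
# Taken from the summary: 524 s of processing on an M5 Max.
PROCESSING_SECONDS = 524

factor = realtime_factor(AUDIO_SECONDS, PROCESSING_SECONDS)
print(f"{factor:.1f}x faster than real time")
```

Under that assumption, transcription runs at a little under seven times real-time speed.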