Porting PyTorch Models to the Browser with Claude Code

The 'Vibe Coding' Workflow

This project demonstrates a 'vibe coding' approach in which an engineer directs an AI agent through a complex porting task without writing code by hand. Given research context (for example, model documentation and informal notes on feasibility), the agent handled the end-to-end pipeline: cloning repositories, converting PyTorch models to ONNX, publishing the weights to Hugging Face, and building a working web UI. The human's role was limited to testing, reporting errors back to the agent, and suggesting improvements such as progress bars.

Technical Pipeline for Browser-Based Inference

To run a 0.2B-parameter model in the browser, the project used the following architecture:

Model Conversion: The PyTorch model was exported to the ONNX (Open Neural Network Exchange) format. ONNX acts as a framework-neutral container bundling the computation graph (the 'recipe') and the learned weights.
Execution: The model runs client-side in ONNX Runtime Web with its WebGPU backend, which lets browsers that support WebGPU (recent versions of Chrome, Firefox, and Safari, with support still varying by platform) run inference on the GPU.
Caching Strategy: Because the model weights total roughly 1.3GB, the project stores them with the browser's CacheStorage API. Following the caching pattern used by existing projects such as Whisper Web, the application persists the large model files locally so that page reloads do not trigger redundant downloads (as long as the browser has not evicted the cache).
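
The execution and caching steps above can be sketched in TypeScript. This is a minimal illustration, not the project's code: `pickExecutionProviders`, `fetchModelBytes`, and the `ModelCache`/`GpuLike` interfaces are hypothetical names, and the interfaces are deliberately small structural subsets of the browser's `Cache` object and `navigator.gpu`, so real browser objects can be passed in while tests use in-memory stand-ins. The first function prefers WebGPU only when a GPU adapter is actually available and keeps ONNX Runtime Web's WebAssembly (`wasm`) backend as a fallback; the second serves the model from the cache when present and otherwise downloads it once and stores a copy.

```typescript
// Minimal sketch (hypothetical names): pick ONNX Runtime Web execution providers
// and load model bytes cache-first. The interfaces below are small structural
// subsets of the browser's Cache and navigator.gpu objects.

/** The two Cache methods this sketch relies on. */
interface ModelCache {
  match(request: string): Promise<Response | undefined>;
  put(request: string, response: Response): Promise<void>;
}

/** The part of navigator.gpu used for WebGPU feature detection. */
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

/**
 * Return execution providers in priority order: WebGPU first if an adapter
 * can be obtained, with WebAssembly ("wasm") kept as the CPU fallback.
 */
async function pickExecutionProviders(gpu?: GpuLike): Promise<string[]> {
  if (gpu) {
    try {
      // requestAdapter() resolves to null when no suitable GPU is available.
      const adapter = await gpu.requestAdapter();
      if (adapter) return ["webgpu", "wasm"];
    } catch {
      // Treat a failed adapter request like a missing GPU.
    }
  }
  return ["wasm"];
}

/**
 * Cache-first download: serve the stored copy when present; otherwise fetch
 * once, store a copy in the cache, and return the bytes.
 */
async function fetchModelBytes(
  url: string,
  cache: ModelCache,
  fetchFn: (url: string) => Promise<Response> = fetch,
): Promise<Uint8Array> {
  const cached = await cache.match(url);
  if (cached) return new Uint8Array(await cached.arrayBuffer());

  const response = await fetchFn(url);
  if (!response.ok) {
    // Never cache error responses; they would be served on every later load.
    throw new Error(`Download failed (${response.status}): ${url}`);
  }
  // A body can be read only once, so cache a clone and read the original.
  // Consuming both concurrently keeps the clone from queueing a full extra copy.
  const [, buffer] = await Promise.all([
    cache.put(url, response.clone()),
    response.arrayBuffer(),
  ]);
  return new Uint8Array(buffer);
}
```

In a browser, the cache would come from `await caches.open("model-cache-v1")` (the cache name is an arbitrary choice), and the two results could be passed to ONNX Runtime Web as `ort.InferenceSession.create(bytes, { executionProviders: providers })`. Cache Storage is best-effort, so a browser may still evict it under storage pressure; calling `navigator.storage.persist()` asks the browser to treat the site's storage as persistent.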

Key Learnings

Agentic Capability: Given clear research goals, the coding agent navigated complex dependency chains and converted between model formats effectively, suggesting that modern coding agents handle this kind of porting work well.
Client-Side Feasibility: Running substantial models in the browser is viable today, provided the user can tolerate the initial download size of the weights.
Understanding the Stack: Although the initial port was done through 'vibe coding,' follow-up deep dives with LLMs can explain the underlying mechanics, such as how torch.onnx.export maps PyTorch operators onto a portable ONNX graph, giving the developer a solid understanding of the artifact that was produced.
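
One way to make the initial download more tolerable is to show its progress, in line with the progress-bar suggestion from the workflow section. The following is a minimal TypeScript sketch, assuming the weights are fetched with the standard Fetch API; `readWithProgress` and `ProgressCallback` are hypothetical names. It reads the response body chunk by chunk through a `ReadableStream` reader and reports the bytes received against the `Content-Length` header when the server sends one.

```typescript
// Minimal sketch (hypothetical helper): stream a large download and report
// progress so the UI can show a progress bar instead of appearing frozen.

/** Called after each chunk; `total` is null when the size is unknown. */
type ProgressCallback = (loaded: number, total: number | null) => void;

async function readWithProgress(
  response: Response,
  onProgress: ProgressCallback,
): Promise<Uint8Array> {
  if (!response.body) throw new Error("Response has no body to stream");

  // Content-Length may be absent, or may count compressed bytes when the
  // response uses a Content-Encoding such as gzip.
  const declared = Number(response.headers.get("Content-Length") ?? NaN);
  const total = Number.isFinite(declared) && declared > 0 ? declared : null;

  const reader = response.body.getReader();
  const chunks: Uint8Array[] = [];
  let loaded = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    loaded += value.byteLength;
    onProgress(loaded, total);
  }

  // Concatenate the chunks into one contiguous buffer.
  const out = new Uint8Array(loaded);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

In a real page, `onProgress` would update a progress element, ideally throttled (for example with `requestAnimationFrame`) because a multi-gigabyte download produces many chunks. When the server compresses the response, `Content-Length` counts encoded bytes while the stream yields decoded bytes, so `loaded` can exceed `total`; clamping the displayed percentage at 100 avoids a confusing UI.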
