The Architecture of Oasis 3
Oasis 3 is an auto-regressive world model designed to generate photorealistic, multi-camera driving environments in real time. It generates one frame at a time, conditioning each new frame on the frames that came before it, which is what makes the simulation interactive. Its developer, Decart, uses its proprietary 'Decart Optimization Stack' (DOS) to run the model efficiently across Nvidia, Amazon, and Google hardware. This vertical integration allows the company to price API access at $0.02 per second (equivalent to $1.20 per minute, or $72 per hour), positioning the API as a cost-effective alternative to competitors.
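The frame-by-frame loop described above can be illustrated with a minimal sketch. This is not Decart's implementation or API: `rollout`, `toy_predictor`, the list-of-floats `Frame` type, and the `max_context_frames` parameter are all hypothetical names chosen for illustration, and the toy predictor stands in for the real model.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

# Stand-in for an image; a real frame would be a large tensor.
Frame = List[float]


def rollout(
    predict_next: Callable[[Sequence[Frame], float], Frame],
    seed_frame: Frame,
    actions: Sequence[float],
    max_context_frames: int,
) -> List[Frame]:
    """Generate one frame per action, feeding earlier frames back in as context."""
    # A bounded deque models a finite context window (an assumption of this sketch).
    context: Deque[Frame] = deque([seed_frame], maxlen=max_context_frames)
    generated: List[Frame] = []
    for action in actions:
        # Each new frame depends on the retained history and the user's input.
        frame = predict_next(tuple(context), action)
        generated.append(frame)
        # Once the window is full, appending drops the oldest frame.
        context.append(frame)
    return generated


def toy_predictor(context: Sequence[Frame], action: float) -> Frame:
    """Hypothetical stand-in for the model: shift the latest frame by the action."""
    return [value + action for value in context[-1]]


frames = rollout(
    toy_predictor,
    seed_frame=[0.0, 0.0],
    actions=[1.0, 1.0, -0.5],
    max_context_frames=2,
)
print(frames)
```

Because every frame is produced from the frames before it, anything that drops out of the retained history can no longer influence later output.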
Current Limitations and Research Challenges
Despite its photorealistic output, the model faces significant hurdles in maintaining long-term coherence and physical accuracy:
- Memory Constraints: Because each frame consumes approximately 8,000 tokens, the context window fills rapidly during continuous generation. The result is a 'dream-like' degradation in which environments lose thematic integrity and the original locations disappear.
- Physics Inconsistency: The model currently fails to simulate solid-body physics, often allowing vehicles to pass through one another. Decart attributes this to a data imbalance, noting that training sets contain significantly more examples of 'good driving' than accident scenarios.
- Control Responsiveness: Users often experience difficulty maintaining precise control over the simulated vehicle, a common issue in current world model architectures.
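To make the memory constraint concrete, the following back-of-the-envelope sketch estimates how quickly a context window fills at roughly 8,000 tokens per frame. Only the per-frame figure comes from the text; the context size and frame rate below are hypothetical values picked for illustration, not published specifications of Oasis 3.

```python
TOKENS_PER_FRAME = 8_000  # approximate figure quoted in the text

# Hypothetical values for illustration only; the source does not state them.
ASSUMED_CONTEXT_TOKENS = 1_000_000
ASSUMED_FRAMES_PER_SECOND = 20


def frames_in_context(context_tokens: int, tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Number of whole frames that fit in the context window."""
    return context_tokens // tokens_per_frame


def seconds_until_full(
    context_tokens: int,
    frames_per_second: float,
    tokens_per_frame: int = TOKENS_PER_FRAME,
) -> float:
    """Seconds of continuous generation before the window is full."""
    return frames_in_context(context_tokens, tokens_per_frame) / frames_per_second


capacity = frames_in_context(ASSUMED_CONTEXT_TOKENS)
duration = seconds_until_full(ASSUMED_CONTEXT_TOKENS, ASSUMED_FRAMES_PER_SECOND)
print(f"{capacity} frames fit, filling the window after {duration:.2f} s")
```

Under these assumed numbers the window holds 125 frames, or 6.25 seconds of footage. The real figures may differ substantially, but the sketch shows why a cost of thousands of tokens per frame exhausts memory quickly.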
Strategy: Building a Developer Ecosystem
Decart is explicitly modeling its go-to-market strategy after OpenAI’s early API-first approach. By providing immediate API access rather than keeping the technology behind research previews, the company aims to foster a developer community that discovers novel use cases for world models. While initially targeting autonomous vehicle companies for edge-case testing, Decart intends to expand into broader robotics and physical AI applications. Future iterations are expected to ease the memory constraints in two ways: by letting users seed simulations with video input rather than static images, and by compressing memory into fewer tokens, an approach Decart is still researching.