The Shift from Chatbots to Autonomous Research Agents

Sakana Marlin marks a departure from standard conversational AI assistants. Designed as a "Virtual Chief Strategy Officer," Marlin is built for high-stakes enterprise tasks like competitive analysis, risk assessment, and strategy formulation. Unlike chat-based tools that prioritize low latency, Marlin operates autonomously for up to eight hours per request, issuing hundreds to thousands of LLM queries to produce a comprehensive, cited report of 60 to 100 pages, accompanied by a presentation slide deck.

AB-MCTS: Scaling Inference-Time Compute

The engine powering Marlin is Adaptive Branching Monte Carlo Tree Search (AB-MCTS). This approach treats research as a tree-search problem in which the agent repeatedly decides whether to expand the search with a new candidate (going "wider") or to refine an existing finding (going "deeper").

Key technical features include:

  • Multi-LLM Routing: The system can route specific steps to different models (e.g., o4-mini, Gemini 2.5 Pro, DeepSeek-R1) to optimize for task-specific performance.
  • Workflow Automation: Leveraging techniques from Sakana’s "AI Scientist" project, the agent handles the entire lifecycle of hypothesis planning, source browsing, and verification.
  • Open-Source Implementation: Sakana has released the core algorithm as TreeQuest under the Apache 2.0 license, allowing developers to implement their own search-based agent workflows.
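
The wider-versus-deeper decision described above can be illustrated with a small toy search. This is a simplified sketch, not Sakana's implementation and not the TreeQuest API: the names Node, step, toy_generate, and search are hypothetical, scores are assumed to lie in [0, 1], and the choice between arms uses Thompson sampling over Beta posteriors as a stand-in for the published algorithm. A real agent would replace toy_generate with an LLM call plus a scoring step.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    answer: str
    score: float  # assumed to be normalized to [0, 1]
    children: list[Node] = field(default_factory=list)


def all_nodes(node: Node):
    """Yield a node and every descendant below it."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def sample_beta(scores: list[float], rng: random.Random) -> float:
    """Draw from a Beta posterior built from observed scores (uniform prior)."""
    a = 1.0 + sum(scores)
    b = 1.0 + sum(1.0 - s for s in scores)
    return rng.betavariate(a, b)


def step(root: Node, generate, rng: random.Random) -> Node:
    """Run one search step: walk down the tree, then add exactly one new node."""
    node = root
    while True:
        # "Wider" arm: how promising is generating a fresh child here?
        widen = sample_beta([c.score for c in node.children], rng)
        # "Deeper" arms: how promising is each existing child's subtree?
        deepen = [
            (sample_beta([n.score for n in all_nodes(c)], rng), c)
            for c in node.children
        ]
        best = max(deepen, key=lambda pair: pair[0], default=None)
        if best is None or widen >= best[0]:
            child = generate(node, rng)  # go wider at this node
            node.children.append(child)
            return child
        node = best[1]  # go deeper into the sampled-best child


def toy_generate(parent: Node, rng: random.Random) -> Node:
    """Hypothetical stand-in for an LLM call: perturb the parent's score."""
    score = min(1.0, max(0.0, parent.score + rng.uniform(-0.1, 0.3)))
    return Node(answer=parent.answer + "+", score=score)


def search(budget: int, seed: int = 0) -> tuple[Node, Node]:
    """Spend `budget` generation calls and return (root, best node found)."""
    rng = random.Random(seed)
    root = Node(answer="root", score=0.0)
    for _ in range(budget):
        step(root, toy_generate, rng)
    best = max((n for n in all_nodes(root) if n is not root), key=lambda n: n.score)
    return root, best


if __name__ == "__main__":
    _, best = search(budget=40)
    print(f"best score after 40 calls: {best.score:.2f}")
```

Each call to step spends one unit of the generation budget, so the total budget, rather than a fixed branching factor, determines how much inference-time compute the search consumes.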

Trade-offs and Operational Model

Marlin is explicitly designed for depth over speed. While competing products such as OpenAI's and Perplexity's Deep Research tools return answers in minutes, Marlin’s extended runtime is intended to facilitate rigorous hypothesis testing.

  • Pricing: The service uses a credit-based system. Pay-as-you-go starts at 100 credits per run, with tiered subscriptions (Pro and Team) offering monthly credit bundles.
  • Reliability: Because sessions are long-running, the system includes checkpointing to mitigate the impact of API errors during the research process.
  • Human-in-the-loop: Despite its autonomy, the agent is designed to provide a structured deliverable that requires human review before final strategic decisions are made.
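
How Marlin implements checkpointing is not described here, so the following is only a generic sketch of the idea: persist progress after every completed step so that a failed API call costs one step rather than the whole session. The function names, the JSON state layout, and the do_step callback are all hypothetical.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "findings": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temporary file first, then swap it into place."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)  # avoids leaving a half-written checkpoint


def run_research(steps: list[str], path: Path, do_step) -> list[str]:
    """Run the remaining steps, checkpointing after each one completes."""
    state = load_checkpoint(path)
    for i in range(state["next_step"], len(steps)):
        result = do_step(steps[i])  # may raise, e.g. on an API error
        state["findings"].append(result)
        state["next_step"] = i + 1
        save_checkpoint(path, state)
    return state["findings"]
```

If do_step raises partway through, calling run_research again with the same checkpoint path resumes at the failed step instead of repeating the completed ones.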