SurfAgent: Browser Automation for AI Agents Without APIs

Install SurfAgent via npm to let AI agents control Chrome on logged-in sites such as Discord, X, and Google Sheets. It works from a recon map of each page rather than from site APIs, and it is fully open-source.

Recon Mapping Enables Fast, Adaptive Browser Control

SurfAgent uses the Chrome DevTools Protocol (CDP) to drive a visible (non-headless) browser, so it needs a machine with a GUI, such as a Mac Mini. Install it globally with npm i -g surf-agent, then run surf-agent start to launch a controllable instance. The key technique is the 'recon' command, which scans the page and maps its elements (buttons, inputs, channels) up front, so agents can refer to them in natural language, such as "general chat" or "search field." This shortens navigation: instead of relying on brittle selectors, the agent queries the map and adapts when a dynamic site changes. For example, on Hacker News, recon identifies top posts such as "DaVinci Resolve," and the agent can click into post #10 (a DuckDB article) without hardcoded paths.
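The idea of resolving a natural-language name against a recon map can be sketched as follows. This is a minimal illustration, not SurfAgent's actual data model or API: the Element class, the resolve function, the selectors, and the similarity threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Element:
    """One entry in a hypothetical recon map (not SurfAgent's real schema)."""
    label: str     # human-readable name captured during recon
    role: str      # e.g. "button", "input", "link"
    selector: str  # handle the automation layer would use to act on it


def resolve(recon_map: list[Element], query: str, threshold: float = 0.6) -> Optional[Element]:
    """Return the mapped element whose label best matches the query, or None."""
    def score(el: Element) -> float:
        return SequenceMatcher(None, query.lower(), el.label.lower()).ratio()

    best = max(recon_map, key=score, default=None)
    if best is None or score(best) < threshold:
        return None  # nothing on the page is close enough; re-run recon
    return best


# Hypothetical recon output for a chat page.
page_map = [
    Element("general chat", "link", "#channel-general"),
    Element("search field", "input", "input[name=q]"),
    Element("send message", "button", "button.send"),
]

target = resolve(page_map, "General Chat")
print(target.selector if target else "not found")
```

Returning None on a weak match, rather than the nearest guess, gives the agent a clear signal to rescan the page when the layout has changed.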

Agents build context by scraping visible content: the last 200 Discord messages in #general (e.g., scam discussions, AI music), X timelines, or YouTube transcripts. That context can then be fed to RAG or summarization, for example summarizing a video transcript about Claude 3.5 Sonnet's zero-day vulnerability exploits.
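Turning scraped messages into a bounded context block might look like the sketch below. The build_context function, the (author, text) tuple shape, and the character budget are assumptions for illustration; the source only states that roughly the last 200 messages are collected.

```python
from collections import deque


def build_context(messages, limit=200, max_chars=4000):
    """Keep the newest `limit` messages, then trim oldest-first to fit max_chars.

    `messages` is an iterable of (author, text) tuples in chronological order.
    """
    recent = deque(messages, maxlen=limit)  # deque discards the oldest entries
    lines = [f"{author}: {text}" for author, text in recent]
    # Each line costs its length plus one newline character.
    while lines and sum(len(line) + 1 for line in lines) > max_chars:
        lines.pop(0)  # drop the oldest remaining line
    return "\n".join(lines)


# Hypothetical scraped messages.
scraped = [(f"user{i}", f"message {i}") for i in range(250)]
context = build_context(scraped, limit=200, max_chars=100_000)
print(context.splitlines()[0])
```

Trimming from the oldest end keeps the most recent conversation intact when the context has to fit a model's input budget.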

Automate Research and Data Entry Across Logged-In Apps

SurfAgent skips APIs by reusing the logins already active in the browser. On Discord (the Bossy server), it recons the channel list and fetches #general context autonomously. On X.com, it searches "Claude Mithos," switches to the Latest tab, maps users and posts, then drafts and posts short content such as a creative note on the model. On YouTube, it runs a search, plays a video, scrolls to "Show transcript," and extracts the full text for analysis.

For data tasks, chain recon with actions: research API prices (Claude 3.5 Sonnet/Opus, GPT-4o, Gemini 1.5 Pro/Flash), visit the provider sites (Anthropic, OpenAI, Google), scrape the rates, navigate to a pre-opened Google Sheet, and populate rows (columns: Model, Input and Output price per million tokens). SurfAgent learns Sheets operations such as =A1 formulas via recon, then inserts the data and generates charts (e.g., a pricing bar graph, noting the missing Gemini 3.1 Pro output rate). It handles scrolling, errors (e.g., page reloads), and multi-step flows autonomously.
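The row layout and the per-million-token arithmetic can be sketched as below. The to_sheet_rows and request_cost helpers are hypothetical, and the prices are placeholder numbers rather than real provider rates; a missing rate is left as an empty cell, mirroring the missing output rate noted above.

```python
import csv
import io
from typing import Optional

# model -> (input rate, output rate) in USD per million tokens
Prices = dict[str, tuple[Optional[float], Optional[float]]]


def to_sheet_rows(prices: Prices) -> str:
    """Render tab-separated rows that paste cleanly into a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Model", "Input $/1M tokens", "Output $/1M tokens"])
    for model, (inp, out) in prices.items():
        # Leave the cell empty when a rate could not be scraped.
        writer.writerow([model, "" if inp is None else inp, "" if out is None else out])
    return buf.getvalue()


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD of one request, given rates per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Placeholder values only; these are NOT real provider prices.
placeholder_prices: Prices = {
    "model-a": (1.0, 2.0),
    "model-b": (0.5, None),  # output rate missing from the scraped page
}

print(to_sheet_rows(placeholder_prices))
print(request_cost(10_000, 2_000, 1.0, 2.0))
```

Keeping missing rates as empty cells, instead of zero, prevents a chart from showing an unknown price as free.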

Trade-offs and Extension Path

SurfAgent is not headless: it needs GUI browser access, which limits serverless deployment but allows authenticated sessions without OAuth. It is open-source on GitHub (links in the video description) and can be extended via PRs for QA issues, new sites (custom Discord tools have already been added), or pipelines. It pairs with free tools like Freebuf (npm i freebuf, at freebuf.com), a no-subscription coding agent for tasks like FFmpeg silence removal (cutting a 5-minute MP4 to 2:20). Use SurfAgent for passive income pipelines: recon → research → Sheets/stats → post automation.
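A pipeline of this shape, with a retry when a step fails (for instance after a page reload), could be sketched as below. The run_pipeline function and the stage callables are hypothetical stand-ins, not part of SurfAgent; real stages would drive the browser.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(stages: list[tuple[str, Stage]], payload: Any = None, retries: int = 2) -> Any:
    """Run stages in order, passing each result to the next stage.

    A stage that raises is retried up to `retries` extra times before the
    whole pipeline fails with the name of the offending stage.
    """
    for name, stage in stages:
        for attempt in range(retries + 1):
            try:
                payload = stage(payload)
                break  # stage succeeded; move on to the next one
            except Exception as exc:
                if attempt == retries:
                    raise RuntimeError(
                        f"stage {name!r} failed after {retries + 1} attempts"
                    ) from exc
    return payload


# Hypothetical stages standing in for recon, research, and posting.
result = run_pipeline([
    ("recon", lambda _: ["post A", "post B"]),
    ("research", lambda posts: {p: len(p) for p in posts}),
    ("post", lambda stats: f"{len(stats)} items summarized"),
])
print(result)
```

Naming each stage makes a failure report point at the step that broke, which matters when a multi-step flow runs unattended.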

Summarized by x-ai/grok-4.1-fast via openrouter

© 2026 Edge