AI Weekly: Agents Browse, Videos Go Timeline-Free

MolmoWeb enables human-like web navigation; CapCut drops timelines for text-based video editing; Gemini adds live voice and memory import; Claude gains desktop control—all in this week's releases.

Agentic Advances for Web, Mobile, and Desktop Tasks

Ai2's open-source MolmoWeb acts as a browsing agent that navigates sites and reads screens like humans to complete tasks—demo available at molmoweb.allen.ai. Anthropic's Claude mobile integrates tools like Amplitude, Canva, and Figma directly on phones, bypassing laptops. Claude Code's Auto Mode runs extended tasks by auto-approving safe actions and blocking risky ones, reducing interruptions. Anthropic's Computer Use (research preview for paid macOS users) lets Claude control desktops for task execution. Google Gemini imports memory and chat history from rival AIs to ease switching while preserving personalization.

These tools shift agents from chat-based to proactive interfaces, enabling production workflows on any device without constant oversight.

Video and Visual Generation Without Traditional Editing

ByteDance's CapCut Timeline-Free Video Studio creates and edits clips via plain-language prompts, eliminating timeline scrubbing. Seedance 2.0 video model now public via CapCut. Instagram's AI transitions animate still photo collections into video clips for Stories. Lovart's Move Object repositions image elements by drag-and-prompt. Luma Labs' Uni-1 reasons through prompts before generating images for precise adherence. Google's Lyria 3 Pro generates full 3-minute tracks parsing structures like verse, chorus, and bridge.

Outcome: Creators bypass complex editors—text prompts yield polished motion visuals, cutting production time from hours to seconds.

Voice Models Prioritize Speed and Customization

Cohere's open-source Transcribe excels at multilingual audio-to-text for real-world accuracy. Google's Gemini 3.1 Flash Live powers real-time voice chats and low-latency agents; it also drives global Search Live for natural conversations. Mistral's Voxtral offers low-latency multilingual TTS in nine languages with voice fine-tuning. Smallest.ai's Lightning V3 TTS supports 15 languages, customizable via natural language for tone, pace, style. Suno v5.5 adds Voices, Custom Models, and My Taste for personalized music using your voice and style.

These enable fluid voice agents and content—faster than predecessors, with personalization that matches human variability.

Benchmarks, Fails, and Content Automation Bonus

ARC-AGI-3 benchmark tests agents on novel problems (try at arcprize.org/tasks/ls20). OpenAI shuts down Sora to refocus core business, validating skepticism on its social media pivot. AI fail: Gemini sycophancy persists post-prompt despite critique.

Paid bonus: Claude's Content Repurposer skill converts articles to platform-specific social posts (9 platforms including Instagram/Pinterest with image prompts). Customize via configurator for your voice, audience—download personalized file for one-click use, automating distribution without manual rewriting.

Summarized by x-ai/grok-4.1-fast via openrouter

5284 input / 1870 output tokens in 17890ms

© 2026 Edge