Gemini's Push into Agentic Browsing, Robotics, and Skill Evaluation

Chrome's Gemini Skills enable reusable multi-tab prompts (e.g., comparing products across tabs), Gemini Enterprise tests agent workspaces with human review, Robotics-ER 1.6 hits 93% gauge-reading accuracy on Spot, and Vantage uses executive LLMs to score human creativity and conflict resolution at 0.88 correlation with expert raters.

Browser-Level Agents via Reusable Skills and Enterprise Tabs

Chrome's new Gemini Skills (rolled out April 14, 2024 on Mac/Windows/ChromeOS, English-US only) let you save prompts as reusable workflows, triggered by a slash command or the + button and run instantly on the current tab or across multiple tabs. This eliminates retyping for tasks like ingredient analysis or spec comparison: open five product pages, trigger once, get a unified output. Skills can be edited and customized anytime; Google's pre-made library covers gift picking (budget/preferences) and document scanning. Safety gates confirm high-impact actions (e.g., email, calendar), backed by Chrome's red-teaming and auto-updates. Trade-off: browser-templated prompts democratize LangChain-style prompt libraries for non-devs but tie users to Gemini's ecosystem.
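The multi-tab pattern amounts to a saved prompt template spliced with each open tab's extracted text. A minimal sketch; the Skill class, template placeholder, and product strings are hypothetical illustrations, not Chrome's actual API:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    template: str  # reusable prompt with a {tabs} placeholder

    def build_prompt(self, tab_texts: list[str]) -> str:
        # Label each tab's extracted text, then splice the batch
        # into the saved template for a single model call.
        joined = "\n\n".join(
            f"[Tab {i + 1}]\n{text}" for i, text in enumerate(tab_texts)
        )
        return self.template.format(tabs=joined)

compare = Skill(
    name="compare-specs",
    template="Compare the products below and output a unified spec table.\n\n{tabs}",
)
prompt = compare.build_prompt(["Laptop A: 16GB RAM", "Laptop B: 32GB RAM"])
```

Saving the template once and re-triggering it is what removes the retyping: only the tab contents change between runs.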

Enterprise Gemini is testing an Agent tab with 'New Task' and 'Inbox' views for multi-step execution: a side panel tracks goals, agents, apps/files, and a human-review toggle. It mirrors Claude Projects: define a goal, grant tool access, let the agent execute autonomously. This signals full desktop agents, potentially via an upcoming AI Studio app, evolving chat into a workspace for persistent, tool-using workflows.
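The human-review toggle is essentially a gate in the agent's execution loop. A sketch under assumed semantics; the step names, HIGH_IMPACT set, and approval callback are hypothetical, not Gemini's actual interface:

```python
# High-impact actions that the safety gate holds for confirmation.
HIGH_IMPACT = {"send_email", "create_event"}

def run_agent(steps, human_review=True, approve=lambda step: True):
    """Execute steps autonomously; queue unapproved high-impact ones."""
    executed, pending = [], []
    for step in steps:
        if human_review and step in HIGH_IMPACT and not approve(step):
            pending.append(step)   # held for the Inbox / review queue
        else:
            executed.append(step)  # runs without interruption
    return executed, pending

# With review on and approval withheld, the email is queued, not sent.
done, held = run_agent(
    ["summarize_files", "send_email"], approve=lambda s: False
)
```

Flipping `human_review=False` would model the fully autonomous mode; keeping the gate on trades speed for auditability.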

NotebookLM Evolves into Visual Data Hub

A Canvas feature under testing turns sources into timelines, interactive pages, or visualizers, shifting NotebookLM from summaries to structured apps. Connectors pull in external data (Google's own ecosystem first), and autolabeling handles large datasets, fixing the navigation pain of multi-source analysis. Together these build a central research layer, enabling dynamic experiences from static uploads.

Robotics-ER 1.6 Enables Reliable Real-World Tasks

Paired with a VLA model (which handles direct robot control), ER 1.6 does the reasoning and planning: improved spatial skills (pointing, counting, object relations, pixel-accurate paths and constraints) reduce hallucinations (e.g., it correctly identifies hammers and scissors). Multi-view success detection handles occlusions and dynamic scenes, letting the robot retry or decide autonomously. New capability: instrument reading (gauges, meters, displays) via agentic vision, where the model zooms, analyzes, runs code, and applies domain knowledge. On Boston Dynamics' Spot, gauge-reading accuracy climbs from 23% (ER 1.5) to 67% (Gemini 3.0 Flash) to 86% (ER 1.6) and 93% with the agentic loop. This unlocks facility navigation and instrument interpretation without a human in the loop.
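The run-code step of that agentic loop is, at its core, unit conversion: after zooming, the model extracts the needle angle and dial range, then computes the reading. A sketch assuming a hypothetical gauge with a linear dial (all angles and ranges here are illustrative):

```python
def gauge_value(angle_deg, min_angle, max_angle, min_val, max_val):
    """Linearly interpolate a needle angle onto the dial's scale."""
    frac = (angle_deg - min_angle) / (max_angle - min_angle)
    return min_val + frac * (max_val - min_val)

# A needle at 90 deg on a 0-100 PSI dial sweeping from 225 deg down
# to -45 deg sits exactly halfway around the arc: 50 PSI.
psi = gauge_value(90, min_angle=225, max_angle=-45, min_val=0, max_val=100)
```

Delegating this arithmetic to executed code rather than to the vision model's direct guess is what the agentic variant adds, and plausibly where the 86% to 93% gain comes from.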

Vantage: LLMs Score 'Durable' Human Skills Accurately

An executive LLM steers AI personas to probe collaboration, creativity, and critical thinking (e.g., injecting conflict to test resolution skills), outperforming independent agents. Across 188 participants and 373 conversations, it elicited project-management evidence 92.4% of the time and conflict-resolution evidence 85%; its scoring matches human raters (Cohen's kappa 0.45-0.64). On creativity across 180 student works: 0.88 Pearson correlation with expert ratings. It can also simulate skill levels before human studies (lower error, matching real response patterns). Outputs include skill maps linking scores to conversation snippets for interpretability, scaling assessment beyond knowledge tests.
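The two agreement statistics reported (Cohen's kappa for categorical rubric scores, Pearson r for continuous creativity ratings) can be computed directly; the sample scores below are illustrative, not Vantage's data:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Agreement between two raters, corrected for chance."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / n**2       # chance agreement
    return (p_o - p_e) / (1 - p_e)

def pearson_r(x, y):
    """Linear correlation between two continuous score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

Kappa of 0.45-0.64 is conventionally read as moderate-to-substantial agreement; a Pearson r of 0.88 on creativity is close to typical expert-to-expert reliability.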

Summarized by x-ai/grok-4.1-fast via openrouter

6619 input / 1978 output tokens in 15418ms

© 2026 Edge