#ai-llms
Every summary, in chronological order.
SpatialClaw: Using Code as an Action Interface for Spatial Reasoning
SpatialClaw is a training-free agent framework that improves spatial reasoning in VLMs by treating Python code—rather than structured tool calls—as the primary interface for perception and geometric tasks.
The New Software Lifecycle: From Vibe Coding to Agentic Engineering
AI has shifted the software development bottleneck from implementation to specification and verification. Success now depends on 'harness engineering'—the 90% of an agent's architecture that isn't the model—and treating context management as a versioned, architectural decision.
Toten: Ontological Tokenization for Technical Portuguese
Toten is a knowledge-based tokenization framework designed to accurately parse physical quantities and technical notation in Brazilian Portuguese, addressing common failures in standard NLP tokenizers.
The Symbiotic Evolution of AI and Software Engineering
The intersection of AI and Software Engineering (AI4SE and SE4AI) has matured over the last decade, shifting from experimental research to essential production-grade methodologies for building, testing, and maintaining complex systems.
Configurable Clinical Information Extraction with Agentic RAG
Agentic RAG systems for clinical data require modular configuration to balance precision and recall, as monolithic pipelines often fail to handle the high variability of medical documentation.
The Production AI Playbook: Deploying Agents at Enterprise Scale
Moving AI from demo to production requires shifting focus from model selection to five pillars: evaluation, observability, data foundation, orchestration, and governance.
AI Engineer
Skill-Guided Continuation Distillation for GUI Agents
The paper introduces a method to improve GUI agent performance by distilling complex task trajectories into modular, skill-based sub-tasks, enhancing generalization and execution reliability.
High-Leverage Python Skills for the Next Decade
Focus on foundational engineering skills like distributed systems, performance optimization, and AI integration to ensure your Python expertise compounds in value over the next ten years.
Predicting AI Model Behavior via Deployment Simulation
OpenAI uses 'Deployment Simulation'—replaying real, de-identified user conversations with new models—to predict safety risks and undesired behaviors before public release, outperforming traditional synthetic evaluations.
Verbal Reinforcement Learning: Closing the Feedback Loop
The paper introduces a framework for 'Verbal Reinforcement Learning' (VRL), shifting from raw reward signals to structured insight governance by extracting and managing verbal feedback from world interactions.
Improving Agentic Search via Diverse Query Initialization
The paper proposes moving beyond simple parallel sampling in agentic search by implementing diverse query initialization, which improves retrieval performance by covering a broader semantic space.
Qwen-RobotSuite: Three Foundation Models for Embodied AI
The Qwen team has released a suite of three specialized foundation models—RobotManip, RobotWorld, and RobotNav—designed to address data fragmentation in robotics through unified action representations, language-conditioned world modeling, and scalable navigation interfaces.
Visual-Seeker: Active Visual Reasoning for Multimodal Agents
Visual-Seeker introduces a visual-native agentic search framework that moves beyond text-based retrieval by employing active visual reasoning to navigate and interpret complex multimodal environments.
Verifiable Agentic Data Science via Tool-Grounded Reasoning
To handle complex, irregular Time-Series Question Answering (TSQA) tasks, agents must move beyond pure generation toward tool-grounded reasoning that enforces verifiable, step-by-step execution.
Cognitive Debt: The Hidden Fragility of AI-Augmented Systems
The paper introduces 'Cognitive Debt' as a framework to explain how AI-driven intellectual leverage creates systemic fragility by offloading critical reasoning to models, leading to a loss of human oversight and domain expertise.
Scaling Agentic Search with Dynamic Workspace Expansion
DR-DCI improves agentic search by combining retriever-based scalability with local terminal-style operations, allowing agents to dynamically pull documents into a workspace for precise analysis.
Building Dynamic Experiences with GenUI and Agentic Workflows
GenUI (Agent-to-UI) enables applications to generate custom user interfaces on-demand using Gemini, allowing for real-time personalization that goes beyond static design.
Google Cloud Tech
Hybrid Open-Ended Tri-Evolution for Deep Research Agents
The paper introduces a 'Hybrid Open-Ended Tri-Evolution' framework to improve the performance of deep research AI agents by optimizing their exploration and reasoning capabilities.
Orchestra-o1: A Framework for Omnimodal Agent Orchestration
Orchestra-o1 introduces a specialized architecture for coordinating omnimodal AI agents, enabling them to process and act across diverse data modalities in complex, multi-step tasks.
Hands-On Guide to FineWeb Corpus Processing and Analytics
Learn to stream, filter, deduplicate, and analyze large-scale web datasets like FineWeb using Python, MinHash, and tiktoken to prepare high-quality data for LLM training.
Building Functional Personas with AI for User-Centric Decisions
Move beyond static, demographic-heavy personas by using AI to synthesize research into 'functional' personas focused on user goals, tasks, and objections, then making them interactive via custom chatbots.
The Shift to MANGOS: AI Labs and Deeptech Dominate Public Markets
The public market landscape is shifting from consumer social giants (FAANG) to AI labs and deeptech (MANGOS), with SpaceX's historic IPO triggering a ripple effect of capital and business model emulation across the startup ecosystem.
Formalizing Theory of Mind for AI Agents
The article proposes a formal mathematical specification for a 'Theory of Mind' (ToM) mechanism, enabling AI agents to model and predict the mental states of other agents to improve collaborative decision-making.
Arbor: Enhancing Agent Cognition via Tree Search
Arbor introduces a tree search-based cognition layer for autonomous agents, enabling more robust decision-making by systematically exploring action paths rather than relying solely on single-step inference.
Avataar AI's Varya: A Low-Cost, Culturally Aware Video Model
Avataar AI has launched Varya, a distilled, high-speed video generation model optimized for the Indian market, offering a 20x price reduction compared to global competitors by focusing on efficiency and cultural relevance.
Sustainable AI Development: Balancing Infinite Scaling with Human Limits
To avoid burnout in the era of AI-driven coding, developers must shift from manual execution to an 'agent-orchestrator' model that uses verification gates, voice-first workflows, and remote control to maintain productivity while reclaiming personal time.
AI Engineer
OpenAI's Multi-Layered Approach to AI Content Provenance
OpenAI is adopting the EU Code of Practice on Transparency of AI-Generated Content, utilizing a multi-layered strategy that combines C2PA metadata, watermarking, and public verification tools to improve digital content transparency.
Recursive Reasoning for Theory of Mind in AI
The paper proposes that improving AI's Theory of Mind requires recursive perspective-taking, in which models explicitly represent the mental states of others rather than relying on static pattern matching.
Securing Continuous Data Summarization Against Adversarial Attacks
This paper addresses vulnerabilities in continuous data summarization systems by identifying multi-target adversarial attack vectors and proposing robust defense mechanisms to ensure AI trustworthiness.
Hierarchical Memory Navigation for Efficient AI Agents
The paper introduces a hierarchical memory structure that improves agent efficiency by organizing information before retrieval, moving beyond simple flat vector search.