№ 02 / SUMMARIES

arXiv cs.AI

Every summary, in reverse chronological order.

Source · arXiv cs.AI
DAY 01 · Today, MAY 25 · 2026 · 8 SUMMARIES
arXiv cs.AI · AI & LLMs

Parallel Context Compaction for Long-Horizon LLM Agent Serving

The paper proposes parallel context compaction to make serving long-horizon LLM agents more efficient, reducing the computational overhead of maintaining very large context windows over extended agent interactions.
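
The general idea of compacting context in parallel can be sketched as follows. The details are assumptions, not the paper's design. Older turns are split into independent chunks, each chunk is compacted concurrently, and the most recent turns are kept verbatim. `compact_chunk`, `parallel_compact`, `keep_recent`, and `chunk_size` are hypothetical names. The "compactor" simply keeps each message's first sentence so that the example runs without a model.

```python
from concurrent.futures import ThreadPoolExecutor


def compact_chunk(chunk: list[str]) -> str:
    # Hypothetical stand-in for an LLM summarization call: keep only the first
    # sentence of each message so the example runs without a model.
    return " ".join(msg.split(".")[0].strip() + "." for msg in chunk)


def parallel_compact(history: list[str], keep_recent: int = 2,
                     chunk_size: int = 3, workers: int = 4) -> list[str]:
    """Compact older turns chunk by chunk in parallel; keep recent turns verbatim."""
    split = max(0, len(history) - keep_recent)
    older, recent = history[:split], history[split:]
    chunks = [older[i:i + chunk_size] for i in range(0, len(older), chunk_size)]
    if not chunks:
        return list(recent)
    # Chunks are independent, so their compaction calls can run concurrently;
    # pool.map returns results in input order, preserving conversation order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(compact_chunk, chunks))
    return summaries + recent
```

In a real server, `compact_chunk` would be a model call. Those calls dominate latency, which is why running them concurrently is the point of the sketch.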

arXiv cs.AI · AI & LLMs

DART: Improving Agent Reliability via Semantic Recoverability

DART (Dynamic Agent Recovery Technique) introduces a framework for structured tool agents to detect and recover from execution failures by leveraging semantic feedback loops, significantly reducing task abandonment.

arXiv cs.AI · AI & LLMs

Foundation Protocol: Coordination for Agentic Systems

The Foundation Protocol proposes a standardized coordination layer designed to enable interoperability, trust, and resource allocation between autonomous AI agents in a decentralized society.

arXiv cs.AI · AI & LLMs

Inductive Deductive Synthesis for Formally Verified AI Systems

Inductive Deductive Synthesis (IDS) combines inductive AI generation with deductive formal verification to ensure that AI-generated code is provably correct with respect to its formal specification.
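
Pairing inductive generation with deductive checking resembles the classic counterexample-guided inductive synthesis (CEGIS) loop. The sketch below uses that textbook loop as a stand-in; it is not the paper's IDS pipeline. The inductive proposer searches a tiny space of linear programs `a*x + b`. The deductive verifier is an exhaustive check over a bounded integer domain rather than a formal proof. `spec`, `propose`, `verify`, and `cegis` are hypothetical names.

```python
from itertools import product
from typing import Optional

Candidate = tuple[int, int]  # (a, b) encodes the program f(x) = a*x + b


def run(c: Candidate, x: int) -> int:
    a, b = c
    return a * x + b


def spec(x: int, y: int) -> bool:
    # Hypothetical specification the synthesized program must satisfy.
    return y == 3 * x - 7


def propose(examples: list[int]) -> Optional[Candidate]:
    """Inductive step: first candidate consistent with every counterexample so far."""
    for c in product(range(-10, 11), repeat=2):
        if all(spec(x, run(c, x)) for x in examples):
            return c
    return None


def verify(c: Candidate, domain: range) -> Optional[int]:
    """Deductive step (stand-in): check a bounded domain and return a counterexample."""
    for x in domain:
        if not spec(x, run(c, x)):
            return x
    return None


def cegis(domain: range = range(-100, 101), max_iters: int = 50) -> Optional[Candidate]:
    examples: list[int] = []
    for _ in range(max_iters):
        c = propose(examples)
        if c is None:
            return None  # nothing in the search space fits the counterexamples
        cex = verify(c, domain)
        if cex is None:
            return c  # accepted: no counterexample anywhere in the domain
        examples.append(cex)  # feed the failure back to the inductive step
    return None
```

Here the first guess fails at `x = -100`. That single counterexample is enough to pin the proposer to `(3, -7)`, which the verifier then accepts.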

arXiv cs.AI · AI & LLMs

GENSTRAT: A Framework for Strategic Reasoning in LLMs

GENSTRAT provides a structured approach to evaluating and improving how LLMs perform in strategic, multi-agent environments, moving beyond simple pattern matching toward formal strategic reasoning.

arXiv cs.AI · AI & LLMs

EVE-Agent: Improving Self-Evolving Agents with Evidence Verification

EVE-Agent improves self-evolving search agents by requiring them to provide verifiable evidence for their answers, ensuring training data is grounded and auditable without human labels.
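
The summary does not describe how EVE-Agent verifies evidence. The filter below is therefore only a hypothetical proxy showing how "verifiable evidence" can gate self-generated training data without human labels. A sample is kept only if two conditions hold. First, its quoted evidence appears verbatim in the retrieved source. Second, its answer is stated in that quote. `Sample`, `is_grounded`, and `filter_training_data` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    answer: str
    evidence: str    # span the agent quotes to support its answer
    source_doc: str  # text of the document the agent retrieved


def _norm(text: str) -> str:
    # Case- and whitespace-insensitive comparison.
    return " ".join(text.lower().split())


def is_grounded(s: Sample) -> bool:
    """Hypothetical check: evidence must be in the source and must state the answer."""
    quote, doc, ans = _norm(s.evidence), _norm(s.source_doc), _norm(s.answer)
    return bool(quote) and bool(ans) and quote in doc and ans in quote


def filter_training_data(samples: list[Sample]) -> list[Sample]:
    # Only grounded, auditable samples are kept for the next training round.
    return [s for s in samples if is_grounded(s)]
```

Exact string matching is deliberately strict. A real system would more likely use an entailment model, but the gating logic would stay the same.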

arXiv cs.AI · AI & LLMs

Energy per Successful Goal: A New Metric for Agentic AI Efficiency

The paper introduces 'Energy per Successful Goal' (ESG) as a critical metric for evaluating AI agent efficiency, shifting focus from raw compute costs to the energy required to complete specific, actionable objectives.
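
Read literally, the metric's name suggests a simple ratio: total energy consumed divided by the number of goals actually achieved. The sketch below assumes that reading. It also assumes that energy spent on failed attempts is charged to the successful ones and that energy is measured in joules per episode. Neither detail is confirmed by the summary. `Episode` and `energy_per_successful_goal` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    energy_joules: float  # measured energy for the whole episode (LLM and tool calls)
    succeeded: bool       # whether the episode reached its goal


def energy_per_successful_goal(episodes: list[Episode]) -> float:
    """Total energy, including failed attempts, divided by goals achieved."""
    successes = sum(ep.succeeded for ep in episodes)
    if successes == 0:
        return float("inf")  # energy was spent but no goal was reached
    return sum(ep.energy_joules for ep in episodes) / successes
```

For example, episodes using 1200 J (success), 800 J (failure), and 1000 J (success) give 3000 J / 2 = 1500 J per successful goal. An agent that fails often is penalized even if each individual call is cheap.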

arXiv cs.AI · AI & LLMs

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

BOHM introduces a method for hierarchically attributing the performance of compound AI systems to their components, without the computational overhead of traditional evaluation methods.

DAY 02 · Friday, MAY 22 · 2026 · 14 SUMMARIES
arXiv cs.AI · AI & LLMs

Governance by Construction for Generalist Agents

The paper proposes 'Governance by Construction' as a paradigm for AI safety, shifting from post-hoc monitoring to embedding constraints directly into the agent's architecture and execution environment.

arXiv cs.AI · AI & LLMs

Conflict-Aware Additive Guidance for Flow Models

This paper introduces a method to manage conflicting compositional rewards in flow-based generative models by dynamically adjusting guidance to prevent performance degradation.
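
One well-known way to combine several reward-guidance vectors while handling conflicts is PCGrad-style "gradient surgery". When two vectors point against each other (negative dot product), the conflicting component is projected away before summing. The sketch below swaps in that technique for illustration; it is not the paper's method. It assumes each reward contributes a guidance vector that is added to the flow model's velocity, and it projects in a fixed order rather than PCGrad's random order. `combine_guidance` is a hypothetical name.

```python
def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def combine_guidance(grads: list[list[float]]) -> list[float]:
    """Sum per-reward guidance vectors after projecting away pairwise conflicts."""
    adjusted = []
    for i, g in enumerate(grads):
        g = list(g)
        for j, h in enumerate(grads):
            if i == j:
                continue
            d = dot(g, h)
            if d < 0:  # conflict: g points partly against h (h is nonzero here)
                scale = d / dot(h, h)
                g = [a - scale * b for a, b in zip(g, h)]  # drop the component along h
        adjusted.append(g)
    # Additive guidance: sum the (possibly projected) vectors component-wise.
    return [sum(col) for col in zip(*adjusted)]
```

With `g1 = [1, 0]` and `g2 = [-1, 1]`, naive addition gives `[0, 1]`, which cancels most of the first reward's direction. The projected sum is `[0.5, 1.5]`, which keeps a non-negative component along both original vectors.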

arXiv cs.AI · AI & LLMs

VBFDD-Agent: Translating Battery Signals into Descriptive Text

The VBFDD-Agent framework improves electric vehicle battery diagnostics by converting raw digital sensor signals into descriptive text, enabling LLMs to perform more accurate fault detection and diagnosis.

arXiv cs.AI · AI & LLMs

HANA: A Hierarchical Agent-native Network Architecture

HANA transitions network management from static automation to autonomous operation by utilizing a hierarchical agent-based framework that enables decentralized decision-making and self-optimization.

arXiv cs.AI · AI & LLMs

Optimizing Agentic Pipelines with Temporal Semantic Caching

The paper introduces a framework for improving agentic plan-execute pipelines by implementing temporal semantic caching, which reduces redundant LLM calls and latency by caching execution results based on semantic similarity and temporal relevance.
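
A temporal semantic cache can be sketched as a lookup that requires two things: the query must be semantically close to a cached one, and the cached result must still be fresh. The sketch below makes several assumptions not taken from the paper. It substitutes a bag-of-words vector for a real embedding model, uses cosine similarity with a fixed threshold, and uses a time-to-live (TTL) as the notion of temporal relevance. `TemporalSemanticCache`, `embed`, `threshold`, and `ttl_seconds` are hypothetical names.

```python
import math
import time
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


class TemporalSemanticCache:
    """Reuse a cached step result if a past query is similar enough and still fresh."""

    def __init__(self, threshold: float = 0.8, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.threshold = threshold
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: list[tuple[Counter, object, float]] = []  # (embedding, result, stored_at)

    def get(self, query: str):
        now = self.clock()
        # Temporal relevance: evict entries older than the TTL before matching.
        self.entries = [e for e in self.entries if now - e[2] <= self.ttl]
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None  # miss: the caller runs the LLM/tool step, then calls put()

    def put(self, query: str, result) -> None:
        self.entries.append((embed(query), result, self.clock()))
```

The injectable `clock` makes expiry testable. In a plan-execute loop, each step would call `get` before invoking the LLM and `put` after.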

arXiv cs.AI · AI & LLMs

Personality Engineering: A New Framework for AI Negotiation Agents

Researchers propose 'personality engineering' using the interpersonal circumplex model to parameterize and test AI agent behavior in controlled negotiation experiments.
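
The interpersonal circumplex places interpersonal style on two axes: warmth (affiliation) and dominance (agency). Any position on the circle can therefore be described by an angle and an intensity. The sketch below shows one assumed way to turn such coordinates into a controllable persona for a negotiation agent. It uses Wiggins' octant labels; the coordinate ranges, `persona`, and the `system_prompt` template are hypothetical and not taken from the paper.

```python
import math

# Octants counter-clockwise from 0 degrees (x-axis = warmth, y-axis = dominance).
OCTANTS = [
    "warm-agreeable",          # 0 deg
    "gregarious-extraverted",  # 45 deg
    "assured-dominant",        # 90 deg
    "arrogant-calculating",    # 135 deg
    "cold-hearted",            # 180 deg
    "aloof-introverted",       # 225 deg
    "unassured-submissive",    # 270 deg
    "unassuming-ingenuous",    # 315 deg
]


def persona(warmth: float, dominance: float) -> dict:
    """Map circumplex coordinates in [-1, 1] to an angle, intensity, and octant."""
    angle = math.degrees(math.atan2(dominance, warmth)) % 360
    intensity = min(1.0, math.hypot(warmth, dominance))
    # Each octant spans 45 degrees centred on its label's angle.
    octant = OCTANTS[int(((angle + 22.5) % 360) // 45)]
    return {"angle": angle, "intensity": intensity, "octant": octant}


def system_prompt(p: dict) -> str:
    # Hypothetical template used to condition the agent's negotiation style.
    return (f"You are a negotiator whose interpersonal style is {p['octant']} "
            f"(intensity {p['intensity']:.2f} on a 0-1 scale).")
```

A continuous parameterization like this lets an experiment sweep personas around the circle instead of comparing a handful of hand-written characters.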

arXiv cs.AI · AI & LLMs

COAgents: A Multi-Agent Framework for Routing Optimization

COAgents is a multi-agent framework designed to navigate complex search spaces in routing problems by combining collaborative agent intelligence with optimization techniques.

arXiv cs.AI · AI & LLMs

Open-World Evaluations for Frontier AI Capabilities

The paper proposes shifting AI benchmarking from static, closed-set datasets to open-world evaluations, which better measure true agentic capability and generalization in unpredictable environments.

arXiv cs.AI · AI & LLMs

AgentAtlas: Moving Beyond Outcome-Only LLM Agent Evaluation

AgentAtlas shifts the focus of LLM agent evaluation from simple success/failure leaderboards to granular, process-oriented analysis of agent behavior and decision-making patterns.

arXiv cs.AI · AI & LLMs

Evaluating Uncertainty in AI Systems with ECUAS_n Metrics

The ECUAS_n family of metrics provides a principled, unified framework for evaluating AI systems that output uncertainty estimates, addressing the lack of standardized benchmarking for uncertainty-augmented models.

arXiv cs.AI · AI & LLMs

AgentCo-op: Retrieval-Based Synthesis of Multi-Agent Workflows

AgentCo-op introduces a retrieval-based framework to dynamically synthesize interoperable multi-agent workflows, moving beyond static agent orchestration to modular, reusable task execution.

arXiv cs.AI · AI & LLMs

SOLAR: Self-Optimizing Agents for Lifelong Learning

SOLAR introduces a framework for autonomous agents that perform continuous, self-directed learning and adaptation in open-ended environments, addressing the limitations of static model training.

arXiv cs.AI · AI & LLMs

OSCToM: Advancing High-Order Theory of Mind via RL-Guided Adversarial Generation

OSCToM improves AI's ability to model complex, recursive mental states (Theory of Mind) by using reinforcement learning to guide adversarial data generation, addressing the scarcity of high-order social reasoning datasets.

arXiv cs.AI · AI & LLMs

COSMO-Agent: Automating CAD-CAE Design Loops with LLMs

COSMO-Agent is a reinforcement learning framework that enables LLMs to bridge the CAD-CAE semantic gap by orchestrating external tools to perform iterative, constraint-driven geometric design.

DAY 03 · Wednesday, MAY 20 · 2026 · 8 SUMMARIES
arXiv cs.AI · AI & LLMs

SimGym: Simulating E-Commerce A/B Tests with VLM Agents

SimGym is a framework that uses traffic-grounded Vision-Language Model (VLM) agents to simulate user behavior in e-commerce environments, enabling faster and more accurate A/B test predictions.

arXiv cs.AI · AI & LLMs

Evaluating the Feasibility of Autonomous AI Research Systems

The paper provides a framework for assessing how close current AI systems are to performing end-to-end scientific research, highlighting the gap between task-specific automation and truly autonomous discovery.

arXiv cs.AI · AI & LLMs

Formalizing Agentic Knowledge Graphs for LLM Discoverability

The paper proposes a formal framework for 'Agentic KG Affordances,' enabling AI agents to programmatically discover and interact with knowledge graphs by standardizing how knowledge is exposed and queried.

arXiv cs.AI · AI & LLMs

Distinguishing Uncertainty Types for Better AI Exploration

Effective AI exploration requires distinguishing aleatoric uncertainty (irreducible stochasticity in outcomes) from epistemic uncertainty (reducible uncertainty about the environment, which volatility keeps renewing), because treating them identically leads to suboptimal learning behavior.
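
A standard Gaussian bandit makes the distinction concrete. This textbook conjugate update is used as an illustration and is not the paper's model. An arm's observation noise is aleatoric and never shrinks. The posterior variance of the arm's mean is epistemic and falls as data accumulates. An exploration bonus built only from epistemic variance fades once the arm is well understood. A bonus that also counts the noise keeps luring the agent back to noisy arms indefinitely. Volatility is not modeled here; the function names, `prior_var`, and `c` are hypothetical.

```python
import math


def posterior_var(prior_var: float, noise_var: float, n: int) -> float:
    """Epistemic variance of an arm's mean after n observations (Gaussian, known noise)."""
    return 1.0 / (1.0 / prior_var + n / noise_var)


def exploration_bonus(noise_var: float, n: int, prior_var: float = 1.0,
                      c: float = 1.0, conflate: bool = False) -> float:
    epistemic = posterior_var(prior_var, noise_var, n)
    # Correct: only reducible (epistemic) uncertainty earns a bonus.
    # Conflated: irreducible outcome noise (aleatoric) is counted as if it were epistemic.
    total = epistemic + noise_var if conflate else epistemic
    return c * math.sqrt(total)
```

For an arm with noise variance 4 observed 1000 times, the epistemic bonus is about 0.06. The conflated bonus stays above 2 no matter how many more samples are collected.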

arXiv cs.AI · AI & LLMs

Hallucination as Exploit: Security Risks in Multimodal AI Agents

Multimodal AI agents are vulnerable to 'evidence-carrying' attacks, in which manipulated visual inputs induce hallucinations that lead the model to execute malicious code or leak sensitive data.

arXiv cs.AI · AI & LLMs

DecisionBench: Measuring Agentic Delegation in Long-Horizon Tasks

DecisionBench provides a standardized framework for evaluating how AI agents delegate sub-tasks in complex, long-horizon workflows, addressing a critical gap in multi-agent system performance measurement.

arXiv cs.AI · AI & LLMs

Optimizing System Prompts via Embedding by Elicitation

The paper introduces 'Embedding by Elicitation,' a method that uses Bayesian Optimization to dynamically refine system prompts by learning latent representations, overcoming the limitations of static prompt engineering.

arXiv cs.AI · AI & LLMs

Developing Data Probes to Quantify LLM Data Impact

The authors propose 'data probes' as a diagnostic framework to move beyond black-box training, enabling developers to measure how specific data characteristics influence model performance and behavior.

Showing 30 of 51