№ 02 / SUMMARIES

#post-training

DAY 01 · Today · JUN 29 · 2026 · 2 SUMMARIES
arXiv cs.AI · Agents & Orchestration

ATOD: Hybrid Distillation for Autonomous Agent Training

ATOD combines on-policy distillation with reinforcement learning using an annealed schedule and turn-level reweighting to train small agent models that outperform their larger teacher models.

arXiv cs.AI · Agents & Orchestration

Internalizing Future-Aware Planning in LLM Agents

To move LLM agents beyond reactive behavior, this paper introduces a three-stage training paradigm that enables agents to perform grounded 'what-if' simulations and success estimation.
