#post-training
Every summary, chronological.
ATOD: Hybrid Distillation for Autonomous Agent Training
ATOD combines on-policy distillation with reinforcement learning using an annealed schedule and turn-level reweighting to train small agent models that outperform their larger teacher models.
arXiv cs.AI
Internalizing Future-Aware Planning in LLM Agents
To move LLM agents beyond reactive behavior, this paper introduces a three-stage training paradigm that enables agents to perform grounded 'what-if' simulations and success estimation.