#policy-optimization
Strategy-Guided Policy Optimization for LLM Reasoning
Strategy-Guided Policy Optimization (SGPO) improves LLM reasoning by distilling reusable problem-solving strategies rather than merely imitating specific solution trajectories, which leads to better generalization.
arXiv cs.AI