Reward Queries to Fix RAG Agent Failures

LLM search agents often fail because of poor initial queries. SmartSearch uses process rewards to refine them, preventing bad retrievals such as mistaking the actor Kevin McCarthy (born 1914) for the politician Kevin McCarthy (born 1965).

Why Initial Queries Derail LLM Search Agents

Most LLM-based search agents emphasize reasoning ('how to think') but neglect query quality ('how to ask'). A poor first query retrieves irrelevant results, and the failure cascades through the rest of the trajectory. In the ASearcher dataset example:

  • User query: "An Annapolis Story stars which American stage, film, and television actor born on February 15, 1914?"
  • Agent query: "birthdate of Kevin McCarthy" (omits the 'actor' constraint, creating entity ambiguity).
  • Retrieval: Politician Kevin McCarthy (born 1965).
  • Outcome: Agent concludes "Not Found".

This shows how early errors compound: agents rarely self-correct without guidance. The fix is to reward good intermediate searches and refine bad ones directly.
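A minimal sketch of this ambiguity failure, using a toy two-document corpus and naive keyword-overlap retrieval (the corpus and scoring here are illustrative assumptions, not the ASearcher setup):

```python
import re

# Toy corpus: the two Kevin McCarthys.
CORPUS = [
    "Kevin McCarthy (born 1965) is an American politician.",
    "Kevin McCarthy (1914-2010) was an American stage, film, and television actor.",
]

def tokens(text):
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query):
    """Return the document with the highest keyword overlap with the query."""
    return max(CORPUS, key=lambda doc: len(tokens(query) & tokens(doc)))

# Under-specified query: with 'actor' dropped, both documents tie on
# overlap and the retriever arbitrarily returns the politician.
print(retrieve("birthdate of Kevin McCarthy"))
# Refined query keeps the occupation constraint from the user's question.
print(retrieve("birthdate of actor Kevin McCarthy"))
```

The under-specified query gives the retriever nothing to disambiguate on, so which entity comes back is effectively arbitrary; one extra constraint word flips the result.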

SmartSearch: Process Rewards for Query Refinement

SmartSearch introduces reward-guided query refinement to upgrade agent policies. Core steps:

  1. Reward the Query: Score intermediate queries by retrieval quality (e.g., relevance to user intent). High-reward queries reinforce successful patterns.
  2. Fix the Retrieval: For low-reward queries, automatically refine (e.g., add constraints like 'actor' or disambiguate entities) before re-retrieving.
  3. Upgrade the Agent: Bake refinements into the agent's policy via reinforcement learning or fine-tuning, making better querying habitual.

This shifts focus from end-to-end trajectories to process-level optimization, turning brittle agents into robust search systems. Early bad steps no longer doom the entire response.
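The three steps above can be sketched as a single retrieval loop: score each retrieval against the user's intent, refine and re-retrieve when the reward is low, and log refined queries as training data for later policy updates. Everything here (the reward definition, the 0.5 threshold, the constraint-recovery refinement) is an illustrative assumption, not SmartSearch's actual implementation:

```python
import re

CORPUS = [
    "Kevin McCarthy (born 1965) is an American politician.",
    "Kevin McCarthy (1914-2010) was an American stage, film, and television actor.",
]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query):
    # Naive keyword-overlap retriever over the toy corpus.
    return max(CORPUS, key=lambda doc: len(tokens(query) & tokens(doc)))

def reward(doc, question):
    # Step 1 - reward the query: fraction of the user's intent
    # (question keywords) covered by what the query retrieved.
    keys = tokens(question)
    return len(keys & tokens(doc)) / max(len(keys), 1)

def search(question, query, threshold=0.5, log=None):
    doc = retrieve(query)
    if reward(doc, question) < threshold:
        # Step 2 - fix the retrieval: re-attach contentful constraint
        # words dropped from the user question, then retrieve again.
        missing = sorted(w for w in tokens(question) - tokens(query) if len(w) > 4)
        query = query + " " + " ".join(missing)
        doc = retrieve(query)
    if log is not None:
        # Step 3 - upgrade the agent: refined queries become
        # training pairs for RL or fine-tuning.
        log.append(query)
    return doc

question = ("An Annapolis Story stars which American stage, film, and "
            "television actor born on February 15, 1914?")
refined_queries = []
print(search(question, "birthdate of Kevin McCarthy", log=refined_queries))
```

On the running example, the initial query retrieves the politician, scores poorly against the question's keywords, and the refinement step restores 'actor' (among other constraint words) so the second retrieval finds the right entity.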

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge