Reward Queries to Fix RAG Agent Failures
LLM search agents often fail because of poor initial queries. SmartSearch uses process rewards to refine them, preventing bad retrievals such as confusing the actor Kevin McCarthy (born 1914) with the politician Kevin McCarthy (born 1965).
Why Initial Queries Derail LLM Search Agents
Most LLM-based search agents emphasize reasoning ('how to think') but neglect query quality ('how to ask'). A poor first query retrieves irrelevant results, and the failure cascades through the rest of the trajectory. In the ASearcher dataset example:
- User query: "An Annapolis Story stars which American stage, film, and television actor born on February 15, 1914?"
- Agent query: "birthdate of Kevin McCarthy" (omits the 'actor' constraint, causing entity ambiguity).
- Retrieval: Politician Kevin McCarthy (born 1965).
- Outcome: Agent concludes "Not Found".
This shows how early errors compound: agents rarely self-correct without guidance. The fix is to reward good intermediate searches and directly refine bad ones.
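To make the ambiguity concrete, here is a toy sketch of the failure mode. The two documents and the token-overlap retriever are illustrative stand-ins, not the ASearcher setup: an under-constrained query cannot separate two entities that share a name, while restoring the 'actor' constraint does.

```python
# Illustrative corpus: two entities named "Kevin McCarthy".
DOCS = {
    "actor": "Kevin McCarthy, American stage, film, and television actor, born February 15, 1914",
    "politician": "Kevin McCarthy, American politician, born January 26, 1965",
}

def retrieve(query: str) -> list[str]:
    """Toy retriever: return all doc ids tied for the most query-token overlap."""
    q = set(query.lower().replace(",", "").split())
    scores = {doc_id: len(q & set(text.lower().replace(",", "").split()))
              for doc_id, text in DOCS.items()}
    best = max(scores.values())
    return sorted(d for d, s in scores.items() if s == best)

print(retrieve("birthdate of Kevin McCarthy"))        # ['actor', 'politician'] -- ambiguous
print(retrieve("birthdate of Kevin McCarthy actor"))  # ['actor'] -- constraint disambiguates
```

The bare query ties both entities, so whichever one the real retriever ranks first wins, which is exactly the coin flip that produced the "Not Found" trajectory above.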
SmartSearch: Process Rewards for Query Refinement
SmartSearch introduces reward-guided query refinement to upgrade agent policies. Core steps:
- Reward the Query: Score intermediate queries by retrieval quality (e.g., relevance to user intent). High-reward queries reinforce successful patterns.
- Fix the Retrieval: For low-reward queries, automatically refine (e.g., add constraints like 'actor' or disambiguate entities) before re-retrieving.
- Upgrade the Agent: Bake refinements into the agent's policy via reinforcement learning or fine-tuning, making better querying habitual.
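The first two steps can be sketched as a reward-then-refine loop. The reward function, threshold, and refinement rule below are all hypothetical stand-ins (the text does not specify SmartSearch's exact scoring); the sketch scores a query by how many user constraints it keeps and re-appends any dropped constraints when the reward is low.

```python
REWARD_THRESHOLD = 0.5  # illustrative cutoff for "low-reward" queries

def query_reward(query: str, constraints: list[str]) -> float:
    """Hypothetical process reward: fraction of user constraints the query keeps."""
    q = query.lower()
    kept = sum(1 for c in constraints if c.lower() in q)
    return kept / len(constraints)

def refine(query: str, constraints: list[str]) -> str:
    """Refinement rule: append any dropped constraints to disambiguate the query."""
    missing = [c for c in constraints if c.lower() not in query.lower()]
    return " ".join([query] + missing)

# Constraints extracted from the user question in the example above.
constraints = ["Kevin McCarthy", "actor", "born February 15, 1914"]
query = "birthdate of Kevin McCarthy"

reward = query_reward(query, constraints)  # 1/3: only the name survived
if reward < REWARD_THRESHOLD:
    query = refine(query, constraints)     # re-retrieve with the refined query

print(query)  # birthdate of Kevin McCarthy actor born February 15, 1914
```

In a real system the reward would come from retrieval quality rather than string matching, but the control flow is the same: score the intermediate query, and only re-retrieve after refinement when the score falls below threshold.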
This shifts the focus from end-to-end trajectory rewards to process-level optimization, turning brittle agents into robust search systems. Early bad steps no longer doom the entire response.
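The third step, baking refinements into the policy, amounts to turning reward-scored refinements into training data. The record format and filtering rule below are assumptions, not SmartSearch's published pipeline: steps where refinement raised the reward become preference pairs usable for DPO-style tuning or reward-weighted fine-tuning.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    context: str   # user question plus agent state so far
    rejected: str  # original low-reward query
    chosen: str    # refined high-reward query

def build_pairs(trajectories: list[dict]) -> list[PreferencePair]:
    """Keep only steps where refinement improved the process reward."""
    return [
        PreferencePair(t["context"], t["original_query"], t["refined_query"])
        for t in trajectories
        if t["refined_reward"] > t["original_reward"]
    ]

# One illustrative logged step from the example above (rewards are made up).
trajectories = [{
    "context": "An Annapolis Story stars which actor born on February 15, 1914?",
    "original_query": "birthdate of Kevin McCarthy",
    "original_reward": 0.33,
    "refined_query": "Kevin McCarthy actor born February 15, 1914",
    "refined_reward": 1.0,
}]
print(len(build_pairs(trajectories)))  # 1
```

Training on such pairs is what makes better querying habitual: the agent learns to emit the refined query directly instead of relying on the refinement loop at inference time.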