Reward Queries to Fix RAG Agent Failures

LLM search agents often fail because of poor initial queries. SmartSearch uses process rewards to refine them, preventing bad retrievals such as mistaking the actor Kevin McCarthy (born 1914) for the politician Kevin McCarthy (born 1965).

Why Initial Queries Derail LLM Search Agents

Most LLM-based search agents emphasize reasoning ('how to think') but neglect query quality ('how to ask'). A poor first query retrieves irrelevant results, and the failure cascades through the rest of the trajectory. In the ASearcher dataset example:

  • User query: "An Annapolis Story stars which American stage, film, and television actor born on February 15, 1914?"
  • Agent query: "birthdate of Kevin McCarthy" (omits the 'actor' constraint, creating entity ambiguity).
  • Retrieval: Politician Kevin McCarthy (born 1965).
  • Outcome: Agent concludes "Not Found".

This shows how early errors compound: agents rarely self-correct without guidance. The fix is to reward good intermediate searches and refine bad ones directly.
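A minimal sketch of this ambiguity failure, using a toy two-document corpus and naive keyword-overlap retrieval (the corpus and scoring here are illustrative assumptions, not the ASearcher setup):

```python
import re

# Toy corpus: the two Kevin McCarthys.
CORPUS = [
    "Kevin McCarthy (born 1965) is an American politician.",
    "Kevin McCarthy (1914-2010) was an American stage, film, and television actor.",
]

def tokens(text):
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query):
    """Return the document with the highest keyword overlap with the query."""
    return max(CORPUS, key=lambda doc: len(tokens(query) & tokens(doc)))

# Under-specified query: with 'actor' dropped, both documents tie on
# overlap and the retriever arbitrarily returns the politician.
print(retrieve("birthdate of Kevin McCarthy"))
# Refined query keeps the occupation constraint from the user's question.
print(retrieve("birthdate of actor Kevin McCarthy"))
```

The under-specified query gives the retriever nothing to disambiguate on, so which entity comes back is effectively arbitrary; one extra constraint word flips the result.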

SmartSearch: Process Rewards for Query Refinement

SmartSearch introduces reward-guided query refinement to upgrade agent policies. Core steps:

  1. Reward the Query: Score intermediate queries by retrieval quality (e.g., relevance to user intent). High-reward queries reinforce successful patterns.
  2. Fix the Retrieval: For low-reward queries, automatically refine (e.g., add constraints like 'actor' or disambiguate entities) before re-retrieving.
  3. Upgrade the Agent: Bake refinements into the agent's policy via reinforcement learning or fine-tuning, making better querying habitual.

This shifts focus from end-to-end trajectories to process-level optimization, turning brittle agents into robust search systems. Early bad steps no longer doom the entire response.
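The three steps above can be sketched as a single retrieval loop: score each retrieval against the user's intent, refine and re-retrieve when the reward is low, and log refined queries as training data for later policy updates. Everything here (the reward definition, the 0.5 threshold, the constraint-recovery refinement) is an illustrative assumption, not SmartSearch's actual implementation:

```python
import re

CORPUS = [
    "Kevin McCarthy (born 1965) is an American politician.",
    "Kevin McCarthy (1914-2010) was an American stage, film, and television actor.",
]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query):
    # Naive keyword-overlap retriever over the toy corpus.
    return max(CORPUS, key=lambda doc: len(tokens(query) & tokens(doc)))

def reward(doc, question):
    # Step 1 - reward the query: fraction of the user's intent
    # (question keywords) covered by what the query retrieved.
    keys = tokens(question)
    return len(keys & tokens(doc)) / max(len(keys), 1)

def search(question, query, threshold=0.5, log=None):
    doc = retrieve(query)
    if reward(doc, question) < threshold:
        # Step 2 - fix the retrieval: re-attach contentful constraint
        # words dropped from the user question, then retrieve again.
        missing = sorted(w for w in tokens(question) - tokens(query) if len(w) > 4)
        query = query + " " + " ".join(missing)
        doc = retrieve(query)
    if log is not None:
        # Step 3 - upgrade the agent: refined queries become
        # training pairs for RL or fine-tuning.
        log.append(query)
    return doc

question = ("An Annapolis Story stars which American stage, film, and "
            "television actor born on February 15, 1914?")
refined_queries = []
print(search(question, "birthdate of Kevin McCarthy", log=refined_queries))
```

On the running example, the initial query retrieves the politician, scores poorly against the question's keywords, and the refinement step restores 'actor' (among other constraint words) so the second retrieval finds the right entity.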

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge