Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv paper.

Inspired by the success of reinforcement learning (RL) in Large Language Model (LLM) training for domains like math and code, recent works have begun exploring how to train LLMs to use search engines more effectively as tools for retrieval-augmented generation. Although these methods achieve performance improvements across QA benchmarks, many prioritize final-answer correctness while overlooking the quality of intermediate reasoning steps, which may lead to chain-of-thought unfaithfulness. In this paper, we first introduce a comprehensive framework for evaluating RL-based search agents, covering three distinct faithfulness metrics: information-think faithfulness, think-answer faithfulness, and think-search faithfulness. Our evaluations reveal that canonical search agents trained via Reinforcement Learning from Verifiable Reward (RLVR) – including Search-R1 and ReSearch – have significant room for improvement in this regard. To foster faithful reasoning, we introduce VERITAS (Verifying Entailed Reasoning through Intermediate Traceability in Agentic Search), a novel framework that integrates fine-grained faithfulness rewards into the reinforcement learning process. Our experiments show that models trained with VERITAS not only significantly improve reasoning faithfulness, but also achieve better task performance compared to baselines trained against a purely outcome-based reward.


💡 Research Summary

The paper tackles a critical shortcoming of current reinforcement‑learning (RL)‑based agentic search models used in retrieval‑augmented generation (RAG). While these models (e.g., Search‑R1, ReSearch) achieve impressive final‑answer accuracy on QA benchmarks, they are trained solely on outcome‑based rewards, ignoring the quality and faithfulness of the intermediate reasoning steps that lead to those answers. This oversight can produce “chain‑of‑thought unfaithfulness,” where the generated reasoning trace does not truly reflect the evidence retrieved, undermining transparency and reliability.

To address this, the authors first propose a systematic evaluation framework that defines three distinct faithfulness dimensions:

  1. Information‑Think Faithfulness – does the reasoning that follows a retrieved information block actually incorporate that evidence?
  2. Think‑Search Faithfulness – is each generated query a logical consequence of the preceding think block, i.e., does the model search for what it believes it needs?
  3. Think‑Answer Faithfulness – is the final answer entailed by the immediately preceding think block, ensuring no new unsupported claims appear at the end?
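Concretely, each dimension inspects a pair of adjacent blocks in the rollout. The sketch below shows one way to segment a tagged trace and enumerate the (premise, hypothesis) pairs each check would examine; the tag names follow the common Search-R1-style format, but the pairing logic is our illustration, not code from the paper.

```python
# Illustrative sketch: walk an agentic-search rollout, given as tagged
# blocks in generation order, and list which faithfulness dimension
# applies to each adjacent pair of blocks.

def faithfulness_pairs(blocks):
    """blocks: list of (tag, text) tuples in generation order.
    Returns a list of (dimension, premise_text, hypothesis_text)."""
    pairs = []
    for (prev_tag, prev_text), (curr_tag, curr_text) in zip(blocks, blocks[1:]):
        if prev_tag == "information" and curr_tag == "think":
            pairs.append(("information-think", prev_text, curr_text))
        elif prev_tag == "think" and curr_tag == "search":
            pairs.append(("think-search", prev_text, curr_text))
        elif prev_tag == "think" and curr_tag == "answer":
            pairs.append(("think-answer", prev_text, curr_text))
    return pairs

rollout = [
    ("think", "I need the capital of Australia."),
    ("search", "capital of Australia"),
    ("information", "Canberra is the capital city of Australia."),
    ("think", "The evidence says Canberra is the capital."),
    ("answer", "Canberra"),
]
print([dim for dim, _, _ in faithfulness_pairs(rollout)])
# → ['think-search', 'information-think', 'think-answer']
```

Note that the search→information transition is produced by the environment (the retriever), not the policy, so no faithfulness check applies to it.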

For each dimension they design automatic metrics that combine pretrained natural‑language‑inference (NLI) models with LLM‑as‑Judge evaluations (using Claude Sonnet‑4.5). The NLI checks capture strict entailment, while the LLM judge captures more implicit or abstract motivations that NLI may miss.
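The combination logic of a strict entailment check backed by a more permissive judge can be sketched as follows. To keep the example self-contained, the real components are replaced by toy stand-ins: a token-overlap heuristic in place of an NLI model (e.g., a DeBERTa-MNLI checkpoint) and a constant in place of the LLM judge. Only the combination pattern is the point; none of this is the paper's actual implementation.

```python
# Hedged sketch of a strict-check-plus-lenient-judge faithfulness test.
# toy_nli_entailed and toy_llm_judge are stand-ins, not real models.

def _words(text: str):
    """Crude tokenizer: lowercase, split on whitespace, strip punctuation."""
    return [w.strip(".,;:!?") for w in text.lower().split()]

def toy_nli_entailed(premise: str, hypothesis: str, threshold: float = 0.5) -> bool:
    """Stand-in for an NLI model: fraction of the hypothesis's longer
    words that also appear in the premise."""
    prem = set(_words(premise))
    hyp = [w for w in _words(hypothesis) if len(w) > 3]
    if not hyp:
        return True
    return sum(w in prem for w in hyp) / len(hyp) >= threshold

def toy_llm_judge(premise: str, hypothesis: str) -> bool:
    """Stand-in for an LLM-as-Judge call; always defers (returns False)
    so the strict check decides in this sketch."""
    return False

def faithful(premise: str, hypothesis: str) -> bool:
    # A step counts as faithful if either the strict entailment check
    # or the more lenient judge accepts it.
    return toy_nli_entailed(premise, hypothesis) or toy_llm_judge(premise, hypothesis)

print(faithful("Canberra is the capital city of Australia.",
               "The capital city is Canberra."))  # → True
```

In practice the judge would catch implicitly motivated steps (e.g., a paraphrased query) that strict lexical or NLI entailment rejects.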

Applying this framework to the canonical agents reveals a stark gap: despite modest gains in Exact Match (EM) scores, Information‑Think faithfulness often falls below 30 % and Think‑Search below 45 %. In other words, the agents frequently ignore retrieved evidence or issue irrelevant queries, even though they manage to produce correct final answers.

Motivated by these findings, the paper introduces VERITAS (Verifying Entailed Reasoning through Intermediate Traceability in Agentic Search). VERITAS augments the RL reward with fine‑grained, process‑level faithfulness signals. The total reward for a rollout becomes

  R = λ·OutcomeReward + α·InfoThink + β·ThinkSearch + γ·ThinkAnswer,

where λ weights the traditional outcome reward and α, β, γ weight the three faithfulness components. Using PPO (or GRPO) with this composite reward, the policy learns not only to produce correct answers but also to generate reasoning steps that are demonstrably grounded in the retrieved evidence.
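The composite reward is a straightforward weighted sum. A minimal sketch, with illustrative default weights of our choosing (the paper's actual weighting and score granularity may differ):

```python
# Minimal sketch of the composite VERITAS-style reward. The default
# weights (lam=1.0, alpha=beta=gamma=0.25) are illustrative assumptions,
# not values from the paper.

def veritas_reward(outcome: float, info_think: float, think_search: float,
                   think_answer: float, lam: float = 1.0, alpha: float = 0.25,
                   beta: float = 0.25, gamma: float = 0.25) -> float:
    """R = λ·OutcomeReward + α·InfoThink + β·ThinkSearch + γ·ThinkAnswer."""
    return (lam * outcome + alpha * info_think
            + beta * think_search + gamma * think_answer)

# A correct answer with a fully faithful trace outscores a correct
# answer whose intermediate reasoning ignored the evidence:
print(veritas_reward(1.0, 1.0, 1.0, 1.0))  # → 1.75
print(veritas_reward(1.0, 0.0, 0.0, 1.0))  # → 1.25
```

With λ dominating, the faithfulness terms break ties among correct rollouts rather than overriding correctness, which matches the paper's framing of outcome reward as the primary signal.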

Empirical results on Natural Questions (NQ) and HotpotQA show that VERITAS‑R1 improves Information‑Think faithfulness by ~14 percentage points and Think‑Answer faithfulness by ~7.7 points relative to the baseline Search‑R1, while also achieving a 1.2–2.0 % absolute increase in EM. Moreover, the training exhibits reduced variance in reward signals and converges faster, indicating better sample efficiency.

The contribution of the work is threefold:

  1. A formal definition and evaluation suite for faithfulness in agentic search, filling a gap in current RAG assessment practices.
  2. A comprehensive analysis exposing the disconnect between high task performance and low reasoning faithfulness in existing RL‑based agents.
  3. The VERITAS training framework that integrates process‑level faithfulness rewards, demonstrating that improved reasoning fidelity can coexist with, and even boost, final answer accuracy.

The authors discuss future directions such as incorporating more sophisticated fact‑checking models, extending the framework to multimodal retrieval, and leveraging human‑in‑the‑loop feedback to refine faithfulness rewards. Overall, the paper argues that next‑generation retrieval‑augmented systems should be judged not only on whether they get the right answer, but also on whether the answer is derived from a transparent, evidence‑grounded reasoning process—a shift that promises greater reliability, explainability, and user trust in AI assistants.

