Copy-Paste to Mitigate Large Language Model Hallucinations
While Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to generate contextually grounded responses, contextual faithfulness remains challenging as LLMs may not consistently trust provided context, leading to hallucinations that undermine reliability. We observe an inverse correlation between response copying degree and context-unfaithful hallucinations on RAGTruth, suggesting that higher copying degrees reduce hallucinations by fostering genuine contextual belief. We propose CopyPasteLLM, obtained through two-stage high-copying response preference training. We design three prompting methods to enhance copying degree, demonstrating that high-copying responses achieve superior contextual faithfulness and hallucination control. These approaches enable a fully automated pipeline that transforms generated responses into high-copying preference data for training CopyPasteLLM. On FaithEval, ConFiQA and PubMedQA, CopyPasteLLM achieves the best performance in both counterfactual and original contexts, remarkably with 12.2% to 24.5% accuracy improvements on FaithEval over the best baseline, while requiring only 365 training samples – 1/50th of baseline data. To elucidate CopyPasteLLM's effectiveness, we propose the Context-Parameter Copying Capturing algorithm. Interestingly, this reveals that CopyPasteLLM recalibrates reliance on internal parametric knowledge rather than external knowledge during generation. All code is available at https://github.com/longyongchao/CopyPasteLLM
💡 Research Summary
The paper tackles the persistent problem of hallucinations in Retrieval‑Augmented Generation (RAG) systems, where large language models (LLMs) often distrust the retrieved context and fall back on their internal parametric knowledge, producing answers that are factually inconsistent with the supplied source. Existing remedies—citation generation, constrained decoding, or fine‑tuning—either fail to guarantee that the generated text truly reflects the cited material or lack explicit attribution mechanisms.
To address this gap, the authors propose a simple yet powerful generation paradigm called Copy‑Paste. The central hypothesis is that if a model copies a substantial portion of the original context verbatim into its answer, it will develop a stronger “trust” in the context and rely less on internal knowledge, thereby reducing hallucinations. Two quantitative metrics are introduced to measure copying: Copy Coverage (κ), the fraction of answer tokens that appear in the context, and Copy Density (δ), a length‑sensitive variant that emphasizes longer copied spans. An empirical analysis on the RAGTruth dataset shows a clear inverse correlation between (κ, δ) and hallucination density across six models, supporting the hypothesis.
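The two metrics can be sketched as follows. The exact tokenization and fragment definitions are assumptions based on the description above: Copy Coverage (κ) is the fraction of answer tokens that appear in the context, and Copy Density (δ) is modeled here on extractive fragment density, weighting longer verbatim spans quadratically so that long copied runs score higher than scattered single tokens.

```python
def extractive_fragments(answer, context):
    """Greedily split the answer into maximal verbatim spans copied from the context.

    `answer` and `context` are token lists. This fragment definition is an
    assumption; the paper's exact span-matching procedure may differ.
    """
    frags, i = [], 0
    while i < len(answer):
        best = 0
        for j in range(len(context)):
            if context[j] == answer[i]:
                k = 0
                while (i + k < len(answer) and j + k < len(context)
                       and context[j + k] == answer[i + k]):
                    k += 1
                best = max(best, k)
        if best > 0:
            frags.append(answer[i:i + best])
            i += best
        else:
            i += 1  # token not found in context; skip it
    return frags

def copy_coverage(answer, context):
    """κ: fraction of answer tokens that appear verbatim in the context."""
    if not answer:
        return 0.0
    frags = extractive_fragments(answer, context)
    return sum(len(f) for f in frags) / len(answer)

def copy_density(answer, context):
    """δ: length-sensitive variant; squaring fragment lengths rewards long spans."""
    if not answer:
        return 0.0
    frags = extractive_fragments(answer, context)
    return sum(len(f) ** 2 for f in frags) / len(answer)
```

Under this sketch, two answers with the same coverage can have very different densities: one long copied span yields a much higher δ than the same number of tokens copied one at a time.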
Three prompting strategies are designed to generate high‑copying responses:
- CP‑Order – a hard‑constraint method that selects relevant context sentences, reorders them, and outputs them directly, with no paraphrasing.
- CP‑Link – similar to CP‑Order but permits short connective phrases between copied spans to improve local coherence.
- CP‑Refine – a soft‑constraint, iterative writer‑reviewer loop that refines an answer until a composite copy score exceeds a threshold, balancing faithfulness, relevance, and fluency.
Experiments demonstrate that CP‑Refine achieves the best trade‑off among these criteria.
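The CP‑Refine writer‑reviewer loop might be sketched as follows. The `write` and `review` callables stand in for LLM calls; their signatures, the score threshold, and the round limit are hypothetical, not taken from the paper.

```python
def cp_refine(query, context, write, review, threshold=0.8, max_rounds=5):
    """Iteratively refine an answer until its composite copy score passes a threshold.

    write(query, context, feedback)  -> candidate answer (LLM "writer" stand-in)
    review(answer, query, context)   -> (score, feedback) (LLM "reviewer" stand-in)

    The composite score is assumed to balance faithfulness, relevance, and
    fluency, as described above; here it is opaque to the loop.
    """
    answer = write(query, context, feedback=None)
    for _ in range(max_rounds):
        score, feedback = review(answer, query, context)
        if score >= threshold:
            break  # soft constraint satisfied; accept the answer
        answer = write(query, context, feedback=feedback)
    return answer
```

Because the constraint is soft, the loop terminates either when the reviewer is satisfied or after a fixed budget of rounds, so a stubborn writer cannot spin forever.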
The second stage internalizes the high‑copying preference via Direct Preference Optimization (DPO). For each query‑context pair, six candidate answers are generated: three baseline styles (plain, attributed, citation‑rich) and the three Copy‑Paste variants. A multi‑criteria filter enforces contextual faithfulness (AlignScore, MiniCheck), copying strength (κ, δ), query relevance (embedding similarity), and fluency (perplexity). The surviving candidates are then ranked by an Elo‑style LLM‑as‑Judge tournament that distinguishes two hallucination modes—Twist (mis‑aligned facts) and Causal (incorrect reasoning). Preference pairs are constructed by placing the gold answer atop a high‑copying candidate and attaching incorrect answers to other candidates, yielding roughly five preference pairs per sample. Remarkably, only 365 high‑copying samples (≈1/50 of typical fine‑tuning corpora) are needed to train CopyPasteLLM, which learns to favor context‑grounded answers even when they conflict with its parametric knowledge.
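The Elo‑style ranking of surviving candidates can be illustrated with a standard Elo update in a round‑robin tournament. The `judge` callable is a stand‑in for the LLM‑as‑Judge; the K‑factor, initial rating, and round‑robin schedule are assumed defaults, not values from the paper.

```python
from itertools import combinations

def elo_update(r_a, r_b, result, k=32):
    """Standard Elo update; result is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (result - expected_a)
    r_b_new = r_b + k * ((1.0 - result) - (1.0 - expected_a))
    return r_a_new, r_b_new

def rank_by_tournament(candidates, judge, init=1000.0, k=32):
    """Rank candidate answers by pairwise comparison.

    judge(a, b) -> 1.0 / 0.5 / 0.0 from candidate a's perspective
    (stand-in for the LLM-as-Judge that distinguishes Twist and Causal
    hallucination modes when comparing two answers).
    """
    ratings = {c: init for c in candidates}
    for a, b in combinations(candidates, 2):
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], judge(a, b), k)
    return sorted(candidates, key=ratings.get, reverse=True)
```

A useful sanity property of the update is that it is zero‑sum: the total rating mass is conserved across each comparison, so rankings reflect only relative wins.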
To probe the underlying mechanism, the authors introduce Context‑Parameter Copying Capturing (CPCC). For each query, two decoding runs are performed: one with the context and one without. At every token step in a chain‑of‑thought generation, the top‑K token probabilities and hidden states are recorded. Tokens present in the supplied context are labeled “contextual knowledge”; tokens that dominate in the context‑free run are treated as “parametric knowledge”. This token‑level analysis, extending prior Knowledge Token Capturing work, reveals that CopyPasteLLM retains similar internal representations to the base model but recalibrates confidence, relying more heavily on contextual tokens and less on internal priors throughout the reasoning trajectory.
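The token‑labeling step of CPCC might look like the following sketch. The decision rule and the `topk_no_context` representation (a per‑step mapping from token to probability recorded in the context‑free run) are assumptions based on the description above, not the paper's exact algorithm.

```python
def label_tokens(generated, context_tokens, topk_no_context):
    """Label each generated token as 'contextual', 'parametric', or 'other'.

    generated:        list of tokens from the with-context decoding run
    context_tokens:   tokens of the supplied context
    topk_no_context:  per-step dict {token: prob} from the context-free run
                      (the recorded top-K distribution at that step)
    """
    ctx = set(context_tokens)
    labels = []
    for tok, dist in zip(generated, topk_no_context):
        if tok in ctx:
            labels.append("contextual")   # token appears in the supplied context
        elif dist and tok == max(dist, key=dist.get):
            labels.append("parametric")   # token dominates the context-free run
        else:
            labels.append("other")
    return labels
```

Aggregating these labels over a chain‑of‑thought trajectory gives the kind of per‑step contextual‑vs‑parametric reliance profile the authors use to compare CopyPasteLLM against its base model.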
Comprehensive evaluation on three benchmarks—FaithEval, ConFiQA, and PubMedQA—shows that CopyPasteLLM achieves the best accuracy in both original and counterfactual contexts, including 12.2% to 24.5% absolute accuracy gains over the strongest baseline on FaithEval. The model's ability to trust counterfactual context demonstrates genuine contextual belief rather than mere surface copying. Moreover, the data‑efficiency results (high performance with only 365 samples) suggest that the copying preference is a powerful signal that can be learned with minimal supervision.
In summary, the paper presents a novel, data‑efficient framework that converts the act of copying source text into a learned preference, thereby mitigating hallucinations in RAG‑augmented LLMs. By explicitly encouraging verbatim reuse of retrieved passages, the approach provides both faithfulness (answers stay true to the source) and attributability (the copied span itself serves as evidence). This has immediate implications for high‑stakes domains such as medicine, law, and scientific inquiry, where erroneous model outputs can have serious consequences. The proposed CPCC analysis also offers a valuable diagnostic tool for future work aiming to understand and control the balance between parametric and contextual knowledge in generative AI systems.