CARROT: A Learned Cost-Constrained Retrieval Optimization System for RAG

CARROT: A Learned Cost-Constrained Retrieval Optimization System for RAG
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Large Language Models (LLMs) have demonstrated impressive ability in generation and reasoning tasks but struggle with handling up-to-date knowledge, leading to inaccuracies or hallucinations. Retrieval-Augmented Generation (RAG) mitigates this by retrieving and incorporating external knowledge into input prompts. In particular, due to LLMs’ context window limitations and long-context hallucinations, only the most relevant “chunks” are retrieved. However, current RAG systems face three key challenges: (1) chunks are often retrieved independently without considering their relationships, such as redundancy and ordering; (2) the utility of chunks is non-monotonic, as adding more chunks can degrade quality; and (3) retrieval strategies fail to adapt to the unique characteristics of different queries. To overcome these challenges, we design a cost-constrained retrieval optimization framework for RAG. We adopt a Monte Carlo Tree Search (MCTS) based strategy to find the optimal chunk combination order, which considers the chunks’ correlations. In addition, to address the non-monotonicity of chunk utility, instead of treating budget exhaustion as the termination condition, we design a utility computation strategy to identify the optimal chunk combination without necessarily exhausting the budget. Furthermore, we propose a configuration agent that predicts optimal configurations for each query domain, improving our framework’s adaptability and efficiency. Experimental results demonstrate up to a 30% improvement over baseline models, highlighting the framework’s effectiveness, scalability, and suitability. Our source code has been released at https://github.com/wang0702/CARROT.


💡 Research Summary

The paper addresses three fundamental shortcomings of current Retrieval‑Augmented Generation (RAG) pipelines: (1) the independent treatment of retrieved chunks, which ignores redundancy and inter‑chunk relationships; (2) the implicit assumption that adding more chunks always improves the final answer, despite empirical evidence that chunk utility is non‑monotonic; and (3) the use of a single, static reranker model for all query types, which fails to adapt to the diverse characteristics of user queries. To overcome these issues, the authors propose CARROT (Cost‑constrained Retrieval Optimization), a rank‑based RAG framework that explicitly models chunk combination order under a token‑budget constraint.

The core technical contribution is a Monte‑Carlo Tree Search (MCTS) policy that explores the exponential space of possible chunk sequences. Each node in the search tree corresponds to a partial ordering of chunks; child nodes extend the sequence by adding another candidate chunk. The tree is guided by an Upper Confidence Bound (UCB) utility function that balances exploration of new combinations with exploitation of promising ones. Crucially, the utility of a candidate sequence is evaluated by feeding the entire concatenated chunk set into a reranker model, rather than summing individual scores. This captures the non‑additive, non‑monotonic nature of chunk utility. The authors also incorporate the token budget directly into the optimization objective, allowing the search to stop before the budget is exhausted if further additions would reduce overall utility.

To handle query‑specific variability, a contrastive‑learning‑based configuration agent predicts optimal MCTS hyper‑parameters (search depth, number of simulations) and selects the most suitable reranker model for each query domain. The agent learns a query embedding space where similar queries share configurations, enabling dynamic adaptation without manual tuning.

The authors formalize the problem as a constrained maximization: maximize W(Φ) subject to Σ cost(χ_i) ≤ B, where Φ is an ordered chunk combination and W(·) is the reranker‑derived benefit. They prove the problem is NP‑hard via reduction from the Maximum Weighted Hyper‑Clique problem.

Experiments span four domains—satellite image description, up‑to‑date news QA, enterprise document retrieval, and general knowledge QA. Baselines include traditional top‑k approximate nearest neighbor retrieval, cluster‑based retrieval, state‑of‑the‑art rank‑based RAG (e.g., DPR + Cross‑Encoder), graph‑based RAG, and instruction‑tuned tuning‑based RAG. Across all settings, CARROT achieves an average 30 % improvement in accuracy/F1/ROUGE‑L while respecting token budgets of 256–768. Ablation studies demonstrate that removing MCTS, the non‑monotonic utility handling, or the configuration agent each leads to substantial performance drops, confirming the necessity of each component. Computational overhead is modest: parallel evaluation of candidate sequences keeps additional GPU time under 0.8× that of the baseline reranker inference, making the approach viable for real‑time services.

The paper’s contributions are threefold: (1) introducing the first RAG system that jointly optimizes chunk selection and ordering under explicit cost constraints; (2) modeling and exploiting the non‑monotonic utility of chunk sets; and (3) providing a learned, query‑aware configuration mechanism that tailors the search process to diverse query types. Limitations include reliance on a fixed reranker (future work could co‑optimize reranker and search) and the need to extend the method to multimodal or very long chunks. Overall, CARROT represents a significant step toward more efficient and accurate knowledge‑augmented generation, with clear pathways for integration into larger agentic or multi‑step LLM systems.


Comments & Academic Discussion

Loading comments...

Leave a Comment