A Survey of Query Optimization in Large Language Models
Query Optimization (QO) has become essential for enhancing Large Language Model (LLM) effectiveness, particularly in Retrieval-Augmented Generation (RAG) systems, where query quality directly determines retrieval and response performance. This survey provides a systematic and comprehensive analysis of query optimization techniques with three principal contributions. First, we introduce the Query Optimization Lifecycle (QOL) Framework, a five-phase pipeline covering Intent Recognition, Query Transformation, Retrieval Execution, Evidence Integration, and Response Synthesis, providing a unified lens for understanding the optimization process. Second, we propose a Query Complexity Taxonomy that classifies queries along two dimensions, namely evidence type (explicit vs. implicit) and evidence quantity (single vs. multiple), establishing principled mappings between query characteristics and optimization strategies. Third, we conduct an in-depth analysis of four atomic operations, namely Query Expansion, Query Decomposition, Query Disambiguation, and Query Abstraction, synthesizing a broad spectrum of representative methods from premier venues. We further examine evaluation methodologies, identify critical gaps in existing benchmarks, and discuss open challenges, including process reward models, efficiency optimization, and multi-modal query handling. This survey offers both a structured foundation for research and actionable guidance for practitioners.
💡 Research Summary
This survey provides the first comprehensive overview of query optimization (QO) for large language model (LLM)–based Retrieval‑Augmented Generation (RAG) systems. The authors argue that the quality of the user query is a decisive factor for retrieval effectiveness and, consequently, for the factual accuracy of LLM‑generated answers. To bring order to a rapidly expanding literature that is scattered across information retrieval, natural language processing, knowledge‑graph reasoning, and conversational AI, the paper makes three principal contributions.
First, it introduces the Query Optimization Lifecycle (QOL) Framework, a five‑phase pipeline: (1) Intent Recognition, (2) Query Transformation, (3) Retrieval Execution, (4) Evidence Integration, and (5) Response Synthesis. Each phase is described in detail, and feedback loops are emphasized so that downstream performance can trigger upstream refinements (e.g., low‑quality evidence can cause the system to return to Intent Recognition for further disambiguation).
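The feedback loop described above can be sketched as a minimal pipeline. This is an illustrative skeleton, not the paper's implementation: every phase below is a stub (a real system would back each with retrieval components or LLM calls), and the `QueryState` fields, quality threshold, and loop bound are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class QueryState:
    raw_query: str
    intent: str = ""
    transformed: str = ""
    evidence: list = field(default_factory=list)
    quality: float = 0.0

def recognize_intent(state: QueryState) -> QueryState:
    state.intent = "lookup"  # stub: classify the user's information need
    return state

def transform_query(state: QueryState) -> QueryState:
    state.transformed = state.raw_query.strip()  # stub: expansion/decomposition/etc.
    return state

def retrieve(state: QueryState) -> QueryState:
    state.evidence = [f"doc for: {state.transformed}"]  # stub retriever
    return state

def integrate_evidence(state: QueryState) -> QueryState:
    state.quality = 1.0 if state.evidence else 0.0  # stub quality score
    return state

def synthesize_response(state: QueryState) -> str:
    return f"Answer based on {len(state.evidence)} evidence item(s)."

def run_qol(query: str, quality_threshold: float = 0.5, max_loops: int = 2) -> str:
    state = QueryState(raw_query=query)
    for _ in range(max_loops):
        # Phases 1-4; low-quality evidence sends us back to Intent Recognition
        state = integrate_evidence(retrieve(transform_query(recognize_intent(state))))
        if state.quality >= quality_threshold:
            break
    return synthesize_response(state)  # Phase 5

print(run_qol("Who wrote 'The Selfish Gene'?"))
```

The point of the loop structure is that downstream signals (the quality score from Evidence Integration) gate whether the pipeline re-enters its upstream phases, which is the feedback behavior the survey emphasizes.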
Second, the authors propose a Query Complexity Taxonomy based on two orthogonal dimensions: evidence type (explicit vs. implicit) and evidence quantity (single vs. multiple). The cross‑product yields four classes: (I) single‑explicit, (II) multi‑explicit, (III) single‑implicit, and (IV) multi‑implicit. The taxonomy maps each class to a primary optimization operation—expansion, decomposition, disambiguation, or abstraction—while allowing secondary strategies. This classification gives practitioners a principled way to select the most appropriate QO technique given the nature of the information need.
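The class-to-operation mapping could be encoded as a simple lookup table, as in the sketch below. Only the pairing of multi-implicit (class IV) queries with abstraction is stated explicitly in this summary; the other primary and secondary assignments shown here are illustrative assumptions.

```python
# Hypothetical encoding of the two-dimensional taxonomy. Keys are
# (evidence_type, evidence_quantity); secondary strategies are illustrative.
TAXONOMY = {
    ("explicit", "single"):   {"class": "I",   "primary": "expansion",      "secondary": ["disambiguation"]},
    ("explicit", "multiple"): {"class": "II",  "primary": "decomposition",  "secondary": ["expansion"]},
    ("implicit", "single"):   {"class": "III", "primary": "disambiguation", "secondary": ["abstraction"]},
    ("implicit", "multiple"): {"class": "IV",  "primary": "abstraction",    "secondary": ["decomposition"]},
}

def select_operation(evidence_type: str, evidence_quantity: str) -> str:
    """Return the primary optimization operation for a query class."""
    return TAXONOMY[(evidence_type, evidence_quantity)]["primary"]

print(select_operation("implicit", "multiple"))  # abstraction (class IV)
```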
Third, the survey conducts an in‑depth analysis of the four atomic operations that constitute the core of Phase 2:
- Query Expansion – covers classic pseudo-relevance feedback, dense-retrieval-based semantic expansion, and LLM-generated expansions (e.g., HyDE, self-ask). The authors discuss “query drift” risks and mitigation via context-weighted term selection.
- Query Decomposition – surveys rule-based parsing, graph-based intent splitting, and recent meta-prompt approaches such as Tree-of-Thought. The trade-off between sequential and parallel execution of sub-queries is examined with empirical evidence.
- Query Disambiguation – reviews retrieval-augmented clarification, feedback loops that ask users for clarification, and reinforcement-learning policies that select the most informative disambiguation action. The impact of clarification questions on overall answer accuracy is quantified.
- Query Abstraction – explores ontology-driven concept mapping, LLM-based high-level concept extraction, and “prompt-to-prompt” transformations that elevate a query to a more abstract representation, which is especially useful for multi-implicit (class IV) queries.
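The four atomic operations above can be sketched as prompt-based transformations over a hypothetical `llm()` callable. The prompt wording and the naive conjunction split are illustrative assumptions, not the surveyed methods themselves; in particular, `expand` only gestures at the HyDE idea (retrieve with a hypothetical answer passage) and `abstract` at prompt-to-prompt concept elevation.

```python
def llm(prompt: str) -> str:
    """Stand-in for an LLM call; echoes the prompt tail for demonstration."""
    return prompt.split(":", 1)[1].strip()

def expand(query: str) -> str:
    # HyDE-style idea: generate a hypothetical answer passage, retrieve with it
    return llm(f"Write a short passage that would answer: {query}")

def decompose(query: str) -> list[str]:
    # Split a multi-hop query into sub-queries (naive conjunction split here;
    # real systems use parsing, graphs, or meta-prompts like Tree-of-Thought)
    return [part.strip() for part in query.split(" and ")]

def disambiguate(query: str) -> str:
    # Ask a clarifying question when the query is underspecified
    return llm(f"Ask one clarifying question about: {query}")

def abstract(query: str) -> str:
    # Step back to a higher-level concept before retrieval (class IV queries)
    return llm(f"State the general principle behind: {query}")

print(decompose("who founded Acme and when was it founded"))
```

Whether the resulting sub-queries are issued sequentially (later ones conditioned on earlier answers) or in parallel is exactly the execution trade-off the decomposition bullet mentions.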
The paper also critiques current evaluation practices, noting that most benchmarks focus on answer correctness (accuracy, recall) and ignore process‑level metrics such as token cost, latency, and per‑phase success rates. Moreover, there is a lack of standardized tests for multimodal queries (image, audio) and for long‑turn conversational contexts.
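The process-level metrics the paper finds missing could be tracked with a small per-phase recorder along the following lines; the class and field names here are assumptions, not a benchmark the survey defines.

```python
from collections import defaultdict

class PhaseMetrics:
    """Hypothetical recorder for process-level metrics: per-phase token
    cost, latency, and success rate, alongside end-task correctness."""

    def __init__(self):
        self.stats = defaultdict(lambda: {"tokens": 0, "seconds": 0.0,
                                          "runs": 0, "successes": 0})

    def record(self, phase: str, tokens: int, seconds: float, success: bool):
        s = self.stats[phase]
        s["tokens"] += tokens
        s["seconds"] += seconds
        s["runs"] += 1
        s["successes"] += int(success)

    def success_rate(self, phase: str) -> float:
        s = self.stats[phase]
        return s["successes"] / s["runs"] if s["runs"] else 0.0

m = PhaseMetrics()
m.record("Query Transformation", tokens=120, seconds=0.4, success=True)
m.record("Query Transformation", tokens=90, seconds=0.3, success=False)
print(m.success_rate("Query Transformation"))  # 0.5
```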
Finally, the authors outline four open challenges: (1) designing process‑level reward models that can supervise the entire QOL pipeline, (2) balancing efficiency‑quality trade‑offs through lightweight models or adaptive computation, (3) extending QO to multimodal inputs, and (4) establishing standardized, comprehensive benchmarks that capture both end‑task performance and intermediate optimization quality. They suggest that reinforcement‑learning‑from‑human‑feedback (RLHF) could be extended to reward not only the final answer but also the quality of intermediate query transformations.
In sum, the survey positions query optimization as a distinct research area, provides a unifying theoretical framework, a practical taxonomy, and a detailed review of the four core operations, while also highlighting methodological gaps and future research directions. It serves as both a reference for scholars and a practical guide for engineers building LLM‑RAG systems.