AgentSM: Semantic Memory for Agentic Text-to-SQL
Recent advances in LLM-based Text-to-SQL have achieved remarkable gains on public benchmarks such as BIRD and Spider. Yet these systems struggle to scale in realistic enterprise settings with large, complex schemas, diverse SQL dialects, and expensive multi-step reasoning. Emerging agentic approaches show potential for adaptive reasoning but often suffer from inefficiency and instability: they repeat interactions with databases, produce inconsistent outputs, and occasionally fail to generate valid answers. To address these challenges, we introduce Agent Semantic Memory (AgentSM), an agentic framework for Text-to-SQL that builds and leverages interpretable semantic memory. Instead of relying on raw scratchpads or vector retrieval, AgentSM captures prior execution traces (or synthesizes curated ones) as structured programs that directly guide future reasoning. This design enables systematic reuse of reasoning paths, allowing agents to scale to larger schemas, more complex questions, and longer trajectories efficiently and reliably. Compared to state-of-the-art systems, AgentSM achieves higher efficiency, reducing average token usage by 25% and trajectory length by 35% on the Spider 2.0 benchmark. It also improves execution accuracy, reaching a state-of-the-art 44.8% on the Spider 2.0 Lite benchmark.
💡 Research Summary
The paper introduces AgentSM, a novel agentic framework for Text‑to‑SQL that tackles three core shortcomings of existing approaches in realistic enterprise settings: (1) redundant schema exploration, (2) rigid, one‑size‑fits‑all reasoning strategies, and (3) high variance leading to unstable query generation. Instead of relying on raw scratchpad logs or simple vector retrieval, AgentSM builds a “semantic memory”: a structured repository of past execution trajectories. Each trajectory records the sequence of tool calls, intermediate code, inputs, outputs, and semantic annotations, effectively turning raw interaction traces into reusable programs.
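Conceptually, such a trajectory record can be sketched as a small data structure. The field names below (`tool`, `code`, `annotation`, etc.) are illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TrajectoryStep:
    """One tool invocation in an execution trace (hypothetical schema)."""
    tool: str        # e.g. "vector_search", "sql_execute"
    code: str        # intermediate code the agent ran
    inputs: dict     # tool arguments
    output: str      # observed result, possibly truncated
    annotation: str  # semantic note attached to the step

@dataclass
class Trajectory:
    """A stored execution trace: a question plus its ordered steps."""
    question: str
    database: str
    steps: list = field(default_factory=list)
    final_sql: str = ""

# Recording each interaction turns a raw trace into a reusable "program".
traj = Trajectory(question="Total 2023 revenue per region?", database="sales_dw")
traj.steps.append(TrajectoryStep(
    tool="vector_search",
    code="search('revenue region')",
    inputs={"query": "revenue region"},
    output="tables: fact_sales, dim_region",
    annotation="located fact and dimension tables",
))
traj.final_sql = (
    "SELECT r.name, SUM(s.amount) FROM fact_sales s "
    "JOIN dim_region r ON s.region_id = r.id GROUP BY r.name"
)
```

Storing the annotated steps, rather than a flat text log, is what makes the memory interpretable and selectively reusable.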
When a new natural‑language question arrives for a database that the system has previously seen, the planner agent queries this memory for semantically similar past trajectories and reuses relevant steps. This eliminates repeated schema reads, vector searches, and sample queries, cutting average token consumption by 25% and trajectory length by 35% on the Spider 2.0 benchmark.
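The retrieval step can be sketched as similarity search over stored trajectories, filtered to the current database. The toy bag‑of‑words embedding below is a stand‑in; the paper's system would use a learned encoder:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a neural encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memory, question, db, k=1):
    """Return the k stored trajectories most similar to the new question,
    restricted to trajectories from the same database."""
    q = embed(question)
    scored = [(cosine(q, embed(t["question"])), t)
              for t in memory if t["database"] == db]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [t for _, t in scored[:k]]

memory = [
    {"question": "total revenue per region in 2023", "database": "sales_dw"},
    {"question": "list all customer emails", "database": "crm"},
]
hits = retrieve(memory, "revenue by region for 2023", "sales_dw")
```

The reused trajectory then lets the planner skip the schema reads and sample queries it would otherwise repeat.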
AgentSM’s architecture consists of two tightly coupled agents: a Planner Agent and a Schema‑Linking Agent. Both are coded in Python and have access to rich libraries (Pandas, NumPy, etc.). The Planner orchestrates the overall reasoning loop, generates SQL, validates results, and manages memory retrieval and storage. The Schema‑Linking Agent is invoked only when deeper schema inspection is needed; it operates under a strict step budget (e.g., five steps) and uses a specialized vector‑search tool to locate relevant tables and columns. By delegating fine‑grained exploration to a separate agent, the system avoids context drift and keeps the Planner’s reasoning focused.
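The delegation pattern can be sketched as follows; the function names, the stop condition, and the default budget of five steps are assumptions drawn from the description above, not the paper's code:

```python
def schema_linking_agent(question, vector_search, budget=5):
    """Explore the schema under a strict step budget (five, per the summary)."""
    findings = []
    for step in range(budget):
        hit = vector_search(question, step)
        if hit is None:        # nothing more to find: stop early
            break
        findings.append(hit)
    return findings

def planner(question, needs_linking, vector_search):
    """Invoke the sub-agent only when deeper schema inspection is needed,
    keeping fine-grained exploration out of the planner's own context."""
    if needs_linking:
        return schema_linking_agent(question, vector_search)
    return []

# Stub vector-search tool yielding two relevant columns, then nothing.
results = iter(["fact_sales.amount", "dim_region.name"])
def fake_search(question, step):
    return next(results, None)

linked = planner("revenue per region", True, fake_search)
```

Bounding the sub-agent's loop is what prevents runaway exploration from polluting the planner's context.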
Tool design is another key contribution. In addition to generic tools (file reading, SQL execution on back‑ends such as BigQuery, Snowflake, SQLite), AgentSM introduces composite tools that bundle frequently co‑occurring operations (e.g., “search‑and‑sample” combines vector search and a sample‑row query). These composite tools reduce the number of tool calls, simplify planning, and improve consistency across runs. The SQL execution tool incorporates self‑refinement: if a query fails or returns empty results, the tool automatically generates a corrected version based on the error feedback, preventing a single failure from derailing the entire trajectory.
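The self-refinement loop in the SQL execution tool can be sketched like this. The retry count and the `run`/`refine` interfaces are illustrative; in the actual system, refinement would be LLM-driven rather than a string fix:

```python
def execute_with_refinement(sql, run, refine, max_retries=2):
    """Run SQL; on an error or an empty result, ask `refine` for a corrected
    query based on the failure feedback, up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            rows = run(sql)
            if rows:                  # non-empty result: success
                return sql, rows
            error = "empty result"
        except Exception as exc:
            error = str(exc)
        if attempt < max_retries:
            sql = refine(sql, error)  # corrected version from error feedback
    return sql, []

# Stub backend: the first query misspells a column; the refinement fixes it.
def fake_run(sql):
    if "amout" in sql:
        raise ValueError("column 'amout' not found")
    return [("EMEA", 100)]

def fake_refine(sql, error):
    return sql.replace("amout", "amount")

final_sql, rows = execute_with_refinement(
    "SELECT region, SUM(amout) FROM fact_sales GROUP BY region",
    fake_run, fake_refine,
)
```

Because the correction happens inside the tool, a single malformed query is repaired locally instead of derailing the whole trajectory.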
To enrich the memory, the authors synthesize a large set of “synthetic questions” that deliberately probe schema structures. Execution traces from these synthetic explorations are stored alongside real‑world traces, ensuring that even for previously unseen queries the system can draw on rich exploration data.
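A minimal sketch of such question synthesis, assuming simple per-table and per-column templates (the paper's actual generation procedure is not specified here):

```python
def synthesize_questions(schema):
    """Generate probing questions that exercise each table and column
    of a schema, given as {table_name: [column_names]}."""
    questions = []
    for table, columns in schema.items():
        questions.append(f"How many rows are in {table}?")
        for col in columns:
            questions.append(f"What are sample values of {table}.{col}?")
    return questions

schema = {"fact_sales": ["amount", "region_id"]}
qs = synthesize_questions(schema)
```

Executing such questions populates the memory with exploration traces even before any real user query arrives.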
Empirical evaluation on Spider 2.0 and its Lite variant demonstrates state‑of‑the‑art performance. AgentSM achieves 44.8% execution accuracy on Spider 2.0 Lite, surpassing all prior methods. Ablation studies reveal that memory reuse contributes the most to trajectory shortening, while composite tools and self‑refinement each add measurable gains in efficiency and robustness.
Beyond Text‑to‑SQL, the authors argue that the semantic‑memory paradigm is applicable to other data‑centric agentic tasks such as data cleaning, transformation, and information extraction, where reusable reasoning paths are essential for scaling to large, heterogeneous databases. In summary, AgentSM offers a scalable, stable, and efficient solution by turning agentic experience into an interpretable, retrievable knowledge base, thereby bridging the gap between research benchmarks and real‑world enterprise analytics.