Multi-Field Tool Retrieval

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Integrating external tools enables Large Language Models (LLMs) to interact with real-world environments and solve complex tasks. Given the growing scale of available tools, effective tool retrieval is essential to mitigate the constraints of LLMs' context windows and ensure computational efficiency. Existing approaches typically treat tool retrieval as a traditional ad-hoc retrieval task, matching user queries against the entire raw tool documentation. In this paper, we identify three fundamental challenges that limit the effectiveness of this paradigm: (i) the incompleteness and structural inconsistency of tool documentation; (ii) the significant semantic and granularity mismatch between user queries and technical tool documents; and, most importantly, (iii) the multi-aspect nature of tool utility, which involves distinct dimensions (such as functionality, input constraints, and output formats) that vary in form and importance. To address these challenges, we introduce Multi-Field Tool Retrieval, a framework designed to align user intent with tool representations through fine-grained, multi-field modeling. Experimental results show that our framework achieves state-of-the-art (SOTA) performance on five datasets and a mixed benchmark, exhibiting superior generalizability and robustness.


💡 Research Summary

The paper tackles a critical bottleneck in the deployment of large language models (LLMs) for real‑world tasks: efficiently selecting the most appropriate external tools from massive, heterogeneous repositories. While recent work has shown that integrating tools can dramatically extend LLM capabilities—enabling grounding, dynamic adaptation, and long‑term functional learning—the sheer scale of available tools makes naïve inclusion impossible due to limited context windows and computational constraints. Consequently, a dedicated “tool retrieval” step is required, but existing approaches simply treat each tool’s documentation as a single unstructured text and apply traditional ad‑hoc retrieval methods (BM25, dense retrievers, etc.).

The authors identify three fundamental shortcomings of this paradigm: (1) tool documentation is often incomplete, inconsistently formatted, and lacks a unified taxonomy; (2) user queries are high‑level, ambiguous, and frequently describe composite tasks, whereas tool docs are low‑level, technical, and atomic; (3) the utility of a tool for a given query is multi‑dimensional, involving functional fit, input constraints, output format, and usage scenarios. Treating the whole document as one relevance signal ignores these aspects and leads to sub‑optimal retrieval.

To overcome these issues, the authors propose Multi‑Field Tool Retrieval (MFTR), a framework that decomposes both tool representations and queries into multiple fine‑grained fields and aligns them separately. The core components are:

  1. Documentation Standardization – Using an LLM‑driven prompting pipeline, raw tool docs are transformed into a unified schema consisting of four fields: description (high‑level purpose), parameters (input names, types, semantics, required/optional flags), response (expected output behavior), and examples (representative usage scenarios). This step mitigates structural heterogeneity while avoiding over‑specification that could force the LLM to hallucinate missing details.

  2. Query Rewriting & Alignment – User queries are parsed and rewritten into a structured representation that mirrors the four‑field schema. For instance, “find the cheapest flight to New York tomorrow” is mapped to a description (“flight search”), parameters (origin, destination, date, price‑optimization flag), expected response (flight options), and example intents. This alignment bridges the semantic granularity gap between high‑level intents and low‑level API specifications.

  3. Multi‑Field Matching – Each field pair (query‑field vs. tool‑field) is scored independently using a semantic matcher (dense retriever, cross‑encoder, or other learned similarity model). Because different tasks may rely more heavily on certain fields (e.g., parameters for data‑driven tasks, examples for ambiguous intents), the framework learns adaptive weights for each field through end‑to‑end training, optimizing a ranking loss over relevance judgments. The final relevance score is a weighted sum of the per‑field scores.
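The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the dataclass fields follow the four-field schema from step 1, but a toy token-overlap cosine similarity stands in for the learned dense or cross-encoder matchers, and hand-set weights stand in for the end-to-end learned field weights. All tool names, field contents, and weight values are invented for the example.

```python
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class ToolDoc:
    """Standardized four-field representation, used for both tools and rewritten queries."""
    description: str  # high-level purpose
    parameters: str   # input names, types, flags
    response: str     # expected output behavior
    examples: str     # representative usage scenarios

FIELDS = ("description", "parameters", "response", "examples")

def cosine(a: str, b: str) -> float:
    """Toy similarity: cosine over whitespace-token counts (stand-in for a dense matcher)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def score(query: ToolDoc, tool: ToolDoc, weights: dict[str, float]) -> float:
    """Final relevance = weighted sum of per-field similarity scores."""
    return sum(weights[f] * cosine(getattr(query, f), getattr(tool, f)) for f in FIELDS)

# Structured query, i.e. the output of the rewriting step for the flight example.
query = ToolDoc(
    description="flight search cheapest",
    parameters="origin destination date price",
    response="list of flight options",
    examples="find the cheapest flight to New York tomorrow",
)

# Two illustrative standardized tool documents.
flight_tool = ToolDoc(
    description="search flights between airports",
    parameters="origin destination date sort_by price",
    response="json list of flight options with fares",
    examples="cheapest flight from Boston to New York",
)
weather_tool = ToolDoc(
    description="current weather forecast",
    parameters="city date",
    response="temperature and conditions",
    examples="what is the weather in Paris tomorrow",
)

# Fixed illustrative weights; the framework learns these via a ranking loss.
weights = {"description": 0.4, "parameters": 0.2, "response": 0.2, "examples": 0.2}

ranked = sorted([("flight", flight_tool), ("weather", weather_tool)],
                key=lambda kv: score(query, kv[1], weights), reverse=True)
print(ranked[0][0])  # prints "flight": the flight tool wins on every field
```

In the actual framework the per-field weights would be trained end-to-end against relevance judgments rather than fixed by hand, which is what lets the system shift emphasis between fields (e.g., parameters vs. examples) per query.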

The authors evaluate MFTR on five public tool‑retrieval datasets—Gorilla, APIBank, APIGen, and two additional corpora—and also construct a mixed benchmark that aggregates all sources to simulate a large‑scale, heterogeneous repository. Baselines include classic sparse methods (BM25), state‑of‑the‑art dense retrievers (Contriever, BGE‑Large), and tool‑specific models (EasyTool, ToolDE, COLT). Across all metrics (NDCG@10, MAP, Recall@k), MFTR consistently outperforms baselines, achieving average improvements of 8–12 percentage points. Ablation studies with field masking reveal that the contribution of each field varies by dataset, confirming that a single‑document approach discards valuable signals.

Key insights from the work are:

  • Standardization matters: Normalizing documentation into a concise, four‑field schema dramatically reduces noise and enables reliable cross‑tool comparison.
  • Fine‑grained alignment is essential: Mapping user intents to the same multi‑field structure resolves the semantic and granularity mismatch that hampers traditional retrieval.
  • Adaptive weighting captures multi‑aspect utility: Learning field importance allows the system to prioritize the most informative aspects for each query, reflecting the true multi‑dimensional nature of tool usefulness.

In summary, MFTR provides a principled solution to tool retrieval by treating tool documentation as a structured, multi‑aspect object rather than a monolithic text. By jointly standardizing docs, rewriting queries, and learning adaptive field weights, the framework achieves superior accuracy and robustness, paving the way for scalable, tool‑augmented LLM agents capable of operating in dynamic, real‑world environments.

