Abacus: A Cost-Based Optimizer for Semantic Operator Systems

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

LLMs enable an exciting new class of data processing applications over large collections of unstructured documents. Several new programming frameworks enable developers to build these applications by composing them out of semantic operators: a declarative set of AI-powered data transformations with natural language specifications. These include LLM-powered maps, filters, joins, etc. used for document processing tasks such as information extraction, summarization, and more. While systems of semantic operators have achieved strong performance on benchmarks, they can be difficult to optimize. An optimizer for this setting must determine how to physically implement each semantic operator in a way that optimizes the system globally. Existing optimizers are limited in the number of optimizations they can apply, and most (if not all) cannot optimize system quality, cost, or latency subject to constraint(s) on the other dimensions. In this paper we present Abacus, an extensible, cost-based optimizer which searches for the best implementation of a semantic operator system given a (possibly constrained) optimization objective. Abacus estimates operator performance by leveraging a minimal set of validation examples, prior beliefs about operator performance, and/or an LLM judge. We evaluate Abacus on document processing workloads in the biomedical and legal domains (BioDEX; CUAD) and multi-modal question answering (MMQA). We demonstrate that, on average, systems optimized by Abacus achieve 6.7%–39.4% better quality and are 10.8x cheaper and 3.4x faster than the next best system.


💡 Research Summary

The paper introduces Abacus, a cost‑based optimizer designed for semantic operator systems that power large‑scale, LLM‑driven data processing pipelines over unstructured documents. Modern programming frameworks such as Palimpzest, LOTUS, and DocETL let developers compose “semantic operators” (LLM‑powered maps, filters, joins, etc.) in a declarative fashion. While these frameworks have shown strong benchmark performance, they lack a sophisticated optimizer capable of balancing three critical dimensions: output quality, monetary cost, and latency, especially under user‑specified constraints.

Problem Statement – An optimizer must decide how to physically implement each logical semantic operator. Choices include which LLM model to invoke, whether to use ensembles, context‑reduction techniques, temperature settings, and many other hyper‑parameters. The combinatorial explosion of possible physical implementations makes exhaustive search infeasible, and traditional relational query optimizers cannot be directly applied because semantic operators lack reliable cardinality or selectivity statistics.
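The scale of this combinatorial explosion can be made concrete with a small sketch. The dimension names and values below are illustrative stand-ins, not the paper's actual configuration space:

```python
from itertools import product

# Hypothetical knobs for ONE semantic operator; real frameworks expose
# many more (prompt variants, temperature, retry policies, ...).
models = ["gpt-4o-mini", "gpt-4o", "llama-3-70b"]
ensemble_sizes = [1, 3, 5]                       # number of ensembled LLM calls
context_strategies = ["full", "chunked", "retrieval"]

per_operator = len(list(product(models, ensemble_sizes, context_strategies)))
print(per_operator)          # 27 physical implementations for a single operator

# A pipeline of k operators has per_operator ** k candidate physical plans:
k = 5
print(per_operator ** k)     # 14348907 plans -- exhaustive evaluation is infeasible
```

Even this toy space of three knobs yields over fourteen million plans for a five-operator pipeline, which is why Abacus samples rather than enumerates.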

Core Contributions

  1. Infinite‑armed bandit sampling – Abacus treats the discovery of high‑quality physical operators as an infinite‑armed bandit problem. It adapts an Upper‑Confidence‑Bound (UCB) algorithm to explore the Pareto frontier of cost vs. quality (or latency) while respecting a limited sampling budget. Prior beliefs about operator performance can be injected to bias early exploration toward promising candidates, dramatically reducing the number of required LLM calls.
  2. Operator‑wise performance decomposition – Instead of evaluating every full plan end‑to‑end, Abacus measures each sampled physical operator’s cost, latency, and quality on a small validation set. It then approximates a plan’s overall metrics as the sum (or simple composition) of its constituent operators’ metrics, enabling rapid evaluation of a combinatorially large plan space.
  3. Pareto‑Cascades dynamic programming – Traditional Cascades optimizers maintain only the best cost for each sub‑plan, which is insufficient for constrained optimization. Abacus extends this idea by keeping the entire Pareto frontier of sub‑plans (cost‑quality‑latency triples) during the DP process. This allows the final selection to satisfy arbitrary constraints (e.g., “maximise quality subject to cost ≤ $1”).
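The bandit-style sampling in contribution 1 can be sketched with a minimal UCB1 loop. The candidate set, quality oracle, and exploration constant below are hypothetical simplifications; Abacus's actual algorithm handles an effectively infinite arm space and incorporates prior beliefs:

```python
import math

def ucb_sample(candidates, measure_quality, budget, c=1.4):
    """Spend `budget` validation evaluations across candidate operators,
    balancing exploration and exploitation via the UCB1 score."""
    counts = {a: 0 for a in candidates}
    means = {a: 0.0 for a in candidates}
    for t in range(1, budget + 1):
        untried = [a for a in candidates if counts[a] == 0]
        if untried:
            # Pull each arm at least once before scoring.
            arm = untried[0]
        else:
            # Pick the arm maximizing mean quality plus an exploration bonus.
            arm = max(candidates,
                      key=lambda a: means[a] + c * math.sqrt(math.log(t) / counts[a]))
        q = measure_quality(arm)            # one validation-set evaluation
        counts[arm] += 1
        means[arm] += (q - means[arm]) / counts[arm]
    return max(candidates, key=lambda a: means[a])
```

With a fixed budget, most evaluations concentrate on the highest-quality arms, which is the effect the paper relies on to keep upfront sampling cost low.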

System Workflow – The user supplies an AI program (a DAG of semantic operators), an optimization objective (unconstrained or with constraints), the input dataset, and optionally a tiny validation set, prior beliefs, or an LLM judge for quality assessment. Abacus compiles the program into a logical plan, enumerates all valid physical implementations via rule‑based transformation and implementation rules, samples a subset of operators using the bandit strategy, builds a cost model from the validation runs, and finally runs Pareto‑Cascades to return the Pareto‑optimal physical plan.
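The final selection step can be illustrated with a small sketch of Pareto filtering plus constrained selection over estimated (cost, quality) plan tuples. The plan values are invented for illustration; the real Pareto‑Cascades algorithm maintains frontiers per sub‑plan during dynamic programming rather than over whole plans:

```python
def pareto_frontier(plans):
    """Plans are (cost, quality) pairs; keep those not dominated by any other
    plan (i.e., no plan is both cheaper-or-equal and better-or-equal)."""
    return [(c, q) for c, q in plans
            if not any(c2 <= c and q2 >= q and (c2, q2) != (c, q)
                       for c2, q2 in plans)]

def best_under_budget(plans, max_cost):
    """Maximize quality subject to a hard cost constraint."""
    feasible = [p for p in pareto_frontier(plans) if p[0] <= max_cost]
    return max(feasible, key=lambda p: p[1]) if feasible else None

# Illustrative (cost in $, quality in [0, 1]) estimates for candidate plans.
plans = [(0.10, 0.55), (0.40, 0.70), (0.90, 0.72), (2.50, 0.85), (0.40, 0.50)]
print(best_under_budget(plans, max_cost=1.0))   # (0.9, 0.72)
```

Keeping the whole frontier, rather than a single best-cost plan, is what lets the optimizer answer constrained queries like "maximize quality subject to cost ≤ $1" after the fact.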

Evaluation – Experiments cover three domains: biomedical literature extraction (BioDEX), legal contract analysis (CUAD), and multimodal question answering (MMQA). Compared against the best existing optimizers (DocETL and LOTUS), Abacus achieves:

  • Quality improvements of 6.7 %–39.4 % (average 20.8 %).
  • Cost reductions of up to 10.8×.
  • Latency reductions of up to 3.4×.

When a hard cost budget of $1 is imposed, Abacus still attains near‑optimal quality by selecting lighter‑weight models and context‑reduction strategies. With prior belief information, the same sampling budget yields up to 3.04× higher quality than a naïve random sampling baseline. An ablation study isolates each component (MAB sampling, Pareto‑Cascades, priors) and confirms that removing any of them degrades constrained‑optimization performance.

Limitations & Future Work – The current system still incurs an upfront sampling cost; more sophisticated adaptive budgeting could further reduce this overhead. The quality of prior beliefs heavily influences early exploration, suggesting a need for automated belief learning. Extensions of the transformation rules to include operator merging, multi‑join reordering, and cross‑modal optimizations are identified as promising directions. Additionally, integrating LLM judges for automatic label generation could eliminate the need for any human‑provided validation data.

Conclusion – Abacus is the first extensible, cost‑based optimizer that simultaneously handles quality, cost, and latency objectives (with optional constraints) for semantic operator systems. By marrying infinite‑armed bandit sampling, operator‑wise decomposition, and Pareto‑Cascades DP, it delivers substantial gains in effectiveness and efficiency across diverse real‑world workloads, establishing a new baseline for LLM‑driven data processing pipelines.

