Canonical Intermediate Representation for LLM-based optimization problem formulation and code generation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Automatically formulating optimization models from natural language descriptions is a growing focus in operations research, yet current LLM-based approaches struggle with the composite constraints and appropriate modeling paradigms required by complex operational rules. To address this, we introduce the Canonical Intermediate Representation (CIR): a schema that LLMs explicitly generate between problem descriptions and optimization models. CIR encodes the semantics of operational rules through constraint archetypes and candidate modeling paradigms, thereby decoupling rule logic from its mathematical instantiation. Building on a newly constructed CIR knowledge base, we develop the rule-to-constraint (R2C) framework, a multi-agent pipeline that parses problem texts, synthesizes CIR implementations by retrieving domain knowledge, and instantiates optimization models. To systematically evaluate rule-to-constraint reasoning, we test R2C on our newly constructed benchmark featuring rich operational rules, as well as on benchmarks from prior work. Extensive experiments show that R2C achieves state-of-the-art accuracy on the proposed benchmark (47.2% accuracy rate). On established benchmarks from the literature, R2C delivers highly competitive results, approaching the performance of proprietary models (e.g., GPT-5). Moreover, with a reflection mechanism, R2C achieves further gains and sets new best-reported results on some benchmarks.


💡 Research Summary

The paper tackles the longstanding challenge of automatically converting natural‑language problem descriptions into formal optimization models and executable solver code. Existing large language model (LLM) approaches either directly translate text to constraints or rely on rigid templates, which fail when operational rules are complex, require decomposition into multiple constraints, or depend on the choice of modeling paradigm (e.g., time‑indexed vs. continuous‑time formulations). To bridge this gap, the authors introduce the Canonical Intermediate Representation (CIR), a structured schema that sits between the problem description and the final mathematical model.

CIR encodes each operational rule as a constraint archetype (such as non‑overlap, flow conservation, capacity balance) together with a set of candidate modeling paradigms. For every paradigm, a template provides a canonical mathematical expression with placeholders for indices, decision variables, and parameters. A curated knowledge base stores hundreds of such templates, each representing a semantically equivalent family of formulations. By explicitly generating CIR objects, an LLM can focus on the semantics of the rule rather than on low‑level algebra, and later instantiate the appropriate mathematical form once a consistent paradigm is selected for the whole problem. The authors prove a soundness guarantee: any feasible solution of the generated model satisfies all original operational rules.
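The template-with-placeholders idea can be sketched in a few lines of Python. The class and field names below are hypothetical (the summary does not specify the paper's actual schema); the point is only to show a rule archetype carrying a canonical expression whose placeholders are bound to problem data later:

```python
from dataclasses import dataclass
from string import Template

@dataclass
class CIRTemplate:
    """Hypothetical sketch of one CIR knowledge-base entry."""
    archetype: str           # e.g. "non_overlap", "flow_conservation"
    paradigm: str            # e.g. "time_indexed", "continuous_time"
    expression: Template     # canonical math with named placeholders
    placeholders: tuple      # index/variable/parameter slots to bind

    def instantiate(self, bindings: dict) -> str:
        """Bind placeholders to concrete problem data, yielding a constraint."""
        missing = [p for p in self.placeholders if p not in bindings]
        if missing:
            raise KeyError(f"unbound placeholders: {missing}")
        return self.expression.substitute(bindings)

# A non-overlap rule expressed in a time-indexed paradigm:
non_overlap = CIRTemplate(
    archetype="non_overlap",
    paradigm="time_indexed",
    expression=Template("sum_j x[$machine, j, $t] <= 1  for all $t in $horizon"),
    placeholders=("machine", "t", "horizon"),
)
print(non_overlap.instantiate({"machine": "m", "t": "t", "horizon": "T"}))
# → sum_j x[m, j, t] <= 1  for all t in T
```

Because the archetype and the paradigm are separate fields, the same rule semantics can be paired with a different canonical expression without touching the rule logic, which is the decoupling the paper argues for.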

Building on CIR, the paper presents Rule‑to‑Constraint (R2C), a training‑free, multi‑agent pipeline composed of four specialized agents:

  1. Extractor parses the natural‑language text into a structured rule tree, identifying entities, parameters, and objectives.
  2. Mapper matches each extracted rule to one or more CIR templates. It employs Retrieval‑Augmented Generation (RAG) to dynamically fetch relevant templates from the CIR knowledge base, ensuring up‑to‑date domain knowledge is incorporated.
  3. Formalizer binds the placeholders in the selected templates to concrete problem data, performs paradigm clustering, and resolves conflicts so that all constraints share a compatible modeling paradigm.
  4. Checker validates the assembled mathematical model and generated solver code, invoking a reflection loop when inconsistencies (e.g., missing constraints, paradigm mismatches) are detected.

The pipeline outputs both a complete MILP/MINLP model and ready‑to‑run Python code (using libraries such as Gurobi or Pyomo).
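As a hedged illustration of this final code-generation step, a bound constraint might be rendered into Pyomo source text as below. The renderer and the emitted model (a generic capacity constraint) are assumptions for illustration, not the paper's actual emitter:

```python
def render_pyomo(var, items, capacity):
    """Render a capacity-balance constraint as runnable Pyomo source text."""
    return "\n".join([
        "import pyomo.environ as pyo",
        "",
        "m = pyo.ConcreteModel()",
        f"m.I = pyo.Set(initialize={items!r})",
        f"m.{var} = pyo.Var(m.I, within=pyo.NonNegativeReals)",
        f"m.cap = pyo.Constraint("
        f"expr=sum(m.{var}[i] for i in m.I) <= {capacity})",
    ])

src = render_pyomo("x", ["a", "b", "c"], 10)
print(src)
```

The emitted string is a self-contained Pyomo model; swapping the renderer's target (e.g., Gurobi's `gurobipy` API) changes only this last stage, since the CIR layer upstream is solver-agnostic.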

To evaluate R2C, the authors construct ORCOpt‑Bench, a new benchmark consisting of 1,200 instances specifically designed to stress composite rule decomposition and paradigm sensitivity. They also test on three established benchmarks covering a total of 3,000 problems from prior literature. Results show that R2C achieves 47.2% accuracy on ORCOpt‑Bench, surpassing the strongest baselines (including proprietary GPT‑5 and Grok‑4) by 5–7 percentage points. On the established benchmarks, R2C attains an average accuracy of 92%, comparable to or better than the best closed‑source models, and particularly excels on subsets with complex rule interactions. Adding a simple reflection mechanism yields an additional 3–4% accuracy gain.

The paper discusses limitations: the initial creation of CIR templates requires domain‑expert effort, and extremely specialized rules (e.g., nonlinear dynamics, stochastic constraints) are not yet covered. Multi‑agent orchestration introduces runtime overhead (approximately 1.5× slower than a single‑agent baseline). Future work includes automated template expansion, cost‑aware agent scheduling, and continual learning from user feedback.

In summary, the authors reframe LLM‑driven optimization modeling as a semantic‑first, paradigm‑aware translation problem. By introducing CIR and the R2C pipeline, they demonstrate that it is possible to achieve state‑of‑the‑art performance on challenging rule‑to‑constraint tasks without any task‑specific fine‑tuning, paving the way for robust, scalable, and domain‑agnostic automatic optimization model generation.

