Sample-Efficient Optimization over Generative Priors via Coarse Learnability


In zeroth-order optimization, we seek to minimize a function $d(\cdot)$, which may encode combinatorial feasibility, using only function evaluations. We focus on the setting where solutions must also satisfy qualitative constraints or conform to a complex prior distribution. To address this, we introduce a new framework in which such constraints are represented by an initial generative prior $L(\cdot)$, for example, a Large Language Model (LLM). The objective is to find solutions $s$ that minimize $d(s)$ while having high probability under $L(s)$, effectively sampling from a target distribution proportional to $L(s) \cdot e^{-T \cdot d(s)}$ for a temperature parameter $T$. While this framework aligns with classical Model-Based Optimization (e.g., the Cross-Entropy method), existing theory is ill-suited for deriving sample complexity bounds in black-box deep generative models. We therefore propose a novel learning assumption, which we term \emph{coarse learnability}, where an agent with access to a polynomial number of samples can learn a model whose point-wise density approximates the target within a polynomial factor. Leveraging this assumption, we design an iterative algorithm that employs a Metropolis-Hastings correction to provably approximate the target distribution using a polynomial number of samples. To the best of our knowledge, this is one of the first works to establish such sample-complexity guarantees for model-based optimization with deep generative priors. We provide two lines of evidence supporting the coarse learnability assumption. Theoretically, we show that maximum likelihood estimation naturally induces the required coverage properties, holding both for standard exponential families and for misspecified models. Empirically, we demonstrate that LLMs can adapt their learned distributions to zeroth-order feedback to solve combinatorial optimization problems.


💡 Research Summary

The paper tackles a class of zeroth‑order optimization problems in which a cost function d(s) (measuring, for example, distance from global feasibility) can only be accessed through function evaluations, while additional qualitative or contextual constraints are encoded in a generative prior L(s) – such as a large language model (LLM). The goal is to sample from the target distribution

 p_T(s) ∝ L(s)·e^{−T·d(s)}

where T is a temperature parameter that balances adherence to the prior against minimization of d. This formulation captures many real‑world tasks (e.g., scenic route planning, conference scheduling) where local, user‑specified preferences are naturally modeled by an LLM, but global combinatorial feasibility must be enforced algorithmically.
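In code, the unnormalized log-density of this target is just the prior's log-probability minus a temperature-scaled cost. The following minimal sketch assumes stand-in callables `prior_logprob` and `cost` for L and d; the toy scoring rules are invented purely for illustration.

```python
# Sketch of the target p_T(s) ∝ L(s) · exp(-T · d(s)), in log space.
# `prior_logprob` stands in for log L and `cost` for the zeroth-order
# oracle d; both are assumptions of this example, not the paper's API.

def log_target(s, prior_logprob, cost, T):
    """Unnormalized log-density of the target distribution."""
    return prior_logprob(s) - T * cost(s)

# Toy instance: strings scored by length (prior) and mismatch count (cost).
prior = lambda s: -0.5 * len(s)              # stand-in for log L(s)
cost = lambda s: sum(c != 'a' for c in s)    # stand-in for d(s)

print(log_target("aba", prior, cost, T=2.0))  # -1.5 - 2.0 = -3.5
```

Raising T pushes the sampler toward low-cost solutions; T = 0 recovers the prior exactly.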

Existing model‑based optimization (MBO) methods such as the Cross‑Entropy method or Model Reference Adaptive Search (MRAS) provide asymptotic convergence guarantees under strong structural assumptions (e.g., natural exponential families). Those guarantees break down for deep generative models, which are black‑box learners without known parametric form, and no finite‑sample complexity bounds are known.

Key contribution – coarse learnability.
The authors introduce a new statistical assumption called coarse learnability. Informally, if an agent draws a polynomial number of samples from a distribution that already overlaps substantially with the target, it can learn a new model whose point‑wise density approximates the target within a polynomial factor, except on an exponentially small error set. This is a much weaker requirement than exact density estimation; it only demands sufficient coverage of the target’s support.
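Stated operationally: except on a small set, the target density must not exceed the learned density by more than a polynomial factor. A minimal Monte-Carlo check of that coverage property is sketched below with invented toy densities; the function and distribution names are this example's assumptions, not the paper's notation.

```python
import random

# Estimate the fraction of target mass on which a learned model q fails
# to cover the target p within a multiplicative bound C, i.e. p(s) > C·q(s).
# Small fractions indicate the "coarse" coverage condition holds.

def coarse_cover_fraction(sample_p, p_pdf, q_pdf, bound, n=10_000, seed=0):
    rng = random.Random(seed)
    bad = 0
    for _ in range(n):
        s = sample_p(rng)
        if p_pdf(s) > bound * q_pdf(s):   # q misses s within the bound
            bad += 1
    return bad / n

# Toy: target uniform on {0..9}; model puts half its mass on 0 and
# spreads the rest, so it still covers every state within a factor of 2.
p_pdf = lambda s: 0.1
q_pdf = lambda s: 0.5 if s == 0 else 0.5 / 9
sample_p = lambda rng: rng.randrange(10)

print(coarse_cover_fraction(sample_p, p_pdf, q_pdf, bound=2.0))  # 0.0
```

With `bound=1.0` the same model fails on roughly 90% of the target mass, which illustrates why the polynomial slack in the definition matters.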

Algorithm – ALDRIFT.
Building on this assumption, the paper proposes ALDRIFT (Algorithm‑LLM Driven Iterated Fine‑Tuning). The algorithm maintains a temperature τ that is gradually increased from 0 to the desired T. At each temperature it defines an intermediate target

 p_τ(s) ∝ L(s)·e^{−τ·d(s)}.

The current LLM serves as a proposal distribution q(s). A Metropolis–Hastings (MH) step is performed using only relative likelihood ratios to obtain exact samples from p_τ, thereby avoiding the need to estimate the normalizing constant – a task that would otherwise require exponentially many samples when L and p_T are far apart. The MH samples are treated as “elite” examples; the LLM is fine‑tuned on them, improving its proposal quality for the next temperature. This loop repeats until τ = T.
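The MH correction in this loop is simple to write down: an independence sampler accepts a proposal s′ with probability min(1, [p(s′)/q(s′)] / [p(s)/q(s)]), so only unnormalized density ratios are ever evaluated. The sketch below is illustrative, with invented names and a toy target, not the paper's code.

```python
import math
import random

# Independence Metropolis-Hastings targeting exp(log_p), unnormalized.
# `sample_q` / `log_q` play the role of the current model (the LLM);
# the normalizing constant of the target never appears.

def mh_independence(log_p, sample_q, log_q, steps, rng):
    s = sample_q(rng)
    chain = []
    for _ in range(steps):
        s_new = sample_q(rng)
        # Acceptance ratio [p(s')/q(s')] / [p(s)/q(s)], in log space.
        log_a = (log_p(s_new) - log_q(s_new)) - (log_p(s) - log_q(s))
        if rng.random() < math.exp(min(0.0, log_a)):
            s = s_new
        chain.append(s)
    return chain

# Toy check: target on {0, 1} with p(1)/p(0) = e^2, uniform proposal.
rng = random.Random(1)
chain = mh_independence(lambda s: 2.0 * s,            # unnormalized log p
                        lambda r: r.randrange(2),     # sample from q
                        lambda s: math.log(0.5),      # log q
                        20_000, rng)
print(sum(chain) / len(chain))   # ≈ e^2 / (1 + e^2) ≈ 0.88
```

Even with a poorly matched proposal, the accepted chain has the correct stationary distribution; coverage quality only affects how many proposals are wasted.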

Theoretical results.
Under the coarse learnability assumption, the authors prove:

  1. Each temperature step requires only a polynomial number of oracle calls to d and a polynomial number of samples from the current model to achieve a desired approximation of p_τ.
  2. The entire ALDRIFT procedure converges to the final target distribution p_T with overall polynomial sample complexity.

The proofs rely on two observations. First, the MH correction guarantees that the distribution of accepted samples matches p_τ exactly, regardless of how far q is from p_τ. Second, coarse learnability ensures that after fine‑tuning, the new proposal q′ covers the support of p_τ well enough to keep the next MH step efficient. Thus the algorithm tolerates imperfect learning (“coarse” updates) while still making provable progress.
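Putting the two observations together, the whole anneal-sample-fine-tune loop fits in a few lines for a toy discrete problem. In the hedged sketch below, the "model" is a categorical distribution over ten states, "fine-tuning" is smoothed frequency estimation on the MH-accepted elites, and the schedule and names are invented for illustration, not the paper's ALDRIFT implementation.

```python
import math
import random

random.seed(0)
STATES = list(range(10))
L = {s: 1 / 10 for s in STATES}           # fixed generative prior (uniform)
d = lambda s: abs(s - 7)                  # zeroth-order cost oracle

def mh(log_p, q, steps):
    """Independence MH with categorical proposal q: state -> probability."""
    keys, probs = list(q), list(q.values())
    s = random.choices(keys, probs)[0]
    out = []
    for _ in range(steps):
        s2 = random.choices(keys, probs)[0]
        log_a = (log_p(s2) - math.log(q[s2])) - (log_p(s) - math.log(q[s]))
        if random.random() < math.exp(min(0.0, log_a)):
            s = s2
        out.append(s)
    return out

model = dict(L)                           # proposal starts at the prior
for tau in [0.5, 1.0, 2.0, 4.0]:          # anneal up to the final T = 4
    log_p = lambda s, t=tau: math.log(L[s]) - t * d(s)  # unnormalized log p_tau
    elites = mh(log_p, model, 5000)
    counts = {s: 1 for s in STATES}       # add-one smoothing keeps full support
    for s in elites:
        counts[s] += 1
    total = sum(counts.values())
    model = {s: c / total for s, c in counts.items()}   # "fine-tune" on elites

print(max(model, key=model.get))          # mass concentrates at the minimizer 7
```

The smoothing step is the toy analogue of coarse learnability: the fine-tuned model never loses support, so the next MH step stays efficient even when the frequency estimates are imperfect.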

Support for the assumption.
The paper provides both theoretical and empirical evidence that coarse learnability is plausible. Theoretically, it shows that maximum‑likelihood estimation (MLE) naturally yields the required coverage in two regimes:

  • In well‑specified exponential families, MLE’s consistency gives pointwise density ratios bounded by a polynomial in the sample size.
  • In misspecified settings (e.g., a unimodal model trying to approximate a multimodal target), the moment‑matching property forces the learned model to form a “coverage envelope” that contains the target’s support, satisfying the coarse condition even though exact density approximation is impossible.
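The misspecified case is easy to verify numerically. In the sketch below (my construction, not the paper's proof), a single Gaussian is fit by MLE to a bimodal mixture; moment matching inflates its variance to roughly 1 + 3² = 10, so the density ratio target/model stays O(1) over the target's support even though the shapes never match.

```python
import math
import random

rng = random.Random(0)
# Bimodal target: equal mixture of N(-3, 1) and N(+3, 1).
xs = [rng.gauss(-3 if rng.random() < 0.5 else 3, 1.0) for _ in range(50_000)]

# Gaussian MLE reduces to matching the first two sample moments.
mu = sum(xs) / len(xs)
var = sum((x - mu) ** 2 for x in xs) / len(xs)   # close to 1 + 3**2 = 10

def normal_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def target_pdf(x):
    return 0.5 * normal_pdf(x, -3, 1) + 0.5 * normal_pdf(x, 3, 1)

# Worst-case ratio target/model over the target's sampled support stays
# moderate: the wide "coverage envelope" contains both modes.
ratio = max(target_pdf(x) / normal_pdf(x, mu, var) for x in xs)
print(round(var, 1), round(ratio, 1))
```

Exact density approximation is impossible here (the fitted model is unimodal), yet the bounded ratio is precisely what the coarse condition asks for.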

Empirically, the authors run ALDRIFT‑style experiments with GPT‑2 and larger LLMs on combinatorial tasks such as scenic route generation, conference session scheduling, and spanning‑tree construction. The LLM supplies local constraints (e.g., “scenic”, “duration limits”), while d encodes global feasibility (e.g., connectivity, total waiting time). After only a few fine‑tuning iterations, the models produce solutions that respect both local and global constraints, demonstrating that real‑world LLMs can act as coarse learners in practice.

Related work and positioning.
The authors contrast their approach with prior LLM‑algorithm hybrids that are largely empirical and lack formal guarantees. They also clarify the connection to MRAS and to reinforcement‑learning policy‑search methods that view the problem as an EM procedure; the novelty here is the explicit modeling of coarse learning errors and the use of MH to correct them. The work thus bridges a gap between theoretical MBO literature (which previously offered no finite‑sample bounds for deep generative models) and practical LLM‑based optimization pipelines.

Impact and future directions.
By establishing polynomial‑sample guarantees under a realistic, weak learnability condition, the paper provides one of the first rigorous foundations for using black‑box generative models (including LLMs) in zeroth‑order combinatorial optimization. It opens several avenues: (i) proving coarse learnability for transformer‑based models, (ii) extending the framework to continuous domains or mixed discrete‑continuous spaces, and (iii) integrating richer feedback signals (e.g., human preferences) into the MH‑guided fine‑tuning loop. Overall, the work offers a compelling blend of theory, algorithm design, and empirical validation, setting a new benchmark for hybrid AI systems that combine the expressive power of generative models with the rigor of classical optimization.

