Finite and Corruption-Robust Regret Bounds in Online Inverse Linear Optimization under M-Convex Action Sets


We study online inverse linear optimization, also known as contextual recommendation, where a learner sequentially infers an agent’s hidden objective vector from observed optimal actions over feasible sets that change over time. The learner aims to recommend actions that perform well under the agent’s true objective, and performance is measured by the regret, defined as the cumulative gap between the agent’s optimal values and those achieved by the learner’s recommended actions. Prior work has established a regret bound of $O(d\log T)$, as well as a finite but exponentially large bound of $\exp(O(d\log d))$, where $d$ is the dimension of the optimization problem and $T$ is the time horizon, while a regret lower bound of $\Omega(d)$ is known (Gollapudi et al. 2021; Sakaue et al. 2025). Whether a finite regret bound polynomial in $d$ is achievable has remained an open question. We partially resolve this by showing that when the feasible sets are M-convex (a broad class that includes matroids), a finite regret bound of $O(d\log d)$ is possible. We achieve this by combining a structural characterization of optimal solutions on M-convex sets with a geometric volume argument. Moreover, we extend our approach to feedback that is adversarially corrupted in up to $C$ rounds. We obtain a regret bound of $O((C+1)d\log d)$ without prior knowledge of $C$, by monitoring directed graphs induced by the observed feedback to detect corruptions adaptively.


💡 Research Summary

This paper addresses the open problem of achieving a finite regret bound that scales polynomially with the dimension d in online inverse linear optimization (also called contextual recommendation). In the online setting, an agent with an unknown linear objective vector w* repeatedly solves a linear program max⟨w*,x⟩ subject to x∈Xₜ, where the feasible set Xₜ can change each round. The learner observes the agent’s chosen optimal actions (and the feasible sets) and must produce, for each round, an estimate ˆwₜ and a recommended action ˆxₜ that maximizes ⟨ˆwₜ,x⟩ over Xₜ. Regret is defined as the cumulative gap R_T = ∑ₜ⟨w*,xₜ − ˆxₜ⟩. Prior work gave O(d log T) bounds, an exponential‑in‑d finite bound exp(O(d log d)), and a lower bound Ω(d), leaving it unclear whether a finite, d‑polynomial bound independent of T exists.
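The online protocol above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names (`best_action`, `run_rounds`, `estimate_fn`) and the representation of each feasible set as an explicit list of action vectors are our own simplifying assumptions.

```python
# Sketch of the online inverse linear optimization protocol and the regret
# R_T = sum_t <w*, x_t - x_hat_t>, assuming each round's feasible set X_t is a
# small finite list of vectors. All names here are illustrative.

def best_action(w, feasible):
    """Return an action maximizing <w, x> over the feasible set."""
    return max(feasible, key=lambda x: sum(wi * xi for wi, xi in zip(w, x)))

def run_rounds(w_star, feasible_sets, estimate_fn):
    """Play the rounds; estimate_fn maps the observation history to w_hat_t."""
    history, regret = [], 0.0
    for X_t in feasible_sets:
        w_hat = estimate_fn(history)        # learner's current estimate of w*
        x_hat = best_action(w_hat, X_t)     # learner's recommended action
        x_t = best_action(w_star, X_t)      # agent's optimal action (observed)
        regret += sum(ws * (a - b) for ws, a, b in zip(w_star, x_t, x_hat))
        history.append((X_t, x_t))
    return regret
```

For instance, with `w_star = (2, 1)` and a learner whose estimate always points the wrong way, each round over the feasible set `[(1, 0), (0, 1)]` contributes regret 1.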

The authors focus on the case where every feasible set Xₜ is M‑convex, a discrete convexity notion that includes matroids and many integer‑lattice extensions (e.g., bounded‑multiplicity m‑sets). A key structural result (Proposition 2.4) shows that for an M‑convex set, a point x is optimal for a weight vector w iff w respects the exchange order: for any feasible exchange of one unit from coordinate i to j (i.e., x−e_i+e_j∈Xₜ), we must have w(i) ≥ w(j). This property yields far richer information from each observed optimal action than in general polyhedral sets.
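As a concrete illustration of this exchange characterization, the following sketch checks optimality on a small set given explicitly as vectors (e.g., matroid bases encoded as 0/1 vectors). The function name and the brute-force enumeration of exchanges are our own simplifications, not the paper's procedure.

```python
# Brute-force check of the exchange characterization (cf. Proposition 2.4):
# on an M-convex set, x is optimal for w iff every feasible single-unit
# exchange i -> j (i.e., x - e_i + e_j stays feasible) satisfies w[i] >= w[j].

def is_optimal_by_exchange(x, w, feasible):
    feas = set(map(tuple, feasible))
    d = len(x)
    for i in range(d):
        for j in range(d):
            if i == j:
                continue
            y = list(x)
            y[i] -= 1                        # move one unit from coordinate i
            y[j] += 1                        # to coordinate j
            if tuple(y) in feas and w[i] < w[j]:
                return False                 # violated exchange constraint
    return True
```

On the bases of the uniform matroid U(2,3), i.e., all 0/1 vectors with two ones, and `w = (3, 2, 1)`, the test accepts the true maximizer `(1, 1, 0)` and rejects `(0, 1, 1)`, matching direct maximization of the inner product.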

The paper first presents a simple O(d²) regret algorithm that updates a set of possible rank orders of w* using the exchange constraints. Then, by introducing a geometric volume argument, the authors refine the method to achieve a finite O(d log d) regret bound that does not depend on the time horizon T. The idea is to maintain a convex region Vₜ in ℝᵈ representing all weight vectors consistent with the observations so far. Each new optimal action yields a collection of half‑space constraints derived from the exchange property; intersecting Vₜ with these half‑spaces shrinks its volume by a factor of at least (1 − 1/d). Since such a multiplicative shrink can occur only O(d log d) times before the volume becomes polynomially small, the total regret is bounded by O(d log d). This argument mirrors classic volume‑reduction analyses in online learning but crucially exploits the M‑convex exchange structure to obtain the stronger per‑round reduction.
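A quick numeric sanity check of the counting step, assuming the per-round shrink factor (1 − 1/d) stated above: after k informative rounds the surviving volume fraction is at most (1 − 1/d)^k ≤ e^(−k/d), so k on the order of d·log d rounds already drives it below an inverse-polynomial threshold.

```python
import math

# Sanity check of the volume-reduction counting, under the assumption that
# each informative round multiplies the consistent region's volume by at most
# (1 - 1/d).  Then (1 - 1/d)^k <= exp(-k/d), so k = c * d * ln(d) rounds
# suffice to push the surviving fraction below d^(-c).

def volume_fraction(d, k):
    """Upper bound on the volume remaining after k shrinks by (1 - 1/d)."""
    return (1.0 - 1.0 / d) ** k

d, c = 50, 3
k = int(c * d * math.log(d))          # O(d log d) informative rounds
assert volume_fraction(d, k) < d ** (-c)
```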

The authors also consider adversarial corruption: in up to C rounds, the agent’s reported action may be arbitrary (not optimal). They design a detection mechanism based on directed graphs built from the observed exchange relations. If the graph becomes cyclic, a contradiction in the implied order of w* is detected, indicating a corrupted round. Upon detection, the algorithm restarts; because there can be at most C corruptions, there are at most C + 1 restarts. Each clean segment incurs O(d log d) regret, leading to an overall bound of O((C + 1) d log d). Importantly, the algorithm does not need prior knowledge of C; corruption is identified online via the acyclicity test.
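The acyclicity test at the heart of this detection mechanism is ordinary cycle detection on a directed graph. The sketch below reads each edge i → j as the strict relation w*(i) > w*(j) for illustration, so that a cycle yields the contradiction w*(i) > w*(i); the paper's exact edge semantics and restart policy may differ from this simplification.

```python
# Minimal sketch of the corruption-detection idea: exchange observations induce
# directed edges over the d coordinates; a directed cycle contradicts any
# consistent ordering of w*, so some round in the segment must be corrupted.
# Edge semantics (strict inequalities) are a simplifying assumption here.

def has_cycle(num_nodes, edges):
    """Detect a directed cycle via iterative DFS with three-color marking."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
    color = [0] * num_nodes                  # 0=unvisited, 1=on stack, 2=done
    for start in range(num_nodes):
        if color[start]:
            continue
        color[start] = 1
        stack = [(start, iter(adj[start]))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if color[nxt] == 1:          # back edge: cycle found
                    return True
                if color[nxt] == 0:
                    color[nxt] = 1
                    stack.append((nxt, iter(adj[nxt])))
                    advanced = True
                    break
            if not advanced:                 # all neighbors explored
                color[node] = 2
                stack.pop()
    return False
```

A chain of consistent observations (0 → 1 → 2) stays acyclic, while a corrupted round closing the loop (adding 2 → 0) is flagged immediately.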

On the lower‑bound side, the paper adapts the Ω(d) information‑theoretic argument from prior work to the M‑convex setting, showing that any algorithm must suffer at least linear regret in d, even when the feasible sets are M‑convex. Hence the O(d log d) upper bound is tight up to a logarithmic factor.

The contributions are summarized as follows:

  1. An algorithm achieving a finite, dimension‑polynomial regret O(d log d) for online inverse linear optimization over M‑convex action sets.
  2. An extension that is robust to up to C adversarially corrupted feedback rounds, with regret O((C + 1) d log d) and no need to know C in advance.
  3. A matching Ω(d) lower bound for the M‑convex case, establishing near‑optimality of the proposed bound.
  4. A discussion of why M‑convex sets are both broad (covering matroids, integer‑lattice extensions) and structurally rich enough to enable the volume‑reduction technique.

The paper’s results have practical relevance for combinatorial recommendation systems, network design, and resource allocation problems where the feasible actions are naturally matroid‑like and feedback may be noisy or malicious. By removing the dependence on the time horizon and providing corruption robustness, the proposed methods promise scalable, reliable performance in long‑running online platforms. Future directions include extending the approach to other discrete convexity classes (e.g., L‑convex), handling partial or bandit feedback, and empirical validation on real‑world recommendation datasets.

