Upper-Linearizability of Online Non-Monotone DR-Submodular Maximization over Down-Closed Convex Sets
We study online maximization of non-monotone Diminishing-Return(DR)-submodular functions over down-closed convex sets, a regime where existing projection-free online methods suffer from suboptimal regret and limited feedback guarantees. Our main cont…
Authors: Yiyang Lu, Haresh Jadav, Mohammad Pedramfar, Ranveer Singh, Vaneet Aggarwal
Yiyang Lu (lu1202@purdue.edu), Purdue University, West Lafayette, IN, USA
Haresh Jadav (phd2301106001@iiti.ac.in), IIT Indore, MP, India
Mohammad Pedramfar (mohammad.pedramfar@mila.quebec), Mila - Quebec AI Institute/McGill University, Montreal, QC, Canada
Ranveer Singh (ranveer@iiti.ac.in), IIT Indore, MP, India
Vaneet Aggarwal (vaneet@purdue.edu), Purdue University, West Lafayette, IN, USA

Abstract

We study online maximization of non-monotone Diminishing-Return (DR)-submodular functions over down-closed convex sets, a regime where existing projection-free online methods suffer from suboptimal regret and limited feedback guarantees. Our main contribution is a new structural result showing that this class is $1/e$-linearizable under a carefully designed exponential reparametrization, scaling parameter, and surrogate potential, enabling a reduction to online linear optimization. As a result, we obtain $O(T^{1/2})$ static regret with a single gradient query per round and unlock adaptive and dynamic regret guarantees, together with improved rates under semi-bandit, bandit, and zeroth-order feedback. Across all feedback models, our bounds strictly improve the state of the art.

1 Introduction

Online optimization of submodular and DR-submodular functions has become a central primitive in machine learning, with applications in mean-field inference, revenue maximization, influence maximization, supply chain management, power network reconfiguration, and experimental design (Bian et al., 2019; Ito & Fujimaki, 2016; Gu et al., 2023; Aldrighetti et al., 2021; Mishra et al., 2017; Li et al., 2023). In these problems, an algorithm repeatedly selects actions from a convex domain while an adversary reveals a reward function, and performance is measured through notions of static, adaptive, or dynamic regret.
A long line of work has developed projection-free algorithms, typically based on Frank-Wolfe or boosting-style updates (Fazel & Sadeghi, 2023; Chen et al., 2018; Zhang et al., 2022; Pedramfar et al., 2023; Zhang et al., 2024). While most work on DR-submodular optimization addresses monotone objectives (Hassani et al., 2017; Fazel & Sadeghi, 2023; Chen et al., 2018; Zhang et al., 2022), non-monotone objectives play an important role in many applications such as price optimization, social network recommendation, and budget allocation (Ito & Fujimaki, 2016; Gu et al., 2023; Alon et al., 2012).

In this work, we study non-monotone DR-submodular maximization over down-closed convex sets, which remains particularly challenging. Down-closed domains include box constraints, knapsack polytopes, and intersections of matroids, and arise naturally in resource allocation and coverage problems. In this regime, even the optimal approximation ratio for online optimization remains open (Buchbinder & Feldman, 2024), while the best ratio achieved so far is $1/e$ (Thang & Srivastav, 2021; Zhang et al., 2023; Pedramfar et al., 2023). Further, for this regime, existing projection-free methods either require multiple oracle queries per round or only achieve suboptimal regret rates such as $O(T^{2/3})$ (Pedramfar et al., 2024), and essentially no results were known for adaptive or dynamic regret.

Table 1: Comparison of Regret Bounds for Non-Monotone DR-Submodular Maximization over Down-Closed Sets. Oracle denotes the feedback type ($\nabla F$: Gradient, $F$: Value). Entries in the Dynamic column are stated up to a factor of $\sqrt{1 + P_T}$. Our framework is the first to provide Adaptive and Dynamic regret guarantees while achieving optimal static regret in the $O(1)$ query regime. Thang & Srivastav (2021) achieves $O(T^{3/4})$ regret with $T^{3/4}$ queries per round when $\beta = 3/4$. Zhang et al. (2023) achieves $O(T^{1/2})$ regret with $T^{3/2}$ queries per round when $\beta = 3/2$, but only achieves $O(T^{4/5})$ regret with $O(1)$ queries.

| Oracle | Feedback | Reference | Approx. | Queries | Static | Adaptive | Dynamic ($\times \sqrt{1 + P_T}$) |
|---|---|---|---|---|---|---|---|
| $\nabla F$ | Full Info | Thang & Srivastav (2021) | $1/e$ | $T^{\beta}$, $\beta \in [0, 3/4]$ | $O(T^{1-\beta/3})$ | - | - |
| $\nabla F$ | Full Info | Zhang et al. (2023) | $1/e$ | $T^{\beta}$, $\beta \in [0, 3/2]$ | $O(T^{1-\beta/3})$ | - | - |
| $\nabla F$ | Full Info | Zhang et al. (2023) | $1/e$ | 1 | $O(T^{4/5})$ | - | - |
| $\nabla F$ | Full Info | Pedramfar et al. (2024) | $1/e$ | 1 | $O(T^{2/3})$ | - | - |
| $\nabla F$ | Full Info | This Paper | $1/e$ | 1 | $O(T^{1/2})$ (Prop. 1) | $O(T^{1/2})$ (Prop. 2) | $\tilde{O}(T^{1/2})$ (Prop. 3) |
| $\nabla F$ | Semi-Bandit | Zhang et al. (2023) | $1/e$ | 1 | $O(T^{4/5})$ | - | - |
| $\nabla F$ | Semi-Bandit | Pedramfar et al. (2024) | $1/e$ | 1 | $O(T^{3/4})$ | - | - |
| $\nabla F$ | Semi-Bandit | This Paper | $1/e$ | 1 | $O(T^{2/3})$ (Prop. 4) | $O(T^{2/3})$ (Prop. 4) | $\tilde{O}(T^{2/3})$ (Prop. 7) |
| $F$ | Full Info | Pedramfar et al. (2024) | $1/e$ | 1 | $O(T^{4/5})$ | - | - |
| $F$ | Full Info | This Paper | $1/e$ | 1 | $O(T^{3/4})$ (Prop. 5) | $O(T^{3/4})$ (Prop. 5) | $\tilde{O}(T^{3/4})$ (Prop. 7) |
| $F$ | Bandit | Zhang et al. (2023) (Det.)* | $1/e$ | 1 | $O(T^{8/9})$ | - | - |
| $F$ | Bandit | Pedramfar et al. (2024) | $1/e$ | 1 | $O(T^{5/6})$ | - | - |
| $F$ | Bandit | This Paper | $1/e$ | 1 | $O(T^{4/5})$ (Prop. 6) | $O(T^{4/5})$ (Prop. 6) | $\tilde{O}(T^{4/5})$ (Prop. 7) |

* Unless marked deterministic (Det.), the oracle is assumed to be stochastic with noise.

Recent work in Pedramfar & Aggarwal (2024) introduced the notion of linearizable function classes and showed how such a structure enables a generic reduction from online linear optimization to a wide family of non-convex problems, including several DR-submodular settings. While powerful, these results do not cover the non-monotone down-closed case with efficient query complexity and $O(T^{1/2})$ regret. In this paper, we close this gap by establishing a new structural characterization for online non-monotone DR-submodular maximization over down-closed convex sets.
Our main technical contribution is to show that this class is $1/e$-linearizable under a carefully designed exponential reparametrization and surrogate potential.

Technical Novelty. Achieving this result requires overcoming two fundamental theoretical barriers. First, we bypass the heavy computational cost of standard continuous greedy approximations by introducing a Jacobian-corrected gradient estimator. Unlike naive sampling, this estimator explicitly incorporates the curvature of the exponential mapping ($h(\mathbf{x}) = \mathbf{1} - e^{-\mathbf{x}}$) into the gradient query, allowing us to construct unbiased linear surrogates of the non-monotone objective using a strict single-query budget. This is done through the query algorithm BQND (Algorithm 1). Second, we establish a reduction to online linear optimization (Algorithm 2). By proving that the objective function is linearizable, we decouple the non-convex oracle queries from the constraint handling. This allows our framework to accept any efficient regret-minimizing algorithm for linear functions as a base learner, including projection-based, projection-free, and non-stationary methods, thereby inheriting their computational efficiency and regret guarantees.

Once this structure is in place, a broad collection of algorithmic guarantees follows as an immediate consequence of existing regret-transfer principles in Pedramfar & Aggarwal (2024). We obtain the first projection-free online algorithms achieving $O(T^{1/2})$ static regret with only a single gradient query per round. Moreover, the same framework unlocks adaptive and dynamic regret guarantees in adversarial environments, as well as improved rates under semi-bandit, bandit, and zeroth-order full-information feedback. Across all feedback models, our results strictly improve the previously best-known bounds, as summarized in Table 1.

Summary of Contributions. Our contributions can be summarized as follows:
1.
We prove a new structural theorem (Theorem 1) showing that non-monotone DR-submodular functions over down-closed convex sets are $1/e$-linearizable via an exponential mapping and surrogate potential.
2. Leveraging this property, we design projection-free online algorithms with $O(T^{1/2})$ static $1/e$-regret using $O(1)$ queries to the function oracle per round, improving on the previous state of the art of $O(T^{2/3})$.
3. Through generic reductions, we further obtain the first adaptive and dynamic regret guarantees for this setting, together with improved rates for semi-bandit, bandit, and zeroth-order full-information feedback. The results are summarized in Table 1.

2 Related Works

Online DR-Submodular Maximization. While the online maximization of monotone DR-submodular functions (Zhang et al., 2022; Hassani et al., 2017; Chen et al., 2018; Fazel & Sadeghi, 2023; Pedramfar et al., 2023; 2025; Lu et al., 2025) is well understood, permitting efficient greedy solutions (Streeter & Golovin, 2008), the non-monotone regime presents significantly greater challenges. This is particularly true over down-closed convex sets, where algorithms must balance standard greedy ascent steps with corrective reduction steps to navigate the non-monotone landscape. While Thang & Srivastav (2021) established the first $1/e$-approximation for this setting, their approach only achieves a regret rate of $O(T^{3/4})$. Subsequent work by Zhang et al. (2023) improved this to $O(T^{1/2})$ regret, but at the cost of computationally expensive batch queries of $O(T^{3/2})$ to the gradient oracle. Their efficient single-query variant, relying on standard projection-free techniques (Hazan & Kale, 2012; Jaggi, 2013), degrades to $O(T^{4/5})$ regret. The regret rate for single-query algorithms was further improved to $O(T^{2/3})$ in Pedramfar et al. (2024).
More recently, Pedramfar & Aggarwal (2024) proposed the general class of linearizable functions, which includes many DR-submodular functions. However, when applied to the specific regime of non-monotone maximization over down-closed convex sets, their framework yields only a $1/4$-approximation ratio, falling short of the standard $1/e$ benchmark. Consequently, a gap remains for an algorithm that simultaneously achieves $O(T^{1/2})$ $1/e$-regret and $O(1)$ query efficiency in the down-closed regime.

Non-Stationary and Limited Feedback. While dynamic (Zinkevich, 2003; Zhang et al., 2018; Zhao et al., 2021) and adaptive (Hazan & Seshadhri, 2009; Garber & Kretzu, 2022) regret bounds are well established for convex optimization, these guarantees remain less explored for non-monotone DR-submodular functions. The combination of non-convexity and environmental drift presents a formidable barrier that standard expert-tracking meta-algorithms fail to address. Recent unified projection-free frameworks and uniform wrapper reductions developed by Pedramfar et al. (2024) and Pedramfar et al. (2025) have successfully addressed various feedback models (semi-bandit, bandit) for adversarial DR-submodular optimization, but they stop short of providing adaptive and dynamic regret guarantees for the challenging non-monotone, down-closed setting. We provide the first dynamic regret guarantees ($\tilde{O}(\sqrt{T(1 + P_T)})$) for this problem class. Furthermore, we extend these robust guarantees to semi-bandit and bandit feedback settings, offering better results for non-stationary maximization of non-monotone DR-submodular functions over down-closed convex sets under limited information.

3 Preliminaries

3.1 Notations

We use $\mathcal{F}$ to denote the function class to which all objective functions $f_t$ belong. We use $\mathcal{A}$ to denote an online optimization algorithm.
We denote vectors as boldface lower-case letters (e.g., $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$) and denote their coordinates as $x_i, y_i$. For any two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, we denote their element-wise product by $\mathbf{x} \odot \mathbf{y}$ and the element-wise exponential by $e^{\mathbf{x}}$. The inequality $\mathbf{x} \le \mathbf{y}$ is understood coordinate-wise. We define the coordinate-wise probabilistic sum as $\mathbf{x} \oplus \mathbf{y} \triangleq \mathbf{1} - (\mathbf{1} - \mathbf{x}) \odot (\mathbf{1} - \mathbf{y})$.

Constraint Set. We consider optimization over a convex set $\mathcal{K} \subseteq [0, 1]^d$. We assume $\mathcal{K}$ is down-closed, meaning that for any $\mathbf{y} \in \mathcal{K}$ and any $\mathbf{0} \le \mathbf{x} \le \mathbf{y}$, we have $\mathbf{x} \in \mathcal{K}$. Additionally, we assume that:

Assumption 1 (Geometry of Constraint Set). The constraint set $\mathcal{K}$ has bounded diameter $D > 0$, i.e., $\max_{\mathbf{x}, \mathbf{y} \in \mathcal{K}} \|\mathbf{x} - \mathbf{y}\| \le D$.

To ensure computational efficiency, we assume access to a base online linear optimization algorithm $\mathcal{A}$ that is efficient for the constraint set $\mathcal{K}$, using either projection-based or projection-free (e.g., Frank-Wolfe, SO-OGA) updates.

DR-Submodularity. We focus on non-negative, differentiable functions $f : [0, 1]^d \to \mathbb{R}_{\ge 0}$. A function $f$ over $\mathcal{K}$ is called continuous DR-submodular if for all $\mathbf{x} \le \mathbf{y} \in \mathcal{K}$ we have $\nabla f(\mathbf{x}) \ge \nabla f(\mathbf{y})$. Equivalently, if $f$ is twice differentiable, all entries of its Hessian matrix are non-positive ($\nabla^2 f(\mathbf{x}) \le 0$). We explicitly do not assume monotonicity; entries of $\nabla f(\mathbf{x})$ may be negative. Additionally, we assume that:

Assumption 2 (Regularity of Objective Function). The functions $f_t \in \mathcal{F}$ are $M_1$-Lipschitz continuous, i.e., $\|\nabla f_t(\mathbf{x})\| \le M_1$ for all $\mathbf{x} \in \mathcal{K}$, and $L$-smooth, i.e., $\|\nabla f_t(\mathbf{x}) - \nabla f_t(\mathbf{y})\| \le L \|\mathbf{x} - \mathbf{y}\|$.

3.2 Problem Setting: Adversarial Online Optimization

We consider standard adversarial online optimization over a time horizon $T$, with function class $\mathcal{F}$ and constraint set $\mathcal{K}$. At round $t$, the player selects a pair of points $\hat{\mathbf{x}}_t, \mathbf{u}_t \in \mathcal{K}$, and an adversary reveals an objective function $f_t \in \mathcal{F}$ together with its query oracle $\mathcal{O}_t$.
Then the player plays $\hat{\mathbf{x}}_t$, queries $\mathcal{O}_t$ at $\mathbf{u}_t$, receives feedback $\mathbf{o}_t$, and updates its decision. The query oracle is the only way for the player to learn about the functions selected by the adversary. In our work, for the main algorithm, we assume that:

Assumption 3 (Unbiased Gradient Oracle). For the functions $f_t$, the algorithm has access to a stochastic gradient oracle $\mathcal{O}$ such that for any query point $\mathbf{x}_t$, it returns a vector $\mathbf{g}_t$ satisfying $\mathbb{E}[\mathbf{g}_t \mid \mathbf{x}_t] = \nabla f_t(\mathbf{x}_t)$ and $\|\mathbf{g}_t\| \le B_1$.

3.3 Limited Feedback Setting

When $\mathbf{u}_t = \hat{\mathbf{x}}_t$, we say the queries are trivial, as the points of query are the same as the played actions; otherwise, we say the queries are non-trivial. We say the provided oracle $\mathcal{O}_t$ is first-order if it returns the gradient of the given function at the point of query, or zeroth-order if it returns the value of the function. When the player has non-trivial queries, we say the player takes full-information feedback, which can be either first-order or zeroth-order. When the player has trivial queries, we say the player takes semi-bandit feedback if the adversary provides first-order oracles, or bandit feedback if zeroth-order oracles are provided. In Section 5, we begin with the algorithm under first-order full-information feedback and give its regret analysis. Then we consider zeroth-order full-information feedback, semi-bandit feedback, and bandit feedback.

3.4 Regret

Our goal is to minimize the $\alpha$-regret, which measures the gap between an algorithm and an $\alpha$-approximate static optimum. $\alpha$ is often referred to as the optimal approximation ratio. The $\alpha$-static regret is defined as:

$$R_\alpha(T) \triangleq \alpha \max_{\mathbf{x} \in \mathcal{K}} \sum_{t=1}^{T} f_t(\mathbf{x}) - \sum_{t=1}^{T} \mathbb{E}[f_t(\mathbf{x}_t)]. \tag{1}$$

Beyond static regret, we also consider robustness to non-stationary environments via two advanced metrics.
Adaptive regret is defined as the maximum regret over any contiguous interval $[s, e] \subseteq [T]$, i.e.,

$$AR_\alpha(T) \triangleq \sup_{[s,e] \subseteq [T]} \left( \alpha \max_{\mathbf{x} \in \mathcal{K}} \sum_{t=s}^{e} f_t(\mathbf{x}) - \sum_{t=s}^{e} \mathbb{E}[f_t(\mathbf{x}_t)] \right).$$

Dynamic regret is defined as the regret against a sequence of time-varying comparators $\mathbf{u}_t^* \triangleq \arg\max_{\mathbf{u} \in \mathcal{K}} f_t(\mathbf{u})$, whose variation is measured by the path length $P_T = \sum_{t=1}^{T} \|\mathbf{u}_t^* - \mathbf{u}_{t-1}^*\|$, i.e.,

$$DR_\alpha(T) \triangleq \alpha \sum_{t=1}^{T} f_t(\mathbf{u}_t^*) - \sum_{t=1}^{T} \mathbb{E}[f_t(\mathbf{x}_t)]. \tag{2}$$

Remark 1 (Optimal Approximation Ratio $\alpha$). Maximizing non-monotone DR-submodular functions is NP-hard, even in the offline setting. Formally, this means the best efficient algorithm can only guarantee a value of $\alpha$ times the optimal value; thus, $\alpha$ is called the optimal approximation ratio. No polynomial-time algorithm can achieve an approximation ratio better than 0.478, even for simple cardinality constraints (Gharan & Vondrák, 2011; Qi, 2024). While recent offline algorithms have narrowed this gap by achieving a 0.401-approximation (Buchbinder & Feldman, 2024), the best known guarantee for efficient online algorithms remains $1/e \approx 0.368$ (Thang & Srivastav, 2021; Zhang et al., 2023). Consequently, we define our regret against an $\alpha$-approximate benchmark with $\alpha = 1/e$, which represents the best ratio achievable by current online methods.

3.5 Upper Linearizability

A core tool in our analysis is the Upper Linearizable framework introduced by Pedramfar & Aggarwal (2024). This property allows us to reduce non-convex DR-submodular optimization to online linear optimization.

Definition 1 (Upper Linearizability). A function $f$ over $\mathcal{K}$ is $\alpha$-Upper Linearizable if there exist a constant $\beta > 0$, a mapping $h : \mathcal{K} \to \mathcal{K}$, and a vector field $g : \mathcal{F} \times \mathcal{K} \to \mathbb{R}^d$ such that for all $\mathbf{x}, \mathbf{y} \in \mathcal{K}$:

$$\alpha f(\mathbf{y}) - f(h(\mathbf{x})) \le \beta \langle g(f, \mathbf{x}), \mathbf{y} - \mathbf{x} \rangle. \tag{3}$$

Intuitively, the definition reduces online non-convex DR-submodular maximization to online linear optimization (OLO): it upper-bounds the gap between the scaled reward $\alpha f(\mathbf{y})$ and $f(h(\mathbf{x}))$ by $\beta$ times the linear term $\langle g(f, \mathbf{x}), \mathbf{y} - \mathbf{x} \rangle$. Thus, any low-regret OLO algorithm run on $g(f_t, \mathbf{x}_t)$ transfers to $\alpha$-regret for the original rewards, up to the constant $\beta$ (and the mapping $h$). We call $h$ the reparameterization, $\beta$ the scaling parameter, and $g$ the surrogate potential. Here $\alpha$ is exactly the approximation/competitive factor in our regret benchmark.

4 Non-monotone DR-submodular Functions over Down-Closed Convex Sets are Upper-Linearizable

As previously described, in this paper we consider the case where the constraint set $\mathcal{K} \subseteq [0, 1]^d$ is a down-closed convex set containing the origin. We show that the class of non-monotone DR-submodular functions over such sets is Upper Linearizable with $1/e$-approximation ratio.

A central challenge in proving upper-linearizability of non-monotone functions over $\mathcal{K}$ is to construct a global lower bound that relates the algorithm's trajectory to an arbitrary comparator $\mathbf{y}$ (e.g., the global optimum). Unlike monotone regimes where simple greedy algorithms suffice, the non-monotone landscape requires a balance between ascending gradients and shrinking constraints. The following key lemma overcomes this by establishing a novel structural inequality for our exponential reparameterization $h_z(\mathbf{x})$. It proves that this specific reparameterization preserves a strict lower bound relative to any target $\mathbf{y}$, effectively bridging the gap between the DR-submodular property and our $1/e$ approximation factor.

Lemma 1. Let $f$ be a non-monotone continuous DR-submodular function over a down-closed convex set. For any $\mathbf{x}, \mathbf{y} \in [0, 1]^d$ and $z \in [0, 1]$, let $h_z(\mathbf{x}) = \mathbf{1} - e^{-z\mathbf{x}}$. Then we have:

$$f(h_z(\mathbf{x}) \oplus \mathbf{y}) \ge e^{-z\bar{x}} f(\mathbf{y}) \ge e^{-z} f(\mathbf{y}), \tag{4}$$

where $\bar{x} = \max_j [\mathbf{x}]_j$ (here $[\mathbf{x}]_j$ is the $j$-th component of the vector $\mathbf{x}$).

The proof of Lemma 1 is provided in Appendix B. Next, we have the main result of this paper:

Theorem 1 (Main Theorem). Let $\mathcal{K} \subseteq [0, 1]^d$ be a down-closed convex set such that $\mathbf{0} \in \mathcal{K}$. Let $f : [0, 1]^d \to \mathbb{R}_{\ge 0}$ be a non-negative, differentiable, non-monotone DR-submodular function. Define the mapping $h : \mathcal{K} \to \mathcal{K}$ as $h(\mathbf{x}) \triangleq \mathbf{1} - e^{-\mathbf{x}}$ and $h_z(\mathbf{x}) \triangleq \mathbf{1} - e^{-z\mathbf{x}}$. Let the vector field $g : \mathcal{F} \times \mathcal{K} \to \mathbb{R}^d$ be $g(f, \mathbf{x}) \triangleq \nabla F(\mathbf{x})$, where $F : \mathcal{K} \to \mathbb{R}$ is the function defined by:

$$F(\mathbf{x}) \triangleq \int_0^1 \frac{e^{z-1}}{(1 - e^{-1}) z} \left( f(\mathbf{1} - e^{-z\mathbf{x}}) - f(\mathbf{0}) \right) dz. \tag{5}$$

Then for all $\mathbf{x}, \mathbf{y} \in \mathcal{K}$, the following inequality holds:

$$\frac{1}{e} f(\mathbf{y}) - f(h(\mathbf{x})) \le (1 - e^{-1}) \langle g(f, \mathbf{x}), \mathbf{y} - \mathbf{x} \rangle. \tag{6}$$

Thus, the function $f$ is $\alpha$-Upper-Linearizable with approximation coefficient $\alpha = \frac{1}{e}$, scaling parameter $\beta = 1 - e^{-1}$, reparameterization $h(\mathbf{x}) = \mathbf{1} - e^{-\mathbf{x}}$, and surrogate potential $g$.

Proof. Clearly we have $F(\mathbf{0}) = 0$. For any $\mathbf{x} \ne \mathbf{0}$, the integrand in Equation (5) is a continuous function of $z$ that is bounded:

$$\left| \frac{e^{z-1}}{(1 - e^{-1}) z} \left( f(\mathbf{1} - e^{-z\mathbf{x}}) - f(\mathbf{0}) \right) \right| \overset{(a)}{\le} \frac{e^{z-1}}{(1 - e^{-1}) z} M_1 \|\mathbf{1} - e^{-z\mathbf{x}}\| \overset{(b)}{\le} \frac{M_1 \|\mathbf{x}\|}{1 - e^{-1}},$$

where (a) follows from the $M_1$-Lipschitz continuity of $f$, and (b) uses $1 - e^{-t} \le t$ for $t \ge 0$, which gives $\|\mathbf{1} - e^{-z\mathbf{x}}\| \le z \|\mathbf{x}\|$, together with $e^{z-1} \le 1$. Therefore $F$ is well-defined on $[0, 1]^d$.

Next, differentiating $F$ with respect to $\mathbf{x}$,

$$\nabla F(\mathbf{x}) = \int_0^1 \frac{e^{z-1}}{(1 - e^{-1}) z} \nabla_{\mathbf{x}} \left( f(h_z(\mathbf{x})) - f(\mathbf{0}) \right) dz \overset{(a)}{=} \int_0^1 \frac{e^{z-1}}{(1 - e^{-1}) z} \nabla f(h_z(\mathbf{x})) \odot \frac{\partial}{\partial \mathbf{x}} h_z(\mathbf{x}) \, dz \overset{(b)}{=} \int_0^1 \frac{e^{z-1}}{1 - e^{-1}} \nabla f(h_z(\mathbf{x})) \odot e^{-z\mathbf{x}} \, dz, \tag{7}$$

where (a) uses the chain rule with respect to $\mathbf{x}$, and (b) uses $\frac{\partial}{\partial \mathbf{x}} h_z(\mathbf{x}) = z e^{-z\mathbf{x}}$ because $h_z(\mathbf{x}) = \mathbf{1} - e^{-z\mathbf{x}}$.
Now, using (7) and the fact that $\mathbf{x}$ is independent of $z$, we have

$$(1 - e^{-1}) \langle \nabla F(\mathbf{x}), -\mathbf{x} \rangle \overset{(a)}{=} \int_0^1 e^{z-1} \left\langle \nabla f(h_z(\mathbf{x})) \odot e^{-z\mathbf{x}}, -\mathbf{x} \right\rangle dz \overset{(b)}{=} \int_0^1 e^{z-1} \left\langle \nabla f(h_z(\mathbf{x})), -\mathbf{x} \odot e^{-z\mathbf{x}} \right\rangle dz \overset{(c)}{=} \int_0^1 e^{z-1} \left( -\frac{d}{dz} f(h_z(\mathbf{x})) \right) dz, \tag{8}$$

where (a) uses linearity of the inner product and the interchange of integral and inner product, (b) uses the identity $\langle \mathbf{a} \odot \mathbf{b}, \mathbf{c} \rangle = \langle \mathbf{a}, \mathbf{b} \odot \mathbf{c} \rangle$, and (c) uses the chain rule in $z$: $\frac{d}{dz} f(h_z(\mathbf{x})) = \left\langle \nabla f(h_z(\mathbf{x})), \frac{d}{dz} h_z(\mathbf{x}) \right\rangle = \langle \nabla f(h_z(\mathbf{x})), \mathbf{x} \odot e^{-z\mathbf{x}} \rangle$.

We now apply integration by parts to (8) with $u(z) = e^{z-1}$ and $dv(z) = -\frac{d}{dz} f(h_z(\mathbf{x})) \, dz$. Then $du(z) = e^{z-1} dz$ and $v(z) = -f(h_z(\mathbf{x}))$. Hence,

$$(1 - e^{-1}) \langle \nabla F(\mathbf{x}), -\mathbf{x} \rangle = \left[ -e^{z-1} f(h_z(\mathbf{x})) \right]_0^1 + \int_0^1 e^{z-1} f(h_z(\mathbf{x})) \, dz = -f(h_1(\mathbf{x})) + \frac{1}{e} f(h_0(\mathbf{x})) + \int_0^1 e^{z-1} f(h_z(\mathbf{x})) \, dz \ge -f(h(\mathbf{x})) + \int_0^1 e^{z-1} f(h_z(\mathbf{x})) \, dz,$$

since $h_1(\mathbf{x}) = h(\mathbf{x})$, $h_0(\mathbf{x}) = \mathbf{0}$, and $f(\mathbf{0}) \ge 0$ by non-negativity of $f$.

Again using (7), we have

$$(1 - e^{-1}) \langle \nabla F(\mathbf{x}), \mathbf{y} \rangle = \int_0^1 e^{z-1} \langle \nabla f(h_z(\mathbf{x})) \odot e^{-z\mathbf{x}}, \mathbf{y} \rangle \, dz \overset{(a)}{=} \int_0^1 e^{z-1} \langle \nabla f(h_z(\mathbf{x})), \mathbf{y} \odot e^{-z\mathbf{x}} \rangle \, dz \overset{(b)}{=} \int_0^1 e^{z-1} \langle \nabla f(h_z(\mathbf{x})), \mathbf{y} \odot (\mathbf{1} - h_z(\mathbf{x})) \rangle \, dz \overset{(c)}{\ge} \int_0^1 e^{z-1} \left[ f(h_z(\mathbf{x}) \oplus \mathbf{y}) - f(h_z(\mathbf{x})) \right] dz,$$

where (a) uses the identity $\langle \mathbf{a} \odot \mathbf{b}, \mathbf{c} \rangle = \langle \mathbf{a}, \mathbf{b} \odot \mathbf{c} \rangle$, (b) is due to $h_z(\mathbf{x}) = \mathbf{1} - e^{-z\mathbf{x}}$, and (c) uses Lemma 3 and the fact that $h_z(\mathbf{x}) \oplus \mathbf{y} = \mathbf{1} - (\mathbf{1} - h_z(\mathbf{x})) \odot (\mathbf{1} - \mathbf{y}) = h_z(\mathbf{x}) + \mathbf{y} \odot (\mathbf{1} - h_z(\mathbf{x}))$.

Lemma 1 states that, for all $\mathbf{x}, \mathbf{y} \in [0, 1]^d$ and $z \in [0, 1]$, we have $f(h_z(\mathbf{x}) \oplus \mathbf{y}) \ge e^{-z} f(\mathbf{y})$. Thus,

$$(1 - e^{-1}) \langle \nabla F(\mathbf{x}), \mathbf{y} \rangle \ge \int_0^1 e^{z-1} \left[ e^{-z} f(\mathbf{y}) - f(h_z(\mathbf{x})) \right] dz = f(\mathbf{y}) \int_0^1 e^{z-1} e^{-z} \, dz - \int_0^1 e^{z-1} f(h_z(\mathbf{x})) \, dz = f(\mathbf{y}) \int_0^1 e^{-1} \, dz - \int_0^1 e^{z-1} f(h_z(\mathbf{x})) \, dz = \frac{1}{e} f(\mathbf{y}) - \int_0^1 e^{z-1} f(h_z(\mathbf{x})) \, dz. \tag{9}$$

Summing the bound obtained from (8) and the bound (9):

$$(1 - e^{-1}) \langle g(f, \mathbf{x}), \mathbf{y} - \mathbf{x} \rangle = (1 - e^{-1}) \langle \nabla F(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle = (1 - e^{-1}) \left[ \langle \nabla F(\mathbf{x}), \mathbf{y} \rangle + \langle \nabla F(\mathbf{x}), -\mathbf{x} \rangle \right] \ge \frac{1}{e} f(\mathbf{y}) - f(h(\mathbf{x})),$$

because the integral terms $\int_0^1 e^{z-1} f(h_z(\mathbf{x})) \, dz$ cancel. This is exactly inequality (6).

Remark 2 (Expectation form of $\nabla F$). Let $Z \in [0, 1]$ be a random variable with CDF

$$P(Z \le z) = \int_0^z \frac{e^{u-1}}{1 - e^{-1}} \, du, \tag{10}$$

equivalently with density $p(z) = \frac{e^{z-1}}{1 - e^{-1}}$ on $[0, 1]$. Then we have

$$F(\mathbf{x}) = \mathbb{E}_Z \left[ \frac{1}{Z} \left( f(h_Z(\mathbf{x})) - f(\mathbf{0}) \right) \right], \tag{11}$$

and (7) admits the expectation representation

$$\nabla F(\mathbf{x}) = \mathbb{E}_Z \left[ \nabla f(h_Z(\mathbf{x})) \odot e^{-Z\mathbf{x}} \right]. \tag{12}$$

In order to attain an unbiased and bounded estimate of $g$, we provide the Boosted Query algorithm for Non-monotone DR-submodular functions over Down-closed convex sets (BQND), detailed in Algorithm 1, and we formally state these properties in Lemma 2. Algorithm 1 is necessary to provide the regret guarantee for our main algorithm, as it satisfies the conditions of Theorem 3, the Regret Transfer Theorem. Theorem 3 establishes the fundamental regret transfer guarantee for our framework, proving that the difficult problem of non-monotone maximization reduces to the simpler problem of online linear optimization. It formalizes the Upper-Linearizable reduction, guaranteeing that the regret of our main algorithm is bounded by the regret of the linear base learner on the surrogate functions, up to the scaling constant $\beta$. This allows us to directly inherit the convergence rates of the chosen base solver.

Algorithm 1 Boosted Query algorithm for Non-monotone DR-Submodular functions over Down-closed convex sets (BQND)
1: Input: Point $\mathbf{x} \in \mathcal{K}$, first-order stochastic oracle $\mathcal{O}$ for $f$.
2: Sampling: Sample $z \in [0, 1]$ according to Equation (10).
3: Query: Construct the point of query $\mathbf{u} = \mathbf{1} - e^{-z\mathbf{x}}$.
4: Query the first-order oracle $\mathcal{O}$, which returns a stochastic sample of $\nabla f(\mathbf{u})$, denoted as $\mathbf{v}$.
5: Output: Return the estimator $\mathbf{g} = \mathbf{v} \odot e^{-z\mathbf{x}}$.
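As an illustration of Algorithm 1, the following is a minimal Python sketch of the BQND estimator. It assumes the oracle is passed as a plain callable `grad_oracle(u)` returning a (possibly noisy) gradient as a list of floats; the names `sample_z`, `bqnd`, and `grad_oracle` are hypothetical and not taken from the paper. The sampler inverts the CDF in Equation (10), which simplifies to $(e^z - 1)/(e - 1)$.

```python
import math
import random


def sample_z(rng):
    """Draw z in [0, 1] with density e^(z-1) / (1 - e^-1) via inverse-CDF sampling.

    The CDF of Equation (10) equals (e^z - 1) / (e - 1), so its inverse
    at level u is log(1 + u * (e - 1)).
    """
    u = rng.random()
    return math.log(1.0 + u * (math.e - 1.0))


def bqnd(x, grad_oracle, rng):
    """Sketch of Algorithm 1: one oracle query, Jacobian-corrected output.

    x           -- list of floats in [0, 1] (the base learner's point)
    grad_oracle -- hypothetical callable returning a gradient estimate at a point
    rng         -- random.Random instance used to sample z
    """
    z = sample_z(rng)
    # Query point u = 1 - exp(-z * x), computed coordinate-wise.
    u = [1.0 - math.exp(-z * xi) for xi in x]
    v = grad_oracle(u)
    # Estimator g = v (element-wise product) exp(-z * x).
    return [vi * math.exp(-z * xi) for vi, xi in zip(v, x)]
```

Each call issues exactly one oracle query. Because every factor $e^{-z x_i}$ lies in $(0, 1]$, the output cannot have a larger norm than the oracle's answer.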
Lemma 2 (Properties of BQND Estimator). Let $\mathbf{g}$ be the output of Algorithm 1 for an input $\mathbf{x} \in \mathcal{K}$ and a first-order oracle for function $f$ that satisfies Assumption 3. Then $\mathbf{g}$ satisfies

$$\mathbb{E}[\mathbf{g}] = g(f, \mathbf{x}) = \nabla F(\mathbf{x}), \quad \text{and} \quad \|\mathbf{g}\| \le B_1,$$

i.e., Algorithm 1 returns an unbiased estimate of $g$ that is bounded by $B_1$.

Proof. From Algorithm 1, the output is $\mathbf{g} = \mathbf{v} \odot e^{-z\mathbf{x}}$, where under Assumption 3, $\mathbb{E}[\mathbf{v} \mid z] = \nabla f(\mathbf{1} - e^{-z\mathbf{x}})$ and $z$ is sampled according to Equation (10). Taking the expectation over the oracle noise and then over $z$:

$$\mathbb{E}[\mathbf{g}] = \int_0^1 \frac{e^{z-1}}{1 - e^{-1}} e^{-z\mathbf{x}} \odot \nabla f(\mathbf{1} - e^{-z\mathbf{x}}) \, dz.$$

This integral is identical to the gradient of the surrogate function $F(\mathbf{x})$ derived in (7) in the proof of Theorem 1. Thus, $\mathbb{E}[\mathbf{g}] = \nabla F(\mathbf{x}) = g(f, \mathbf{x})$. The norm of the output satisfies

$$\|\mathbf{g}\| = \|\mathbf{v} \odot e^{-z\mathbf{x}}\| \le \|\mathbf{v}\| \cdot \max_i |e^{-z x_i}|.$$

Since $\mathbf{x} \in \mathcal{K} \subseteq [0, 1]^d$ and $z \ge 0$, we have $0 < e^{-z x_i} \le 1$ for all $i$. Therefore, the element-wise shrinking cannot increase the norm. Using Assumption 3, we have $\|\mathbf{g}\| \le B_1 \cdot 1 = B_1$.

5 Regret Results Applied from Linearizable Optimization

A key advantage of the linearizable formulation is that it allows us to seamlessly transfer guarantees from Online Linear/Convex Optimization to our non-monotone DR-submodular setting. To begin with, we consider the most common setting, where the adversary provides a first-order noisy oracle as described in Assumption 3. We propose Algorithm 2, a modular reduction framework for maximizing non-monotone DR-submodular functions over down-closed convex sets.

The algorithm requires initializing two subroutines: a base learner $\mathcal{A}$, which can be any efficient regret-minimizing algorithm for online linear optimization over $\mathcal{K}$, and a query algorithm $\mathcal{G}$ that is fixed to be BQND (Algorithm 1).
While our framework is compatible with any such linear solver, to obtain the specific regret guarantees in this paper, we instantiate $\mathcal{A}$ with Online Gradient Ascent via Separation Oracle (SO-OGA) (Appendix C) or Improved Ader (IA) (Appendix E). The base learner receives first-order feedback and updates its decision to $\mathbf{x}_t$. Our main algorithm then plays the action $\hat{\mathbf{x}}_t$ and passes $\mathbf{x}_t$ to the query algorithm, which interacts with the oracle to return an unbiased estimate of $g$.

Algorithm 2 follows the reduction-to-linear-optimization paradigm for linearizable functions, instantiating the Online Maximization By Quadratization (OMBQ) meta-algorithm proposed by Pedramfar & Aggarwal (2024) (see Appendix D.1) with our specific exponential mapping $h(\mathbf{x}) = \mathbf{1} - e^{-\mathbf{x}}$ and the BQND query algorithm. This configuration yields the first algorithm for this domain that attains the improved regret with only $O(1)$ oracle queries per round.

Algorithm 2 Adaptive Projection-free Online Non-monotone DR-Submodular Maximization over Down-closed Convex Sets
1: Input: Horizon $T$, constraint set $\mathcal{K}$, stepsize $\eta$.
2: Initialize:
3: Base Learner $\mathcal{A}$: Any online linear optimization algorithm with sublinear regret (see footnote 1).
4: Query Algorithm $\mathcal{G}$: BQND (Algorithm 1).
5: for $t = 1, \dots, T$ do
6: Receive action $\mathbf{x}_t$ from Base Learner $\mathcal{A}$.
7: Play action $\hat{\mathbf{x}}_t = \mathbf{1} - e^{-\mathbf{x}_t}$.
8: Adversary selects a function $f_t$ and a first-order query oracle $\mathcal{O}_t$.
9: Run query algorithm: $\mathbf{o}_t \leftarrow \mathcal{G}(\mathcal{O}_t, \mathbf{x}_t)$.
10: Pass $\mathbf{o}_t$ to $\mathcal{A}$ to update its state.
11: end for

5.1 Static Regret

Our framework (Algorithm 2) works with any base learner $\mathcal{A}$ that minimizes regret for linear functions. To derive an improved regret rate efficiently, we instantiate $\mathcal{A}$ with the projection-free learner SO-OGA in Appendix C. This algorithm was proposed by Garber & Kretzu (2022) and later refined by Pedramfar & Aggarwal (2024). The result is as follows:

Proposition 1 (Static Regret).
Let $\{f_t\}_{t=1}^{T}$ be a sequence of $M_1$-Lipschitz non-monotone DR-submodular functions. Instantiating the base learner to be SO-OGA as described in Algorithm 3, Algorithm 2 achieves a $1/e$-approximation static regret bounded by:

$$\mathbb{E}\left[ \alpha \max_{\mathbf{u} \in \mathcal{K}} \sum_{t=1}^{T} f_t(\mathbf{u}) - \sum_{t=1}^{T} f_t(\hat{\mathbf{x}}_t) \right] \le O(M_1 \sqrt{T}), \tag{13}$$

that is, against the best fixed comparator in $\mathcal{K}$. The algorithm requires only $O(1)$ gradient queries per round.

Proof. Since we choose the base learner to be SO-OGA, it follows from Theorem 2 that $R_1^{\text{SO-OGA}} = O(M_1 T^{1/2})$. In Lemma 2, we proved that Algorithm 1 returns an unbiased bounded estimate of $g$. Thus, using Theorem 3, we have:

$$\mathbb{E}\left[ \alpha \max_{\mathbf{u} \in \mathcal{K}} \sum_{t=1}^{T} f_t(\mathbf{u}) - \sum_{t=1}^{T} f_t(\hat{\mathbf{x}}_t) \right] \le \beta R_1^{\text{SO-OGA}} = O(M_1 T^{1/2}).$$

Footnote 1: Specific instantiations of $\mathcal{A}$ for our results are given in the propositions.

5.2 Adaptive and Dynamic Regrets

Since SO-OGA is known to minimize adaptive regret for linear functions (Garber & Kretzu, 2022; Pedramfar & Aggarwal, 2024), the instance of Algorithm 2 in Proposition 1 automatically achieves the adaptive regret bound described below:

Proposition 2 (Adaptive Regret). Let $\{f_t\}_{t=1}^{T}$ be a sequence of $M_1$-Lipschitz non-monotone DR-submodular functions over down-closed convex sets. Instantiating the base learner to be SO-OGA as described in Algorithm 3, Algorithm 2 achieves a $1/e$-approximation adaptive regret bounded by:

$$AR_{1/e}(T) \le O(T^{1/2}). \tag{14}$$

This is the first adaptive regret guarantee for non-monotone DR-submodular maximization over down-closed sets.

Proof. If we choose the base learner to be SO-OGA, it follows from Theorem 2 that $AR_1^{\text{SO-OGA}} = O(M_1 T^{1/2})$. In Lemma 2, we proved that Algorithm 1 returns an unbiased bounded estimate of $g$. Thus, using Theorem 3, we have:

$$AR_{1/e}(T) \le \beta \, AR_1^{\text{SO-OGA}} = O(T^{1/2}).$$
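The reduction behind Propositions 1 and 2 (the loop of Algorithm 2) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the constraint set is taken to be the box $[0, 1]^d$, and plain projected online gradient ascent stands in for SO-OGA as the base learner. The names `run_reduction`, `project_box`, `grad_oracles`, and `eta` are hypothetical.

```python
import math
import random


def project_box(x):
    """Euclidean projection onto the box [0, 1]^d (an assumed, simple down-closed set)."""
    return [min(1.0, max(0.0, xi)) for xi in x]


def run_reduction(grad_oracles, d, eta, rng):
    """Sketch of Algorithm 2 with projected gradient ascent as the base learner.

    grad_oracles -- one hypothetical gradient callable per round (oracle O_t)
    d            -- dimension
    eta          -- stepsize of the stand-in base learner
    rng          -- random.Random instance used by the query step
    Returns the list of played actions x_hat_t.
    """
    x = [0.0] * d  # base learner's iterate x_t, started at the origin
    played = []
    for grad_oracle in grad_oracles:
        # Play x_hat_t = 1 - exp(-x_t). Since 1 - exp(-s) <= s, the played
        # point is coordinate-wise below x_t, hence inside a down-closed set.
        played.append([1.0 - math.exp(-xi) for xi in x])
        # Query step (BQND): sample z by inverse CDF, query once, rescale.
        z = math.log(1.0 + rng.random() * (math.e - 1.0))
        u = [1.0 - math.exp(-z * xi) for xi in x]
        v = grad_oracle(u)
        o = [vi * math.exp(-z * xi) for vi, xi in zip(v, x)]
        # Base learner update on the linear surrogate <o_t, x>.
        x = project_box([xi + eta * oi for xi, oi in zip(x, o)])
    return played
```

The played action and the query point differ (the queries are non-trivial), which is why this loop corresponds to the first-order full-information setting. Swapping the update line for another linear learner leaves the rest of the loop unchanged.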
To handle non-stationary environments, we instantiate Algorithm 2 with the Improved Ader algorithm (Algorithm 9) in Appendix E as the base linear learner $\mathcal{A}$. This algorithm was proposed by Zhang et al. (2018) and later refined by Pedramfar & Aggarwal (2024). The result is as follows:

Proposition 3 (Dynamic Regret Guarantees). Let $P_T = \sum_{t=1}^{T} \|\mathbf{x}_t^* - \mathbf{x}_{t-1}^*\|$ denote the path length of the sequence of optimal minimizers for the surrogate linear functions. Let $\{f_t\}_{t=1}^{T}$ be a sequence of $M_1$-Lipschitz non-monotone DR-submodular functions over down-closed convex sets. Instantiating the base learner to be IA as described in Algorithm 9, Algorithm 2 achieves a $1/e$-approximation dynamic regret bounded by:

$$DR_{1/e}(T) \le \tilde{O}\left( \sqrt{T (1 + P_T)} \right). \tag{15}$$

Proof. If we choose the base learner to be IA for linear functions, it follows from Theorem 4 that $R_{1,L}^{\text{IA}}(\mathbf{u}) = O\left( M_1 \sqrt{T (1 + P_T(\mathbf{u}))} \right)$. In Lemma 2, we proved that Algorithm 1 returns an unbiased bounded estimate of $g$. Thus, using Theorem 3, we have:

$$DR_{1/e}(\mathbf{u}) \le \beta R_1^{\text{IA}}(\mathbf{u}) = \tilde{O}\left( \sqrt{T (1 + P_T)} \right).$$

5.3 Semi-Bandit, Zeroth-order Full-Information, and Bandit Feedback

Recall that for Algorithm 2, we assumed first-order gradient oracles, and the queries determined by the query algorithm BQND are non-trivial. Thus, the results we obtained in Propositions 1, 2, and 3 are given for first-order full-information feedback. By decoupling the non-convex query algorithm from the linear base learner, our modular design allows seamless extensions to restrictive feedback settings. In this section, we show that by applying meta-algorithms developed for linearizable functions by Pedramfar & Aggarwal (2024), our regret guarantees naturally transfer to Semi-Bandit, Zeroth-Order Full-Information, and Bandit settings.
Furthermore, we provide the first theoretical analysis extending these results to adaptive and dynamic regret measures, establishing a unified projection-free framework that is robust to both limited information and non-stationary feedback.

Given first-order oracles, when only trivial queries are permitted, we say that the algorithm handles semi-bandit feedback. We therefore apply SFTT (Algorithm 8), with $\mathcal{A}^{\text{action}}$ being SO-OGA and $\mathcal{A}^{\text{query}}$ being BQND. Applying Lemma 7 with $\eta = 1/2$, as given by Propositions 1 and 2, we have the following result.

Proposition 4 (Semi-Bandit Guarantees). Let $\{f_t\}_{t=1}^{T}$ be a sequence of non-monotone DR-submodular functions over down-closed convex sets. If we instantiate the SFTT meta-algorithm (Algorithm 8) with block size $L = T^{1/3}$, using SO-OGA as the base learner $\mathcal{A}^{\text{action}}$ and BQND as the query algorithm $\mathcal{A}^{\text{query}}$, the resulting algorithm achieves the following regret bounds against $\{f_t\}_{t=1}^{T}$:

$$\mathbb{E}[\mathcal{R}(T)] \le O(T^{2/3}) \quad \text{and} \quad \mathrm{AR}_{1/e}(T) \le O(T^{2/3}).$$

When provided only zeroth-order oracles (i.e., oracles returning value estimates for functions instead of gradient estimates), we say that the algorithm handles zeroth-order full-information feedback if the queries are non-trivial, or bandit feedback if the queries are trivial. For zeroth-order full-information feedback, we apply FOTZO (Algorithm 6), with $\mathcal{A}$ being SO-OGA and $\mathcal{A}^{\text{query}}$ being BQND. Applying Lemma 5 with $\eta = 1/2$, as given by Propositions 1 and 2, we have the following result.

Proposition 5 (Zeroth-Order Full-Information Guarantees). Let $\{f_t\}_{t=1}^{T}$ be a sequence of non-monotone DR-submodular functions over down-closed convex sets. Let Algorithm $\mathcal{A}$ be the instantiation of FOTZO (Algorithm 6) using BQND as $\mathcal{A}^{\text{query}}$, equipped with the base algorithm SO-OGA.
By employing a one-point gradient estimator with smoothing radius $\delta$, the resulting Algorithm $\mathcal{A}$ achieves the following regret bounds against $\{f_t\}_{t=1}^{T}$:

$$\mathbb{E}[\mathcal{R}(T)] \le O(T^{3/4}) \quad \text{and} \quad \mathrm{AR}_{1/e}(T) \le O(T^{3/4}).$$

If only a zeroth-order oracle is provided and only trivial queries are allowed, we say that the algorithm handles bandit feedback. For this limited feedback, we apply the STB meta-algorithm to the algorithm instantiated in Proposition 4; applying Lemma 6, we obtain the following result.

Proposition 6 (Bandit Guarantees). Let $\{f_t\}_{t=1}^{T}$ be a sequence of non-monotone DR-submodular functions over down-closed convex sets. Let Algorithm $\mathcal{A}$ be the semi-bandit algorithm instantiated in Proposition 4. Applying STB (Algorithm 7), the resulting algorithm achieves the following regret bounds against $\{f_t\}_{t=1}^{T}$:

$$\mathbb{E}[\mathcal{R}(T)] \le O(T^{2/3}) \quad \text{and} \quad \mathrm{AR}_{1/e}(T) \le O(T^{2/3}).$$

Note that the reduction lemmas (Lemmas 5, 6, and 7) transfer the regret guarantees from the first-order full-information setting to the limited-feedback settings regardless of the choice of base learner. Consequently, by instantiating the meta-algorithms with the Improved Ader base learner (Proposition 3), we obtain dynamic regret bounds for all limited-feedback settings.

Proposition 7 (Dynamic Regret for Limited Feedback). The expected dynamic regret $\mathbb{E}[\mathcal{R}^{\mathrm{dyn}}_{1/e}(T)]$ of the algorithm described in the corresponding proposition is bounded by $\tilde{O}(T^{2/3}\sqrt{1 + P_T})$ for Semi-Bandit feedback, $\tilde{O}(T^{3/4}\sqrt{1 + P_T})$ for Zeroth-Order Full-Information feedback, and $\tilde{O}(T^{5/6}\sqrt{1 + P_T})$ for Bandit feedback, where $P_T$ is the path length of the optimal sequence.
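The exponents quoted in Propositions 4 and 5 can be reproduced from the rate formulas stated in Lemmas 7 and 5. The following is a small arithmetic check only, not part of the paper's algorithms; the function names are hypothetical, and the formulas are copied from the lemma statements with $\eta = 1/2$ and $K = O(1)$ (that is, $\theta = 0$).

```python
from fractions import Fraction

def sftt_regret_exponent(eta, theta=Fraction(0)):
    """Regret exponent after SFTT, as stated in Lemma 7:
    ((1 + theta) * (1 - eta) + eta) / (2 - eta)."""
    return ((1 + theta) * (1 - eta) + eta) / (2 - eta)

def sftt_block_exponent(eta, theta=Fraction(0)):
    """Exponent of the block size L = T^((1 + theta - eta) / (2 - eta)) in Lemma 7."""
    return (1 + theta - eta) / (2 - eta)

def fotzo_regret_exponent(eta):
    """Regret exponent after FOTZO, as stated in Lemma 5: (1 + eta) / 2."""
    return (1 + eta) / 2

# First-order full-information rate O(T^eta) with eta = 1/2 (Propositions 1 and 2).
eta = Fraction(1, 2)

semi_bandit = sftt_regret_exponent(eta)   # exponent in Proposition 4
block_size = sftt_block_exponent(eta)     # exponent of L in Proposition 4
zeroth_order = fotzo_regret_exponent(eta) # exponent in Proposition 5

print("semi-bandit regret exponent:", semi_bandit)
print("SFTT block-size exponent:", block_size)
print("zeroth-order full-information regret exponent:", zeroth_order)
```

Exact rational arithmetic is used so that the exponents can be compared with the stated rates without floating-point rounding.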
6 Conclusion and Future Work

We established that non-monotone DR-submodular maximization over down-closed convex sets is $1/e$-linearizable, enabling efficient online algorithms with $O(T^{1/2})$ static regret and the first adaptive and dynamic guarantees under multiple feedback models.

References

Riccardo Aldrighetti, Daria Battini, Dmitry Ivanov, and Ilenia Zennaro. Costs of resilience and disruptions in supply chain network design models: a review and future research directions. International Journal of Production Economics, 235:108103, 2021.

Noga Alon, Iftah Gamzu, and Moshe Tennenholtz. Optimizing budget allocation among channels and influencers. In Proceedings of the 21st international conference on World Wide Web, pp. 381–388, 2012.

Yatao Bian, Joachim Buhmann, and Andreas Krause. Optimal continuous dr-submodular maximization and applications to provable mean field inference. In International Conference on Machine Learning, pp. 644–653. PMLR, 2019.

Niv Buchbinder and Moran Feldman. Constrained submodular maximization via new bounds for dr-submodular functions. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pp. 1820–1831, 2024.

Lin Chen, Hamed Hassani, and Amin Karbasi. Online continuous submodular maximization. In International Conference on Artificial Intelligence and Statistics, pp. 1896–1905. PMLR, 2018.

Maryam Fazel and Omid Sadeghi. Fast first-order methods for monotone strongly dr-submodular maximization. In SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23), pp. 169–179. SIAM, 2023.

Dan Garber and Ben Kretzu. New projection-free algorithms for online convex optimization with adaptive regret guarantees. In Proceedings of Thirty Fifth Conference on Learning Theory, pp. 2326–2359. PMLR, 2022.

Shayan Oveis Gharan and Jan Vondrák. Submodular maximization by simulated annealing.
In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms, pp. 1098–1116. SIAM, 2011.

Shuyang Gu, Chuangen Gao, Jun Huang, and Weili Wu. Profit maximization in social networks and non-monotone dr-submodular maximization. Theoretical Computer Science, 957:113847, 2023.

Hamed Hassani, Mahdi Soltanolkotabi, and Amin Karbasi. Gradient methods for submodular maximization. Advances in Neural Information Processing Systems, 30, 2017.

Elad Hazan and Satyen Kale. Projection-free online learning. In Proceedings of the 29th International Conference on Machine Learning, pp. 1843–1850, 2012.

Elad Hazan and Comandur Seshadhri. Efficient learning algorithms for changing environments. In Proceedings of the 26th annual international conference on machine learning, pp. 393–400, 2009.

Shinji Ito and Ryohei Fujimaki. Large-scale price optimization via network flow. Advances in Neural Information Processing Systems, 29, 2016.

Martin Jaggi. Revisiting frank-wolfe: Projection-free sparse convex optimization. In International conference on machine learning, pp. 427–435. PMLR, 2013.

Yuanyuan Li, Yuezhou Liu, Lili Su, Edmund Yeh, and Stratis Ioannidis. Experimental design networks: A paradigm for serving heterogeneous learners under networking constraints. IEEE/ACM Transactions on Networking, 31(5):2236–2250, 2023.

Yiyang Lu, Mohammad Pedramfar, and Vaneet Aggarwal. Decentralized projection-free online upper-linearizable optimization with applications to DR-submodular optimization. Transactions on Machine Learning Research, 2025. ISSN 2835-8856.

Sivkumar Mishra, Debapriya Das, and Subrata Paul. A comprehensive review on power distribution network reconfiguration. Energy Systems, 8(2):227–284, 2017.

Mohammad Pedramfar and Vaneet Aggarwal.
From linear to linearizable optimization: A novel framework with applications to stationary and non-stationary dr-submodular optimization. Advances in Neural Information Processing Systems, 37:37626–37664, 2024.

Mohammad Pedramfar, Christopher Quinn, and Vaneet Aggarwal. A unified approach for maximizing continuous dr-submodular functions. Advances in Neural Information Processing Systems, 36:61103–61114, 2023.

Mohammad Pedramfar, Yididiya Y. Nadew, Christopher John Quinn, and Vaneet Aggarwal. Unified projection-free algorithms for adversarial DR-submodular optimization. In The Twelfth International Conference on Learning Representations, 2024.

Mohammad Pedramfar, Christopher John Quinn, and Vaneet Aggarwal. Uniform wrappers: Bridging concave to quadratizable functions in online optimization. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

Benjamin Qi. On maximizing sums of non-monotone submodular and linear functions. Algorithmica, 86(4):1080–1134, 2024.

Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. Advances in Neural Information Processing Systems, 21, 2008.

Nguyen Kim Thang and Abhinav Srivastav. Online non-monotone dr-submodular maximization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 9868–9876, 2021.

Lijun Zhang, Shiyin Lu, and Zhi-Hua Zhou. Adaptive online learning in dynamic environments. Advances in neural information processing systems, 31, 2018.

Qixin Zhang, Zengde Deng, Zaiyi Chen, Haoyuan Hu, and Yu Yang. Stochastic continuous submodular maximization: Boosting via non-oblivious function. In International Conference on Machine Learning, pp. 26116–26134. PMLR, 2022.

Qixin Zhang, Zengde Deng, Zaiyi Chen, Kuangqi Zhou, Haoyuan Hu, and Yu Yang.
Online learning for non-monotone dr-submodular maximization: From full information to bandit feedback. In International Conference on Artificial Intelligence and Statistics, pp. 3515–3537. PMLR, 2023.

Qixin Zhang, Zongqi Wan, Zengde Deng, Zaiyi Chen, Xiaoming Sun, Jialin Zhang, and Yu Yang. Boosting gradient ascent for continuous dr-submodular maximization. arXiv preprint arXiv:2401.08330, 2024.

Peng Zhao, Guanghui Wang, Lijun Zhang, and Zhi-Hua Zhou. Bandit convex optimization in non-stationary environments. Journal of Machine Learning Research, 22(125):1–45, 2021.

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th international conference on machine learning (icml-03), pp. 928–936, 2003.

A Useful Lemmas

For non-negative, continuous, non-monotone DR-submodular functions over down-closed convex sets, Buchbinder & Feldman (2024) established several inequalities that we use in our analysis.

Lemma 3 (Lemma 2.1, Buchbinder & Feldman (2024)). Let $f$ be a non-negative continuous DR-submodular function over $[0,1]^d$. Then, for all $x \in [0,1]^d$ and $y \ge 0$ such that $x + y \le \mathbf{1}$, we have

$$\langle \nabla f(x), y \rangle \ge f(x + y) - f(x).$$

Lemma 4 (Lemma 4.1, Buchbinder & Feldman (2024)). Let $f : [0,1]^d \to \mathbb{R}_{\ge 0}$ be a non-negative DR-submodular function. Given $t \ge 0$, an integrable function $x : [0,t] \to [0,1]^d$, and a vector $a \in [0,1]^d$, the original lemma states:

$$f\left(\mathbf{1} - a \odot e^{-\int_0^t x(\tau)\, d\tau}\right) \ge e^{-t} \cdot \left[ f(\mathbf{1} - a) + \sum_{i=1}^{\infty} \frac{1}{i!} \cdot \int_{\tau \in [0,t]^i} f\left((\mathbf{1} - a) \oplus \bigoplus_{j=1}^{i} x(\tau_j)\right) d\tau \right] \tag{16}$$

B Proof of Lemma 1

Proof.
In (16), for the online adversarial setting, we assume that $x(\tau) = x$ is constant over the interval $[0, t]$. On the left-hand side, this gives $\int_0^t x(\tau)\, d\tau = t x$. On the right-hand side, $\bigoplus_{j=1}^{i} x(\tau_j) = \mathbf{1} - \bigodot_{j=1}^{i} (\mathbf{1} - x) = \mathbf{1} - (\mathbf{1} - x)^i$. Because the function $f$ takes values in $\mathbb{R}_{\ge 0}$ (it is non-negative), every term inside the integral is non-negative. Consequently, the entire infinite sum is non-negative:

$$\begin{aligned}
\sum_{i=1}^{\infty} \frac{1}{i!} \cdot \int_{\tau \in [0,t]^i} f\left((\mathbf{1} - a) \oplus \bigoplus_{j=1}^{i} x(\tau_j)\right) d\tau
&= \sum_{i=1}^{\infty} \frac{1}{i!} \cdot \int_{\tau \in [0,t]^i} f\left((\mathbf{1} - a) \oplus \left(\mathbf{1} - (\mathbf{1} - x)^i\right)\right) d\tau \\
&= \sum_{i=1}^{\infty} \frac{1}{i!} \cdot \int_{\tau \in [0,t]^i} f\left(\mathbf{1} - a \odot (\mathbf{1} - x)^i\right) d\tau \\
&= \sum_{i=1}^{\infty} \frac{t^i}{i!} \cdot f\left(\mathbf{1} - a \odot (\mathbf{1} - x)^i\right) \ge 0.
\end{aligned}$$

By dropping the non-negative sum term from the right-hand side, we obtain a weaker but simpler lower bound:

$$f\left(\mathbf{1} - a \odot e^{-t x}\right) \ge e^{-t} \cdot f(\mathbf{1} - a).$$

Let $y = \mathbf{1} - a \in [0,1]^d$, and rename the time parameter $t$ as $z$. Then

$$f\left(\mathbf{1} - (\mathbf{1} - y) \odot e^{-z x}\right) \ge e^{-z} f(y).$$

Let $h_z(x) = \mathbf{1} - e^{-z x}$. By the definition of the probabilistic sum $\oplus$, $h_z(x) \oplus y := \mathbf{1} - (\mathbf{1} - y) \odot e^{-z x}$. Thus,

$$f(h_z(x) \oplus y) \ge e^{-z} f(y).$$

Define $\bar{x} \triangleq \max_j [x]_j$, where $[x]_j$ is the $j$-th component of the vector $x$. Thus, $\bar{x} \in [0, 1]$, and we have

$$f(h_z(x) \oplus y) \ge e^{-z \bar{x}} f(y) \ge e^{-z} f(y).$$

C Infeasible Projection and Online Gradient Ascent via a Separation Oracle (SO-OGA)

To bypass the computational bottleneck of Euclidean projections in high-dimensional spaces (e.g., an $O(n^3)$ SVD for trace-norm balls), Frank-Wolfe type projection-free methods using Linear Optimization Oracles (LOO) have become standard, following the foundational work of Hazan & Kale (2012) and Jaggi (2013). However, standard LOO-based methods often face suboptimal convergence rates in adversarial online settings. Addressing this, Garber & Kretzu (2022) introduced Separation Oracle (SO) based methods, which can achieve optimal $O(T^{1/2})$ regret.
To efficiently solve our surrogate linear optimization problem, we utilize their SO-OGA algorithm (as adapted by Pedramfar & Aggarwal (2024)).

Remark 3 (Oracles for Constraint Set - LOO and SO). To circumvent the high cost of Euclidean projections, we rely on two natural projection-free oracles. Given a convex set $\mathcal{K}$ and a query point $y$, the Linear Optimization Oracle (LOO) returns $\arg\min_{x \in \mathcal{K}} \langle y, x \rangle$, while the Separation Oracle (SO) either asserts $y \in \mathcal{K}$ or returns a separating hyperplane $g$ such that $\langle g, y - x \rangle > 0$ for all $x \in \mathcal{K}$. While the LOO is more prevalent, these oracles are complementary. As noted by Garber & Kretzu (2022), for the nuclear norm ball, the LOO is efficient while the SO is expensive. Conversely, for the spectral norm ball $B_2$, the situation is reversed. Thus, the SO enables efficient online learning over domains where the LOO is computationally intractable.

We utilize the SO-OGA instantiation and its subroutine SO-IP from Pedramfar & Aggarwal (2024). We adopt the notation from the original paper: let $c \in \mathrm{int}(\mathcal{K})$ be a center point, let $r > 0$ be the radius of a ball contained in $\mathcal{K}$ centered at $c$, and define the shrunk set $\hat{\mathcal{K}}_\delta \triangleq \{(1 - \delta/r)(x - c) + c \mid x \in \mathcal{K}\}$.

Algorithm 3 Online Gradient Ascent via Separation Oracle - SO-OGA (Algorithm 8 in Pedramfar & Aggarwal (2024))
1: Input: horizon $T$, constraint set $\mathcal{K}$, step size $\eta$.
2: Initialize: $x_1 \leftarrow c \in \hat{\mathcal{K}}_\delta$.
3: for $t = 1, 2, \ldots, T$ do
4: Play $x_t$ and observe $o_t = \nabla f_t(x_t)$.
5: $x'_{t+1} = x_t + \eta o_t$ {Gradient Ascent Step}
6: Set $x_{t+1} = \text{SO-IP}_{\mathcal{K}}(x'_{t+1})$. {Output of Algorithm 4}
7: end for

Algorithm 4 Infeasible Projection via Separation Oracle - $\text{SO-IP}_{\mathcal{K}}(y_0)$ (Algorithm 9 in Pedramfar & Aggarwal (2024))
1: Input: Constraint set $\mathcal{K}$, shrinking parameter $\delta < r$, initial point $y_0$.
2: $y_1 \leftarrow P_{\mathrm{aff}(\mathcal{K})}(y_0)$
3: $y_2 \leftarrow c + \dfrac{y_1 - c}{\max\{1, \|y_1\|/D\}}$ {Projection of $y_0$ over $\mathbb{B}^d_D(c) \cap \mathrm{aff}(\mathcal{K})$}
4: for $i = 1, 2, \ldots$
do
5: Call the Separation Oracle $\mathrm{SO}_{\mathcal{K}}$ with input $y_i$.
6: if $y_i \notin \mathcal{K}$ then
7: Set $g_i$ to be the hyperplane returned by $\mathrm{SO}_{\mathcal{K}}$ (i.e., $\langle y_i - x, g_i \rangle > 0$ for all $x \in \mathcal{K}$).
8: $g'_i \leftarrow P_{\mathrm{aff}(\mathcal{K}) - c}(g_i)$
9: Update $y_{i+1} \leftarrow y_i - \delta \dfrac{g'_i}{\|g'_i\|}$.
10: else
11: Return $y \leftarrow y_i$.
12: end if
13: end for

Theorem 2 (Adaptive Regret of SO-OGA, Theorem 9 in Pedramfar & Aggarwal (2024)). Let $\mathcal{L}$ be a class of linear functions over $\mathcal{K}$ such that $\|l\| \le M_1$ for all $l \in \mathcal{L}$, and let $D = \mathrm{diam}(\mathcal{K})$. Fix $v > 0$ such that $\delta = v T^{-1/2} \in (0, 1)$ and set the step size $\eta = \frac{v r}{2 M_1} T^{-1/2}$. Then we have:

$$\mathrm{AR}^{\text{SO-OGA}}_{1, \mathrm{Adv}^{f}_{1}(\mathcal{L})} = O(M_1 T^{1/2}). \tag{17}$$

D Meta-Algorithms and Regret Bounds Towards Diverse Feedback Types

D.1 The Generic Meta-Algorithm (OMBQ)

We rely on the generic reduction framework established in Pedramfar & Aggarwal (2024). The following meta-algorithm, OMBQ, transforms a linear learner $\mathcal{A}$ into a non-monotone DR-submodular maximizer using a mapping $h$ and a query oracle $\mathcal{G}$.

Algorithm 5 Online Maximization By Quadratization - OMBQ($\mathcal{A}$, $\mathcal{G}$, $h$)
1: Input: Base algorithm $\mathcal{A}$, query algorithm $\mathcal{G}$, mapping $h : \mathcal{K} \to \mathcal{K}$.
2: for $t = 1, 2, \ldots, T$ do
3: Let $x_t$ be the action chosen by $\mathcal{A}$.
4: Play: $y_t = h(x_t)$.
5: Query: Call oracle $\mathcal{G}$ at $x_t$ to obtain the gradient estimate $g_t$.
6: Update: Pass the loss vector $-g_t$ to $\mathcal{A}$ to update its state.
7: end for

Theorem 3 (Regret Transfer, Theorem 1 in Pedramfar & Aggarwal (2024)). Let $\mathcal{A}$ be an algorithm for online optimization with semi-bandit feedback. Also let $\mathcal{F}$ be a function class over $\mathcal{K}$ that is linearizable with surrogate potential $g : \mathcal{F} \times \mathcal{K} \to \mathbb{R}^d$ and mapping $h : \mathcal{K} \to \mathcal{K}$. Let $\mathcal{G}$ be a query algorithm for $g$ and let $\mathcal{A}' = \mathrm{OMBQ}(\mathcal{A}, \mathcal{G}, h)$. If $\mathcal{G}$ returns an unbiased estimate of $g$ and the output of $\mathcal{G}$ is bounded by $B_1$, then we have:

$$\mathcal{R}^{\mathcal{A}'}_{\alpha, \mathrm{Adv}^{o}_{1}(\mathcal{F}, B_1)} \le \beta\, \mathcal{R}^{\mathcal{A}}_{1, \mathrm{Adv}^{f}_{1}(\mathcal{Q}_\mu[B_1])} \tag{18}$$

where $\beta$ is a scaling constant.
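Algorithms 3 and 4 above can be sketched for a toy domain. The following is a minimal illustration, assuming that $\mathcal{K}$ is the Euclidean unit ball centered at $c = 0$ (so $\mathrm{aff}(\mathcal{K}) = \mathbb{R}^d$ and both affine projections are the identity) and that the rewards are linear, $f_t(x) = \langle a_t, x \rangle$. The function names, the constants, and the choice of $\eta$ and $\delta$ are assumptions made for this example; they are not the tuned values from Theorem 2.

```python
import numpy as np

RADIUS = 1.0           # assumed domain: K = {x : ||x|| <= RADIUS}, centred at c = 0
DIAMETER = 2 * RADIUS  # D = diam(K)

def separation_oracle(y):
    """Return None if y is in K; otherwise a vector g with <y - x, g> > 0 for all x in K."""
    norm = np.linalg.norm(y)
    if norm <= RADIUS:
        return None
    return y / norm  # <y - x, g> = ||y|| - <x, g> >= ||y|| - ||x|| > 0

def so_ip(y0, delta):
    """Infeasible projection in the spirit of Algorithm 4 (P_aff is the identity here)."""
    y = y0 / max(1.0, np.linalg.norm(y0) / DIAMETER)  # pull y0 into the ball of radius D
    while True:
        g = separation_oracle(y)
        if g is None:
            return y                                  # y is feasible: stop
        y = y - delta * g / np.linalg.norm(g)         # step of length delta along -g

def so_oga(gradients, eta, delta):
    """Online gradient ascent in the spirit of Algorithm 3 for linear rewards."""
    x = np.zeros(gradients.shape[1])                  # x_1 = c
    played = []
    for o_t in gradients:                             # o_t = grad f_t(x_t) = a_t
        played.append(x)
        x = so_ip(x + eta * o_t, delta)               # ascent step, then pull back into K
    return np.array(played), x

# Constant reward direction a; the maximizer over the unit ball is a / ||a||.
a = np.array([0.6, 0.8])
gradients = np.tile(a, (50, 1))
played, x_final = so_oga(gradients, eta=0.3, delta=0.1)
print("final iterate:", x_final, "norm:", np.linalg.norm(x_final))
```

In this toy instance every played point stays inside the ball, and the iterates settle near the boundary in the direction of $a$, without any exact Euclidean projection being computed.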
D.2 First Order To Zeroth Order (FOTZO)

Algorithm 6 First order to zeroth order - FOTZO($\mathcal{A}$)
1: Input: Shrunk domain $\hat{\mathcal{K}}_\delta$, linear space $\mathcal{L}_0$, smoothing parameter $\delta \le r$, horizon $T$, algorithm $\mathcal{A}$
2: Pass $\hat{\mathcal{K}}_\delta$ as the domain to $\mathcal{A}$
3: $k \leftarrow \dim(\mathcal{L}_0)$
4: for $t = 1, 2, \ldots, T$ do
5: $x_t \leftarrow$ the action chosen by $\mathcal{A}$
6: Play $x_t$
7: Let $f_t$ be the function chosen by the adversary
8: for $i$ starting from 1, while $\mathcal{A}^{\text{query}}$ is not terminated for this time-step do
9: Sample $v_{t,i} \in \mathbb{S}^1 \cap \mathcal{L}_0$ uniformly
10: Let $y_{t,i}$ be the query chosen by $\mathcal{A}^{\text{query}}$
11: Query the oracle at the point $y_{t,i} + \delta v_{t,i}$ to get $o_{t,i}$
12: Pass $\frac{k}{\delta} o_{t,i} v_{t,i}$ as the oracle output to $\mathcal{A}$
13: end for
14: end for

Lemma 5 (Corollary 4 + Theorem 5 in Pedramfar & Aggarwal (2024)). Let $\mathcal{F}$ be an $M_1$-Lipschitz function class over a convex set $\mathcal{K}$, choose $c$ and $r$ as described above, and let $\delta < r$. Let $\mathcal{U} \subseteq \mathcal{K}^T$ be a compact set and let $\hat{\mathcal{U}} = (1 - \frac{\delta}{r})\mathcal{U} + \frac{\delta}{r} c$. Assume $\mathcal{A}$ is an algorithm for online optimization with first-order feedback. Then, if $\mathcal{A}' = \mathrm{FOTZO}(\mathcal{A})$, where FOTZO is described by Algorithm 6, and $0 < \alpha \le 1$, we have

$$\mathcal{R}^{\mathcal{A}'}_{\alpha, \mathrm{Adv}^{o}_{0}(\mathcal{F}, B_0)}(\mathcal{U}) \le \mathcal{R}^{\mathcal{A}}_{\alpha, \mathrm{Adv}^{o}_{1}(\hat{\mathcal{F}}, \frac{k}{\delta} B_0)}(\hat{\mathcal{U}}) + \left(3 + \frac{2D}{r}\right) \delta M_1 T.$$

Further, if we have $\mathcal{R}^{\mathcal{A}}_{\alpha, \mathrm{Adv}^{o}_{1}(\mathcal{F}, B_1)} = O(B_1 T^{\eta})$ and $\delta = T^{(\eta - 1)/2}$, then we have $\mathcal{R}^{\mathcal{A}'}_{\alpha, \mathrm{Adv}^{o}_{0}(\mathcal{F}, B_0)} = O(B_0 T^{(1 + \eta)/2})$.

D.3 Semi-bandit To Bandit (STB)

Algorithm 7 Semi-bandit to bandit - STB($\mathcal{A}$)
1: Input: Shrunk domain $\hat{\mathcal{K}}_\delta$, linear space $\mathcal{L}_0$, smoothing parameter $\delta \le r$, horizon $T$, algorithm $\mathcal{A}$
2: Pass $\hat{\mathcal{K}}_\delta$ as the domain to $\mathcal{A}$
3: $k \leftarrow \dim(\mathcal{L}_0)$
4: for $t = 1, 2, \ldots, T$ do
5: Sample $v_t \in \mathbb{S}^1 \cap \mathcal{L}_0$ uniformly
6: $x_t \leftarrow$ the action chosen by $\mathcal{A}$
7: Play $x_t + \delta v_t$
8: Let $f_t$ be the function chosen by the adversary
9: Let $o_t$ be the output of the value oracle
10: Pass $\frac{k}{\delta} o_t v_t$ as the oracle output to $\mathcal{A}$
11: end for

Lemma 6 (Corollary 5 + Theorem 6 in Pedramfar & Aggarwal (2024)).
Under the assumptions of Lemma 5, if we assume that $\mathcal{A}$ is semi-bandit, then the same regret bounds hold with $\mathcal{A}' = \mathrm{STB}(\mathcal{A})$, where STB is described by Algorithm 7. Further, if we have $\delta = T^{-1}$, then $\mathcal{R}^{\mathcal{A}'}_{\alpha, \mathrm{Adv}^{o}_{0}(\mathcal{F})}$ has the same order of regret as that of $\mathcal{R}^{\mathcal{A}}_{\alpha, \mathrm{Adv}^{o}_{1}(\mathcal{F}, B_1)}$ with $B_1$ replaced with $k M_1$.

D.4 Stochastic Full-information To Trivial query (SFTT)

Algorithm 8 Stochastic Full-information To Trivial query - SFTT($\mathcal{A}$)
1: Input: base algorithm $\mathcal{A}$, horizon $T$, block size $L > K$.
2: for $q = 1, 2, \ldots, T/L$ do
3: Let $\hat{x}_q$ be the action chosen by $\mathcal{A}^{\text{action}}$
4: Let $(\hat{y}^i_q)_{i=1}^{K}$ be the queries selected by $\mathcal{A}^{\text{query}}$
5: Let $(t_{q,1}, \ldots, t_{q,L})$ be a random permutation of $\{(q-1)L + 1, \ldots, qL\}$
6: for $t = (q-1)L + 1, \ldots, qL$ do
7: if $t = t_{q,i}$ for some $1 \le i \le K$ then
8: Play the action $x_t = \hat{y}^i_q$
9: Return the observation to the query oracle as the response to the $i$-th query
10: else
11: Play the action $x_t = \hat{x}_q$
12: end if
13: end for
14: end for

Lemma 7 (Corollary 6 + Theorem 7 in Pedramfar & Aggarwal (2024)). Let $\mathcal{A}$ be an online optimization algorithm with full-information feedback and with $K$ queries at each time-step, where $\mathcal{A}^{\text{query}}$ does not depend on the observations in the current round, and let $\mathcal{A}' = \mathrm{SFTT}(\mathcal{A})$. Then, for any $M_1$-Lipschitz function class $\mathcal{F}$ that is closed under convex combination and any $B_1 \ge M_1$, $0 < \alpha \le 1$ and $1 \le a \le b \le T$, let $a' = \lfloor (a-1)/L \rfloor + 1$, $b' = \lceil b/L \rceil$, $D = \mathrm{diam}(\mathcal{K})$, and let $\{T\}$ and $\{T/L\}$ denote the horizon of the adversary. Then, we have

$$\mathcal{R}^{\mathcal{A}'}_{\alpha, \mathrm{Adv}^{o}_{1}(\mathcal{F}, B_1)\{T\}}(\mathcal{K}^{T}_{\star})[a, b] \le M_1 D K (b' - a' + 1) + L\, \mathcal{R}^{\mathcal{A}}_{\alpha, \mathrm{Adv}^{o}_{1}(\mathcal{F}, B_1)\{T/L\}}(\mathcal{K}^{T/L}_{\star})[a', b'].$$

Further, if we have $\mathcal{R}^{\mathcal{A}}_{\alpha, \mathrm{Adv}^{o}_{i}(\mathcal{F}, B)}(\mathcal{K}^{T}_{\star})[a, b] = O(B T^{\eta})$, $K = O(T^{\theta})$ and $L = T^{\frac{1 + \theta - \eta}{2 - \eta}}$, then we have

$$\mathcal{R}^{\mathcal{A}'}_{\alpha, \mathrm{Adv}^{o}_{i}(\mathcal{F}, B)}(\mathcal{K}^{T}_{\star})[a, b] = O\left(B T^{\frac{(1 + \theta)(1 - \eta) + \eta}{2 - \eta}}\right).$$
As a special case, when $K = O(1)$, we have

$$\mathcal{R}^{\mathcal{A}'}_{\alpha, \mathrm{Adv}^{o}_{i}(\mathcal{F}, B)}(\mathcal{K}^{T}_{\star})[a, b] = O\left(B T^{\frac{1}{2 - \eta}}\right).$$

E Algorithms for Dynamic Regret

To support the dynamic regret guarantees presented in Proposition 3, we restate the Improved Ader meta-algorithm and its corresponding expert algorithm from Pedramfar & Aggarwal (2024). We also include the main theorem governing its performance.

Algorithm 9 Improved Ader - IA (Restated Algorithm 10 from Pedramfar & Aggarwal (2024))
1: Input: horizon $T$, constraint set $\mathcal{K}$, step size $\lambda$, a set $\mathcal{H}$ containing step sizes for experts.
2: Activate a set of experts $\{E^{\eta} \mid \eta \in \mathcal{H}\}$ by invoking Algorithm 10 for each step size $\eta \in \mathcal{H}$.
3: Sort the step sizes in ascending order $\eta_1 \le \cdots \le \eta_N$, and set $w^{\eta_i}_1 = \frac{C}{i(i+1)}$ where $C = 1 + \frac{1}{|\mathcal{H}|}$.
4: for $t = 1, 2, \ldots, T$ do
5: Receive $x^{\eta}_t$ from each expert $E^{\eta}$.
6: Play the action $x_t = \sum_{\eta \in \mathcal{H}} w^{\eta}_t x^{\eta}_t$ and observe $o_t = \nabla f_t(x_t)$.
7: Define $\ell_t(y) := \langle o_t, y - x_t \rangle$.
8: Update the weight of each expert by $w^{\eta}_{t+1} = \dfrac{w^{\eta}_t e^{-\lambda \ell_t(x^{\eta}_t)}}{\sum_{\mu \in \mathcal{H}} w^{\mu}_t e^{-\lambda \ell_t(x^{\mu}_t)}}$.
9: Send the gradient $o_t$ to each expert $E^{\eta}$.
10: end for

Algorithm 10 Improved Ader: Expert algorithm (Restated Algorithm 11 from Pedramfar & Aggarwal (2024))
1: Input: horizon $T$, constraint set $\mathcal{K}$, step size $\eta$.
2: Let $x^{\eta}_1$ be any point in $\mathcal{K}$.
3: for $t = 1, 2, \ldots, T$ do
4: Send $x^{\eta}_t$ to the main algorithm.
5: Receive $o_t$ from the main algorithm.
6: Update: $x^{\eta}_{t+1} = P_{\mathcal{K}}(x^{\eta}_t + \eta o_t)$
7: {Note: To maintain the projection-free property of our framework, we implement this update using the Frank-Wolfe step or the Infeasible Projection subroutine from Algorithm 4.}
8: end for

Theorem 4 (Dynamic Regret of Improved Ader, Theorem 10 in Pedramfar & Aggarwal (2024)). Let $\mathcal{L}$ be a class of linear functions over $\mathcal{K}$ such that $\|\ell\| \le M_1$ for all $\ell \in \mathcal{L}$, and let $D = \mathrm{diam}(\mathcal{K})$.
Set $\mathcal{H} := \left\{ \eta_i = \frac{2^{i-1} D}{M_1} \sqrt{\frac{7}{2T}} \;\middle|\; 1 \le i \le N \right\}$, where $N = \lceil \frac{1}{2} \log_2(1 + 4T/7) \rceil + 1$ and $\lambda = \sqrt{2/(T M_1^2 D^2)}$. Then for any comparator sequence $u \in \mathcal{K}^T$, we have

$$\mathcal{R}^{\mathrm{IA}}_{1, \mathrm{Adv}^{f}_{1}(\mathcal{L})}(u) = O\left(M_1 \sqrt{T(1 + P_T(u))}\right) \tag{19}$$

where $P_T(u) = \sum_{t=1}^{T} \|u_t - u_{t-1}\|_2$ is the path length of the comparator sequence.
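To make the parameter choices of Theorem 4 and the initialization in Algorithm 9 concrete, the following sketch computes the step-size grid $\mathcal{H}$, the meta step size $\lambda$, and the initial expert weights for given $T$, $D$, and $M_1$. The function name and the sample values are assumptions chosen for illustration; only the formulas are taken from the statements above.

```python
import math

def improved_ader_parameters(T, D, M1):
    """Step-size grid, meta step size, and initial weights as stated in Theorem 4 and Algorithm 9."""
    # Number of experts: N = ceil(0.5 * log2(1 + 4T/7)) + 1.
    N = math.ceil(0.5 * math.log2(1 + 4 * T / 7)) + 1
    # Geometric grid: eta_i = 2^(i-1) * (D / M1) * sqrt(7 / (2T)).
    base = (D / M1) * math.sqrt(7 / (2 * T))
    etas = [2 ** (i - 1) * base for i in range(1, N + 1)]
    # Meta step size: lambda = sqrt(2 / (T * M1^2 * D^2)).
    lam = math.sqrt(2 / (T * M1 ** 2 * D ** 2))
    # Initial weights: w_i = C / (i (i + 1)) with C = 1 + 1/|H|.
    C = 1 + 1 / N
    weights = [C / (i * (i + 1)) for i in range(1, N + 1)]
    return etas, lam, weights

# Hypothetical sample values, chosen only for illustration.
etas, lam, weights = improved_ader_parameters(T=1000, D=2.0, M1=1.0)
print("number of experts:", len(etas))
print("smallest and largest step size:", etas[0], etas[-1])
print("sum of initial weights:", sum(weights))
```

Since $\sum_{i=1}^{N} \frac{1}{i(i+1)} = \frac{N}{N+1}$ and $C = \frac{N+1}{N}$, the initial weights form a probability distribution over the experts, with more mass on the smaller step sizes.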