Global Stability and Step Size Robustness of RMSProp
In this paper, an input-to-state Lyapunov function for the RMSProp optimization algorithm is introduced. Global asymptotic stability of the RMSProp algorithm for constant step sizes and robustness properties with respect to arbitrary bounded time-var…
Authors: Naum Dimitrieski, Maria Christine Honecker, Carsten Scherer, Christian Ebenbauer
Global Stability and Step Size Robustness of RMSProp

Naum Dimitrieski, Maria Christine Honecker, Carsten Scherer, and Christian Ebenbauer

Abstract — In this paper, an input-to-state Lyapunov function for the RMSProp optimization algorithm is introduced. Global asymptotic stability of the RMSProp algorithm for constant step sizes and robustness properties with respect to arbitrary bounded time-varying step size rules are established.

I. INTRODUCTION

Adaptive gradient methods are gradient-based optimization methods where the step size is adapted in a feedback fashion based on current and past iterates and gradients. They have proven to be effective in many fields of optimization and machine learning [1]. Among the most well-known adaptive gradient methods are AdaGrad [2], RMSProp [3] and Adam [4]. As surveyed in [5], [6], Adam and Adam-like methods are central to training deep neural networks. Notably, Adam integrates adaptive step sizes from RMSProp with momentum in the Polyak sense [7]. Thus, RMSProp represents a core component of Adam and its variants. While momentum methods are quite well understood and in recent years rather extensively analyzed from a systems theoretic perspective (see, e.g., [8]), adaptive step size rules remain less well understood. Hence, the purpose of this work is to establish basic systems theoretic properties of the deterministic RMSProp algorithm in the sense of global stability and robustness.

In recent years, considerable research effort has been devoted to establishing convergence and stability properties of RMSProp. Due to the close relation between Adam and RMSProp (see, e.g., [9], [10]), we also review, in what follows, the literature related to Adam, in particular works whose results apply to RMSProp as well. We start with local stability properties of RMSProp, which have been analyzed in recent works.
For instance, in [11] local exponential stability is established, while in [12] local asymptotic stability under a backtracking line-search step size rule is shown.

The literature on global convergence properties of RMSProp can be broadly divided into deterministic results (deterministic gradients) and stochastic results (stochastic gradients). For instance, in the deterministic setting, [13] enforces a modified step size rule guaranteeing sufficiently small adaptive step sizes, while [14] achieves a similar result under a backtracking line-search step size rule. In both works, convergence to a stationary point is ensured by enforcing a sufficiently small step size.

[Footnote: Naum Dimitrieski, Maria Christine Honecker, and Christian Ebenbauer are with the Chair of Intelligent Control Systems, RWTH Aachen University, 52062 Aachen, Germany (e-mail: {naum.dimitrieski, maria.honecker, christian.ebenbauer}@ic.rwth-aachen.de), and Carsten Scherer is with the Chair of Mathematical Systems Theory, University of Stuttgart, 70569 Stuttgart, Germany (e-mail: {carsten.scherer}@mathematik.uni-stuttgart.de).]

Global stability results on RMSProp are established, for instance, in [15]–[17]. In particular, [15], [16] provide stability results on the continuous-time RMSProp algorithm. Further, [17] assumes objective functions whose gradients satisfy a sector condition, a property typically fulfilled by quadratic objective functions. Under this assumption, it establishes the boundedness of the RMSProp iterates for any arbitrary bounded step size rule. However, in [17] no result on the convergence of the iterates to any stationary point of the algorithm is provided. Hence, global asymptotic stability of RMSProp under constant step size has not been established yet, nor has its robustness property with respect to arbitrary step size rules been established. The contributions of this paper are twofold.
First, we propose a novel Lyapunov function which allows us to establish global asymptotic stability of the RMSProp algorithm with constant step size for strongly convex and L-smooth objective functions. Second, by considering time-varying step sizes as inputs, we show that this Lyapunov function is also an input-to-state Lyapunov function. Hence, the RMSProp algorithm guarantees boundedness of the iterates under any bounded step size rule. To the best of our knowledge, this constitutes the first (input-to-state) Lyapunov function and input-to-state stability result for RMSProp.

Notation: We denote the sets of real and non-negative real numbers as R and R_{≥0}, respectively, and the set of non-negative integers as N_0. We denote by C^n the set of n times continuously differentiable functions, and for any univariate function h ∈ C^1, we denote the first derivative as h′. Further, we say a function h belongs to class K_∞ if it satisfies [18, Definition 2]. For a vector x ∈ R^d, we denote the i-th component as x_i for any i ∈ 1:d and the Euclidean norm by ∥·∥.

II. PROBLEM STATEMENT

We consider an optimization problem of the form

  min_{x ∈ R^d} f(x),   (1)

where f : R^d → R is the objective function. The main goal of this paper is to analyze the stability and robustness of the RMSProp algorithm, as introduced in [3], where for each i ∈ 1:d one defines

  s_i(t+1) = (1 − β) s_i(t) + β (∇_i f(x(t)))²,   (2a)
  x_i(t+1) = x_i(t) − η(u(t)) / (ε + √(s_i(t+1))) · ∇_i f(x(t)),   (2b)

with the initial conditions x(0) ∈ R^d, s(0) ∈ R^d_{≥0} as well as the parameters ε > 0, β ∈ (0, 1), and the step sizes η(u(t)) := η₀ + u(t), for η₀ > 0 and u(·) ∈ U_{≥0} := {u : N_0 → R_{≥0} : u(·) is bounded}.
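Read as a discrete-time system, one iteration of (2) is straightforward to state in code. The following is a minimal numerical sketch (our illustration, not from the paper; the function name `rmsprop_step` and the quadratic test objective are assumptions) of the update (2a)–(2b):

```python
import numpy as np

def rmsprop_step(x, s, grad_f, beta, eps, eta0, u=0.0):
    """One deterministic RMSProp iteration, following (2a)-(2b).

    x, s   : current iterate and second-moment state (s >= 0 componentwise)
    grad_f : callable returning the gradient of f at x
    u      : non-negative step size perturbation, so eta(u) = eta0 + u
    """
    g = grad_f(x)
    s_next = (1.0 - beta) * s + beta * g**2                 # (2a)
    x_next = x - (eta0 + u) / (eps + np.sqrt(s_next)) * g   # (2b)
    return x_next, s_next

# Illustrative run on the strongly convex quadratic f(x) = 0.5 * ||x||^2,
# where grad f(x) = x and L = mu = 1, so eta0 = 0.5 < 2*eps/L = 2.
x = np.array([2.0, -1.5])
s = np.zeros(2)
for t in range(200):
    x, s = rmsprop_step(x, s, lambda z: z, beta=0.1, eps=1.0, eta0=0.5)
```

For this quadratic, the iterates contract toward the minimizer x* = 0 while s stays non-negative, consistent with the stability claims below.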
To simplify the notation in (2), we omit the dependence on t and introduce the functions s⁺ : R^d × R^d_{≥0} → R^d_{≥0} and x⁺ : R^d × R^d_{≥0} × R_{≥0} → R^d as

  s⁺_i := (1 − β) s_i + β (∇_i f(x))²,   (3a)
  x⁺_i := x_i − η(u) / (ε + √(s⁺_i)) · ∇_i f(x).   (3b)

We make the following assumptions on the objective function in our analysis.

Assumption 1. Consider problem (1). Assume that f is
i) C^1, µ-strongly convex for some µ > 0, and has a unique global minimizer x* ∈ R^d, and
ii) globally L-smooth for some L > 0, i.e., for any x, y ∈ R^d it holds that

  ∥∇f(y) − ∇f(x)∥ ≤ L ∥y − x∥.   (4)

Assumption 1 is restrictive but still standard in the optimization literature for the purpose of algorithm analysis [19].

III. MAIN RESULTS

In this section we present the main results of this paper, namely a Lyapunov function for algorithm (3) and input-to-state stability (ISS) of this algorithm with respect to the step size rule u(·).

We start with the Lyapunov function candidate for algorithm (3). We propose the Lyapunov function candidate V : R^d × R^d_{≥0} → R_{≥0},

  V(x, s) := γ(f(x) − f(x*)) + 2 Σ_{i=1}^d h(s_i),   (5)

where for any ω ∈ R_{≥0} the functions γ and h are defined as

  γ(ω) := γ₀ ω + (2/3) γ₁ ω^{3/2},   (6a)
  h(ω) := √ω + ε log(ε) − ε log(√ω + ε),   (6b)

with the constants

  γ₀ := max{ 3β/η₀,  βε / (η₀ (ε − (η₀ + η₁) L/2)),  γ₁ η₀ √(Ld)/√β + 12βL√d/ε },   (7a)
  γ₁ := 12 β^{3/2} √L / (η₀ ε),   (7b)

and where β, ε and η₀ are the parameters in algorithm (3), L is the smoothness constant of f in Assumption 1 ii), and η₁ > 0 is any positive constant.

Next, we provide the main theorem of the paper, formally characterizing the asymptotic stability and ISS for any input u(·) ∈ U_{≥0} of algorithm (3) with respect to the equilibrium (x*, 0).

Theorem 1. Consider algorithm (3) and let Assumption 1 be satisfied with µ > 0, x* ∈ R^d and L > 0.
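To make the definitions (5)–(7) concrete, the constants and the candidate V can be evaluated numerically. The sketch below is our illustration, not from the paper; the helper name `make_lyapunov` and the parameter values are assumptions, chosen so that η₀ + η₁ < 2ε/L as required later in Theorem 1.

```python
import numpy as np

def make_lyapunov(f, x_star, beta, eps, eta0, eta1, L, d):
    """Return V(x, s) from (5), with gamma, h as in (6) and constants as in (7)."""
    gamma1 = 12.0 * beta**1.5 * np.sqrt(L) / (eta0 * eps)             # (7b)
    gamma0 = max(3.0 * beta / eta0,
                 beta * eps / (eta0 * (eps - (eta0 + eta1) * L / 2)),
                 gamma1 * eta0 * np.sqrt(L * d) / np.sqrt(beta)
                 + 12.0 * beta * L * np.sqrt(d) / eps)                # (7a)
    gamma = lambda w: gamma0 * w + (2.0 / 3.0) * gamma1 * w**1.5      # (6a)
    h = lambda w: (np.sqrt(w) + eps * np.log(eps)
                   - eps * np.log(np.sqrt(w) + eps))                  # (6b)
    return lambda x, s: gamma(f(x) - f(x_star)) + 2.0 * np.sum(h(s))  # (5)

# Example: f(x) = 0.5*||x||^2 with x* = 0, L = 1, and eta0 + eta1 = 1 < 2*eps/L
V = make_lyapunov(lambda z: 0.5 * float(np.dot(z, z)), np.zeros(2),
                  beta=0.1, eps=1.0, eta0=0.5, eta1=0.5, L=1.0, d=2)
```

Note that γ(0) = 0 and h(0) = 0, so V vanishes exactly at the equilibrium (x*, 0), matching the candidate's role below.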
Let β ∈ (0, 1), ε > 0, and η(u) := η₀ + u, where η₀ ∈ (0, 2ε/L) and u(·) ∈ U_{≥0}. Moreover, let η₁ > 0 satisfy η₀ + η₁ < 2ε/L. Then, algorithm (3) is ISS for any step size u(·) ∈ U_{≥0} with respect to the equilibrium (x*, 0), and the function (5) is an ISS-Lyapunov function for algorithm (3).

One interesting implication of Theorem 1 is that, given any choice of a (positive) bounded sequence of step sizes, the states remain bounded. Hence, from a practitioner's perspective, the algorithm tuning allows for experimentation with various step size rules and to balance convergence speed (during the burn-in or warm-up phase) and convergence accuracy (asymptotic error floor). In standard gradient-based optimization algorithms, such as gradient descent without adaptive step size, this is typically not possible.

In the next lemma, we first show that the function (5) satisfies the requirements for a Lyapunov function candidate, as stated in [20, Definition 3.2., Condition 1] and [21, Theorem 3.4.6.].

Lemma 1. Consider (5), γ and h as defined in (6), and let Assumption 1 be satisfied with µ > 0, x* ∈ R^d and L > 0. Moreover, let ε > 0, β ∈ (0, 1) and η₀ ∈ (0, 2ε/L), and let η₁ > 0 satisfy η₀ + η₁ < 2ε/L. Then, there exist some α̂₁, α̂₂ ∈ K_∞ such that, for all (x, s) ∈ R^d × R^d_{≥0},

  α̂₁(∥(x − x*, s)∥_∞) ≤ V(x, s) ≤ α̂₂(∥(x − x*, s)∥_∞).

Proof. We start with the properties of γ in (6). By (7), γ₁ is positive. Moreover, since by assumption we have η₀ ∈ (0, 2ε/L) and η₀ + η₁ < 2ε/L, we infer ε − (η₀ + η₁)L/2 > 0 and, thereby, the second term in the max operator in γ₀ in (7) is well defined. Consequently, we also have γ₀ > 0. Then, γ(0) = 0, and in addition, γ is continuous, positive definite and radially unbounded on [0, ∞).
Under Assumption 1 i), we observe that f(x) − f(x*), and thus γ(f(x) − f(x*)), are continuous, positive definite in x ∈ R^d and radially unbounded. Then, [22, Lemma 4.3] implies that there exist some α_{γ,m}, α_{γ,M} ∈ K_∞ such that for all x ∈ R^d:

  α_{γ,m}(∥x − x*∥_∞) ≤ γ(f(x) − f(x*)) ≤ α_{γ,M}(∥x − x*∥_∞).   (8)

Next, we proceed with the properties of h, defined in (6). We observe that h(0) = 0 and that h is continuous on [0, ∞). For all ω ∈ R_{≥0} we have h′(ω) = 1/(2(√ω + ε)) > 0, i.e., h is strictly monotonically increasing and hence h is positive definite on [0, ∞). Finally, (6b) implies that the positive definite term √ω grows at a faster rate than ε log(ε) − ε log(√ω + ε) decreases on [0, ∞), which implies that h is radially unbounded. It therefore follows that h ∈ K_∞.

In the following we find suitable comparison functions for V, defined in (5). We start by using the inequality

  2 Σ_{i=1}^d h(s_i) ≥ h(∥s∥_∞),   (9)

which holds since h is positive definite. By plugging the lower bounds from (8) and (9) into (5), one gets for all (x, s) ∈ R^d × R^d_{≥0} that

  V(x, s) ≥ α_{γ,m}(∥x − x*∥_∞) + h(∥s∥_∞) ≥ min{ α_{γ,m}(∥(x − x*, s)∥_∞), h(∥(x − x*, s)∥_∞) },   (10)

where we used max{∥x − x*∥_∞, ∥s∥_∞} = ∥(x − x*, s)∥_∞. The second inequality in (10) can be shown by a simple case analysis and is left for the reader to check. Next, we use the inequality

  2 Σ_{i=1}^d h(s_i) ≤ 2d h(∥s∥_∞),   (11)

which follows since h is positive definite and monotone on [0, ∞). By plugging the upper bounds from (8) and (11) into (5), we get for all (x, s) ∈ R^d × R^d_{≥0} that

  V(x, s) ≤ α_{γ,M}(∥(x − x*, s)∥_∞) + 2d h(∥(x − x*, s)∥_∞).   (12)

Here, we used that the upper bounds in (8) and (11) are strictly monotonically increasing and that

  ∥x − x*∥_∞ ≤ max{∥x − x*∥_∞, ∥s∥_∞} = ∥(x − x*, s)∥_∞,
  ∥s∥_∞ ≤ max{∥x − x*∥_∞, ∥s∥_∞} = ∥(x − x*, s)∥_∞.

The lower and upper bounds in (10) and (12) belong to class K_∞. This is true since both the minimum and the sum of K_∞-functions belong to class K_∞. This concludes the proof.

In the following we provide a proof of Theorem 1.

Proof. This proof is divided into three main parts. In the first part, we derive suitable preliminary upper bounds for the terms in the Lyapunov function difference. In the second part, we provide a refined form of the upper bound of the Lyapunov function difference which depends solely on ∥s∥_∞, ∥∇f(x)∥_∞ and u ∈ R_{≥0}. In the third part, starting from the upper bound from the second part, for algorithm (3) we initially show that global asymptotic stability holds for u = 0 with respect to the equilibrium (x*, 0), by invoking [21, Theorem 3.4.6.]. Subsequently, we show that the Lyapunov function satisfies an input-to-state stability condition for any u ∈ R_{≥0}, in the sense of an adaptation of [20, Definition 3.2.], with respect to the equilibrium (x*, 0). Thereby, we get that V, defined in (5), is an ISS-Lyapunov function, and by invoking [20, Lemma 3.5.] we establish that (3) is ISS for any u(·) ∈ U_{≥0} with respect to the equilibrium (x*, 0).

We start by defining the proof preliminaries. Throughout the proof, for the sake of clarity, for any x ∈ R^d we use the notation f₀(x) := f(x) − f(x*), g := ∇f(x), g_i := ∇_i f(x). Further, we denote the quadratic, arithmetic and geometric means as QM, AM and GM.
For any (x, s) ∈ R^d × R^d_{≥0}, the Lyapunov function difference is given as

  ∆V := V(x⁺, s⁺) − V(x, s) = γ(f₀(x⁺)) − γ(f₀(x)) + 2 Σ_{i=1}^d [h(s⁺_i) − h(s_i)],   (13)

with s⁺ and x⁺ as defined in (3). Initially, we apply the mean value theorem to γ as in (6a) for the case f₀(x⁺) ≠ f₀(x). For all such (x, s) ∈ R^d × R^d_{≥0} and u ∈ R_{≥0}, there exists some ξ ∈ I(x, s, u) := (a_min(x, s, u), a_max(x, s, u)), where

  a_min(x, s, u) := min{f₀(x), f₀(x⁺)} ≥ 0,
  a_max(x, s, u) := max{f₀(x), f₀(x⁺)} ≥ 0,

such that

  γ′(ξ) = (γ(f₀(x⁺)) − γ(f₀(x))) / (f₀(x⁺) − f₀(x)).   (14)

If f₀(x⁺) = f₀(x), we set ξ = f₀(x) = f₀(x⁺) ≥ 0. By (6a) and (7), we then have γ′(ξ) > 0, and the following trivially holds:

  γ(f₀(x⁺)) − γ(f₀(x)) = γ′(ξ)(f₀(x⁺) − f₀(x)) = 0.   (15)

Therefore, for any (x, s) ∈ R^d × R^d_{≥0} and any u ∈ R_{≥0} there exists some

  0 ≤ ξ ∈ I(x, s, u) ∪ {f₀(x), f₀(x⁺)}   (16)

such that, by using (14) and (15), we get from (13) that

  ∆V = γ′(ξ)(f₀(x⁺) − f₀(x)) + 2 Σ_{i=1}^d [h(s⁺_i) − h(s_i)].   (17)

Part 1: In this part we derive preliminary upper bounds for (17), which consist of negative definite and u-dependent components, where u ∈ R_{≥0}.

Step 1.1: In the following we derive upper bounds for the first right hand side term and for each term of the right hand side sum of (17). We start by using the L-smoothness property of f. Due to Assumption 1 ii) and since γ′(ξ) > 0 for all ξ ≥ 0 (see (6a), (7) and (16)), it is possible to use [19, Theorem 2.1.5, (2.1.9)] to get

  γ′(ξ)(f₀(x⁺) − f₀(x)) ≤ γ′(ξ) ∇f₀(x)⊤(x⁺ − x) + γ′(ξ) (L/2) ∥x⁺ − x∥².
As ∇f₀(x)⊤(x⁺ − x) = Σ_{i=1}^d g_i (x⁺_i − x_i), we can subsequently use (3b) for x⁺_i − x_i to get

  γ′(ξ)(f₀(x⁺) − f₀(x)) ≤ −γ′(ξ) Σ_{i=1}^d η(u) g_i² / (ε + √(s⁺_i)) + γ′(ξ) (L/2) Σ_{i=1}^d η(u)² g_i² / (ε + √(s⁺_i))².   (18)

Next, we use the concavity of h, defined in (6b), for non-negative arguments to derive an upper bound for each summand in (17). By using this property, we get for every i ∈ 1:d that h(s⁺_i) − h(s_i) ≤ h′(s_i)(s⁺_i − s_i), where h′(s_i) = 1/(2(ε + √(s_i))). By using (3a), this is rewritten as

  h(s⁺_i) − h(s_i) ≤ (−β s_i + β g_i²) / (2(ε + √(s_i))).   (19)

Step 1.2: In the following, we plug the upper bounds from Step 1.1, i.e., (18) and (19), into (17) and subsequently reorganize the terms to get

  ∆V ≤ − Σ_{i=1}^d β s_i / (ε + √(s_i)) + Σ_{i=1}^d [ g_i² / ((ε + √(s⁺_i))² (ε + √(s_i))) ] × [ −η(u) γ′(ξ)(ε + √(s_i))(ε + √(s⁺_i)) + η(u)² γ′(ξ) (L/2)(ε + √(s_i)) + β (ε + √(s⁺_i))² ].

For each i ∈ 1:d, we now define

  κ_i := −η(u) γ′(ξ)(ε + √(s_i))(ε + √(s⁺_i)) + η(u)² γ′(ξ) (L/2)(ε + √(s_i)) + β (ε + √(s⁺_i))²,

and rearrange it as

  κ_i = −√(s⁺_i) (η(u) γ′(ξ) √(s_i) − β √(s⁺_i)) − η(u) γ′(ξ) √(s_i) (ε − η(u) L/2) − ε √(s⁺_i) η(u) γ′(ξ) (1 − 2β/(η(u) γ′(ξ))) − ε η(u) γ′(ξ) (ε − η(u) L/2 − βε/(η(u) γ′(ξ))).   (20)

Then, we get that

  ∆V ≤ − Σ_{i=1}^d β s_i / (ε + √(s_i)) + Σ_{i=1}^d g_i² κ_i / ((ε + √(s⁺_i))² (ε + √(s_i))).   (21)

The inequality (21) is an important intermediate result. For the remainder of the proof, we derive suitable upper bounds for the terms on the right hand side of (21). Also, observe that, in (21), the first sum on the right hand side is negative definite in s ∈ R^d_{≥0}. In particular, in the remainder of this part of the proof, we focus on the terms κ_i, i ∈ 1:d, in (20).
For convenience, we write κ_i = a^i_{11} + a^i_{21} + 2 a^i_{31} √(s⁺_i) + ε a^i_{41} for each i ∈ 1:d, with

  a^i_{11} := −√(s⁺_i) (η(u) γ′(ξ) √(s_i) − β √(s⁺_i)),   (22a)
  a^i_{21} := −η(u) γ′(ξ) (ε − η(u) L/2) √(s_i),   (22b)
  a^i_{31} := −η(u) γ′(ξ) (ε/2) (1 − 2β/(η(u) γ′(ξ))),   (22c)
  a^i_{41} := −η(u) γ′(ξ) (ε − η(u) L/2) + βε.   (22d)

Step 1.3: In the following, we provide upper bounds for a^i_{11}, a^i_{31} and a^i_{41}, i ∈ 1:d, as defined in (22). We start by analyzing the terms a^i_{11}. We use that a − b = (a² − b²)/(a + b) to get

  a^i_{11} = −√(s⁺_i) ((η(u) γ′(ξ))² s_i − β² s⁺_i) / (η(u) γ′(ξ) √(s_i) + β √(s⁺_i)).

By using (3a) on s⁺_i in the numerator, we further obtain

  a^i_{11} = −√(s⁺_i) ([(η(u) γ′(ξ))² − β²(1 − β)] s_i − β³ g_i²) / (η(u) γ′(ξ) √(s_i) + β √(s⁺_i))
           = −√(s⁺_i) [(η(u) γ′(ξ))² − β²(1 − β)] s_i / (η(u) γ′(ξ) √(s_i) + β √(s⁺_i)) + √(s⁺_i) β³ g_i² / (η(u) γ′(ξ) √(s_i) + β √(s⁺_i)).

We now define

  a^i_{12} := −√(s⁺_i) [(η(u) γ′(ξ))² − β²(1 − β)] s_i / (η(u) γ′(ξ) √(s_i) + β √(s⁺_i)),
  a^i_{13} := √(s⁺_i) β³ g_i² / (η(u) γ′(ξ) √(s_i) + β √(s⁺_i)),

so that a^i_{11} = a^i_{12} + a^i_{13}. We start by upper bounding a^i_{12}. By using algebraic manipulations, we get

  a^i_{12} = −η(u) γ′(ξ) [1 − β²(1 − β)/(η(u) γ′(ξ))²] / [√(s_i)/√(s⁺_i) + β/(η(u) γ′(ξ))] · s_i.

Note from (7) that γ₀ ≥ 3β/η₀ > 0 and γ₁ > 0, and thereby γ, defined in (6a), is monotonically increasing on [0, ∞). We now use that u ≥ 0 and γ₀ ≥ 3β/η₀ to get

  η(u) γ′(ξ) ≥ η₀ γ₀,   (∗)
  η₀ γ₀ ≥ 3β,   (∗∗)

and from (3a) that √(s⁺_i) = √((1 − β) s_i + β g_i²) ≥ √(1 − β) √(s_i), to obtain

  a^i_{12} ≤(∗) −η(u) γ′(ξ) [1 − β²(1 − β)/(η₀ γ₀)²] / [1/√(1 − β) + β/(η₀ γ₀)] · s_i ≤(∗∗) −η(u) γ′(ξ) (8 + β)/(9/√(1 − β) + 3) · s_i =: −η(u) γ′(ξ) c_γ s_i.   (23)

We proceed by upper bounding a^i_{13}.
Observe that, since s_i ≥ 0 and since γ′(ξ) > 0 by (6a) and (7), we have

  η(u) γ′(ξ) √(s_i) + β √(s⁺_i) ≥ β √(s⁺_i).   (24)

Additionally, from (3a), s⁺_i = (1 − β) s_i + β g_i² ≥ β g_i², which we plug together with (24) into a^i_{13} to get

  a^i_{13} ≤ √(s⁺_i) β³ g_i² / (β √(β g_i²)) = √(s⁺_i) β^{3/2} |g_i|.   (25)

Thus, by using (23) and (25), we get

  a^i_{11} = a^i_{12} + a^i_{13} ≤ −η(u) γ′(ξ) c_γ s_i + √(s⁺_i) β^{3/2} |g_i|.   (26)

Next, we use (∗), (∗∗) and that γ′ is positive on [0, ∞). Then, for a^i_{31}, defined in (22), we get

  a^i_{31} ≤(∗) −η(u) γ′(ξ) (ε/2)(1 − 2β/(η₀ γ₀)) ≤(∗∗) −η(u) γ′(ξ) ε/6.   (27)

For a^i_{41}, defined in (22), we first add and subtract the term η(u) γ′(ξ) η₁ L/2, and by a subsequent rearrangement, we get

  a^i_{41} = −η(u) γ′(ξ)(ε − (η₀ + η₁) L/2) − η(u) γ′(ξ) η₂/ε + βε + u η(u) γ′(ξ) L/2,   (28)

where η₂ := η₁ ε L/2. Since ε − (η₀ + η₁) L/2 > 0 holds by assumption, i.e., since η₀ + η₁ < 2ε/L, we can use the inequality (∗) on the first right hand side term in (28). We then get

  a^i_{41} ≤ −η₀ γ₀ (ε − (η₀ + η₁) L/2) − η(u) γ′(ξ) η₂/ε + βε + u η(u) γ′(ξ) L/2.   (29)

Moreover, γ₀ ≥ βε/(η₀(ε − (η₀ + η₁) L/2)) directly follows from (7), which we then plug into the right hand side of (29) to get

  a^i_{41} ≤ −η(u) γ′(ξ) η₂/ε + u η(u) γ′(ξ) L/2.   (30)

By plugging (26), (27) and (30) into the right hand side of (20), we get for all i ∈ 1:d that

  κ_i ≤ −η(u) γ′(ξ) c_γ s_i + a^i_{21} − η(u) γ′(ξ) (ε/6) √(s⁺_i) − η(u) γ′(ξ) η₂ + u η(u) γ′(ξ) ε L/2 + √(s⁺_i) (−η(u) γ′(ξ) ε/6 + β^{3/2} |g_i|).   (31)

Step 1.4: In the following we analyze the terms

  a^i_{51} := −η(u) γ′(ξ) ε/6 + β^{3/2} |g_i|   (32)

from (31) and show for all i ∈ 1:d that a^i_{51} ≤ u(c₁ + c₂ u) holds for some positive constants c₁ and c₂.
We first require a lower bound for ξ, defined in (16), for which an equivalent representation is given by

  ξ = λ f₀(x⁺) + (1 − λ) f₀(x)   (33)

for some λ ∈ [0, 1]. Under Assumption 1 i), f is (strongly) convex, from which we get f₀(x⁺) ≥ f₀(x) + ∇f(x)⊤(x⁺ − x). By plugging this lower bound into the right hand side of (33) and by using that s_i ≥ 0 and ε > 0, we get

  ξ ≥ f₀(x) − λ η(u) Σ_{i=1}^d g_i² / (ε + √((1 − β) s_i + β g_i²)) ≥ f₀(x) − λ η(u) Σ_{i=1}^d g_i² / √(β g_i²) ≥ f₀(x) − (η(u)/√β) Σ_{i=1}^d |g_i|,   (34)

where in the last inequality we used that λ ≤ 1. In addition, we use that AM ≥ GM to get

  |g_i| ≤ w(u) g_i² + 1/(4 w(u))   (35)

for any w(u) > 0, and choose w(u) := √β/(4 η(u) L). Moreover, under Assumption 1 i) and ii) we have f₀(x) ≥ ∥g∥²/(2L) (see, e.g., [19, Theorem 2.1.5, (2.1.10)]), which we plug together with (35) into the right hand side of (34) to get

  ξ ≥ ∥g∥²/(4L) − η(u)² c_L =: ξ₀(η(u)),   (36)

where c_L := Ld/β. From (16) it follows directly that ξ ≥ 0; thus, we derive the lower bound

  ξ ≥ max{0, ξ₀(η(u))}.   (37)

We now analyze the term a^i_{51} by plugging (37) into γ′(ξ) = γ₀ + γ₁ √ξ on the right hand side of (32), i.e.,

  a^i_{51} ≤ −η(u) γ₀ ε/6 − η(u) γ₁ (ε/6) √(max{0, ξ₀(η(u))}) + β^{3/2} |g_i|.   (38)

We observe the two possible outcomes of max{0, ξ₀(η(u))}, with ξ₀ as in (36), and therefore define the partition

  G₁ := {x ∈ R^d : ∥g∥² ≤ η(u)² 4L c_L},   (39a)
  G₂ := {x ∈ R^d : ∥g∥² > η(u)² 4L c_L}.   (39b)

We now find an upper bound of the right hand side of (38) over both partition sets which holds for all i ∈ 1:d. First, we analyze the right hand side of (38) for any x ∈ G₁. In this case, we have max{0, ξ₀(η(u))} = 0, and by using that g_i² ≤ ∥g∥² ≤ η(u)² 4L c_L and c_L = Ld/β, we get

  a^i_{51} ≤ η(u) (−ε γ₀/6 + 2βL√d) =: a_52.
From (7) it follows that γ₀ > 12βL√d/ε, thus a_52 ≤ 0.

Second, we analyze the right hand side of (38) for any x ∈ G₂. Then, we have max{0, ξ₀(η(u))} = ξ₀(η(u)) > 0, and we use the inequality √(a − b) ≥ √a − √b, which holds for any a ≥ b ≥ 0, on √(ξ₀(η(u))). Subsequently, by plugging the result into the right hand side of (38) and rearranging the terms, we get

  a^i_{51} ≤ η(u) (ε/6)(−γ₀ + η(u) γ₁ √(c_L)) − η(u) γ₁ (ε/(12√L)) ∥g∥ + β^{3/2} |g_i|.   (40)

From (7) one can directly verify that γ₀ > η₀ γ₁ √(c_L). By using that u ≥ 0, as well as norm inequalities, we further get η(u) ∥g∥ ≥ η₀ |g_i|. By plugging these two lower bounds into the right hand side of (40), we get

  a^i_{51} ≤ u η(u) γ₁ ε √(c_L)/6 − η₀ γ₁ (ε/(12√L)) |g_i| + β^{3/2} |g_i|.

Finally, by plugging γ₁, defined in (7), into the second right hand side term of the above relation, we get

  a^i_{51} ≤ u η(u) γ₁ ε √(c_L)/6 =: u η(u) a_53.   (41)

Thus, a^i_{51} ≤ max{a_52, u η(u) a_53} = u η(u) a_53, which we then plug into the last right hand side term of (31) to get

  κ_i ≤ −η(u) γ′(ξ) c_γ s_i + a^i_{21} − η(u) γ′(ξ) (ε/6) √(s⁺_i) − η(u) γ′(ξ) η₂ + u η(u) γ′(ξ) ε L/2 + u η(u) a_53 √(s⁺_i).   (42)

Step 1.5: In the following, we finalize the upper bounds for κ_i, i ∈ 1:d, in (42). We first find an upper and a lower bound for all √(s⁺_i), i ∈ 1:d, and then plug them into the right hand side of (42). Starting from (3a) and by using that QM ≥ AM, i.e., −QM ≤ −AM, we have

  −√(s⁺_i) = −√2 · (1/√2) √((1 − β) s_i + β g_i²) ≤ −(√(2(1 − β))/2) √(s_i) − (√(2β)/2) |g_i| =: −a_61 √(s_i) − a_62 |g_i|,   (43)

and further, by using the sub-additivity of √·, i.e., √(a + b) ≤ √a + √b for any a, b ≥ 0, we get

  √(s⁺_i) = √((1 − β) s_i + β g_i²) ≤ √(1 − β) √(s_i) + √β |g_i|.   (44)

Hence, by plugging (43) and (44) into the third and sixth terms on the right hand side of (42), respectively, we get

  κ_i ≤ −η(u) γ′(ξ) c_γ s_i + a^i_{21} − η(u) γ′(ξ) (ε/6)(a_61 √(s_i) + a_62 |g_i|) − η(u) γ′(ξ) η₂ + u η(u) γ′(ξ) ε L/2 + u η(u) a_53 √(1 − β) √(s_i) + u η(u) a_53 √β |g_i|.

By using algebraic manipulations and plugging in a^i_{21} from (22), we further get

  κ_i ≤ η(u) γ′(ξ) [ −c_γ s_i − √(s_i)(ε − η₀ L/2) − (ε/6) a_61 √(s_i) − a_7 |g_i| − η₂ + u(√(s_i) + ε) L/2 + u a_53 √(1 − β) √(s_i)/γ′(ξ) + u a_53 √β |g_i|/γ′(ξ) ],   (45)

where a_7 := ε a_62/6. Trivially, it holds that √(s_i)/γ′(ξ) ≤ √(s_i)/γ₀, due to (6a), (7), and ξ ≥ 0. In the following, we upper bound |g_i|/γ′(ξ) for all |g_i| ≥ 0. By using (6a), γ₀, γ₁ > 0 from (7), and ξ₀ from (37), we get

  |g_i|/γ′(ξ) = |g_i|/(γ₀ + γ₁ √ξ) ≤ |g_i|/(γ₀ + γ₁ √(max{0, ξ₀(η(u))})) ≤ |g_i|/(γ₀ + γ₁ √(max{0, g_i²/(4L) − η(u)² c_L})),   (46)

where, to obtain the right hand side upper bound, we used that g_i² ≤ ∥g∥², and where c_L = Ld/β. We observe the two possible outcomes of max{0, g_i²/(4L) − η(u)² c_L}, and define the following partitions for all i ∈ 1:d:

  G^i_3 := {x ∈ R^d : g_i² ≤ η(u)² 4L c_L},   (47a)
  G^i_4 := {x ∈ R^d : g_i² > η(u)² 4L c_L}.   (47b)

We now find an upper bound of the right hand side of (46) over the partition sets which holds for all i ∈ 1:d. We first consider any x ∈ G^i_3. We then have g_i² ≤ η(u)² 4L c_L, and thereby for (46) we have

  |g_i|/γ′(ξ) ≤ η(u) 2√(L c_L)/γ₀ =: l₁(u).   (48)

Second, we consider any x ∈ G^i_4. We then have g_i² > η(u)² 4L c_L. We use the bound √(a − b) ≥ √a − √b for any a ≥ b ≥ 0, and plug η(u) = η₀ + u into the right hand side of (46) to get

  |g_i|/γ′(ξ) ≤ |g_i| / (γ₀ − γ₁ η₀ √(c_L) + γ₁ |g_i|/(2√L) − u γ₁ √(c_L)).
From (7) we have γ₀ ≥ γ₁ η₀ √(c_L) + 12βL√d/ε, thus

  |g_i|/γ′(ξ) ≤ |g_i| / (12βL√d/ε + γ₁ |g_i|/(2√L) − u γ₁ √(c_L)).   (49)

We now analyze the right hand side of (49) as a function of |g_i|, and find an upper bound which holds for all g_i² > η(u)² 4L c_L, i.e., for all x ∈ G^i_4. First, if 12βL√d/ε − u γ₁ √(c_L) ≥ 0, then the right hand side of (49) is upper bounded by its supremum, which is approached as |g_i| → ∞. Second, if 12βL√d/ε − u γ₁ √(c_L) < 0, then the right hand side of (49) is upper bounded by its supremum at g_i² = η(u)² 4L c_L. Thus, we have

  |g_i|/γ′(ξ) ≤ max{ 2√L/γ₁, η(u) c_m } ≤ 2√L/γ₁ + η(u) c_m =: l₂(u),   (50)

where c_m := 2ε√(L c_L)/(12βL√d + η₀ γ₁ ε √(c_L)). Then, we get from (48) and (50) for all x ∈ R^d that

  |g_i|/γ′(ξ) ≤ max{l₁(u), l₂(u)} ≤ l₁(u) + l₂(u) =: l₃(u),

and further recall that √(s_i)/γ′(ξ) ≤ √(s_i)/γ₀. By plugging both upper bounds into the right hand side of (45) and rearranging the terms, we get

  κ_i ≤ η(u) γ′(ξ) [ −η₂ + u l₅(u) − c_γ s_i + l₄(u) √(s_i) − a_7 |g_i| ],   (51a)
  l₄(u) := η₀ L/2 − ε − (ε/6) a_61 + u L/2 + u a_53 √(1 − β)/γ₀,   (51b)
  l₅(u) := ε L/2 + a_53 √β l₃(u).   (51c)

Part 2: In this part, as outlined at the beginning of the proof, we derive a suitable upper bound for (21) which depends solely on ∥g∥_∞, ∥s∥_∞ and u ∈ R_{≥0}.

Step 2.1: In the following, we derive an upper and a lower bound for ξ, given in (33), which depend only on ∥g∥_∞ and u. From (16) it follows that ξ ≥ 0, and from norm inequalities that ∥g∥² ≥ ∥g∥²_∞. Thus, we lower bound the right hand side of (37) as follows:

  ξ ≥ max{0, ∥g∥²_∞/(4L) − η(u)² c_L}.   (52)

Next, we find an upper bound for (33).
Under Assumption 1 ii), we have f₀(x⁺) − f₀(x) ≤ ∇f₀(x)⊤(x⁺ − x) + (L/2)∥x⁺ − x∥², which we rewrite as in (18) and subsequently plug into the right hand side of (33). We then have

  ξ ≤ f₀(x) − λ η(u) Σ_{i=1}^d g_i²/(ε + √(s⁺_i)) + λ η(u)² (L/2) Σ_{i=1}^d g_i²/(ε + √(s⁺_i))².

We use 0 ≤ λ ≤ 1 and, following from (3a) and ε > 0, that ε + √(s⁺_i) ≥ √(β g_i²), to get

  ξ ≤ f₀(x) + η(u)² (L/2) Σ_{i=1}^d g_i²/(β g_i²) ≤ f₀(x) + η(u)² c_L/2,

with c_L = Ld/β. Finally, under Assumption 1 i) with µ > 0 it holds that 2µ f₀(x) ≤ ∥g∥² (see, e.g., [19, Theorem 2.1.10, (2.1.24)]), and by further using ∥g∥² ≤ d ∥g∥²_∞ we get

  ξ ≤ (d/(2µ)) ∥g∥²_∞ + η(u)² c_L/2.   (53)

Step 2.2: In the following, we provide two upper bounds for g_i² κ_i/((ε + √(s⁺_i))²(ε + √(s_i))), i ∈ 1:d, as given in (21), which depend solely on ∥g∥_∞, ∥s∥_∞ and u. Observe first in (51a) that the upper bound of κ_i contains the term η(u) γ′(ξ)(−c_γ s_i + l₄(u) √(s_i) + u l₅(u)). By analyzing −c_γ s_i + l₄(u) √(s_i) + u l₅(u) as a quadratic polynomial in √(s_i), we get that this expression attains its maximum at √(s_i) = l₄(u)/(2 c_γ). Moreover, we then get that κ_i is negative for all s_i > ([l₄(u) + √(l₄²(u) + 4 c_γ u l₅(u))]/(2 c_γ))², and this holds for all |g_i| ≥ 0. If l₄(u) ≥ 0, this maximum is attained at an admissible value s_i ≥ 0; otherwise, the maximum over s_i ≥ 0 is attained at s_i = 0. We thus obtain for all i ∈ 1:d that

  κ_i ≤ η(u) γ′(ξ) ( l₄²(u) max{0, sgn(l₄(u))}/(4 c_γ) + u l₅(u) ) =: η(u) γ′(ξ) Γ̂₁(u),

where we used that, trivially, −η₂ < 0. Moreover, by using that s_i ≥ 0, s⁺_i ≥ 0 and ε > 0, as well as s⁺_i ≥ β g_i² from (3a), we get

  g_i² κ_i / ((ε + √(s⁺_i))²(ε + √(s_i))) ≤ g_i² η(u) γ′(ξ) Γ̂₁(u) / (ε β g_i²) = η(u) γ′(ξ) Γ̂₂(u),   (54)

where Γ̂₂(u) := Γ̂₁(u)/(εβ).
From (51b) and (51c) it can be directly verified that Γ̂₂ is continuous and positive definite on [0, ∞). By further inspecting (48) and (50), it can be verified that the terms in Γ̂₂ are either monotonically increasing or strictly monotonically increasing on [0, ∞). Thus, Γ̂₂ is strictly monotonically increasing on [0, ∞). With (54), we have established our first upper bound.

We now provide a second upper bound for g_i² κ_i/((ε + √(s⁺_i))²(ε + √(s_i))), i ∈ 1:d. By rearranging the right hand side of (51a), we have

  κ_i ≤ −η(u) γ′(ξ)(c_γ s_i + a_7 |g_i| + η₂) + η(u) γ′(ξ)(l₄(u) √(s_i) + u l₅(u)).

Multiplying both sides by g_i²/((ε + √(s⁺_i))²(ε + √(s_i))) ≥ 0, we get

  g_i² κ_i/((ε + √(s⁺_i))²(ε + √(s_i))) ≤ −g_i² η(u) γ′(ξ)(c_γ s_i + a_7 |g_i| + η₂)/((ε + √(s⁺_i))²(ε + √(s_i))) + g_i² η(u) γ′(ξ)(l₄(u) √(s_i) + u l₅(u))/((ε + √(s⁺_i))²(ε + √(s_i))).   (55)

We now find an upper bound for the second right hand side term of (55). By using s_i ≥ 0 and, from (51c), that l₅ is positive on [0, ∞), we get

  l₄(u) √(s_i)/(ε + √(s_i)) ≤ max{0, l₄(u)}  and  u l₅(u)/(ε + √(s_i)) ≤ u l₅(u)/ε.

Moreover, from (3a) we get s⁺_i ≥ β g_i², which we plug into η(u) γ′(ξ) g_i²/(ε + √(s⁺_i))² along with ε > 0 to get

  η(u) γ′(ξ) g_i²/(ε + √(s⁺_i))² ≤ η(u) γ′(ξ) g_i²/(ε + √(β g_i²))² ≤ η(u) γ′(ξ) g_i²/(β g_i²) = η(u) γ′(ξ)/β.

By plugging the above inequalities into the second right hand side term of (55), we get

  g_i² κ_i/((ε + √(s⁺_i))²(ε + √(s_i))) ≤ η(u) γ′(ξ) max{0, l₄(u)}/β + u η(u) γ′(ξ) l₅(u)/(εβ) − η(u) γ′(ξ) g_i²(c_γ s_i + a_7 |g_i| + η₂)/((ε + √(s⁺_i))²(ε + √(s_i))).   (56)

Step 2.3: In the following we derive an upper bound for the right hand side of (21), dependent only on ∥g∥_∞, ∥s∥_∞ and u, by using the results from Step 2.1 and Step 2.2.
We start by analyzing the second right hand side sum of (21), rearranging it as
$$\sum_{i=1}^d \frac{g_i^2\kappa_i}{\big(\varepsilon+\sqrt{s_i^+}\big)^2(\varepsilon+\sqrt{s_i})} = \frac{g_m^2\kappa_m}{\big(\varepsilon+\sqrt{s_m^+}\big)^2(\varepsilon+\sqrt{s_m})} + \sum_{\substack{j=1 \\ j \ne m}}^d \frac{g_j^2\kappa_j}{\big(\varepsilon+\sqrt{s_j^+}\big)^2(\varepsilon+\sqrt{s_j})},$$
where $m \in \arg\max_{i\in 1:d} |g_i|$. In the above equation, for each term in the sum on the right hand side we use the upper bound in (54), and for the first right hand side term (i.e., the $m$-th component) we use the upper bound in (56). Then, we get
$$\sum_{i=1}^d \frac{g_i^2\kappa_i}{\big(\varepsilon+\sqrt{s_i^+}\big)^2(\varepsilon+\sqrt{s_i})} \le \frac{\eta(u)\gamma'(\xi)\max\{0, l_4(u)\}}{\beta} + \frac{u\eta(u)\gamma'(\xi)l_5(u)}{\varepsilon\beta} - \frac{\eta(u)\gamma'(\xi)\|g\|_\infty^2(c_\gamma s_m + a_7\|g\|_\infty + \eta^2)}{\big(\varepsilon+\sqrt{s_m^+}\big)^2(\varepsilon+\sqrt{s_m})} + \eta(u)\gamma'(\xi)\Gamma(u), \quad (57)$$
where we used that $|g_m| = \|g\|_\infty$ and where $\Gamma(u) := (d-1)\hat\Gamma_2(u)$. Next, we analyze the first right hand side sum of (21), which we recall is negative definite in $s \in \mathbb{R}^d_{\ge 0}$. It therefore trivially holds that
$$-\beta\sum_{i=1}^d \frac{s_i}{\varepsilon+\sqrt{s_i}} \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}}. \quad (58)$$
We now provide an upper bound for $\Delta V$. By plugging (57) and (58) into the right hand side of (21), we get
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} + p_1(u)\gamma'(\xi) - \frac{\eta(u)\gamma'(\xi)\|g\|_\infty^2(c_\gamma s_m + a_7\|g\|_\infty + \eta^2)}{\big(\varepsilon+\sqrt{s_m^+}\big)^2(\varepsilon+\sqrt{s_m})}, \quad (59)$$
where
$$p_1(u) := \eta(u)\Gamma(u) + \frac{\eta(u)\max\{0, l_4(u)\}}{\beta} + \frac{u\eta(u)l_5(u)}{\varepsilon\beta}. \quad (60)$$
We note that $p_1$ is positive definite, continuous and strictly monotonically increasing on $[0,\infty)$. To obtain the desired upper bound of (21), we also use that $s_m^+ = (1-\beta)s_m + \beta\|g\|_\infty^2 \le (1-\beta)\|s\|_\infty + \beta\|g\|_\infty^2$, and that $\eta(u) \ge \eta_0$, due to $u \ge 0$, and subsequently plug both inequalities into the third right hand side term of (59). Then, we get
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} + p_1(u)\gamma'(\xi) - \frac{\eta_0\gamma'(\xi)\|g\|_\infty^2(c_\gamma s_m + a_7\|g\|_\infty + \eta^2)}{\Big(\varepsilon+\sqrt{(1-\beta)\|s\|_\infty + \beta\|g\|_\infty^2}\Big)^2(\varepsilon+\sqrt{s_m})}. \quad (61)$$
In the following, we eliminate the dependence of (61) on $\xi$.
We first focus on the term $p_1(u)\gamma'(\xi)$ in (61). We use from (6a) that $\gamma'(\xi) = \gamma_0 + \gamma_1\sqrt{\xi}$, plug in the upper bound on $\xi$ from (53), and use the sub-additivity property of $\sqrt{\cdot}$. Then,
$$\gamma'(\xi) = \gamma_0 + \gamma_1\sqrt{\xi} \le \gamma_0 + \hat\gamma_1\|g\|_\infty + \eta(u)\hat\gamma_2, \quad (62)$$
with $\hat\gamma_1 := \gamma_1\frac{\sqrt{d}}{\sqrt{2\mu}}$ and $\hat\gamma_2 := \gamma_1\frac{\sqrt{c_L}}{\sqrt{2}}$. Next, we focus on $\gamma'(\xi)$ in the third right hand side term of (61) and use the lower bound on $\xi$ from (52). Then,
$$\gamma'(\xi) \ge \rho_1(\|g\|_\infty, u) := \gamma_0 + \gamma_1\sqrt{\max\left\{0,\ \frac{1}{4L}\|g\|_\infty^2 - \eta(u)^2 c_L\right\}}. \quad (63)$$
Thereafter, by plugging (62) and (63) into the second and third right hand side terms of (61), respectively, we get
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} + p_1(u)\big(\gamma_0 + \hat\gamma_1\|g\|_\infty + \eta(u)\hat\gamma_2\big) - \frac{\eta_0\rho_1(\|g\|_\infty,u)\|g\|_\infty^2(c_\gamma s_m + a_7\|g\|_\infty + \eta^2)}{\Big(\varepsilon+\sqrt{(1-\beta)\|s\|_\infty + \beta\|g\|_\infty^2}\Big)^2(\varepsilon+\sqrt{s_m})}. \quad (64)$$
Finally, we leave it to the reader to verify that
$$-\frac{c_\gamma s_m + a_7\|g\|_\infty + \eta^2}{\varepsilon+\sqrt{s_m}} \le \max_{s_m \in \mathbb{R}_{\ge 0}}\left(-\frac{c_\gamma s_m + a_7\|g\|_\infty + \eta^2}{\varepsilon+\sqrt{s_m}}\right) = -2c_\gamma\left(\sqrt{\varepsilon^2 + \tfrac{a_7}{c_\gamma}\|g\|_\infty + \tfrac{\eta^2}{c_\gamma}} - \varepsilon\right) =: -\rho_2(\|g\|_\infty), \quad (65)$$
where the right hand side's maximum is attained at $s_m = \left(\sqrt{\varepsilon^2 + \tfrac{a_7}{c_\gamma}\|g\|_\infty + \tfrac{\eta^2}{c_\gamma}} - \varepsilon\right)^2$. Then, for (64) we get the desired upper bound
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} + p_2(u) + p_3(u)\|g\|_\infty - \frac{\eta_0\rho_1(\|g\|_\infty,u)\|g\|_\infty^2\rho_2(\|g\|_\infty)}{\Big(\varepsilon+\sqrt{(1-\beta)\|s\|_\infty + \beta\|g\|_\infty^2}\Big)^2}, \quad (66)$$
where
$$p_2(u) := p_1(u)\big(\gamma_0 + \eta(u)\hat\gamma_2\big), \quad (67\text{a})$$
$$p_3(u) := p_1(u)\hat\gamma_1. \quad (67\text{b})$$
Part 3: In the following, we analyze the right hand side of (66) for a bipartition of the state space, defined as
$$S_1 := \{(x,s) \in \mathbb{R}^d \times \mathbb{R}^d_{\ge 0} : \|s\|_\infty \ge (L\|y\|)^{2+2q}\}, \quad (68\text{a})$$
$$S_2 := \{(x,s) \in \mathbb{R}^d \times \mathbb{R}^d_{\ge 0} : \|s\|_\infty < (L\|y\|)^{2+2q}\}, \quad (68\text{b})$$
with $q \in \left(0, \frac14\right)$ and $y := x - x^*$, and where $S_1 \cup S_2 = \mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$.
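The closed-form maximization in (65) can be sanity-checked numerically. In the sketch below, $c$, $b$ and $\varepsilon$ are illustrative placeholders, with $b$ playing the role of $a_7\|g\|_\infty + \eta^2$ and $c$ the role of $c_\gamma$; a grid search over $s \ge 0$ is compared against the closed form and against evaluation at the claimed maximizer.

```python
import numpy as np

# Check: for c, b, eps > 0,  max over s >= 0 of  -(c*s + b)/(eps + sqrt(s))
# equals  -2*c*(sqrt(eps**2 + b/c) - eps),
# attained at s* = (sqrt(eps**2 + b/c) - eps)**2.
c, b, eps = 0.4, 1.3, 0.05            # illustrative placeholder constants
s = np.linspace(0.0, 100.0, 1_000_001)
grid_max = np.max(-(c * s + b) / (eps + np.sqrt(s)))
closed_form = -2.0 * c * (np.sqrt(eps**2 + b / c) - eps)
assert abs(grid_max - closed_form) < 1e-4

s_star = (np.sqrt(eps**2 + b / c) - eps)**2  # claimed maximizer
assert abs(-(c * s_star + b) / (eps + np.sqrt(s_star)) - closed_form) < 1e-9
```

The first-order condition in $t = \sqrt{s}$ is $c\,t^2 + 2c\varepsilon t - b = 0$, whose positive root gives exactly the maximizer above.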
For each region we find upper bounds for $\Delta V$, from which we further infer that for all $(x,s) \in \mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$ and any $u \in \mathbb{R}_{\ge 0}$ we have
$$\Delta V \le -\alpha_V(\|(y,s)\|_\infty) + \chi_V(u), \quad (69)$$
where $\alpha_V, \chi_V \in \mathcal{K}_\infty$.
Step 3.1: Region $S_1$: Let $(x,s) \in S_1$ as defined in (68). Starting from (66), in the following we derive an upper bound for $\Delta V$ for all $(x,s) \in S_1$. Under Assumption 1 ii), by using norm inequalities one can show that $L\|y\| \ge \|g\|_\infty$. Then, for all $(x,s) \in S_1$ we have $\|s\|_\infty \ge (L\|y\|)^{2+2q} \ge \|g\|_\infty^{2+2q}$. Next, we split the first right hand side term of (66) as
$$-\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} = -\sum_{j=1}^3 \frac{1}{3}\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}},$$
and observe that the right hand side functions are indeed monotonically decreasing on $[0,\infty)$. By using $\|s\|_\infty \ge \|g\|_\infty^{2+2q}$, we get $-\frac{1}{3}\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} \le -\frac{1}{3}\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}}$, and moreover,
$$-\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} \le -\frac{1}{3}\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} - \frac{2}{3}\,\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}}. \quad (70)$$
Now, observe that the last right hand side term of (66) is negative definite, and thereby upper bounded by zero. Further, we plug (70) into the first right hand side term of (66), and thus get
$$\Delta V \le -\frac{\beta}{3}\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} - \frac{2\beta}{3}\,\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}} + p_2(u) + p_3(u)\|g\|_\infty. \quad (71)$$
Next, we focus on $p_3(u)\|g\|_\infty$. Consider the expression
$$\rho_3(\|g\|_\infty, u) := -\frac{\beta}{3}\,\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}} + p_3(u)\|g\|_\infty. \quad (72)$$
Then, (71) takes the form
$$\Delta V \le -\frac{\beta}{3}\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} - \frac{\beta}{3}\,\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}} + p_2(u) + \rho_3(\|g\|_\infty, u). \quad (73)$$
The negative term in (72) grows at a rate of $\|g\|_\infty^{1+q}$, which dominates the term linear in $\|g\|_\infty$ if $\|g\|_\infty$ is sufficiently large with respect to $u$. We now show that one can find some $v_1(u) \ge 0$ such that for all $\|g\|_\infty \ge v_1(u)$ it holds that $\rho_3(\|g\|_\infty, u) \le 0$, and for all $\|g\|_\infty \ge 0$ that $\rho_3(\|g\|_\infty, u) \le p_3(u)v_1(u)$.
We set
$$v_1(u) := \max\left\{\varepsilon^{\frac{1}{1+q}},\ \left(\frac{6p_3(u)}{\beta}\right)^{\frac{1}{q}}\right\}, \quad (74)$$
which is positive and continuous on $[0,\infty)$. In the following, we show that for all $\|g\|_\infty \ge v_1(u)$ it holds that $\rho_3(\|g\|_\infty, u) \le 0$. As we assume that $\|g\|_\infty \ge v_1(u)$, from (74) we further get $\varepsilon \le \|g\|_\infty^{1+q}$. By plugging this inequality into the denominator of the negative definite term on the right hand side of (72), we get $-\frac{\beta}{3}\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}} \le -\frac{\beta}{6}\|g\|_\infty^{1+q}$. By plugging the obtained inequality into the right hand side of (72), we get
$$\rho_3(\|g\|_\infty, u) \le -\frac{\beta}{6}\|g\|_\infty^{1+q} + p_3(u)\|g\|_\infty. \quad (75)$$
Since by assumption we have $\|g\|_\infty \ge v_1(u)$, from (74) we get $\|g\|_\infty \ge \left(\frac{6p_3(u)}{\beta}\right)^{\frac{1}{q}}$, i.e., $p_3(u) \le \frac{\beta}{6}\|g\|_\infty^q$. By plugging this inequality into the right hand side of (75), we get
$$\rho_3(\|g\|_\infty, u) \le 0. \quad (76)$$
Finally, we show for all $\|g\|_\infty \ge 0$ that $\rho_3(\|g\|_\infty, u) \le p_3(u)v_1(u)$. Recall that $\rho_3(\cdot, u)$ is continuous on $[0, v_1(u))$, with the first term in (72) being negative definite, and thus upper bounded by zero, and the second term monotonically increasing. Therefore, for $\|g\|_\infty \in [0, v_1(u))$ we get that $\rho_3(\|g\|_\infty, u) \le p_3(u)v_1(u)$, and, trivially, due to the result obtained for all $\|g\|_\infty \ge v_1(u)$ in (76), we get for all $\|g\|_\infty \ge 0$ that $\rho_3(\|g\|_\infty, u) \le \max\{p_3(u)v_1(u),\ 0\} = p_3(u)v_1(u)$, which we then plug into the right hand side of (73) to get
$$\Delta V \le -\frac{\beta}{3}\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} - \frac{\beta}{3}\,\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}} + p_2(u) + p_3(u)v_1(u). \quad (77)$$
It only remains to provide an upper bound for (77) dependent on $\|y\|_\infty$, $\|s\|_\infty$ and $u$. We use here that $-\frac{\beta}{3}\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}}$ is monotonically decreasing in $\|g\|_\infty$.
Under Assumption 1 i), by using [19, Theorem 2.1.10, (2.1.26)] and norm inequalities, we have that the inequality $\|g\|_\infty^2 \ge \frac{\mu^2\|y\|_\infty^2}{d}$ applies, thereby obtaining
$$-\frac{\beta}{3}\,\frac{\|g\|_\infty^{2+2q}}{\varepsilon+\|g\|_\infty^{1+q}} \le -\frac{\beta}{3}\,\frac{\hat r_1^{2+2q}\|y\|_\infty^{2+2q}}{\varepsilon+\hat r_1^{1+q}\|y\|_\infty^{1+q}},$$
where $\hat r_1 := \frac{\mu}{\sqrt d}$. By plugging the above inequality into the right hand side of (77), we get
$$\Delta V \le -\frac{\beta}{3}\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} - \frac{\beta}{3}\,\frac{\hat r_1^{2+2q}\|y\|_\infty^{2+2q}}{\varepsilon+\hat r_1^{1+q}\|y\|_\infty^{1+q}} + p_2(u) + p_3(u)v_1(u) =: -\psi_{11}(\|s\|_\infty) - \psi_{12}(\|y\|_\infty) + \chi_1(u). \quad (78)$$
Step 3.2: Region $S_2$: Let $(x,s) \in S_2$, with $S_2$ as defined in (68). Starting from (66), in the following we derive an upper bound for $\Delta V$ for all $(x,s) \in S_2$. First, observe for all $(x,s) \in S_2$ that $\|s\|_\infty < (L\|y\|)^{2+2q}$. Second, under Assumption 1 i) with $\mu > 0$, by using [19, Theorem 2.1.10, (2.1.26)] and norm inequalities, it follows that $\|g\|_\infty^2 \ge \hat r_1^2\|y\|^2$ holds, with $\hat r_1 = \frac{\mu}{\sqrt d}$. Combining these two inequalities gives us $\|s\|_\infty < \hat r_2\|g\|_\infty^{2+2q}$ for all $(x,s) \in S_2$, where $\hat r_2 := \left(\frac{L^2 d}{\mu^2}\right)^{1+q}$, which we plug into the denominator of the last right hand side term of (66) to get
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} + p_2(u) + p_3(u)\|g\|_\infty - 2\,\frac{\eta_0}{2}\,\frac{\rho_1(\|g\|_\infty,u)\|g\|_\infty^2\rho_2(\|g\|_\infty)}{\Big(\varepsilon+\sqrt{\hat r_3\|g\|_\infty^{2+2q}+\beta\|g\|_\infty^2}\Big)^2}, \quad (79)$$
where $\hat r_3 := (1-\beta)\hat r_2$ and the factor $\eta_0 = 2\cdot\frac{\eta_0}{2}$ is written out so that half of this negative term can be absorbed below. Next, we focus on $p_3(u)\|g\|_\infty$. Consider the expression
$$\rho_4(\|g\|_\infty, u) := -\frac{\eta_0}{2}\,\frac{\rho_1(\|g\|_\infty,u)\|g\|_\infty^2\rho_2(\|g\|_\infty)}{\Big(\varepsilon+\sqrt{\hat r_3\|g\|_\infty^{2+2q}+\beta\|g\|_\infty^2}\Big)^2} + p_3(u)\|g\|_\infty. \quad (80)$$
Then, (79) takes the form
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} + p_2(u) + \rho_4(\|g\|_\infty, u) - \frac{\eta_0}{2}\,\frac{\rho_1(\|g\|_\infty,u)\|g\|_\infty^2\rho_2(\|g\|_\infty)}{\Big(\varepsilon+\sqrt{\hat r_3\|g\|_\infty^{2+2q}+\beta\|g\|_\infty^2}\Big)^2}. \quad (81)$$
The negative definite term on the right hand side of (80), for $\|g\|_\infty$ sufficiently large with respect to $u$, grows at a rate of $\|g\|_\infty^{1+2+\frac12-2-2q} = \|g\|_\infty^{1+\frac12-2q}$.
The reader can verify this by direct analysis of (63) and (65). As $2q < \frac12$ holds due to the choice of $q$ in (68), this implies that $\|g\|_\infty^{1+\frac12-2q}$ grows at a rate greater than $\|g\|_\infty$ and thus, for sufficiently large $\|g\|_\infty$, dominates the term linear in $\|g\|_\infty$. Similarly to the analysis for region $S_1$, we now show that there exists some $v_2(u) \ge 0$ such that for all $\|g\|_\infty \ge v_2(u)$ it holds that $\rho_4(\|g\|_\infty, u) \le 0$, and moreover, for all $\|g\|_\infty \ge 0$ that $\rho_4(\|g\|_\infty, u) \le p_3(u)v_2(u)$. We set
$$v_2(u) := \max\left\{2\eta(u)\sqrt{2Lc_L},\ \frac{\varepsilon^2 c_\gamma}{a_7},\ \left(\frac{\varepsilon+\epsilon_2}{\sqrt{\hat r_3+\epsilon_1}}\right)^{\frac{1}{1+q}},\ \left(p_3(u)\,\frac{8(\sqrt2+1)\sqrt{2L}(\hat r_3+\epsilon_1)}{\eta_0\gamma_1\sqrt{a_7 c_\gamma}}\right)^{\frac{2}{1-4q}}\right\}, \quad (82)$$
which is positive and continuous on $[0,\infty)$, and where $\epsilon_1$ and $\epsilon_2$ are positive constants such that for all $\|g\|_\infty \ge 0$
$$\beta\|g\|_\infty^2 \le \epsilon_1\|g\|_\infty^{2+2q} + \epsilon_2^2. \quad (83)$$
One such pair of constants is, for example, $\epsilon_1 = \beta$ and $\epsilon_2 = \sqrt{q\beta(1+q)^{-\frac{q+1}{q}}}$. In the following, we show that for all $\|g\|_\infty \ge v_2(u)$ we have $\rho_4(\|g\|_\infty, u) \le 0$. We start by lower bounding $\rho_1$, defined in (63). Since (7) implies $\gamma_0 > 0$, we get
$$\rho_1(\|g\|_\infty, u) > \gamma_1\sqrt{\max\left\{0,\ \frac{1}{4L}\|g\|_\infty^2 - \eta(u)^2 c_L\right\}}. \quad (84)$$
Now, by assumption we have $\|g\|_\infty \ge v_2(u)$. From (82) it then follows that $\|g\|_\infty \ge 2\eta(u)\sqrt{2Lc_L}$, from where we get $\eta(u)^2 c_L \le \frac{\|g\|_\infty^2}{8L}$. By plugging this inequality into the right hand side of (84), we get
$$\rho_1(\|g\|_\infty, u) > \gamma_1\sqrt{\frac{\|g\|_\infty^2}{4L} - \frac{\|g\|_\infty^2}{8L}} = \frac{\gamma_1}{2\sqrt{2L}}\,\|g\|_\infty. \quad (85)$$
Next, we upper bound the denominator in the first term on the right hand side of (80). By plugging (83) into the said denominator, we get
$$\sqrt{\hat r_3\|g\|_\infty^{2+2q} + \beta\|g\|_\infty^2} \le \sqrt{(\hat r_3+\epsilon_1)\|g\|_\infty^{2+2q} + \epsilon_2^2}.$$
By using the sub-additivity property of $\sqrt{\cdot}$, we further get
$$\sqrt{(\hat r_3+\epsilon_1)\|g\|_\infty^{2+2q} + \epsilon_2^2} \le \sqrt{\hat r_3+\epsilon_1}\,\|g\|_\infty^{1+q} + \epsilon_2. \quad (86)$$
Next, by assumption, we have $\|g\|_\infty \ge v_2(u)$, and from (82) we further get $\|g\|_\infty \ge \left(\frac{\varepsilon+\epsilon_2}{\sqrt{\hat r_3+\epsilon_1}}\right)^{\frac{1}{1+q}}$. Then, we obtain that
$$\varepsilon + \epsilon_2 \le \sqrt{\hat r_3+\epsilon_1}\,\|g\|_\infty^{1+q}. \quad (87)$$
By using (86) and (87), for the denominator of the first term on the right hand side of (80) we get
$$\Big(\varepsilon + \sqrt{\hat r_3\|g\|_\infty^{2+2q} + \beta\|g\|_\infty^2}\Big)^2 \le \Big(\varepsilon + \epsilon_2 + \sqrt{\hat r_3+\epsilon_1}\,\|g\|_\infty^{1+q}\Big)^2 \le 4(\hat r_3+\epsilon_1)\|g\|_\infty^{2+2q}. \quad (88)$$
Further, we lower bound $\rho_2$, defined in (65). We first observe that the inequality $\rho_2(\|g\|_\infty) \ge 2c_\gamma\big(\sqrt{\varepsilon^2 + \tfrac{a_7}{c_\gamma}\|g\|_\infty} - \varepsilon\big)$ holds since $\frac{\eta^2}{c_\gamma} > 0$, and subsequently use that $a - b = \frac{a^2-b^2}{a+b}$ to get
$$\rho_2(\|g\|_\infty) \ge \frac{2a_7\|g\|_\infty}{\sqrt{\varepsilon^2 + \tfrac{a_7}{c_\gamma}\|g\|_\infty} + \varepsilon}. \quad (89)$$
By assumption, we have $\|g\|_\infty \ge v_2(u)$, with $v_2(u)$ defined in (82), which implies $\|g\|_\infty \ge \frac{\varepsilon^2 c_\gamma}{a_7}$. Thus, we get $\varepsilon^2 \le \frac{a_7\|g\|_\infty}{c_\gamma}$, which we plug into the right hand side of (89) to get
$$\rho_2(\|g\|_\infty) \ge \frac{2a_7\|g\|_\infty}{(\sqrt2+1)\sqrt{\tfrac{a_7}{c_\gamma}\|g\|_\infty}} = \frac{2\sqrt{a_7 c_\gamma}}{\sqrt2+1}\,\sqrt{\|g\|_\infty}. \quad (90)$$
Finally, by plugging (85), (88) and (90) into the right hand side of (80) and by using elementary algebraic inequalities, we get
$$\rho_4(\|g\|_\infty, u) \le \|g\|_\infty\left(-\frac{\eta_0\gamma_1\sqrt{a_7 c_\gamma}\,\|g\|_\infty^{\frac12-2q}}{8(\sqrt2+1)\sqrt{2L}(\hat r_3+\epsilon_1)} + p_3(u)\right).$$
By assumption, we have $\|g\|_\infty \ge v_2(u)$, with $v_2(u)$ defined in (82), which implies $\|g\|_\infty \ge \left(p_3(u)\,\frac{8(\sqrt2+1)\sqrt{2L}(\hat r_3+\epsilon_1)}{\eta_0\gamma_1\sqrt{a_7 c_\gamma}}\right)^{\frac{2}{1-4q}}$. By plugging this into the above inequality, we get
$$\rho_4(\|g\|_\infty, u) \le 0. \quad (91)$$
Finally, we show for all $\|g\|_\infty \ge 0$ that $\rho_4(\|g\|_\infty, u) \le p_3(u)v_2(u)$. Recall that $\rho_4(\cdot, u)$ is continuous on $[0, v_2(u))$, with the first right hand side term of (80) being negative definite, and thus upper bounded by zero, and the second term monotonically increasing.
Therefore, for $\|g\|_\infty \in [0, v_2(u))$ we get that $\rho_4(\|g\|_\infty, u) \le p_3(u)v_2(u)$, and, trivially, due to the result obtained for all $\|g\|_\infty \ge v_2(u)$ in (91), for all $\|g\|_\infty \ge 0$ we get $\rho_4(\|g\|_\infty, u) \le \max\{p_3(u)v_2(u),\ 0\} = p_3(u)v_2(u)$. By plugging this inequality into the right hand side of (81), we get
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} + p_2(u) + p_3(u)v_2(u) - \frac{\eta_0}{2}\,\frac{\rho_1(\|g\|_\infty,u)\|g\|_\infty^2\rho_2(\|g\|_\infty)}{\Big(\varepsilon+\sqrt{\hat r_3\|g\|_\infty^{2+2q}+\beta\|g\|_\infty^2}\Big)^2}. \quad (92)$$
It only remains to provide an upper bound for (92) dependent on $\|y\|_\infty$, $\|s\|_\infty$ and $u$. We first observe from (7) and (63) that $\rho_1(\|g\|_\infty, u) \ge \gamma_0 > 0$ for all $\|g\|_\infty \ge 0$ and $u \in \mathbb{R}_{\ge 0}$. Second, we observe that $\rho_2$, defined in (65), is a monotonically increasing function on $[0,\infty)$. We use the inequality $\frac{\mu^2\|y\|_\infty^2}{d} \le \|g\|_\infty^2$, which holds due to Assumption 1 i) with $\mu > 0$. Then, for the numerator of the last term of (92) we get
$$\rho_1(\|g\|_\infty, u)\,\|g\|_\infty^2\,\rho_2(\|g\|_\infty) \ge \gamma_0\,\hat r_1^2\|y\|_\infty^2\,\rho_2(\hat r_1\|y\|_\infty), \quad (93)$$
where $\hat r_1 = \frac{\mu}{\sqrt d}$. We next use that $\|g\|_\infty^2 \le L^2 d\,\|y\|_\infty^2$, which holds under Assumption 1 ii), and plug it into the denominator of the last right hand side term of (92) to get
$$\hat r_3\|g\|_\infty^{2+2q} + \beta\|g\|_\infty^2 \le \hat r_4\|y\|_\infty^{2+2q} + \hat r_5\|y\|_\infty^2, \quad (94)$$
where $\hat r_4 := \hat r_3(L^2 d)^{1+q}$ and $\hat r_5 := \beta L^2 d$. By plugging (93) and (94) into the last term of (92), we get
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} - \frac{\eta_0\gamma_0\hat r_1^2}{2}\,\frac{\|y\|_\infty^2\,\rho_2(\hat r_1\|y\|_\infty)}{\Big(\varepsilon+\sqrt{\hat r_4\|y\|_\infty^{2+2q}+\hat r_5\|y\|_\infty^2}\Big)^2} + p_2(u) + p_3(u)v_2(u) =: -\psi_{21}(\|s\|_\infty) - \psi_{22}(\|y\|_\infty) + \chi_2(u). \quad (95)$$
Step 3.3: Full State Space $S_1 \cup S_2$: In the following, we prove the claim of Theorem 1. First, we show global asymptotic stability of algorithm (3) according to [21, Theorem 3.4.6] for $u = 0$ and with respect to the equilibrium $(x^*, 0)$.
Second, we show ISS of algorithm (3) for any $u(\cdot) \in \mathcal{U}_{\ge 0}$ with respect to the equilibrium $(x^*, 0)$ according to [20, Definition 3.2], adapted for the considered setup, by analyzing $\Delta V$ for any $u \in \mathbb{R}_{\ge 0}$. We start by finding an upper bound for $\Delta V$ for all $(x,s) \in \mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$ based on the upper bounds for $\Delta V$ from Steps 3.1 and 3.2. Observe that, for $(x,s) \in S_j$, $j \in \{1,2\}$, as given in (78) and (95), we get an upper bound of the type $\Delta V \le -\psi_{j1}(\|s\|_\infty) - \psi_{j2}(\|y\|_\infty) + \chi_j(u)$. It can be verified that the functions $\psi_{11}$, $\psi_{12}$ and $\psi_{21}$ are continuous, positive definite, strictly monotonically increasing and radially unbounded on $[0,\infty)$; thus, they belong to class $\mathcal{K}_\infty$. Moreover, it can be verified that $\psi_{22}$ is continuous, positive definite and radially unbounded on $[0,\infty)$. The latter holds true as the numerator in $\psi_{22}(\|y\|_\infty)$ grows at a rate of $\|y\|_\infty^{\frac52}$ and the denominator at a rate of $\|y\|_\infty^{2+2q}$ with $q \in \left(0,\frac14\right)$. Then, from [22, Lemma 4.3] it follows that there exists some $\psi_{23} \in \mathcal{K}_\infty$ such that for all $z \in \mathbb{R}_{\ge 0}$ it holds that $\psi_{23}(z) \le \psi_{22}(z)$. We remark here that the proof of [22, Lemma 4.3] applies directly for $D = \mathbb{R}_{\ge 0}$ by substituting $\|x\|$ with $x$. One such candidate is
$$\psi_{23}(z) := \begin{cases} \dfrac{\eta_0\gamma_0\hat r_1^2 c_\gamma}{4\varepsilon^2}\, z^2\left(\sqrt{\varepsilon^2 + \tfrac{a_7}{c_\gamma}\hat r_1 z} - \varepsilon\right), & z \le z^*, \\[2mm] \dfrac{\eta_0\gamma_0\hat r_1^2 c_\gamma}{4}\, \dfrac{\sqrt{\varepsilon^2 + \tfrac{a_7}{c_\gamma}\hat r_1 z} - \varepsilon}{\hat r_4 z^{2q} + \hat r_5}, & z > z^*, \end{cases}$$
where $z^* \in \mathbb{R}_{\ge 0}$ is the unique real root of $\varepsilon = z\sqrt{\hat r_4 z^{2q} + \hat r_5}$. The lower bound $\psi_{23}$ is obtained by analyzing the denominator of $\psi_{22}$. In particular, for any $z \ge 0$ such that $\varepsilon \ge z\sqrt{\hat r_4 z^{2q} + \hat r_5}$, we upper bound the denominator by $4\varepsilon^2$, while for any $z \ge 0$ such that $\varepsilon < z\sqrt{\hat r_4 z^{2q} + \hat r_5}$, we upper bound the denominator by $4z^2(\hat r_4 z^{2q} + \hat r_5)$. We now use that, for any $\psi_3, \psi_4: \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ and any $a, b \in \mathbb{R}_{\ge 0}$, it holds that $\psi_3(a) + \psi_4(b) \ge \min\{\psi_3(\|(a,b)\|_\infty),\ \psi_4(\|(a,b)\|_\infty)\}$.
This property can be shown by a simple case analysis and is left for the reader to check. Then, we get that
$$\psi_{11}(a) + \psi_{12}(b) \ge \min\{\psi_{11}(\|(a,b)\|_\infty),\ \psi_{12}(\|(a,b)\|_\infty)\} =: \alpha_1(\|(a,b)\|_\infty),$$
$$\psi_{21}(a) + \psi_{22}(b) \ge \psi_{21}(a) + \psi_{23}(b) \ge \min\{\psi_{21}(\|(a,b)\|_\infty),\ \psi_{23}(\|(a,b)\|_\infty)\} =: \alpha_2(\|(a,b)\|_\infty).$$
Substituting $a = \|s\|_\infty$ and $b = \|y\|_\infty$, and observing that $\|(\|y\|_\infty, \|s\|_\infty)\|_\infty = \|(y,s)\|_\infty$, we get for all $(x,s) \in S_j$, $j \in \{1,2\}$, that $\Delta V \le -\alpha_j(\|(y,s)\|_\infty) + \chi_j(u)$, where $\alpha_j \in \mathcal{K}_\infty$, $j \in \{1,2\}$, as the minimum of $\mathcal{K}_\infty$-functions belongs to class $\mathcal{K}_\infty$. Further, for all $(x,s) \in \mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$ it holds that
$$\Delta V \le -\min_{i\in\{1,2\}}\alpha_i(\|(y,s)\|_\infty) + \max_{j\in\{1,2\}}\chi_j(u) =: -\alpha_V(\|(y,s)\|_\infty) + \chi_V(u), \quad (96)$$
where $\alpha_V \in \mathcal{K}_\infty$. This holds true as the minimum of $\mathcal{K}_\infty$-functions belongs to class $\mathcal{K}_\infty$. For the sake of completeness, we now present the exact form of $\alpha_V$. For any $z \ge 0$, it reads as
$$\alpha_V(z) = \min\left\{\begin{cases}\tau_1 z^2\big(\sqrt{\varepsilon^2+\tau_2 z}-\varepsilon\big), & z \le z^*,\\[1mm] \tau_3\,\dfrac{\sqrt{\varepsilon^2+\tau_4 z}-\varepsilon}{\tau_5 z^{2q}+\tau_6}, & z > z^*,\end{cases}\quad \frac{\tau_7 z}{\tau_8+\sqrt z}\right\},$$
where $\tau_1$ through $\tau_8$ are positive constants and $z^* \in \mathbb{R}_{\ge 0}$ is the unique real root of $\varepsilon = z\sqrt{\tau_5 z^{2q} + \tau_6}$. Further, by expanding $\chi_V$ using (78) and (95), we get $\chi_V(u) = p_2(u) + p_3(u)\max_{j\in\{1,2\}} v_j(u)$, with $p_2$ and $p_3$ as defined in (67). It can be verified that $p_2$ and $p_3$ are continuous, positive definite, strictly monotonically increasing and radially unbounded on $[0,\infty)$. In addition, both $v_1$ and $v_2$, defined in (74) and (82), respectively, are positive, continuous and monotonically increasing on $[0,\infty)$. Therefore, we have that $\chi_V \in \mathcal{K}_\infty$.
In the following, we establish the stability and ISS of algorithm (3). First, we show global asymptotic stability of the zero-input system ($u = 0$) with respect to the equilibrium $(x^*, 0)$, which lies on the boundary of $\mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$.
We apply [21, Theorem 3.4.6] with $X := \mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$, $A := \mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$, $M := \{(x^*, 0)\}$ and the distance metric $d$ induced by the infinity norm. Under this setup, Lemma 1 satisfies condition (3.64) in [21, Theorem 3.4.6]. Moreover, for $u = 0$, for (96) we have $\Delta V \le -\alpha_V(\|(y,s)\|_\infty)$ as $\chi_V(0) = 0$. Thus, condition (3.65) in [21, Theorem 3.4.6] is satisfied, and thereby it follows that algorithm (3) is globally asymptotically stable with respect to the equilibrium $(x^*, 0)$.
Next, we consider any $u \in \mathbb{R}_{\ge 0}$. We invoke [20, Definition 3.2] adapted to the state space $\mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$, the equilibrium $(x^*, 0)$ for the zero-input system, and the distance metric induced by the infinity norm. From Lemma 1 and (96) it directly follows that conditions (5) and (6) in the adapted ISS definition in [20, Definition 3.2] are satisfied for all $(x,s) \in \mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$ and all $u \in \mathbb{R}_{\ge 0}$. Consequently, $\Delta V$, defined in (13), satisfies an ISS decrease condition, and moreover, $V$, defined in (5), is an ISS-Lyapunov function for algorithm (3). Then, by [20, Lemma 3.5], algorithm (3) is ISS for every $u(\cdot) \in \mathcal{U}_{\ge 0}$ with respect to the equilibrium $(x^*, 0)$.
We now provide some implications and discussions about the main result of this paper.
Remark 1. The stability of algorithm (3) can also be approached by a cascade-like argument. Observe that the $s$-subsystem is driven by the $x$-subsystem via the input $\nabla f(x)$. Conversely, the $x$-subsystem is driven by the $s$-subsystem, and this direction of coupling has a stabilizing effect. Therefore, cascade-like stability arguments appear to be applicable for studying the stability of RMSProp. In particular, for a continuous-time version of RMSProp (see, e.g., [23]), such an approach seems rather straightforward.
Nevertheless, the availability of an ISS-Lyapunov function is conceptually appealing and offers some benefits, for example, when dealing with related classes of algorithms.
Remark 2. The presented ISS-Lyapunov function in (5) can be used to discuss the potential trade-off between higher convergence rates and larger steady-state errors. In particular, the inequality (96) establishes ISS in terms of the (gain-like) functions $\alpha_V$ and $\chi_V$. Notably, $\alpha_V$ in (96) is independent of the input $u$. However, it is possible to get sharper inequalities. In particular, for all $(x,s) \in S_1$ we have
$$\Delta V \le -\frac{\beta}{3}\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} - \frac{\eta(u)}{2}\,\frac{\gamma_0\hat r_1^2\|y\|_\infty^2\,\rho_2(\hat r_1\|y\|_\infty)}{\Big(\varepsilon+\sqrt{(1-\beta)\|s\|_\infty+\hat r_5\|y\|_\infty^2}\Big)^2} - \frac{\beta}{3}\,\frac{\hat r_1^{2+2q}\|y\|_\infty^{2+2q}}{\varepsilon+\hat r_1^{1+q}\|y\|_\infty^{1+q}} + p_2(u) + p_3(u)v_1(u), \quad (97)$$
and for all $(x,s) \in S_2$ we have
$$\Delta V \le -\beta\,\frac{\|s\|_\infty}{\varepsilon+\sqrt{\|s\|_\infty}} - \frac{\eta(u)}{2}\,\frac{\gamma_0\hat r_1^2\|y\|_\infty^2\,\rho_2(\hat r_1\|y\|_\infty)}{\Big(\varepsilon+\sqrt{\hat r_4\|y\|_\infty^{2+2q}+\hat r_5\|y\|_\infty^2}\Big)^2} + p_2(u) + p_3(u)v_2(u), \quad (98)$$
with $S_1$ and $S_2$ as defined in (68), $S_1 \cup S_2 = \mathbb{R}^d \times \mathbb{R}^d_{\ge 0}$, and where $y = x - x^*$. We first highlight here that $p_2, p_3 \in \mathcal{K}_\infty$. Moreover, the second terms on the right hand sides of (97) and (98) are negative definite in $y \in \mathbb{R}^d$, and they are scaled by $\eta(u)$. Therefore, if $\eta(u)$ increases, the convergence speed of algorithm (3) does not decrease and could potentially increase. On the other hand, the term $p_2(u) + p_3(u)v_2(u)$ also increases and induces a larger error floor. Hence, one observes a trade-off between convergence speed and convergence accuracy. Intuitively, the convergence speed appears to increase with $\eta(u)$; however, a rigorous analysis of this effect is beyond the scope of this paper and is left for future work.
Remark 3. The main result of this paper establishes global asymptotic stability and ISS for any $u(\cdot) \in \mathcal{U}_{\ge 0}$ of algorithm (3).
In the following, we loosely discuss local exponential stability for quadratic objective functions based on the ISS-Lyapunov function. For the sake of simplicity, let $x^* = 0$. Further, let $f(x) := x^\top Q x$, with $Q \succ 0$. We consider the case of $u(t) = 0$ for all $t \ge 0$. We now analyze the ISS-Lyapunov function (5) and its difference by using (66) under the assumption that $\|s\|_\infty \le \varepsilon$ and $\|x\|_\infty \le \varepsilon$. By using a first order Taylor expansion of $h$, defined in (6), around $(x,s) = (0,0)$, we get for (5)
$$V(x,s) = \gamma_0\, x^\top Q x + \frac{2\gamma_1}{3}\left(x^\top Q x\right)^{\frac32} + \sum_{i=1}^d \left(\frac{s_i}{\varepsilon} + O\Big(s_i^{\frac32}\Big)\right).$$
In the following, we use inequalities resulting from the strong convexity of $f$, as well as norm inequalities, to derive a lower bound of the type $\|g\|_\infty \ge \hat r_1\|x\|_\infty$, with $\hat r_1 > 0$. Recall $\rho_1$ and $\rho_2$ from Step 2.3 in the proof of Theorem 1. Since $\rho_1(\cdot, 0) \ge \gamma_0$ holds on $[0,\infty)$, and further, by using a first order Taylor expansion of $\rho_2$ around $(x,s) = (0,0)$, it follows from (66) that
$$\Delta V \le -\hat\tau_1\|s\|_\infty - \hat\tau_2\|x\|_\infty^2 - \hat\tau_3\|x\|_\infty^3 + O\big(\|x\|_\infty^4\big)$$
for $\|x\|_\infty \le \varepsilon$ and $\|s\|_\infty \le \varepsilon$, where $\hat\tau_1$, $\hat\tau_2$ and $\hat\tau_3$ are some positive constants. By using further norm inequalities and suitable algebraic manipulations, we get
$$\Delta V \le -\hat\tau_4 V(x,s) + O\big(\|x\|^4\big) + O\Big(\|s\|_1^{\frac32}\Big),$$
where $\hat\tau_4$ is some suitable positive constant. Thus, it seems that one can establish local exponential stability with the established ISS-Lyapunov function, corroborating the results in [11]. However, a rigorous analysis is beyond the scope of this work.
Remark 4. In principle, there is a lot of flexibility to extend the proof to more general step size rules. For example, one could adapt algorithm (3) to use different step size rules for individual components $i \in 1:d$, i.e., $\eta(u_i(\cdot)) := \eta_0 + u_i(\cdot)$, where $u_i(\cdot) \in \mathcal{U}_{\ge 0}$. In that case, Theorem 1 would still hold and the proof would only require minor changes.
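A minimal numerical sketch of this componentwise variant is given below, assuming the deterministic RMSProp update (3) with a bounded time-varying componentwise step size $\eta(u_i(t)) = \eta_0 + u_i(t)$. All constants, the quadratic objective and the perturbation $u$ are illustrative placeholders, not values from the paper.

```python
import numpy as np

def rmsprop_componentwise(grad, x0, T=2000, eta0=0.01, beta=0.5, eps=1e-8, u=None):
    """Deterministic RMSProp with a componentwise step size eta_i(t) = eta0 + u_i(t)."""
    d = x0.size
    x, s = x0.astype(float).copy(), np.zeros(d)
    for t in range(T):
        g = grad(x)
        s = (1 - beta) * s + beta * g**2                       # second-moment recursion (3a)
        eta = eta0 + (u(t) if u is not None else np.zeros(d))  # bounded time-varying step sizes
        x = x - eta * g / (eps + np.sqrt(s))                   # adaptive gradient step (3b)
    return x

# Strongly convex quadratic f(x) = 0.5 * x^T Q x with minimizer x* = 0.
Q = np.diag([1.0, 4.0])
grad = lambda x: Q @ x
# Bounded perturbation u_i(t) of each coordinate's step size.
u = lambda t: 0.005 * np.array([abs(np.sin(0.1 * t)), abs(np.cos(0.1 * t))])
x_final = rmsprop_componentwise(grad, np.array([2.0, -3.0]), u=u)
# iterates settle in a small neighborhood of x* = 0, far below the initial error
```

Consistent with the ISS interpretation, the residual neighborhood scales with the size of the step size perturbation, while the unperturbed iteration drives the iterates toward the minimizer.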
The most noticeable change would be that $\chi_V$ in (96) would then depend on $\|u\|_\infty$. Furthermore, one could consider algorithms with a generalized step size rule $\phi: \mathbb{R} \times \mathbb{R}_{\ge 0} \to (0,\infty)$ of the form
$$s_i^+ = (1-\beta)s_i + \beta\big(\nabla_i f(x)\big)^2, \qquad x_i^+ = x_i - \eta(u)\,\phi\big(\nabla_i f(x), s_i\big)\,\nabla_i f(x),$$
and study the stability of this class of algorithms under various conditions on $\phi$.

IV. CONCLUSIONS AND OUTLOOK

In this paper we have established that the RMSProp algorithm is globally asymptotically stable for a suitable constant step size and input-to-state stable with respect to any time-varying bounded step size rule. This step size robustness property is beneficial when tuning the algorithm, since one can experiment with arbitrary bounded step size sequences and balance convergence rate versus convergence accuracy. For future work, one can, for example, analyze the stability and robustness of accelerated algorithms like Adam by a suitable modification of the ISS-Lyapunov function presented here.

REFERENCES

[1] A. Défossez, L. Bottou, F. Bach, and N. Usunier, "A simple convergence proof of Adam and Adagrad," Transactions on Machine Learning Research, 2022.
[2] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, pp. 2121–2159, 2011.
[3] T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural Networks for Machine Learning, 2012.
[4] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014. Published as a conference paper at ICLR 2015.
[5] R. Abdulkadirov, P. Lyakhov, and N. Nagornov, "Survey of optimization algorithms in modern neural networks," Mathematics, vol. 11, pp. 2466–2502, 2023.
[6] X. He, F. Xue, X. Ren, and Y. You, "Large-scale deep learning optimizations: A comprehensive survey," arXiv preprint arXiv:2111.00856, 2021.
[7] B. T. Polyak, "Some methods of speeding up the convergence of iteration methods," USSR Computational Mathematics and Mathematical Physics, vol. 4, pp. 1–17, 1964.
[8] C. W. Scherer and C. Ebenbauer, "A tutorial on convex design of optimization algorithms by integral quadratic constraints," Annual Review of Control, Robotics, and Autonomous Systems, vol. 9, pp. 12.1–12.28, 2025.
[9] C. Chen, L. Shen, F. Zou, and W. Liu, "Towards practical Adam: Non-convexity, convergence theory, and mini-batch acceleration," Journal of Machine Learning Research, vol. 23, pp. 1–47, 2022.
[10] F. Zou, L. Shen, Z. Jie, W. Zhang, and W. Liu, "A sufficient condition for convergences of Adam and RMSProp," in Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11127–11135, 2019.
[11] S. Dereich, A. Jentzen, and A. Riekert, "Sharp higher order convergence rates for the Adam optimizer," arXiv preprint arXiv:2504.19426, 2025.
[12] B. Bensaid, G. Poëtte, and R. Turpault, "An Abstract Lyapunov Control Optimizer: Local Stabilization and Global Convergence," arXiv preprint arXiv:2407.01019, 2024.
[13] A. Barakat and P. Bianchi, "Convergence rates of a momentum algorithm with bounded adaptive step size for nonconvex optimization," in Proceedings of the 12th Asian Conference on Machine Learning, pp. 225–240, 2020.
[14] B. Bensaid, G. Poëtte, and R. Turpault, "Convergence of the iterates for momentum and RMSProp for local smooth functions: Adaptation is the key," arXiv preprint, 2024.
[15] B. Bensaid, G. Poëtte, and R. Turpault, "Deterministic Neural Networks Optimization from a Continuous and Energy Point of View," Journal of Scientific Computing, vol. 96, no. 14, 2023.
[16] C. Heredia, "Modeling Adagrad, RMSProp, and Adam with Integro-Differential equations," arXiv preprint, 2024.
[17] S. Dereich, R. Graeber, A. Jentzen, and A. Riekert, "Asymptotic stability properties and a priori bounds for Adam and other gradient descent optimization methods," arXiv preprint, 2025.
[18] C. M. Kellett, "A compendium of comparison function results," Mathematics of Control, Signals, and Systems, vol. 26, pp. 339–374, 2014.
[19] Y. Nesterov, Lectures on Convex Optimization, vol. 137 of Springer Optimization and Its Applications. Springer Science & Business Media, 2nd ed., 2018.
[20] Z.-P. Jiang and Y. Wang, "Input-to-state stability for discrete-time nonlinear systems," Automatica, vol. 37, pp. 857–869, 2001.
[21] A. N. Michel, L. Hou, and D. Liu, Stability of Dynamical Systems. Systems & Control: Foundations & Applications, Springer, 2nd ed., 2015.
[22] H. K. Khalil, Nonlinear Systems. Prentice Hall, 3rd ed., 2002.
[23] C. Ma, L. Wu, and W. E, "A qualitative study of the dynamic behavior for adaptive gradient algorithms," in Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, vol. 145, pp. 671–692, PMLR, 2022.