Approximate Dynamic Programming for Degradation-aware Market Participation of Battery Energy Storage Systems: Bridging Market and Degradation Timescales

Flemming Holtorf and Sungho Shin

Abstract: We present an approximate dynamic programming framework for designing degradation-aware market participation policies for battery energy storage systems. The approach employs a tailored value function approximation that reduces the state space to state of charge and battery health, while performing dynamic programming along a pseudo-time axis encoded by state of health. This formulation enables an offline/online computation split that separates long-term degradation dynamics (months to years) from short-term market dynamics (seconds to minutes), a timescale mismatch that renders conventional predictive control and dynamic programming approaches computationally intractable. The main computational effort occurs offline, where the value function is approximated via coarse-grained backward induction along the health dimension. Online decisions then reduce to a real-time tractable one-step predictive control problem guided by the precomputed value function. This decoupling allows the integration of high-fidelity physics-informed degradation models without sacrificing real-time feasibility. Backtests on historical market data show that the resulting policy outperforms several benchmark strategies with optimized hyperparameters.

I. INTRODUCTION

While battery energy storage systems (BESSs) have generated substantial revenue for investors in recent years, increasing storage deployment fundamentally alters the market conditions that initially created these revenue opportunities. In particular, as storage penetration rises, the very mechanisms that enable profitability (energy arbitrage and ancillary services provision) tend to weaken.
Previous studies have shown that the arbitrage and ancillary service value of storage declines with higher penetration levels because additional storage capacity dampens price volatility and reduces scarcity events, thereby compressing spreads within these market segments [4, 8, 15]. Empirical observations from real markets reinforce this pattern. As ancillary service markets approach saturation and wholesale price spreads narrow, reported annual revenues for 2-hour BESS assets have fallen by nearly 50% in major markets, including ERCOT, CAISO, and the UK [11]. These trends indicate that margins in BESS business models are increasingly under pressure, making it essential to extract maximum lifetime value through optimized operation.

This work was supported by the MIT Energy Initiative Seed Grant Program and the MIT Research Support Committee Funds. F. Holtorf and S. Shin are with the Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA (e-mail: holtorf@mit.edu; sushin@mit.edu).

To maximize their lifetime value, the market participation policy for BESSs must strike a delicate balance between immediate market rewards and the cumulative impact of usage on battery state of health (SoH). For example, aggressive operation that exploits short-term price fluctuations may accelerate battery degradation, eroding long-term profitability. Similarly, a conservative strategy that prioritizes battery health may miss lucrative market opportunities (e.g., scarcity events), leading to suboptimal returns. This trade-off is further complicated by the following factors:

1) Market uncertainty: Prices and grid conditions evolve rapidly and partially unpredictably [6].
In addition, a significant portion of annual battery revenues is typically concentrated within a small number of highly volatile periods; for instance, two-hour batteries in ERCOT and Australia generated nearly half of their total revenues during the top 10% of days [11].

2) Complex degradation dynamics: Battery aging is governed by intricate electrochemical processes whose rates depend nonlinearly on operating conditions and usage patterns [25, 10, 16]. High-fidelity models capturing the lithium-ion battery degradation mechanisms at the individual cell level, such as solid-electrolyte interphase growth and lithium plating, are available in the literature [29, 31, 9], but these models are characterized by a large intrinsic state-space dimension, and their integration into real-time control frameworks is hindered by their computational complexity.

3) Separation of timescales: Battery state of charge (SoC) and ancillary services/real-time energy market dynamics occur on the order of seconds to hours, while degradation unfolds over months or years, presenting a massive separation of timescales.

These factors make the design of degradation-aware participation policies for BESSs in wholesale electricity markets a stochastic, nonlinear, and multiscale problem in nature. Capturing long-term degradation with sufficiently high-fidelity physico-chemical battery models leads to a simultaneous explosion in the number of necessary decision epochs and state-space dimension, rendering off-the-shelf computational frameworks, such as model predictive control (MPC), computationally intractable.
In addition, although dynamic programming (DP) provides a theoretically sound framework for the sequential decision-making under uncertainty required for optimal market participation of BESSs, its application to degradation-aware control is hindered by the delayed reward structure: operational actions yield immediate returns (realized within minutes to hours), whereas costs incurred by degradation manifest much later (over years). Furthermore, approximate dynamic programming (ADP) based on neural value function approximators and reinforcement learning methods, which rely on short-horizon feedback between actions and rewards, struggle to resolve this delay effectively and require exceedingly costly, long-horizon rollouts for policy iteration. This motivates the following question:

    How can we bridge the timescale gap between short-term market fluctuations and long-term degradation behavior in a computationally tractable market participation policy for BESSs?

In this paper, we propose a tailored ADP framework that turns the separation of timescales from a curse into an advantage. By performing the DP induction along a pseudo-time axis defined by the monotonically decreasing battery SoH, our formulation naturally decouples long-term degradation from fast market and grid dynamics, enabling a tractable offline and online computation split. In the offline phase, we approximate the value function via coarse-grained backward induction along the health dimension using high-fidelity physicochemical battery models. To keep the state dimension tractable, we perform a state-space reduction by projecting the internal battery state onto the SoH and SoC dimensions. In the online phase, market participation decisions are made with a lightweight, one-step MPC problem guided by the precomputed value function. In this way, we can incorporate accurate, physics-informed degradation models without compromising real-time computational feasibility.
Backtests on historical market data demonstrate that our framework outperforms a range of benchmark strategies while maintaining a computational burden that is tolerable for real-time deployment.

Related work: Prior work on degradation-aware operation of grid batteries mostly adopts optimization-embeddable aging surrogates (typically convex or piecewise-linear cycle-aging costs) to retain MILP/LP structure and the associated practical tractability. In contrast, physics-based electrochemical models are rarely used directly to decide market participation due to their large state dimension and stiff dynamics. A widely used approach is to represent degradation as an explicit operational cost that depends on cycle characteristics (e.g., depth of discharge) and sometimes rate effects. Xu et al. [41] propose a piecewise-linear cycle-aging cost that approximates the underlying cycle-aging mechanism and can be incorporated into standard market clearing/dispatch formulations. Their key modeling step is to translate cycling into a marginal aging cost curve and then embed that curve into an optimization problem for participation in energy and reserve markets [41]. In a related direction, Padmanabhan et al. [26] develop a BESS operational cost model that explicitly accounts for degradation, with cost components parameterized by depth of discharge and discharge rate, and then derive a bid/offer structure for co-optimized energy and spinning reserve markets that internalizes this degradation-dependent operating cost in the market participation problem [26]. Foggo and Yu [13] revisit lifetime valuation under cycle degradation and propose an approximate (co-optimizable) degradation model to reduce the value loss caused by cycling; the degradation surrogate is constructed from empirical aging data, primarily based on depth of discharge, to remain computationally convenient for co-optimization with operational decisions over long horizons [13].
Sorourifar et al. [32] propose a multiscale linear programming framework that simultaneously optimizes sizing, replacement, and market participation over horizons ranging from minutes to years while explicitly representing irreversible capacity loss due to degradation. The resulting formulation reaches very large scale (millions of variables and constraints) but remains solvable due to the deliberate choice of a linear degradation model: a simplified mileage-based model is used to capture degradation [32].

Contributions: We present an ADP framework that exploits the separation of timescales between degradation and market/grid dynamics to offload the computational burden of policy optimization to an offline phase, enabling the incorporation of high-fidelity, physics-informed degradation models without compromising real-time feasibility. Existing degradation-aware market participation and dispatch frameworks predominantly rely on (piecewise-)linear cycle-aging costs or empirical stress-factor models to retain tractability [41, 26, 13, 1, 32, 42]. In contrast, high-fidelity physics-informed degradation models (e.g., porous-electrode or single particle models with explicit solid-electrolyte interphase (SEI) growth) remain difficult to integrate into market participation because they substantially increase the state dimension and require fine time resolution. Our framework targets precisely this barrier by exploiting the separation of timescales within a tailored ADP formulation. We validate the efficacy of our framework through backtests on historical market data, showing that it outperforms a range of benchmarks (at least 10% improvement in cumulative returns) while maintaining computational efficiency suitable for real-time deployment.
Organization: The remainder of this paper is organized as follows. Section II formulates a model for the optimal participation of degrading battery assets in uncertain electricity markets. Section III reviews the recursive DP solution for this model problem and lays the foundation for our ADP framework, which is introduced in Section IV, including both the offline value function approximation and the online policy implementation. Section V compares the performance of our framework against baseline heuristics in backtests on historical market data, and Section VI offers concluding remarks.

Notation: We assume throughout that all variables are real vector-valued and finite-dimensional unless stated otherwise. For brevity, we use the shorthand notation $g(x) \le 0$ for component-wise non-positivity of a vector-valued function $g : \mathbb{R}^n \to \mathbb{R}^m$ at $x$. Furthermore, we denote the Jacobian of $g$ with respect to $x$ evaluated at $y$ by $\nabla_x g(y)$. Finally, we use $\llbracket n \rrbracket$ and $\llbracket n \rrbracket_0$ as shorthands for the integer ranges $\{1, 2, \dots, n\}$ and $\{0, 1, \dots, n\}$, respectively.

II. PROBLEM SETTING: DEGRADATION-AWARE MARKET PARTICIPATION

This section introduces a mathematical abstraction for the problem of optimal participation of degrading battery assets in uncertain electricity markets. We first present a model for the relevant market signals and battery dynamics, followed by a description of the operational constraints, returns, and degradation dynamics that govern the decision-making problem. For the sake of generality, we present the model in a way that is agnostic to the specific market segment (e.g., energy arbitrage, frequency regulation) and battery model (e.g., empirical, physics-based) under consideration. Section V then presents a concrete instantiation of the model for the case of frequency regulation market participation of lithium-ion batteries using a high-fidelity physics-based battery model.

A. Market Signals and Battery Dynamics

We begin by proposing a mathematical abstraction for the decision-making problem underpinning the participation of degrading BESSs in wholesale electricity markets. We assume that market participation decisions must be made in real time at fixed bidding intervals that substantially exceed the timescale of relevant battery and grid dynamics. This assumption aligns with regulations in large electricity markets such as ERCOT, PJM, and CAISO, where participation decisions are made at increments of five to sixty minutes; in contrast, battery charging dynamics (e.g., current and voltage fluctuations) and relevant grid signals (e.g., frequency regulation signals) vary on the order of seconds. Formally and without loss of generality, we assume that each bidding interval is an integer multiple $n$ of the characteristic timescale of the relevant battery and grid dynamics.

Accordingly, we abstract the battery dynamics as a discrete-time dynamical system evolving on the nested discrete time axis illustrated in Figure 1:

\[
\begin{aligned}
x^k_i &= f(x^k_{i-1}, \xi^k_{i-1}, u^k), && i \in \llbracket n \rrbracket,\ k \in \llbracket N \rrbracket_0, \\
x^k_0 &= x^{k-1}_n, && k \in \llbracket N \rrbracket.
\end{aligned}
\tag{1}
\]

Here, $x^k_i \in \mathbb{R}^{d_x}$ and $\xi^k_i \in \mathbb{R}^{d_\xi}$ are the internal battery state and the relevant exogenous grid signals at the $i$th time increment during the $k$th bidding interval, respectively. We introduce the shorthand notation $x^k = (x^k_0, x^k_1, \dots, x^k_n)$ and $\xi^k = (\xi^k_0, \xi^k_1, \dots, \xi^k_n)$ for notational brevity. The exogenous signals $(\xi^0, \xi^1, \dots, \xi^N)$ are considered random variables with known joint distribution $\Xi$. The market participation decisions $u^k \in \mathbb{R}^{d_u}$ represent power and ancillary service commitments for the $k$th bidding interval, throughout which they remain constant. The internal battery state, grid and market signals, and participation decisions determine the trajectory of the internal battery state as governed by the battery dynamics $f : \mathbb{R}^{d_x} \times \mathbb{R}^{d_\xi} \times \mathbb{R}^{d_u} \to \mathbb{R}^{d_x}$.

Fig. 1: Nested discrete time axis for market and battery/grid dynamics (coarse axis: bidding intervals $0, 1, 2, \dots, N$; fine axis: increments $0, 1, 2, \dots, n-1$ on the timescale of battery/grid dynamics).

B. Operational Constraints

The safe and legal operation of BESSs dictates compliance with a number of technological and market regulation constraints. Such constraints include, for instance, obeying maximum (dis)charging rates and avoiding deep discharge or overcharging while meeting all market commitments made. In the following, we assume that these constraints can be cast as inequalities that jointly involve the internal battery state trajectory $x^k$ and participation decision $u^k$ during the $k$th bidding interval; formally,

\[
g(x^k, u^k) \le 0, \quad k \in \llbracket N \rrbracket_0,
\tag{2}
\]

where $g : \mathbb{R}^{(n+1) d_x} \times \mathbb{R}^{d_u} \to \mathbb{R}^{d_g}$ is a vector-valued function that encodes the relevant constraints.

C. Returns

We assume that market participation in each bidding interval is remunerated by returns $r(u^k, \xi^k_0)$, where $r : \mathbb{R}^{d_u} \times \mathbb{R}^{d_\xi} \to \mathbb{R}$ is a function that encodes the market remuneration scheme and $\xi^k_0$ is the market and grid information available at the time of bidding. We assume that returns depend only on the commitments made, $u^k$, and the prices of electricity and ancillary services at the time of bidding, i.e., $\xi^k_0$.

Remark 1. In real electricity markets, returns technically depend not only on market and grid information that is available at the time of bidding but are also subject to small uncertainties; for instance, the remuneration for committed regulation capacity is typically determined not only by the clearing price but also by the actual regulation signal during the bidding interval, which is not known exactly at the time of bidding.
However, these uncertainties can be effectively marginalized out by using the expected returns (typically simply determined by expected prices), conditioned on the information available at the time of bidding, i.e.,

\[
r(u^k, \xi^k_0) = \mathbb{E}_{\xi^k \mid \xi^k_0}\left[ \tilde{r}(u^k, \xi^k) \right],
\]

where $\tilde{r} : \mathbb{R}^{d_u} \times \mathbb{R}^{(n+1) d_\xi} \to \mathbb{R}$ is a function that encodes the remuneration scheme based on the actual market and grid information during the bidding interval. Thus, the assumption that the returns depend only on the information available at the time of bidding is not restrictive and can be easily relaxed if necessary.

D. Battery Degradation

The overall revenue potential of a BESS is naturally tied to the number of bidding cycles it can participate in and is therefore affected by battery degradation. Moreover, as observed in experiments and predicted by mechanistic physico-chemical battery models, degradation rates and battery lifetimes vary dramatically depending on usage patterns [16, 18, 10]. A central question for revenue-maximizing BESS management thus becomes how to optimally trade off momentary returns against diminishing future revenue potential due to usage-induced lifetime reduction.

To incorporate this aspect of BESS management, we assume a battery reaches its end of life when its health (a function of the internal battery state, typically measured in terms of remaining charge capacity) decays to a given threshold $h_{\min}$; formally, we define the bidding cycle at which end of life is reached as

\[
N_{\mathrm{EOL}} = \max\{ k \in \mathbb{N}_0 : h(x^k_n) \ge h_{\min} \},
\]

where $h : \mathbb{R}^{d_x} \to \mathbb{R}$ is a function that maps the internal battery state to a scalar health metric. Note that $N_{\mathrm{EOL}}$ depends on the trajectory of the internal battery state, which in turn depends on the market and grid signals and the participation decisions. We make the following assumption on battery health decay.

Assumption 1. There exists a minimum decay rate $\epsilon > 0$ such that $(h \circ f)(x, \xi, u) \le h(x) - \epsilon$ holds for all exogenous market signals $\xi$, along all state trajectories, and for all participation decisions admissible under (2).

This assumption aligns both with the understanding of battery aging mechanisms on a microscopic level [17, 25] and with empirical findings that support the notion of calendar aging, i.e., irreversible battery health decay in the absence of usage [21]. Crucially, under Assumption 1, health drops by at least $n\epsilon$ over each bidding interval, so the battery has finite life

\[
N_{\mathrm{EOL}} \le \frac{h(x^0_0) - h_{\min}}{n \epsilon}.
\]

III. DYNAMIC PROGRAMMING FORMULATION

A. Value Function and Optimal Policy

The optimal trade-off between short- and long-term revenue potential is concisely encoded in the value function $V(x^0_0, \xi^0_0)$, which quantifies the expected remaining lifetime value of the BESS conditioned on its current internal state $x^0_0$ and available market and grid information $\xi^0_0$. Formally, the value function is characterized by the policy optimization problem

\[
V(x^0_0, \xi^0_0) := \sup_{\pi \in \Pi} \ \mathbb{E}_{(x, \xi, u, N_{\mathrm{EOL}}) \sim P_\pi}\left[ \sum_{k=0}^{N_{\mathrm{EOL}}} \gamma^k\, r(u^k, \xi^k_0) \right].
\tag{3}
\]

This problem seeks an admissible participation policy $\pi$ that maximizes the expected (discounted) cumulative return until the battery's end of life. Here, $P_\pi$ denotes the joint distribution of the trajectories of process (1) and the conditional market uncertainty $\xi \sim \Xi \mid \xi^0_0$ under the policy $u^k = \pi(x^k_0, \xi^k_0)$. The set of admissible participation policies $\Pi$ is defined implicitly by the operational constraints (2), which must hold under any realization of the uncertain market signal $\xi \in \operatorname{supp} \Xi \mid \xi^0_0$. The discount factor $\gamma \in [0, 1]$ reflects the decision-maker's preference for immediate short-term revenue over delayed long-term returns. It is worth noting that strict discounting ($\gamma < 1$) is not required to ensure finiteness of the value function due to the finite battery life guaranteed by Assumption 1.
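The finite-life bound invoked here can be checked on a toy health trajectory. The following is a minimal sketch, not part of the framework itself: the function names, the numerical values of $h_{\min}$, $\epsilon$, and $n$, and the constant per-step decay are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def eol_bound(h_start, h_min, eps, n):
    """Upper bound (h_start - h_min) / (n * eps) on the end-of-life index N_EOL."""
    return (h_start - h_min) / (n * eps)

def simulate_n_eol(h_start, h_min, decay_per_step, n):
    """Largest k (counting from 0) with health at the end of interval k >= h_min.

    Assumes a constant health decay per fine time step (illustrative only).
    Returns -1 if health already falls below h_min during interval 0.
    """
    h, k = h_start, -1
    while True:
        h_end = h - n * decay_per_step  # health after the n increments of the next interval
        if h_end < h_min:
            return k
        h, k = h_end, k + 1

# Hypothetical numbers; exact rational arithmetic avoids rounding at the threshold.
h_start, h_min = Fraction(1), Fraction(8, 10)
eps = Fraction(1, 100_000)  # assumed minimum health decay per fine time step
n = 300                     # assumed number of fine steps per bidding interval

bound = eol_bound(h_start, h_min, eps, n)
slowest = simulate_n_eol(h_start, h_min, eps, n)     # decay exactly eps per step
faster = simulate_n_eol(h_start, h_min, 3 * eps, n)  # usage-accelerated decay
print(float(bound), slowest, faster)
```

In this sketch the slowest admissible decay gives the longest life, and any faster decay ends life earlier, so both simulated indices stay below the bound.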
One can observe that $V(x^0_0, \xi^0_0) = 0$ if $h(x^0_0) \le h_{\min}$; that is, the battery value vanishes when end of life is reached.

B. Dynamic Programming Perspective

The value function defined in (3) admits the recursive DP characterization

\[
\begin{aligned}
V(x^0_0, \xi^0_0) = \sup_{u} \quad & r(u, \xi^0_0) + \gamma\, \mathbb{E}_{\xi \sim \Xi \mid \xi^0_0}\left[ V(x_n(\xi), \xi^1_0) \right] \\
\text{s.t.} \quad & \left.
\begin{aligned}
& x_0(\xi) = x^0_0 \\
& x_i(\xi) = f(x_{i-1}(\xi), \xi^0_{i-1}, u), \quad i \in \llbracket n \rrbracket \\
& g(x(\xi), u) \le 0 \\
& h(x_n(\xi)) \ge h_{\min}
\end{aligned}
\right\} \quad \xi \in \operatorname{supp} \Xi \mid \xi^0_0,
\end{aligned}
\tag{4}
\]

which states that the optimal value at the current state and market information can be obtained by optimizing over the current participation decision $u$ to maximize the immediate return plus the expected discounted value of the battery at the beginning of the following bidding interval. This battery state is determined by the battery dynamics and the realization of the uncertain market signal $\xi$. The optimal policy $\pi^*$ can be derived from this characterization by selecting, for each state and market information, a participation decision that attains the supremum in (4).

IV. A PRACTICAL ADP FRAMEWORK EXPLOITING SEPARATION OF TIMESCALES

Although the DP recursion in (4) can in principle be solved approximately with various DP algorithms, it remains computationally intractable for accurate physics-based battery degradation models. These mechanistic models describe the microscopic transport phenomena underlying batteries through coupled partial differential-algebraic equations (PDAEs), resulting in state space dimensions far beyond those for which DP is practically viable.

Reinforcement learning methods that have proven remarkably effective for policy optimization in high-dimensional spaces [2, 40] are likewise challenged by this application. Assessing degradation-induced trade-offs requires extremely long horizons [38, 20, 36, 37, 7], while accurate battery and grid modeling demands fine temporal resolution.
The combined requirement of fine time resolution for accurate simulation and extremely long look-ahead horizons renders policy roll-outs exceedingly expensive and gradient evaluation unstable.

To overcome these challenges, we propose an ADP routine that avoids DP in a high-dimensional state space and explicitly exploits the separation between the timescales of the relevant operational battery and grid dynamics and battery degradation. Instead of performing backward induction along real time, we perform coarse-grained backward induction along a pseudo-time axis encoded by battery health. This leads to a tractable offline/online computation split: in the offline phase, an approximate value function is constructed on a joint battery health and state of charge grid, while the online phase determines market participation decisions via a real-time tractable one-step MPC problem guided by the precomputed value function proxy as terminal cost.

A. State Space Reduction and Lifting

We approximate the value function as depending only on SoC $q_0$ and SoH $h_0$ and hence seek a proxy $\hat{V}(q_0, h_0, \xi^0_0)$ for the true value function $V(x^0_0, \xi^0_0)$; for the sake of brevity, we omit the explicit dependence of $q_0$ and $h_0$ on the full internal battery state $x^0_0$. This approximation is motivated by the fact that the combination of SoC and SoH provides an effective compression of the full internal battery state from the perspective of economic decision-making: SoC determines short-term revenue potential by encoding immediately available charge and discharge capacity, while SoH reflects long-term revenue potential as an indicator of remaining useful life. The full internal battery state, represented by microscopic quantities such as spatial concentration profiles, evolves on shorter timescales and, due to the underactuated nature of batteries, within a more constrained set of dynamics than the derived summary quantities SoC and SoH.
The economic impact of the fine-grained microscopic nuances is therefore only relevant for accurate prediction of the battery dynamics on horizons that are well resolved within a single bidding interval. As such, their impact is effectively captured by the one-step MPC problem used for policy implementation as discussed in Section IV-C.

To enable our computational scheme, we assume the existence of a lifting map $l(q_0, h_0)$ that maps SoC and SoH to an approximate but consistent full internal battery state $x^0_0$. Such lifting maps are standard tools for coarse-grained simulation of multiscale systems [19] and have previously been applied to accelerate cycle aging simulations for high-fidelity battery models in [35]. In particular, we demonstrate in Section V that for the widely adopted single particle model with solid-electrolyte interphase (SEI) layer growth, such a lifting map arises naturally from standard approximations.

B. Offline Phase: Coarse-grained Backward Induction Along The Battery Health Axis

As battery health decays monotonically (cf. Assumption 1), it admits the interpretation as a pseudo-time variable that naturally encodes the timescale of battery degradation. We exploit this feature to compute the value function proxy $\hat{V}(q_0, h_0, \xi^0_0)$ in a recursive manner by applying backward induction along this battery health axis rather than real time. Recalling the DP recursion, point-wise evaluation of the value function proxy at $(q_0, h_0, \xi^0_0)$ can be approximated with the following MPC problem:

\[
\begin{aligned}
\hat{V}(q_0, h_0, \xi^0_0) := \sup_{x, u} \quad & R(x, u, \xi^0_0) \\
\text{s.t.} \quad & \left.
\begin{aligned}
& x_0(\xi) = l(q_0, h_0) \\
& x_i(\xi) = f(x_{i-1}(\xi), \xi^0_{i-1}, u), \quad i \in \llbracket n \rrbracket \\
& g(x(\xi), u) \le 0 \\
& h(x_n(\xi)) \ge h_{\min}
\end{aligned}
\right\} \quad \xi \in \operatorname{supp} \Xi \mid \xi^0_0,
\end{aligned}
\tag{MPC$(q_0, h_0, \xi^0_0; \phi)$}
\]

where

\[
R(x, u, \xi^0_0) = r(u, \xi^0_0) + \gamma\, \mathbb{E}_{\xi \sim \Xi \mid \xi^0_0}\left[ \phi\big(q(x_n(\xi)), h(x_n(\xi)), \xi\big) \right],
\]

with a suitable smooth terminal cost surrogate $\phi$ for the value function $V$. It is important to emphasize that, under sample average approximation, (MPC$(q_0, h_0, \xi^0_0; \phi)$) is a finite nonlinear program that is readily solved by off-the-shelf primal-dual interior point solvers [39]. Moreover, under mild regularity conditions, solving (MPC$(q_0, h_0, \xi^0_0; \phi)$) with a primal-dual solver enables point-wise evaluation not only of $\hat{V}(q_0, h_0, \xi^0_0)$ but also of $\frac{\partial \hat{V}}{\partial h}(q_0, h_0, \xi^0_0)$ via the following sensitivity result.

Proposition 1. Let $(x^*, u^*, \lambda^*)$ be a primal-dual feasible point of (MPC$(q_0, h_0, \xi^0_0; \phi)$) corresponding to the unique global optimum. Further, assume that $(x^*, u^*, \lambda^*)$ satisfies the strong second-order sufficient condition, linear independence constraint qualification, and strict complementary slackness [3, see Chapter 3 for detailed definitions]. Denote by $\lambda_0(\xi)$ the Lagrange multiplier for the constraints $x_0(\xi) = l(q_0, h_0)$, for $\xi \in \operatorname{supp} \Xi \mid \xi^0_0$. Then,

\[
\frac{\partial \hat{V}}{\partial h}(q_0, h_0, \xi^0_0) = - \sum_{\xi \in \operatorname{supp} \Xi \mid \xi^0_0} \lambda^*_0(\xi)^\top \nabla_h l(q_0, h_0).
\]

Proof. The result follows immediately from [3, Proposition 3.3.3] and the chain rule.

The final ingredient for the offline phase of our ADP framework is a tailored regression scheme that bridges the gap between point-wise evaluations of the value function proxy (and its health gradient) via (MPC$(q_0, h_0, \xi^0_0; \phi)$) and the terminal cost $\phi$ required for solving (MPC$(q_0, h_0, \xi^0_0; \phi)$) in the first place. To break this circular dependency, assume for the moment that point-wise evaluations of the value function proxy and its health derivative are available at a given battery health $h_0$ for a finite SoC grid $\mathcal{Q}$ and a finite set $\mathcal{M}$ of representative scenarios for the market uncertainty in a single bidding interval, i.e.,

\[
\left\{ \left( \hat{V}(q, h_0, \xi),\ \frac{\partial \hat{V}}{\partial h}(q, h_0, \xi) \right) : (q, \xi) \in \mathcal{Q} \times \mathcal{M} \right\}.
\]
Then, we may approximate $\hat{V}$ locally around $h_0$, for a fixed market uncertainty $\xi \in \mathcal{M}$ and for SoC $q$, by $\phi$ via

\[
\phi(q, h, \xi) = \hat{V}_{\theta_1(\xi)}(q) + \widehat{dV}_{\theta_2(\xi)}(q)\,(h - h_0),
\]

where $\hat{V}_{\theta_1(\xi)}$ and $\widehat{dV}_{\theta_2(\xi)}$ are parametric function approximators determined via the regression problems

\[
\left.
\begin{aligned}
\theta_1(\xi) &\in \arg\min_{\theta} \sum_{q \in \mathcal{Q}} \ell_0\left( \hat{V}_\theta(q),\ \hat{V}(q, h_0, \xi) \right) \\
\theta_2(\xi) &\in \arg\min_{\theta} \sum_{q \in \mathcal{Q}} \ell_1\left( \widehat{dV}_\theta(q),\ \frac{\partial \hat{V}}{\partial h}(q, h_0, \xi) \right)
\end{aligned}
\right\} \quad \xi \in \mathcal{M}
\tag{5}
\]

with suitable regression losses $\ell_0$ and $\ell_1$.

Putting all pieces together, a practical backward induction algorithm is obtained as follows. In addition to the SoC grid $\mathcal{Q}$ and market uncertainty scenarios $\mathcal{M}$, consider a battery health grid $\mathcal{H} = (h_1, \dots, h_m)$ that covers the entire battery life with $h_{\min} < h_1 < \cdots < h_m$. Recalling that the battery value vanishes at $h_{\min}$, in the initial induction step, (MPC$(q_0, h_0, \xi^0_0; \phi)$) is solved for battery health $h_1$ and all combinations of $(q_0, \xi^0_0) \in \mathcal{Q} \times \mathcal{M}$ with the terminal cost $\phi \equiv 0$.

Algorithm 1: Value function approximator
  Input: Battery health grid $\mathcal{H} = (h_1, h_2, \dots, h_m)$ such that $h_{\min} < h_1 < \cdots < h_m$, SoC grid $\mathcal{Q} = (q_{\min}, q_1, \dots, q_{\max})$, market scenarios $\mathcal{M}$, parametric function approximators $\hat{V}_\theta$ and $\widehat{dV}_\theta$, regression losses $\ell_0$ and $\ell_1$, discount factor $\gamma \in [0, 1]$.
  Output: Value function approximation $\hat{V}(q, h, \xi)$ for all $(q, h, \xi) \in \mathcal{Q} \times \mathcal{H} \times \mathcal{M}$.
  Set $\phi(q, h, \xi) \equiv 0$
  for $h_0 \in \mathcal{H}$ do
      for $(q_0, \xi^0_0) \in \mathcal{Q} \times \mathcal{M}$ do  (in parallel)
          Compute primal-dual feasible point $(x^*, u^*, \lambda^*)$ of (MPC$(q_0, h_0, \xi^0_0; \phi)$) and set
              $\hat{V}(q_0, h_0, \xi^0_0) \leftarrow R(x^*, u^*, \xi^0_0)$
              $\frac{\partial \hat{V}}{\partial h}(q_0, h_0, \xi^0_0) \leftarrow - \sum_{\xi \in \operatorname{supp} \Xi \mid \xi^0_0} \lambda^*_0(\xi)^\top \nabla_h l(q_0, h_0)$
      end for
      for $\xi \in \mathcal{M}$ do  (in parallel)
          Solve regression problems (5) and set
              $\phi(q, h, \xi) \leftarrow \hat{V}_{\theta_1(\xi)}(q) + \widehat{dV}_{\theta_2(\xi)}(q)\,(h - h_0)$
      end for
  end for
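To make the construction of the local terminal cost concrete, the sketch below builds $\phi(q, h, \xi)$ for a single market scenario from values and health sensitivities on an SoC grid. It is only an illustration under stated assumptions: the text does not fix the parametric approximators at this point, so piecewise-linear interpolants with nodal values as parameters are assumed here (for which both regressions in (5) are solved with zero loss by interpolation), and all data values and function names are hypothetical.

```python
from bisect import bisect_right

def interp(grid, values, q):
    """Piecewise-linear interpolation of nodal `values` on the sorted `grid`."""
    j = min(max(bisect_right(grid, q), 1), len(grid) - 1)
    q_lo, q_hi = grid[j - 1], grid[j]
    w = (q - q_lo) / (q_hi - q_lo)
    return (1 - w) * values[j - 1] + w * values[j]

def fit_terminal_cost(soc_grid, h0, v_data, dv_dh_data):
    """Return phi(q, h) = V_hat(q) + dV_hat(q) * (h - h0) for one scenario.

    Assumption: the approximators are interpolants whose parameters are the
    nodal values, so theta_1 = v_data and theta_2 = dv_dh_data fit the data
    exactly and no iterative regression is needed.
    """
    def phi(q, h):
        value = interp(soc_grid, v_data, q)
        slope = interp(soc_grid, dv_dh_data, q)
        return value + slope * (h - h0)
    return phi

# Hypothetical data for one market scenario at health h0 = 0.9.
soc_grid = [0.0, 0.5, 1.0]
v_data = [10.0, 12.0, 13.0]      # stand-ins for optimal values of the MPC problem
dv_dh_data = [40.0, 50.0, 60.0]  # stand-ins for health sensitivities (Proposition 1)
phi = fit_terminal_cost(soc_grid, 0.9, v_data, dv_dh_data)

print(phi(0.5, 0.9), phi(0.25, 0.95))
```

At $h = h_0$ the surrogate reproduces the grid data, and away from $h_0$ it extrapolates linearly in health, which is why the induction step size along the health grid controls the approximation error.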
This yields data $\{ \hat{V}(q, h_1, \xi) : (q, \xi) \in \mathcal{Q} \times \mathcal{M} \}$ for the regression problems (5), which in turn furnish a local approximation of $\hat{V}$ around $h_1$. Updating the terminal cost $\phi$ with this approximation, we can proceed inductively along the SoH grid $\mathcal{H}$. Algorithm 1 summarizes this procedure in detail.

Finally, a few remarks are in order. First, we emphasize that the linear dependence of $\phi$ on SoH is a deliberate choice that explicitly reflects the pseudo-time nature of battery health and enables tractable backward induction in practice. In particular, the error in this approximation can be controlled directly by the size of the backward induction steps. Second, we note that while the computational cost of our ADP is substantial, it can be significantly offset by exploiting the embarrassingly parallel nature of several substeps as indicated in Algorithm 1. Moreover, all computations can be done upfront and offline and hence do not impose limits on real-time tractability.

C. Online Phase: One-step Model Predictive Control

During real-time operation, we decide the market participation by solving (MPC$(q_0, h_0, \xi^0_0; \phi)$) ahead of each bidding interval. The terminal cost $\phi$ is constructed by the same regression scheme described in the previous section, approximating the value function around the health grid point closest to the current battery health. After sample average approximation, (MPC$(q_0, h_0, \xi^0_0; \phi)$) again reduces to a tractable nonlinear program that can be solved efficiently with standard primal-dual methods [39]. The computational burden of long-horizon degradation planning is thus deferred entirely to the offline phase, while online decision-making reduces to a one-step MPC problem.

V. NUMERICAL EXPERIMENTS

A. Battery Model

We assume that the BESS is composed of a large number of identical lithium-ion battery cells, each described by a single particle model (SPM) [24].
The SPM abstracts both positive and negative battery electrodes as collections of identical spherical particles. It provides a middle ground between crude reservoir-based models and the more detailed Doyle-Fuller-Newman pseudo-two-dimensional model. Importantly, its complexity-accuracy trade-off has been shown to be particularly suitable for control and optimization applications [5, 27, 22] and allows straightforward integration of physicochemical battery degradation models [5, 22].

The SPM describes the transport phenomena governing the macroscopic behavior of lithium-ion batteries in terms of a system of spatio-temporal PDAEs. Charging and discharging are modeled by a lithium intercalation reaction that occurs at the electrode particle surfaces, followed by radial Fickian diffusive lithium transport within the particles. The governing transport equations for the concentration of lithium ions in electrode $i \in \{+, -\}$ are thus

$$\begin{cases}
\dfrac{\partial c_i}{\partial t} + \dfrac{D_i}{r^2} \dfrac{\partial}{\partial r}\!\left(r^2 \dfrac{\partial c_i}{\partial r}\right) = 0, & (r, t) \in (0, R_i) \times (0, t] \\[6pt]
-D_i \left.\dfrac{\partial c_i}{\partial r}\right|_{r = R_i} = j_i \ \text{ and } \ \left.\dfrac{\partial c_i}{\partial r}\right|_{r = 0} = 0, & t \in (0, t] \\[6pt]
c_i(r, 0) = c_{i,0}(r), & r \in [0, R_i],
\end{cases}$$

where $c_i$ is the lithium ion concentration, $D_i$ is the diffusion coefficient, $R_i$ is the particle radius for electrode $i$, and $j_i$ is the lithium ion flux at the particle surface due to the intercalation reaction. The first equation describes radial diffusion of lithium ions within the electrode particles, while the second and third equations specify the boundary and initial conditions, respectively.
The lithium ion flux due to the intercalation reaction at the particle surface is assumed to obey Butler-Volmer kinetics, i.e.,

$$j_i = k_i \sqrt{\left(c_i^{\max} - c_i(R_i, t)\right) c_i(R_i, t)\, c_e}\ \sinh\!\left(\frac{F}{2RT}\, \eta_i\right),$$

where $k_i$ is the reaction rate constant, $c_i^{\max}$ is the maximum lithium ion concentration in electrode $i$, $\eta_i$ is the electrochemical overpotential driving the intercalation reaction, $c_e$ is the electrolyte concentration, $T$ is the battery cell temperature, and $F$ and $R$ refer to the Faraday and universal gas constants, respectively. Here, the intercalation reaction is solely driven by the electrochemical overpotential $\eta_i$. The electrolyte concentration $c_e$ is assumed constant, while the potential and concentration gradients in the electrolyte phase as well as any axial concentration gradients are neglected. Moreover, the battery cell temperature $T$ is assumed constant at 25 °C, in line with the typically ample cooling capacity of grid-scale BESS.

As the primary mechanism for degradation, we consider gradual growth of the SEI layer [28]. The SEI layer grows via a side reaction at the particle surface of the negative electrode. The thickness of the formed film, $w_{\mathrm{SEI}}$, follows the dynamics

$$\frac{d w_{\mathrm{SEI}}}{dt} = \frac{j_{\mathrm{SEI}}}{\rho_{\mathrm{SEI}}}, \quad t \in (0, t], \qquad w_{\mathrm{SEI}}(0) = w_{\mathrm{SEI},0},$$

where $j_{\mathrm{SEI}}$ and $\rho_{\mathrm{SEI}}$ denote the ionic surface flux due to the side reaction and the molar density of the SEI layer, respectively. Since SEI layer growth acts as a sink for lithium ions, it leads to capacity fade

$$\frac{dh}{dt} = -F j_{\mathrm{SEI}}, \quad t \in (0, t], \quad \text{and} \quad h(0) = h_0.$$

The ionic surface flux for the SEI-forming side reaction follows

$$j_{\mathrm{SEI}} = k_{\mathrm{SEI}} \exp\!\left(-\frac{F}{RT}\, \eta_{\mathrm{SEI}}\right).$$

Importantly, $j_{\mathrm{SEI}} > 0$ holds for any overpotential because the exponential is positive, and thus $\frac{dh}{dt} < 0$. The model therefore predicts calendar aging, and capacity fade acts as a strictly monotonically decreasing pseudo-time for the control problem.
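The monotonicity argument above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it integrates the SEI thickness and capacity fade equations with explicit Euler steps (rather than the collocation scheme used in the paper), and all numerical parameter values and the overpotential profile are hypothetical placeholders chosen only for illustration.

```python
import math

FARADAY = 96485.33212        # Faraday constant F [C/mol]
GAS_CONSTANT = 8.314462618   # universal gas constant R [J/(mol K)]


def sei_flux(eta_sei, k_sei, temperature=298.15):
    """SEI side-reaction flux j_SEI = k_SEI * exp(-F / (R T) * eta_SEI)."""
    return k_sei * math.exp(-FARADAY * eta_sei / (GAS_CONSTANT * temperature))


def simulate_sei(eta_profile, dt, k_sei, rho_sei, w0, h0, temperature=298.15):
    """Explicit Euler integration of dw/dt = j_SEI / rho_SEI and dh/dt = -F j_SEI.

    eta_profile is a sequence of SEI overpotentials [V], one per time step.
    Returns the list of (w_SEI, h) pairs, including the initial state.
    """
    w, h = w0, h0
    history = [(w, h)]
    for eta in eta_profile:
        j = sei_flux(eta, k_sei, temperature)  # always positive
        w += dt * j / rho_sei                  # film thickens
        h -= dt * FARADAY * j                  # capacity fades
        history.append((w, h))
    return history


# Hypothetical inputs: oscillating overpotential, 10 s steps, placeholder constants.
eta_profile = [0.05 * math.sin(0.01 * k) for k in range(1000)]
history = simulate_sei(eta_profile, dt=10.0, k_sei=1e-12, rho_sei=1e4, w0=5e-9, h0=1.0)
```

Regardless of the sign of the overpotential, the health coordinate only decreases along the trajectory, which is the property that allows it to serve as a pseudo-time.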
The overpotentials driving the intercalation and SEI formation reactions are given by

$$\begin{aligned}
\eta_+ &= \phi_+ - \eta_+^{\mathrm{OCV}}\!\left(\frac{c_+(R_+, t)}{c_+^{\max}}\right), \\
\eta_- &= \phi_- + \Delta\phi_{\mathrm{SEI}} - \eta_-^{\mathrm{OCV}}\!\left(\frac{c_-(R_-, t)}{c_-^{\max}}\right), \\
\eta_{\mathrm{SEI}} &= \phi_- + \Delta\phi_{\mathrm{SEI}} - \eta_{\mathrm{SEI}}^{\mathrm{eq}},
\end{aligned}$$

where $\Delta\phi_{\mathrm{SEI}}$ is the potential drop across the SEI layer,

$$\Delta\phi_{\mathrm{SEI}} = \left(R_{\mathrm{SEI}}^0 + \frac{w_{\mathrm{SEI}}}{\sigma_{\mathrm{SEI}}}\right) i_{\mathrm{app}},$$

for an applied current with density $i_{\mathrm{app}}$ at the negative electrode. The term $R_{\mathrm{SEI}}^0 + \frac{w_{\mathrm{SEI}}}{\sigma_{\mathrm{SEI}}}$ quantifies the thickness-dependent resistance of the SEI layer. $\eta_-^{\mathrm{OCV}}$ and $\eta_+^{\mathrm{OCV}}$ are the open-circuit potentials for the negative and positive electrodes, respectively. Their functional form is given by empirical relationships fitted to experimental data [5]. Similarly, $\eta_{\mathrm{SEI}}^{\mathrm{eq}}$ denotes the constant equilibrium potential for the SEI-forming side reaction. Finally, the ionic fluxes must satisfy the closure condition

$$\frac{i_{\mathrm{app}}}{F} = -\left(j_{\mathrm{SEI}} + j_-\right) = \frac{S_+}{S_-}\, j_+,$$

where $S_+$ and $S_-$ denote the relative surface areas of the positive and negative electrodes.

All model parameters are taken from [14] and the open-circuit potentials from [5]. The parameters are for A123 Systems' ANR26650M1 cells with LiFePO4 cathode, which are suitable for high-power applications such as grid-scale BESS.

B. Operational Constraints

For safe operation, we impose bound constraints on the voltage across the cell, $U = \phi_+ - \phi_-$, and the applied current, $S_- i_{\mathrm{app}}$, i.e., $U \in [2.4\,\mathrm{V}, 3.65\,\mathrm{V}]$ and $S_- i_{\mathrm{app}} \in [-10C, 10C]$. To avoid overcharging and deep discharging, we constrain the SoC $c_-/c_-^{\max}$ to remain in the healthy window $[0.3, 0.9]$.

C. Numerical Approximation & Lifting Map

In order to apply Algorithm 1, the PDAE system underpinning the SPM must be approximated by an algebraic equation system as per (1). To that end, we employ two distinct approximations.

First, we make a parabolic approximation to the radial concentration profiles in the electrode particles, which is known to introduce minimal error for moderate applied currents [33, 34].
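For orientation, the effect of the parabolic approximation can be traced in a few lines. The sketch below assumes a two-parameter profile $c_i(r, t) = a_i(t) + b_i(t)\, r^2$; the coefficients $a_i$ and $b_i$ are introduced here purely for illustration and do not appear elsewhere in the model. The symmetry condition at $r = 0$ holds automatically, and the surface boundary condition $-D_i\, \partial c_i / \partial r |_{r = R_i} = j_i$ fixes $b_i$:

$$\begin{aligned}
-2 D_i b_i R_i = j_i \quad &\Longrightarrow \quad b_i = -\frac{j_i}{2 D_i R_i}, \\
\bar{c}_i = \frac{3}{R_i^3} \int_0^{R_i} \left(a_i + b_i r^2\right) r^2 \, dr &= a_i + \frac{3}{5}\, b_i R_i^2, \\
c_i(R_i, \cdot) = a_i + b_i R_i^2 = \bar{c}_i + \frac{2}{5}\, b_i R_i^2 &= \bar{c}_i - \frac{R_i}{5 D_i}\, j_i .
\end{aligned}$$

Multiplying the transport equation as stated above by $3 r^2 / R_i^3$ and integrating over $(0, R_i)$ gives the evolution of the volume average under the same sign convention:

$$\frac{d \bar{c}_i}{dt} = -\frac{3 D_i}{R_i} \left.\frac{\partial c_i}{\partial r}\right|_{r = R_i} = \frac{3}{R_i}\, j_i .$$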
Under this approximation, the spatio-temporal PDAE governing diffusive transport inside the particles reduces to differential-algebraic relations governing the dynamics of the average and surface particle concentrations, i.e.,

$$\begin{cases}
\dfrac{d \bar{c}_i}{dt} = \dfrac{3}{R_i}\, j_i, \quad t \in (0, t] \\[6pt]
\bar{c}_i(0) = \bar{c}_{i,0}
\end{cases}
\quad \text{and} \quad
c_i(R_i, \cdot) = \bar{c}_i - \frac{R_i}{5 D_i}\, j_i$$

for $i \in \{+, -\}$.

Second, the resulting differential-algebraic equation system is approximated by a discrete-time dynamical system using fifth-order accurate Gauss-Radau collocation on finite elements in time. The time grid is chosen uniformly with 10 s increments, coinciding with the timescale of available grid frequency data.

A convenient consequence of the parabolic approximation in the SPM is that it gives rise to a straightforward lifting map for use in Algorithm 1. Under this approximation, the SoC $q$ is entirely characterized by the average lithium concentration in the negative electrode particles, i.e.,

$$q = \frac{\bar{c}_-}{c_-^{\mathrm{tot}}},$$

where the normalization $c_-^{\mathrm{tot}}$ derives from the total amount of cyclable lithium:

$$c_-^{\mathrm{tot}} = \bar{c}_- + \frac{V_+}{V_-}\, \bar{c}_+ .$$

Furthermore, the dynamics of the total amount of cyclable lithium, battery health, and SEI layer thickness are governed exclusively by the SEI-forming molar flux acting as the sole sink for lithium ions,

$$j_{\mathrm{SEI}} = \rho_{\mathrm{SEI}} \frac{d w_{\mathrm{SEI}}}{dt} = -\frac{1}{F} \frac{dh}{dt} = -\frac{1}{S_-} \frac{d c_-^{\mathrm{tot}}}{dt} .$$

It follows that, given a known reference state of battery health $h_{\mathrm{ref}}$, SEI layer thickness $w_{\mathrm{SEI,ref}}$, and total amount of cyclable lithium $c_{-,\mathrm{ref}}^{\mathrm{tot}}$, knowledge of the current SoC $q$ and battery health $h$ can be lifted to the entire microscopic battery state via

$$l : (q, h) \mapsto
\begin{cases}
c_-^{\mathrm{tot}} = c_{-,\mathrm{ref}}^{\mathrm{tot}} + \dfrac{S_-}{F}\, (h - h_{\mathrm{ref}}), \\[6pt]
\bar{c}_- = q\, c_-^{\mathrm{tot}}, \\[6pt]
\bar{c}_+ = \dfrac{V_-}{V_+}\, (1 - q)\, c_-^{\mathrm{tot}}, \\[6pt]
w_{\mathrm{SEI}} = w_{\mathrm{SEI,ref}} - \dfrac{1}{F \rho_{\mathrm{SEI}}}\, (h - h_{\mathrm{ref}})
\end{cases}$$

and the algebraic relationships provided in the previous section.
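The lifting map can be sketched as a small function. This is an illustrative sketch under stated assumptions rather than the authors' code: the names `Reference`, `lift`, `s_neg`, `rho_sei`, and `vol_ratio_neg_pos` are hypothetical, the numerical values in the usage lines are placeholders, and the positive-electrode concentration is obtained here by solving the stated normalization $c_-^{\mathrm{tot}} = \bar{c}_- + (V_+/V_-)\, \bar{c}_+$ for $\bar{c}_+$.

```python
from dataclasses import dataclass

FARADAY = 96485.33212  # Faraday constant F [C/mol]


@dataclass(frozen=True)
class Reference:
    """Known reference state (hypothetical container, for illustration only)."""
    h_ref: float       # battery health at the reference point
    w_sei_ref: float   # SEI layer thickness at the reference point
    c_tot_ref: float   # total cyclable lithium at the reference point


def lift(q, h, ref, s_neg, rho_sei, vol_ratio_neg_pos):
    """Lift (SoC q, health h) to the reduced microscopic battery state.

    s_neg is the surface-area factor S_-, rho_sei the SEI molar density,
    and vol_ratio_neg_pos the ratio V_- / V_+.
    """
    dh = h - ref.h_ref
    c_tot = ref.c_tot_ref + s_neg / FARADAY * dh         # cyclable lithium shrinks with h
    c_neg = q * c_tot                                    # average negative concentration
    c_pos = vol_ratio_neg_pos * (1.0 - q) * c_tot        # from the normalization of c_tot
    w_sei = ref.w_sei_ref - dh / (FARADAY * rho_sei)     # SEI thickens as h decreases
    return {"c_tot": c_tot, "c_neg": c_neg, "c_pos": c_pos, "w_sei": w_sei}


# Placeholder values, not taken from the paper.
ref = Reference(h_ref=2.3, w_sei_ref=5e-9, c_tot_ref=3.0e4)
fresh = lift(q=0.5, h=2.3, ref=ref, s_neg=1.0e6, rho_sei=1.0e4, vol_ratio_neg_pos=0.8)
aged = lift(q=0.5, h=2.07, ref=ref, s_neg=1.0e6, rho_sei=1.0e4, vol_ratio_neg_pos=0.8)
```

At the reference health the map returns the reference quantities unchanged, while a lower health value yields less cyclable lithium and a thicker SEI layer.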
The reference point is arbitrary, but in most applications is conveniently derived from the nominal battery capacity. We finally wish to emphasize that this lifting incurs no errors beyond the parabolic approximation to the radial concentration profiles in the electrode particles. Moreover, this lifting map can be easily adapted to more complicated models such as the SPM with electrolyte [23] or the pseudo-two-dimensional Doyle-Fuller-Newman model [9].

D. Uncertainty Model

We consider an empirical, scenario-based model for the conditional market uncertainty $\Xi | \xi_0^0$. Detailed uncertainty modeling is a grand challenge in its own right and hence beyond the scope of this contribution. To probe the utility of the proposed ADP framework, we assume that an accurate forecast of the frequency regulation signal is available for the imminent bidding interval. Realization of the market uncertainty for subsequent bidding intervals is assumed independent of prior uncertainty realizations. For each stage, we consider a finite number of uncertainty realizations $\mathcal{M}$.

For the numerical experiments presented in this section, the scenarios are constructed to match the empirical marginals of the joint distribution of electricity prices, regulation capacity prices, and the frequency regulation signal of the French electricity market for the calendar year 2021. The data was retrieved from the Réseau de Transport d'Électricité (frequency regulation capacity prices and grid frequency signal) [30] and the European Network of Transmission System Operators for Electricity (day-ahead market prices) [12]. We use day-ahead market prices as a proxy for real-time market prices since, to the best of our knowledge, there is no publicly available record of the latter.
To construct scenarios that yield accurate approximations for the expectations with respect to the empirical marginals, we choose the electricity and frequency regulation capacity price scenarios and their probabilities to coincide with Gauss-Legendre quadrature nodes and weights under the empirical inverse cumulative distribution function. The empirical marginals and resultant scenarios are shown in Figure 2.

For generation of the frequency regulation signal scenarios, we use a rank-5 Karhunen-Loève (KL) decomposition, which captures 99% of the empirical autocovariance of the signal within each bidding interval; see Figure 3. The scenarios are constructed from the 5-dimensional second-order Gauss-Hermite quadrature nodes and weights for the KL expansion coefficients. The underlying empirical data and resultant scenarios are shown in Figure 3.

The individual scenarios for the frequency regulation signal as well as electricity and frequency regulation capacity prices are combined as a tensor product (implicitly assuming independence) to form a representative set $\mathcal{M}$ of 1138 joint market scenarios considered per stage in the uncertainty model.

E. Closed-Loop Simulations

In this section, we compare the performance of policies derived via the proposed ADP framework against two heuristic benchmark policies. All three considered policies derive online market participation decisions from solving the one-step predictive problem (MPC($q_0, h_0, \xi_0^0; \phi$)) ahead of each bidding interval but with different choices for the terminal cost $\phi$.

The benchmark heuristics deploy a simplified terminal cost, penalizing capacity fade and SoC deviation with constant factors.
Specifically, we consider the choices

$$\phi(q, h, \xi) = \alpha\, (h_{\mathrm{ref}} - h), \quad \xi \in \operatorname{supp} \Xi | \xi_0^0 \tag{6}$$

and

$$\phi(q, h, \xi) = \alpha\, (h_{\mathrm{ref}} - h) + \mathbb{1}_{\{0.5\}}(q), \quad \xi \in \operatorname{supp} \Xi | \xi_0^0 \tag{7}$$

as benchmarks. For both terminal costs, $\alpha$ represents a tunable degradation penalty intended to approximate the marginal value of capacity loss. The convex indicator $\mathbb{1}_{\{0.5\}}(q)$ in Heuristic (7) additionally constrains the battery SoC to return to 50% at the end of each bidding interval to avoid myopic discharge. The reference battery health state $h_{\mathrm{ref}}$ is chosen to coincide with the battery health at the beginning of the bidding interval. In particular, Heuristic (7) has been demonstrated in [5] to notably outperform a range of other market participation strategies, including strategies based on coarser battery models and longer look-ahead horizons, in backtests on historical market data.

Fig. 2: Ground truth and representative scenarios for electricity and frequency regulation prices. (Panels: empirical density versus electricity price [EUR/MW]; empirical density versus frequency regulation capacity price [EUR/MW].)

In contrast to the above benchmarks, our method derives the terminal cost from point-wise approximate value function and sensitivity evaluations computed offline via Algorithm 1. Algorithm 1 is applied on a coarse health grid $\mathcal{H}$ consisting of 100 points equidistantly spaced throughout the battery life span, i.e., from 100% to 80% nominal capacity. Similarly, we use a uniform SoC grid $\mathcal{Q}$ covering the feasible range in increments of 5%. The battery and market uncertainty models are chosen as described in Sections V-A and V-D, respectively. For online deployment, the terminal cost is approximated using the regression problems (5) on the value function and sensitivity information at the health grid point closest to the current battery health.
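The online construction of the terminal cost described above can be sketched as follows for a single market scenario. This is a minimal illustration, not the authors' implementation: `nearest_grid_index`, `make_terminal_cost`, `value_fits`, and `slope_fits` are hypothetical names, and the fitted approximators are assumed to be available as plain callables of the SoC, one pair per health grid point.

```python
from bisect import bisect_left


def nearest_grid_index(health_grid, h):
    """Index of the grid point closest to h (health_grid sorted ascending)."""
    i = bisect_left(health_grid, h)
    if i == 0:
        return 0
    if i == len(health_grid):
        return len(health_grid) - 1
    # Choose between the two neighbours that bracket h.
    return i if health_grid[i] - h < h - health_grid[i - 1] else i - 1


def make_terminal_cost(health_grid, value_fits, slope_fits, h_current):
    """Build phi(q, h) = V_fit(q) + dV_fit(q) * (h - h0) around the nearest grid point h0.

    value_fits[k] and slope_fits[k] are the fitted value and sensitivity
    approximators at health_grid[k] (assumed inputs from the offline phase).
    """
    k = nearest_grid_index(health_grid, h_current)
    h0 = health_grid[k]
    v_fit, dv_fit = value_fits[k], slope_fits[k]

    def phi(q, h):
        return v_fit(q) + dv_fit(q) * (h - h0)

    return phi


# Toy data: three grid points with made-up approximators.
grid = [0.8, 0.9, 1.0]
value_fits = [lambda q, k=k: 10.0 * k + q for k in range(3)]
slope_fits = [lambda q, k=k: 2.0 + k for k in range(3)]
phi = make_terminal_cost(grid, value_fits, slope_fits, h_current=0.93)
```

Because the approximation is linear in health only locally, the nearest grid point is re-selected whenever the terminal cost is rebuilt for a new bidding interval.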
For the regression problems (5), we use the same parametric function approximators,

$$\hat{V}_{\theta}(q) = \widehat{dV}_{\theta}(q) = \begin{bmatrix} 1 & q & q^2 & e^{-q} & e^{q} \end{bmatrix} \theta,$$

alongside mean-squared error loss in both the offline and online phases.

Fig. 3: Uncertainty model for frequency regulation signal. Top: Spectrum and normalized unexplained variance of low-rank approximation to the empirical autocovariance matrix of the hourly frequency regulation signals. Bottom: Hourly frequency regulation signal (gray) and representative scenarios (black). (Axes: autocovariance spectrum and relative unexplained variance versus index; frequency regulation signal [-] versus time [s].)

To disentangle the accuracy of the uncertainty and battery model from errors introduced in the proposed value function approximation scheme, we first compare closed-loop simulations under the three different participation policies using the market uncertainty and battery model as ground truths. Figure 4 compares the cumulative returns attained by the different policies for an exhaustive range of degradation penalties in (6) and (7). The proposed ADP framework outperforms the heuristics (6) and (7) by more than 50% and 25%, respectively.

This experiment further underlines the computational merits of the proposed framework. While Algorithm 1 could be executed using merely 100 grid points along the battery health pseudo-time axis, the battery life spans more than 500 days (12,000 bidding intervals) in real time under the resultant policy. As a consequence, the value function approximation itself required significantly less computational time than a single closed-loop simulation of the entire battery life.
On a standard HPC compute node with 96 Intel® Xeon® Platinum 8160 CPU cores and 376 GB of RAM, executing Algorithm 1 required approximately 8 h, in contrast to more than 24 h needed for evaluation of a single closed-loop simulation. It could therefore be of independent interest to apply the ideas put forward in Algorithm 1 to hyperparameter tuning for heuristics akin to (6) and (7).

F. Backtests on Historical Market Data

Finally, we evaluate both the benchmark and the proposed participation policies by simulating their closed-loop performance on historical market data from 2022 onward. (We recall that the uncertainty model was constructed using data from 2021 exclusively.)

Figure 4 compares the cumulative returns of the proposed participation policy with the heuristics (6) and (7). Despite the crude uncertainty model and without any hyperparameter tuning, our proposed participation policy outperforms the benchmark heuristics by more than 25% and 10%, respectively.

In addition, our findings support that the one-step predictive problems (MPC($q_0, h_0, \xi_0^0; \phi$)), which determine online market participation decisions, are indeed reliably tractable for real-time deployment of the derived policy. On the same HPC node used for the offline phase of our ADP framework, 99.9% of MPC problem instances were solved in under 4.6 s, with an average solve time of 1.6 s. In more than 12,000 bidding intervals, only a single significant outlier due to numerical issues was observed, requiring a solve time of 479 s.

VI. CONCLUSIONS

We presented an ADP framework for the design of degradation-aware participation policies for BESS in real-time electricity markets. By applying the DP recursion along a pseudo-time axis defined by battery health, our approach naturally leverages the separation of timescales between slow degradation processes and fast operational dynamics.
The result is a tractable offline/online computation scheme for making degradation-aware market participation decisions in real time, yet based on high-fidelity physicochemical battery models. Our experiments demonstrate that the proposed framework gives rise to policies that effectively balance short-term profits with the long-term impacts of market participation on battery longevity and can lead to superior market performance when compared to common heuristics. These findings support the notion that degradation-aware market participation of BESSs may indeed not only be more sustainable but also increase profitability.

Future work should focus on composing the presented framework with more sophisticated uncertainty models that capture the temporal correlation and prevalent periodicity in electricity grid and market signals.

Fig. 4: Cumulative returns under approximate value function-informed participation policy versus degradation penalty heuristic. The lines end where the battery reaches its end of life. Left to right: closed-loop simulations with ground truth uncertainty model, compared against Heuristic (6); closed-loop simulations, compared against Heuristic (7); backtests on historical market data, compared against Heuristic (6); backtests on historical market data, compared against Heuristic (7). (Axes: revenue [EUR/cell] versus days; legend: degradation penalty [EUR/Ah], with approximate value function.)

REFERENCES

[1] Kyriaki Antoniadou-Plytaria et al. "Market-Based Energy Management Model of a Building Microgrid Considering Battery Degradation". In: IEEE Transactions on Smart Grid 12.2 (Mar. 2021), pp. 1794–1804.
[2] Kai Arulkumaran et al. "Deep reinforcement learning: A brief survey". In: IEEE Signal Processing Magazine 34.6 (2017), pp. 26–38.
[3] Dimitri P. Bertsekas. Nonlinear Programming. 2nd ed. Belmont, Massachusetts: Athena Scientific, 1999.
[4] Tom Brijs et al. "Quantifying Electricity Storage Arbitrage Opportunities in Short-Term Electricity Markets in the CWE Region". In: Journal of Energy Storage 25 (Oct. 1, 2019), p. 100899.
[5] Yankai Cao et al. "Multiscale model predictive control of battery systems for frequency regulation markets using physics-based models". In: Journal of Process Control 90 (2020), pp. 46–55.
[6] Eike Cramer et al. "Multivariate probabilistic forecasting of intraday electricity prices using normalizing flows". In: Applied Energy 346 (2023), p. 121370.
[7] Peter Dayan and Geoffrey E Hinton. "Feudal reinforcement learning". In: Advances in Neural Information Processing Systems 5 (1992).
[8] Paul Denholm et al. "The Potential for Battery Energy Storage to Provide Peaking Capacity in the United States". In: Renewable Energy 151 (May 1, 2020), pp. 1269–1277.
[9] Marc Doyle, Thomas F Fuller, and John Newman. "Modeling of galvanostatic charge and discharge of the lithium/polymer/insertion cell". In: Journal of The Electrochemical Society 140.6 (1993), p. 1526.
[10] Eric J Dufek et al. "Battery calendar aging and machine learning". In: Joule 6.7 (2022), pp. 1363–1367.
[11] S&P Global Energy. Declining Costs, Shifting Revenues: Evolving Business Case for Battery Storage. S&P Global Energy. 2025. URL: https://www.spglobal.com/energy/en/news-research/blog/energy-transition/110625-declining-costs-shifting-revenues-evolving-business-case-for-battery-storage.
[12] European Network of Transmission System Operators for Electricity (ENTSO-E). ENTSO-E Transparency Platform: Central collection and publication of pan-European electricity market data. Accessed: 2025-05-24. 2025.
[13] Brandon Foggo and Nanpeng Yu. "Improved Battery Storage Valuation Through Degradation Reduction". In: IEEE Transactions on Smart Grid 9.6 (Nov. 2018), pp. 5721–5732.
[14] Joel C Forman et al. "Genetic identification and fisher identifiability analysis of the Doyle–Fuller–Newman model from experimental cycling of a LiFePO4 cell". In: Journal of Power Sources 210 (2012), pp. 263–275.
[15] A. Frazier et al. Storage Futures Study: Economic Potential of Diurnal Storage in the U.S. Power Sector. NREL/TP-6A20-77449, 1785688, MainId:27385. May 1, 2021.
[16] Alexis Geslin et al. "Dynamic cycling enhances battery lifetime". In: Nature Energy 10.2 (2025), pp. 172–180.
[17] Severin Lukas Hahn et al. "Quantitative validation of calendar aging models for lithium-ion batteries". In: Journal of Power Sources 400 (2018), pp. 402–414.
[18] Peter Keil et al. "Calendar aging of lithium-ion batteries". In: Journal of The Electrochemical Society 163.9 (2016), A1872.
[19] Ioannis G Kevrekidis and Giovanni Samaey. "Equation-free multiscale computation: Algorithms and applications". In: Annual Review of Physical Chemistry 60 (2009), pp. 321–344.
[20] Tejas D Kulkarni et al. "Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation". In: Advances in Neural Information Processing Systems 29 (2016).
[21] Vivek N Lam et al. "A decade of insights: Delving into calendar aging trends and implications". In: Joule 9.1 (2025).
[22] Jie Li et al. "A single particle model with chemical/mechanical degradation physics for lithium ion battery State of Health (SOH) estimation". In: Applied Energy 212 (2018), pp. 1178–1190.
[23] Scott J Moura, Jeffrey L Stein, and Hosam K Fathy. "Battery-health conscious power management in plug-in hybrid electric vehicles via electrochemical modeling and stochastic control". In: IEEE Transactions on Control Systems Technology 21.3 (2012), pp. 679–694.
[24] Gang Ning and Branko N Popov. "Cycle life modeling of lithium-ion batteries". In: Journal of The Electrochemical Society 151.10 (2004), A1584.
[25] Simon EJ O'Kane et al. "Lithium-ion battery degradation: how to model it". In: Physical Chemistry Chemical Physics 24.13 (2022), pp. 7909–7922.
[26] Nitin Padmanabhan, Mohamed Ahmed, and Kankar Bhattacharya. "Battery Energy Storage Systems in Energy and Reserve Markets". In: IEEE Transactions on Power Systems 35.1 (Jan. 2020), pp. 215–226.
[27] HE Perez et al. "Optimal charging of li-ion batteries via a single particle model with electrolyte and thermal dynamics". In: Journal of The Electrochemical Society 164.7 (2017), A1679.
[28] P Ramadass et al. "Development of first principles capacity fade model for Li-ion cells". In: Journal of The Electrochemical Society 151.2 (2004), A196.
[29] Jorn M Reniers, Grietus Mulder, and David A Howey. "Review and performance comparison of mechanical-chemical degradation models for lithium-ion batteries". In: Journal of The Electrochemical Society 166.14 (2019), A3189–A3200.
[30] Réseau de Transport d'Électricité. RTE Services Portal: Access to Network Connection, Market Data, APIs and Customer Services. Accessed: 2025-05-24. 2025.
[31] Raymond B Smith and Martin Z Bazant. "Multiphase porous electrode theory". In: Journal of The Electrochemical Society 164.11 (2017), E3291.
[32] Farshud Sorourifar, Victor M. Zavala, and Alexander W. Dowling. "Integrated Multiscale Design, Market Participation, and Replacement Strategies for Battery Energy Storage Systems". In: IEEE Transactions on Sustainable Energy 11.1 (Jan. 2020), pp. 84–92.
[33] Venkat R Subramanian, James A Ritter, and Ralph E White. "Approximate solutions for galvanostatic discharge of spherical particles I. Constant diffusion coefficient". In: Journal of The Electrochemical Society 148.11 (2001), E444.
[34] Venkat R Subramanian, Deepak Tapriyal, and Ralph E White. "A boundary condition for porous electrodes". In: Electrochemical and Solid-State Letters 7.9 (2004), A259.
[35] Valentin Sulzer et al. "Accelerated battery lifetime simulations using adaptive inter-cycle extrapolation algorithm". In: Journal of The Electrochemical Society 168.12 (2021), p. 120531.
[36] Richard S Sutton, Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning". In: Artificial Intelligence 112.1-2 (1999), pp. 181–211.
[37] Richard Stuart Sutton. Temporal credit assignment in reinforcement learning. University of Massachusetts Amherst, 1984.
[38] Alexander Sasha Vezhnevets et al. "Feudal networks for hierarchical reinforcement learning". In: International Conference on Machine Learning. PMLR. 2017, pp. 3540–3549.
[39] Andreas Wächter and Lorenz T. Biegler. "On the Implementation of an Interior-Point Filter Line-Search Algorithm for Large-Scale Nonlinear Programming". In: Mathematical Programming 106.1 (Mar. 1, 2006), pp. 25–57.
[40] Ding Wang et al. "Recent progress in reinforcement learning and adaptive dynamic programming for advanced control applications". In: IEEE/CAA Journal of Automatica Sinica 11.1 (2023), pp. 18–36.
[41] Bolun Xu et al. "Factoring the Cycle Aging Cost of Batteries Participating in Electricity Markets". In: IEEE Transactions on Power Systems 33.2 (Mar. 2018), pp. 2248–2259.
[42] Pei Yong, Fei Guo, and Zhifang Yang. "An Age-Dependent Battery Energy Storage Degradation Model for Power System Operations". In: IEEE Transactions on Power Systems 40.1 (Jan. 2025), pp. 1188–1191.
