Differentiable Power-Flow Optimization

Muhammed Öz (a,∗), Jasmin Hörter (a), Kaleb Phipps (a), Charlotte Debus (a), Achim Streit (a), Markus Götz (a,b,∗)

(a) Scientific Computing Center (SCC), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany
(b) Helmholtz AI, Karlsruhe

Highlights

• We apply the concept of differentiable simulations to AC power-flow optimization.
• Our Differentiable Power-Flow (DPF) method is scalable in terms of runtime and memory usage.
• The calculation of multiple power flows can be accelerated via the GPU, making use of sparse tensors and batching.
• When calculating power flows for time series, DPF is able to effectively reuse previous solutions for faster convergence.
• DPF can be used as a fast screening method which can be accelerated by early stopping.
• Easy implementation using common automatic differentiation libraries like PyTorch.

Abstract

With the rise of renewable energy sources and their high variability in generation, the management of power grids becomes increasingly complex and computationally demanding. Conventional AC power-flow simulations, which use the Newton-Raphson (NR) method, suffer from poor scalability, making them impractical for emerging use cases such as joint transmission–distribution modeling and global grid analysis. At the same time, purely data-driven surrogate models lack physical guarantees and may violate fundamental constraints. In this work, we propose Differentiable Power-Flow (DPF), a reformulation of the AC power-flow problem as a differentiable simulation.
DPF enables end-to-end gradient propagation from the physical power mismatches to the underlying simulation parameters, thereby allowing these parameters to be identified efficiently using gradient-based optimization. We demonstrate that DPF provides a scalable alternative to NR by leveraging GPU acceleration, sparse tensor representations, and batching capabilities available in modern machine-learning frameworks such as PyTorch. DPF is especially suited as a tool for time-series analyses due to its efficient reuse of previous solutions, for N-1 contingency analyses due to its ability to process cases in batches, and as a screening tool by leveraging its speed and early-stopping capability. The code is available in the authors' code repository.

Keywords: Energy Grid, Power-flow, Optimization, Stochastic Gradient Descent, Automatic Differentiation

∗ Please address correspondence to Muhammed Öz or Markus Götz.
Email addresses: muhammed.oez@kit.edu (Muhammed Öz), jasmin.hoerter@kit.edu (Jasmin Hörter), kaleb.phipps@kit.edu (Kaleb Phipps), charlotte.debus@kit.edu (Charlotte Debus), achim.streit@kit.edu (Achim Streit), markus.goetz@kit.edu (Markus Götz)

1. Introduction

Energy grids are among the most critical infrastructures of modern society and the economy. They must be continuously monitored and actively managed to deliver electricity reliably while maintaining system stability. At the same time, power grids are inherently fragile: the failure of a single component can trigger cascading effects, as the redistribution of electrical flows may overload other parts of the network (Camus, 2021; Emma Pinedo and Latona, 2025; Lemos et al., 2021). One example of an overload-induced cascade is the recent blackout that affected parts of France, Spain, and Portugal on April 28th, 2025 (Emma Pinedo and Latona, 2025).
Maintaining grid stability becomes particularly challenging due to the enormous scale of modern power systems. The North American power grid alone encompasses more than 600,000 miles of transmission lines and 5.5 million miles of distribution lines (Orf, 2023). Benchmark transmission models illustrate this complexity: the European transmission network represented by the case9241pegase test case contains roughly ten thousand high-voltage buses (Josz et al., 2016), while the North American Eastern Interconnection comprises approximately eighty thousand buses (Birchfield and Overbye, 2023). These figures represent only the transmission layer. When distribution networks are included, which is increasingly necessary due to the growing penetration of distributed renewable generation, the scale increases dramatically. For example, Texas alone is estimated to contain roughly 46 million electrical nodes when distribution-level infrastructure is considered as well (Mateo et al., 2024).

To ensure reliable operation of such large systems, grid operators must keep transmission line flows within thermal limits, maintain stable frequencies, and ensure that voltages remain within specified boundaries while continuously balancing electricity supply and demand (Donnot et al., 2017). Achieving this becomes increasingly difficult as power systems grow in size and complexity and as uncertainty rises, for example due to weather-dependent renewable generation (Hamann et al., 2024).

Simulation tools play a crucial role in the decision-making process by enabling the evaluation of different grid states in near real time. Frameworks such as Grid2Op (Donnot, 2020a) and Pandapower (Thurner et al., 2018) perform AC power-flow simulations to assess system stability.
These simulations enable several important applications. For example, grid operators perform N-1 contingency analyses (Mitra et al., 2016) to verify that the grid remains stable if a component such as a transmission line or substation fails. Similarly, time-series simulations are used to evaluate grid stability under changing demand and generation conditions throughout the day. To support such applications in operational settings, power-flow simulations must be computationally efficient.

Most existing simulation frameworks rely on numerical methods such as Newton-Raphson (NR) (Tinney and Hart, 1967) to solve the non-linear AC power-flow equations. While NR remains the state-of-the-art method due to its high accuracy, its computational cost grows rapidly with system size. As a result, NR is not suitable for tackling emerging challenges such as the global grid (Chatzivasileiadis et al., 2013) or integrated transmission–distribution systems (Idema et al., 2013), nor for expanding contingency analyses beyond N-1.

Machine-learning approaches have therefore been explored to accelerate power-flow computations. A common strategy is to train surrogate models that approximate the physical simulation directly from data (Fikri et al., 2018). Such models can provide substantial computational speedups by eliminating the need to repeatedly solve the power-flow equations. However, because they replace the underlying physical model with a learned approximation, they may violate fundamental physical constraints and struggle to generalize reliably outside the training distribution.

To address these limitations, more recent work seeks to combine machine learning with physical laws rather than replacing the simulation entirely.
One prominent approach is to incorporate physical laws directly into neural-network architectures, resulting in physics-informed neural networks (Donon et al., 2020; Böttcher et al., 2023; Dogoulis et al., 2025). These models improve physical consistency compared to purely data-driven surrogates. Nevertheless, they still rely on learned approximations of the system dynamics and therefore do not fully preserve the original physical consistency.

This highlights a remaining gap between traditional simulation methods and modern machine-learning approaches. What is needed is a framework that preserves the physical equations without relying on learned approximations, scales favorably to systems with millions of nodes, leverages modern hardware capabilities such as GPU acceleration, batching, and sparse operations, and integrates seamlessly with existing machine-learning infrastructure.

Rather than learning approximations of the physical system, one way to address these requirements is to make the simulation itself compatible with modern machine-learning methods (Barati, 2025; Okhuegbe et al., 2024; Costilla-Enriquez et al., 2020). Differentiable simulations (Liang and Lin, 2020; Barati, 2025) reformulate the simulation pipeline in a fully differentiable manner, enabling gradients to be computed throughout the entire process via automatic differentiation. This preserves the underlying physical equations while enabling the use of efficient gradient-based optimization algorithms such as Adam (Kingma and Ba, 2017) or stochastic gradient descent (Ruder, 2017). In the context of power networks, the power-flow problem can therefore be solved by minimizing violations of the power-balance equation derived from Kirchhoff's current law and Ohm's law (see Equation (1)).
Compared to classical second-order methods such as NR, gradient-based optimization avoids the explicit construction and inversion of large Jacobian matrices and therefore scales more favorably with system size (see Section 5.2). At the same time, differentiable simulations retain physical consistency while naturally leveraging modern machine-learning infrastructure, including GPU acceleration, batching of multiple simulations, and sparse tensor representations.

In this work, we propose Differentiable Power-Flow (DPF), a reformulation of the AC power-flow problem as a differentiable simulation. This formulation enables end-to-end gradient propagation from the physical power mismatches to the underlying simulation parameters, thereby allowing these parameters to be identified efficiently using gradient-based optimization. We make the following contributions:

1. We introduce DPF, applying the concept of differentiable simulations to AC power-flow calculations. Using synthetic grid data, we analyze the scaling behavior of DPF compared to the classical NR solver and show that DPF achieves favorable runtime and memory scaling on large grids.
2. We evaluate the accuracy of DPF on standard test systems by comparing it to NR, a highly accurate second-order solver, and to the DC power-flow approximation (Qi et al., 2012), which is computationally efficient but less accurate. Our results show that the solution quality of DPF lies between the high accuracy of NR and the DC approximation.
3. We demonstrate the practical applicability of DPF for operational grid analyses, focusing on time-series simulations.

To fully leverage the strengths of differentiable simulations, our implementation supports batching, GPU acceleration, warm-start initialization from previous solutions, and sparse tensor representations, which can be naturally integrated using modern machine-learning frameworks such as PyTorch (Paszke et al., 2019).

Figure 1: Schematic view of the components of the energy grid and the current applications of power-flow calculations.

2. Background

This section provides the foundations for our approach. First, we describe the classical power-flow equations and show how they can be expressed as an optimization problem in Section 2.1. Next, we briefly introduce gradient-based optimization and automatic differentiation, which enable efficient gradient computation in differentiable simulations, in Section 2.2. Finally, we motivate the use of differentiable simulation techniques for power-flow calculations in Section 2.3.

2.1. Problem Formulation

The power grid is a complex system connecting generation and consumption through transmission and distribution layers, as illustrated in Figure 1. It consists of $N$ buses, or nodes, connected via transmission lines that are represented in the sparse admittance matrix $Y_{bus}$. The entry $Y_{bus,ij}$ represents the admittance, i.e., the reciprocal of the impedance, between bus $i$ and bus $j$. $S_{bus,i} = P_i + iQ_i \in \mathbb{C}$, with $P_i, Q_i \in \mathbb{R}$, denotes the complex nodal power injection at bus $i$.

Using the complete AC power-flow model, the goal of power-flow simulations is to determine the complex voltages $V = |V| e^{i\theta} \in \mathbb{C}^N$ such that the specified complex nodal injections $S_{bus}$ and the calculated injections $S_{calc} = V I^* = V (Y_{bus} V)^*$ match, where the product between the vectors $V$ and $I^*$ is taken element-wise. This leads to the non-linear power-balance equation (see footnote 1):

$$S_{bus} \overset{!}{=} S_{calc} = V I^* = V (Y_{bus} V)^*. \tag{1}$$

There are three different bus types in a grid: PV-buses (generators), PQ-buses (loads), and the slack bus (a generator with adaptable generation such that the power-balance equation is solvable).
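As a rough illustration of Equation (1), the following sketch evaluates $S_{calc} = V (Y_{bus} V)^*$ for a two-bus system in plain Python. The line impedance, the voltages, and the helper names `ybus_two_bus` and `calc_injections` are hypothetical choices made for illustration and are not taken from our implementation; shunt elements are neglected.

```python
import cmath


def ybus_two_bus(y_line):
    """Bus admittance matrix of two buses joined by one series branch (no shunts)."""
    return [[y_line, -y_line],
            [-y_line, y_line]]


def calc_injections(y_bus, v):
    """Evaluate S_calc = V * conj(Y_bus V), with the product taken per bus."""
    n = len(v)
    currents = [sum(y_bus[i][j] * v[j] for j in range(n)) for i in range(n)]
    return [v[i] * currents[i].conjugate() for i in range(n)]


# Hypothetical per-unit data: one line with impedance 0.01 + 0.1i.
y_bus = ybus_two_bus(1 / complex(0.01, 0.1))

# Voltages in polar form |V| * exp(i * theta).
v_solution = [cmath.rect(1.00, 0.0), cmath.rect(0.98, -0.05)]
v_flat = [cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.0)]

# Treat the injections at v_solution as the specified injections S_bus.
s_bus = calc_injections(y_bus, v_solution)

# Power mismatch S_bus - S_calc at the solution and at a flat start.
mismatch_solution = [s - c for s, c in zip(s_bus, calc_injections(y_bus, v_solution))]
mismatch_flat = [s - c for s, c in zip(s_bus, calc_injections(y_bus, v_flat))]

# The sum of all injections equals the power dissipated in the line.
line_losses = sum(s_bus)
```

At `v_solution` the mismatch vanishes by construction. At the flat start the calculated injections are zero in this shunt-free example, so the mismatch equals the specified injections; a power-flow solver has to drive this residual to zero.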
Depending on the bus type, different variables among $\{|V|, \theta, P, Q\}$ are specified at bus $i$:

• PV-bus: active power $P_i$ and voltage magnitude $|V_i|$
• PQ-bus: active power $P_i$ and reactive power $Q_i$
• Slack bus: voltage $V_i$

Computing the missing variables is the purpose of AC power-flow solvers, while the specified variables should not change. Equation (2) shows the problem without voltage boundary conditions:

$$\text{Find } \{V, P, Q\} \text{ such that} \quad
\begin{aligned}
P_i &= P_i^{calc} && \forall i \in pv \cup pq, \\
Q_i &= Q_i^{calc} && \forall i \in pq, \\
|V_i| &= |V_i|^{set} && \forall i \in pv, \\
\theta_{slack} &= 0, \\
V_{slack} &= V_{slack}^{set}.
\end{aligned} \tag{2}$$

This can be formulated as an optimization problem:

$$\begin{aligned}
\min_{|V|_{pq},\, \theta_{pv,pq}} \quad & \|F(V)\|^2 = \|S_{bus} - V (Y_{bus} V)^*\|^2 \\
\text{s.t.} \quad & |V_i| = |V_i|^{set} \quad \forall i \in pv, \\
& \theta_{slack} = 0, \\
& V_{slack} = V_{slack}^{set}.
\end{aligned} \tag{3}$$

There is a variety of approaches to solve this optimization problem, as listed in Section 3. One possible way is to use gradient-based optimization.

Footnote 1: See the MATPOWER manual, Sections 3.6 and 4.1.

2.2. Gradient Descent Optimization

Optimization problems are commonly expressed as the minimization of an objective function $f(\theta)$ with parameters $\theta \in \Theta$. The goal is to find a minimizer $\theta^* = \arg\min_{\theta \in \Theta} f(\theta)$. A common approach is gradient descent (Andrychowicz et al., 2016). Starting from an initial parameter vector $\theta_0$, vanilla gradient descent iteratively updates the parameters by moving in the direction of the negative gradient with a learning rate $\eta$:

$$\theta_{t+1} = \theta_t - \eta \nabla f(\theta_t) \tag{4}$$

Different methods exist to calculate the gradient $\nabla f(\theta_t)$, ranging from exact calculations (manual, symbolic, and automatic differentiation) to numerical approximations (Komarov et al., 2025; Baydin et al., 2018).

Symbolic differentiation works on expression trees or expression forests by applying the rules of calculus, such as the power rule, product rule, and chain rule, to obtain their derivatives (Zhang, 2025).
It can suffer from "expression swell", generating unnecessarily large expressions (Zhang, 2025; Corliss, 1988), but this can be avoided by allowing common subexpressions (Laue, 2022).

A convenient and efficient way of utilizing symbolic differentiation is automatic differentiation (autodiff). Autodiff follows the idea that all numerical computations are ultimately compositions of a finite set of elementary operations for which derivatives are known. These operations form a computational graph in which derivatives can be systematically obtained using the chain rule. The essential algorithmic principle is dynamic programming: partial derivatives are stored at each node and reused, thereby avoiding redundant computations.

Two main evaluation strategies exist: forward mode and reverse mode autodiff (Baydin et al., 2018). In forward mode, derivatives are propagated alongside the function evaluation from inputs to outputs. In reverse mode, the function is first evaluated in a forward pass, after which partial derivatives are propagated backward from the output to the inputs. The accumulation direction is relevant. For objective functions with many parameters and a scalar output, reverse mode autodiff is particularly efficient because intermediate partial derivatives can be reused during the backward pass. This property makes reverse mode autodiff well suited for large-scale optimization problems and forms the basis of differentiable programming frameworks such as PyTorch.

As an example for reverse mode autodiff (similar to Flügel et al.), consider the objective function $y = f(x_1, x_2) = g_3(g_2(g_1(x_1, x_2))) \in \mathbb{R}$ with the intermediate scalar values $h_i = (g_i \circ g_{i-1} \circ \dots \circ g_1)(x_1, x_2) \in \mathbb{R}$ up to index $i \in \mathbb{N}$, where $g_1 : \mathbb{R}^2 \to \mathbb{R}$ and $g_i : \mathbb{R} \to \mathbb{R}$ for $i \geq 2$.
Then the partial derivative of $y$ with respect to $x_1$ is

$$\frac{\partial y}{\partial x_1} = \frac{\partial y}{\partial h_2} \frac{\partial h_2}{\partial h_1} \frac{\partial h_1}{\partial x_1}.$$

In a computational graph representation, the variables $h_2$ and $h_1$ correspond to internal nodes, each holding its partial derivative $\frac{\partial y}{\partial h_2}$ or $\frac{\partial y}{\partial h_1}$. The key advantage of storing these values is their reusability. First, $\frac{\partial y}{\partial h_2}$ is calculated, then $\frac{\partial y}{\partial h_1}$ using the previously stored derivative from $h_2$. Calculating $\frac{\partial y}{\partial x_2}$ afterwards becomes trivial, as $\frac{\partial y}{\partial h_1}$ is already stored inside the inner node and only one local multiplication is required.

2.3. Differentiable Simulations

Physical simulations are a prime use case for reverse mode autodiff, especially for industrial, design, engineering, and robotics applications (Newbury et al., 2024). When simulations are expressed in a differentiable way, they are called differentiable simulations. Formulating a problem in this way can accelerate simulations, but also unlock new applications, e.g., learning unknown system parameters or evaluating their influence on the simulation result by solving an inverse problem (Fan et al., 2020).

According to Liang and Lin, a good differentiable simulator should be vectorization friendly (not contain fragmented operations during the forward pass), GPU friendly, and support sparse operations due to the locality of many physical computations. For power-flow simulations, all of these criteria are satisfied with the help of automatic differentiation tools such as PyTorch.

3. Related Work

A wide range of solutions has been proposed to perform power-flow calculations (Alawneh et al., 2023; Sauter et al., 2017). Existing approaches can broadly be categorized into direct methods, iterative numerical methods, and data-driven methods.
Table 1: Power-flow solution methods. Abbreviations: DC: DC approximation, HELM: Holomorphic Embedding Load Flow Method, GS: Gauß-Seidel, SA: Successive Approximation, NR: Newton-Raphson, FDLF: Fast Decoupled Load Flow, DPF: Differentiable Power-Flow, GNS: Graph Neural Solver, KCLNet: Kirchhoff Current Law Network.

| Category | Name | Description | Usage |
|---|---|---|---|
| Direct methods | DC approx. (Qi et al., 2012) | Remove nonlinearities and solve linear system | Screening |
| Direct methods | HELM (Trias, 2012) | Embed power-flow equation into holomorphic function | Guaranteed solution |
| Iterative methods based on fix-point iterations | GS (Grainger, 1999) | Rewrite power equation as a fix-point iteration and update buses sequentially | Educational |
| Iterative methods based on fix-point iterations | SA (Giraldo et al., 2022) | Rewrite power equation as a fix-point iteration and update buses simultaneously | Educational |
| Iterative methods based on linearization | NR (Tinney and Hart, 1967) | Jacobian inversion through matrix decompositions | Operational |
| Iterative methods based on linearization | FDLF (Stott and Alsac, 2007) | Simplifies NR by removing off-diagonal blocks from the Jacobian | Screening (see footnote 2) |
| Iterative methods based on linearization | DPF (Barati, 2025) and us | Gradient-based optimization; corresponds to NR with Jacobian transpose for squared loss function and simple gradient descent | Screening, large grids |
| Data-driven methods | - (Fikri et al., 2018) | Multi-layer perceptron | Fast |
| Data-driven methods | GNS (Donon et al., 2020) | GNN | Fast, accurate |
| Data-driven methods | - (Böttcher et al., 2023) | GNN | Fast, accurate |
| Data-driven methods | KCLNet (Dogoulis et al., 2025) | GNN with hard constraints | Fast, physical |

Footnote 2: FDLF should be faster than NR in theory and can be used for screening and large grids, but in LightSim2Grid, NR using a KLU solver is faster for implementation reasons.

3.1. Direct Methods

Direct methods compute the solution voltage in one iteration. The most widely used direct method is the DC approximation (Qi et al., 2012), which simplifies the AC power-flow equations to a linear problem by neglecting reactive power effects, voltage magnitude variations, and line resistances. Due to its computational efficiency, it is commonly used for screening applications such as contingency analysis, although the resulting power-flow estimates may deviate significantly from the AC solution, with typical errors on the order of 20% (Stott et al., 2009).

A more accurate and more recent method is the Holomorphic Embedding Load Flow Method (HELM) (Trias, 2012). It embeds the power equation into a holomorphic function, constructs and solves a holomorphic power series, and then evaluates the series to get the resulting voltages. While HELM is very stable and can be very accurate, it needs a significant amount of time to reach accurate results comparable to NR (Sauter et al., 2017). As a result, HELM in its base form is too slow to be used as a screening method, for decision tools, and for N-1 analyses. Other variants such as FFHE (Chiang et al., 2017) in combination with NR can achieve a better performance (Huang and Sun, 2023), but it remains to be seen whether a competitive implementation of HELM can outperform NR.

3.2. Iterative Numerical Methods

In contrast to direct or data-driven methods, iterative methods solve the power-flow equation through repeated updates until convergence. They can use other methods as an initialization for better stability (Costilla-Enriquez et al., 2020; Okhuegbe et al., 2024). They can also reuse solutions of similar problems. For example, Grid2Op reuses previous solutions in time-series analyses (Donnot et al., 2017). The iterative methods are based on one of two broader concepts: fix-point iterations or linearization.

The methods based on fix-point iterations are Gauß-Seidel (GS) (Grainger, 1999) and the Successive Approximation (SA) method (Giraldo et al., 2022).
Both rewrite the power equation as a fix-point iteration to repeatedly update voltages. They differ in their update order. GS updates the voltages sequentially, bus by bus, and uses the updated values to update the remaining voltages, while SA updates the voltages for all buses in parallel using the old voltage values. Both methods are rather conceptual and less practically relevant, as they are much slower than linearization-based methods (Donnot, 2020b), especially for larger grids (Sauter et al., 2017).

Linearization-based methods approximate the nonlinear equations locally and solve a linear system at each iteration. They include Newton-Raphson (NR), Fast Decoupled Load Flow (FDLF), and gradient-based methods. The standard operational approach is the NR method (Tinney and Hart, 1967). In each iteration of NR, a linear system

$$\Delta V = J^{-1}(-F(V)) \;\Leftrightarrow\; J \Delta V = -F(V)$$

is solved. In practice, NR converges to accurate solutions in a few iterations (low single digit) because of its quadratic convergence (Overton, 2017). However, it can be slow when applied to large grids.

To speed up NR, it is possible to further simplify the Jacobian. The FDLF method removes the off-diagonal blocks. The key idea is that the voltage angle is strongly coupled with the real power and the voltage magnitude is strongly coupled with the reactive power, while the cross-couplings are weak. As a result, not much information is lost with the simplification. FDLF is suitable as a fast screening method. However, we do not consider this approach in our comparison, as the reported times in the LightSim2Grid benchmark for FDLF are slightly worse than those of NR (Donnot, 2020b).

3.3. Data-Driven Methods

Recent research has explored the use of machine learning to approximate power-flow solutions.
Early approaches employed multi-layer perceptrons to directly map system states to voltage solutions (Fikri et al., 2018). More recent work leverages graph neural networks (GNNs), which better capture the topology of power systems (Donon et al., 2020; Böttcher et al., 2023). The main advantage of these models is their low inference time once trained. However, they usually have lower accuracy and sometimes output physically implausible predictions (Dogoulis et al., 2025). To prevent this to a certain degree, physics-informed neural networks can be employed to penalize the physical violations in the loss function (soft constraints) and project the solutions to physically viable planes (hard constraints) (Dogoulis et al., 2025). However, the authors do not report the additional training time for hard-constraint enforcement via projections.

3.4. Gradient-based Approaches

Gradient-based methods represent an alternative class of iterative solvers that have received comparatively limited attention in the power-flow literature. Unlike Newton-based methods, gradient-based optimization does not require explicit Jacobian factorization and therefore yields computationally inexpensive iterations. However, it typically exhibits linear convergence rates (Garrigos and Gower, 2024), in contrast to the quadratic convergence of NR (Overton, 2017).

Previous work has explored gradient-based techniques primarily as auxiliary tools, for example to improve the robustness of Newton-based solvers or escape local minima using stochastic optimization (Costilla-Enriquez et al., 2020). More recently, Barati (2025) proposed a gradient-based formulation of the power-flow problem.
While their work provides a detailed theoretical analysis, the reported implementations have significantly longer runtimes (seconds instead of milliseconds for small grids such as the case-118 grid) than optimized power-flow solvers. We believe that a fair experiment must compare each method in its best-optimized form. In this work, we investigate gradient-based power-flow optimization within a differentiable simulation framework and evaluate its practical performance using a highly optimized Newton-Raphson implementation from LightSim2Grid (Donnot, 2020b) as a baseline. Our goal is to identify application scenarios in which differentiable power flow can complement or outperform state-of-the-art solvers.

4. The Differentiable Power-Flow Method

We propose the DPF method, which formulates the AC power-flow problem as a differentiable simulation. The key idea is to express the power-flow equations as a differentiable computational graph, enabling the use of gradient-based optimization to compute the voltage solution while preserving the underlying physical equations.

Differentiable simulations are constructed such that every step of the numerical computation is differentiable (Pargmann et al., 2024; Newbury et al., 2024). This allows gradients of the simulation output with respect to internal parameters to be computed efficiently using automatic differentiation. Modern machine-learning frameworks such as PyTorch (Paszke et al., 2019) provide built-in support for this paradigm and enable scalable implementations that leverage GPU acceleration, batching, and sparse tensor operations.

Figure 2: The figure depicts the operational graph of the power-balance equation. The calculated power $S_{calc}$ (depending on the current voltage vector) and the actual power $S_{bus}$ are formed, and their active/reactive components are used in $y_{calc}$ and $y$ to create the loss function.
The variables shown in green, namely the voltage magnitudes of the PQ buses and the voltage angles of the PV and PQ buses, are trainable and are used as the voltage solution after training.

An illustration of applying this approach to power-flow simulations is depicted in Figure 2. The paradigm of differentiable simulation can be used in power-flow studies as follows. The power-balance equation (Equation (1)) defines the forward simulation. The complex bus voltages are represented in polar form as $V = |V| e^{i\theta}$. The unknown voltage magnitudes and phase angles are treated as optimization variables, while grid-specific parameters such as the sparse admittance matrix $Y_{bus}$ remain fixed. The objective function measures the mismatch between the specified and calculated power injections,

\[ L(V) = \frac{1}{N} \left\lVert S_{bus} - V (Y_{bus} V)^{*} \right\rVert^{2}, \]

which corresponds to the mean squared violation of the power-balance equation (the product of $V$ with $(Y_{bus} V)^{*}$ is taken element-wise). Minimizing this loss therefore yields voltages that satisfy the AC power-flow equations. Starting from an initial voltage estimate, the method iteratively performs forward and backward passes through the computational graph. During the forward pass, PyTorch constructs a dynamic computational graph that records all tensor operations. In the backward pass, PyTorch applies reverse-mode automatic differentiation to traverse this graph and compute the gradients of the loss with respect to the trainable voltage variables. These gradients are subsequently used by an optimizer to update the voltage vector. Detailed pseudocode for our approach can be found in the appendix (Algorithm 1).

By design, this approach has the characteristics of a good simulator as defined by Liang and Lin (2020): it is vectorization-friendly, it is GPU-friendly, and it supports sparse operations.

4.1. Implementation Details

In our DPF implementation (see Algorithm 1 in the Appendix) we make use of the classical PyTorch training loop³. In each iteration, we conduct a forward pass, calculate the resulting loss, and finally use PyTorch's automatic differentiation functionality to calculate gradients (using reverse mode to calculate partial derivatives in the computational graph) and update the learnable components of the complex voltage (in polar form) using an optimizer and a scheduler. The admittance matrix $Y_{bus}$ is stored in sparse compressed row storage (CSR) format, and all variables can easily be placed on the GPU. In applications involving repeated simulations on the same grid (e.g., time-series analyses), grid-specific quantities such as $Y_{bus}$ and the bus-type indices can be reused across iterations.

³ https://docs.pytorch.org/tutorials/beginner/introyt/trainingyt.html

4.2. Batched Computation

An important advantage of the differentiable formulation is the ability to solve multiple power-flow problems simultaneously using batching. This is particularly relevant for applications such as time-series simulations or contingency analyses.

To enable batching while preserving the original computational structure, the system matrices and vectors of the individual simulations are concatenated. Specifically, the admittance matrices are combined into a block-diagonal matrix $Y_{bus} = \mathrm{blockdiag}(Y_{bus}^{(i)})$, while the voltage vectors $|V|$ and $\theta$ and the other vectors ($S_{calc}$, out and target) are stacked accordingly. The corresponding bus indices are shifted by batch_num * N to reflect the extended system size.

This formulation allows multiple power-flow problems to be solved in parallel using a single forward and backward pass, enabling efficient utilization of modern hardware accelerators.

5. Theoretical Evaluation

In this section, we compare DPF to NR from a theoretical standpoint. We show how they are related in Section 5.1 and compare them with respect to time (Section 5.2), memory (Section 5.3) and stability (Section 5.4).

5.1. Relation to the Jacobian

DPF and NR both use the Jacobian matrix, but in different ways. The Jacobian collects all first-order partial derivatives of the complex power mismatch $F(V) = S_{calc}(V) - S_{bus}$:

\[ J = J(V) = \frac{\partial F(V)}{\partial V} = \begin{pmatrix} \frac{\partial P}{\partial \theta} & \frac{\partial P}{\partial |V|} \\ \frac{\partial Q}{\partial \theta} & \frac{\partial Q}{\partial |V|} \end{pmatrix} \tag{5} \]

It is a linear approximation of how the power changes with respect to changes in the voltage. NR uses the Jacobian to find the voltage update that produces a change in power equal to the negative power mismatch. This is done by solving the linear system

\[ \Delta V = J^{-1} \left( -F(V) \right) \;\Leftrightarrow\; J \, \Delta V = -F(V) \tag{6} \]

In practice, the system is usually solved with a sparse LU decomposition rather than an explicit inversion. The resulting factors $L$ and $U$ can have more non-zero elements than the Jacobian due to fill-in.

In contrast, gradient-based methods do not jump towards the solution immediately but update the voltage along the direction of steepest descent. Since a scalar loss is used instead of the full mismatch vector and the corresponding Jacobian matrix, gradient-based methods use less information per iteration. For regular gradient descent with the squared error loss function

\[ L(V) = \frac{1}{2} \lVert F(V) \rVert^{2} = \frac{1}{2} F(V)^{T} F(V) \tag{7} \]

the voltage update vector is

\[ \Delta V = -\eta \, \frac{\partial L(V)}{\partial V} = -\eta \, J^{T} F(V) \tag{8} \]

This can be understood as projecting the power mismatch $F(V)$ back into voltage space through $J^{T}$ (instead of the inverse $J^{-1}$ used in NR). This approximation has the advantage that the sparsity of the Jacobian can be exploited effectively without any fill-in. The downside is that the update vector is less accurate and needs to be controlled by a learning rate $\eta$.

5.2. Run-time and Convergence

NR is well known to converge locally quadratically (Overton, 2017), while gradient descent shows global linear convergence under suitable assumptions on the objective (Garrigos and Gower, 2024). The fast convergence of NR, however, comes at a cost. NR repeatedly solves a linear system with a large Jacobian matrix, for which an LU decomposition can be used (Donnot, 2020b). While the worst-case run time of a standard LU decomposition is cubic, the sparsity of the Jacobian can be exploited to improve run times, e.g., using KLU (Davis and Palamadai Natarajan, 2010). The actual run time depends on the fill-in, i.e., the additional number of non-zero entries (nnz) in the decomposition compared to the sparse Jacobian. If no fill-in is present (meaning that $\mathrm{nnz}(J) = \mathrm{nnz}(L) + \mathrm{nnz}(U)$), then an iteration of NR is in $O(\mathrm{nnz}(J))$, as only $\mathrm{nnz}(J)$ operations are needed to create the factorization and to solve the system.

In comparison, DPF does not have to solve a linear system. For standard gradient descent with a squared loss function, the gradient can be formulated as $\frac{\partial L(V)}{\partial V} = J^{T} F(V)$. The cost is therefore dominated by a sparse matrix-vector multiplication without fill-in, with an asymptotic run time of $O(\mathrm{nnz}(J)) = O(\mathrm{nnz}(Y_{bus}))$, which is linear in the grid size for power grids.

Overall, the total run time is a trade-off between the number of iterations needed and the time per iteration. Since this analysis does not account for different optimizers and loss functions, and since the amount of fill-in is unclear, we evaluate the run time experimentally in Section 6.

5.3. Memory Usage

In the following, as defined previously, let $N$ be the number of buses (nodes) and $M$ the number of connections in the grid (edges).
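To make this notation concrete, the following minimal sketch builds a sparse $Y_{bus}$ for a toy grid and evaluates the mean squared power mismatch by visiting each stored entry once. It is written in plain Python rather than PyTorch so that it runs without dependencies; the ring topology, the branch impedance, and the helper names `build_ybus`, `nnz`, `s_calc` and `loss` are illustrative assumptions and not part of the DPF implementation. For a grid without parallel branches or shunts, the matrix has $N + 2M$ stored entries, which is why the cost of one evaluation grows linearly with the grid size.

```python
# Hypothetical toy grid (not from the paper): a ring of N buses with M = N branches.

def build_ybus(n_buses, branches):
    """Return Y_bus as a list of sparse rows {column: admittance}.

    branches: list of (from_bus, to_bus, series_admittance), no shunt elements.
    """
    rows = [dict() for _ in range(n_buses)]
    for i, j, y in branches:
        rows[i][i] = rows[i].get(i, 0j) + y   # diagonal: sum of incident admittances
        rows[j][j] = rows[j].get(j, 0j) + y
        rows[i][j] = rows[i].get(j, 0j) - y   # off-diagonal: negative branch admittance
        rows[j][i] = rows[j].get(i, 0j) - y
    return rows

def nnz(rows):
    """Number of stored (non-zero) entries of the sparse matrix."""
    return sum(len(row) for row in rows)

def s_calc(rows, v):
    """S_calc = V * conj(Y_bus V), touching every stored entry exactly once."""
    out = []
    for i, row in enumerate(rows):
        current = sum(y * v[j] for j, y in row.items())   # (Y_bus V)_i
        out.append(v[i] * current.conjugate())
    return out

def loss(rows, v, s_bus):
    """Mean squared power mismatch, following the loss of Section 4."""
    s = s_calc(rows, v)
    return sum(abs(a - b) ** 2 for a, b in zip(s_bus, s)) / len(v)

N = 6
branches = [(k, (k + 1) % N, 1 / (0.01 + 0.1j)) for k in range(N)]
M = len(branches)
Y = build_ybus(N, branches)
flat = [1 + 0j] * N                      # flat start: 1.0 p.u. at angle 0

print("nnz(Y_bus):", nnz(Y), "N + 2M:", N + 2 * M)
print("loss at flat start with zero injections:", loss(Y, flat, [0j] * N))
```

In a PyTorch version, the dictionary rows would be replaced by a sparse CSR tensor and the gradient of `loss` would be obtained by automatic differentiation instead of being derived by hand.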
Aside from the existing inputs (the admittance matrix $Y_{bus}$, the injections $S_{bus}$, the indices pv, pq and slack, and the voltage vector for the solution), DPF needs to store the complex gradient vector $\frac{\partial L(V)}{\partial V} = J^{T} F(V)$, consisting of $\frac{\partial L}{\partial |V|}$ and $\frac{\partial L}{\partial \theta}$, and the complex output vector created from indexing $S_{calc}$. With reverse-mode AD it is not necessary to store the full Jacobian as a matrix. Instead, the partial derivatives are applied directly in a matrix-vector multiplication. In total, additional memory for $2N$ complex numbers is required, while partial derivatives of size $M$ are used temporarily but not stored.

As a comparison, NR has to store the Jacobian consisting of the partial derivatives $\frac{\partial P}{\partial |V|}$, $\frac{\partial P}{\partial \theta}$, $\frac{\partial Q}{\partial |V|}$ and $\frac{\partial Q}{\partial \theta}$ ($4M$ real numbers, as each partial derivative mirrors the sparsity pattern of the admittance matrix), the LU factorization of the Jacobian (whose size depends on the fill-in) and the complex mismatch vector ($N$ complex numbers).

We can further simplify this by assuming $M \approx 3N$. This holds because the nodes in the grid (substations) are connected locally and have only a few connections each. Under this assumption, NR needs to store a total of about $12N$ complex numbers as well as the fill-in from the LU decomposition. Overall, DPF needs about one sixth of the additional memory of NR when there is no fill-in. With fill-in, however, the additional memory of NR can be $O(N^{2})$.

5.4. Stability

NR can be problematic when the starting point is badly chosen or when the method gets stuck in local minima. Costilla-Enriquez et al. address these problems for very small grids by using stochastic gradient descent for the initialization and for cases where NR gets stuck in local minima. They argue that, even though the Jacobian in NR can lose its full rank, it is often still close to full rank, which means that the gradient can still provide a useful direction (Costilla-Enriquez et al., 2020).

6. Experimental Evaluation

In this section, we evaluate the proposed DPF method with a focus on its scalability and performance in large-scale settings. We consider two scenarios: (i) power-flow computations on increasingly large grids to analyze the scaling behavior, and (ii) time series simulations, where we demonstrate how DPF benefits from solution reuse and batching. We explain the experimental setting in Section 6.1 and Section 6.2 and compare DPF to established methods in Section 6.4.

6.1. Simulation Framework and Dataset

We use the simulation framework Grid2Op and its backend LightSim2Grid (Donnot, 2020b) to represent power grids and conduct power-flow simulations via NR. The grid sizes range from 118 buses with the IEEE-118 grid, modelling the old American power system (Pena et al., 2017), to 9,241 buses with the case9241pegase grid, modelling the European system (Josz et al., 2016), similar to the benchmark experiments in LightSim2Grid⁴ (see Table A.4 for the grids used). The dataset we use for the 118-bus grid is the Learning Industrial Physical Simulation benchmark suite (LIPS) (Leyli Abadi et al., 2022). It contains a dataset of different grid states suited for evaluating different power-flow techniques, especially data-driven approaches. For the other grids we use the synthetic data from the benchmark experiments.

⁴ https://lightsim2grid.readthedocs.io/en/latest/benchmarks_grid_sizes.html

6.2. Hardware and Software Environment

Our differentiable simulation is implemented in PyTorch (Paszke et al., 2019), a popular deep learning framework, including PyTorch's sparse library to represent admittance matrices in sparse compressed row storage (CSR) format. The experiments were conducted on a gpu4 node with four NVIDIA A100-40 GPUs with 40 GB VRAM and dual-socket Intel Xeon Platinum 8368 CPUs. We utilized CUDA 12.4 (Nickolls et al., 2008), NVIDIA's parallel computing platform, and PyTorch 2.6 (Paszke et al., 2019) to leverage GPU acceleration for faster simulations.

6.3. Hyperparameters

To find suitable hyperparameters on the 118-bus grid, we used Optuna (Akiba et al., 2019). We considered the optimizers Adam (Kingma and Ba, 2017), SGD (Ruder, 2017) and RMSprop (Hinton, 2012) and the learning rate schedulers constant, step-lr, reduce-lr-on-plateau, and multi-step-lr with their respective hyperparameters, and conducted 100 trials for each combination. All three optimizers achieve comparable losses, so we proceeded with Adam and a reduce-lr-on-plateau scheduler. We noticed that the 9,241-bus grid needs a smaller learning rate and lowered it manually. For the time series setting on the large grid, we conducted another hyperparameter search. Table 2 and Table 3 show the hyperparameters used in the static setting and for time series, respectively. For time series, when reusing previous solutions, we noticed that the solutions are already close and that a smaller learning rate is sufficient to learn the delta between time steps.

Table 2: Hyperparameters for DPF on the small 118-bus and the large 9,241-bus grid.

Parameter              IEEE118              Case9241pegase
Optimizer              Adam                 Adam
  lr (η)               0.0034               0.0001
  decay (β)            (0.979, 0.963)       (0.979, 0.963)
Scheduler              Reduce lr on plateau Reduce lr on plateau
  factor               0.547                0.547
  patience             41                   41
  threshold            0.0673               0.0673
  cooldown             97                   97
Maximum iterations     1,000                1,000

Table 3: Hyperparameter settings for time series on IEEE118 with time step t.

Parameter              IEEE118, t = 0       IEEE118, t > 0
Optimizer              Adam                 Adam
  lr (η)               0.03564              0.00027
  decay (β)            (0.9802, 0.9440)     (0.7847, 0.6624)
Scheduler              Step lr              Reduce lr on plateau
  step                 100                  -
  γ                    0.773                -
  factor               -                    0.8
  patience             -                    2
  threshold            -                    0.0388
  cooldown             -                    4
Maximum iterations     1,000                300

6.4. Comparison

In this section, we compare our DPF to NR and the DC approximation.
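To make the three methods under comparison concrete, the sketch below applies them to a hypothetical two-bus system (one slack bus, one PQ bus); the line impedance, the load, the learning rate and all function names are assumptions chosen for illustration and do not come from the experiments. NR solves $J \Delta V = -F(V)$ as in Equation (6), the gradient-based variant applies $\Delta V = -\eta J^{T} F(V)$ as in Equation (8) with a hand-derived Jacobian instead of automatic differentiation, and the DC approximation sets $|V| = 1$ and neglects resistance.

```python
import cmath

# Hypothetical two-bus system (illustrative values, not from the paper):
# bus 1 is the slack bus (1.0 p.u., angle 0), bus 2 is a PQ bus with a load.
Y_LINE = 1 / (0.01 + 0.1j)      # series admittance of the single branch
S_SPEC = -0.5 - 0.2j            # specified injection at bus 2 (a load)
V1 = 1 + 0j

def mismatch(theta, mag):
    """F = S_calc - S_spec at bus 2, returned as (dP, dQ)."""
    v2 = cmath.rect(mag, theta)
    s = v2 * (Y_LINE * (v2 - V1)).conjugate()
    return (s - S_SPEC).real, (s - S_SPEC).imag

def jacobian(theta, mag):
    """2x2 Jacobian of (P, Q) with respect to (theta, |V|) at bus 2."""
    v2 = cmath.rect(mag, theta)
    e = cmath.rect(1.0, theta)                       # dV2 / d|V|
    i2 = Y_LINE * (v2 - V1)                          # current injected at bus 2
    ds_dth = 1j * v2 * i2.conjugate() + v2 * (Y_LINE * 1j * v2).conjugate()
    ds_dm = e * i2.conjugate() + v2 * (Y_LINE * e).conjugate()
    return (ds_dth.real, ds_dm.real), (ds_dth.imag, ds_dm.imag)

def norm(f):
    return (f[0] ** 2 + f[1] ** 2) ** 0.5

def newton_raphson(iters=6):
    theta, mag = 0.0, 1.0                            # flat start
    for _ in range(iters):
        (a, b), (c, d) = jacobian(theta, mag)
        fp, fq = mismatch(theta, mag)
        det = a * d - b * c
        # Solve J * delta = -F with Cramer's rule (Equation (6)).
        theta += (-fp * d + b * fq) / det
        mag += (-a * fq + c * fp) / det
    return theta, mag

def gradient_descent(iters=50, lr=0.002):
    theta, mag = 0.0, 1.0                            # flat start
    for _ in range(iters):
        (a, b), (c, d) = jacobian(theta, mag)
        fp, fq = mismatch(theta, mag)
        # delta = -lr * J^T F (Equation (8)); no linear system is solved.
        theta -= lr * (a * fp + c * fq)
        mag -= lr * (b * fp + d * fq)
    return theta, mag

def dc_approximation():
    # DC power flow: |V| = 1, resistance neglected, P = theta / x.
    x = (1 / Y_LINE).imag
    return S_SPEC.real * x, 1.0

for name, solver in [("NR", newton_raphson),
                     ("gradient descent", gradient_descent),
                     ("DC", dc_approximation)]:
    theta, mag = solver()
    print(f"{name}: |V| = {mag:.4f}, theta = {theta:.4f}, "
          f"mismatch = {norm(mismatch(theta, mag)):.2e}")
```

On this toy system the residual mismatch of the gradient-based solution after a fixed iteration budget is expected to lie between that of NR and that of the DC approximation; a single well-conditioned branch says nothing about run times on realistic grids.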
We test the methods for different use cases, in particular on individual grids in Section 6.4.1 and on the time series application in Section 6.4.2. We further compare the scalability of DPF and NR in Section 6.4.3, evaluating the approaches for the sizes relevant to larger continental transmission grids and joint transmission-distribution systems.

6.4.1. Single-Step Optimization

We begin with the standard setting of solving individual power flows on small test grid cases. While this scenario favors second-order methods such as NR due to their fast local convergence, it provides a useful baseline to analyze the behavior and properties of DPF. Here, the traditional NR approach works best (see Figure 3). While our approach converges to a solution that lies between NR and the DC approximation in terms of quality, it takes about 1,000 iterations (about 0.8 s for the IEEE-118 grid and about 5 s for the larger 9,241-bus grid) on CPU, which is longer than the run time of NR and DC.

Figure 3: Comparison of our differentiable simulation (blue), NR (red) and the DC approximation (green) on CPU without data loading time. Left: the IEEE-118 grid, right: the case9241pegase grid. In both cases our differentiable simulation is slower than NR, but the solution quality lies between NR and DC.

Despite the longer run time for small grids, the trend for larger grids shifts in favor of DPF. This becomes more apparent when looking at different grid sizes. Figure 4 shows the scaling behavior of our approach compared to NR and DC. The larger the grid, the more iterations of our differentiable simulation are possible in the time NR needs. However, even for the largest available grid in the LightSim2Grid benchmark tests (case9241pegase), NR is still faster. To utilize the scaling behavior more effectively, it is possible to consider multiple power-flow simulations at the same time.

Figure 4: Scaling behavior of DPF (braces show the number of iterations), NR and DC. While on CPU (left) the scaling seems similar between the different approaches, on GPU (right) the better scaling behavior becomes apparent. Yet for a grid size of 9,241, NR is still faster, as our gradient-based approach needs about 1,000 iterations in its base form for convergence.

6.4.2. Time Series Optimization

One use case where DPF shows advantages over NR is the time series setting with fixed grids and changing injections. The advantage of DPF is the reusability of approximate solutions, e.g., the power-flow solutions of previous time steps. While both NR and DPF are iterative methods, NR cannot fully utilize previous solutions, as it still has to calculate the Jacobian matrix. DPF, on the other hand, uses many smaller steps to converge to a solution, which can be accelerated with a good initial solution.

The grid states of subsequent time steps are similar and, as a consequence, the solutions are close to each other, as shown in Figure 5. Using the similarity of solutions from subsequent time steps, we can reduce the number of iterations needed from about 1,000 to 100, as shown in Figure 6.

Figure 5: Solution distance of previous solutions (or the initialization) to the solution of the next time step for the IEEE-118 grid. Subsequent time steps have very similar grids and, as a result, very similar solutions.

Figure 6: Training curves for a time series use case on the IEEE-118 grid. The first time step (left) needs a full fitting of roughly 1,000 iterations. Albeit slow, this fitting needs to be performed only once per grid and can possibly be replaced by Newton-Raphson. Subsequent time steps (right), using the first as initial guess, only require 100 iterations to reach a plateau in solution quality.

Another advantage of DPF that can be used in a time-series setting, but is not exclusive to it, is the ability to use batching. While NR can be used to iteratively solve the subsequent power flows one by one (making use of information from previous time steps, such as a similar admittance matrix and a similar Jacobian, to reduce the calculation time), DPF uses batching as described in Section 4.2 to calculate multiple power flows at once. Using batching on the 9,241-bus grid, we improve the time per iteration and power flow from 2 ms to 0.45 ms for a batch size of 64, as shown in Figure 7.

Figure 7: Normalized time per power flow using batching on the case9241pegase grid. Without batching, a time of about 2 ms per iteration and power flow is needed, while a time of 0.45 ms per iteration and power flow is achieved at a batch size of 64.

Comparison to NR. For the time series setting we have shown that batching and reusing previous solutions improve the simulation speed of our method. We cannot make a fair comparison to NR, however, as LightSim2Grid does not support batching. What we can do is compare our method to the most optimized version of NR used on time series⁵. We ran the experiment on our device and found that the optimized version of NR for time series needs 12.37 ms for the case9241pegase grid. This means that by the time NR is finished, our differentiable simulator has finished about 28 iterations, which is not enough to converge. Even for changing grids (and thus changing admittance matrices, which DPF can handle as easily as fixed admittance matrices) NR needs 26.06 ms, which is still faster than DPF (DPF can do about 58 out of 100 iterations in that time).

⁵ https://lightsim2grid.readthedocs.io/en/latest/benchmarks_grid_sizes.html

6.4.3. Scaling Behavior on Synthetic Data

The available grids we used (see Section 6.1) are limited to 9,241 or fewer buses. On these grids, NR outperforms DPF, but the scaling appears to be in favor of our approach (see Figure 4). To shed light on this, we compare both methods using synthetic data.
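One way such synthetic grids could be assembled is sketched below, under the assumption (taken from the node-scaling experiment) that a base grid is replicated and the copies are joined by a handful of random branches. The index shift `copy * n_base` is the same one that the batching of Section 4.2 applies when forming a block-diagonal $Y_{bus}$. The function names, the five-bus ring used as base grid and the tie-line admittance are hypothetical and serve only to show that the number of non-zero entries grows linearly with the number of copies.

```python
import random

def replicate_grid(base_branches, n_base, copies, ties_per_copy, seed=0):
    """Replicate a base grid and join the copies with random tie branches.

    base_branches: list of (from_bus, to_bus, admittance) for one grid.
    Returns (n_total, branches) with bus indices shifted by copy * n_base.
    """
    rng = random.Random(seed)
    branches = []
    for c in range(copies):
        shift = c * n_base                      # same shift as used for batching
        branches += [(i + shift, j + shift, y) for i, j, y in base_branches]
    n_total = copies * n_base
    tie_y = 1 / (0.01 + 0.1j)                   # hypothetical tie-line admittance
    added = 0
    while copies > 1 and added < ties_per_copy * copies:
        i, j = rng.randrange(n_total), rng.randrange(n_total)
        if i // n_base != j // n_base:          # only connect different copies
            branches.append((i, j, tie_y))
            added += 1
    return n_total, branches

def ybus_nnz(n_buses, branches):
    """Structural non-zeros of Y_bus: the diagonal plus both off-diagonal entries."""
    pattern = {(k, k) for k in range(n_buses)}
    for i, j, _ in branches:
        pattern.add((i, j))
        pattern.add((j, i))
    return len(pattern)

# Hypothetical base grid: a ring of five buses (15 non-zeros in Y_bus).
base = [(k, (k + 1) % 5, 1 / (0.02 + 0.2j)) for k in range(5)]

for copies in (1, 2, 4, 8):
    n, br = replicate_grid(base, 5, copies, ties_per_copy=2)
    print(f"copies={copies}: buses={n}, nnz(Y_bus)={ybus_nnz(n, br)}")
```

Without the tie branches the copies are independent, so the same construction yields a batch of separate power-flow problems that share one sparse matrix.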
In Figure 8 we analyze node and edge scaling. For the node scaling experiment, we increase the grid size by copying the case9241pegase grid and adding a few random connections to connect the copies. For the edge scaling experiment, we add an increasing number of random connections to the case9241pegase grid. Both experiments give a similar result: NR scaling appears to be close to quadratic, while DPF scales linearly with the number of nodes and edges. Together with the observation that the solution quality at each iteration is similar (not worse) for larger grid sizes, the node and edge scaling behavior makes DPF a scalable approach.

Figure 8: Scaling behavior with respect to grid characteristics. For node scaling (left), we increase the number of nodes in the grid by copying the case9241pegase grid and making a few (20 × number of copies) random connections throughout the combined grid. In the case of edge scaling (right), we insert an increasing number of random connections into the case9241pegase grid, adding two new non-zero entries (due to symmetry) per connection to the $Y_{bus}$ matrix, which consists of 37,655 non-zero entries. DPF is almost unaffected (its run time rises from 4.7 s to 5.2 s), while the run time of NR increases significantly.

6.5. Discussion

With a differentiable simulation of the power-flow equations, gradient-based optimization techniques can be applied directly to the physical simulation. The primary advantage of DPF lies in its scalability. While NR remains highly efficient for small and medium-sized grids, the computational and memory requirements associated with constructing and solving large Jacobian matrices increase with system size. In contrast, DPF avoids these expensive matrix factorizations and shows favorable scaling behavior in large networks.
As demonstrated in our experiments, DPF eventually surpasses NR in convergence speed for sufficiently large systems while maintaining acceptable solution quality.

Although classical high-voltage transmission networks alone are often too small to fully exploit this scaling advantage, the situation changes when considering broader system perspectives. The European transmission network, for example, contains roughly ten thousand buses as represented by the case9241pegase benchmark grid, while the North American Eastern Interconnection contains approximately eighty thousand buses (Birchfield and Overbye, 2023). At this scale, NR remains highly efficient. Our results indicate that networks on the order of millions of buses are required for DPF to outperform NR on the first run, although this threshold could decrease with further optimization of the method. Such system sizes become realistic when transmission and distribution networks are modeled jointly. This perspective is increasingly relevant in the context of the renewable energy transition, where distributed generation, electrification, and weather-dependent fluctuations benefit from more detailed system representations. In these settings, network sizes can reach tens of millions of nodes. For example, the electrical network in Texas has been estimated to contain approximately 46 million electrical nodes (Mateo et al., 2024), well within the regime where scalable approaches such as DPF become advantageous. Possible future work in this direction is to apply DPF in a fully optimized, parallelized and perhaps distributed manner on larger realistic systems.

Beyond its scalability, DPF also offers additional practical benefits. Because the formulation is embedded in a differentiable programming framework, it can naturally exploit GPU acceleration and batching of multiple grid states.
This makes the approach particularly suitable for applications that require solving many related power-flow problems, such as time-series simulations or N-1 contingency analyses. While our current implementation does not yet outperform the optimized time-series implementation of LightSim2Grid for small grids, the observed performance gap is relatively small, suggesting that improved implementations could become competitive.

Furthermore, gradient-based approaches can contribute to improving the robustness of traditional solvers. NR is known to occasionally fail to converge even with good initialization (Okhuegbe et al., 2024). In such cases, gradient-based methods can serve as stabilizing or initialization procedures, as explored in hybrid approaches combining both techniques (Costilla-Enriquez et al., 2020).

Future work should therefore focus on optimizing the implementation of DPF, including improved exploitation of sparsity structures, more suitable optimizers and hyperparameters, early stopping strategies, and tighter integration with existing simulation frameworks such as LightSim2Grid. Additionally, large-scale experiments on realistic transmission–distribution systems could further demonstrate the advantages of the approach.

7. Conclusion

This work demonstrates that the differentiable simulation paradigm can be successfully applied to power-flow calculations through the proposed DPF formulation. The method preserves the underlying physical equations while enabling gradient-based optimization within modern machine-learning frameworks.

The main strength of DPF lies in its scalability. By avoiding explicit Jacobian construction and matrix factorization, the method exhibits favorable scaling behavior as network size increases.
While NR remains the most efficient solver for small and medium-sized transmission grids, DPF becomes increasingly attractive for very large systems where the computational and memory requirements of traditional methods become limiting. This scalability is particularly relevant for emerging applications that require large and detailed system models, such as joint transmission–distribution simulations. In such contexts, network sizes can reach millions of nodes, making scalable solution approaches essential.

Beyond scalability, DPF offers additional practical advantages. The differentiable formulation allows seamless integration with modern machine-learning infrastructure, including GPU acceleration, batching across multiple simulations, and automatic differentiation frameworks such as PyTorch. This makes the method well suited for applications involving many related power-flow computations, such as time-series simulations and N-1 contingency analyses. Moreover, DPF can provide intermediate approximate solutions that are more accurate than the widely used DC approximation, enabling efficient screening applications where full convergence is not required.

Overall, DPF provides a scalable alternative to classical power-flow solvers. Future work should focus on improving the efficiency and convergence behavior of the method, exploring optimized implementations, and deploying it on large realistic power-system models using parallel and distributed computing architectures.

Acknowledgements

This work is supported by the Helmholtz AI platform grant and the Helmholtz Association Initiative and Networking Fund on the HAICORE@KIT partition. We credit Assia Benkerroum from flaticon.com for providing the icons in Figure 1.

Declaration of generative AI and AI-assisted technologies in the writing process.
Statement: During the preparation of this work, the author(s) used ChatGPT to assist in the coding process and sparingly to refine the language in the document for unclear parts and explicitly not to generate new content. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the published article.

References

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2623–2631, 2019. doi: 10.1145/3292500.3330701.

Shadi G Alawneh, Lei Zeng, and Seyed Ali Arefifar. A review of high-performance computing methods for power flow analysis. Mathematics, 11(11):2461, 2023. doi: 10.3390/math11112461.

Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando de Freitas. Learning to learn by gradient descent by gradient descent, 2016. URL https://arxiv.org/abs/1606.04474.

Masoud Barati. Enhancing acpf analysis: Integrating newton-raphson method with gradient descent and computational graphs. IEEE Transactions on Industry Applications, 61, 2025. doi: 10.1109/TIA.2025.3571862.

Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey, 2018. URL https://arxiv.org/abs/1502.05767.

Adam B Birchfield and Thomas J Overbye. A review on providing realistic electric grid simulations for academia and industry. Current Sustainable/Renewable Energy Reports, 10(3):154–161, 2023. doi: 10.1007/s40518-023-00212-7.

Luis Böttcher, Hinrikus Wolf, Bastian Jung, Philipp Lutat, Marc Trageser, Oliver Pohl, Xiaohu Tao, Andreas Ulbig, and Martin Grohe. Solving ac power flow with graph neural networks under realistic constraints. In 2023 IEEE Belgrade PowerTech, pages 1–7. IEEE, 2023. doi: 10.1109/powertech55446.2023.10202246.

Claire Camus. Outage of french-spanish interconnection on 24 july 2021, 2021. URL https://www.entsoe.eu/news/2021/08/20/outage-of-french-spanish-interconnection-on-24-july-2021-update/. accessed: 2025-08-20.

Spyros Chatzivasileiadis, Damien Ernst, and Göran Andersson. The global grid. Renewable Energy, 57:372–383, 2013. doi: 10.1016/j.renene.2013.01.032.

Hsiao-Dong Chiang, Tao Wang, and Hao Sheng. A novel fast and flexible holomorphic embedding power flow method. IEEE Transactions on Power Systems, 33(3):2551–2562, 2017. doi: 10.1109/TPWRS.2017.2750711.

George F. Corliss. Applications of differentiation arithmetic, page 127–148. Academic Press Professional, Inc., USA, 1988. ISBN 0125056303.

Napoleon Costilla-Enriquez, Yang Weng, and Baosen Zhang. Combining newton-raphson and stochastic gradient descent for power flow analysis. IEEE Transactions on Power Systems, 36(1):514–517, 2020. doi: 10.1109/TPWRS.2020.3029449.

Timothy A Davis and Ekanathan Palamadai Natarajan. Algorithm 907: Klu, a direct sparse solver for circuit simulation problems. ACM Transactions on Mathematical Software (TOMS), 37(3):1–17, 2010. doi: 10.1145/1824801.1824814.

Pantelis Dogoulis, Karim Tit, and Maxime Cordy. Kclnet: Physics-informed power flow prediction via constraints projections. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 95–110. Springer, 2025. doi: 10.1007/978-3-032-06129-4_6.

Benjamin Donnot. Grid2op-a testbed platform to model sequential decision making in power systems, 2020a. URL https://github.com/Grid2Op/grid2op.

Benjamin Donnot. Lightsim2grid - A c++ backend targeting the Grid2Op platform, 2020b. URL https://GitHub.com/Grid2Op/lightsim2grid.

Benjamin Donnot, Isabelle Guyon, Marc Schoenauer, Patrick Panciatici, and Antoine Marot. Introducing machine learning for power system operation support, 2017. doi: 10.48550/arXiv.1709.09527.

Balthazar Donon, Rémy Clément, Benjamin Donnot, Antoine Marot, Isabelle Guyon, and Marc Schoenauer. Neural networks for power flow: Graph neural solver. Electric Power Systems Research, 189:106547, 2020. doi: 10.1016/j.epsr.2020.106547.

Catarina Demony Emma Pinedo and David Latona. Power begins to return after huge outage hits spain and portugal, 2025. URL https://www.reuters.com/world/europe/large-parts-spain-portugal-hit-by-power-outage-2025-04-28/. accessed: 2025-05-09.

Tiffany Fan, Kailai Xu, Jay Pathak, and Eric Darve. Solving inverse problems in steady-state navier-stokes equations using deep neural networks, 2020. URL https://arxiv.org/abs/2008.13074.

Meriem Fikri, Touria Haidi, Bouchra Cheddadi, Omar Sabri, Meriem Majdoub, and Abdelaziz Belfqih. Power flow calculations by deterministic methods and artificial intelligence method. Int J Adv Eng Res Sci, 5:148–152, 2018. doi: 10.22161/ijaers.5.6.25.

Katharina Flügel, Daniel Coquelin, Marie Weiel, Charlotte Debus, Achim Streit, and Markus Götz. Beyond backpropagation: Optimization with multi-tangent forward gradients. In 2025 International Joint Conference on Neural Networks (IJCNN), page 1–8. IEEE, June 2025. doi: 10.1109/ijcnn64981.2025.11227446.

Guillaume Garrigos and Robert M. Gower. Handbook of convergence theorems for (stochastic) gradient methods, 2024. doi: 10.48550/arXiv.2301.11235.

Juan S Giraldo, Oscar Danilo Montoya, Pedro P Vergara, and Federico Milano. A fixed-point current injection power flow for electric distribution systems using laurent series. Electric Power Systems Research, 211:108326, 2022. doi: 10.1016/j.epsr.2022.108326.

John J Grainger. Power system analysis. McGraw-Hill, 1999. ISBN 1259008355.

Hendrik F. Hamann, Blazhe Gjorgiev, Thomas Brunschwiler, Leonardo S.A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Lok Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander Belyi, Ricardo J. Bessa, Bishnu Prasad Bhattarai, Johannes Schmude, and Stanislav Sobolevsky. Foundation models for the electric power grid. Joule, 8(12):3245–3258, 2024. ISSN 2542-4351. doi: 10.1016/j.joule.2024.11.002.

Geoffrey Hinton. rmsprop: Divide the gradient by a running average of its recent magnitude, 2012. URL https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.

Kaiyang Huang and Kai Sun. A review on applications of holomorphic embedding methods. iEnergy, 2(4):264–274, 2023. doi: 10.23919/IEN.2023.0037.

Reijer Idema, Georgios Papaefthymiou, Domenico Lahaye, Cornelis Vuik, and Lou van der Sluis. Towards faster solution of large power flow problems. IEEE Transactions on Power Systems, 28:4918–4925, 2013. doi: 10.1109/TPWRS.2013.2252631.

Cédric Josz, Stéphane Fliscounakis, Jean Maeght, and Patrick Panciatici. Ac power flow data in matpower and qcqp format: itesla, rte snapshots, and pegase, 2016. doi: 10.48550/arXiv.1603.01533.

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017. doi: 10.48550/arXiv.1412.6980.

Pavel Komarov, Floris van Breugel, and J. Nathan Kutz. A taxonomy of numerical differentiation methods, 2025. URL https://arxiv.org/abs/2512.09090.

Soeren Laue. On the equivalence of automatic and symbolic differentiation, 2022. URL https://arxiv.org/abs/1904.02990.

Gerardo Lemos, Ana Melgar, Mauricio Torres, and Michael Rios. State of emergency declared after blackout plunges most of chile into darkness, 2021. URL https://edition.cnn.com/2025/02/25/americas/chile-blackout-14-regions-intl-latam. accessed: 2025-08-20.

Milad Leyli Abadi, Antoine Marot, Jérôme Picault, David Danan, Mouadh Yagoubi, Benjamin Donnot, Seif Attoui, Pavel Dimitrov, Asma Farjallah, and Clement Etienam. Lips-learning industrial physical simulation benchmark suite. Advances in Neural Information Processing Systems, 35:28095–28109, 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/hash/b3ac9866f6333beaa7d38926101b7e1c-Abstract-Datasets_and_Benchmarks.html.

Junbang Liang and Ming C Lin. Differentiable physics simulation. In ICLR 2020 workshop on integration of deep neural models and differential equations, 2020. URL https://openreview.net/pdf?id=p-SG2KFY2.

Carlos Mateo, Fernando Postigo, Tarek Elgindy, Adam B Birchfield, Pablo Dueñas, Bryan Palmintier, Nadia Panossian, Tomás Gómez, Fernando de Cuadra, Thomas J Overbye, et al. Building and validating a large-scale combined transmission & distribution synthetic electricity system of texas. International Journal of Electrical Power & Energy Systems, 159:110037, 2024. doi: 10.1016/j.ijepes.2024.110037.

Parag Mitra, Vijay Vittal, Brian Keel, and Jeni Mistry. A systematic approach to n-1-1 analysis for power system security assessment. IEEE Power and Energy Technology Systems Journal, 3(2):71–80, 2016. doi: 10.1109/JPETS.2016.2546282.

Rhys Newbury, Jack Collins, Kerry He, Jiahe Pan, Ingmar Posner, David Howard, and Akansel Cosgun. A review of differentiable simulators. IEEE Access, pages 97581–97604, 2024. doi: 10.1109/ACCESS.2024.3425448.

John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable parallel programming with cuda: Is cuda the parallel programming model that application developers have been waiting for? Queue, 6(2):40–53, 2008. doi: 10.1145/1401132.1401152.

Samuel N Okhuegbe, Adedasola A Ademola, and Yilu Liu. A machine learning initializer for newton-raphson ac power flow convergence. In 2024 IEEE Texas Power and Energy Conference (TPEC), pages 1–6. IEEE, 2024. doi: 10.1109/TPEC60005.2024.10472261.

Darren Orf. The power grid is the largest machine in the world, and our nation's greatest engineering achievement, 2023. URL https://www.popularmechanics.com/science/energy/a44067133/how-does-the-power-grid-work/. accessed: 2024-07-29.

Michael Overton. Quadratic convergence of newton's method. Technical report, New York University, 2017. URL https://cs.nyu.edu/~overton/NumericalComputing/newton.pdf.

Max Pargmann, Jan Ebert, Markus Götz, Daniel Maldonado Quinto, Robert Pitz-Paal, and Stefan Kesselheim. Automatic heliostat learning for in situ concentrating solar power plant metrology with differentiable ray tracing. Nature Communications, 15(1):6997, 2024. doi: 10.1038/s41467-024-51019-z.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.

Ivonne Pena, Carlo Brancucci Martinez-Anido, and Bri-Mathias Hodge. An extended ieee 118-bus test system with high renewable penetration. IEEE Transactions on Power Systems, 33(1):281–289, 2017. doi: 10.1109/TPWRS.2017.2695963.

Yingying Qi, Di Shi, and Daniel Tylavsky. Impact of assumptions on dc power flow model accuracy. In 2012 North American Power Symposium (NAPS), pages 1–6. IEEE, 2012. doi: 10.1109/NAPS.2012.6336395.

Sebastian Ruder. An overview of gradient descent optimization algorithms, 2017.

Patrick S Sauter, Christian A Braun, Mathias Kluwe, and Soren Hohmann. Comparison of the holomorphic embedding load flow method with established power flow algorithms and a new hybrid approach. In 2017 Ninth Annual IEEE Green Technologies Conference (GreenTech), pages 203–210. IEEE, 2017. doi: 10.1109/GreenTech.2017.36.

Brian Stott and Ongun Alsac. Fast decoupled load flow. IEEE transactions on power apparatus and systems, pages 859–869, 2007. doi: 10.1109/TPAS.1974.293985.

Brian Stott, Jorge Jardim, and Ongun Alsaç. Dc power flow revisited. IEEE Transactions on Power Systems, 24(3):1290–1300, 2009. doi: 10.1109/TPWRS.2009.2021235.

Leon Thurner, Alexander Scheidler, Florian Schäfer, Jan-Hendrik Menke, Julian Dollichon, Friederike Meier, Steffen Meinecke, and Martin Braun. pandapower - an open-source python tool for convenient modeling, analysis, and optimization of electric power systems. IEEE Transactions on Power Systems, 33(6):6510–6521, 2018. doi: 10.1109/TPWRS.2018.2829021.

William F. Tinney and Clifford E. Hart. Power flow solution by newton's method. IEEE Transactions on Power Apparatus and Systems, PAS-86(11):1449–1460, 1967. doi: 10.1109/TPAS.1967.291823.

Antonio Trias. The holomorphic embedding load flow method. In 2012 IEEE power and energy society general meeting, pages 1–8. IEEE, 2012. doi: 10.1109/PESGM.2012.6344759.

He Zhang. Higher-order automatic differentiation using symbolic differential algebra: Bridging the gap between algorithmic and symbolic differentiation, 2025. URL https://arxiv.org/abs/2506.00796.
Appendix

Appendix A. Baseline

Table A.4 shows the power-flow runtimes of the NR baseline as implemented in LightSim2Grid. Notably, the runtime appears to scale between linearly and quadratically with the grid size, not cubically.

Table A.4: LightSim2Grid times on our device in ms. Compared are the times of power-flows for changing grids with and without recycling (RC), power-flows for fixed grids on time series (TS), and contingency analysis (CA). The runtime scaling appears to be between linear and quadratic. The fastest times are achieved for time series, as the grid (and therefore the admittance matrix and the sparsity structure of the Jacobian) stays the same and can be reused.

Power-flow times using NR in ms
Grid                Size     RC         No RC      TS         CA
case14              14       0.0204     0.0491     0.00790    0.0171
case118             118      0.1180     0.3206     0.0506     0.0746
case_illinois200    200      0.2447     0.5842     0.1032     0.1705
case300             300      0.4501     0.9606     0.2521     0.3341
case1354pegase      1,354    2.3303     4.0754     1.2324     1.5423
case1888rte         1,888    3.6692     5.9012     1.6169     2.059
case2848rte         2,848    5.6715     9.093      2.447      3.1987
case2869pegase      2,869    5.4858     9.443      2.9107     3.42051
case3120sp          3,120    6.371      10.155     2.307      3.5356
case6495rte         6,495    17.6517    25.9042    7.53377    8.6814
case6515rte         6,515    20.3695    28.8332    7.3132     8.77377
case9241pegase      9,241    26.0554    41.0947    12.3748    14.0037

Appendix B. Pseudocode of DPF and NR

Algorithm 1 and Algorithm 2 show pseudocode for our method (DPF) and for the baseline method NR as it is implemented in LightSim2Grid.

Algorithm 1 Pseudocode for our DPF method. Lines 22-26 calculate a loss from the power-balance equation, which is used to update the voltages in lines 27-29 by using an optimizer and scheduler.
1: Hyperparameters
2:     optimizer    optimizer with own hyperparameters
3:     scheduler    scheduler with own hyperparameters
4:     loss    loss function
5:
6: Inputs
7:     $n$    number of active buses
8:     $V \in \mathbb{C}^{n}$    voltages $V = |V| e^{\mathrm{i}\theta}$ with $\theta, |V| \in \mathbb{R}^{n}$
9:     $Y_{bus} \in \mathbb{C}^{n \times n}$    admittance matrix
10:     $S_{bus} \in \mathbb{C}^{n}$    injection vector
11:     $pv$    index list of PV buses
12:     $pq$    index list of PQ buses
13:     $slack$    index of slack bus
14:     $tol$    tolerance for convergence check
15:
16: Start
17: $V = |V| e^{\mathrm{i}\theta} \leftarrow$ ones    ▷ Initialization
18: $|V|_{learnable} = |V|[pq]$
19: $\theta_{learnable} = \theta[pv, pq]$
20:
21: for $i = 1$ until max_iter do
22:     $S_{calc} = P_{calc} + \mathrm{i} Q_{calc} \leftarrow V (Y_{bus} V)^{*}$
23:     $out = [P_{calc}[pv, pq], Q_{calc}[pq]]$    ▷ Forward pass
24:     $target = [P[pv, pq], Q[pq]]$
25:
26:     $loss = MSE(out, target)$
27:     loss.backward()    ▷ Only update learnable parameters
28:     optimizer.step()
29:     scheduler.step()
30:     if $loss < tol$ then
31:         return $|V| e^{\mathrm{i}\theta}$
32:     end if
33: end for
34: return $|V| e^{\mathrm{i}\theta}$

Algorithm 2 NR method for power-flow calculations (LightSim2Grid (Donnot, 2020b)). Every iteration, a linearization is done with the Jacobian (line 22) containing the partial derivatives (lines 20-21). If the power mismatch is too large (lines 13-16, 26-29), the voltage vector is updated by the voltage delta that creates a power delta (under the Jacobian) to balance out the power mismatch (line 23).
1: Inputs
2:     $n$    number of active buses
3:     $V \in \mathbb{C}^{n}$    voltages $V = |V| e^{\mathrm{i}\theta}$ with angles and magnitudes $\theta, |V| \in \mathbb{R}^{n}$
4:     $Y_{bus} \in \mathbb{C}^{n \times n}$    admittance matrix
5:     $S_{bus} \in \mathbb{C}^{n}$    injection vector
6:     $pv$    index list of PV buses
7:     $pq$    index list of PQ buses
8:     $slack$    index of slack bus
9:     $tol$    tolerance for convergence check
10:
11: Start
12: $V = |V| e^{\mathrm{i}\theta} \leftarrow$ DC-power-flow    ▷ Initialization
13: $S_{calc} \leftarrow V (Y_{bus} V)^{*}$
14: $P_{calc}, Q_{calc} \leftarrow \mathrm{Re}(S_{calc}), \mathrm{Im}(S_{calc})$
15: $f(|V|, \theta) \leftarrow [(P_{calc} - P)[pv\;pq], (Q_{calc} - Q)[pq]]^{T}$    ▷ Evaluate power mismatch
16: if $f(|V|, \theta) < tol$ then    ▷ Update local best solution
17:     return $V$
18: end if
19: for $i = 1$ until max_iter do
20:     $\frac{\partial S_{calc,i}}{\partial |V|_{j}} \leftarrow \begin{cases} \frac{V_{i} Y_{bus,ij}^{*} V_{j}^{*}}{|V_{j}|}, & \text{for } i \neq j \\ \frac{V_{i}}{|V_{i}|} I_{i}^{*} + \frac{V_{i} Y_{bus,ij}^{*} V_{j}^{*}}{|V_{j}|}, & \text{for } i = j \end{cases}$    ▷ Partial derivatives, with bus currents $I = Y_{bus} V$
21:     $\frac{\partial S_{calc,i}}{\partial \theta_{j}} \leftarrow \begin{cases} -\mathrm{i} V_{i} Y_{bus,ij}^{*} V_{j}^{*}, & \text{for } i \neq j \\ \mathrm{i} V_{i} I_{i}^{*} - \mathrm{i} V_{i} Y_{bus,ij}^{*} V_{j}^{*}, & \text{for } i = j \end{cases}$
22:     $J_{f} \leftarrow \begin{pmatrix} \frac{\partial P_{calc}}{\partial \theta}[pv\;pq, pv\;pq] & \frac{\partial P_{calc}}{\partial |V|}[pv\;pq, pq] \\ \frac{\partial Q_{calc}}{\partial \theta}[pq, pv\;pq] & \frac{\partial Q_{calc}}{\partial |V|}[pq, pq] \end{pmatrix}$    ▷ Determine Jacobian
23:     $\begin{pmatrix} \theta[pv\;pq] \\ |V|[pq] \end{pmatrix} \leftarrow \begin{pmatrix} \theta[pv\;pq] \\ |V|[pq] \end{pmatrix} - J_{f}^{-1} f(|V|, \theta)$    ▷ Newton step using linear solver
24:     $S_{calc} \leftarrow V (Y_{bus} V)^{*}$
25:     $P_{calc}, Q_{calc} \leftarrow \mathrm{Re}(S_{calc}), \mathrm{Im}(S_{calc})$
26:     $f(|V|, \theta) \leftarrow [(P_{calc} - P)[pv\;pq], (Q_{calc} - Q)[pq]]^{T}$    ▷ Power mismatch
27:     if $f(|V|, \theta) < tol$ then    ▷ Update local best solution
28:         return $V = |V| e^{\mathrm{i}\theta}$
29:     end if
30: end for
31: return $V = |V| e^{\mathrm{i}\theta}$
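To make the difference between the update rules of Algorithm 1 and Algorithm 2 concrete, the following dependency-free Python sketch applies both to a hypothetical two-bus system (one slack bus, one PQ bus). It is an illustration under stated assumptions, not the authors' implementation: the line impedance, the load, the step size, and the function names are invented for this example, both loops use a flat start, and the gradient of the MSE loss is written by hand as $(2/m) J_{f}^{T} f$ with the partial derivatives from lines 20-21 of Algorithm 2, in place of automatic differentiation with an optimizer and scheduler.

```python
import cmath

# Hypothetical two-bus system in per unit (values chosen only for illustration):
# bus 0 is the slack bus fixed at 1.0 with angle 0, bus 1 is a PQ bus with a load.
Y_LINE = 1 / complex(0.01, 0.1)                  # series admittance of the line
Y_BUS = [[Y_LINE, -Y_LINE], [-Y_LINE, Y_LINE]]   # admittance matrix
S_TARGET = complex(-0.5, -0.2)                   # specified injection P + iQ at bus 1
V_SLACK = complex(1.0, 0.0)


def mismatch_and_jacobian(theta, vm):
    """Power mismatch f at bus 1 and its Jacobian w.r.t. (theta, |V|) of bus 1."""
    v = cmath.rect(vm, theta)                                        # |V| e^{i theta}
    i_conj = (Y_BUS[1][0] * V_SLACK + Y_BUS[1][1] * v).conjugate()   # I_1^*
    s_calc = v * i_conj                                              # S = V (Y_bus V)^*
    y_term = v * Y_BUS[1][1].conjugate() * v.conjugate()             # V_1 Y_11^* V_1^*
    ds_dtheta = 1j * v * i_conj - 1j * y_term    # Algorithm 2, line 21, case i = j
    ds_dvm = (v * i_conj + y_term) / vm          # Algorithm 2, line 20, case i = j
    f = [s_calc.real - S_TARGET.real, s_calc.imag - S_TARGET.imag]
    jac = [[ds_dtheta.real, ds_dvm.real],
           [ds_dtheta.imag, ds_dvm.imag]]
    return f, jac


def newton_raphson(tol=1e-10, max_iter=20):
    """Newton steps x <- x - J^{-1} f, with the 2x2 system solved in closed form."""
    theta, vm = 0.0, 1.0                         # flat start (assumption of this sketch)
    for _ in range(max_iter):
        f, j = mismatch_and_jacobian(theta, vm)
        if max(abs(f[0]), abs(f[1])) < tol:
            break
        det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
        theta -= (j[1][1] * f[0] - j[0][1] * f[1]) / det
        vm -= (-j[1][0] * f[0] + j[0][0] * f[1]) / det
    return theta, vm


def gradient_descent(lr=0.005, tol=1e-14, max_iter=1000):
    """Plain gradient descent on the MSE of the power mismatch."""
    theta, vm = 0.0, 1.0                         # flat start, as in Algorithm 1
    for _ in range(max_iter):
        f, j = mismatch_and_jacobian(theta, vm)
        loss = (f[0] ** 2 + f[1] ** 2) / 2       # MSE over m = 2 mismatch entries
        if loss < tol:
            break
        # Gradient of the MSE loss: (2/m) J^T f, which reduces to J^T f for m = 2.
        theta -= lr * (j[0][0] * f[0] + j[1][0] * f[1])
        vm -= lr * (j[0][1] * f[0] + j[1][1] * f[1])
    return theta, vm


if __name__ == "__main__":
    print("NR:", newton_raphson())
    print("GD:", gradient_descent())
```

Both loops should agree on the voltage at bus 1 up to the chosen tolerances. The Newton step requires a linear solve with the Jacobian, whereas the gradient step only requires a product with its transpose, which reverse-mode automatic differentiation evaluates without forming the Jacobian explicitly.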
