Fine-Tuning LLMs to Generate Economical and Reliable Actions for the Power Grid


Authors: Mohamad Chehade, Hao Zhu

Chandra Family Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
{chehade, haozhu}@utexas.edu

Abstract—Public Safety Power Shutoffs (PSPS) force rapid topology changes that can render standard operating points infeasible, requiring operators to quickly identify corrective transmission switching actions that reduce load shedding while maintaining acceptable voltage behavior. We present a verifiable, multi-stage adaptation pipeline that fine-tunes an instruction-tuned large language model (LLM) to generate open-only corrective switching plans from compact PSPS scenario summaries under an explicit switching budget. First, supervised fine-tuning distills a DC-OPF MILP oracle into a constrained action grammar that enables reliable parsing and feasibility checks. Second, direct preference optimization refines the policy using AC-evaluated preference pairs ranked by a voltage-penalty metric, injecting voltage-awareness beyond DC imitation. Finally, best-of-N selection provides an inference-time addition by choosing the best feasible candidate under the target metric. On IEEE 118-bus PSPS scenarios, fine-tuning substantially improves DC objective values versus zero-shot generation, reduces AC power-flow failure from 50% to single digits, and improves voltage-penalty outcomes on the common-success set. Code and data-generation scripts are released to support reproducibility.

Index Terms—Public Safety Power Shutoff (PSPS), transmission switching, large language models, supervised fine-tuning, direct preference optimization, AC feasibility, voltage regulation

I. INTRODUCTION

Large language models (LLMs) have rapidly transitioned from research prototypes to deployable decision-support tools across diverse domains [1]–[3].
Their ability to transform unstructured descriptions into structured outputs makes them attractive for operational environments where decisions are time-sensitive and consequences are high. Power system control rooms are a compelling setting: operators manage complex contingencies, coordinate actions across many assets, and balance reliability, economics, and compliance under tight time constraints [4]. Unlike traditional decision-support tools that require specialized inputs or rigid interfaces, LLMs enable operators to interact through natural language while producing machine-readable recommendations (e.g., structured action lists) that can be verified before execution [5], [6]. However, foundation LLMs lack domain-specific knowledge of power system physics, operational constraints, and grid safety requirements. Training grid-specific LLMs from scratch is impractical: modern LLMs succeed through pre-training on trillions of tokens spanning diverse domains [7], while grid operations data are orders of magnitude smaller and specialized. A practical alternative is to adapt a strong instruction-tuned model via targeted fine-tuning so that it can (i) read a compact, structured description of a grid scenario and (ii) output actions in a constrained grammar that can be checked for feasibility.

(This work was funded by NSF Grant 2130706 and ARO Grant W911NF2310266.)

[Fig. 1: Multi-stage adaptation pipeline for PSPS corrective switching: Base LLM → SFT (imitating open-only MILP demos from DC-OPF) → DPO (voltage-aware, AC evaluation ranked by V_pen) → policy, with optional best-of-N selection.]

In this work, we study a concrete and operationally motivated task: corrective, open-only transmission switching during Public Safety Power Shutoffs (PSPS), which are corrective de-energization actions used by utilities to reduce wildfire ignition risk during extreme weather conditions [8].
When PSPS forces lines out of service, operators must rapidly determine whether opening additional elements can mitigate overloads, reduce load shedding, and improve operating conditions while respecting switching budgets and operational rules. Computing optimal actions with mixed-integer optimization can be expensive under time pressure, especially when considering nonlinear AC constraints [9]–[11]. Our goal is to amortize this optimization effort into training, then produce high-quality switching recommendations at inference time using structured scenario summaries and a verifiable action grammar.

Figure 1 illustrates the pipeline we adopt. Starting from an instruction-tuned base model, supervised fine-tuning (SFT) trains the LLM to imitate MILP-derived open-only switching decisions under DC constraints. We then apply direct preference optimization (DPO) using ranked responses derived from AC voltage-quality evaluation, producing a voltage-aware policy that more reliably prioritizes actions with fewer voltage violations. This design follows a standard alignment pattern for instruction-tuned LLMs: imitation learning first, followed by preference-based refinement [12], [13]. In our setting, the supervised stage anchors the policy to an optimization oracle, while the preference stage injects AC voltage-awareness that is difficult to encode directly in DC training. The resulting model functions as a candidate-plan generator whose outputs can be parsed, verified, and evaluated with existing grid-analysis tools.

Our contributions are:
• We formulate PSPS-aware open-only switching with switching budgets and corridor structure using a DC-OPF MILP oracle (Section II).
• We design a structured scenario representation and action grammar that enables an instruction-tuned LLM to emit switching plans that are straightforward to parse and verify (Section III).
• We introduce a voltage-aware preference refinement stage based on DPO, using AC-derived voltage-quality preferences to align the model beyond DC imitation (Section III-A).
• We evaluate economic performance, AC feasibility, and voltage quality, including comparisons to a neural baseline and training-curve reporting for reproducibility (Section IV).

Finally, we discuss practical considerations such as feasibility checks, training/inference costs, and deployment constraints (Section IV). We view this as a step toward verifiable, operator-facing LLM assistants that interface with existing grid analysis pipelines rather than replacing them.

II. PSPS SWITCHING PROBLEM

Public Safety Power Shutoffs (PSPS) are preventive and corrective de-energization actions taken by utilities to reduce wildfire ignition risk during extreme weather conditions [14], [15]. When a PSPS event forces a subset of transmission lines out of service, system operators must determine whether additional corrective open-only switching actions can improve reliability and reduce load shedding. We formulate a DC optimal power flow (DC-OPF) model that explicitly incorporates PSPS constraints and pose an open-only decision-making problem in which operators may proactively open a limited number of additional transmission elements to mitigate system stress.

a) Network model and PSPS constraints: Consider a power system with bus set B (|B| = n_b), transmission line set E (|E| = n_ℓ), and generator set G ⊆ B. Each line e has reactance x_e > 0 and rating S^max_e > 0 (MW). Buses have demand P^d_i ≥ 0, generators have capacity P^min_g ≤ P_g ≤ P^max_g and cost c_g ≥ 0 ($/MW), and load shedding P^s_i ≥ 0 is penalized at rate γ ($/MW). A PSPS event is encoded by an availability mask ξ ∈ {0,1}^{n_ℓ}, where ξ_e = 0 forces line e open. Operator decisions z_e ∈ {0,1} determine the effective status g_e = ξ_e z_e.
We use standard DC power flow with bus-branch incidence matrix A and susceptance matrix B_ℓ = diag(1/x_e).

b) Open-only switching with a line budget: Operators may open up to K_ℓ additional PSPS-available lines. The open-only switching problem is:

  min_{P_g, P^s, θ, P_ℓ, z ∈ {0,1}^{n_ℓ}}  Σ_{g∈G} c_g P_g + γ Σ_{i∈B} P^s_i      (1a)
  s.t.  P^min_g ≤ P_g ≤ P^max_g,  ∀g ∈ G,                                        (1b)
        P^s_i ≥ 0,  ∀i ∈ B,                                                      (1c)
        P_ℓ = B_ℓ A^⊤ θ,  θ_r = 0,                                               (1d)
        A P_ℓ = P_g − (P^d − P^s),                                               (1e)
        −S^max_e ξ_e z_e ≤ P_{ℓ,e} ≤ S^max_e ξ_e z_e,  ∀e ∈ E,                   (1f)
        z_e ∈ {0,1},  z_e = 0 if ξ_e = 0,  ∀e ∈ E,                               (1g)
        Σ_{e∈E} (1 − z_e) ξ_e ≤ K_ℓ.                                             (1h)

Constraint (1g) enforces that PSPS-forced outages remain open; (1h) limits operator-induced opens to K_ℓ available lines. This is structurally related to optimal transmission switching [9], [10] but restricted to open-only actions.

c) Corridor-constrained open-only switching: Switching decisions may be constrained to transmission corridors, i.e., geographically grouped lines [15], [16]. Binary variables y_S ∈ {0,1} indicate whether corridor S is activated for switching. Corridor and line decisions are coupled by:

  z_e ≥ 1 − y_S,  ∀S ∈ 𝒮, ∀e ∈ S,                                               (2a)
  Σ_{S∈𝒮} y_S ≤ K_S,  y_S ∈ {0,1},  ∀S ∈ 𝒮.                                     (2b)

When y_S = 0, (2a) prevents operator opens in corridor S; when y_S = 1, line decisions remain free. Constraint (2b) limits activated corridors to K_S.

d) Role in the pipeline: We use the DC-OPF MILP above as an offline oracle to generate labeled switching plans for supervised adaptation. Voltage quality and AC feasibility are evaluated separately using AC power flow in the experimental section, enabling us to train on DC structure while assessing voltage-critical performance.
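For intuition, the open-only problem (1) can be solved on very small instances by exhaustive search over budget-feasible open sets, scoring each candidate topology with a DC-OPF solve. The stdlib-only sketch below illustrates the feasible-action structure of (1g)–(1h); the DC-OPF evaluation is abstracted behind a caller-supplied `dc_opf_cost` callback, and all names are illustrative rather than taken from the released MATLAB/YALMIP implementation.

```python
from itertools import combinations

def budget_feasible_open_sets(xi, K):
    """All open-only decisions within budget K: subsets of PSPS-available
    lines (xi[e] == 1) of size <= K, mirroring (1h). Forced-out lines
    (xi[e] == 0) are never candidates, mirroring (1g)."""
    available = [e for e, on in enumerate(xi) if on == 1]
    for k in range(K + 1):
        for subset in combinations(available, k):
            yield frozenset(subset)

def oracle_open_set(xi, K, dc_opf_cost):
    """Brute-force emulation of the oracle: the budget-feasible open set
    minimizing DC-OPF cost. dc_opf_cost(open_set) should return the
    optimal value of (1) with z fixed accordingly."""
    return min(budget_feasible_open_sets(xi, K), key=dc_opf_cost)

# Toy illustration: 5 lines, line 2 forced out by PSPS, budget K = 2.
# A made-up cost table stands in for actual DC-OPF solves.
xi = [1, 1, 0, 1, 1]
cost = {frozenset(): 10.0, frozenset({0}): 8.0, frozenset({0, 3}): 6.5}
best = oracle_open_set(xi, 2, lambda s: cost.get(s, 12.0))
print(sorted(best))  # lines the emulated oracle opens -> [0, 3]
```

The candidate count grows combinatorially with K_ℓ, which is why the paper relies on a MILP solver for the oracle; the sketch only makes the budget and availability structure concrete.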
III. SUPERVISED FINE-TUNING (SFT) LLMS

Solving the mixed-integer linear program (MILP) in (1) for every PSPS scenario can be computationally expensive, particularly when evaluating large numbers of contingencies or when operators require rapid what-if analysis. Similar scalability challenges are well documented for optimal transmission switching (OTS) formulations, motivating learning-assisted and proxy-based approaches that approximate an optimizer while preserving most of the economic benefit [17], [18]. We therefore train a large language model (LLM) to imitate an optimization oracle, amortizing the cost of offline MILP solves into a single supervised fine-tuning (SFT) phase. After SFT, the model generates candidate switching plans from compact scenario summaries, which are then parsed and verified before evaluation or deployment. This section describes the oracle, the input–output representation, and the SFT protocol.

a) Ground-Truth Oracle: Given PSPS mask ξ and budget K_ℓ, we solve the DC-OPF MILP (1) to obtain the optimal operator-induced opens

  T(ξ) ≜ { e ∈ E : ξ_e = 1, z*_e = 0 }.                                          (3)

For corridor-constrained problems, we solve the variant with (2).

b) Scenario Representation: Each training example corresponds to a PSPS scenario with mask ξ and budgets K_ℓ, K_S. We provide a structured textual summary including case dimensions, the number of PSPS-forced opens, and a corridor breakdown listing member lines with their forced/eligible status and applicable budgets. This abstracts away numerical parameters while preserving topological structure.

c) Action Grammar: The target encodes T(ξ) using: open(Sk:LINE) for corridor-associated lines (e.g., open(S6:135)), open(LINE) for non-corridor lines, and do_nothing if T(ξ) = ∅. We sort actions lexicographically to form a canonical string, e.g., open(S3:21); open(S7:98); open(131).
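The action grammar above is regular, so a single regular expression suffices to parse a plan string into structured actions. A stdlib sketch (function and variable names are illustrative, not from the released code):

```python
import re

# Grammar from Section III: open(Sk:LINE) for corridor lines,
# open(LINE) for non-corridor lines, or the literal do_nothing.
ACTION_RE = re.compile(r"^open\((?:(S\d+):)?(\d+)\)$")

def parse_plan(text):
    """Parse a semicolon-separated action string into a list of
    (corridor, line) tuples; corridor is None for non-corridor lines.
    Raises ValueError on any grammar violation."""
    text = text.strip()
    if text == "do_nothing":
        return []
    actions = []
    for token in (t.strip() for t in text.split(";")):
        m = ACTION_RE.match(token)
        if m is None:
            raise ValueError(f"malformed action: {token!r}")
        actions.append((m.group(1), int(m.group(2))))
    return actions

print(parse_plan("open(S3:21); open(S7:98); open(131)"))
# [('S3', 21), ('S7', 98), (None, 131)]
```

Because any deviation from the grammar raises an error, malformed generations are rejected rather than silently executed, which is what makes the LLM usable as a candidate generator.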
d) Dataset Formatting: Supervised pairs use a chat format: a system prompt defining the task and grammar, a user message with the scenario JSON, and an assistant message with the canonical action string, following standard instruction-tuning practice [12].

e) Fine-Tuning Objective and Protocol: We initialize from an instruction-tuned base LLM and perform supervised fine-tuning on the training partition to learn a policy π_SFT that maps scenario summaries to action strings. The objective is the standard conditional language-modeling loss over the assistant tokens. Let s_k denote the scenario summary and a_k the canonical action string for example k. We optimize parameters ϕ by minimizing

  L_SFT(ϕ) = − Σ_k log p_ϕ(a_k | s_k, system prompt),                            (4)

where the sum ranges over training examples. This mirrors the supervised stage used in instruction-following pipelines prior to preference optimization [12]. The resulting model π_SFT is then used for inference and (optionally) subsequent preference-based refinement.

f) Parsing and Verification: At inference, we parse outputs and verify: (i) grammar validity, (ii) that all opened lines are PSPS-available (ξ_e = 1), and (iii) budget adherence. Invalid outputs are flagged, ensuring the LLM functions as a candidate generator rather than an unverified controller.

A. Improving Voltage Safety via DPO (SFT-Time)

The supervised policy π_SFT imitates a DC open-only oracle, but DC imitation alone does not explicitly optimize voltage quality under nonlinear AC physics. To better align the policy with voltage safety objectives, we introduce a refinement stage based on direct preference optimization (DPO) [13]. DPO is a reinforcement-learning-free objective that trains a policy from pairwise preferences (y⁺, y⁻), encouraging the refined policy to assign higher probability to actions that yield better voltage outcomes under AC evaluation.
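The verification stage in (f) above amounts to a few structural checks on a parsed plan. A stdlib-only sketch (the function name and return convention are illustrative, not from the released code):

```python
def verify_plan(opened_lines, xi, K_lines):
    """Staged checks from Section III(f): every opened line must be
    PSPS-available (xi[e] == 1) and the number of opens must respect
    the budget K_lines. Returns (ok, reason)."""
    if len(opened_lines) != len(set(opened_lines)):
        return False, "duplicate open actions"
    for e in opened_lines:
        if not (0 <= e < len(xi)):
            return False, f"unknown line {e}"
        if xi[e] == 0:
            return False, f"line {e} already forced open by PSPS"
    if len(opened_lines) > K_lines:
        return False, f"budget exceeded: {len(opened_lines)} > {K_lines}"
    return True, "ok"

xi = [1, 1, 0, 1, 1]                     # line 2 is PSPS-forced open
print(verify_plan([0, 3], xi, 3))        # (True, 'ok')
print(verify_plan([2], xi, 3))           # rejected: line 2 unavailable
print(verify_plan([0, 1, 3, 4], xi, 3))  # rejected: over budget
```

Plans failing any check are flagged rather than executed, keeping the model in the candidate-generator role.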
a) Voltage-Penalty Metric: For a candidate plan y, we parse it into topology z(y) and solve AC power flow to obtain bus voltage magnitudes {|V_i(y)|}. We define a deadband v_db around nominal voltage and penalize violations outside [1 − v_db, 1 + v_db]:

  V_pen(y) ≜ κ Σ_{i∈B} max(| |V_i(y)| − 1 | − v_db, 0)^p,                        (5)

where p ∈ {1, 2} selects the aggregation norm and κ > 0 scales the penalty. Non-convergent AC solutions receive a large constant penalty V_fail.

b) Preference Pair Construction: For each scenario x, we sample N_pref candidates from π_SFT(· | x), discard malformed or budget-violating plans, evaluate V_pen(y) via AC power flow, and select (y⁺, y⁻) pairs where

  V_pen(y⁻) − V_pen(y⁺) ≥ Δ_pref,                                               (6)

yielding a preference dataset D_volt = {(x_i, y⁺_i, y⁻_i)}_{i=1}^{M} in which y⁺ is preferred for voltage quality.

c) DPO Objective: Let π_ϕ denote the trainable policy initialized from π_SFT, with reference π_ref = π_SFT. For a triple (x, y⁺, y⁻), define

  Δ_ϕ(x, y⁺, y⁻) ≜ log π_ϕ(y⁺ | x) − log π_ϕ(y⁻ | x),                           (7)

and analogously Δ_ref. The DPO loss is

  L_DPO(ϕ) = − Σ_{(x,y⁺,y⁻)∈D_volt} log σ(β_DPO (Δ_ϕ − Δ_ref)),                  (8)

where β_DPO > 0 controls preference strength [13]. Minimizing this loss produces π_DPO, which increases the likelihood of low-penalty plans while remaining anchored to the SFT reference.

d) Scope and Practicality: Preference construction uses AC power flow offline to label candidates with voltage-quality information. At inference time, the refined model can be used either in single-shot mode or combined with best-of-N reranking (Section III-B), where V_pen can serve as a selection criterion when AC evaluation is available.

B. Improving Voltage Safety via Best-of-N (Inference-Time)

Even after supervised fine-tuning and preference alignment, a single stochastic decode can produce a suboptimal or malformed plan.
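The voltage-penalty metric (5) can double as an inference-time selection score over verified candidates. A minimal stdlib sketch of computing V_pen and keeping the best surviving candidate; the helper names and toy voltage profiles are illustrative, not from the released code:

```python
def voltage_penalty(vmags, v_db=0.0, p=1, kappa=1.0, v_fail=1e3):
    """Voltage-penalty metric (5): kappa * sum_i max(||V_i| - 1| - v_db, 0)^p.
    vmags is None when AC power flow failed to converge, triggering V_fail."""
    if vmags is None:
        return v_fail
    return kappa * sum(max(abs(v - 1.0) - v_db, 0.0) ** p for v in vmags)

def best_of_n(candidates, verify, score):
    """Keep candidates passing verification and return the minimizer of
    the score; fall back to doing nothing if none survive."""
    valid = [y for y in candidates if verify(y)]
    if not valid:
        return "do_nothing"
    return min(valid, key=score)

# Toy illustration with made-up voltage profiles per candidate plan.
profiles = {
    "open(S3:21)":  [1.02, 0.97, 1.01],  # mild deviations
    "open(131)":    [1.08, 0.90, 1.00],  # larger deviations
    "open(S9:999)": None,                # AC power flow diverged
}
score = lambda y: voltage_penalty(profiles[y], v_db=0.02)
pick = best_of_n(list(profiles), lambda y: profiles[y] is not None, score)
print(pick)  # open(S3:21)
```

With v_db = 0, p = 1, κ = 1 (the evaluation settings used later in Section IV), the metric reduces to the summed absolute deviation of bus voltages from nominal.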
We therefore use best-of-N inference as a lightweight, training-free mechanism to improve solution quality by sampling multiple candidate plans and selecting the best under a task metric. Best-of-N is widely used in reasoning and structured prediction tasks to exploit sampling diversity, often with strong gains at moderate N [19].

Given a scenario summary x and a policy π (e.g., π_SFT or π_DPO), we draw N independent candidates:

  y^(j) ∼ π(· | x),  j = 1, …, N.                                                (9)

Each candidate is verified through staged checks: grammar parsing, PSPS/budget constraint validation, DC feasibility evaluation, and optional AC power flow for voltage scoring. Let Y_valid(x) denote the candidates passing verification. If this set is empty, we fall back to a safe default (e.g., doing nothing). We then select the plan minimizing a scalar score:

  ŷ(x) ∈ argmin_{y ∈ Y_valid(x)} Score(x, y).                                    (10)

Primary choices are: (i) the DC economic score J_DC(x, y), or (ii) the AC voltage penalty V_pen(y) from (5). When both matter, we use the scalarization Score(x, y) = J_DC(x, y) + λ V_pen(y), with λ ≥ 0 set by operational priorities.

Best-of-N increases compute linearly in N. We use moderate N with early stopping when target scores are met. Sampling is embarrassingly parallel, and N can be adjusted to trade off compute for quality. We quantify costs and inference times in Section IV.

IV. EXPERIMENTAL RESULTS

Experiments use the IEEE 118-bus test system from MATPOWER [20] (n_b = 118, n_ℓ = 186, n_g = 54). We define |𝒮| = 9 transmission corridors by grouping geographically proximate lines (8–20 lines per corridor). PSPS events are simulated by selecting corridor lines to force open, producing an availability mask ξ for each scenario. Unless stated otherwise, operators may open at most K_ℓ = 3 additional PSPS-available lines.

a) Datasets: For SFT, we use 200 PSPS scenarios split between training and held-out testing.
For DPO, we construct 440 preference pairs (x, y⁺, y⁻) by sampling candidates from π_SFT and ranking them by the AC voltage penalty V_pen (Section III-A). Data-generation scripts and splits are in the released codebase.

b) Implementation and Models: Power flow and optimization use MATLAB R2024b with YALMIP [21]. We adapt an instruction-tuned LLM, ft:gpt-4.1-mini-2025-04-14, via the OpenAI fine-tuning API. SFT trains for 3 epochs (batch size 1, LR multiplier 2) on 453,384 training tokens. DPO initializes from π_SFT and trains for 2 epochs (batch size 8) with β_DPO = 0.1 on 1,611,736 tokens. For the evaluation of voltage penalties in (5), we use v_db = 0, p = 1, κ = 1. Our pipeline is backend-agnostic and can be applied to any instruction-tuned LLM, whether accessed via API or locally hosted.

c) Compared Policies: We compare: (i) the zero-shot base LLM, (ii) π_SFT, (iii) π_DPO, and (iv) a neural network baseline: a fully connected MLP with one hidden layer of width 512 (ReLU), trained on the same supervised dataset to predict corrective opens under identical budget and feasibility constraints.

d) Code Release: Full code is available on GitHub: MFHChehade/LLM-Grid-Actions.

A. Training Curves for the Fine-Tuning Process

Figures 2(a–b) show that the SFT process is well-behaved: the log loss drops sharply early and then decreases gradually, while token accuracy rapidly increases and stabilizes. This pattern reflects the model first learning the output grammar and then refining scenario-to-action mappings. Figures 2(c–d) show stable DPO optimization with decreasing loss and preference error rate, indicating successful separation of preferred vs. dispreferred plans. No divergence occurs under the chosen β_DPO and dataset size.

[Fig. 2: Training curves from the fine-tuning jobs (SFT and DPO) showing convergence: (a) SFT log loss, (b) SFT token accuracy, (c) DPO loss, (d) DPO error rate.]

B. DC Objective

Figure 3 reports box plots of the J_DC distributions. Both SFT and DPO shift downward relative to zero-shot, distilling the oracle signal into lower-cost decisions. The SFT and DPO medians are close, showing that voltage-aware preference refinement preserves DC performance. The NN baseline achieves a low median J_DC but exhibits wider dispersion and a heavier upper tail, indicating occasional high-cost decisions.

[Fig. 3: Distribution of the DC objective J_DC across all four compared policies (zero-shot, SFT, DPO, and NN).]

C. AC Feasibility and Voltage Quality

Figure 4 shows that the zero-shot policy fails AC power flow in half of the scenarios (50.0%), while SFT and DPO reduce failures to a small fraction (6.7% each). This indicates that DC oracle imitation plus the constrained grammar substantially improves physical plausibility. The NN baseline achieves the lowest failure rate (3.3%).

[Fig. 4: AC power-flow failure rate across compared policies. Fine-tuned policies drastically reduce AC failures relative to the zero-shot baseline.]

We evaluate V_pen on the common-success set (Figure 5), i.e., scenarios where SFT, DPO, and NN all converge. DPO achieves a lower median V_pen than SFT and NN, consistent with preference refinement improving voltage outcomes beyond DC imitation. However, the upper tail persists across policies, suggesting that some scenarios remain challenging for voltage regulation and motivating richer preference data as future work.

[Fig. 5: Voltage penalty V_pen distribution on the common-success set, i.e., scenarios where all compared policies (SFT, DPO, NN) achieved AC convergence, ensuring an apples-to-apples comparison of voltage quality.]
V. CONCLUSION

We developed a verifiable fine-tuning pipeline that adapts a foundation LLM into a corrective switching assistant for PSPS events. Supervised fine-tuning distills MILP-derived, DC-optimal open-only actions into a constrained, parseable grammar that enables systematic feasibility checks. Direct preference optimization then injects AC voltage-awareness by learning from preference pairs ranked using a voltage-penalty metric, improving voltage quality among AC-feasible cases. Best-of-N selection complements training by trading inference compute for improved candidate quality. Empirically, on IEEE 118-bus PSPS scenarios, the fine-tuned policies outperform zero-shot generation in economic objective, dramatically reduce AC infeasibility, and yield improved voltage-penalty distributions on the common-success set, while remaining compatible with standard power-flow verification. Future work will investigate multi-task fine-tuning of a single foundation model so it can support diverse power-grid applications under a unified, verifiable decision framework.

REFERENCES

[1] A. Mastropaolo, L. Pascarella, E. Guglielmi, M. Ciniselli, S. Scalabrino, R. Oliveto, and G. Bavota, "On the robustness of code generation techniques: An empirical study on GitHub Copilot," in IEEE/ACM International Conference on Software Engineering (ICSE), 2023.
[2] Y. Zhang, S. A. Khan, A. Mahmud, H. Yang, A. Lavin, M. Levin, J. Frey, J. Dunnmon, J. Evans, A. Bundy, S. Džeroski, J. Tegnér, and H. Zenil, "Exploring the role of large language models in the scientific method: From hypothesis to discovery," npj Artificial Intelligence, vol. 1, p. 14, 2025.
[3] J. Lai, X. Li, Z. Wang, Y. Zhu, Y. Li, and S. Wang, "Large language models in law: A survey," AI Open, vol. 6, pp. 181–196, 2024.
[4] C.-J. Lin, M. M. Robertson, and J. D. Lee, "Analyzing the staffing and workload in the main control room," Safety Science, vol. 57, pp. 161–168, 2013.
[5] S. Kalami et al., "Powering the grid with language: How LLMs are transforming energy systems," Medium (online article), 2025, accessed 2025.
[6] M. Sanguinetti, L. Atzori, R. Girau, and M. Marras, "Conversational agents for energy awareness and efficiency: A survey," Electronics, vol. 13, no. 2, p. 401, 2024.
[7] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, "Scaling laws for neural language models," arXiv preprint, 2020.
[8] Pacific Gas and Electric Company, "Public safety power shutoffs (PSPS) fact sheet," 2024. [Online]. Available: https://www.pge.com/
[9] E. B. Fisher, R. P. O'Neill, and M. C. Ferris, "Optimal transmission switching," IEEE Transactions on Power Systems, vol. 23, no. 3, pp. 1346–1355, 2008.
[10] K. W. Hedman, R. P. O'Neill, E. B. Fisher, and S. S. Oren, "Optimal transmission switching with contingency analysis," IEEE Transactions on Power Systems, vol. 24, no. 3, pp. 1577–1586, 2009.
[11] E. Haag, N. Rhodes, and L. Roald, "Long solution times or low solution quality: On trade-offs in choosing a power flow formulation for the optimal power shutoff problem," Electric Power Systems Research, vol. 234, p. 110713, 2024.
[12] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., "Training language models to follow instructions with human feedback," in Advances in Neural Information Processing Systems, 2022, also known as the InstructGPT paper.
[13] R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, and C. Finn, "Direct preference optimization: Your language model is secretly a reward model," in International Conference on Learning Representations (ICLR), 2024.
[14] N. Rhodes, L. Ntaimo, and L. Roald, "Balancing wildfire risk and power outages through optimized power shut-offs," IEEE Transactions on Power Systems, vol. 36, no. 4, pp. 3118–3128, 2021.
[15] A. Kody, R. Piansky, and D. K. Molzahn, "Optimizing transmission infrastructure investments to support line de-energization for mitigating wildfire ignition risk," in Proceedings of the 11th Bulk Power Systems Dynamics and Control Symposium (IREP), 2022.
[16] A. Moreira, F. Pianco, B. Fanzeres, A. Street, R. Jiang, C. Zhao, and M. Heleno, "Distribution system operation amidst wildfire-prone climate conditions under decision-dependent line availability uncertainty," IEEE Transactions on Power Systems, 2024.
[17] S. Pineda, J. M. Morales, and A. Jiménez-Cordero, "Learning-assisted optimization for transmission switching," TOP, vol. 32, pp. 489–516, 2024.
[18] A.-A. B. Bugaje, J. L. Cremer, and G. Strbac, "Real-time transmission switching with neural networks," IET Generation, Transmission & Distribution, vol. 17, no. 3, pp. 696–705, 2023.
[19] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, "Self-consistency improves chain-of-thought reasoning in language models," in International Conference on Learning Representations (ICLR), 2023.
[20] R. D. Zimmerman, C. E. Murillo-Sánchez, and R. J. Thomas, "MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education," IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12–19, 2011. [Online]. Available: https://matpower.org/
[21] J. Löfberg, "YALMIP: A toolbox for modeling and optimization in MATLAB," in Proceedings of the IEEE International Symposium on Computer Aided Control System Design (CACSD), Taipei, Taiwan, 2004, pp. 284–289.
