LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design
This paper presents an LLM-driven, end-to-end workflow that addresses the lack of automation and intelligence in power system transient stability assessment (TSA). The proposed agentic framework integrates large language models (LLMs) with a professional simulator (ANDES) to automatically generate and filter disturbance scenarios from natural language, and employs an LLM-driven Neural Network Design (LLM-NND) pipeline to autonomously design and optimize TSA models through performance-guided, closed-loop feedback. On the IEEE 39-bus system, the LLM-NND models achieve 93.71% test accuracy on four-class TSA with only 4.78M parameters, while maintaining real-time inference latency (less than 0.95 ms per sample). Compared with a manually designed DenseNet (25.9M parameters, 80.05% accuracy), the proposed approach jointly improves accuracy and efficiency. Ablation studies confirm that the synergy among domain-grounded retrieval, reasoning augmentation, and feedback mechanisms is essential for robust automation. The results demonstrate that LLM agents can reliably accelerate TSA research from scenario generation and data acquisition to model design and interpretation, offering a scalable paradigm that is readily extensible to other power system tasks such as optimal power flow, fault analysis, and market operations.
💡 Research Summary
The paper introduces an end‑to‑end, agentic workflow that leverages large language models (LLMs) to fully automate the transient stability assessment (TSA) pipeline for power systems. Traditional TSA relies heavily on manual scenario creation, parameter tuning, and handcrafted neural network architectures, which become bottlenecks as grids grow larger and renewable penetration increases. To address these challenges, the authors embed LLMs into two core modules: (1) an LLM‑Driven Simulation Controller and (2) an LLM‑Driven Neural Network Designer (LLM‑NND).
The Simulation Controller translates natural‑language instructions into executable ANDES simulator scripts. It combines prompt engineering, Retrieval‑Augmented Generation (RAG), and Chain‑of‑Thought (CoT) prompting. RAG retrieves domain‑specific documents, templates, and parameter limits, ensuring the LLM's output respects physical constraints. CoT prompts enforce step‑by‑step reasoning, allowing the model to handle complex fault descriptions while reducing hallucination. A self‑correction loop monitors simulation errors, feeds the error messages back into the prompt, and automatically regenerates corrected scripts. This eliminates the need for human experts to hand‑craft fault scenarios or debug code.
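The self‑correction loop above can be sketched as a small retry cycle. This is a hedged illustration, not the paper's code: `llm_generate_script` and `run_andes_script` are hypothetical stand‑ins for the LLM call and the ANDES execution step, and the fault‑scenario strings are invented; only the error‑feedback control flow matches the description.

```python
from typing import Optional

def llm_generate_script(instruction: str, error_feedback: Optional[str] = None) -> str:
    """Stand-in for the LLM call that turns a natural-language fault
    description (plus any prior error message) into a simulator script."""
    if error_feedback is None:
        # First attempt: may contain a mistake, e.g. a missing parameter.
        return "apply_fault(bus=4, t_start=1.0)"
    # Retry: the error message is in the prompt, so the LLM can repair it.
    return "apply_fault(bus=4, t_start=1.0, t_clear=1.1)"

def run_andes_script(script: str) -> Optional[str]:
    """Stand-in for executing the script in ANDES; returns an error
    message on failure and None on success."""
    if "t_clear" not in script:
        return "TypeError: apply_fault() missing required argument 't_clear'"
    return None

def generate_with_self_correction(instruction: str, max_retries: int = 3) -> str:
    """Generate a script, run it, and feed any error back into the prompt
    until the simulation succeeds or retries are exhausted."""
    error = None
    for _ in range(max_retries):
        script = llm_generate_script(instruction, error)
        error = run_andes_script(script)
        if error is None:
            return script  # simulation ran cleanly
    raise RuntimeError(f"Could not produce a valid script: {error}")

print(generate_with_self_correction("Three-phase fault at bus 4, cleared after 0.1 s"))
```

In the real workflow the first stand‑in would be a retrieval‑augmented LLM call and the second an actual ANDES run; the point is that the error string, not a human, drives the regeneration.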
The second module, LLM‑NND, treats the LLM as a neural architecture search (NAS) agent. It defines a design space (layer depth, channel width, kernel size, activation, normalization, etc.) and iteratively proposes candidate architectures. Each candidate is trained on the data generated by the Simulation Controller and evaluated on multiple objectives: classification accuracy, parameter count, and inference latency (target < 0.95 ms per sample). Performance feedback is fed back to the LLM, which refines its design proposals in a reinforcement‑learning‑like loop. The process converges on compact yet powerful models without human intervention.
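The feedback loop of LLM‑NND can be sketched in the same hedged style. Here `llm_propose_architecture` and `train_and_eval` are hypothetical stubs (the paper uses a real LLM and real training on simulator data); the design‑space fields and the synthetic metric formulas are invented for illustration. What the sketch does show is the loop structure: candidates are scored on accuracy, parameter count, and latency, and the score history steers the next proposal.

```python
import random

LATENCY_BUDGET_MS = 0.95  # real-time target stated in the paper

def llm_propose_architecture(history):
    """Stand-in for an LLM that reads the (architecture, metrics) history
    and proposes the next candidate from the design space."""
    if history and history[-1][1]["latency_ms"] > LATENCY_BUDGET_MS:
        # Last candidate was too slow: shrink its width.
        width = history[-1][0]["width"] // 2
    else:
        width = random.choice([64, 128, 256])
    return {"depth": random.choice([4, 6, 8]), "width": width}

def train_and_eval(arch):
    """Stand-in for training on simulator-generated data; returns the
    three objectives optimized in the paper (formulas are synthetic)."""
    return {
        "accuracy": 0.9 - 0.01 * abs(arch["depth"] - 6),
        "params_m": arch["depth"] * arch["width"] ** 2 / 1e6,
        "latency_ms": 0.01 * arch["depth"] * arch["width"] / 16,
    }

def search(iterations=10):
    """Closed-loop search: propose, evaluate, feed metrics back, keep the
    most accurate candidate that meets the latency budget."""
    history, best = [], None
    for _ in range(iterations):
        arch = llm_propose_architecture(history)
        metrics = train_and_eval(arch)
        history.append((arch, metrics))
        if metrics["latency_ms"] <= LATENCY_BUDGET_MS and (
                best is None or metrics["accuracy"] > best[1]["accuracy"]):
            best = (arch, metrics)
    return best
```

Replacing the two stubs with a real LLM proposal step and real model training yields the reinforcement‑learning‑like loop the paper describes, in which latency violations and accuracy shortfalls are reported back to the designer in natural language.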
Experimental validation is performed on the IEEE 39‑bus system (four‑class TSA) and the IEEE 118‑bus system. On the 39‑bus case, LLM‑NND discovers a 4.78 M‑parameter network achieving 93.71 % test accuracy and sub‑millisecond inference, outperforming a manually designed DenseNet with 25.9 M parameters that reaches only 80.05 % accuracy at higher latency. On the larger 118‑bus case, the automatically designed model maintains ≈91 % accuracy with ≈1.2 ms inference, demonstrating scalability. Ablation studies show that removing any of the three pillars—domain‑grounded RAG, CoT reasoning, or the feedback mechanism—significantly degrades performance: removing RAG raises script failure rates by 27 %; without CoT, accuracy drops sharply; without feedback, the resulting networks either over‑parameterize or under‑fit, violating the real‑time constraints.
Beyond TSA, the authors argue that the same paradigm can be extended to optimal power flow, fault analysis, and market operation tasks, where natural‑language scenario specification, automated data generation, and LLM‑guided model design would similarly reduce expert workload and accelerate research cycles. Future work is suggested on multimodal LLMs that ingest grid topology graphs and time‑series data, formal verification of LLM‑generated code for safety‑critical applications, and scaling the framework to continental‑scale grids.
In summary, the paper demonstrates that LLMs can act as intelligent agents that not only understand and generate domain‑specific simulation code but also autonomously explore neural architecture spaces, yielding models that are both more accurate and computationally efficient than traditional hand‑crafted designs. This work marks a significant step toward AI‑driven scientific discovery in critical infrastructure domains.