Agentic AI for Scalable and Robust Optical Systems Control
We present AgentOptics, an agentic AI framework for high-fidelity, autonomous optical system control built on the Model Context Protocol (MCP). AgentOptics interprets natural language tasks and executes protocol-compliant actions on heterogeneous opt…
Authors: Zehao Wang, Mingzhe Han, Wei Cheng
Zehao Wang*, Mingzhe Han*, Wei Cheng, Yue-Kai Huang, Philip Ji, Denton Wu, Mahdi Safari, Flemming Holtorf, Kenaish AlQubaisi, Norbert M. Linke, Danyang Zhuo, Yiran Chen, Ting Wang, Dirk Englund, and Tingjun Chen

Abstract: We present AgentOptics, an agentic AI framework for high-fidelity, autonomous optical system control built upon the model context protocol (MCP). AgentOptics interprets natural language tasks and executes protocol-compliant actions on heterogeneous optical devices through a structured tool abstraction layer. We implement 64 standardized MCP tools spanning eight representative optical devices and construct a comprehensive 410-task benchmark to evaluate the performance of AgentOptics across request understanding, role-dependent responses, multi-step coordination, robustness to linguistic variation, and error-handling capability. We evaluate two deployment configurations, integrating either commercial online large language models (LLMs) or locally hosted open-source LLMs, and compare against LLM-based code generation baselines. Experimental results demonstrate that AgentOptics achieves 87.7%–99.0% average task success rates, significantly outperforming code generation approaches, which reach success rates of at most 50%. We further validate the broader applicability of AgentOptics through five representative case studies that extend beyond accurate device control to enable system-level orchestration and monitoring, as well as closed-loop optimization.
These case studies include dense wavelength-division multiplexing (DWDM) link provisioning and coordinated performance monitoring of coherent 400 GbE and analog radio-over-fiber (ARoF) channels, autonomous characterization and bias optimization of a wideband ARoF link carrying 5G fronthaul traffic, multi-span channel provisioning and signal launch power optimization, closed-loop fiber link polarization stabilization, and distributed acoustic sensing (DAS)-based fiber monitoring with LLM-assisted event interpretation and detection. These results demonstrate that AgentOptics provides a scalable and robust paradigm for autonomous control and orchestration of heterogeneous optical devices and systems.

Index Terms: Agentic AI, optical networks, networked system control, closed-loop automation, large language models

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

This work was supported in part by NSF under grants EEC-1941583, CNS-2112562, OIA-2134891, CNS-2211944, PHY-2325080, CNS-2330333, CNS-2443137, CNS-2450567, and OAC-2503010, and ARO under grant W911NF2510241. We also acknowledge funding from Duke University under the Beyond-the-Horizon and DST-Launch initiatives.

*These authors contributed equally to this work.

Z. Wang, M. Han, W. Cheng, Y. Chen, and T. Chen are with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA (email: {zehao.w, mingzhe.han, wei.cheng, yiran.chen, tingjun.chen}@duke.edu).
D. Wu is with the Duke Quantum Center and Department of Physics, Duke University, Durham, NC 27708, USA (email: denton.wu@duke.edu).
D. Zhuo is with the Department of Computer Science, Duke University, Durham, NC 27708, USA (email: danyang@duke.edu).
Y.-K. Huang, P. Ji, and T. Wang are with NEC Laboratories America, Princeton, NJ 08540, USA (email: {kai, pji, ting}@nec-labs.com).
M. Safari and F. Holtorf are with Axiomatic AI, Cambridge, MA 02139, USA (email: {mahdi, flemming}@axiomatic-ai.com).
N. Linke is with the Joint Quantum Institute, Department of Physics, and the National Quantum Laboratory (QLab), University of Maryland, College Park, MD 20742, USA, and with the Duke Quantum Center and Department of Physics, Duke University, Durham, NC 27708, USA (email: linke@umd.edu).
D. Englund and K. AlQubaisi are with the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA (email: {englund, alalif}@mit.edu).

I. INTRODUCTION

Optical networks form the backbone of modern Internet infrastructure, interconnecting data centers, metro and long-haul transport systems, wireless front/backhaul, and emerging quantum networks [1]–[8]. As optical systems scale in both heterogeneity and performance, incorporating reconfigurable optical add-drop multiplexers (ROADMs), coherent pluggable transceivers, radio-over-fiber (RoF) links, as well as fiber sensing and quantum photonic hardware, the operational complexity of device configuration, monitoring, and optimization increases significantly. Achieving high fidelity, reliability, and efficiency in these systems requires coordinated control across heterogeneous devices, accommodation of vendor-specific management interfaces, and closed-loop telemetry and adaptation mechanisms.

Software-defined networking (SDN) has been widely adopted in optical networks to decouple the control plane from the data transmission plane, enabling centralized control, programmability, and automated provisioning of optical resources [5], [9]–[11].
By abstracting network control, SDN improves operational efficiency and flexibility in managing complex optical infrastructures. Initiatives such as OpenROADM [12] also aim to address this challenge by defining common data models and standardized control interfaces across multi-vendor optical transport equipment, improving system interoperability and manageability.

However, existing SDN solutions face limitations in multi-vendor environments due to inconsistent support for standardized interfaces and vendor-specific extensions. In addition, limited abstraction of physical-layer behaviors and device capabilities increases system complexity and poses challenges for operators seeking scalable and intuitive network control. In current practice, SDN low-level control layers often rely on human-written scripts derived from vendor manuals or implemented through software development kits (SDKs) and command-line interface (CLI) tools, as illustrated in Fig. 1(a). Developers and users must translate high-level experimental goals (e.g., “get optical power spectrum”) into explicit sequences of function calls or protocol commands (e.g., via NETCONF/SSH or vendor-provided Python APIs), manually specify execution order, establish device connections, validate parameter constraints, and parse raw device responses. While this workflow offers precise control, it requires substantial engineering work and device-specific expertise, resulting in significant onboarding effort and limited portability across heterogeneous equipment.

[Figure 1 panels: (a) user code libraries calling device-specific interfaces, e.g., cfp2_set_frequency() over a customized CLI (SSH over Ethernet) for the 400G CFP2-DCO tunable TRX, get_osa_spectrum_measurement() via a customized API (PyApex via Ethernet) for the OSA, and get_edfa_info() via NETCONF (SSH via Ethernet) for the ROADM with EDFAs and WSSs; (b) CodeGen: request, semantic understanding and code generation from device manuals or code libraries, LLM-generated code, device control; (c) AgentOptics: request (e.g., “What’s the EDFA status?”), host (IDE, app) with MCP client and online or local LLM, individual MCP servers with tools, tool selection, API invocation, returned values.]

Fig. 1: Traditional optical device control using ROADM, 400 GbE CFP2-DCO, and OSA as examples: (a) Traditional control requires device-specific manuals, custom scripts, and protocol handling. (b) LLM-based control interprets natural-language prompts to generate control code, reducing manual scripting. (c) The proposed AgentOptics framework standardizes control via a unified tool layer, where the MCP client maps prompts to device APIs through tool selection, enabling LLM-based reasoning over tool outputs for more autonomous control.

Recent advances in large language models (LLMs) have enabled intuitive, natural language-based interaction with complex systems, transforming how users control, configure, and automate technical workflows [13]. The primary advantage of LLM-based agents is not merely improved natural language understanding, but the ability to translate high-level design objectives into structured, multi-step execution workflows, augmented with reasoning and decision-making capabilities. However, enabling reliable interaction between LLMs and heterogeneous technical systems requires more than improved reasoning capability: it demands a standardized and structured interface for tool discovery, invocation, and response handling. The model context protocol (MCP) [14] addresses this need by defining a formal client-server architecture that connects LLM hosts to external tools and services through well-defined schemas and execution semantics.
By decoupling reasoning from execution, MCP provides a unifying protocol for intelligent, model-driven interaction with diverse and distributed technical environments.

LLM-based agents have already been widely adopted in domains such as software engineering [15] (e.g., repository management, issue tracking), personal productivity [16] (e.g., calendar scheduling, daily task automation), and data analytics [17] (e.g., structured knowledge base querying). More recently, tool-augmented or MCP-enabled LLM systems have been applied to a broader range of applications, including communication and edge computing systems [18], Internet-of-Things networks [19], electronic design automation [20], and photonic integrated circuit design [21]. These developments demonstrate the growing role of LLM agents as orchestration layers that bridge natural language intent and structured system interfaces.

Within networking and optical systems, LLM-based agents have been explored for monitoring, diagnosis, control, and performance optimization [22]–[25]. In most approaches involving device control, the agent translates user requests, often combined with device manuals, schema definitions, or reference code, into executable control scripts or structured SDN API calls. This workflow, illustrated in Fig. 1(b), relies on the LLM to synthesize control logic that directly interfaces with device APIs or controller northbound interfaces. While such code-generation or API-synthesis approaches reduce manual scripting effort, they remain tightly coupled to textual reasoning and prompt conditioning. At the level of individual hardware components, significant challenges persist, particularly in ensuring high-fidelity tool invocation, strict parameter validation, and robust handling of dynamic or ambiguous user inputs across heterogeneous devices.
These limitations motivate the need for a more structured abstraction layer between LLM reasoning and physical device execution.

In this paper, we present AgentOptics for autonomous, scalable, and high-fidelity optical device control built upon MCP. As shown in Fig. 1(c), AgentOptics introduces a standardized and structured abstraction layer that leverages LLM-based reasoning to translate user natural language inputs into protocol-compliant operations, invoking corresponding MCP tools across heterogeneous optical devices through validated API calls. We implement 64 standardized MCP tools spanning eight representative optical devices (see Table I), encapsulating common device operations as deterministic, executable primitives. By exposing device operations through structured MCP-based tool schemas and leveraging LLMs, AgentOptics enables dynamic workflow orchestration without requiring task-specific code generation.

To systematically validate and evaluate the performance of AgentOptics, we construct a benchmark consisting of 410 tasks executed on real hardware and systems, covering single-, dual-, and triple-action invocations across multiple devices. The benchmark also includes five representative task variants (paraphrasing, non-sequitur, error, role, and chain), designed to emulate realistic and diverse user inputs and to assess the robustness and fidelity of AgentOptics under dynamic interaction scenarios. We implement AgentOptics using five representative commercial online LLMs, including GPT, Claude, and DeepSeek, as well as three locally deployed open-source models of varying parameter sizes.
For comparison, we establish an LLM-based code generation (CodeGen) baseline with two variants: (i) online LLM-based optical device control via direct code generation, conditioned on either device manuals or reference control code, and (ii) a locally deployed LLM fine-tuned on optical device control code using low-rank adaptation (LoRA).

Overall, AgentOptics achieves an average success rate of 99.0% with online LLMs and 87.7% with locally deployed models across 410 benchmark tasks. In contrast, the CodeGen baseline attains average success rates of up to only 50.0%. AgentOptics enables practical natural language control of optical devices, achieving 98.1%–99.8% success rates with online LLMs at a token cost ranging from $0.004 to $0.15 per task, while locally deployed models achieve 87.1%–88.8% success rates at near-zero cost per task.

We further demonstrate the capabilities of AgentOptics through five representative case studies: (i) In a ROADM-based dense wavelength-division multiplexing (DWDM) network, AgentOptics provisions a coherent 400 GbE signal alongside an analog radio-over-fiber (ARoF) signal and performs coordinated multi-device monitoring of link performance metrics, including optical signal-to-noise ratio (OSNR) and error vector magnitude (EVM). (ii) For a wideband ARoF link carrying 5G fronthaul traffic, AgentOptics autonomously characterizes system performance and optimizes the ARoF transmitter bias voltage to enhance wireless transmission quality. (iii) In a two-span link with co-propagating comb channels, AgentOptics provisions additional 400 GbE channels and autonomously adjusts launch power to minimize the pre-forward error correction (pre-FEC) bit error rate (BER). (iv) For polarization-sensitive links, AgentOptics enables closed-loop polarization stabilization through orchestrated measurement and actuation, maintaining convergence despite intentional fiber perturbations.
(v) For distributed acoustic sensing (DAS)-based fiber sensing systems, AgentOptics automates sensing operations, data acquisition, and LLM reasoning to identify potential fiber cut events. The AgentOptics implementation and benchmark are open-sourced at [26].

This paper is organized as follows. We review the background and related work in Section II, and describe the proposed system architecture and implementation in Section III. Sections IV and V present the benchmarking methodology and experimental results. We demonstrate practical capabilities of AgentOptics through five case studies in Section VI, and conclude in Section VII.

II. RELATED WORK

A. Agentic AI Frameworks and Applications

Unlike traditional AI systems that generate single-shot text responses, agentic AI extends LLMs to perform goal setting, multi-step planning, external tool interaction, and feedback-driven decision-making. A defining feature of such agents is their ability to act on external systems, typically through structured tool function calling. There are multiple methods through which an LLM may invoke a tool function: the tool name and definition may be (i) implicitly acquired during pre-training [27], which requires massive training data; (ii) provided as part of the input prompt and executed via an external controller [28], where the context length grows linearly with tool count; (iii) accessed through a specific protocol such as MCP [14], which provides a standardized schema but adds protocol overhead; or (iv) implemented via program-aided language (PAL) models [29], where the model directly generates executable control code, which offers flexibility but lacks safety validation.

These methods enable LLMs to interact with a wide range of real-world applications. For example, HuggingGPT [30] is an early system that uses an LLM as a controller to route user requests to specialized expert models and aggregate their outputs into a comprehensive response.
SWE-agent [15] demonstrates a repository-level automation agent for software engineering. IoT-MCP [19] bridges LLMs and heterogeneous devices for IoT system development. In scientific reasoning and verification, ax-Prover [31] shows the agent’s capability in theorem proving for mathematics and quantum physics. Similarly, Physics Supernova [32] demonstrates near-gold-medal performance on International Physics Olympiad problems. Seed-Prover [33] reaches undergraduate- to PhD-level mathematics capability. In addition, a multi-agent framework achieves single-device design [21]. In the broader networking area, agentic pipelines have been explored for intent-based infrastructure and service orchestration [34], as well as wireless and O-RAN management [35].

B. Agentic AI in Optical Network Monitoring and Control

In optical networks, recent LLM-based systems have begun to couple language-guided decision making with operational interfaces, enabling workflows that interpret telemetry and logs, coordinate control-plane actions, and verify outcomes through feedback from the network and devices.

Agentic optical network diagnosis and monitoring. Several studies have applied LLM agents to optical network diagnosis and monitoring tasks, primarily focusing on analytical reasoning and decision support. In [36], a GPT-4-powered agent was proposed to support autonomous optical network management, such as quality of transmission (QoT) estimation, performance analysis, optimization, and calibration. AlarmGPT [22] is a LangChain-based tool-augmented workflow that automates alarm interpretation, compression, prioritization, and diagnosis in optical transport networks. The work in [23] presented an instruction-tuned LLM for field-collected optical network log parsing, anomaly detection and classification, and report generation.

Agentic optical network control. Recent works have also explored LLM-based automation for optical network control [37].
These approaches typically employ external grammars to convert natural language outputs into valid executable instructions, incorporate device API descriptions through prompt engineering, or fine-tune models to directly generate structured function calls. For example, [38] proposed an LLM-driven pipeline that leverages formal grammars to constrain the LLM output to valid JSON-formatted device control instructions for SDN configuration. Another approach to instruction generation is embedding device API specifications directly within prompts, and it was demonstrated in [25] that amplifier gain optimization can be performed via SDN API in prompts. Similarly, AutoLight, presented in [24], is a multi-agent framework for distributed AI training using optical communications API as LLM input reference. Finally, smaller LLMs fine-tuned on network-specific control instructions have been used to directly generate executable commands [39].

However, existing LLM-based autonomous device and network control methods introduce three major limitations. First, these approaches assume the existence of a mature SDN infrastructure with an external instruction-formatted grammar. Changes to the SDN infrastructure typically require reconstruction or substantial modification of the grammar and LLM-related control mechanisms. Second, in large-scale networks with multi-vendor devices, the number of tools and associated function descriptions grows significantly. This results in lengthy prompts containing extensive tool specifications for each function call or agent invocation, leading to increased token consumption and higher operational costs. Finally, the fine-tuning process itself presents additional challenges. Each SDN adaptation requires a dedicated dataset of user requests paired with appropriate SDN tool calls. Consequently, whenever new devices or vendors are added, further adaptation and re-training of the LLM are necessary.
In addition, fine-tuning often leads to overfitting, as demonstrated in our manuscript by the baseline of local LLM-based code generation. The fine-tuned model performs well when user inputs closely match the distribution of the request-tool training dataset; however, when users paraphrase their requests, the task execution success rate declines significantly. This sensitivity to linguistic variation limits the robustness and practical applicability of such methods.

To address these limitations, AgentOptics adopts a protocol-centric design that fundamentally separates language reasoning from device execution. Rather than relying on handcrafted grammars or embedding detailed tool specifications directly into the prompts, which become unscalable in evolving, multi-vendor infrastructures, AgentOptics introduces a structured protocol-level interface that standardizes tool invocation independent of natural language phrasing. This decoupling eliminates the need for continual grammar updates as devices or APIs change. Moreover, by abstracting execution into a protocol-governed layer instead of fine-tuning on request-tool pairs, AgentOptics better preserves the native reasoning capabilities of LLMs and enables reliable closed-loop automation across heterogeneous devices.

III. AGENTOPTICS: DESIGN AND IMPLEMENTATION

In this section, we present the design and implementation of AgentOptics for optical device control built on MCP, comprising 64 MCP tools across eight representative optical devices and supporting both cloud-hosted commercial and locally deployed open-source LLMs.

A. MCP-based Agentic System Design

MCP is an open and standardized interoperability protocol designed to enable structured communication between LLM-based applications and external data sources, tools, and services.
It defines a formal client-server architecture in which the MCP client typically resides on the user side, while MCP servers are deployed on the device side, allowing LLM hosts to systematically discover, access, and utilize contextual resources exposed by devices.

In AgentOptics, as illustrated in Fig. 1(c), a user initiates interaction by issuing a natural language task (e.g., querying the status of an EDFA), which is received by an MCP client embedded within the host application. The client forwards the task to the LLM, which interprets the user intent using domain knowledge and selects the relevant MCP server(s) based on their published server descriptions. Subsequently, the client retrieves the available tool descriptions and function definitions from the selected MCP server(s) and supplies them to the LLM, which determines the most suitable tool by evaluating semantic similarity between the user task and tool metadata. The selected tool is then executed by the MCP server, invoking device-specific APIs, monitoring task completion, and returning the execution results to the MCP client. Finally, the MCP client relays the results back to the LLM, which processes the output and generates a human-readable natural language response for the user. Note that the LLM interacting with the MCP client can be either an online commercial model (e.g., GPT-5 or Claude Sonnet 4.5) or a locally deployed open-source model (e.g., Qwen-14B) with fine-tuning if needed, depending on considerations such as performance, cost, latency, and privacy, which are evaluated in Sections IV and V.

This MCP-based design offers several advantages for agentic AI-based optical device control compared with existing approaches [25], [39].
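The server-side half of the request flow described above (tool listing, argument validation, execution, structured result) can be illustrated with a minimal, standard-library-only sketch. It is not the MCP SDK and not the AgentOptics implementation: the `Tool` and `ToolServer` classes, the `set_edfa_gain` tool, the 0–25 dB gain range, and the simulated EDFA state are all hypothetical stand-ins, and only the tool name `get_edfa_info` mirrors the example call shown in Fig. 1. The LLM-driven tool selection step is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Tool:
    """One device operation exposed to the client (hypothetical schema)."""
    name: str
    description: str
    handler: Callable[..., Any]
    # Parameter name -> (minimum, maximum); limits here are illustrative only.
    params: Dict[str, Tuple[float, float]] = field(default_factory=dict)


class ToolServer:
    """Stand-in for one device-side server; NOT the real MCP SDK."""

    def __init__(self, description: str) -> None:
        self.description = description
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def list_tools(self) -> Dict[str, str]:
        # Metadata a client would hand to the LLM for tool selection.
        return {name: tool.description for name, tool in self._tools.items()}

    def call(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        tool = self._tools.get(name)
        if tool is None:
            return {"ok": False, "error": f"unknown tool: {name}"}
        if set(kwargs) != set(tool.params):
            return {"ok": False, "error": f"expected arguments {sorted(tool.params)}"}
        for key, (low, high) in tool.params.items():
            value = kwargs[key]
            if not isinstance(value, (int, float)) or not low <= value <= high:
                return {"ok": False, "error": f"{key}={value!r} outside [{low}, {high}]"}
        # Only validated arguments ever reach the device-facing handler.
        return {"ok": True, "result": tool.handler(**kwargs)}


# Simulated device state; a real server would talk to hardware instead.
_edfa_state = {"gain_db": 18.0}


def get_edfa_info() -> Dict[str, float]:
    return dict(_edfa_state)


def set_edfa_gain(gain_db: float) -> Dict[str, float]:
    _edfa_state["gain_db"] = float(gain_db)
    return dict(_edfa_state)


roadm = ToolServer("ROADM with EDFAs (simulated)")
roadm.register(Tool("get_edfa_info", "Read EDFA status", get_edfa_info))
roadm.register(
    Tool("set_edfa_gain", "Set EDFA gain in dB", set_edfa_gain, {"gain_db": (0.0, 25.0)})
)

print(roadm.list_tools())
print(roadm.call("set_edfa_gain", gain_db=15.0))
print(roadm.call("set_edfa_gain", gain_db=99.0))  # rejected before reaching the device
```

In this sketch the model would only ever choose a tool name and arguments; the out-of-range request is refused by the server without touching the simulated device state, which is the kind of separation between reasoning and execution the text attributes to the MCP-based design.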
First, each MCP tool encapsulates a well-defined operational capability, allowing the MCP client to invoke device functions without granting the LLM direct access to low-level device systems or requiring it to generate control code, thereby enhancing robustness and operational safety. Second, the decoupled implementation of MCP clients and servers enables remote device operation across network boundaries, providing increased flexibility in system deployment and access control. For example, a user can run the MCP client on a separate host while keeping the MCP server close to the optical devices, so multi-device actions (e.g., configuring ROADM channels and then measuring the spectrum from the OSA) are executed locally, and only the structured results are returned to the client. Finally, MCP provides a uniform communication interface that abstracts vendor-specific protocols, enabling administrators and users to operate systems composed of diverse optical hardware through the same tool interface, even when the underlying devices use different control protocols, without necessitating retraining or fine-tuning of the underlying language models.

B. Implementation of MCP Servers and Tools

We implement eight MCP servers for eight representative optical devices, as summarized in Table I: (i) Lumentum ROADM including booster/preamp erbium-doped fiber amplifiers (EDFAs) and 1 × 20 MUX/DEMUX WSSs; (ii) Lumentum 400 GbE CFP2-DCO tunable TRX; (iii) OptiLab analog radio-over-fiber (ARoF) transmitter (TX) based on an electro-absorption modulator (EAM); (iv) APEX Technologies optical spectrum analyzer (OSA); (v) Calient S320 optical circuit switch (OCS); (vi) DiCon MEMS 32 × 32 optical switching system; (vii) Luna POD2000 polarimeter; and (viii) Luna PCD-M02 miniature multi-channel piezoelectric actuator driver card with integrated PolaRITE III polarization controller.

TABLE I: List of optical devices and validated MCP tools supported by AgentOptics.
Optical Device | # of Tools | Example Tools
Lumentum ROADM | 10 | Set/get EDFA gain; set/get WSS connections/attenuation, ...
Lumentum 400 GbE CFP2-DCO | 6 | Set center frequency/output power/operation mode; get config, ...
OptiLab LT-12-EM ARoF TX | 6 | Set bias voltage/current; get status, ...
APEX Technologies OSA | 26 | Get power/spectrum; set/get measurement parameters, ...
Calient S320 Optical Circuit Switch | 4 | Get port; add/delete connection; delete all connections
DiCon MEMS 32 × 32 Optical Switch | 2 | Get connections; set connection
Luna POD2000 Polarimeter | 7 | Set configuration; read polarization, read power, ...
Luna PCD-M02 Polarization Controller | 3 | Reset DAC code; set DAC code; set voltage

For each device, we implement its respective MCP server that exposes a set of tools supporting core operations, including device setup, parameter control, status monitoring, and connection reconfiguration. Note that each tool entry listed above can comprise several smaller tools, and we implement tools according to our needs. The design architecture of AgentOptics is inherently extensible to a broad range of optical devices and instruments, enabling the construction of scalable systems composed of heterogeneous components.

We implement AgentOptics interacting with two types of LLM serving for optical device control, online commercial LLMs and locally deployed open-source LLMs:
• MCP with online LLM (AgentOptics-Online), where the MCP client sends tasks to online LLMs. We consider various online models from different platforms¹, including GPT-4o mini, GPT-5, DeepSeek-V3, Claude Haiku 3.5, and Claude Sonnet 4.5.
  GPT-4o mini and DeepSeek-V3 have the lowest costs ($0.15/$0.6 and $0.28/$1.1 per 1M input/output tokens, respectively), favoring high-frequency, long-context usage, while Haiku 3.5 and Sonnet 4.5 are more expensive ($1/$5 and $3/$15 per 1M input/output tokens, respectively) due to their larger model capacity. GPT-5 occupies an intermediate position with a cost of $1.25/$10 per 1M input/output tokens, balancing computational cost and reasoning performance.
• MCP with local LLMs (AgentOptics-Local), which are hosted on a local Dell PowerEdge R750 server with a 64-core Intel Xeon Gold 6548N CPU @ 2.6 GHz and an NVIDIA 40 GB A100 GPU. Qwen models with parameter sizes of 0.4 B, 8 B, and 14 B are selected. All models are deployed without quantization using the vLLM inference framework. Unlike online commercial models, locally deployed LLMs incur no per-token usage costs; instead, expenses are dominated by electricity consumption and one-time investments in GPU and server hardware. Consequently, the effective token cost of local LLM inference is considered negligible and approximated as zero in our analysis.

¹ Model pricing is based on publicly available rates as of February 2026.

[Figure 2 blocks: Define Basic Tasks (30): single-action, dual-action, triple-action; Task Expansion (410): paraphrasing, non-sequitur, error, chain, role; Prompt Engineering: enforce unified output format; Workflow Execution: CodeGen (baseline), AgentOptics; Physical Device Control; Ground Truth: human-crafted scripts; Validation; Evaluation: success rate, cost and time.]

Fig. 2: Benchmark workflow for evaluating the performance of AgentOptics and the CodeGen baseline, where reference ground truth is established using human-crafted scripts that are manually validated on physical devices for correctness.

IV. EXPERIMENTAL EVALUATION

We compare two workflows for operating real optical devices from natural language task descriptions: (i) AgentOptics, and (ii) an LLM-based code generation approach (CodeGen), both within a carefully crafted evaluation benchmark.

A. Evaluation Benchmark

As shown in Fig. 2, the evaluation benchmark begins with basic operational tasks, categorized into single-, dual-, and triple-action tasks. Each task corresponds to one or more optical device action requests, such as reading device status or configuring operational parameters, expressed in human-readable language. These tasks are then translated into structured action sequences for execution and evaluation, where prompt engineering is also applied to enforce a unified output format, enabling consistent downstream processing.

We implement device control scripts based on manufacturer manuals and existing codebases to obtain the ground truth device status and operation results (see Fig. 1(a)). This approach provides the most accurate and reliable reference for validating optical device operations, and has been extensively used and validated across a series of works [4], [40]–[45]. For example, the ROADM control scripts have been used for collecting EDFA gain profile datasets [40], as well as modeling and optimizing multi-span link quality of transmission [46]–[48]. Similarly, the ARoF and OSA control scripts have been used to study the coexistence of fiber sensing, 400 GbE, and 5G signals in field trials [49].

To evaluate the performance of AgentOptics, each benchmarking task is executed on real optical devices and compared with the ground truth. All workflows run within an identical execution environment, ensuring controlled and reproducible comparisons. We consider task success rate as the primary metric, where a task is labeled as a success when the hardware behavior resulting from the task execution matches the expected device behavior.
More specifically, this includes: (i) correct tools and invocation order; (ii) correct arguments used for tool invocation; and (iii) execution results that meet the task intent. We use DeepSeek-V3 to analyze execution result logs, enabling reasoning about the root causes of failed tasks to identify execution errors.
We also consider the average cost per task, computed by multiplying the token consumption of each task execution by the corresponding per-token prices of the different online LLMs. For locally deployed open-source LLMs, the per-token cost is effectively negligible, with expenses dominated by electricity consumption and one-time investments in server and GPU hardware. Therefore, the per-task cost for local models is approximated as zero in our evaluation. Moreover, we record the execution time per task, including the time spent on task input, tool selection, device control, and retrieval and analysis of the returned value. For tasks involving multiple actions, the execution time is measured as the total time required to complete all actions.
B. Agentic Optical Device Control Tasks
To systematically evaluate the performance of AgentOptics, we develop a benchmark consisting of a comprehensive set of carefully designed user tasks, as shown in Fig. 2. The benchmark evaluates agent performance across multiple dimensions, including the number of device-control APIs required to execute a user command, robustness to diverse natural language inputs, error-handling capability, and support for multi-vendor device operation and coordination. This benchmark spans a representative set of optical hardware, including a Lumentum ROADM, a 400 GbE coherent CFP2-DCO, an OptiLab ARoF transmitter, and an APEX OSA, which are commonly used together across a wide range of optical networking and experimentation workflows.
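As a rough illustration of the two metrics defined in Section IV-A, the sketch below labels a task as a success only when criteria (i)–(iii) all hold, and computes a per-task cost from token counts. It is not the authors' evaluation code: the `ToolCall` structure, the `wavelength_nm` argument name, and the token counts in the usage note are hypothetical; only the tool names and the per-1M-token prices are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation: the tool name plus its keyword arguments."""
    name: str
    args: tuple  # sorted (key, value) pairs, so two calls compare by value


def call(name, **kwargs):
    """Build a ToolCall with arguments in a canonical (sorted) order."""
    return ToolCall(name, tuple(sorted(kwargs.items())))


def is_success(expected, actual, result_ok):
    """Criteria (i)-(iii): same tools in the same order, same arguments,
    and an execution result that meets the task intent."""
    return list(expected) == list(actual) and bool(result_ok)


# Prices in $ per 1M (input, output) tokens, as quoted in the text.
# Local models are approximated as zero cost, as in the paper.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4.5": (3.00, 15.00),
    "local": (0.0, 0.0),
}


def task_cost(model, input_tokens, output_tokens):
    """Per-task cost in dollars: token consumption times per-token price."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For instance, with a hypothetical 20,000 input and 1,000 output tokens, `task_cost("gpt-4o-mini", 20_000, 1_000)` evaluates the same formula that underlies the per-task costs reported later. Comparing call lists element by element makes an out-of-order or wrongly named tool call fail criterion (i) without any extra logic.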
The benchmark includes user tasks that involve one to three actions (e.g., invoking vendor-specific APIs) spanning multiple devices. For example, a single-action task may read the signal spectrum from an OSA, while a dual-action task may combine two related or unrelated operations within a single prompt, such as reading the signal spectrum from an OSA followed by setting the output power of an EDFA.
We further extend each task to multiple representative variants, designed to evaluate the robustness of the agentic AI framework under conditions commonly encountered in day-to-day optical device and network control. Specifically, we consider five types of task variants (Table II):
• Paraphrasing evaluates semantic understanding by testing whether the agent can recognize differently worded instructions with identical intent (e.g., “OSA measure” ⇔ “OSA data recording”);
• Non-Sequitur assesses robustness to irrelevant or incoherent instructions that should not result in valid actions (e.g., “OSA measure, watch TV”);
• Error tests the agent’s ability to detect and reject invalid or unsafe commands (e.g., “Set OSA wavelength to 0”);
• Chain measures multi-step reasoning and state consistency across sequentially dependent commands (e.g., “Set EDFA gain, then read EDFA gain”);
• Role evaluates contextual and role-based instruction following, where task execution depends on an assigned role (e.g., “Act as the service provider, OSA measure”).
The benchmark begins with 30 basic tasks (10 single-action, 10 dual-action, and 10 triple-action), which are systematically expanded into a total of 410 tasks spanning multiple difficulty levels. Specifically, for each basic single-action task, we generate five paraphrasing variants and five non-sequitur variants, as well as three error variants and three role-conditioned variants, yielding 10 × (5 + 5 + 3 + 3) = 160 expanded tasks.
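The benchmark size can be tallied with a few lines of arithmetic. The sketch below reproduces the single-action count above together with the dual- and triple-action counts given in the text that follows; the dictionary names are illustrative and not part of AgentOptics.

```python
# Number of basic tasks per category, as stated in the text.
BASIC_TASKS = {"single": 10, "dual": 10, "triple": 10}

# Expanded variants generated per basic task. For dual-action tasks, each
# basic task yields one chained form that is then paraphrased five times.
VARIANTS_PER_TASK = {
    "single": {"paraphrasing": 5, "non_sequitur": 5, "error": 3, "role": 3},
    "dual": {"paraphrasing": 5, "non_sequitur": 5, "chain": 5},
    "triple": {"paraphrasing": 5, "non_sequitur": 5},
}


def expanded_counts():
    """Return the number of expanded tasks per category."""
    return {
        category: BASIC_TASKS[category] * sum(variants.values())
        for category, variants in VARIANTS_PER_TASK.items()
    }


counts = expanded_counts()
total = sum(counts.values())
print(counts, total)
```

Keeping the per-variant counts in one table makes it easy to see that the error and role variants contribute only to the single-action category, and the chain variants only to the dual-action category.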
The error and role-conditioned variants are restricted to single-action tasks because their localized decision point allows controlled error and role-conditioned modifications without altering task semantics; in multi-action tasks, such modifications would propagate ambiguously across independent steps. For each basic dual-action task, we generate five paraphrasing variants and five non-sequitur variants, as well as 10 chained variants constructed by converting the two-step format into a single sequential instruction that includes both steps (e.g., “(1) set ...; (2) get/read ...” ⇒ “set ... and then get/read ...”), each with five paraphrasing variants, yielding 10 × (5 + 5) + 10 × 5 = 150 expanded tasks. For each basic triple-action task, we generate five paraphrasing variants and five non-sequitur variants, yielding 10 × (5 + 5) = 100 expanded tasks. In total, this yields 410 expanded tasks that are used for evaluating the performance of AgentOptics.
C. Baseline: LLM-based Code Generation (CodeGen)
We evaluate AgentOptics against LLM code generation workflows that rely on either user manuals provided by the device manufacturer or pre-existing code libraries for direct device operation, as illustrated in Fig. 1(b). The code generated by the LLM is then executed against the actual devices from a clean and isolated runtime environment. The returned output and execution logs/errors from the device are captured for evaluation. The task success rate for the CodeGen baseline is defined identically to that for AgentOptics described in Section IV-A, where a task counts as a success only when the results returned by the hardware meet the expectation. We evaluate both online high-capacity models and a locally hosted model supporting privacy-preserving deployments:
• LLM-based code generation with online LLMs (CodeGen-Online).
This baseline uses online LLMs (e.g., Claude Sonnet 4.5) to generate control code in Python based on the device documentation. The LLM receives task descriptions, device connection information (e.g., IP address, serial ports, USB identifiers), and relevant materials in one of two forms: (i) vendor-provided manuals, including APIs/code that is part of the manual (CodeGen-Online with manual), or (ii) reference code libraries for each target device (CodeGen-Online with code). The LLM then produces executable code to control the target optical devices.
TABLE II: Five representative task variants evaluated in the agentic optical device control benchmark.
Type | Description | Task example
Paraphrasing | Same meaning, different phrases | Operate the CFP2 so that port cfp2-opt-1-1 has an output target power setting of −5 dBm. / Using the CFP2, adjust the output target power parameter on port cfp2-opt-1-1 to −5 dBm.
Non-sequitur | Adding unrelated information to the task | Set CFP port cfp2-opt-1-1 power to −5 dBm; the bench mat has a curled corner.
Error | Task with wrong or lost value | Missing power value: on the CFP2, set output target power on port cfp2-opt-1-1. / Wrong power value: on the CFP2, set output target power on port cfp2-opt-1-1 to −100 dBm.
Chain | Sequential related tasks | First set CFP2 port cfp2-opt-1-1 output target power to −4 dBm, then read CFP2 output power.
Roles | Task tone as service provider or user | You are an optical device user; set CFP port cfp2-opt-1-1 power to −5 dBm.
• LLM-based code generation with local LLMs (CodeGen-Local): This baseline uses CodeLlama-7b-hf [50], which is a 7 billion-parameter foundation model in the Code Llama family developed for general code synthesis and understanding, capable of generating programming code from natural language or partial code prompts.
However, small local models such as CodeLlama-7b-hf usually exhibit significantly weaker code generation performance compared to online models. To mitigate this limitation, we applied low-rank adaptation (LoRA) fine-tuning [51] using a local dataset consisting of user-issued device control commands (e.g., requesting an OSA spectrum measurement) paired with their corresponding control code. This task-specific adaptation transforms a general-purpose local LLM into a specialized optical device control model, substantially improving control accuracy and execution success rates. The LoRA fine-tuning employs a learning rate of 3 × 10^−4 over 100 epochs.
V. RESULTS
We evaluate the performance of AgentOptics using different online commercial and locally deployed open-source LLMs under the benchmark described in Section IV, compare AgentOptics against CodeGen baselines, analyze agentic AI execution error types, and finally compare the cost and execution time of AgentOptics with the CodeGen baseline.
A. AgentOptics with Online and Local LLMs
Fig. 3 shows the success rate achieved by AgentOptics for tasks with varying numbers of actions under different online (AgentOptics-Online) and local (AgentOptics-Local) LLMs. Overall, online commercial models (e.g., GPT-4o mini, Claude Sonnet 4.5, and DeepSeek-V3) achieve average success rates of 95.6%–99.4% for single-action tasks, 99.3%–100.0% for dual-action tasks, and 97.0%–100.0% for triple-action tasks. In
Fig. 3: Task success rate achieved by AgentOptics across varying task complexities using three locally hosted and five online LLMs.
Fig.
4: Task success rate achieved by AgentOptics across different task variants using three locally hosted and five online LLMs.
contrast, open-source locally deployed models (e.g., Qwen-0.6B and Qwen-14B) exhibit a noticeable degradation in performance, particularly when more actions are involved in the task: their success rates drop from 91.3%–93.1% for single-action tasks and 92.7%–94.7% for dual-action tasks to 70.0%–75.0% for triple-action tasks, highlighting the impact of model capacity on multi-tool coordination reliability. Notably, model performance is also affected by how an LLM is trained to support MCP. For example, GPT-4o mini, despite having far fewer parameters than GPT-5, demonstrates consistently strong performance across all tasks.
Fig. 5: Task success rate achieved by AgentOptics across varying task complexities using locally hosted and online LLMs, and comparison to the CodeGen baseline that leverages LLMs for code generation.
Fig. 4 shows the success rate achieved by AgentOptics across the different task variants described in Section IV-B (see also Table II). Overall, all evaluated models perform well on the paraphrasing (92.0%–100.0%) and role (93.3%–100.0%) variants, indicating strong robustness to linguistic reformulation and role-based prompt conditioning. In contrast, the performance of AgentOptics degrades for more challenging task variants such as non-sequitur and error detection. For the non-sequitur variant, commercial online models (e.g., GPT-5, Claude Sonnet 4.5, and DeepSeek-V3) achieve success rates between 99.3% and 100.0%, whereas locally deployed LLMs achieve only 77.3%–81.3%, often failing to select the appropriate tool when user prompts contain unrelated or extraneous information.
The error variant poses challenges for both online and local models: online models achieve success rates ranging from 76.7% (Claude Haiku 3.5) to 96.7% (DeepSeek-V3), while local models achieve 90.0%–93.3%. Although explicit error-handling mechanisms are implemented within the MCP tools integrated in AgentOptics, missing or incomplete input parameters can still cause LLMs to invoke incorrect APIs or tools, e.g., calling a configuration setting function as a read operation due to the absence of required tool invocation parameters.
Fig. 6: Task success rate achieved by AgentOptics across different task variants using locally hosted and online LLMs, and comparison to the CodeGen baseline that leverages LLMs for code generation.
B. AgentOptics and CodeGen Performance Comparison
Next, we compare the performance of AgentOptics against the CodeGen baseline using both an online model (Claude Sonnet 4.5) and a locally deployed, domain fine-tuned model (CodeLlama-7b-hf, see Section IV-C). Fig. 5 shows the corresponding task success rates for single-, dual-, and triple-action tasks, where AgentOptics consistently achieves the highest success rates across all task complexities. In particular, AgentOptics-Online attains near-perfect performance, achieving success rates of 98.8%–100.0%. On the other hand, the CodeGen baseline exhibits substantially lower success rates across all task complexities. Notably, CodeGen-Local shows a clear degradation as the number of required actions increases (71.3% → 55.3% → 8.0%). For example, on single-action tasks, CodeGen-Local achieves a success rate of 71.3%, whereas AgentOptics-Online maintains a success rate of 98.8%.
This performance gap widens for complex tasks: for triple-action tasks, CodeGen-Local achieves a success rate of only 8.0%, whereas AgentOptics-Online maintains a success rate of 99.0%. Notably, CodeGen-Online with code and manuals performs even worse, achieving success rates of at most 20.0%, due to the lack of domain-specific knowledge in the online model. In contrast, CodeGen-Local outperforms the CodeGen-Online variants on low-complexity tasks (e.g., 71.3% vs. 9.4% for single-action tasks), primarily due to the benefits of LoRA fine-tuning. However, this advantage comes at the expense of scalability, as the local LLM must be fine-tuned again whenever new devices or features are added.
Fig. 6 reports the success rate across different task variants, including paraphrasing, non-sequitur, error, role, and chain. AgentOptics leveraging MCP consistently outperforms the CodeGen baseline across all variants, with AgentOptics-Online achieving 93.3%–100.0% across all task types. In contrast, the CodeGen baseline exhibits substantially lower performance, particularly under more challenging variants such as error-induced tasks (0.0% for both CodeGen-Online variants vs. 93.3% for AgentOptics-Online) and chain tasks (22.0% for CodeGen-Online with code vs. 100.0% for AgentOptics-Online). CodeGen-Local achieves superior performance compared to CodeGen-Online with code (e.g., 51.3% vs. 20.0% success rates on paraphrasing tasks), benefiting from LoRA fine-tuning. Moreover, providing reference code yields a higher success rate than providing a user manual.
C. Cost Efficiency and Execution Time
Fig. 7 shows the trade-offs between task success rate and average cost ($/task) achieved by AgentOptics and the CodeGen baseline for dual-action tasks with different LLMs.
Online commercial LLMs show a wide range of (cost, success rate) performance: high-end online models such as GPT-5 ($0.048/task, 98.8%) and Claude Sonnet 4.5 ($0.152/task, 99.3%) achieve high success rates at substantially higher cost, whereas lightweight online models such as GPT-4o mini ($0.004/task, 99.3%) and DeepSeek-V3 ($0.011/task, 99.8%) achieve the best performance at significantly lower cost; both match or exceed the success rate achieved by more expensive models. Locally deployed models such as Qwen-14B provide competitive accuracy (87.3%) at minimal cost. In contrast, the CodeGen baseline exhibits both limited accuracy (e.g., only up to 50%) and limited cost efficiency. Notably, AgentOptics is not cost-efficient when paired with high-cost models such as Claude Sonnet 4.5, even compared to both CodeGen-Online variants with code and manuals ($0.010/task), which themselves require providing extensive device manuals (often tens to hundreds of pages) or reference codebases consisting of hundreds to thousands of lines of code. For example, AgentOptics using Claude Sonnet 4.5 and GPT-4o mini achieve identical task success rates, yet the latter incurs a 38× lower cost per task, suggesting that for specific applications, higher-cost models provide negligible improvements in success rates relative to their substantially increased cost.
Fig. 7: Trade-off between success rate and average cost for dual-action tasks with AgentOptics and the CodeGen baseline using locally hosted (squares) and online (circles) LLMs. Marker size indicates the relative average execution time.
We also report the relative execution time of different methods in Fig. 7, indicated by the marker size. On average, for AgentOptics, execution time varies significantly across models.
Higher-capacity, reasoning-oriented models (e.g., GPT-5 at 23.8 sec, Claude Sonnet 4.5 at 13.1 sec, and DeepSeek-V3 at 16.4 sec) show longer runtimes compared to smaller models optimized for throughput (e.g., Qwen-0.6B at 4.0 sec, GPT-4o mini at 11.3 sec). When comparing AgentOptics with the CodeGen baseline using the same LLM (e.g., Claude Sonnet 4.5), AgentOptics incurs additional latency (11.4 sec vs. 8.6 sec for CodeGen-Online with code) due to potentially multiple rounds of communication between the client and the LLM running the MCP-based method.
Fig. 8: Diagram for DWDM link with ARoF and 400 GbE signals.
D. Agentic AI Execution Error Types
To understand the remaining failures under this benchmark, Table III summarizes common failure modes observed when applying AgentOptics and the CodeGen baseline. For CodeGen, failures primarily stem from software-level issues, including importing non-existent libraries, invoking invalid functions or class attributes, and generating syntactically incorrect code. For example, the prompt “set the target output power of a CFP2 module to −5 dBm” leads to the error “ERROR: CFP2 object has no attribute set_output_power”. Such errors indicate limitations in CodeGen’s ability to accurately reason about programming language syntax and library semantics under complex task requirements. In contrast, AgentOptics failures are mainly related to tool orchestration, such as missing required tool invocations, incorrect tool naming due to formatting inconsistencies, calling undefined MCP tools, or executing only a subset of the tools necessary to complete a task.
For example, the prompt “Set the OSA start wavelength to 1545 nm” requires calling the tool osa_set_start_wavelength, but AgentOptics instead invokes the tool osa_set_center_wavelength. These failure modes suggest that while AgentOptics can often reason correctly about individual tools, it still struggles with reliably coordinating multiple tools.
VI. CASE STUDIES
AgentOptics is not limited to high-fidelity control of individual optical devices under diverse natural-language inputs, but also extends to more comprehensive network production scenarios, including optical link configuration, autonomous channel QoT optimization, quantum polarization control, and automatic fiber cut event detection. We demonstrate these capabilities through five representative case studies.
A. DWDM Link Setup and Performance Monitoring
To evaluate the capability of AgentOptics to coordinate multi-vendor, multi-device optical network operation, we establish a DWDM link that integrates Lumentum ROADMs with coherent 400 GbE and ARoF subsystems. This configuration, shown in Fig. 8, demonstrates end-to-end wavelength provisioning with coordinated control across multiple signal types. The topology consists of two ROADM units (ROADM1 and ROADM2) interconnected through a 99:1 optical splitter, where the 1% port is directed to an OSA for real-time monitoring while 99% of the optical power propagates through a 20 km fiber spool connected to ROADM2. Each ROADM unit includes both MUX/DEMUX WSS modules as well as booster (B) and pre-amplifier (P) EDFAs. In this experiment, we configure only the required subset of the components, using the MUX WSS and booster EDFA on ROADM1 and the DEMUX WSS and pre-amplifier EDFA on ROADM2. This architecture enables wavelength routing with optical amplification for span loss compensation.
TABLE III: Reasons and examples for CodeGen and MCP-based AgentOptics execution failures.
Methods | Failure Category | Example Description
CodeGen | Import non-existing library | Imports an undefined library (e.g., import lab_api, which is undefined).
CodeGen | Call non-existing function | Calls an invalid function (e.g., call AP2XXX.get_powower, which does not exist).
CodeGen | Syntax error | Contains invalid syntax (e.g., IP=’’192.168.0.2).
AgentOptics | Missing tool | Required tools are not invoked (expected OSA-related tools, but called none).
AgentOptics | Incorrect tool | Calls the wrong tool (e.g., arof_get_power ↔ arof_read_power).
AgentOptics | Calling non-existent tool | Invokes an undefined MCP tool (e.g., cfp2_get_voltage, which does not exist).
Using an agentic workflow, AgentOptics executes a coordinated multi-device control sequence from an operator intent to provision the two optical channels described below. In this case study, the agent interacts with four devices (ROADMs, OSA, CFP2-DCO, and ARoF TX) and invokes a total of nine tools. Note that although AgentOptics provides a spectrum measurement tool, it is not invoked here: high-resolution spectrum acquisition would require a large numerical array that significantly increases token consumption in the LLM context. Instead, the OSA tools here only configure the sweep window, and spectrum acquisition is performed outside the AgentOptics workflow. Specifically, the agent configures an ARoF path over 193.9–194.45 THz on port 1 and a CFP2 path over 193.4–193.7 THz on port 20 on both ROADMs, sets the OSA sweep to 1540–1550 nm, sets the CFP2 center frequency to 193.5 THz, and applies 99 mA current and −0.9 V bias to the ARoF TX. With these WSS path and endpoint settings, the link carries two optical channels: an ARoF channel operating at 1542.92 nm (194.3 THz) for analog radio signal transmission and a 400 GbE channel at 1549.32 nm (193.5 THz) generated by the CFP2-DCO for high-speed coherent data transmission.
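The provisioning settings above mix frequency (THz) and wavelength (nm) units, so a small consistency check is useful. The sketch below converts the two channel frequencies to vacuum wavelengths and verifies that each channel falls inside its WSS passband and inside the OSA sweep window. The constants are taken from the text; the function and dictionary names are hypothetical and are not AgentOptics tools.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum (m/s)


def thz_to_nm(freq_thz):
    """Convert an optical frequency in THz to a vacuum wavelength in nm."""
    return C_M_PER_S / (freq_thz * 1e12) * 1e9


# Settings quoted in the case study.
WSS_PASSBANDS_THZ = {"ARoF": (193.9, 194.45), "CFP2": (193.4, 193.7)}
CHANNELS_THZ = {"ARoF": 194.3, "CFP2": 193.5}
OSA_SWEEP_NM = (1540.0, 1550.0)


def check_plan():
    """Report, per channel, its wavelength and whether it fits the plan."""
    report = {}
    for name, freq in CHANNELS_THZ.items():
        band_lo, band_hi = WSS_PASSBANDS_THZ[name]
        wavelength = thz_to_nm(freq)
        report[name] = {
            "wavelength_nm": wavelength,
            "in_passband": band_lo <= freq <= band_hi,
            "in_osa_sweep": OSA_SWEEP_NM[0] <= wavelength <= OSA_SWEEP_NM[1],
        }
    return report


for channel, info in check_plan().items():
    print(channel, info)
```

Under these assumptions, the computed wavelengths agree with the quoted 1549.32 nm and 1542.92 nm to within a few hundredths of a nanometer, and both channels lie inside their configured passbands and the 1540–1550 nm sweep.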
Both channels are routed through the ROADM WSS modules with appropriate power levels and port assignments. Performance monitoring successfully confirms provisioning with a measured OSNR of 32.6 dB for the 400 GbE signal by the CFP2-DCO and an error vector magnitude (EVM) of 3.91% for the ARoF signal, while the launch spectrum shows both channels at their designated wavelengths in Fig. 8.
B. Wideband 5G ARoF Link Characterization
In the second case study, we demonstrate AgentOptics controlling a wireless-optical testbed to autonomously characterize an initially unknown transmitter configuration. The experimental setup is shown in Fig. 9(a), where a radio frequency system-on-chip (RFSoC) ZCU216 board serves as the radio, generating and receiving a 400 MHz orthogonal frequency-division multiplexing (OFDM) signal centered at 600 MHz carrier frequency. We adopt the RFSoC implementation from [52] as the base hardware design, and the generated RF signal is subsequently modulated onto an optical carrier at 1552.44 nm (193.11 THz) using an Optilab LT-12-E-M EAM. Within this framework, AgentOptics acts as the agentic control layer, dynamically configuring and sweeping the ARoF transmitter bias voltage while continuously measuring the ARoF link SNR and BER. Specifically, AgentOptics autonomously orchestrates the RFSoC ZCU216 board to transmit and receive 5G NR OFDM waveforms with different modulation schemes (QPSK, 16QAM, and 64QAM), using onboard signal processing to compute feedback metrics including SNR,
[Fig. 9(a) example prompt: “What’s the ARoF link performance across all modulations?”]
Fig.
9: (a) AgentOptics workflow for the LLM-assisted wide-bandwidth ARoF 5G NR link with an RFSoC ZCU216 board and an ARoF transmitter-receiver pair. (b)–(c) Optimized ARoF transmitter bias voltage across link SNR and BER with different modulation orders.
EVM, and BER. The modulated ARoF signal is propagated over a 10 km fiber spool under varying ARoF transmitter bias voltages. Figs. 9(b)–(c) show the measured SNR and BER of the received OFDM signal across the automated ARoF link configurations. The results show that AgentOptics can characterize the ARoF link and, based on these measurements, identify and apply the optimal ARoF transmitter bias voltage for improved wireless transmission performance.
C. Adaptive Channel Configuration and GSNR Optimization on Multi-Span Optical Links
For the third case study, we demonstrate that AgentOptics is capable not only of basic link establishment but also of reasoning-driven operations such as channel power optimization. To illustrate this capability, we establish a two-span optical link consisting of two 400 GbE CFP2-DCO transceivers and an amplified spontaneous emission (ASE) comb source, as shown in Fig. 10. Specifically, ten ASE channels with 50 GHz channel spacing from 193.85–194.85 THz are injected at the first ROADM to emulate background traffic. At ROADM1, the WSS multiplexes the ASE channels and two 400 GbE signals from CFP2-DCOs, which are subsequently amplified by a booster EDFA and transmitted over a 20 km fiber spool.
[Fig. 10(b) flowchart text: Step 1: read CFP port and check OSM and WSS; Step 2: set X = −10; Step 3: increase CFP Tx power to X dBm and check CFP Rx status; decision: is Rx power higher than 0 dBm?; Step 4: set X = X + 2; Step 5: check CFP Rx status and OSA spectrum, analyze and respond to user; Step 6: respond to user, “Optimization finished, Pre-FEC BER 1x10-3, not affect other channels.”]
[Fig. 10(a) example prompt: “Current topology is ... Add a 400 GbE channel on 195.5 THz and optimize its TX launching power to minimizing the CFP receiver pre-FEC BER. The power for existing channels should be smaller than 0.5 dB.”]
Fig. 10: (a) AgentOptics provisions a 400 GbE channel in a two-span link and autonomously optimizes the channel GSNR based on a single-line human language instruction. (b) Autonomous launch power optimization of the CFP2-DCO 400 GbE transmitter (TX) performed by AgentOptics. (c) Pre-FEC BER optimization of the 400 GbE signal by AgentOptics using an online LLM (Sonnet 4.5) without impacting existing background traffic.
The signals then pass through ROADM2 and subsequently propagate over a 27 km field fiber. Finally, the composite signal is amplified by a pre-amplifier and demultiplexed by the WSS in ROADM3, where the background ASE channels are dropped, and the two 400 GbE channels are directed to the receiver CFP2-DCO.
In this case study, we demonstrate the capability of AgentOptics for real-time dynamic link reconfiguration. Specifically, it adds a new 400 GbE channel centered at 195.5 THz using a CFP2-DCO to the existing two-span link, and optimizes the transmitter launch power in order to minimize the pre-FEC bit error rate (BER) at the receiver and keep it below a given threshold. The user inputs the link topology and reconfiguration requirements, including channel adjustment constraints, minimal impact on existing channels, and 400 GbE channel optimization objectives, into AgentOptics, as shown in Fig. 10(a). AgentOptics behavior during the channel power optimization is shown in Fig.
10(b), which is determined by AgentOptics itself. The user specifies the topology and optimization objective. In Step 1, AgentOptics reads the CFP port parameters and checks the OSM and WSS status. In Step 2, the CFP Tx power is initialized to X = −10 dBm. In Step 3, AgentOptics increases the CFP Tx power to X dBm and queries the CFP Tx/Rx status. A decision block evaluates whether the CFP Rx power exceeds 0 dBm; if it does not, Step 4 updates the Tx power to X + 2 dB and repeats the loop. In Step 5, AgentOptics analyzes the CFP Rx status and OSA spectrum to verify the pre-FEC BER and inter-channel impact. Finally, in Step 6, the agent reports the optimization results to the user, confirming minimized pre-FEC BER and negligible impact on existing channels.
The corresponding optimization results are shown in Fig. 10(c), where the x-axis denotes the iteration index as AgentOptics sequentially invokes different MCP tools to fulfill the user request for optimizing the BER of the newly added 400 GbE channel. A series of iterative steps is performed with a 2 dB step size to optimize the launch power of the CFP2-DCO, which is autonomously determined by the LLM (see Fig. 10(b), right y-axis). The left y-axis depicts the evolution of the pre-FEC BER of the newly added channel, demonstrating convergence to a minimized BER upon completion of the LLM-based optimization process. The optimization process automatically terminates once the CFP2-DCO receiver power reaches its predefined threshold.
D. Polarization Monitoring and Stabilization
The fourth case study demonstrates agentic control for automated polarization stabilization, which is crucial for a wide range of applications, including coherent optical communication [53], distributed interferometric fiber sensing [54], and polarization-sensitive quantum links such as entanglement-based quantum key distribution (QKD) [55].
Maintaining polarization stability is essential for fiber-optic links because environmental perturbations, such as temperature variation and vibration, can induce polarization drift over time and cause performance degradation. Traditionally, mitigating this drift requires researchers to manually activate polarization stabilization procedures upon observing degradation. In contrast, AgentOptics enables polarization correction through a single natural-language command, greatly simplifying operation and reducing response time.
The experimental setup, as shown in Fig. 11(a), consists of a 1092 nm laser source, a Luna PCD-M02 4-channel piezo polarization controller, and a Luna POD2000 polarimeter. A host computer running an MCP client interfaces with an MCP server, which communicates with an Arduino Mega 2560 via USB. The Arduino device drives the PCD-M02 controller using 12-bit digital control codes (0–4095) via a digital-to-analog converter (DAC) interface; these codes map to an output voltage range of 0–5 V with a step size of 1.22 mV. In this case study, AgentOptics orchestrates two MCP-controlled devices (the POD2000 for polarization state measurement and the PCD-M02, via Arduino, for actuation) and completes the workflow by invoking four tools: configure the POD2000 wavelength (1090 nm), read the current polarization state, stabilize to azimuth −47 deg and ellipticity 8 deg with a stopping threshold of 0.5 deg, and re-read the polarization state for verification. The polarization stabilization is implemented using a multi-stage gradient descent procedure that iteratively adjusts the four piezo control codes while monitoring the polarization state until convergence criteria are met.
Fig. 11(b) shows the results given the input prompt, including a series of deliberate fiber perturbations to demonstrate system robustness. Starting from a random initial state far from the target (ψ = 39.33°, χ = −1.
05 ◦ ), the system con verges to tar get values ( ψ = − 47 ◦ , χ = 8 ◦ ), achieving the stopping criterion (angular error < 5 ◦ ) after 12 iteration, with an av erage iteration time of 0.23 sec. For each optimization iteration, the controller performs multiple polarimeter readings to decide the actuation direction: for each piezo channel, it applies one small step forward and one small step backward, measures the resulting polarization states, and ex ecutes an actuation step in the direction that most significantly reduces the angular error relative to the target. During con vergence, the system successfully recov ers from multiple perturbations, including manual fiber disturbances that temporarily shift the polarization by over 40 ◦ (visible at iterations 4 and 9). The piezo control voltages display smooth optimization trajectories with significant adjustments across all channels: starting from the initialization at 2.5 V, channels 1, 2, and 4 increase to 2.75 V, 2.77 V, and 2.81 V respectively , while channel 3 decreases slightly to 2.18 V during the optimization process. E. D AS-enabled F iber Sensing and Event Detection In the final case study , we show that AgentOptics can also assist in monitoring fiber conditions using distrib uted acoustic sensing (D AS) and in determining whether a potential fiber cut ev ent may occur that could affect optical link transmission performance. T o support agentic workflow supporting D AS operation, we dev elop MCP tools for an NEC Spectral LS3300 D AS interrogator , which is set up to monitor a 27.4 km field fiber , as sho wn in Fig. 12(a). The MCP tools support D AS vi- bration monitoring ov er a specified time window and retriev al of the corresponding waterfall plot images. A waterfall plot shows vibration intensity along the sensing fiber over time, with distance along the fiber on the horizontal axis, time on the vertical axis, and color indicating vibration amplitude or strain rate. 
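To make the waterfall layout concrete, the following is a minimal Python sketch, not the interrogator's actual data format. It assumes a hypothetical grid in which each row is one time slice and each column is one distance bin; the bin width `BIN_KM`, the frame interval `FRAME_S`, the helper `strongest_event`, and the toy intensity values are all illustrative assumptions.

```python
# Hypothetical waterfall grid: rows are time slices, columns are distance bins.
# The bin width and frame interval are illustrative assumptions, not
# parameters reported for the DAS interrogator used in the paper.
BIN_KM = 0.1    # assumed spatial bin width along the fiber (km)
FRAME_S = 1.0   # assumed time between successive rows (s)


def strongest_event(waterfall):
    """Return (time_s, distance_km, intensity) of the largest cell in the grid."""
    best = None
    for row_idx, row in enumerate(waterfall):
        for col_idx, value in enumerate(row):
            # Strict comparison keeps the earliest cell when values tie.
            if best is None or value > best[2]:
                best = (row_idx * FRAME_S, col_idx * BIN_KM, value)
    return best


# Toy data: a quiet fiber with one vibration burst at row 2, column 3.
toy = [
    [0.1, 0.1, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.1, 0.3, 0.1],
    [0.1, 0.1, 0.2, 0.9, 0.2],
    [0.1, 0.1, 0.1, 0.4, 0.1],
]
time_s, distance_km, intensity = strongest_event(toy)
```

In this toy grid the peak cell maps back to a time offset (row index times the frame interval) and a position along the fiber (column index times the bin width), which is the same lookup a reader performs visually on the plot.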
This representation enables rapid identification of the location, timing, and strength of vibration events. The returned images are autonomously analyzed by an LLM to determine potential fiber cut events based on abnormal vibration patterns.

Without domain-specific guidance, AgentOptics often fails to accurately identify fiber cut events due to limited knowledge of fiber sensing characteristics. To address this challenge, we apply prompt engineering [56] to supply additional domain knowledge related to fiber-sensing waterfall plots, thereby enhancing the reasoning capability of AgentOptics for DAS analysis. In addition to requesting a determination of whether a given waterfall plot indicates an impending fiber cut, we provide background information in the prompt describing characteristic pre-cut waterfall plot patterns. This information is combined with the fiber cut determination request and the corresponding waterfall plot and fed into the LLM. We further validate this prompt engineering approach for fiber cut prediction using a previously recorded real-world fiber cut event that occurred on a 53 km field-deployed fiber.

Fig. 11: (a) Experimental setup and control architecture for fiber link polarization stabilization using AgentOptics. (b) Closed-loop polarization stabilization results with deliberate fiber perturbations, showing polarization state and piezo controller actuation over time.

Figs. 12(c)–(e) compare the LLM-based waterfall analysis with and without prompt engineering under three scenarios: a stable environment, a slight fiber perturbation event induced by manually agitating the fiber to generate horizontal bright lines in the waterfall plot, and a real fiber cut event recorded on a 53 km field fiber loop. The prompt provides explicit descriptions of fiber cut signatures in waterfall plots, such as: “A fiber cut event is detected when fiber agitation produces multiple vertical streaks, or when unequal brightness between the top and bottom vertical lines indicates a power discontinuity.” As shown in Figs. 12(c)–(e), incorporating prompt engineering enables the LLM to reliably detect fiber cut events, thereby demonstrating that AgentOptics can serve as a fully MCP-driven, LLM-in-the-loop fiber sensing and monitoring framework that can help prevent data traffic disruption caused by fiber cuts.

Fig. 12: (a) Experimental setup and workflow for AgentOptics-enabled fiber monitoring using distributed acoustic sensing (DAS). (b) LLM-based reasoning and prompt engineering (PE) for automated event interpretation on the DAS waterfall plot analysis for (c) a stable environment, (d) human-induced pseudo fiber agitation, and (e) a real fiber cut event. [Panel (a) example user input: “Turn on the DAS for 30 seconds, read the waterfall plot and analyze the results.”]

VII. CONCLUSIONS

In this paper, we presented AgentOptics, an MCP-based framework for autonomous and scalable control of optical devices and systems. AgentOptics implements 64 standardized tools across eight physical devices and is systematically evaluated using a 410-task benchmark conducted on real hardware. We assess performance using five commercial online and three locally deployed open-source LLMs, and compare against the direct LLM-based code generation baselines.
Overall, AgentOptics achieves task success rates of 99.0% and 87.7% with online and locally deployed LLMs, respectively, significantly outperforming the direct code-generation approaches and demonstrating strong robustness across model types. We further validate AgentOptics through five representative case studies, ranging from DWDM signal provisioning and monitoring to DAS-based fiber sensing with LLM-assisted event detection. These case studies collectively demonstrate AgentOptics’s ability to coordinate heterogeneous optical devices, execute closed-loop optimization, and enable intelligent orchestration in real-world testbeds. Future work will expand the AgentOptics MCP toolset to incorporate broader classes of optical and hybrid wireless-optical systems, improve robustness for large-scale and long-horizon orchestration tasks, and incorporate enhanced safety mechanisms and cost-efficient deployments. We also plan to enhance AgentOptics in more complex operational environments to further advance autonomous optical network control and management.

APPENDIX

A. DWDM Link Setup and Performance Monitoring

Prompt: “Add a WSS connection named ‘ARoF’ with frequency from 193900 GHz to 194450 GHz with attenuation of 19.5 dB on the MUX side, using input port 1 and connection ID 1. Add a WSS connection named ‘CFP2’ with frequency from 193400 GHz to 193700 GHz with attenuation of 5 dB on the MUX side, using input port 20 and connection ID 2. Set the OSA start wavelength to 1540 nm and stop wavelength to 1550 nm. Add a WSS connection named ‘ARoF’ from 193900 GHz to 194450 GHz with attenuation of 5 dB on the DEMUX side, using output port 1 and connection ID 1. Add a WSS connection named ‘CFP2’ from 193400 GHz to 193700 GHz with attenuation of 5 dB on the DEMUX side, using output port 20 and connection ID 2. Set the CFP2 frequency to 193500000 MHz. Set the ARoF TX current to 99 mA and bias voltage to −0.9 V.”

B. Wideband 5G ARoF Link Characterization

Prompt: “You are an automated experiment controller tasked with finding the optimal bias voltage for an analog radio-over-fiber (ARoF) link. The testbed consists of an RFSoC that generates and receives RF signals routed through optical transceivers, with system parameters fixed as follows: RF bandwidth at 400 MHz, DAC NCO at 600 MHz, ADC NCO at −600 MHz, RF attenuation at 0 dB, and bias current at 99 mA. Using the AgentOptics MCP server, sweep the bias voltage from −1.5 V to 0 V in steps of 0.1 V, and at each bias point use the ‘rfsoc link tester’ MCP server to measure EVM, SNR, and BER for three modulation schemes: QPSK, 16QAM, and 64QAM. Once all measurements are collected, generate SNR-vs-bias and BER-vs-bias plots for all modulation schemes, and write the results into three markdown files, each organized as a table with bias voltage and per-modulation metric values clearly presented.”

C. Adaptive Channel Configuration and GSNR Optimization on Multi-Span Optical Links

Prompt: “Set a new CFP channel output power to −10 dBm, add an ROADM 2 WSS single connection for a 400 GbE channel at 195.5 THz, and record the CFP input power and pre-FEC BER. The current link is CFP transmitter → ROADM 1 booster → 20 km fiber → ROADM 2 booster → 27 km field fiber → ROADM 3 preamp → CFP receiver. Optimize the CFP launching power from −15 to 0 dBm to minimize the pre-FEC BER of the added CFP channel without adjusting any ROADM gains, ensuring the received power of the new CFP channel remains below 0 dBm. An existing CFP channel is already stable; its pre-FEC BER must not change significantly during optimization, and the ROADM 2 demux input power for channels 1–20 must not vary by more than ±0.5 dB before and after optimization. Record each step, including pre-FEC BER for both the existing and newly added CFP channels, to determine the optimal launching power.”

D. Polarization Monitoring and Stabilization

Prompt: “Configure POD2000 at 1090 nm wavelength, get the current polarization state, then stabilize the polarization to azimuth −47 deg and ellipticity 8 deg with a stop threshold of 0.5 deg, and verify the result.”

E. DAS-Enabled Fiber Sensing and Event Detection

Prompt: “A fiber cut event can occur under two conditions: (1) fiber agitation, which appears as horizontal lines in the waterfall plot even when they are not dominant; this agitation is often accompanied by multiple vertical lines, indicating a potential fiber cut event; and (2) a mismatch in brightness between the vertical lines at the top and bottom of the plot, which may also suggest a fiber cut. Based on these criteria, does this waterfall plot indicate a fiber cut event?”

REFERENCES

[1] H. Nishizawa, G. Borraccini, T. Sasai, Y.-K. Huang, T. Mano, K. Anazawa, M. Namiki, S. Usui, T. Matsumura, Y. Sone et al., “Semi-automatic line-system provisioning with an integrated physical-parameter-aware methodology: Field verification and operational feasibility,” IEEE/Optica J. Opt. Commun. Netw., vol. 16, no. 9, pp. 894–904, 2024.
[2] H. Nishizawa, T. Mano, T. Ferreira de Lima, Y.-K. Huang, Z. Wang, W. Ishida, M. Kawashima, E. Ip, A. D’Amico, S. Okamoto et al., “Fast WDM provisioning with minimal probing: The first field experiments for DC exchanges,” IEEE/Optica J. Opt. Commun. Netw., vol. 16, no. 2, pp. 233–242, 2024.
[3] T. Sasai, G. Borraccini, Y.-K. Huang, H. Nishizawa, Z. Wang, T. Chen, Y. Sone, M. Takahashi, T. Matsumura, M. Nakamura et al., “Optical link tomography: First field trial and 4D extension,” IEEE J. Lightwave Technol., vol. 43, no. 24, pp. 10 776–10 787, 2025.
[4] Z. Wang, A. Raj, Y.-K. Huang, E. Ip, G. Borraccini, A. D’Amico, S. Han, Z. Qi, G. Zussman, K. Asahi et al., “Toward intelligent and efficient optical networks: Performance modeling, co-existence, and field trials,” in Proc. OECC/PSC’25, 2025.
[5] A. Ferrari, J. Kundrát, E. Le Rouzic, M. Filer, A. Campanella, A. D’Amico, K. Balasubramanian, Y. Yin, O. Havliš, M. Hažlinský et al., “The GNPy open source library of applications for software abstraction of WDM data transport in open optical networks,” in Proc. IEEE NetSoft’20, 2020.
[6] J. Yu, T. Chen, C. Gutterman, S. Zhu, G. Zussman, I. Seskar, and D. Kilper, “COSMOS: Optical architecture and prototyping,” in Proc. IEEE/OSA OFC’19, 2019.
[7] C. Simon, “Towards a global quantum network,” Nature Photonics, vol. 11, no. 11, pp. 678–680, 2017.
[8] A. Sevincer, A. Bhattarai, M. Bilgi, M. Yuksel, and N. Pala, “LIGHTNETs: Smart LIGHTing and mobile optical wireless NETworks: A survey,” IEEE Commun. Surv. Tutor., vol. 15, no. 4, pp. 1620–1641, 2013.
[9] G. Borraccini, S. Straullu, A. D’Amico, A. Nespola, S. Piciaccia, A. Tanzi, G. Galimberti, and V. Curri, “Autonomous Raman amplifiers in multi-band software-defined optical transport networks,” IEEE/Optica J. Opt. Commun. Netw., vol. 13, no. 10, pp. E53–E62, 2021.
[10] D. Raychaudhuri, I. Seskar, G. Zussman, T. Korakis, D. Kilper, T. Chen, J. Kolodziejski, M. Sherman, Z. Kostic, X. Gu et al., “Challenge: COSMOS: A city-scale programmable testbed for experimentation with advanced wireless,” in Proc. ACM MobiCom’20, 2020.
[11] T. Chen, J. Yu, A. Minakhmetov, C. Gutterman, M. Sherman, S. Zhu, S. Santaniello, A. Biswas, I. Seskar, G. Zussman et al., “A software-defined programmable testbed for beyond 5G optical-wireless experimentation at city-scale,” IEEE Network, vol. 36, no. 2, pp. 90–99, 2022.
[12] R. Casellas, R. Martinez, R. Vilalta, and R. Muñoz, “Overview of SDN control of multiband over SDM optical networks with physical layer impairments,” J. Opt. Commun. Netw., vol. 17, no. 2, pp. A165–A177, 2025.
[13] X. Hou, Y. Zhao, Y. Liu, Z. Yang, K. Wang, L. Li, X. Luo, D. Lo, J. Grundy, and H. Wang, “Large language models for software engineering: A systematic literature review,” ACM Trans. Softw. Eng. Methodol., vol. 33, no. 8, 2024.
[14] “Model context protocol,” https://modelcontextprotocol.io, 2025.
[15] J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. Narasimhan, and O. Press, “SWE-agent: agent-computer interfaces enable automated software engineering,” in Proc. NeurIPS’24, 2024.
[16] O. Wijerathne, A. Nimasha, D. Fernando, N. de Silva, and S. Perera, “ScheduleMe: Multi-agent calendar assistant,” 2025.
[17] X. Wang, X. Ling, K. Li, G. Yin, L. Zhang, J. Wu, A. Wang, and W. Wang, “LLM and agent-driven data analysis: A systematic approach for enterprise applications and system-level deployment,” arXiv:2511.17676, 2025.
[18] M. Xu, D. Niyato, J. Kang, Z. Xiong, S. Mao, Z. Han, D. I. Kim, and K. B. Letaief, “When large language model agents meet 6G networks: Perception, grounding, and alignment,” IEEE Wireless Commun., vol. 31, no. 6, pp. 63–71, 2024.
[19] N. Yang, G. Lyu, M. Ma, Y. Lu, Y. Li, Z. Gao, H. Ye, J. Zhang, T. Chen, and Y. Chen, “IoT-MCP: Bridging LLMs and IoT systems through model context protocol,” in Proc. ACM WiNTECH’25, 2025.
[20] Y. Wang, W. Ye, Y. He, Y. Chen, G. Qu, and A. Li, “MCP4EDA: LLM-powered model context protocol RTL-to-GDSII automation with backend aware synthesis optimization,” 2025.
[21] A. Sharma, Y. Fu, V. Ansari, R. Iyer, F. Kuang, K. Mistry, R. I. Aishy, S. Ahmad, J. Matres, D. R. Englund et al., “AI agents for photonic integrated circuit design automation,” APL Machine Learning, vol. 3, no. 4, 2025.
[22] Y. Wang, C. Zhang, J. Li, Y. Pang, L. Zhang, M. Zhang, and D. Wang, “AlarmGPT: an intelligent alarm analyzer for optical networks using a generative pre-trained transformer,” J. Opt. Commun. Netw., vol. 16, no. 6, pp. 681–694, 2024.
[23] Y. Pang, M. Zhang, Y. Liu, X. Li, Y. Wang, Y. Huan, Z. Liu, J. Li, and D. Wang, “Large language model-based optical network log analysis using LLaMA2 with instruction tuning,” J. Opt. Commun. Netw., vol. 16, no. 11, pp. 1116–1132, 2024.
[24] Y. Zhang, Q. Qiu, X. Liu, D. Fu, X. Liu, L. Fei, Y. Cheng, L. Yi, W. Hu, and Q. Zhuge, “First field-trial demonstration of L4 autonomous optical network for distributed AI training communication: An LLM-powered multi-AI-agent solution,” 2025.
[25] X. Liu, Q. Qiu, Y. Zhang, Y. Cheng, L. Yi, W. Hu, and Q. Zhuge, “First field trial of LLM-powered AI agent for lifecycle management of autonomous driving optical networks,” in Proc. IEEE/Optica OFC’25, 2025.
[26] “AgentOptics: Agentic AI for scalable and robust optical systems control,” https://github.com/functions-lab/AgentOptics, 2026.
[27] T. Schick, J. Dwivedi-Yu, R. Dessí, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” in Proc. NeurIPS’23, 2023.
[28] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” arXiv:2210.03629, 2022.
[29] L. Gao, A. Madaan, S. Zhou, U. Alon, P. Liu, Y. Yang, J. Callan, and G. Neubig, “PAL: Program-aided language models,” in Proc. ICML’23, 2023.
[30] Y. Shen, K. Song, X. Tan, D. Li, W. Lu, and Y. Zhuang, “HuggingGPT: solving AI tasks with ChatGPT and its friends in Hugging Face,” in Proc. NeurIPS’23, 2023.
[31] B. Breen, M. D. Tredici, J. McCarran, J. A. Mijares, W. W. Yin, K. Sulimany, J. M. Taylor, F. H. L. Koppens, and D. Englund, “Ax-prover: A deep reasoning agentic framework for theorem proving in mathematics and quantum physics,” 2025.
[32] J. Qiu, J. Shi, X. Juan, Z. Zhao, J. Geng, S. Liu, H. Wang, S. Wu, and M. Wang, “Physics supernova: AI agent matches elite gold medalists at IPHO 2025,” 2025.
[33] J. Chen, W. Chen, J. Du, J. Hu, Z. Jiang, A. Jie, X. Jin, X. Jin, C. Li, W. Shi et al., “Seed-prover 1.5: Mastering undergraduate-level theorem proving via learning from experience,” 2025.
[34] D. Brodimas, A. Birbas, D. Kapolos, and S. Denazis, “Intent-based infrastructure and service orchestration using agentic-AI,” IEEE Open J. Commun. Soc., vol. 6, pp. 7150–7168, 2025.
[35] X. Wu, Y. Wang, J. Farooq, and J. Chen, “LLM-driven agentic AI approach to enhanced O-RAN resilience in next-generation networks,” in Proc. IEEE INFOCOM’25 Workshops, 2025.
[36] Y. Zhang, Y. Song, Y. Pang, S. Li, X. Jiang, Y. Wang, J. Li, M. Zhang, and D. Wang, “Design and evaluation of an LLM-based agent for QoT estimation and performance optimization in optical networks,” IEEE Open J. Commun. Soc., vol. 6, pp. 7470–7484, 2025.
[37] Y. Zhang, Q. Qiu, X. Liu, X. Yu, D. Fu, X. Liu, Z. Wang, H. Lin, Y. Chen, L. Yi, W. Hu, and Q. Zhuge, “AI agent for autonomous optical networks: architectures, technologies, and prospects [invited tutorial],” J. Opt. Commun. Netw., vol. 18, no. 2, pp. A159–A178, 2026.
[38] N. Di Cicco, M. Ibrahimi, S. Troia, F. Musumeci, and M. Tornatore, “Open implementation of a large language model pipeline for automated configuration of software-defined optical networks,” in Proc. ECOC’24, 2024.
[39] C. Sun, X. Yang, N. D. Cicco, R. Ayassi, V. V. Garbhapu, P. A. Stavrou, M. Tornatore, G. Charlet, and Y. Pointurier, “Experimental demonstration of local AI-agents for lifecycle management and control automation of optical networks,” IEEE/Optica J. Opt. Commun. Netw., vol. 17, no. 8, pp. C82–C92, 2025.
[40] Z. Wang, D. C. Kilper, and T. Chen, “Open EDFA gain spectrum dataset and its applications in data-driven EDFA gain modeling,” IEEE/Optica J. Opt. Commun. Netw., vol. 15, no. 9, pp. 588–599, 2023.
[41] Z. Wang, Y.-K. Huang, S. Han, T. Wang, D. Kilper, and T. Chen, “Multi-span optical power spectrum prediction using ML-based EDFA models and cascaded learning,” in Proc. IEEE/Optica OFC’24, 2024.
[42] Z. Wang, A. Raj, G. Borraccini, S. Han, Y.-K. Huang, T. Wang, M. Ruffini, D. Kilper, and T. Chen, “Scalable machine learning models for optical transmission system management,” in Proc. IEEE/Optica OFC’25, 2025, pp. M1J–3.
[43] Z. Wang, A. D’Amico, G. Borraccini, A. Raj, Y.-K. Huang, S. Han, T. Wang, M. Ruffini, D. Kilper, and T. Chen, “Scalable ML models and cascaded learning for efficient multi-span OSNR and GSNR prediction,” IEEE/Optica J. Opt. Commun. Netw., vol. 18, no. 1, pp. A88–A99, 2025.
[44] H. Nishizawa, T. Mano, T. Ferreira de Lima, Y.-K. Huang, Z. Wang, W. Ishida, M. Kawashima, E. Ip, A. D’Amico, S. Okamoto et al., “Fast WDM provisioning with minimal probing: the first field experiments for DC exchanges,” IEEE/Optica J. Opt. Commun. Netw., vol. 16, no. 2, pp. 233–242, 2024.
[45] S. Xie, R. Raj, D. Briantcev, Z. Wang, T. Chen, and D. Kilper, “WDM system stimulated Raman scattering spectrum and tilt prediction using CNN-based transfer learning,” in Proc. IEEE ONDM’25, 2025.
[46] A. Raj, Z. Wang, F. Slyne, T. Chen, D. Kilper, and M. Ruffini, “Multi-span optical power spectrum evolution modeling using ML-based multi-decoder attention framework,” in Proc. ECOC’24, 2024.
[47] Z. Wang, Y.-K. Huang, S. Han, T. Wang, D. Kilper, and T. Chen, “Multi-span optical power spectrum prediction using ML-based EDFA models and cascaded learning,” in Proc. IEEE/Optica OFC’24, 2024.
[48] Z. Wang, E. Akinrintoyo, D. Kilper, and T. Chen, “Optical signal spectrum prediction using machine learning and in-line channel monitors in a multi-span ROADM system,” in Proc. ECOC’22, 2022.
[49] Z. Wang, Y.-K. Huang, E. Ip, Z. Qi, G. Zussman, D. Kilper, K. Asahi, H. Kageshima, Y. Aono, and T. Chen, “Field trial of coexistence and simultaneous switching of real-time fiber sensing and coherent 400 GbE in a dense urban environment,” IEEE/Optica J. Lightwave Technol., vol. 42, no. 4, pp. 1304–1311, 2023.
[50] B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y. Adi, J. Liu, R. Sauvestre, T. Remez, J. Rapin, A. Kozhevnikov, I. Evtimov, J. Bitton, M. Bhatt, C. C. Ferrer, A. Grattafiori, W. Xiong, A. Défossez, J. Copet, F. Azhar, H. Touvron, L. Martin, N. Usunier, T. Scialom, and G. Synnaeve, “Code Llama: Open foundation models for code,” arXiv:2308.12950, 2024.
[51] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., “LoRA: Low-rank adaptation of large language models,” ICLR, vol. 1, no. 2, p. 3, 2022.
[52] W. Cheng, Z. Gao, J. Guajardo, H. Beshary, A. M. Niknejad, and T. Chen, “SPEAR+: Streaming-based multi-channel SDR implementation using the RFSoC platform,” in Proc. IEEE MILCOM’25, 2025.
[53] H. Ji, Z. Wang, X. Li, J. Li, R. R. Unnithan, Y. Su, W. Hu, and W. Shieh, “Photonic integrated self-coherent homodyne receiver without optical polarization control for polarization-multiplexing short-reach optical interconnects,” IEEE/Optica J. Lightw. Technol., vol. 41, no. 3, pp. 911–918, 2023.
[54] X. Fu, Z. Deng, Q. Wei, and Z. Li, “Polarization fading suppression in distributed interferometric sensing by matched interference between polarization-switched pulses,” Opt. Express, vol. 30, no. 11, pp. 19 705–19 715, 2022.
[55] S.-H. Yin, T.-W. Luo, W.-J. Jiang, X.-Y. Liu, Y.-F. Yu, Z.-J. Wei, T.-M. Zhao, J.-B. Fang, and J.-D. Wang, “Polarization compensation for entanglement-based quantum key distribution,” Opt. Express, vol. 33, no. 18, pp. 38 431–38 439, 2025.
[56] T. Shin, Y. Razeghi, R. L. Logan IV, E. Wallace, and S. Singh, “Autoprompt: Eliciting knowledge from language models with automatically generated prompts,” 2020.