Transformer-Based Prognostics: Enhancing Network Availability by Improved Monitoring of Optical Fiber Amplifiers

Dominic Schneider,1,* Lutz Rapp,1 and Christoph Ament2
1 Advanced Technology, Adtran Networks SE, 98617 Meiningen, Germany
2 Faculty of Applied Computer Science, University of Augsburg, 86159 Augsburg, Germany
* dominic.schneider@adtran.com

Abstract: We enhance optical network availability and reliability through a lightweight transformer model that predicts optical fiber amplifier lifetime from condition-based monitoring data, enabling real-time, edge-level predictive maintenance and advancing deployable AI for autonomous network operation.

© 2025 The Author(s)

1. Introduction

Optical communication networks form the backbone of modern digital infrastructure, supporting high-speed connectivity and cloud-based services. Within these systems, optical fiber amplifiers (OFAs) are critical for maintaining signal integrity over long distances. Their degradation or failure can cause costly service disruptions and downtime, making predictive maintenance (PdM) essential for ensuring reliability and operational efficiency.

Traditional maintenance strategies, either reactive or scheduled, fail to meet the demands of large-scale optical systems. Reactive maintenance leads to unexpected service outages and costly emergency repairs, while scheduled interventions are blind to actual component condition and often result in unnecessary replacements or missed early failures. In contrast, data-driven prognostics, particularly remaining useful lifetime (RUL) forecasting, enable operators to anticipate component degradation, plan maintenance proactively, and reduce operational costs.

Predicting the RUL of OFAs is challenging due to nonlinear device behavior, feedback control dynamics, and limited monitoring data.
To address these issues, we propose a lightweight model that combines structured sparse attention and low-rank parameterization to improve prediction accuracy while reducing computational complexity: the Sparse Low-Ranked Self-Attention Transformer (SLAT). Applied to OFA condition-based monitoring data, SLAT provides early, interpretable degradation insights and supports real-time, edge-level PdM deployment. Experiments on dedicated OFA datasets demonstrate that SLAT surpasses conventional deep learning models in accuracy and robustness, paving the way for scalable, AI-driven reliability management in optical networks.

[Figure 1: block diagram. Data preprocessing turns the multivariate time series into batched time windows via the sliding time window technique. SLAT's sparse attention pattern is the sum of a global and a band atomic sparse attention pattern. The model consists of input embeddings with positional encoding, encoder and decoder blocks, a feature fusion layer, and the RUL prediction output.]
Fig. 1. Proposed transformer architecture SLAT for predicting the RUL of OFAs, derived from multivariate time series data using the sliding time window technique.

[Figure 2: schematic of the two-stage EDFA (gain stages 1 and 2 with EDF, PL 1-2, PD 1-4, GFF, VOA, controller) and the surrounding test equipment (LD 1 to LD 9, MUX, switches, 50/50 coupler, PM, OSA).]
Fig. 2. Data acquisition setup, illustrating the collection of multivariate time series data from OFAs, with induced degradation scenarios in the shown EDFA architecture.

2. Methodology and Experimental Setup

The proposed framework combines structured data preprocessing with a tailored Transformer [1] architecture optimized for RUL forecasting of OFAs, as shown in Fig. 1. Condition-based monitoring (CBM) signals from the amplifier are segmented using a sliding time window approach, transforming raw sensor sequences $X = \{x_1, x_2, \ldots, x_T\}$ into overlapping multivariate tensors of fixed length $N_{\mathrm{stw}}$. Each window captures the short-term dynamics of the system and is enriched with statistical descriptors such as the mean and linear trend of each signal. These features are concatenated with normalized sensor data to improve degradation sensitivity while keeping input dimensionality consistent across datasets.

The Sparse Low-Ranked Self-Attention Transformer (SLAT) extends the standard encoder-decoder Transformer to efficiently handle multivariate time series under limited data conditions. Two parallel encoders independently process temporal and sensor-wise dependencies, and their outputs are fused before decoding. Each encoder block employs structured sparse attention, where each query token attends only to a restricted set of key tokens defined by a combination of global and local (banded) patterns:

$$
\mathrm{Attention}(Q, K, V) = A V = \mathrm{softmax}\!\left( \frac{M \odot \left( Q K^{\top} \right)}{\sqrt{d_k}} \right) V, \tag{1}
$$

where $M$ is the binary sparse attention mask encoding global and band connections, and $\odot$ denotes the Hadamard product. This reduces the quadratic complexity of full attention while preserving long- and short-range dependencies. To further regularize the model, the attention weight matrices $W_Q$, $W_K$, $W_V$ are low-rank parameterized as:

$$
Q = X W_Q, \quad K = X W_K, \quad V = X W_V, \tag{2}
$$

with rank $r \ll d_k$, reducing parameter count and improving generalization in small-sample regimes.

Key hyperparameters of SLAT were optimized using Bayesian optimization.
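The preprocessing and attention steps described in this section can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: all function names, the stride of one, the bandwidth, the choice of global token, and the toy dimensions are hypothetical. Two further assumptions go beyond the text. First, masked logits are set to negative infinity before the softmax, the common way to make a query ignore excluded keys, rather than applying the Hadamard product of Eq. (1) literally. Second, each projection matrix is assumed to be factorized as a product of two thin matrices of inner dimension r, which is one possible reading of the low-rank parameterization in Eq. (2).

```python
import numpy as np


def sliding_windows(x, n_stw):
    """Cut a (T, F) series into overlapping (n_stw, F) windows (stride 1, assumed)."""
    T = x.shape[0]
    return np.stack([x[t:t + n_stw] for t in range(T - n_stw + 1)])


def window_features(windows):
    """Per-window mean and least-squares linear trend (slope) of each signal."""
    n_stw = windows.shape[1]
    t = np.arange(n_stw) - (n_stw - 1) / 2.0        # centered time index
    mean = windows.mean(axis=1)                      # shape (N, F)
    slope = np.einsum("t,ntf->nf", t, windows) / np.sum(t ** 2)
    return mean, slope


def sparse_mask(n, bandwidth, global_idx):
    """Binary mask M: band (local) connections plus global rows and columns."""
    i = np.arange(n)
    mask = np.abs(i[:, None] - i[None, :]) <= bandwidth   # band pattern
    mask[global_idx, :] = True     # global tokens attend to every token
    mask[:, global_idx] = True     # every token attends to global tokens
    return mask


def sparse_low_rank_attention(x, factors, mask):
    """Single-head attention with factorized projections W = A @ B (assumed form)."""
    q, k, v = ((x @ a) @ b for a, b in factors)
    d_k = q.shape[-1]
    logits = (q @ k.T) / np.sqrt(d_k)
    # Assumption: excluded positions get -inf so their softmax weight is exactly 0.
    logits = np.where(mask, logits, -np.inf)
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
series = rng.normal(size=(200, 6))             # hypothetical: 200 steps, 6 signals
windows = sliding_windows(series, n_stw=32)    # shape (169, 32, 6)
mean, slope = window_features(windows)

d_model, d_k, r = 6, 8, 2                      # toy sizes with r smaller than d_k
factors = [(rng.normal(size=(d_model, r)), rng.normal(size=(r, d_k)))
           for _ in range(3)]                  # factor pairs for W_Q, W_K, W_V
mask = sparse_mask(32, bandwidth=2, global_idx=[0])
out, weights = sparse_low_rank_attention(windows[0], factors, mask)
```

Under this factorization each projection stores d_model·r + r·d_k values instead of d_model·d_k, which for the toy sizes above is 28 instead of 48. The band always includes the diagonal, so every query keeps at least one admissible key and the softmax stays well defined.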
The optimized configuration, summarized in Table 1, includes four encoder blocks for both temporal and sensor paths, two decoder blocks, and eight attention heads per block, using 64-dimensional linear embeddings. This design achieves an effective trade-off between predictive precision and computational efficiency, making the model suitable for real-time deployment on distributed optical systems.

Experimental data were obtained from a controlled OFA testbed, shown in Fig. 2, designed to emulate realistic degradation scenarios. The setup comprises a two-stage erbium-doped fiber amplifier (EDFA) with integrated sensors and actuators, including pump lasers (PL), power detectors (PD), a variable optical attenuator (VOA), and a gain-flattening filter (GFF). A hardware-in-the-loop (HIL) simulator interfaces with the amplifier's embedded controller via SSH and Telnet to inject parameter drifts corresponding to physical degradation processes (e.g., pump aging or photodiode sensitivity loss). This allows precise, repeatable generation of CBM data across different fault modes and operating conditions. A detailed description of the test environment, instrumentation, and dataset structure is provided in [2].

Table 1. Bayesian-optimized hyperparameters

| Hyperparameter | SLAT |
|---|---|
| Input embedding | 64 units, linear |
| Sensor encoder | 4 blocks, 8 heads, GELU |
| Time encoder | 4 blocks, 8 heads, GELU |
| Decoder | 2 blocks, 8 heads, GELU |
| Output layer | 1 unit, linear |

Table 2. RMSE results for RUL prediction

| Subdataset | BiLSTM | DCNN | DAST | SLAT |
|---|---|---|---|---|
| PL | 10.18 | 9.72 | 9.66 | 8.93 |
| PD | 9.03 | 8.75 | 8.28 | 7.67 |
| VOA | 3.73 | 2.70 | 1.58 | 1.34 |
| PC | 10.81 | 10.20 | 9.99 | 8.29 |
| Average | 7.72 | 7.85 | 7.50 | 6.56 |

[Figure 3: four panels (a)-(d) plotting RUL (cycles) against the current cycle, with curves for the actual RUL, SLAT, DAST, BiLSTM, and DCNN.]
Fig. 3. RTF trajectories for representative test samples of each degradation scenario: (a) pump laser, (b) power detector, (c) variable optical attenuator, and (d) passive components.

3. Results

To assess the forecasting performance of the proposed model SLAT, we compare its RUL forecasting accuracy against three established deep learning architectures: the Bidirectional Long Short-Term Memory (BiLSTM) network, the Deep Convolutional Neural Network (DCNN), and the Dual-Aspect Self-Attention Transformer (DAST) model [3–5]. All models were trained and evaluated under identical experimental conditions.

Table 2 summarizes the root mean squared error (RMSE) obtained across all subdatasets, which represent soft failures of the following components: pump laser (PL), power detector (PD), variable optical attenuator (VOA), and passive components (PC). Across all categories, SLAT consistently outperforms the state-of-the-art baselines, achieving the lowest average RMSE of 6.56, compared to 7.50 for DAST, 7.72 for BiLSTM, and 7.85 for DCNN. The results demonstrate that SLAT achieves a robust trade-off between predictive accuracy and computational efficiency, validating its suitability for real-time, distributed PdM in optical transport systems.

To further examine model behavior over time, Run-to-Failure (RTF) trajectories were generated for representative test samples of each degradation scenario, as shown in Fig. 3 (a-d). Each subplot compares the predicted RUL from SLAT with the ground-truth trajectory and with the outputs of DAST, BiLSTM, and DCNN. Across all scenarios, SLAT's predicted trajectories remain within a narrow confidence interval around the measured RUL, confirming both the accuracy and temporal stability of its forecasts. These visual results reinforce the quantitative findings summarized in Table 2 and illustrate SLAT's capability to model diverse optical degradation processes with high reliability.

4. Conclusion

We proposed a lightweight transformer model that enables accurate, real-time prediction of amplifier lifetime from monitoring data, enhancing overall optical network resilience and operational efficiency. With its low computational complexity, SLAT advances edge-intelligent, AI-driven maintenance toward fully autonomous optical networks.

Acknowledgment

This work has received funding from the German Federal Ministry of Research, Technology, and Space (BMFTR) project SUSTAINET-Advance, Grant 16KIS2271K, in the framework of the CELTIC-NEXT project id C2024/3-3.

References

1. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Adv. Neural Inf. Process. Syst. 30 (2017).
2. D. Schneider, L. Rapp, and C. Ament, "A transformer-based approach for diagnosing fault cases in optical fiber amplifiers," in 2025 25th Anniversary International Conference on Transparent Optical Networks (ICTON), (IEEE, 2025), pp. 1–4.
3. T. S. Kim and S. Y. Sohn, "Multitask learning for health condition identification and remaining useful life prediction: deep convolutional neural network approach," J. Intell. Manuf. 32, 2169–2179 (2021).
4. J. Wang, G. Wen, S. Yang, and Y. Liu, "Remaining useful life estimation in prognostics using deep bidirectional LSTM neural network," in 2018 Prognostics and System Health Management Conference (PHM-Chongqing), (IEEE, 2018), pp. 1037–1042.
5. Z. Zhang, W. Song, and Q. Li, "Dual-aspect self-attention based on transformer for remaining useful life prediction," IEEE Trans. on Instrum. Meas. 71, 1–11 (2022).
