From Vessel Trajectories to Safety-Critical Encounter Scenarios: A Generative AI Framework for Autonomous Ship Digital Testing

Sijin Sun¹, Liangbin Zhao¹,*, Ming Deng², Xiuju Fu¹
* Corresponding author
¹ Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR IHPC)
² Shanghai University

Abstract—Digital testing has emerged as a key paradigm for the development and verification of autonomous maritime navigation systems, yet the availability of realistic and diverse safety-critical encounter scenarios remains limited. Existing approaches either rely on handcrafted templates, which lack realism, or extract cases directly from historical data, which cannot systematically expand rare high-risk situations. This paper proposes a data-driven framework that converts large-scale Automatic Identification System (AIS) trajectories into structured safety-critical encounter scenarios. The framework combines generative trajectory modeling with automated encounter pairing and temporal parameterization to enable scalable scenario construction while preserving real traffic characteristics. To enhance trajectory realism and robustness under noisy AIS observations, a multi-scale temporal variational autoencoder is introduced to capture vessel motion dynamics across different temporal resolutions. Experiments on real-world maritime traffic flows demonstrate that the proposed method improves trajectory fidelity and smoothness, maintains statistical consistency with observed data, and enables the generation of diverse safety-critical encounter scenarios beyond those directly recorded. The resulting framework provides a practical pathway for building scenario libraries to support digital testing, benchmarking, and safety assessment of autonomous navigation and intelligent maritime traffic management systems. Code is available at https://anonymous.4open.science/r/traj-gen-anonymous-review.
Index Terms—Autonomous Ships, Maritime Traffic, Trajectory Generation, Variational Autoencoder, Safety-Critical Scenarios, Simulation-Based Digital Testing

I. INTRODUCTION

Recent advances in intelligent transportation and autonomous navigation systems have led to a growing reliance on digital testing and virtual validation as a necessary complement to field testing, particularly in domains such as maritime navigation where large-scale real-world experiments are costly and difficult to control. In specific maritime traffic environments, vessel behavior is influenced by navigational constraints, operational practices, and environmental uncertainties. Consequently, effective simulation-based validation depends on the availability of safety-critical encounter scenarios that are both realistic and sufficiently diverse. However, such scenarios occur only sparsely in real-world traffic data and are difficult to generate in a systematic manner. This gap between available trajectory data and the scenario requirements of digital testing remains a key obstacle to systematic validation of autonomous maritime systems.

To address the need for scenario-based validation, existing studies have explored several approaches to constructing maritime testing scenarios. Knowledge-driven methods design encounter cases based on navigation rules and expert-defined configurations, ensuring interpretability and regulatory consistency but often lacking diversity and realism. Data-statistical approaches extract representative encounters from historical AIS datasets, improving realism but remaining constrained by data availability and the sparsity of rare high-risk events. More recently, algorithmic and generative techniques have been proposed to expand the scenario space.
However, many of these approaches primarily rely on parameter variation or simplified encounter formulations, and only limited efforts have been made to integrate generative modeling with the statistical structure of real traffic flows.

As a result, despite substantial progress in scenario design, current approaches still struggle to simultaneously achieve realism, diversity, and systematic construction of safety-critical encounter scenarios from real-world traffic data. This limitation highlights the need for a data-driven framework that can leverage large-scale trajectory records while enabling structured and scalable generation of safety-critical scenarios suitable for simulation-based digital testing.

In this work, we propose a data-driven framework that converts large-scale vessel trajectory data from the Automatic Identification System (AIS) into structured safety-critical encounter scenarios for digital testing. The framework integrates trajectory synthesis and scenario construction into a unified pipeline. First, representative vessel motion patterns within designated traffic flows are learned and synthesized through a generative modeling approach. To enhance trajectory realism and robustness under noisy AIS data, a multi-scale temporal variational autoencoder (VAE) architecture is introduced, which captures vessel motion dynamics across different temporal scales and preserves the statistical structure of real traffic flows. The generated trajectories are then systematically paired based on spatial and temporal interaction conditions, and each identified interaction is transformed into a standardized scenario representation through temporal parameterization around the closest-encounter instant. By linking data-driven trajectory generation with structured encounter construction, the proposed framework enables scalable generation of realistic and diverse safety-critical maritime testing scenarios.
The main contributions of this work are summarized as follows:

1) A data-driven framework for safety-critical scenario construction: We propose a unified pipeline that converts large-scale vessel trajectory data from the Automatic Identification System (AIS) into structured safety-critical encounter scenarios, enabling systematic generation of simulation-ready test cases for maritime autonomous navigation systems.
2) A robust generative trajectory synthesis approach: We develop a multi-scale temporal variational autoencoder (VAE) architecture to model vessel motion patterns from AIS data and integrate trajectory smoothing to enhance realism.
3) A structured encounter construction mechanism: We design an automated trajectory pairing and temporal parameterization strategy that transforms synthesized trajectories into standardized safety-critical encounter scenarios suitable for digital testing.

II. LITERATURE REVIEW

A. Safety-Critical Vessel Encounter Scenario Generation

The construction of safety-critical encounter scenarios has attracted increasing attention in recent years, particularly for simulation-based testing of autonomous navigation and decision-support systems. In the domain of maritime autonomous vessels, generation of safety-critical scenarios has also been explored, though less extensively. Existing approaches to scenario generation can be broadly categorized into knowledge-driven, data-statistical, and generative methods, each addressing different aspects of the problem.

Knowledge-driven approaches construct encounter scenarios based on navigation rules, expert knowledge, or predefined geometric configurations. Standardized encounter templates derived from COLREGs, such as the Imazu problem [1], or classical cases including head-on, crossing, and overtaking situations, are widely used to evaluate collision avoidance systems in a reproducible and interpretable manner [2].
Some studies further organize scenario sets according to regulatory coverage or encounter taxonomies to enable systematic testing across rule-defined contexts [3]. Simulation environments built upon such templates may also incorporate uncertainty models and environmental disturbances to provide configurable evaluation suites [4]. While these approaches ensure interpretability and regulatory alignment, their reliance on handcrafted configurations limits their ability to capture the diversity and statistical structure of real maritime traffic patterns.

Data-statistical approaches construct encounter scenarios directly from historical AIS records by identifying ship interactions and analyzing the statistical distributions of key navigation parameters. These statistics are then used to reconstruct representative real-world encounters or guide the sampling of synthetic scenarios consistent with observed traffic patterns [5], [6]. Some works further incorporate measures such as encounter complexity or importance to support scenario selection for algorithm testing [7]. While grounding scenario construction in real traffic data improves realism, such methods remain constrained by the availability and distribution of historical observations. Safety-critical encounters are inherently rare, making it difficult for purely data-driven extraction approaches to systematically generate diverse high-risk scenarios beyond those already recorded.

Algorithmic and generative approaches attempt to expand the scenario space beyond historical observations by synthesizing encounter configurations through sampling, optimization, or learning-based generation. Some studies generate candidate traffic situations by systematically sampling parameter spaces and filtering hazardous cases using predefined risk metrics or clustering strategies [8].
Other works provide toolchains for producing structured encounter sets that enable controlled variation of traffic configurations for systematic testing [9]. More recently, learning-based methods such as reinforcement learning have been explored to adaptively generate high-risk scenarios through interaction with the tested system, improving the efficiency of discovering safety-critical cases [10]. While these approaches improve coverage and controllability of scenario generation, many rely on abstract encounter parameterizations or simplified motion assumptions, making it challenging to preserve the statistical realism and operational patterns observed in real vessel trajectories.

Overall, existing approaches address complementary aspects of scenario generation, including interpretability, realism, and controllability. However, a systematic mechanism that can jointly preserve real traffic characteristics while enabling scalable generation of diverse safety-critical encounter scenarios remains limited. Motivated by these limitations, the proposed framework aims to preserve the realism of historical traffic data while expanding the availability of safety-critical scenarios through generative trajectory modeling.

B. Vessel Trajectory Generation

A scenario can be interpreted as a temporal sequence of system states, which may be represented using different modalities depending on the level of abstraction required for testing and analysis [11]. For many traffic applications, trajectories provide a compact and effective representation of object motion, making them widely used in scenario modeling and simulation studies. In the maritime domain, trajectory data derived from AIS observations have been extensively used for traffic analysis, behavior modeling, and collision risk assessment.
Existing research has primarily focused on trajectory prediction, where future vessel positions are estimated from historical motion patterns or environmental conditions [12]. While prediction methods are valuable for navigation support, they do not directly address the need to generate diverse motion patterns for systematic testing purposes.

Fig. 1: Overview of the trajectory generation and scenario construction pipeline including Data Preprocessing, ConfluxVAE Structure, and Encounter Pairing and Safety-Critical Scenario Construction: synthetic trajectories are paired and filtered to form realistic vessel encounters for safety analysis.

To synthesize new trajectories, generative models have recently attracted attention. Recurrent neural networks such as LSTM have been applied to learn motion dynamics from AIS data and generate plausible vessel movements [13]. More advanced generative approaches, including generative adversarial networks and variational autoencoders, have also been explored to capture the statistical structure of trajectory datasets and enable data-driven synthesis of new motion sequences [14], [15]. These methods provide promising tools for constructing trajectory sets that remain consistent with real traffic behavior while allowing controlled expansion of the motion space.

Motivated by these developments, the present work adopts a variational autoencoder–based generative framework to synthesize vessel trajectories, enabling the subsequent construction of safety-critical encounter scenarios while preserving the statistical characteristics of real AIS data.

III. METHODOLOGY

A. Overall Framework

The proposed framework consists of two sequential modules designed to construct safety-critical vessel encounter scenarios from historical AIS data, as illustrated in Figure 1.

Historical vessel trajectories within a specified traffic flow are extracted and processed to generate representative navigation tracks.
In addition to trajectory reconstruction and quality control, a data-driven learning process is employed to capture the statistical characteristics of vessel motion within the traffic flow. This module ensures that the synthesized trajectories preserve the spatial, kinematic, and operational patterns observed in real vessel movements.

The generated trajectories are then systematically paired to identify combinations satisfying encounter conditions defined by spatial proximity and relative motion consistency. Each detected interaction is then transformed into a standardized scenario through temporal parameterization centered on the close-encounter instant, with configurable pre- and post-encounter time windows.

B. Data-Driven Trajectory Generation from Historical AIS

1) Data Preprocessing: The data preprocessing workflow comprises route-of-interest filtering, resampling to a fixed timestamp interval, abnormal data filtering, and time window standardization. Erroneous records and missing values are filtered out, and interpolation is applied where needed. A ship trajectory is denoted as a sequence of timestamped points, i.e., $Traj = \{P_1, P_2, \ldots, P_i, \ldots, P_N\}$, $P_i = (\mathrm{lon}_i, \mathrm{lat}_i)$, $i = 1, \ldots, N$, where $P_i$ is the $i$-th data point comprising longitude and latitude and $N$ is the number of points. Among the recorded vessel routes in a designated area, two overlapping routes (Route 1 and Route 2) are identified under start and end area constraints. $Route_j = \{Traj_1, Traj_2, \ldots, Traj_k, \ldots, Traj_n\}$, $j = 1, 2$, denotes the two route datasets with $n$ trajectories each, which are respectively used as training sets for subsequent trajectory generation.

2) Conflux EMA for Sequence Modeling: Exponential moving average (EMA) is widely used for sequence smoothing and long-range dependency modeling. Multi-headed EMA maintains several EMA states at different temporal scales to fuse short-, medium-, and long-term information.
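As a minimal illustration of the multi-headed EMA idea, the sketch below runs one scalar EMA per decay factor over the same series. It is not the module used in this work: the fixed decay factors `alphas`, the function names, and the first-sample initialisation are assumptions made for the example, whereas a trained model would learn its decay parameters.

```python
from typing import List, Sequence


def ema(series: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average: h_t = alpha * x_t + (1 - alpha) * h_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    out: List[float] = []
    # Assumption: the state is initialised with the first sample.
    state = series[0] if series else 0.0
    for x in series:
        state = alpha * x + (1.0 - alpha) * state
        out.append(state)
    return out


def multi_head_ema(series: Sequence[float],
                   alphas: Sequence[float]) -> List[List[float]]:
    """One EMA 'head' per decay factor; each head sees the same input."""
    return [ema(series, a) for a in alphas]


# Hypothetical usage: a large alpha follows short-term changes closely,
# a small alpha retains a long memory and smooths more strongly.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
short_term, mid_term, long_term = multi_head_ema(noisy, alphas=(0.9, 0.5, 0.1))
```

Heads with a small decay factor react slowly and therefore suppress sample-to-sample oscillation, which is the low-pass behaviour that makes EMA attractive for noisy position sequences.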
We propose Conflux EMA, a split-flow module with three parallel multi-headed EMA branches operating at distinct representation scales. The three branches correspond to natural temporal levels in vessel trajectories: (1) local dynamics (step-wise displacement and instantaneous velocity), (2) segment-level maneuvers (turning and course adjustment), and (3) global directional trends. Two branches would merge segment and global patterns, while more than three introduce redundant scales without additional coverage.

Given $x \in \mathbb{R}^{B \times L \times d}$, the three branches produce $y_s$, $y_m$, and $y_l$. The small branch projects $d = 64$ to $d_s = 32$ with $H_s = 4$ heads for fine-grained features; the medium branch keeps $d_m = 64$ with $H_m = 8$ heads; the large branch expands to $d_l = 128$ with $H_l = 16$ heads for global context. A learnable softmax gate $w$ fuses them as a residual:

$$w = \mathrm{softmax}(w_0, w_1, w_2), \qquad \hat{x} = x + w_0 y_s + w_1 y_m + w_2 y_l. \tag{1}$$

Conflux EMA is embedded in a convolutional block (CEConv): a 1D convolution is applied first, followed by the Conflux EMA residual along the sequence dimension. The same CEConv is used in both encoder and decoder.

Fig. 2: Conflux EMA block (CEConv): three parallel multi-headed EMA branches at different scales (small, medium, large) are combined via a learnable softmax gate and added to the input as a residual.

Compared to Transformer self-attention, multi-headed EMA offers two advantages for noisy AIS trajectories: the exponential decay imposes a temporal locality prior aligned with vessel kinematics, and EMA acts as a low-pass filter that suppresses high-frequency GPS and AIS noise rather than amplifying anomalous frames.

3) ConfluxVAE for Trajectory Generation: A VAE consists of an encoder $E$ and a decoder $D$. The encoder maps input data to a latent space; the decoder reconstructs data from the latent code.
The generative model is $p(x, z) = p_\theta(z)\, p_\theta(x \mid z)$, where $p_\theta(z)$ is the prior over the latent variable $z$ and $p_\theta(x \mid z)$ is the likelihood modeled by the decoder. The training objective is to maximize the evidence lower bound (ELBO) [15]:

$$\mathcal{L}(\theta, \phi; x_i) = -D_{\mathrm{KL}}\big(q_\phi(z \mid x_i) \,\|\, p_\theta(z)\big) + \mathbb{E}_{q_\phi(z \mid x_i)}\big[\log p_\theta(x_i \mid z)\big], \tag{2}$$

where $\phi$ and $\theta$ denote the variational and generative parameters. The encoder outputs the parameters of a Gaussian over the latent space (mean $\mu$ and log-variance $\log \sigma^2$); $z$ is obtained by the reparameterization $z = \mu + \epsilon \cdot \exp(0.5 \log \sigma^2)$ with $\epsilon \sim \mathcal{N}(0, I)$.

ConfluxVAE integrates Conflux EMA into a VAE for trajectory generation. The encoder processes input trajectories through convolutional feature extraction and refinement layers, followed by a CEConv block for multi-scale temporal modeling. The extracted features are flattened and passed through fully connected layers that output the latent parameters $\mu$ and $\log \sigma^2$. After reparameterization to obtain $z$, the decoder reconstructs trajectories through a symmetric inverse process: fully connected layers reshape the latent code, which then passes through a CEConv block and transposed convolutional layers, with sigmoid activation producing the final output.

Compared to a standard VAE that uses a single reconstruction-KL objective, the training loss used here is a $\beta$-weighted combination of reconstruction and KL terms. Let $\hat{x}$ denote the reconstruction. The reconstruction loss is the mean squared error $\mathcal{L}_{\mathrm{recon}} = \mathrm{MSE}(x, \hat{x})$. The KL divergence term (Gaussian prior and posterior) is $\mathcal{L}_{\mathrm{KL}} = -\frac{1}{2} \sum_{j=1}^{J} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$, where $J$ is the latent dimension. The total loss is $\mathcal{L} = \mathcal{L}_{\mathrm{recon}} + \beta \mathcal{L}_{\mathrm{KL}}$, where $\beta$ balances latent regularization and reconstruction quality.

4) Data Postprocessing: To improve the quality of the generated trajectories, a Savitzky–Golay filter is applied to denoise the model output.
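A minimal sketch of such a smoothing step is given here. It hard-codes the standard 5-point quadratic Savitzky–Golay convolution weights $(-3, 12, 17, 12, -3)/35$; the window length, polynomial degree, edge handling, and function names are illustrative assumptions, since the filter settings used in this work are not stated.

```python
from typing import List, Sequence, Tuple

# Convolution weights of the 5-point quadratic/cubic Savitzky-Golay smoother.
SG5_WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(values: Sequence[float]) -> List[float]:
    """Smooth a 1-D series with a 5-point, degree-2 Savitzky-Golay filter.

    Assumption: the first and last two samples are returned unchanged,
    because a full 5-point window is not available there.
    """
    out = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return out


def smooth_track(track: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Apply the filter independently to the two coordinates of a track."""
    first = savgol5([p[0] for p in track])
    second = savgol5([p[1] for p in track])
    return list(zip(first, second))
```

Because the weights come from a local quadratic fit, a track that already follows a straight line or a smooth parabola passes through unchanged at interior points, while an isolated spike is attenuated.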
The filter fits adjacent data points with a low-degree polynomial via linear least squares, smoothing the trajectory while preserving its trend.

5) Evaluation Index: We use four metrics to evaluate trajectory generation quality. MAE (Mean Absolute Error) and MSE (Mean Squared Error) measure point-wise reconstruction error: $\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |x_i - \hat{x}_i|$ and $\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2$, where $x_i$ and $\hat{x}_i$ denote corresponding points in the original and generated trajectories, respectively, and $N$ is the total number of points. DM (Distance Metric) measures the distributional distance between the original and generated trajectory sets using the Fréchet distance between their statistical representations. MMD (Maximum Mean Discrepancy) quantifies distributional discrepancy via a kernel $k$ (exponentiated quadratic [16]). For reference set $X$ and generated set $Y$,

$$\mathrm{MMD}^2(X, Y) = \frac{1}{m(m-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{mn} \sum_{i,j} k(x_i, y_j), \tag{3}$$

where $m$ and $n$ are the sizes of sets $X$ and $Y$, respectively.

C. Encounter Pairing and Safety-Critical Scenario Construction

1) Problem Formulation: Let $\mathcal{T}^{(1)} = \{\tau_i^{(1)}\}_{i=1}^{N_1}$ and $\mathcal{T}^{(2)} = \{\tau_j^{(2)}\}_{j=1}^{N_2}$ denote two trajectory pools generated for two designated traffic flows (e.g., inbound vs. outbound). Each trajectory $\tau$ is represented as a temporally ordered sequence of AIS states

$$\tau = \{s_k\}_{k=1}^{L}, \qquad s_k = (t_k, \phi_k, \lambda_k), \tag{4}$$

where $t_k$ is the timestamp, and $(\phi_k, \lambda_k)$ are latitude and longitude at sample $k$.

The objective is to automatically construct a set of interaction scenarios by pairing trajectories across flows and extracting those that satisfy encounter conditions within a predefined encounter region. Let $R$ be a polygonal region of interest (ROI) defined by vertices $\{(\phi_m^{(r)}, \lambda_m^{(r)})\}_{m=1}^{M}$.
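Membership of a position in such a polygonal region can be tested with the classical ray-casting rule. The sketch below is one possible implementation, not the one used in this work; it treats latitude and longitude as planar coordinates, which is an assumption that is reasonable only for a small region, and the function name and example vertices are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def in_region(lat: float, lon: float, vertices: Sequence[Point]) -> bool:
    """Ray-casting test: True if (lat, lon) lies inside the polygon.

    Assumption: coordinates are treated as planar, and the polygon is
    given as an ordered list of vertices (closing edge is implicit).
    """
    inside = False
    n = len(vertices)
    for m in range(n):
        lat_a, lon_a = vertices[m]
        lat_b, lon_b = vertices[(m + 1) % n]
        # Does this edge straddle the parallel through the query latitude?
        if (lat_a > lat) != (lat_b > lat):
            # Longitude at which the edge crosses that parallel.
            lon_cross = lon_a + (lat - lat_a) * (lon_b - lon_a) / (lat_b - lat_a)
            if lon < lon_cross:
                inside = not inside
    return inside


# Hypothetical rectangular region, for illustration only.
roi = [(1.18, 103.78), (1.18, 103.84), (1.22, 103.84), (1.22, 103.78)]
inside_flag = in_region(1.20, 103.80, roi)
```

Each crossing of the ray with a polygon edge toggles the inside flag, so the rule also handles non-convex regions.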
The output is a scenario set

$$\mathcal{S} = \{\sigma_{ij} \mid \tau_i^{(1)} \in \mathcal{T}^{(1)},\ \tau_j^{(2)} \in \mathcal{T}^{(2)},\ \sigma_{ij} \text{ is safety-critical}\}, \tag{5}$$

where each scenario $\sigma_{ij}$ stores the paired trajectory indices and the corresponding encounter indices $(k^\star, \ell^\star)$ at which a safety-critical interaction is detected.

2) Region-Constrained Encounter Pairing: To avoid spurious pairs from out-of-region evaluation, we restrict to states within the encounter region $R$ via the indicator $I_R(\phi, \lambda) = 1$ if $(\phi, \lambda) \in R$, else $0$. For a candidate pair $(\tau_i^{(1)}, \tau_j^{(2)})$, we require indices $k, \ell$ with $I_R(\phi_{i,k}^{(1)}, \lambda_{i,k}^{(1)}) = I_R(\phi_{j,\ell}^{(2)}, \lambda_{j,\ell}^{(2)}) = 1$. This focuses detection on interaction hotspots and reduces false positives from distant co-existence.

3) Proximity and Motion Based Encounter Identification: For encounter detection, the relative motion of vessels $s_{i,k}^{(1)}$ and $s_{j,\ell}^{(2)}$ is evaluated from consecutive AIS samples. With fixed $\Delta t$, Speed Over Ground (SOG) and Course Over Ground (COG) are estimated as $\mathrm{SOG} \approx d(p_{k-1}, p_k) / \Delta t$ and $\mathrm{COG} \approx \mathrm{bearing}(p_{k-1} \to p_k)$, where $p_k = (\phi_k, \lambda_k)$ and $d$ is the great-circle distance.

Based on these reconstructed motion states, relative motion indicators between vessel pairs are evaluated using standard closest point of approach (CPA) analysis. This enables the interaction assessment to consider not only instantaneous separation but also the motion tendency of the vessels, ensuring that detected events correspond to genuine navigational encounters.

An interaction is considered safety-critical only if both proximity and motion-consistency conditions are satisfied. The vessels must exhibit sufficiently close spatial interaction at some point along the trajectories. Let $D_{\min} = \min_t D_r(t)$ denote the minimum observed separation distance between the vessels over the evaluated interval, where $D_r(t)$ is the instantaneous inter-vessel distance expressed in nautical miles.
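The SOG and COG estimates from consecutive samples can be sketched as follows, using the haversine formula for the great-circle distance and the initial-bearing formula for the course. The function names, the spherical-Earth radius constant, and the choice of knots and degrees as output units are assumptions of this example.

```python
import math
from typing import Tuple

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (spherical model)


def great_circle_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in nautical miles between two points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def sog_cog(prev: Tuple[float, float], curr: Tuple[float, float],
            dt_s: float) -> Tuple[float, float]:
    """SOG in knots and COG in degrees from two consecutive (lat, lon) samples."""
    dist_nm = great_circle_nm(prev[0], prev[1], curr[0], curr[1])
    sog_kn = dist_nm / (dt_s / 3600.0)  # nautical miles per hour
    cog = bearing_deg(prev[0], prev[1], curr[0], curr[1])
    return sog_kn, cog
```

With a fixed sampling interval, `dt_s` is a constant; for a stationary vessel the distance is zero and the returned course is not meaningful, so downstream logic may wish to ignore COG below a small speed threshold.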
The proximity condition requires $D_{\min} \le d_{\min}$, ensuring that the vessels come into near-contact proximity at least once; $d_{\min}$ denotes the minimum-distance threshold.

Additionally, the relative motion must indicate a converging encounter configuration. This requires that there exists at least one time instant $t$ at which the instantaneous distance and CPA indicators jointly satisfy

$$D_r(t) \le d_{\mathrm{th}}, \qquad 0 < \mathrm{TCPA}(t) \le T_{\mathrm{th}}, \qquad \mathrm{DCPA}(t) \le d_{\mathrm{cpa}}, \tag{6}$$

where $d_{\mathrm{th}}$ denotes the proximity threshold, $T_{\mathrm{th}}$ denotes the admissible time-to-closest-approach window, and $d_{\mathrm{cpa}}$ denotes the admissible closest-approach distance. These constraints ensure that the vessels are not only spatially close but also dynamically converging within a relevant time horizon.

By jointly enforcing the minimum-distance requirement and the motion-consistency constraint, the framework filters out incidental spatial proximity and isolates interaction configurations that are both spatially critical and dynamically meaningful. The resulting encounters therefore represent safety-critical maritime situations suitable for scenario-based safety analysis and testing of intelligent navigation systems.

4) Configurable Safety-Critical Scenario Representation: Given a paired trajectory $(\tau_i^{(1)}, \tau_j^{(2)})$ that satisfies the safety-critical encounter conditions, a standardized scenario representation is further constructed by anchoring the scenario at the closest-interaction instant. Let $t^\star$ denote the time corresponding to the minimum observed inter-vessel separation within the encounter interval, i.e.,

$$t^\star = \arg\min_t D_r(t), \tag{7}$$

where $D_r(t)$ is the instantaneous inter-vessel distance (in nautical miles) evaluated at time $t$.

To enable consistent scenario extraction across different encounter instances, two configurable temporal margins are introduced: $t_{\mathrm{early}}$ and $t_{\mathrm{after}}$. These margins define a scenario window anchored at $t^\star$: $[t^\star - t_{\mathrm{early}},\ t^\star + t_{\mathrm{after}}]$.
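The motion-consistency test of Eq. (6) can be sketched with the standard constant-velocity CPA formulas, $\mathrm{TCPA} = -(r \cdot v)/\lVert v \rVert^2$ and $\mathrm{DCPA} = \lVert r + v\,\mathrm{TCPA} \rVert$, where $r$ and $v$ are the relative position and velocity. The sketch assumes that positions have already been projected to a local planar frame in nautical miles; the `State` type, the function names, and the threshold values in the usage lines are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class State:
    x_nm: float     # east position in a local planar frame, nautical miles
    y_nm: float     # north position, nautical miles
    sog_kn: float   # speed over ground, knots
    cog_deg: float  # course over ground, degrees clockwise from north


def cpa(a: State, b: State) -> Tuple[float, float, float]:
    """Return (current distance [NM], TCPA [h], DCPA [NM]) for two states."""
    rx, ry = b.x_nm - a.x_nm, b.y_nm - a.y_nm
    vax = a.sog_kn * math.sin(math.radians(a.cog_deg))
    vay = a.sog_kn * math.cos(math.radians(a.cog_deg))
    vbx = b.sog_kn * math.sin(math.radians(b.cog_deg))
    vby = b.sog_kn * math.cos(math.radians(b.cog_deg))
    vx, vy = vbx - vax, vby - vay
    dist = math.hypot(rx, ry)
    v2 = vx * vx + vy * vy
    if v2 < 1e-12:
        # No relative motion: the separation never changes.
        return dist, 0.0, dist
    tcpa = -(rx * vx + ry * vy) / v2
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dist, tcpa, dcpa


def is_converging(a: State, b: State, d_th: float, t_th_h: float,
                  d_cpa: float) -> bool:
    """Check the three inequalities of Eq. (6) at a single time instant."""
    dist, tcpa, dcpa = cpa(a, b)
    return dist <= d_th and 0.0 < tcpa <= t_th_h and dcpa <= d_cpa


# Hypothetical head-on pair, 2 NM apart, both at 10 knots.
own = State(0.0, 0.0, 10.0, 0.0)
target = State(0.0, 2.0, 10.0, 180.0)
flag = is_converging(own, target, d_th=3.0, t_th_h=0.2, d_cpa=0.5)
```

A negative TCPA means the closest approach lies in the past, so diverging pairs fail the second inequality even when they are currently close.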
Accordingly, each vessel trajectory is partitioned into three segments:

1) Pre-encounter segment (nominal tracking path): from the trajectory start time to $t^\star - t_{\mathrm{early}}$. This segment represents the nominal path that the vessel is expected to track prior to entering the hazardous situation.
2) Encounter segment (safety-critical test window): from $t^\star - t_{\mathrm{early}}$ to $t^\star + t_{\mathrm{after}}$. This segment constitutes the core safety-critical interval used to evaluate collision avoidance and maneuvering capability under the encounter configuration.
3) Post-encounter segment (recovery tracking path): from $t^\star + t_{\mathrm{after}}$ to the end of the trajectory. This segment represents the recovery phase in which the vessel is expected to re-plan and continue tracking a nominal route after the hazardous interaction.

The extracted scenario instance is represented as

$$\sigma_{ij} = \left(i,\ j,\ t^\star,\ t_{\mathrm{early}},\ t_{\mathrm{after}}\right), \tag{8}$$

together with the corresponding trajectory clips for both vessels within the three segments. This representation provides a unified interface for downstream simulation-based digital testing, where the pre- and post-encounter segments define nominal reference paths, and the encounter segment defines the evaluation window for collision-avoidance performance.

The specific values of the configuration parameters, including $t_{\mathrm{early}}$ and $t_{\mathrm{after}}$, can be selected according to testing objectives and operational context. In practice, these parameters may be determined based on domain expertise, vessel characteristics, or regulatory requirements. Moreover, performance evaluation criteria for autonomous vessels can be defined differently across the three trajectory segments, enabling flexible assessment of tracking stability, collision-avoidance capability, and post-encounter recovery behavior.

IV. EXPERIMENTS AND RESULTS

A. Vessel Trajectory Generation for Designated Traffic Flows

1) Data Description: The dataset is derived from one-year AIS records of a Singapore seaway, covering the area approximately bounded by longitudes 103.785°E–103.837°E and latitudes 1.180°N–1.215°N. Two opposing traffic flows along the same waterway are selected as Route 1 (northbound, from 1.180°N to 1.212°N) and Route 2 (southbound, from 1.210°N to 1.189°N). The two routes traverse the same geographic corridor in opposite directions, causing their trajectories to spatially overlap and creating potential vessel encounter situations. Figure 3 shows the spatial distribution of both datasets.

Raw trajectories are filtered by geographic start/end bounding-box constraints to isolate each directional flow. The retained trajectories are then resampled to a uniform 10-second interval and missing values are filled by linear interpolation. In each dataset, outliers that deviate significantly from the dominant flow are identified and removed using pairwise distance filtering to obtain a clean, well-structured dataset suitable for comparative evaluation. A fixed time window is then applied to standardize sequence length, yielding 1094 trajectories with 91 steps for Route 1 and 2310 trajectories with 61 steps for Route 2 as the final training inputs.

Fig. 3: Visualization of trajectory datasets for (a) Route 1 and (b) Route 2.

2) Experiment Setup: All experiments are conducted on a workstation running Ubuntu 22.04 with a single NVIDIA GeForce RTX 4080 GPU under PyTorch 2.6. Two models are trained corresponding to the two route datasets. The Route 1 model is trained on 1094 well-structured trajectories with 91 time steps each, serving as the primary comparative evaluation benchmark. The Route 2 model is trained on 2310 more complex trajectories with 61 time steps each, designed for robustness evaluation under challenging data conditions.
Both models use a latent dimension of 100 and a batch size of 64, and are trained for 500 epochs. After training, each model generates 1000 trajectories for evaluation. All trajectories maintain a uniform 10-second interval between consecutive data points.

3) Experimental Results: Comparative evaluation is conducted on both Route 1 and Route 2 datasets against baseline models, where Route 2 additionally serves to demonstrate the robustness of ConfluxVAE under more challenging real-world data conditions.

ConfluxVAE demonstrates stable convergence on the Route 1 dataset, effectively capturing the dominant traffic flow patterns. Overall, the generated trajectories exhibit similar trends and spatial distributions to the original trajectories. Figure 4 visualizes the generated trajectories, demonstrating that the model successfully reproduces the spatial distribution and navigational patterns of real vessel movements while maintaining realistic trajectory smoothness and continuity. However, some unexpected jitters occur in the generated trajectories, particularly at trajectory bends. To improve data quality and smooth the trajectories, the Savitzky–Golay filter is applied for denoising after model generation.

Fig. 4: Vessel trajectories generated by ConfluxVAE.

To evaluate the performance of the generated trajectories, the similarity and error between the original and generated trajectories are measured. Since MAE and MSE are calculated based on pairwise trajectory comparisons, the results are obtained in a 1000 × 1094 matrix for Route 1. The mean value of the calculated matrix is taken as the final metric value. Smaller values indicate better performance and higher similarity between trajectories. Table I shows the comparative results using MAE, MSE, DM, and MMD metrics against baseline models.

TABLE I: Comparative evaluation on Route 1 dataset.
Model             MAE ↓    MSE ↓    DM ↓     MMD ↓
CNN               0.1658   0.0453   0.9073   0.5163
GAN               0.0763   0.0125   0.7560   0.5630
LSTM              0.0615   0.0079   0.7301   0.4937
BiLSTM            0.0621   0.0082   0.6807   0.4812
VAE               0.0569   0.0053   0.7235   0.4998
Transformer VAE   0.0452   0.0038   0.6287   0.5373
Social VAE        0.0396   0.0038   0.6706   0.3748
ConfluxVAE        0.0177   0.0008   0.6315   0.3706

↓: Less is better; ↓: Training metric; ↓: Maritime AIS-specific metric. Bold: 1st per column; Underline: 2nd per column.

To further evaluate model robustness, Table II presents the comparative evaluation on the Route 2 dataset, which contains greater trajectory irregularity and AIS signal variability. While all models exhibit some performance degradation compared to Route 1, ConfluxVAE maintains top performance across all metrics, demonstrating its robustness under noisy and complex real-world data conditions. The multi-scale temporal modeling capability of Conflux EMA enables ConfluxVAE to simultaneously capture both local trajectory details and global flow patterns, allowing it to generalize effectively across varying data quality and behavioral diversity.

TABLE II: Comparative evaluation on Route 2 dataset (robustness evaluation).

Model             MAE ↓    MSE ↓    DM ↓     MMD ↓
CNN               0.1282   0.0291   0.9308   0.6992
GAN               0.0860   0.0145   0.7604   0.8202
LSTM              0.0358   0.0036   0.6724   0.7420
BiLSTM            0.0361   0.0045   0.6717   0.6767
VAE               0.0425   0.0023   0.7512   0.7402
Transformer VAE   0.0290   0.0021   0.6782   0.6816
Social VAE        0.0392   0.0042   0.7245   0.6452
ConfluxVAE        0.0217   0.0016   0.6393   0.6314

↓: Less is better; ↓: Training metric; ↓: Maritime AIS-specific metric. Bold: 1st per column; Underline: 2nd per column.

4) Ablation Study: To validate the contribution of each component in ConfluxVAE, an ablation study is conducted on Route 1 by removing one component at a time.
Four ablation variants are evaluated: without the Conflux EMA module (replaced by a standard convolutional block), without the complete Conflux module, without β-weighted KL divergence, and without the Savitzky–Golay postprocessing filter. Table III reports the results.

Removing the Conflux EMA module leads to a substantial increase in MAE and MSE, confirming that multi-scale temporal feature extraction is critical for trajectory reconstruction accuracy. Removing the entire Conflux module also degrades MAE relative to the full model (0.0351 versus 0.0177), while its MSE (0.0026) is second only to the 0.0008 shared by the full model and the variant without the SG filter, indicating that the EMA sub-module is the primary contributor to distribution-level quality. Removing β-weighted KL causes the DM metric to collapse to 0.0000 (marked ‡ in Table III), showing that the model fails to generate valid trajectory distributions without proper KL regularization. Removing the Savitzky–Golay filter has negligible impact on MAE and MSE but slightly degrades DM and MMD, demonstrating that postprocessing smoothing contributes to distribution-level fidelity. The full ConfluxVAE achieves the best performance on MAE, MSE, and MMD, validating the necessity of each component.

TABLE III: Ablation study of ConfluxVAE on Route 1.

Variant              MAE ↓      MSE ↓      DM ↓       MMD ↓
w/o Conflux EMA      0.0568     0.0052     0.7086     0.4999
w/o Conflux          0.0351     0.0026     0.6958     0.4658
w/o β-KL             0.0609     0.0047     0.0000‡    0.9996
w/o SG filter        0.0179     0.0008     0.6481     0.3709
ConfluxVAE (full)    0.0177     0.0008     0.6315     0.3706

‡: collapsed value, shown struck through in the original table.

5) Qualitative Case Study: Baseline models each exhibit distinct limitations: LSTM-based models and the standard VAE suffer from mode collapse, with trajectories converging to a single route; CNN and GAN produce chaotic, unrealistic paths; Social-VAE generates fragmented trajectories by over-capturing AIS signal dropouts; and Transformer-VAE shows noticeable jitter at trajectory bends.
ConfluxVAE overcomes these limitations through multi-scale temporal modeling, simultaneously capturing local trajectory details and global traffic flow patterns, yielding trajectories that balance smoothness and diversity while faithfully reproducing maritime traffic flow structure for safety-critical scenario construction.

B. Demonstration of Constructed Safety-Critical Scenario

1) Traffic Flow and Encounter Context: To demonstrate the applicability of the proposed framework, two representative traffic flows within the study area are selected as experimental contexts. As shown in Figure 5a, the study area is located near the precautionary area with the highest density of multi-vessel encounters according to the chart. Among the intersecting traffic flows in this region, the two dominant routes indicated by the thick arrows are selected for analysis, namely the southbound outbound traffic flow and the eastbound traffic that turns northward to enter the port. A representative real encounter occurring within these flows is illustrated in Figure 5b, where the vessel trajectories demonstrate a typical interaction pattern observed in the historical AIS data, and the southbound vessel has already initiated an avoidance maneuver.

Fig. 5: Traffic Flow and Encounter Context. (a) Study Area and Traffic Flows; (b) An Encounter Example

2) Scenario Construction from Generated Trajectories: Based on the representative traffic flows identified in the previous section, synthetic trajectories are generated to construct a diverse set of encounter scenarios. The generation process preserves the statistical properties of the original AIS data while allowing controlled variations in vessel motion, timing, and spatial interaction patterns.
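One plausible way to turn a pair of generated trajectories into an encounter is to delay one track relative to the other until their closest approach falls below a distance threshold, then cut the track into pre-encounter, encounter, and post-encounter segments. The sketch below illustrates that idea only; it is not the paper's procedure. It assumes planar coordinates in nautical miles and the 10-second sampling interval of the generated trajectories, and it interprets the parameters `d_min_nm`, `t_early_s`, and `t_after_s` (named after the symbols $d_{\min}$, $t_{\text{early}}$, and $t_{\text{after}}$) as a distance threshold and as time margins before and after closest approach. The function names and the `max_shift` search window are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # planar position in nautical miles (assumed projection)
STEP_S = 10                  # sampling interval in seconds


def closest_approach(a: List[Point], b: List[Point], shift: int) -> Tuple[float, int]:
    """Smallest distance, and its index on `a`, when `b` is delayed by `shift` samples."""
    best_d, best_i = math.inf, -1
    # Only indices where both tracks have a sample after the shift are compared.
    for i in range(max(shift, 0), min(len(a), len(b) + shift)):
        d = math.dist(a[i], b[i - shift])
        if d < best_d:
            best_d, best_i = d, i
    return best_d, best_i


def build_encounter(
    a: List[Point],
    b: List[Point],
    d_min_nm: float,
    t_early_s: int,
    t_after_s: int,
    max_shift: int = 60,
) -> Optional[Dict[str, object]]:
    """Search time shifts of `b`; return segment bounds on `a`, or None if no shift
    brings the two tracks within `d_min_nm` of each other."""
    best = None
    for shift in range(-max_shift, max_shift + 1):
        d, i = closest_approach(a, b, shift)
        if d <= d_min_nm and (best is None or d < best[0]):
            best = (d, i, shift)
    if best is None:
        return None
    d, i, shift = best
    start = max(i - t_early_s // STEP_S, 0)          # encounter segment begins here
    end = min(i + t_after_s // STEP_S, len(a) - 1)   # encounter segment ends here
    return {
        "shift_samples": shift,
        "cpa_index": i,
        "cpa_distance_nm": d,
        "pre": (0, start),
        "encounter": (start, end),
        "post": (end, len(a) - 1),
    }
```

Applied to every pairing of a trajectory from one traffic flow with a trajectory from the other, a routine of this kind yields a set of candidate encounters; pairs that never come within the threshold return `None` and are skipped.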
The resulting set of scenarios exhibits significant diversity in terms of relative geometry, encounter timing, and manoeuvring behaviour, as illustrated by the six representative scenarios shown in Figure 6 (additional generated scenarios are available at the attached link). In addition, to systematically represent these generated encounters, each scenario is expressed in a structured form consisting of a Pre-encounter segment (solid line), an Encounter segment (light dashed line), and a Post-encounter segment (thick dashed line). It is worth noting that the trajectory dataset used for training is dominated by encounters involving relatively small vessels, primarily bunker barges with lengths of around 100 m. Accordingly, the scenario construction parameters adopted in this study are set conservatively, with $d_{\min} = 0.05$ nm and both $t_{\text{early}}$ and $t_{\text{after}}$ fixed at 100 s. It can be observed that the generated trajectories preserve the key characteristics of the original historical tracks while not being exact replicas. At the same time, the interacting trajectories are capable of triggering safety-critical close-encounter situations.

Fig. 6: Constructed Safety-Critical Encounter Scenarios

V. CONCLUSION

This paper presented a data-driven framework for constructing safety-critical vessel encounter scenarios from historical AIS data. By integrating generative trajectory modeling with structured encounter pairing and temporal parameterization, the proposed method enables scalable construction of simulation-ready maritime testing scenarios while preserving the statistical characteristics of real traffic flows.

To enhance trajectory realism and robustness under noisy AIS observations, a multi-scale temporal variational autoencoder architecture was introduced.
Experimental results on real-world traffic flows demonstrated that the proposed ConfluxVAE effectively captures vessel motion patterns, improves trajectory smoothness and distributional fidelity, and maintains robustness across datasets with varying levels of complexity. The generated trajectories were further combined to construct diverse encounter scenarios that retain key characteristics of real historical interactions while enabling the discovery of safety-critical cases beyond those directly observed.

The proposed framework provides a practical pathway for building scenario libraries for simulation-based digital testing, benchmarking, and safety assessment of autonomous navigation systems and decision-support tools in maritime traffic environments. Future work will investigate the extension of the framework to multi-vessel interactions, incorporation of environmental factors such as wind and current, and integration with large-scale simulation platforms to support systematic validation of intelligent maritime traffic management systems.
