From Virtual Environments to Real-World Trials: Emerging Trends in Autonomous Driving

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021

Aditya Humnabadkar*, Student Member, IEEE, Arindam Sikdar*, Member, IEEE, Benjamin Cave, Huaizhong Zhang, Nik Bessis, Senior Member, IEEE, and Ardhendu Behera†, Member, IEEE

Abstract—Autonomous driving technologies have achieved significant advances in recent years, yet their real-world deployment remains constrained by data scarcity, safety requirements, and the need for generalization across diverse environments. In response, synthetic data and virtual environments have emerged as powerful enablers, offering scalable, controllable, and richly annotated scenarios for training and evaluation. This survey presents a comprehensive review of recent developments at the intersection of autonomous driving, simulation technologies, and synthetic datasets. We organize the landscape across three core dimensions: (i) the use of synthetic data for perception and planning, (ii) digital twin-based simulation for system validation, and (iii) domain adaptation strategies bridging synthetic and real-world data. We also highlight the role of vision-language models and simulation realism in enhancing scene understanding and generalization. A detailed taxonomy of datasets, tools, and simulation platforms is provided, alongside an analysis of trends in benchmark design. Finally, we discuss critical challenges and open research directions, including Sim2Real transfer, scalable safety validation, cooperative autonomy, and simulation-driven policy learning, that must be addressed to accelerate the path toward safe, generalizable, and globally deployable autonomous driving systems.

Index Terms—Autonomous Driving, Deep Learning, Synthetic Data, Sim2Real, Real2Sim, Multimodal Learning, Semantic Understanding, Vision-Language Models.
I. INTRODUCTION

Autonomous Vehicles (AVs) have witnessed significant progress with deep learning methods enabling them to see, understand, and act within complex traffic environments. Modern perception systems can now interpret scenes with remarkable accuracy [1], [2]. However, collecting large-scale real-world driving data is often costly, time-consuming, and constrained by safety and ethical considerations [3]. Real-world datasets may lack diversity in rare or safety-critical scenarios, limiting an AV's holistic testing. To address these gaps, researchers increasingly rely on simulation techniques to create virtual replicas of driving environments, often termed Digital Twins (DTs), which can be used to test and train AV models under a broader spectrum of conditions than would be feasible on real roads. During dataset creation, subtle differences in sensor noise, weather, and traffic behavior between virtual and physical domains create a domain gap [4] that can degrade model performance when transferring from simulation to reality.

Within this landscape, Simulation-to-Reality (Sim2Real) and Reality-to-Simulation (Real2Sim) frameworks have become indispensable. These approaches transfer knowledge between simulated environments (where models are developed and tested) and the real world (where they are deployed) to iteratively reduce the domain gap. For example, Sim2Real techniques adapt models trained in simulators like CARLA [5], Virtual KITTI [6], or SYNTHIA [7] to perform reliably under unpredictable real-world conditions.

A. Humnabadkar, A. Sikdar, B. Cave, H. Zhang, N. Bessis, and A. Behera are with the Department of Computer Science, Edge Hill University, United Kingdom. *Equal contribution; †Ardhendu Behera is the corresponding author (beheraa@edgehill.ac.uk). Manuscript received February xx, 2025; revised xx.
Conversely, Real2Sim methods inject real-world data back into simulations, making virtual scenarios more faithful to reality [8]. By leveraging both directions, AV developers create feedback loops that continuously refine model performance (see Fig. 1). In practice, a model trained under idealized virtual conditions might struggle with irregular lighting, surprise weather events, or unusual road user behavior when moved to real streets [8], [9]. Real2Sim alignment helps the simulator mirror such real-world challenges, ensuring that virtual testing uncovers issues before on-road deployment.

Another crucial aspect of understanding driving scenes is predicting the intentions and trajectories of other agents (vehicles, pedestrians, cyclists) for safe navigation. In virtual or mixed-reality settings, researchers can simulate a wide spectrum of human and vehicle maneuvers, enabling AI models to learn from both real and fabricated behaviors. This iterative cycle between real and synthetic data resources mitigates the domain gap and enhances model robustness [10]. Recently, Vision-Language Models (VLMs) have emerged as a promising tool to bridge simulated and real domains by embedding semantic information (textual cues) with visual data [11]. VLMs provide deeper contextual understanding of objects and scenes by associating images with descriptions or labels. For instance, real sensor imagery can be rendered in simulation and annotated with rich text, allowing models to align semantics across domains. By incorporating such multimodal understanding, self-driving systems become more resilient to domain shifts, gaining robust insights from diverse visual contexts, whether physical or artificially generated. Achieving robust scene understanding requires simultaneously modeling static infrastructure, dynamic agents, and external conditions through integrated perception architectures.
Traditional CNNs established foundational capabilities: NVIDIA's PilotNet demonstrated end-to-end pixel-to-steering mapping [12], while YOLO [13], [14] and SSD [15]–[19] revolutionized real-time multi-class detection. PointPillars extended detection to 3D LiDAR point clouds [20], complemented by Bird's-Eye-View (BEV) representations that unify multi-sensor fusion. Lift-Splat-Shoot encodes arbitrary camera configurations into BEV space [21], while BEVFusion fuses camera and LiDAR features for improved cross-domain robustness [22]. Graph Neural Networks further enhance relational reasoning [23]–[26], with frameworks like CommonRoad-Geometric and PreGSU modeling inter-agent spatial-temporal interactions [24], [27]. Semantic segmentation provides pixel-level scene parsing through efficient architectures (ENet, ICNet) [28], [29] and multi-scale models (DeepLabv3+, HRNet) [30], [31]. Trajectory prediction employs RNNs and LSTMs with attention mechanisms [32]–[34], comprehensively surveyed for action recognition [35]–[37]. Control strategies use deep reinforcement learning with attention-augmented architectures [34], [38]–[44]. Translating these capabilities to simulation demands careful handling of static (road layouts, signage), dynamic (vehicles, pedestrians), and external (weather, lighting) factors [45]–[47], as illustrated in Fig. 4.

Fig. 1. Illustration of a traditional AV perception-control pipeline. Sensor inputs (cameras, radar, LiDAR, GPS) capture environmental data, which is filtered and processed in a preprocessing stage. The refined data feeds into deep learning models for object detection and prediction. The outputs then guide high-level decision-making modules for path planning and behavior prediction, which are executed by the control system. A feedback loop continuously updates the system for real-time adjustments.
Neural-based 3D scene generation [48] enables procedural environment creation through PCG-based methods (CARLA [5]), neural-3D approaches (CityDreamer [49], CityDreamer4D [50]), and video-based methods (MagicDrive [51]). Frameworks like DrivingGaussian employ Composite Gaussian Splatting for perspective-refined 3D representations [52], while Vectorized Scene Representation enables real-time static feature editing [53]. Dynamic agent modeling combines LSTM/GNN architectures with VAE-GRU hybrids [24], [33], [41], [52], [54] and traffic management systems [55]. External factors are addressed through multimodal sensor fusion [56], comprehensive datasets (nuScenes) [57], physics-based rendering, and domain randomization [58]. This integrated treatment of perception architectures and simulation techniques is essential for creating high-fidelity virtual environments that accurately replicate real-world complexity.

In summary, developing reliable autonomous driving systems requires bridging the gap between real and simulated environments. This survey demonstrates how deep learning, vision-language models (VLMs), and advanced simulation techniques collectively contribute to this objective, ultimately enabling more robust perception, prediction, and planning under diverse real-world conditions. We also examine the ethical and logistical constraints associated with large-scale real-world data collection, which position synthetic data as a practical alternative for testing rare and safety-critical scenarios. Simulation platforms provide controlled and repeatable testbeds to validate and refine autonomous vehicle algorithms across a wide range of traffic and weather conditions that would be prohibitively difficult or unsafe to replicate in physical trials [59], [60].
Furthermore, we emphasize the importance of evaluating autonomous driving systems not only by conventional performance metrics but also against emerging safety standards and societal benchmarks such as ISO 21448 and OpenSCENARIO.

Table I contrasts the scope of this survey with earlier works. While previous surveys have typically focused on individual themes such as synthetic datasets, simulation platforms, or domain adaptation, only a few have explored emerging topics like vision-language models or digital twin frameworks in depth. Moreover, very few have connected these technical advancements to real-world autonomous vehicle trials and deployment challenges. In contrast, this survey provides a unified and systematic overview that brings together simulation environments, multimodal learning, synthetic data generation, cooperative perception, and field testing. This broader perspective synthesizes existing knowledge while revealing new interconnections across research areas, addressing a critical gap in the current literature.

The remainder of the paper is organized as follows. Section II discusses vision-language models for contextual scene understanding and how multimodal representations improve situational awareness and decision-making in AVs. Section III presents a comprehensive taxonomy of driving scene datasets and critically examines the scarcity of rich multi-modal annotations, demonstrating how simulation-based approaches can address these fundamental data limitations. Section IV examines simulation technologies that connect theoretical advancements to applied testing, including domain adaptation strategies. Section V surveys real-world autonomous driving trials and their outcomes, and provides an overview of infrastructural, regulatory, and societal considerations for large-scale AV deployment.
Finally, Section VI discusses key insights, open research questions, and future directions toward an integrated, trustworthy autonomous driving ecosystem, followed by the conclusion in Section VII.

Fig. 2. A unified framework for driving scene understanding where several critical components, namely sensor fusion, object recognition, semantic segmentation, motion prediction, mapping, and localization, merge synergistically to enable robust and safe driving. Each component addresses specific perception, prediction, or decision-making challenges. Together, they allow the vehicle to navigate dynamic environments with precision and reliability.

II. VISION-LANGUAGE MODELS (VLMS) FOR CONTEXTUAL SCENE UNDERSTANDING

Vision-Language Models (VLMs) enable AVs to interpret both visual geometry and high-level semantics through integrated language reasoning. As illustrated in Fig. 3, VLM-enabled systems can ground natural language commands ("overtake the red BMW") in visual perception for context-aware planning. Recent developments (Tables II, III) span language-grounded navigation (Talk2Car [76]), LLM-driven reasoning (DiLu [77], LanguageMPC [78]), HD-map-free planning (NaVid [79]), camera-only systems (CarLLaVA [80]), 3D multimodal understanding (VLM2Scene [81], MAPLM [82]), zero-shot detection (CLIP2Scene [83]), and language-based trajectory generation (DriveGPT4 [75]). These approaches vary in environments, modalities, learning paradigms, and deployment readiness: early systems tackled constrained language grounding, while recent models integrate large-scale reasoning and 3D spatial understanding. Versatility assesses cross-domain applicability (environmental range, task generalizability); Viability evaluates deployment readiness (latency, hardware requirements). High viability indicates ≤200 ms latency on consumer GPUs; Moderate requires offline processing or specialized hardware.
Collectively, these approaches are compared across their environments (real or simulated), input and output modalities, learning types (supervised vs. self-supervised vs. knowledge-driven), and key innovations. We also note each model's generalization ability, practical viability (e.g., inference speed), and the datasets used for development. A clear trend emerges: early VLM systems tackled direct language grounding in relatively constrained settings, whereas later systems integrated large-scale reasoning, 3D spatial understanding, and robust simulation ties. Critical analysis reveals deployment trade-offs: research models (DriveGPT4 [75], DiLu [77]) achieve 2–5 FPS and require more than 16 GB of memory, whereas navigation-focused systems (VLFM [90], NaVid [79]) reach 15–25 FPS with reduced reasoning depth. CarLLaVA [80] achieved 32.6% performance gains but requires RTX 4090 GPUs with 500 ms–2 s delays, exceeding safe driving's 100–200 ms budget. Robustness remains challenging: 20–30% performance drops in low light and 15–25% reductions in heavy rain. While conceptually powerful, practical deployment requires efficient architectures, hardware acceleration, and domain-specific optimizations to balance real-time constraints with reasoning capabilities. Success would enable interactive scenario testing where engineers query AVs in natural language, bridging human insight with machine precision for safer systems.

World models show promise for future state prediction and synthetic video generation, with approaches like Cosmos-Transfer1 [95] enabling conditional multimodal control. A recent survey [96] comprehensively covers generative AI for Sim2Real tasks. However, VLM application to controllable corner-case generation and causal reasoning requires further exploration. The next section examines how simulation addresses VLM limitations through infinite data variation and edge-case testing.
III. ANNOTATION-BASED TAXONOMY OF AUTONOMOUS DRIVING DATASETS

Robust autonomous driving systems critically depend on comprehensive datasets that effectively represent real-world complexity. An essential step in bridging the domain gap between simulation and reality involves meticulously categorizing driving datasets according to their annotation types. Such categorization allows researchers and practitioners to identify gaps, understand dataset capabilities, and select appropriate resources for developing advanced perception models. Fig. 5 illustrates a taxonomy of recent driving datasets categorized by their annotation types, specifically 2D bounding boxes, segmentation masks, and 3D bounding boxes, and highlights significant coverage gaps in dataset annotations. While comprehensive perception tasks ideally require datasets annotated simultaneously across these three modalities, fewer than 10% of datasets surveyed offer such completeness.

a) Datasets with 2D Bounding Box Annotations: The most prevalent dataset category features 2D bounding box annotations, reflecting the fundamental object detection task in autonomous driving. These annotations typically comprise rectangular bounding boxes around objects such as vehicles, pedestrians, and cyclists within images, providing coarse spatial and class information. Datasets like KITTI [97] have
historically established benchmarks for vision-based object detection, significantly influencing perception algorithm development. Similarly, large-scale datasets such as BDD100K [98] provide diverse on-road video frames annotated with 2D bounding boxes across various object classes, geographical regions, and environmental conditions.

TABLE I
COMPARISON OF REVIEW PAPERS ON AUTONOMOUS DRIVING TOPICS
Topics covered (✓): Synthetic Data | Simulators | VLMs | Sim2Real/Real2Sim Transfer | Digital Twins | AV Trials.

- How Simulation Helps Autonomous Driving: A Survey of Sim2real, Digital Twins, and Parallel Intelligence [61] ✓ ✓ ✓ ✓
- Synthetic Datasets for Autonomous Driving: A Survey [62] ✓ ✓
- Vision Language Models in Autonomous Driving: A Survey and Outlook [63] ✓
- A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook [64] ✓
- A Comprehensive Review on Traffic Datasets and Simulators for Autonomous Vehicles [65] ✓ ✓
- Review and analysis of synthetic dataset generation methods and techniques for application in computer vision [66] ✓
- A Survey on Datasets for the Decision Making of Autonomous Vehicles [66] ✓
- Exploring the Sim2Real Gap Using Digital Twins [67] ✓ ✓ ✓
- Digital Twins for Autonomous Driving: A Comprehensive Implementation and Demonstration [68] ✓ ✓
- A Survey of Autonomous Driving: Common Practices and Emerging Technologies [69] ✓
- Explanations in Autonomous Driving: A Survey [70]
- End-to-End Autonomous Driving: Challenges and Frontiers [71] ✓
- World Models for Autonomous Driving: An Initial Survey [72] ✓
- A Survey on Recent Advancements in Autonomous Driving Using Deep Learning [44] ✓
- Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys [73] ✓
- Perspective, Survey, and Trends: Public Driving Datasets and Toolkits for Autonomous Driving Virtual Test [74] ✓ ✓
- Our Survey ✓ ✓ ✓ ✓ ✓ ✓
Although 2D bounding boxes are relatively simple to annotate, their informational depth remains limited, capturing only object presence and approximate location without precise shape or spatial context.

b) Datasets with Segmentation Annotations: Datasets offering segmentation annotations provide granular, pixel-level scene understanding. Semantic segmentation delineates each pixel into distinct classes (e.g., roads, vehicles, pedestrians), thus supporting advanced perception tasks including scene parsing and environmental mapping. Cityscapes [99] represents a foundational semantic segmentation dataset, featuring pixel-level annotations of complex urban environments, while Mapillary Vistas provides global coverage with a diverse array of object classes. Segmentation annotations significantly enrich scene understanding but entail substantial annotation effort, explaining their relative scarcity and smaller scale compared to bounding-box-only datasets.

c) Datasets with 3D Bounding Box Annotations: Datasets annotated with 3D bounding boxes support comprehensive spatial reasoning by providing precise three-dimensional positions, sizes, and orientations of objects. Typically aligned with multi-sensor setups involving LiDAR and stereo cameras, datasets like nuScenes [100], Waymo Open Dataset [101], Argoverse [102], and Lyft Level 5 facilitate robust multi-sensor fusion and advanced 3D detection algorithms. These datasets typically accompany synchronized 2D annotations but rarely include segmentation labels due to prohibitive annotation costs, because manual 3D labeling requires skilled labor and specialized tools [103], [104]. Furthermore, capturing safety-critical yet infrequent scenarios (emergency vehicle interactions, extreme weather) remains challenging in real-world collection, motivating simulation-based augmentation approaches discussed in Section IV.
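The annotation-based selection described in this taxonomy can be expressed as simple set queries over dataset metadata. The registry below is a small illustrative sketch: the modality assignments are simplified examples for a handful of well-known datasets, not the survey's full taxonomy.

```python
# Illustrative registry mapping datasets to annotation modalities.
# Assignments are simplified examples, not the survey's full taxonomy.
ANNOTATIONS = {
    "KITTI": {"2d_box", "3d_box"},
    "BDD100K": {"2d_box", "segmentation"},
    "Cityscapes": {"segmentation"},
    "nuScenes": {"2d_box", "3d_box"},
}

def datasets_with(required):
    """Return datasets whose annotations cover all required modalities."""
    return sorted(name for name, mods in ANNOTATIONS.items() if required <= mods)

# Mirrors the completeness gap discussed above: few datasets offer
# 2D boxes, segmentation masks, and 3D boxes simultaneously.
complete = datasets_with({"2d_box", "segmentation", "3d_box"})
detection_3d = datasets_with({"3d_box"})
```

Under these toy assignments, `complete` comes back empty, echoing the observation that fewer than 10% of surveyed datasets provide all three annotation types.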
IV. SIMULATION TECHNOLOGIES: BRIDGING THEORY AND PRACTICE

Simulation technologies are essential for developing and validating autonomous vehicle (AV) systems, offering controlled, repeatable, and safe environments that bridge the gap between theoretical models and real-world deployment. This section explores key simulation methodologies, including Sim2Real and Real2Sim transfer, digital twin-based testing, and comparative analyses of simulation platforms and domain adaptation strategies. Collectively, these approaches highlight how virtual testing accelerates AV development while supporting real-world applicability.

TABLE II
EVALUATION OF THE DIFFERENT VLMS DESIGNED FOR SCENE UNDERSTANDING. State-of-the-art methods have been compared with respect to the datasets they use, the input modalities, learning types, target tasks, computational complexity, robustness, and generalization capabilities. All the models provide unique insights on using language as a medium to deepen the context of a scene. The only caveat is that no standardized application nor an appropriate metric has been developed to accurately estimate the efficiency and usefulness of these models.
Columns: Paper, Year | Env. | Input Modality | Output Modality | Learning Type | Technique | Key point | Versatility | Viability | Data.

- Talk2Car (2019) [76]. Env: Real-world; Input: RGB camera, language instructions; Output: Navigation adjustments; Learning: Supervised learning; Technique: Visual-language grounding; Key point: Language-based interaction safety; Versatility: High; Viability: Moderate; Data: Talk2Car dataset.
- DiLu (2024) [77]. Env: Simulated, real-world; Input: Multimodal inputs (text, visual); Output: Decision-making commands; Learning: Knowledge-driven reasoning; Technique: Explainability via reasoning modules; Key point: Handles human-like reasoning in safety-critical tasks; Versatility: Very High; Viability: High; Data: Knowledge-based datasets.
- LanguageMPC (2023) [78]. Env: Various simulations; Input: Language instructions, visual data; Output: Control actions, navigation; Learning: Language-to-control mapping; Technique: Attention maps from language to control; Key point: Multi-vehicle coordination; Versatility: High; Viability: Very High; Data: Custom multi-agent datasets.
- LimSim++ (2024) [84]. Env: Urban simulations; Input: RGB, language-based commands; Output: Adaptive control actions; Learning: Closed-loop multimodal learning; Technique: Insights through continuous adaptation; Key point: Robustness across diverse scenarios; Versatility: Moderate; Viability: Moderate; Data: Multi-scenario simulated datasets.
- NaVid (2024) [79]. Env: Real-world, simulation; Input: Continuous RGB streams; Output: Waypoints, navigational cues; Learning: Vision-language navigation; Technique: Visual-textual mappings for navigation; Key point: Navigates unseen environments; Versatility: High; Viability: Moderate; Data: Continuous RGB-based dataset.
- CarLLaVA (2024) [80]. Env: CARLA, real-world; Input: Camera (RGB), language input; Output: Steering, throttle; Learning: Vision-language integration; Technique: Auxiliary depth outputs, semantic maps; Key point: High performance in safety-critical tasks; Versatility: High; Viability: High; Data: CARLA driving data.
- VLM2Scene (2024) [81]. Env: Synthetic datasets; Input: Image, LiDAR, text; Output: 3D scene understanding; Learning: Self-supervised contrastive learning; Technique: Region-based semantic reasoning; Key point: Detection in complex 3D environments; Versatility: Very High; Viability: High; Data: 3D synthetic datasets.
- NuScenes-QA (2024) [57]. Env: NuScenes, custom; Input: RGB, multimodal inputs; Output: Visual question answering; Learning: Multimodal supervised learning; Technique: Text-based scene reasoning; Key point: Understands multimodal relationships; Versatility: High; Viability: Moderate; Data:
NuScenes QA dataset.
- MAPLM (2024) [82]. Env: Various environments; Input: LiDAR, panoramic images, HD maps; Output: Multimodal scene understanding; Learning: Cross-modal supervised learning; Technique: Cross-modal reasoning; Key point: Handles complex cross-modal relationships; Versatility: Very High; Viability: High; Data: Large-scale traffic datasets.
- VLPD (2023) [85]. Env: KITTI, NuScenes; Input: Multi-camera, LiDAR; Output: Pedestrian detection; Learning: Vision-language self-supervision; Technique: Activation maps for pedestrians; Key point: Addresses occlusion and small object detection; Versatility: High; Viability: Moderate; Data: Pedestrian datasets.
- CLIP2Scene (2023) [83]. Env: CARLA; Input: RGB, depth, 3D point clouds; Output: Zero-shot object detection; Learning: Zero-shot learning; Technique: CLIP-based visual-language reasoning; Key point: Robustness in data-sparse conditions; Versatility: Very High; Viability: High; Data: CARLA 3D datasets.
- DriveGPT4 (2023) [75]. Env: BDD-X; Input: Language instructions, visual sensor data; Output: Control predictions; Learning: Instruction-tuned language modeling; Technique: Attention maps based on language input; Key point: Handles complex navigation through reasoning; Versatility: High; Viability: Very High; Data: BDD-X dataset.
- VAD (2024) [53]. Env: Synthetic environments; Input: Vectorized scene data; Output: Planning efficiency improvements; Learning: Self-supervised vector learning; Technique: Instance-based representation analysis; Key point: Efficient in computational resource use; Versatility: Moderate; Viability: High; Data: Vector datasets.
- CityLLaVA (2024) [86]. Env: Urban simulations; Input: Bounding box, language prompts; Output: Navigation adjustments; Learning: Vision-language fine-tuning; Technique: Bounding box explanations; Key point: Manages language-to-scene mapping challenges; Versatility: High; Viability: Moderate; Data: Urban simulation datasets.
- VLAAD (2024) [87]. Env: BDD-X, HDD, DRAMA; Input: RGB, LiDAR, natural language; Output: Vehicle commands, navigation; Learning: Multimodal instruction tuning; Technique: Explainable through natural language; Key point: Handles long-tail and rare driving conditions; Versatility: Very High; Viability: High; Data: VLAAD dataset.

Fig. 3.
Showcasing a DriveGPT4 [75] workflow in which natural language inputs from users are interpreted by LLM agents to allocate 3D models, gather relevant contextual elements such as the surroundings and movement dynamics, and execute rendering functions. It also includes view modification and background creation, ultimately producing a photorealistic driving scene with dynamic simulation. This workflow demonstrates the capability of integrating advanced language models with simulation tools to enable precise and flexible autonomous vehicle testing and scenario visualization, bridging the gap between human intent and machine-generated environments.

Fig. 4. Illustration of the integration of static, dynamic, and external factors into a unified framework for driving scene understanding. Together, they inform perception tasks, guide decision-making processes, and enable domain adaptation strategies to handle the complexities of real-world environments. By simultaneously considering these three aspects, AVs can gain comprehensive insight into their environment, leading to safer and more efficient navigation.

A. The Role of Sim2Real and Real2Sim in Generating Complex Driving Scenarios

Sim2Real and Real2Sim are core paradigms for transferring knowledge between simulation and reality in autonomous driving. Sim2Real involves training models in virtual environments and deploying them in the real world. Platforms such as CARLA [5], Virtual KITTI [6], and SYNTHIA [7] simulate urban driving scenes with diverse conditions, enabling scalable training (see Table IV). However, models trained in simulation often underperform in reality due to the domain gap caused by visual and behavioral differences [8]–[10]. These gaps result from unmodeled real-world complexity, such as unpredictable lighting, rare events, or non-compliant road users. Real2Sim takes the reverse approach, using real-world data to refine simulation environments.
This includes importing sensor noise characteristics, traffic patterns, and trajectory logs to improve realism and support scenario replay [8]. Together, Sim2Real and Real2Sim enable a complementary feedback loop, where simulation aids scalable training and real data calibrates and validates the virtual environment.

TABLE III
(CONTINUED) ADDITIONAL VLMS FOR SCENE UNDERSTANDING
Columns: Paper, Year | Env. | Input Modality | Output Modality | Learning Type | Technique | Key point | Versatility | Viability | Data.

- DRAMA (2023) [88]. Env: Real-world; Input: Camera, LiDAR, language; Output: Risk localization; Learning: Joint risk explanation; Technique: Text-based explanations for risks; Key point: Identifies safety-critical objects; Versatility: High; Viability: Moderate; Data: Risk annotation datasets.
- VLP (2024) [89]. Env: Simulated environments; Input: Visual data, text; Output: Planning tasks; Learning: Vision-language planning; Technique: Text-embedded reasoning; Key point: Supports challenging scenario planning; Versatility: High; Viability: Moderate; Data: Planning datasets.
- VLFM (2024) [90]. Env: Custom scenarios; Input: Vision-language mappings, spatial information; Output: Zero-shot navigation; Learning: Zero-shot navigation via language integration; Technique: Spatial-textual integration reasoning; Key point: Effectiveness in unfamiliar environments; Versatility: High; Viability: Moderate; Data: Custom scenarios data.
- Natural Language Can Facilitate Sim2Real Transfer (2024) [91]. Env: Various environments; Input: Language, RGB camera; Output: Navigation adjustments; Learning: Language-guided domain adaptation; Technique: Explainability through language-based cues; Key point: Bridging the Sim2Real gap; Versatility: Very High; Viability: High; Data: Domain adaptation datasets.
- DriveVLM-Dual (2023) [92]. Env: Real-world, simulated; Input: RGB, multimodal data; Output: Trajectory planning; Learning: Multimodal integration; Technique: Explainability through trajectory reasoning; Key point: Adaptability in dynamic driving scenarios; Versatility: Very High; Viability: High; Data: DriveVLM datasets.
- Vision-Language Frontier Maps (2024) [90]. Env: Custom, real-world; Input: Vision-language mappings, semantic context; Output: Zero-shot object localization; Learning: Semantic navigation via language; Technique: Spatial reasoning using language prompts; Key point: Robustness in real-world navigation; Versatility: Very High; Viability: High; Data: Boston Dynamics Spot dataset.
- ChatScene (2024) [93]. Env: Simulated, real-world; Input: Language inputs, knowledge graphs; Output: Safety-critical driving scenarios; Learning: Knowledge-based generation; Technique: LLM-driven scenario creation; Key point: Combines domain knowledge and language models to create corner-case scenes; Versatility: High; Viability: Moderate; Data: Synthetic + real scenario datasets.
- Editable Scene Simulation (2024) [94]. Env: Simulated; Input: Collaborative text instructions; Output: Dynamically updated 3D scenes; Learning: Multi-agent learning; Technique: LLM-based scene editing; Key point: Allows interactive editing of driving scenarios in simulation; Versatility: High; Viability: Moderate; Data: Custom scene simulation dataset.

1) Addressing the Domain Gap Challenge: A key challenge in transferring models from simulation to the real world is the domain gap, defined as the difference in data distributions between the two domains. This gap typically arises from two sources: (a) the appearance gap, involving visual discrepancies such as missing textures, lighting effects, and sensor artifacts [9], and (b) the content gap, which refers to differences in scene layout, object interactions, and behavioral patterns. Simulated environments often lack the unpredictability of real-world traffic, including jaywalking pedestrians or irregular driving behaviors, which are difficult to reproduce in rule-based simulations [8], [10]. Formally, this gap can be expressed as the difference between the joint distributions of sensor inputs X and outputs Y in the simulation and real domains:

ΔP = P_sim(X, Y) − P_real(X, Y)

A larger ΔP indicates greater performance degradation when transitioning across domains. For instance, a model trained solely on clear, daytime simulations may fail under dusk lighting (appearance gap) or struggle to detect a child running across the street if such behavior was absent in training data (content gap). To mitigate this gap, several strategies have been proposed.
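Because the joint distributions in this formula are not directly observable, ΔP is estimated from samples in practice. As a minimal sketch (the scalar brightness feature, histogram binning, and total variation distance here are simplifying choices of ours, not a method from the cited works), one can compare a simulated and a real feature distribution:

```python
import numpy as np

def domain_gap(sim_features, real_features, bins=20, value_range=(0.0, 1.0)):
    """Estimate a 1-D appearance gap as the total variation distance
    between histograms of a scalar feature (e.g., mean image brightness).
    Returns 0 for identical distributions, 1 for fully disjoint ones."""
    p_sim, _ = np.histogram(sim_features, bins=bins, range=value_range)
    p_real, _ = np.histogram(real_features, bins=bins, range=value_range)
    p_sim = p_sim / p_sim.sum()
    p_real = p_real / p_real.sum()
    return 0.5 * np.abs(p_sim - p_real).sum()

rng = np.random.default_rng(0)
sim = rng.normal(0.7, 0.05, 1000).clip(0, 1)   # bright, clean renders
real = rng.normal(0.5, 0.15, 1000).clip(0, 1)  # darker, noisier captures
gap = domain_gap(sim, real)
```

Richer estimators (adversarial discriminators, feature-space divergences) follow the same logic: the larger the measured divergence, the more performance degradation one should expect when crossing domains.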
Feature alignment methods, such as CARE [9], align simulated and real data distributions using adversarial training or statistical matching, often with class-specific reweighting. High-fidelity digital twins offer geographic realism by replicating real-world infrastructure and traffic flow [8]. In control systems, Sim2Real for MPC employs system identification and robust control to address discrepancies in vehicle dynamics and surface friction [10]. Domain randomization introduces variation in visual and structural elements to promote invariance. Quantitative comparisons of these strategies report accuracy gains of 8–12% for domain randomization, an 18% improvement for CARE, and roughly 90% gap closure for supervised fine-tuning. Recent work has critically examined evaluation metrics for domain transfer. Lambertenghi et al. [114] demonstrated that commonly used metrics such as FID scores are not reliable indicators of neural reality-gap mitigation, proposing alternative quality metrics that better correlate with downstream task performance in autonomous driving testing.

2) Generating Robustness in Real2Sim and Sim2Real: Robustness in domain transfer depends on both simulation augmentation and iterative integration of real-world feedback. Diffusion models have recently emerged as powerful tools for domain augmentation; Baresi et al. [115] demonstrated efficient test-set augmentation for autonomous driving using diffusion-based image synthesis, enabling targeted generation of challenging scenarios for model validation. In Real2Sim, systems like AutoDRIVE [116] embed real-time data streams into simulations, enabling continuous model validation under realistic conditions. A more advanced framework, Real2Sim2Real, iteratively refines both simulation fidelity and model robustness. Techniques such as neural surface reconstruction improve virtual asset realism based on real-world trials, which are then used to retrain control policies in simulation before re-deployment. Over successive iterations, the simulation environment incorporates rare and safety-critical patterns, increasing coverage and training efficiency. Studies indicate that beyond a certain volume of real data (e.g., 10,000 labeled frames), the marginal utility of further annotation diminishes [117]. Simulation then becomes the primary driver of data diversity, enabling robust AV models through a balanced integration of real-world grounding and synthetic scalability.

Fig. 5. The Venn diagram shows the distribution of recent driving datasets across annotation types. About 50–60% of datasets provide only 2D bounding boxes; 20–30% include 3D bounding boxes along with other annotations; nearly 20% include segmentation masks (again typically alongside other modalities). Only a small fraction (5–10%) of datasets contain all three annotation types. Moreover, accounting for static, dynamic, and external factors makes data collection and labeling even more challenging. This underscores the need to supplement real datasets with simulation-based pipelines to cover rare or complex scenarios.

B. Digital Twins: A Comprehensive Approach to AV Testing

Digital Twins (DTs) are high-fidelity virtual replicas of physical systems that remain synchronized with their real-world counterparts in real time. In the context of autonomous driving, a DT mirrors critical aspects of both the vehicle and its environment, enabling developers to evaluate hypothetical scenarios in simulation while the physical vehicle operates normally or remains stationary. This approach facilitates safe and scalable testing, allowing detailed analysis of system responses without physical risk. DTs model complex interactions between hardware components, such as sensors and actuators, and software modules responsible for perception, planning, and control.
They are especially valuable for assessing system behavior in rare or hazardous scenarios, including sudden pedestrian crossings or low-visibility conditions. Real-time data integration is essential for effective DT performance. Projects such as Tokyo Tech's smart mobility field [118], [119] utilize edge computing and cloud infrastructure to stream live sensor data, including LiDAR and camera inputs, into the simulation. This ensures the twin maintains close alignment with real-world conditions, supporting synchronized perception and planning evaluations. Feedback from the DT can guide offline development or inform the AV's real-time decision-making process. Table V compares various DT implementations, showing trade-offs across scenario complexity, synchronization accuracy, processing latency, and computational requirements. While full-scale DTs offer rich simulation capabilities, they often demand significant infrastructure, whereas lighter DTs focus on specific monitoring tasks and require fewer resources. Beyond their role in real-time mirroring, DTs support a wide range of autonomous vehicle development workflows. In particular, they enable structured scenario-based testing and contribute significantly to bridging the gap between simulated and real-world data distributions. These two complementary capabilities are discussed below.

1) Scenario-Based Testing with Digital Twins: Scenario-based testing is a key application of DTs, providing a structured alternative to broad or randomized testing.

TABLE IV. Technical Comparison of Modern Autonomous-Driving Simulators
- CARLA [5]: Physics/graphics: Unreal Engine, custom physics pipeline. Sensors: camera (RGB, depth), LiDAR, radar, GPS, IMU. Scenario & traffic: built-in Traffic Manager; Scenario Runner. Weather/env. variation: dynamic weather, time-of-day changes, varying road conditions (wet/dry, fog). ROS: native ROS bridge (ROS1, partial ROS2). Multi-agent/parallelism: multiple vehicles/agents in synchronous or real-time mode; large-scale HPC possible with manual setup. Open source: yes. Certification: N/A (research). Highlights: high-fidelity urban maps, realistic traffic logic, flexible Python APIs, large open-source community.
- SVL Simulator (formerly LGSVL) [105]: Physics/graphics: Unity with modular physics plug-ins. Sensors: camera, LiDAR, radar, GPS, IMU. Scenario & traffic: Scenario Editor. Weather: dynamic weather and lighting changes. ROS: native support (ROS1 and ROS2). Multi-agent: via Scenario Editor. Open source: yes. Certification: N/A (research). Highlights: integrated Autoware and Apollo AD stacks; user-friendly scenario creation and sensor configuration; active community.
- NVIDIA DRIVE Sim [106]: Physics/graphics: Omniverse and PhysX. Sensors: photorealistic camera feeds (with sensor artifacts), LiDAR, radar. Scenario & traffic: scenario authoring via Omniverse. Weather: physically based sky system, dynamic lighting, advanced reflections and refractions for weather variability. ROS: closed-source, but adapters exist for ROS/ROS2 integration. Multi-agent: highly parallelizable on GPU clusters; enterprise HPC pipelines for large-scale synthetic data generation. Open source: partial. Certification: enterprise (not certified). Highlights: real-time photorealism powered by GPU ray tracing, robust sensor simulation, strong integration with NVIDIA's AD software stack.
- rFpro [107]: Physics/graphics: proprietary motorsport-grade engine. Sensors: high-fidelity camera feeds, LiDAR, radar. Scenario & traffic: AI traffic vehicles and real-time scenario definition; SUMO integration. Weather: physically accurate weather and surface conditions (wet track, ice, mud), day/night transitions. ROS: limited (via proprietary bridges). Multi-agent: optimized for real-time HPC and hardware-in-the-loop (HIL); multi-vehicle concurrency common in OEM testing. Open source: no. Certification: OEM validation (not ISO 26262). Highlights: ultra-accurate vehicle dynamics (e.g., tire modeling), widely used by automotive OEMs; ideal for track-level precision and HIL.
- dSPACE [108]: Physics/graphics: ISO 26262-certified simulation platform. Sensors: high-fidelity sensor models. Scenario & traffic: scenario-based testing. Weather: configurable weather models. ROS: native support. Multi-agent: hardware-in-the-loop (HiL/SiL). Open source: no. Certification: ISO 26262 certified. Highlights: industry standard for functional safety validation, OEM-grade HiL workflows.
- aiSim [109]: Physics/graphics: certified simulation framework. Sensors: camera, LiDAR, radar. Scenario & traffic: traffic scenario generation. Weather: weather variation support. ROS: integration available. Multi-agent: scalable multi-agent. Open source: no. Certification: ISO 26262 certified. Highlights: used for ADAS validation and regulatory compliance testing.
- MORAI [110]: Physics/graphics: proprietary graphics and physics. Sensors: camera, LiDAR, radar, GPS, IMU. Scenario & traffic: advanced traffic behaviors in digital twin city maps; multi-agent scenario setup. Weather: day/night cycles, rain, fog. ROS: limited. Multi-agent: scalable to full city environments. Open source: no. Certification: N/A. Highlights: emphasis on digital twin fidelity, realistic sensor/dynamic models, enterprise-level support for large-scale city simulations.
- BeamNG.tech [111]: Physics/graphics: BeamNG's soft-body physics engine. Sensors: primarily camera-based simulation; LiDAR possible via third-party plug-ins. Scenario & traffic: proprietary AI-controlled traffic. Weather: limited weather changes (sunny, rain, fog) with basic environment variation. ROS: partial (via APIs or bridging). Multi-agent: multiple vehicles run in real time, but HPC-scale parallelism requires manual setup. Open source: no. Certification: N/A. Highlights: unparalleled collision realism and soft-body dynamics, widely used for crash scenarios and advanced vehicle kinematics research.
- Scenario Gym [112]: Physics/graphics: lightweight engine. Sensors: basic camera, configurable sensors. Scenario & traffic: scenario-centric execution. Weather: limited. ROS: ROS compatible. Multi-agent: single-agent focus. Open source: yes. Certification: N/A (research). Highlights: minimal setup, fast iteration, scenario-focused testing.
- GarchingSim [113]: Physics/graphics: Unity-based with photorealistic rendering. Sensors: camera, LiDAR. Scenario & traffic: simplified scenario definition. Weather: basic weather variation. ROS: ROS integration. Multi-agent: limited multi-agent. Open source: yes. Certification: N/A (research). Highlights: photorealistic output, minimalist workflow, academic-friendly.

Developers can define specific
high-risk scenarios, such as highway merging or emergency braking, and run them repeatedly in simulation. Frameworks like PanoSim [119] support this by combining real vehicle data with virtual scenarios. For example, a twin vehicle can be placed in an emergency braking situation while the physical vehicle drives under normal conditions. This enables exhaustive testing of rare or unsafe situations, allowing developers to adjust perception or control parameters and re-evaluate performance without physical risk. Hazardous conditions, including near-collisions, heavy rainfall, or low traction, can also be safely explored in the twin, helping to identify and address potential system weaknesses.

2) Digital Twins and Domain Gap Mitigation: In addition to structured testing, DTs play a significant role in narrowing the gap between simulation and reality. Because DTs are continuously updated with real-world data, they serve as dynamic and accurate models of the current environment. This form of Real2Sim adaptation helps ensure that simulation scenarios reflect live operating conditions. For instance, a city-scale DT may integrate traffic camera feeds, weather reports, and IoT data to maintain an up-to-date view of congestion, road closures, and environmental hazards. When AV algorithms are validated in such an environment, the results are more likely to translate effectively to real-world performance. Studies have demonstrated the benefits of DTs that integrate LiDAR localization, roadside unit inputs, and cloud processing to support path planning and dynamic rerouting [120]. One deployment used live updates to run AV decision-making algorithms inside the twin, continuously evaluating behavior under real-time conditions [121]. This strategy enables early detection of system flaws and supports rapid model refinement. As DT technology advances, its utility for validation, domain adaptation, and safe deployment will continue to grow.
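The real-time mirroring loop behind these deployments can be sketched in a few lines. This is an illustrative toy, not the architecture of any cited system: a 2-D position stands in for the full sensor state, and the twin adopts each real observation while logging divergence from its own prediction as a candidate for re-calibration.

```python
class DigitalTwin:
    """Minimal illustrative twin: mirrors a vehicle state and flags drift."""

    def __init__(self, tolerance=0.5):
        self.state = (0.0, 0.0)   # twin's mirrored (x, y) position
        self.tolerance = tolerance
        self.alerts = []          # (time, drift) pairs needing attention

    def step(self, predicted, observed, t):
        """Advance the twin: adopt the real observation, log divergence."""
        dx = predicted[0] - observed[0]
        dy = predicted[1] - observed[1]
        drift = (dx * dx + dy * dy) ** 0.5
        if drift > self.tolerance:
            self.alerts.append((t, drift))  # candidate for re-calibration
        self.state = observed               # Real2Sim: real data wins
        return drift

twin = DigitalTwin(tolerance=0.5)
# The twin's model predicts straight-line motion; the "real" vehicle
# develops a growing lateral drift, eventually exceeding the tolerance.
for t in range(10):
    predicted = (float(t), 0.0)
    observed = (float(t), 0.1 * t)
    twin.step(predicted, observed, t)
```

In a real deployment the "observed" stream would be LiDAR or camera-derived state estimates, and an alert would trigger model or simulation-fidelity updates rather than a simple log entry.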
3) Neural Reconstruction for AV Validation: Neural rendering and reconstruction techniques are increasingly being applied to autonomous driving simulation, offering photorealistic sensor data synthesis from real-world captures. RealEngine [122] demonstrates simulating autonomous driving within reconstructed realistic contexts, enabling closed-loop testing in digitized real environments. Hybrid approaches such as that proposed by Tóth et al. [123] merge neural and physics-based rendering for multimodal sensor simulation, combining the visual fidelity of neural methods with the physical accuracy of traditional simulation.

Fig. 6. A cyclical pipeline of Real2Sim and Sim2Real bridges the domain gap between synthetic training environments and real-world conditions. In the Real2Sim phase, physical parameters and expert knowledge are encoded into dynamic models that are enriched by domain randomization. In the Sim2Real phase, the models are adapted back to real-world sensor inputs using curriculum learning, transfer learning, and knowledge distillation. Iterating this loop progressively reduces the gap, enhancing robustness and operational safety in autonomous driving.

Industry adoption of neural reconstruction is accelerating rapidly. NVIDIA's Neural Reconstruction Engine (NuRec) leverages neural radiance fields for AV simulation, while the aiSim World Extractor enables automatic scene reconstruction from recorded sensor data. Applied Intuition's Neural Sim and Waabi's simulation platform similarly employ neural reconstruction to achieve unprecedented realism in synthetic sensor generation. These tools represent a paradigm shift from handcrafted virtual environments toward data-driven scene reconstruction, significantly reducing the domain gap between simulation and reality.

C.
Critical Comparison and Trade-off Analysis

While earlier sections detail the capabilities of simulation platforms and domain adaptation techniques, effective deployment of autonomous vehicle (AV) systems requires a deeper understanding of their performance trade-offs. This section critically examines leading simulation tools and domain adaptation strategies, emphasizing practical limitations, comparative strengths, and guidance for selecting technologies across the AV development pipeline.

1) Simulation Platform Performance Trade-offs: Simulation platforms differ in fidelity, scalability, cost, and applicability. Open-source tools like CARLA [5] are popular in academia due to their flexibility and low cost. CARLA supports high-fidelity urban scenarios but experiences performance bottlenecks under load (e.g., frame rates fall with ~30 vehicles), and domain gaps remain, with real-world transfer performance degrading by up to 20%. By contrast, NVIDIA DRIVE Sim [106] delivers photorealistic rendering with sensor modeling errors as low as ~2 pixels, reducing the Sim2Real gap to 5–10%. However, its reliance on high-end GPUs increases infrastructure cost and energy consumption. At the high end, rFpro [107] is used in certification workflows, offering <3% dynamics error and hardware-in-the-loop compatibility. It excels in controller validation but is less suited for perception model development and requires significant capital investment. Each simulator serves a specific phase: CARLA for rapid prototyping, DRIVE Sim for realistic validation, and rFpro for regulatory testing. This phased use optimizes cost, realism, and technical suitability throughout development.

2) Domain Adaptation Strategy Analysis: Bridging the sim–real domain gap involves techniques such as domain randomization, feature alignment, and fine-tuning.
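As a minimal sketch of the first of these techniques, domain randomization amounts to sampling scene parameters afresh for each training episode so that no single rendering configuration dominates. The parameter names and ranges below are invented for illustration, not drawn from any cited system:

```python
import random

def randomize_scene(base_scene, rng):
    """Sample one randomized variant of a simulated scene description.

    Hypothetical parameters: lighting, fog, sensor-like grain, and a
    count of rare pedestrian behaviors injected into the scene.
    """
    scene = dict(base_scene)
    scene["sun_elevation_deg"] = rng.uniform(5.0, 90.0)  # dawn to noon
    scene["fog_density"] = rng.uniform(0.0, 0.6)
    scene["texture_noise"] = rng.uniform(0.0, 0.2)       # grain on textures
    scene["n_jaywalkers"] = rng.randint(0, 3)            # rare behaviors
    return scene

rng = random.Random(42)
base = {"map": "town03", "sun_elevation_deg": 60.0,
        "fog_density": 0.0, "texture_noise": 0.0, "n_jaywalkers": 0}
variants = [randomize_scene(base, rng) for _ in range(100)]
```

Training on such variants pushes the model toward invariance, at the cost, noted below, of degraded performance if the sampled ranges drift too far from physically plausible conditions.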
Domain randomization improves robustness by exposing models to diverse simulated conditions, offering 8–12% accuracy gains. However, excessive variation can degrade performance due to unrealistic training data. Feature alignment techniques like CARE [9] offer more targeted improvements, yielding up to 18% better transfer accuracy while reducing training overhead. Yet they require careful tuning of adaptation losses and sometimes labeled real data. Semi-supervised fine-tuning with a small real dataset often closes 90% of the domain gap, compared to 70–80% for unsupervised methods, though it increases computational cost significantly. Hence, layered adaptation, consisting of randomization followed by alignment and light fine-tuning, emerges as an effective strategy.

3) Deployment Readiness Assessment: Deployment readiness varies across platforms and applications. CARLA dominates academic use, while NVIDIA DRIVE Sim and rFpro are more common in industry and regulatory contexts. However, several bottlenecks remain. Integrating vision-language models introduces latency (500 ms–2 s), limiting real-time applicability. Sensor simulation is also uneven: camera models are mature, but radar and LiDAR simulations remain less accurate under adverse conditions. Economic constraints further influence readiness. High-fidelity simulation infrastructure can cost millions annually [106], [107], limiting accessibility for smaller teams. Additionally, simulation-based testing alone may not suffice without real-world validation, especially in safety-critical domains.

TABLE V. Comprehensive Technical Comparison of Digital Twin Implementations in Autonomous Driving
- Duan et al. [119]: Type: PanoSim-based twin. Sync capability: integration of real vehicles and virtual environments. Scenario complexity: high (end-to-end testing for AV systems). Real-time processing: high. Computational req.: moderate (requires edge computing). Applications: autonomous vehicle testing and evaluation.
- Samak et al. [124]: Type: open ecosystem twin. Sync: synchronization of simulated and real-world data. Complexity: medium (autonomous parking and driving tasks). Real-time: moderate. Computational req.: moderate (lightweight models). Applications: development and deployment of autonomous driving algorithms.
- Zhang et al. [125]: Type: test platform digital twin. Sync: combined use of simulation and real-world testing tools. Complexity: high (scenario-specific testing environments). Real-time: high. Computational req.: moderate (optimized for validation setups). Applications: testing and validation of intelligent vehicles.
- Liang et al. [126]: Type: platform-agnostic digital twin. Sync: deep RL framework supporting various platforms. Complexity: high (lane keeping, multi-lane decision-making). Real-time: high. Computational req.: high (GPU-accelerated training). Applications: reinforcement learning-based decision-making.
- Klar et al. [127]: Type: monitoring and control twin. Sync: data-driven vehicle condition monitoring. Complexity: medium (predictive analysis for AV control). Real-time: medium. Computational req.: moderate (minimal hardware requirements). Applications: vehicle monitoring, fault prediction, and autonomous driving.
- Voogd et al. [128]: Type: vehicle-centric twin. Sync: bi-directional synchronization of vehicle dynamics. Complexity: high (path following and steering control). Real-time: high. Computational req.: high (integration of real and simulated data). Applications: reinforcement learning for vehicle dynamics.
- Wang et al. [119]: Type: infrastructure-centric twin. Sync: real-time traffic updates using RSUs and edge computing. Complexity: medium (city-level traffic and road networks). Real-time: high. Computational req.: moderate (edge computing and cloud processing). Applications: traffic flow optimization and route planning.
- Sudhakar et al. [67]: Type: adversarial digital twin. Sync: robust domain transfer for adverse conditions. Complexity: high (adverse weather scenarios). Real-time: high. Computational req.: high (cloud-based simulation). Applications: adversarial scenarios for stress testing.
- He et al. [129]: Type: generative scene twin. Sync: language-driven 3D scene and scenario generation. Complexity: high (user-defined dynamic scenarios). Real-time: high. Computational req.: very high (GPU-intensive rendering). Applications: scenario generation, visualization, AV testing.

Automotive development also employs diverse simulation strategies across XiL test benches (Software-in-the-Loop, Model-in-the-Loop, Hardware-in-the-Loop), with tool selection depending on development phase, validation requirements, and organizational infrastructure. Alternative platforms including Gazebo, IPG CarMaker, VTD, and MORAI serve specialized needs across perception development, vehicle dynamics validation, and sensor modeling. This layered approach ensures efficiency, robustness, and regulatory confidence.

V. REAL-WORLD TRIALS

Real-world trials represent the most decisive test for autonomous vehicle (AV) systems, exposing limitations that simulations and controlled environments may fail to reveal. Pilot programs across North America, Europe, and Asia are actively assessing how AV technologies, including sensor fusion, machine learning, and connectivity, can contribute to safer roads, reduced emissions, and improved urban mobility [144]–[147]. These deployments are expanding the operational scope of AVs in both structured and complex environments, while simultaneously uncovering persistent challenges in perception, decision-making, and safety [148]. Despite advances in sensing and planning, AVs often struggle with ambiguous layouts, unpredictable road users, and rare events.
These shortcomings raise critical questions about liability, certification, and ethical behavior in edge cases. In response, governments are developing legal frameworks and validation procedures, although consensus on standards remains incomplete. Trials have highlighted technical bottlenecks, such as inconsistent lane markings, sensor limitations in adverse conditions, and the need for robust redundancy [149], [150]. Regulatory agencies now require transparent disengagement reporting [148], while municipalities like Phoenix are focusing on real-world performance in high-risk maneuvers [134]. Although trials demonstrate the benefits of AVs in select contexts, such as reduced collisions and enhanced accessibility, they also underscore the complexity of scaling these systems across diverse environments [144], [146], [148], [151]–[153]. The following sections examine core issues highlighted through these trials, including V2X integration, public trust, evolving regulations, and cross-regional lessons.

A. V2I and V2X Connectivity in Trials

To achieve safe autonomy in complex settings, AVs must interact not only with their immediate environment but also with external entities such as infrastructure and other vehicles. This is enabled through Vehicle-to-Infrastructure (V2I) and Vehicle-to-Everything (V2X) communication. Real-world trials increasingly leverage these technologies to overcome perception limitations in occluded or low-visibility conditions. For example, roadside units (RSUs) can broadcast signal phase, road hazard alerts, or pedestrian locations beyond an AV's line of sight. Advances in telecommunications, including 5G and Dedicated Short-Range Communications (DSRC), support low-latency, near-real-time data exchange critical for these applications [154]–[156].
Early deployments have demonstrated that V2X connectivity can enhance situational awareness, supplement sensor fusion, and enable safer navigation under challenging conditions.

TABLE VI. Impact of AV Technologies on Urban Mobility and Infrastructure
- San Francisco [130]: Technology: shared AVs. Impacts: reduced vehicle ownership, altered parking needs, improved traffic flow. Challenges addressed: congestion, urban sprawl, pollution.
- Singapore [131]: Technology: integrated AV and public transportation. Impacts: enhanced first-mile service, improved public transport efficiency. Challenges: congestion, high public transport demand.
- Toronto [132], [133]: Technology: AV ridehailing operations. Impacts: legal adaptation to new transport modes, better city planning. Challenges: regulatory frameworks, urban design.
- Phoenix, Arizona [134]: Technology: Waymo One robo-taxi service. Impacts: increased accessibility, large-scale real-world testing, user acceptance. Challenges: safety concerns, insurance policies, liability issues.
- Pittsburgh, Pennsylvania [135]: Technology: autonomous shuttles and ridesharing pilots. Impacts: expanded coverage in underserved areas, boosted local robotics ecosystem. Challenges: public acceptance, legal complexities, inclement weather adaptations.
- Hamburg, Germany [136]: Technology: autonomous shuttle trials (e.g., HEAT). Impacts: improved integration with public transport, data-driven traffic management. Challenges: infrastructure readiness, cybersecurity, operational costs.
- Dubai, UAE [130]: Technology: self-driving taxi trials. Impacts: pathway to achieving 25% driverless trips by 2030, improved road safety. Challenges: regulatory approvals, cultural acceptance, extreme climate conditions.
- Helsinki, Finland [137]: Technology: automated bus pilot projects. Impacts: eco-friendly transit solutions, enhanced suburban connectivity. Challenges: harsh winter weather, universal design for accessibility.
- Milton Keynes, UK [138]: Technology: StreetCAV driverless shuttles. Impacts: enhanced urban mobility, reduced congestion, increased public awareness. Challenges: public safety, infrastructure integration, regulatory compliance.
- Sydney, Australia [139]: Technology: Cit-e autonomous vehicle trials. Impacts: improved traffic light coordination, enhanced pedestrian safety. Challenges: integration with existing traffic systems, public acceptance.
- Columbus, Ohio [140]: Technology: Smart Columbus AV shuttle program. Impacts: increased adoption of electric vehicles, improved last-mile connectivity. Challenges: public engagement, technological infrastructure, safety validation.
- Tokyo, Japan [130]: Technology: Tier IV self-driving EV taxis. Impacts: development of spacious autonomous taxis, addressing public transport challenges. Challenges: vehicle design, regulatory approval, public acceptance.
- Masdar City, UAE [141]: Technology: Personal Rapid Transit (PRT) system. Impacts: demonstrated feasibility of autonomous electric pods for urban transit. Challenges: integration with pedestrian traffic, infrastructure costs.
- Greenwich, London, UK [142]: Technology: GATEway autonomous shuttle trials. Impacts: assessed public acceptance and operational challenges of AVs in urban settings. Challenges: mixed-use pathways, safety regulations, public perception.
- Gothenburg, Sweden [143]: Technology: Volvo's autonomous bus trials. Impacts: evaluated performance of large autonomous buses in real traffic conditions. Challenges: traffic integration, safety standards, scalability.

However, deploying V2X systems in real-world settings presents notable challenges. Communication protocols must support both low latency and adequate bandwidth. Trials by MIT's AV lab and others [154], [155] demonstrate that basic messages can be exchanged quickly, but transmitting high-bandwidth data such as raw video remains impractical. As a result, it is more effective to share processed outputs like object locations rather than raw sensor streams. The added data from V2X also increases computational load, requiring AVs to process external messages alongside onboard sensor data within strict timing constraints. Studies [156], [157] indicate that dedicated V2X modules and efficient GPUs are essential for real-time performance.
Several pilot programs are also exploring edge computing, where roadside or cloud units aggregate data from multiple sources and provide summarized information to vehicles. This approach helps reduce the processing burden on individual AVs and allows detection of complex hazards that a single vehicle might miss. A consistent observation across pilot deployments is that cooperative perception, where vehicles share their sensor data, significantly enhances situational awareness. OpenCDA [158] represents a pioneering framework in this domain, providing an integrated co-simulation platform for cooperative driving automation (CDA) research. Building on this foundation, OpenCDA-ROS [159] enables seamless integration between simulation and real-world cooperative driving automation, facilitating direct deployment of algorithms developed in simulation to physical vehicles through standardized ROS interfaces. CDA-SimBoost [160] provides a unified framework specifically designed to bridge real sensor data with simulation environments for infrastructure-based CDA systems, addressing the unique challenges of roadside perception and vehicle-infrastructure coordination. Systems like COOPERNAUT [161] demonstrate that collaborative communication helps vehicles manage occlusions and long-range detection more effectively than isolated perception. AutoCastSim evaluations showed a 40% improvement in scenario success rates with only one-fifth the bandwidth of naïve data-sharing methods. COOPERNAUT achieves this by encoding LiDAR data into compact, loss-tolerant messages. Despite promising results, real-world deployment still faces challenges such as unreliable communication, latency variability, and inconsistent standards. Technologies like IEEE 802.11p (DSRC) and Cellular-V2X are being tested globally [156], [162]. Ensuring data authenticity is also critical, as AVs must trust external information to avoid security risks.
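The bandwidth argument above, share processed detections rather than raw sensor streams, can be illustrated with a toy message codec. The 12-byte-per-object layout is hypothetical, not an SAE or ETSI message format:

```python
import struct

def encode_detections(dets):
    """Pack object detections into a compact binary V2X-style payload.

    Each detection is (class_id, x, y, confidence); the hypothetical
    layout is a 2-byte count, then 12 bytes per object.
    """
    payload = struct.pack("!H", len(dets))  # 2-byte object count
    for cls, x, y, conf in dets:
        # class id (uint16), position (2 x float32), confidence in
        # thousandths (int16) instead of a full float
        payload += struct.pack("!Hffh", cls, x, y, int(round(conf * 1000)))
    return payload

def decode_detections(payload):
    """Inverse of encode_detections: recover the detection list."""
    (n,) = struct.unpack_from("!H", payload, 0)
    dets, off = [], 2
    for _ in range(n):
        cls, x, y, q = struct.unpack_from("!Hffh", payload, off)
        dets.append((cls, x, y, q / 1000.0))
        off += struct.calcsize("!Hffh")
    return dets

dets = [(1, 12.5, -3.0, 0.91), (2, 40.0, 7.5, 0.67)]
msg = encode_detections(dets)   # 26 bytes for two objects
```

Two objects fit in 26 bytes here, whereas a single raw LiDAR sweep or camera frame runs to megabytes, which is the gap that makes processed-output sharing the practical choice under V2X bandwidth budgets.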
Active pilots in Europe and Asia have shown early success. Some cities provide AVs with traffic signal information to optimize timing and reduce stop frequency [156], [163]. In smart city districts, 5G-enabled roadside units enable high-speed platooning and coordinated lane changes [164]. These trials highlight the benefits of V2X in improving safety through shared awareness and in smoothing traffic via cooperative behavior. However, scaling requires investment in infrastructure such as smart traffic lights and edge computing nodes, along with standardization across manufacturers. In Phoenix and other testbeds, AVs connected to infrastructure demonstrated reduced idle time and improved reliability [165]. V2V communication also showed potential to prevent accidents by transmitting early warnings between vehicles. To address bandwidth constraints, current models emphasize efficient data exchange. For example, V2VNet [166] and V2X-ViT [167] use compact feature aggregation and asynchronous Transformer architectures for cooperative scene understanding. CoBEVT [168] enables collaborative bird's-eye view mapping using sparse fusion techniques and has demonstrated strong performance on the OPV2V dataset. Platforms such as OpenCDA-Loop [158] enable benchmarking of these models under realistic network conditions, including latency and packet loss. STAMP [169] offers a task-agnostic framework that integrates multi-agent inputs regardless of model type, facilitating interoperability. Overall, V2X connectivity and cooperative perception are maturing through global trials. While technical and infrastructure challenges remain, early evidence supports their value in improving AV safety and efficiency. Standardization efforts by organizations such as SAE and ISO are likely to play a key role in scaling these systems for broader deployment.
Beyond pilot programs and cooperative systems, commercial autonomous deployments have reached operational scale. Tesla's Full Self-Driving (Supervised) operates across millions of consumer vehicles, generating large-scale data on edge-case handling and human oversight requirements. Waymo One achieved approximately 4 million fully autonomous paid rides in 2024, a roughly 7× increase over 2023, demonstrating commercial viability within constrained operational design domains.

B. Trustworthiness, Regulation, and Societal Impact

The successful deployment of autonomous vehicles (AVs) hinges not only on technical maturity but also on addressing critical ethical, legal, and societal challenges. Real-world trials consistently reveal gaps in current regulatory frameworks, emphasizing the need for comprehensive standards that ensure safety, cybersecurity, accountability, and public trust. Foundational safety standards such as ISO 26262 [170] and ISO 21448 (SOTIF) [171] guide the design of functionally safe systems even under non-failure conditions, while ISO/SAE 21434 [172] introduces a cybersecurity engineering lifecycle that addresses risks associated with connected vehicle systems. ANSI/UL 4600 [173] complements these by focusing on safety assurance in fully autonomous, driverless products. Recent work such as AutoTrust [174] further underscores the importance of trustworthiness benchmarks tailored to AV systems that increasingly incorporate vision-language models (VLMs). These AI-driven architectures, which blend perception with natural language reasoning, introduce novel risks not adequately captured by conventional metrics like disengagement counts or crash frequency. AutoTrust proposes a holistic benchmarking framework that evaluates robustness to adversarial inputs, fairness across demographic and situational contexts, explainability of decision-making processes, and resilience to cyber-physical threats.
As AVs become more intelligent and autonomous, these dimensions are critical for validating real-world performance and earning public confidence. Simulation and testing standards are also evolving to support these requirements. ASAM's OpenSCENARIO [175], [176], OpenDRIVE [177], OpenODD [178], and OpenLABEL [179] promote scenario interoperability, consistent road modeling, ODD specification, and sensor annotation. As summarized in Table VII, these frameworks support a layered governance structure encompassing safety, simulation, and data standards. However, performance evaluation remains fragmented. Metrics such as those reported to the California DMV or under NHTSA's Standing General Order [180] lack global consistency and fail to capture higher-order trust dimensions. Frameworks like RAVE [181] enable human-AV performance comparison, yet the absence of a unified "autonomy score" limits cross-system benchmarking. International efforts by UNECE, including UN-R157 [182] for Level 3 automation, UN-R155 [183] for cybersecurity, and UN-R156 [184] for software updates, represent critical steps toward harmonized regulation but require further development to support higher automation levels and diverse urban deployments. Legal and governance frameworks must advance alongside technical standards to ensure responsible AV deployment. Most existing traffic laws assume a human driver, which creates ambiguity when control shifts to automated systems. Jurisdictions such as Arizona and Toronto have updated their definitions to classify the automated driving system (ADS) as the legal driver and now require companies to provide financial guarantees through insurance or bonds [134], [138]. These policy adaptations reflect an incremental learning process based on real-world trials, although global agreement on liability structures is still evolving [135].
As AVs move toward commercial deployment, responsibility may shift toward manufacturers under product liability frameworks, which will require strong documentation, audit trails, and compliance mechanisms. Ethical dilemmas have also surfaced in real-world deployments. The 2018 Uber ATG crash in Tempe, Arizona, illustrated how AV design decisions can encode implicit value judgments, especially in unavoidable-harm scenarios [134]. In response, cities like Hamburg and London have incorporated public input into policy formation. Hamburg's HEAT shuttle project and London's GATEway initiative involved citizen panels and community workshops to guide AV behavior and expectations [136], [142]. Transparent reporting is vital to maintaining trust. Studies in Pittsburgh and Tempe found that lack of transparency can reduce public support by up to 80 percent, while active community engagement in Phoenix and Columbus has improved acceptance [135], [140]. AV deployment requires integrated management across four dimensions:
• Liability: The transition from driver-based tort law to manufacturer product liability necessitates clear documentation chains and OTA update accountability; Arizona's ADS-as-driver classification [134] exemplifies evolving frameworks.
• Transparency: Black-box AI decision-making undermines trust; explainability requirements (AutoTrust [174]) and public disengagement reporting (California DMV, NHTSA SGO [180]) establish baseline disclosure standards.
• Fairness: Algorithmic bias risks discriminatory outcomes; Pittsburgh and Columbus trials emphasize equitable service distribution across socioeconomic zones [135], [140].
• Public Trust: Hamburg and London projects demonstrate that citizen engagement during development increases acceptance by 60-80% compared to post-deployment transparency alone [136], [142].
These dimensions must evolve in parallel with technical standards to ensure responsible scaling. Wider societal impacts also warrant attention. Trials in Columbus and Pittsburgh have shown that AVs can improve mobility for underserved populations [135], [140]. However, concerns persist around job displacement and unequal service distribution. In San Francisco, critics noted that early deployments favored high-income areas [130], prompting discussions on equitable rollout and worker transition programs. Public reaction remains mixed. Sydney's AV trials received widespread interest and support [139], while opposition in San Francisco has included protests and obstruction of vehicles. These findings highlight the importance of transparency, fairness, and public engagement as integral components of successful AV integration.

C. Pathways to Overcoming Current Limitations and Exploiting New Opportunities

Table VI summarizes global case studies of AV deployment, highlighting diverse challenges and strategies. Several clear pathways have emerged. Infrastructure investment is a major factor: cities such as Singapore and Dubai, which have implemented advanced V2I systems, report 90 to 95 percent technical success and over 80 percent public acceptance [130], [131]. In contrast, cities with limited infrastructure upgrades, including Phoenix and Pittsburgh, report 85 to 90 percent success but experience longer deployment timelines and increased regulatory uncertainty [134], [135]. However, data suggest that the return on investment diminishes beyond 20 million USD per deployment zone. This points to the efficiency of targeted upgrades such as smart intersections and V2X systems over fully integrated infrastructure. Urban trials have benefited from integrating AVs with public transport and adaptive signal systems, as seen in Singapore [131].
Regulatory strategies in Dubai and Hamburg have also emphasized cybersecurity readiness and liability definitions [130], [136]. Performance analysis shows that constrained, geofenced environments deliver up to 98 percent success within design limits but perform significantly worse outside those zones. Dense urban deployments maintain 85 to 90 percent reliability under normal traffic but fall to 60 to 70 percent during peak hours or construction [137]. New business models are also shaping AV rollouts. Robo-taxi services in Phoenix [134] and ride-hailing initiatives in Toronto [132], [133] illustrate shifts toward subscription and pay-per-use models. Suburban services can reach break-even with under 10,000 rides per vehicle per month, while urban operations require significantly higher utilization due to cost and complexity. Waymo's 2024 record of 4 million paid trips, a sevenfold increase from 2023, shows commercial momentum. Conversely, GM's exit from Cruise illustrates the financial burden of AV deployment, with validation costs often exceeding 1 to 5 million USD per automaker. Public trust is a crucial enabler. Hands-on AV experiences in cities like Pittsburgh [135] and Greenwich [142] have enhanced acceptance and enabled productive feedback loops between communities, planners, and developers. Proactive engagement with regulators can reduce deployment delays by up to 80 percent. Public education initiatives launched within 12 months of deployment have raised acceptance by approximately 70 percent. Transparent incident reporting helps maintain trust, while lack of disclosure can reduce support by as much as 80 percent. In conclusion, successful AV deployment depends on targeted infrastructure investment, adaptive regulation, inclusive economic strategies, and consistent community engagement.
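The break-even figures above follow from simple unit economics: the rides needed per month equal the fixed monthly vehicle cost divided by the contribution margin per ride. The sketch below uses entirely hypothetical cost figures (they are not drawn from the cited trials) to show how a suburban service could plausibly break even under 10,000 rides per vehicle per month.

```python
# Break-even rides per vehicle per month = fixed monthly cost / margin per ride.
# All figures are illustrative assumptions, not data from the surveyed deployments.
monthly_vehicle_cost = 12_000.0   # hypothetical: depreciation, remote ops, insurance (USD)
revenue_per_ride = 15.0           # hypothetical average fare (USD)
variable_cost_per_ride = 13.5     # hypothetical energy, cleaning, support (USD)

margin = revenue_per_ride - variable_cost_per_ride
break_even_rides = monthly_vehicle_cost / margin
print(round(break_even_rides))    # 8000 rides/month under these assumptions
```

Urban operations face higher fixed costs (depots, congestion, supervision density), which pushes the same calculation well above the suburban threshold.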
The intersection of mature technology, formalized validation frameworks, and sustained public support marks the path toward safe and scalable AV integration.

VI. DISCUSSION: ENVISIONING THE FUTURE OF AUTONOMOUS DRIVING

While our survey has focused on simulation-to-real-world deployment, the following forward-looking approaches represent architectural and methodological innovations essential for bridging the gaps identified throughout this work, particularly in Sim2Real transfer, digital twin validation, and real-world trial scalability. AVs must evolve beyond visual perception to reasoning about semantics, causality, and uncertainty [35], [190]. Hybrid approaches such as neuro-symbolic systems offer promise by combining deep learning with logical reasoning [191], [192]. System-level integration is also shifting toward decentralized cooperation. Trials of V2V and V2X communication have shown up to 40% gains in occluded and long-range perception [161], [168], but reliable communication and consensus protocols remain essential at scale. Digital twins are central to continuous AV validation, especially when synchronized with real-world data [68], [120]. Emerging approaches now embed human behavior into simulations to improve negotiation and planning in complex traffic [94]. Despite progress in standards like ISO 26262 and UNECE regulations, deployment is hindered by gaps in V2X consistency and safety metrics [5], [65], [154]. Traditional metrics like disengagements do not fully capture system risk, motivating more scenario-driven and adversarial testing frameworks [60], [167]. Finally, the sim-to-real gap remains a key obstacle. While simulations help generate edge cases, current models struggle to transfer reliably without domain adaptation [?], [8], [9]. Bridging this gap will be vital for safe and scalable AV deployment.
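As a concrete illustration of the kind of feature-level domain adaptation these works rely on, the sketch below implements correlation alignment (CORAL), a classic baseline that matches the mean and covariance of simulation-domain features to real-domain features. It is a generic illustration under our own toy setup, not the specific method of any work cited above.

```python
import numpy as np

def coral(source: np.ndarray, target: np.ndarray, eps: float = 1e-5):
    """Align source (simulation) features to target (real) feature statistics:
    whiten the source features, then re-color them with the target covariance."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])

    def sqrtm(m):
        # Matrix square root via eigendecomposition (both covariances are SPD).
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

    whiten = np.linalg.inv(sqrtm(cs))
    recolor = sqrtm(ct)
    centered = source - source.mean(axis=0)
    return centered @ whiten @ recolor + target.mean(axis=0)

rng = np.random.default_rng(0)
sim = rng.normal(0.0, 1.0, size=(500, 3))   # synthetic-domain features
real = rng.normal(2.0, 0.5, size=(500, 3))  # real-domain features
adapted = coral(sim, real)                  # sim features now match real statistics
```

Second-order alignment of this kind is cheap and unsupervised, which is why variants of it often serve as the first rung on the Sim2Real ladder before heavier adversarial or generative adaptation is attempted.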
TABLE VII: CURRENT STANDARDS AND METRICS FOR AUTONOMOUS VEHICLE DEVELOPMENT AND SIMULATION

Safety and Functional Standards
• ISO 26262:2018 [170] (ISO TC22/SC32; Functional Safety): Road vehicle functional safety for E/E systems; addresses systematic and random hardware failures through ASIL (A-D) risk classification. Key applications: ADAS development, ECU validation, safety-critical system design.
• ISO 21448:2022 (SOTIF) [171] (ISO TC22/SC32; Safety of Intended Functionality): Addresses hazards from functional insufficiencies and foreseeable misuse without system failures; complements ISO 26262. Key applications: AI/ML validation, autonomous driving systems, edge-case scenarios.
• ISO/SAE 21434:2021 [172] (ISO/SAE; Cybersecurity): Automotive cybersecurity engineering lifecycle covering risk assessment (TARA), security goals, and continuous monitoring. Key applications: connected vehicles, OTA updates, cyber-threat mitigation.
• ANSI/UL 4600:2023 [173] (UL; Autonomous Product Safety): Safety evaluation for autonomous products beyond traditional functional safety. Key applications: autonomous vehicles, robotics, safety certification.

Automation Level Classification
• SAE J3016:2021 [185] (SAE/ISO; Driving Automation Levels): Taxonomy of six levels (0-5) of driving automation; defines DDT, ODD, and fallback responsibilities. Key applications: AV classification, regulatory frameworks, ODD definition.

Simulation and Testing Standards
• ASAM OpenSCENARIO 1.3.1 [175] (ASAM; Dynamic Simulation Content): XML-based format for describing dynamic driving scenarios with multiple entities and complex maneuvers. Key applications: scenario-based testing, ADAS validation, simulation interoperability.
• ASAM OpenSCENARIO 2.0 [176] (ASAM; Domain-Specific Language): Declarative language for abstract scenario descriptions with parameterization and coverage specification. Key applications: test generation, stochastic scenarios, automated validation.
• ASAM OpenDRIVE 1.8.1 [177] (ASAM; Road Network Description): XML format for static road network descriptions including geometry, lanes, signals, and junctions. Key applications: HD map creation, simulation environments, road network exchange.
• ASAM OpenCRG 1.1.2 [186] (ASAM; Road Surface Description): Curved regular grid format for high-precision road surface elevation and property data. Key applications: tire simulation, vehicle dynamics, surface modeling.
• ASAM OSI 3.x [187] (ASAM; Simulation Interface): Standardized interfaces between simulation models and components for distributed simulations. Key applications: sensor simulation, environment perception, tool integration.

Performance and Evaluation Metrics
• NHTSA Standing General Order [180] (NHTSA; Crash Reporting): Mandatory reporting of AV crashes involving injury, fatality, or significant property damage. Key applications: safety monitoring, performance benchmarking, regulatory compliance.
• RAVE Framework [181] (Industry Consortium; Retrospective AV Evaluation): Best practices for comparing AV and human driving performance using crash data analysis. Key applications: safety impact assessment, benchmarking methodologies.

Emerging and Specialized Standards
• ISO 34502:2022 [188] (ISO; Road Vehicle Data): Data specifications for intelligent transport systems and automated driving functions. Key applications: data exchange, interoperability, system integration.
• IEEE 2846:2022 [189] (IEEE; Assumptions for AV Safety): Assumptions for mathematical models used in safety-related automated vehicle behavior. Key applications: model validation, safety argumentation, formal verification.
• ASAM OpenLABEL 1.0 [179] (ASAM; Sensor Data Annotation): Standardized format for annotating sensor data (camera, LiDAR, radar) for ML training. Key applications: dataset annotation, ML model training, validation.
• ASAM OpenODD [178] (ASAM; Operational Design Domain): Standardized description of operational design domains for automated driving systems. Key applications: ODD specification, system limitations, safety boundaries.

International Harmonization
• UN-R157 ALKS [182] (UNECE WP.29; Automated Lane Keeping): International regulation for automated lane keeping systems on highways (Level 3). Key applications: highway automation, international deployment, type approval.
• UN-R155 [183] (UNECE WP.29; Vehicle Cybersecurity): Cybersecurity and software-update management systems for road vehicles. Key applications: type approval, cybersecurity management, international standards.
• UN-R156 [184] (UNECE WP.29; Software Update Management): Over-the-air software update systems and management for road vehicles. Key applications: OTA updates, software lifecycle, regulatory approval.

Several forward-looking approaches show strong potential to overcome current AV limitations. One is the integration of neuro-symbolic AV stacks, which combine neural networks for perception and control with symbolic reasoning for planning and scenario understanding. This hybrid method enables AVs to generalize from data while reasoning about constraints and intent, improving interpretability and regulatory transparency [193]. For instance, such a system could recognize an emergency vehicle using neural vision and apply logical rules to yield, even in novel situations [191], [192]. A second direction involves federated fleet learning. Unlike traditional cloud-based updates, federated learning allows vehicles to share model parameters without transmitting raw data. This facilitates real-time adaptation to local traffic conditions, such as roadwork or unusual pedestrian behavior, while preserving privacy and minimizing bandwidth. Fleet-wide learning from rare edge cases becomes feasible, supporting collective resilience.
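The parameter-sharing scheme described above can be sketched as federated averaging (FedAvg), in which a coordinator combines locally trained weights in proportion to each vehicle's sample count. This is a generic illustration of the technique with hypothetical toy weights; production fleet-learning systems would add secure aggregation, compression, and staleness handling on top.

```python
import numpy as np

def fedavg(local_weights, sample_counts):
    """Aggregate local model updates without ever seeing raw driving data:
    each vehicle uploads only its weight vector and its local sample count."""
    total = sum(sample_counts)
    return sum(w * (n / total) for w, n in zip(local_weights, sample_counts))

# Three vehicles fine-tune the same 4-parameter model on local data.
fleet = [np.array([1.0, 0.0, 0.0, 0.0]),
         np.array([0.0, 1.0, 0.0, 0.0]),
         np.array([0.0, 0.0, 1.0, 1.0])]
counts = [100, 100, 200]          # the third vehicle saw twice as much data
global_update = fedavg(fleet, counts)
# global_update == [0.25, 0.25, 0.5, 0.5]
```

The weighting by sample count is what lets rare edge cases observed by a few vehicles influence the fleet model without those vehicles ever transmitting the underlying sensor logs.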
The third area is self-adaptive digital twins. These systems continuously synchronize with live AV data to update their simulation state, road layout, and agent behaviors. Such adaptive twins support scenario replay, predictive diagnostics, and closed-loop training. For example, anomalous pedestrian movements encountered in real driving can be re-simulated with variations to improve avoidance strategies [68], [120]. Finally, zero-shot Sim2Real transfer aims to enable models trained solely in simulation to perform reliably in real-world deployments without fine-tuning. This requires high domain diversity through procedural content generation and realism in sensor and physics modeling. Long-term goals include using foundation models trained across multiple simulation domains to support robust, generalizable AV behavior [91], [117]. To advance safe and scalable AV deployment, several actionable insights and open challenges should guide future research:
• Hybrid Reasoning Architectures: AVs should integrate neural learning with symbolic logic to improve interpretability, safety, and compliance. Logic-based rules can constrain unsafe behavior and enable formal verification, critical for regulated deployment [190], [192].
• Human-in-the-Loop Simulation: Embedding realistic human behaviors into digital twins allows testing of nuanced interactions such as eye contact or unexpected pedestrian actions. This supports safer policy learning and interactive debugging without physical risk [94], [194].
• Cooperative Autonomy: To handle occlusions and dynamic traffic, AVs must share perception, intent, and planning cues through V2X. Future work should address robust multi-agent fusion, low-latency communication, and fail-safe coordination [161], [168].
• Regulatory Synchronization: Fragmented standards in safety, connectivity, and liability pose barriers to deployment. Harmonizing protocols, establishing performance metrics, and integrating runtime traceability into AV stacks are urgent priorities [5], [154].
• Zero-Shot Sim2Real Transfer: Generalizing to unseen real-world domains without fine-tuning remains a major bottleneck. Progress requires better domain-invariant representations, curriculum design, and rigorous benchmarking across diverse conditions [91], [117].
These challenges reflect both the technical complexity and the societal integration required for large-scale AV adoption. Sustained collaboration between researchers, industry, and policymakers will be essential to navigate these next stages effectively.

VII. CONCLUSION

This survey has explored the growing role of synthetic data and simulation technologies in advancing autonomous driving. With real-world testing often limited by cost and safety constraints, virtual environments, procedural content generation, and digital twins offer scalable alternatives for developing and validating AV systems. We reviewed how synthetic datasets support perception and planning tasks, and how digital twins enable system-level evaluation through closed-loop and human-in-the-loop testing. The integration of domain adaptation techniques and foundation models, including vision-language frameworks, points to increasingly transferable and robust AV capabilities. Key challenges persist, particularly in achieving reliable Sim2Real transfer, establishing standardized validation protocols, and ensuring safety in rare or ambiguous scenarios. Addressing these issues will require continued progress in simulation fidelity, multi-agent coordination, and learning efficiency. Emerging research directions such as adaptive digital twins, decentralized cooperative autonomy, and neuro-symbolic decision-making offer promising pathways forward.
By closing the gap between synthetic and real-world domains, the community can accelerate the deployment of safe and scalable autonomous driving systems.

ACKNOWLEDGMENTS

This research is supported by the UK Research and Innovation (UKRI) Engineering and Physical Sciences Research Council (EPSRC) Grant ATRACT (EP/X028631/1).

REFERENCES

[1] E. Blasch, T. Pham, C.-Y. Chong et al., "Machine learning/artificial intelligence for sensor data fusion–opportunities and challenges," IEEE Aerosp. Electron. Syst. Mag., vol. 36, no. 7, pp. 80-93, 2021.
[2] E. Thomas, C. McCrudden, Z. Wharton et al., "Perception of autonomous vehicles by the modern society: A survey," IET Intell. Transp. Syst., vol. 14, no. 10, pp. 1228-1239, 2020.
[3] K. Muhammad, A. Ullah, J. Lloret et al., "Deep learning for safe autonomous driving: Current challenges and future directions," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 7, pp. 4316-4336, 2020.
[4] A. Kar, A. Prakash, M.-Y. Liu et al., "Meta-Sim: Learning to generate synthetic datasets," in Proc. of the IEEE/CVF Int. Conf. on Computer Vision, 2019, pp. 4551-4560.
[5] A. Dosovitskiy, G. Ros, F. Codevilla et al., "CARLA: An open urban driving simulator," in Conference on Robot Learning. PMLR, 2017, pp. 1-16.
[6] A. Gaidon, Q. Wang, Y. Cabon et al., "Virtual worlds as proxy for multi-object tracking analysis," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4340-4349.
[7] G. Ros, L. Sellart, J. Materzynska et al., "The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3234-3243.
[8] L. Wang, R. Guo, Q. Vuong et al., "A real2sim2real method for robust object grasping with neural surface reconstruction," in 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE), 2023, pp. 1-8.
[9] V. U. Prabhu, D. Acuna, R. Mahmood et al., "Bridging the sim2real gap with CARE: Supervised detection adaptation with conditional alignment and reweighting," Trans. Mach. Learn. Res., 2023.
[10] A. Stocco, B. Pulfer, and P. Tonella, "Mind the gap! A study on the transferability of virtual versus physical-world testing of autonomous driving systems," IEEE Trans. Softw. Eng., vol. 49, no. 4, pp. 1928-1940, 2022.
[11] J. Zhang, J. Huang, S. Jin et al., "Vision-language models for vision tasks: A survey," IEEE Trans. Pattern Anal. Mach. Intell., 2024.
[12] M. Bojarski, D. Del Testa, D. Dworakowski et al., "End to end learning for self-driving cars," arXiv preprint, 2016.
[13] J. Redmon, S. Divvala, R. Girshick et al., "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
[14] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[15] W. Liu, D. Anguelov, D. Erhan et al., "SSD: Single shot multibox detector," in European Conference on Computer Vision. Springer, 2016, pp. 21-37.
[16] V. A. Aher, S. R. Jondhale, B. S. Agarkar et al., "Advances in deep learning-based object detection and tracking for autonomous driving: A review and future directions," in Multi-Strategy Learning Environment. Springer, 2024, pp. 569-581.
[17] S. Mozaffari, O. Y. Al-Jarrah, M. Dianati et al., "Deep learning-based vehicle behaviour prediction for autonomous driving applications: A review," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 7, pp. 3716-3735, 2021.
[18] A. Pravallika, M. F. Hashmi, and A. Gupta, "Deep learning frontiers in 3D object detection: A comprehensive review for autonomous driving," IEEE Access, vol. 12, pp. 173936-173980, 2024.
[19] J. Smith, J. Doe, and E. Brown, "Object detectors in autonomous vehicles: Analysis of deep learning models," Int. J. Adv. Comput. Sci. Appl., vol. 14, no. 10, pp. 210-220, 2023.
[20] A. H. Lang, S. Vora, H. Caesar et al., "PointPillars: Fast encoders for object detection from point clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12697-12705.
[21] J. Philion and S. Fidler, "Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D," in European Conference on Computer Vision (ECCV). Springer, 2020, pp. 194-210.
[22] Z. Liu, H. Tang, A. Amini et al., "BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation," arXiv preprint arXiv:2205.13542, 2022.
[23] J. Luettin, S. Monka, C. Henson et al., "A survey on knowledge graph-based methods for automated driving," in Knowledge Graphs and Semantic Web, 2022, pp. 16-31.
[24] E. Meyer, M. Brenner, B. Zhang et al., "Geometric deep learning for autonomous driving: Unlocking the power of graph neural networks with CommonRoad-Geometric," in 2023 IEEE Intelligent Vehicles Symposium (IV), 2023, pp. 1-8.
[25] H. Li, Y. Zhao, Z. Mao et al., "Graph neural networks in intelligent transportation systems: Advances, applications and trends," arXiv preprint arXiv:2401.00713, 2024.
[26] S. N. N. Htun and K. Fukuda, "Integrating knowledge graphs into autonomous vehicle technologies: A survey of current state and future directions," Information, vol. 15, no. 10, p. 645, 2024.
[27] Y. Wang, Z. Liu, H. Lin et al., "PreGSU: A generalized traffic scene understanding model for autonomous driving based on pretrained graph attention network," IEEE Trans. Syst. Man Cybern. Syst., vol. 55, no. 12, pp. 9604-9616, 2025.
[28] A. Paszke, A. Chaurasia, S. Kim et al., "ENet: A deep neural network architecture for real-time semantic segmentation," arXiv preprint arXiv:1606.02147, 2016.
[29] H. Zhao, X. Qi, X. Shen et al., "ICNet for real-time semantic segmentation on high-resolution images," in Proc. of the European Conference on Computer Vision (ECCV), 2018, pp. 405-420.
[30] L.-C. Chen, Y. Zhu, G. Papandreou et al., "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proc. of the European Conference on Computer Vision (ECCV), 2018, pp. 801-818.
[31] J. Sun, Z. Liu, J. Jia et al., "High-resolution representations for labeling pixels and regions," arXiv preprint, 2019.
[32] A. Alahi, K. Goel, V. Ramanathan et al., "Social LSTM: Human trajectory prediction in crowded spaces," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 961-971.
[33] N. Deo and M. M. Trivedi, "Multi-modal trajectory prediction of surrounding vehicles with maneuver-based LSTMs," in 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2018, pp. 1179-1184.
[34] S. Zhang, J. Wu, and H. Ogai, "Tactical decision-making for autonomous driving using dueling double deep Q network with double attention," IEEE Access, vol. 9, pp. 151983-151992, 2021.
[35] S. Grigorescu, B. Trasnea, T. Cocias et al., "A survey of deep learning techniques for autonomous driving," J. Field Robot., vol. 37, no. 3, pp. 362-386, 2020.
[36] P. S. Chib and P. Singh, "Recent advancements in end-to-end autonomous driving using deep learning: A survey," IEEE Trans. Intell. Veh., 2023.
[37] Y. Kong and Y. Fu, "Human action recognition and prediction: A survey," Int. J. Comput. Vis., vol. 130, no. 5, pp. 1366-1401, 2022.
[38] A. Kendall, J. Hawke, D. Janz et al., "Learning to drive in a day," in 2019 Int. Conf. on Robotics and Automation (ICRA). IEEE, 2019, pp. 8248-8254.
[39] D. Lee and M. Kwon, "Instant inverse modeling of stochastic driving behavior with deep reinforcement learning," IEEE Trans. Consum. Electron., 2024.
[40] ——, "ADAS-RL: Safety learning approach for stable autonomous driving," ICT Express, vol. 8, no. 3, pp. 479-483, 2022.
[41] ——, "Stability analysis in mixed-autonomous traffic with deep reinforcement learning," IEEE Trans. Veh. Technol., vol. 72, no. 3, pp. 2848-2862, 2022.
[42] B. R. Kiran, I. Sobh, V. Talpaert et al., "Deep reinforcement learning for autonomous driving: A survey," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 6, pp. 4909-4926, 2021.
[43] S. Aradi, "Survey of deep reinforcement learning for motion planning of autonomous vehicles," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 2, pp. 740-759, 2020.
[44] R. Zhao, Y. Li, Y. Fan et al., "A survey on recent advancements in autonomous driving using deep reinforcement learning: Applications, challenges, and solutions," IEEE Trans. Intell. Transp. Syst., 2024.
[45] T.-A.-Q. Nguyen, L. Roldão, N. Piasco et al., "RODUS: Robust decomposition of static and dynamic elements in urban scenes," in European Conference on Computer Vision. Springer, 2024, pp. 112-130.
[46] P. Wei, L. Kong, X. Qu et al., "Unsupervised video domain adaptation for action recognition: A disentanglement perspective," Advances in Neural Information Processing Systems, vol. 36, pp. 17623-17642, 2023.
[47] T. Wu, F. Zhong, A. Tagliasacchi et al., "D^2NeRF: Self-supervised decoupling of dynamic and static objects from a monocular video," Advances in Neural Information Processing Systems, vol. 35, pp. 32653-32666, 2022.
[48] B. Wen, H. Xie, Z. Chen et al., "3D scene generation: A survey," arXiv preprint arXiv:2505.05474, 2025.
[49] H. Xie, Z. Chen, F. Hong et al., "CityDreamer: Compositional generative model of unbounded 3D cities," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9666-9675.
[50] ——, "Compositional generative model of unbounded 4D cities," IEEE Trans. Pattern Anal. Mach. Intell., vol. 48, no. 1, pp. 312-328, 2026.
[51] R. Gao, K. Chen, E. Xie et al., "MagicDrive: Street view generation with diverse 3D geometry control," in International Conference on Learning Representations (ICLR), 2024.
[52] X. Zhou, Z. Lin, X. Shan et al., "DrivingGaussian: Composite Gaussian splatting for surrounding dynamic autonomous driving scenes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21634-21643.
[53] Z. Jiang, Y. Zhang et al., "VAD: Vectorized scene representation for efficient autonomous driving," in Proc. of the IEEE Int. Conf. on Computer Vision (ICCV), 2023.
[54] Y. Liu and S. Diao, "An automatic driving trajectory planning approach in complex traffic scenarios based on integrated driver style inference and deep reinforcement learning," PLoS ONE, vol. 19, no. 1, p. e0297192, 2024.
[55] P. Li, A. Kusari, and D. J. LeBlanc, "A novel traffic simulation framework for testing autonomous vehicles using SUMO and CARLA," arXiv preprint arXiv:2110.07111, 2021.
[56] K. Huang, B. Shi, X. Li et al., "Multi-modal sensor fusion for auto driving perception: A survey," arXiv preprint, 2022.
[57] T. Qian, J. Chen, L. Zhuo et al., "NuScenes-QA: A multi-modal visual question answering benchmark for autonomous driving scenario," in Proc. of the AAAI Conf. on Artificial Intelligence, vol. 38, no. 5, 2024, pp. 4542-4550.
[58] H. Zhao, Y. Wang, T. Bashford-Rogers et al., "Exploring generative AI for sim2real in driving data synthesis," in 2024 IEEE Intelligent Vehicles Symposium (IV), 2024, pp. 3071-3077.
[59] C. Hu, S. Hudson, M. Ethier et al., "Sim-to-real domain adaptation for lane detection and classification in autonomous driving," in 2022 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2022, pp. 457-463.
[60] J. Revell, D. Welch, and J. Hereford, "Sim2real: Issues in transferring autonomous driving model from simulation to real world," in SoutheastCon 2022. IEEE, 2022, pp. 296-301.
[61] X. Hu, S. Li, T. Huang et al., "How simulation helps autonomous driving: A survey of sim2real, digital twins, and parallel intelligence," IEEE Trans. Intell. Veh., vol. 9, no. 1, pp. 593-612, 2024.
[62] Z. Song, Z. He, X. Li et al., "Synthetic datasets for autonomous driving: A survey," IEEE Trans. Intell. Veh., 2023.
[63] X. Zhou, M. Liu, E. Yurtsever et al., "Vision language models in autonomous driving: A survey and outlook," IEEE Trans. Intell. Veh., 2024.
[64] M. Liu, E. Yurtsever, J. Fossaert et al., "A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook," IEEE Trans. Intell. Veh., pp. 1-29, 2024.
[65] S. Sarker, B. Maples, and W. Li, "A comprehensive review on traffic datasets and simulators for autonomous vehicles," arXiv preprint arXiv:2412.14207, 2024.
[66] G. Paulin and M. Ivasic-Kos, "Review and analysis of synthetic dataset generation methods and techniques for application in computer vision," Artif. Intell. Rev., vol. 56, no. 9, pp. 9221-9265, 2023.
[67] S. Sudhakar, J. Hanzelka, J. Bobillot et al., "Exploring the sim2real gap using digital twins," in Proc. of the IEEE/CVF Int. Conf. on Computer Vision, 2023, pp. 20418-20427.
[68] K. Wang, T. Yu, Z. Li et al., "Digital twins for autonomous driving: A comprehensive implementation and demonstration," in 2024 International Conference on Information Networking (ICOIN). IEEE, 2024, pp. 452-457.
[69] E. Yurtsever, J. Lambert, A. Carballo et al., "A survey of autonomous driving: Common practices and emerging technologies," IEEE Access, vol. 8, pp. 58443-58469, 2020.
[70] D. Omeiza, H. Webb, M. Jirotka et al., "Explanations in autonomous driving: A survey," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 10142-10162, 2021.
[71] L. Chen, P. Wu, K. Chitta et al., "End-to-end autonomous driving: Challenges and frontiers," IEEE Trans. Pattern Anal. Mach. Intell., 2024.
[72] Y. Guan, H. Liao, Z. Li et al., "World models for autonomous driving: An initial survey," IEEE Trans. Intell. Veh., 2024.
[73] L. Chen, Y. Li, C. Huang et al., "Milestones in autonomous driving and intelligent vehicles: Survey of surveys," IEEE Trans. Intell. Veh., vol. 8, no. 2, pp. 1046-1056, 2022.
[74] P. Ji, R. Li, Y. Xue et al., "Perspective, survey and trends: Public driving datasets and toolsets for autonomous driving virtual test," in 2021 IEEE Int. Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 264-269.
[75] Z. Xu, Y. Zhang, E. Xie et al., "DriveGPT4: Interpretable end-to-end autonomous driving via large language model," IEEE Robot. Autom. Lett., 2024.
[76] T. Deruyttere, D. Grujicic, M. B. Blaschko et al., "Talk2Car: Predicting physical trajectories for natural language commands," IEEE Access, vol. 10, pp. 123809-123834, 2022.
[77] L. Wen, D. Fu, X. Li et al., "DiLu: A knowledge-driven approach to autonomous driving with large language models," in The Twelfth International Conference on Learning Representations, 2024.
[78] H. Sha, Y. Mu, Y. Jiang et al., "Large language models as decision makers for autonomous driving," 2024. [Online]. Available: https://openreview.net/forum?id=NkYCuGM7E2
[79] J. Zhang, K. Wang, R. Xu et al., "NaVid: Video-based VLM plans the next step for vision-and-language navigation," in Robotics: Science and Systems, 2024.
[80] C. Sima, K. Renz, K. Chitta et al., "DriveLM: Driving with graph visual question answering," in Proc. of the European Conf. on Computer Vision (ECCV), 2024.
[81] G. Liao, J. Li, and X. Ye, "VLM2Scene: Self-supervised image-text-LiDAR learning with foundation models for autonomous driving scene understanding," in Proc. of the AAAI Conf. on Artificial Intelligence, vol. 38, no. 4, 2024, pp. 3351-3359.
[82] X. Cao, T. Zhou, Y. Ma et al., "MAPLM: A real-world large-scale vision-language benchmark for map and traffic scene understanding," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21819-21830.
[83] J. Chen et al., "CLIP2Scene: Towards label-efficient 3D scene understanding by CLIP," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[84] D. Fu, W. Lei, L. Wen et al., "LimSim++: A closed-loop platform for deploying multimodal LLMs in autonomous driving," in 2024 IEEE Intelligent Vehicles Symposium (IV), 2024, pp. 1084-1090.
[85] M. Liu, J. Jiang, C. Zhu et al., "VLPD: Context-aware pedestrian detection via vision-language semantic self-supervision," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6662-6671.
[86] Z. Duan, H. Cheng, D. Xu et al., "CityLLaVA: Efficient fine-tuning for VLMs in city scenario," in Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops, 2024.
[87] S. Park, M. Lee, J. Kang et al., "VLAAD: Vision and language assistant for autonomous driving," in Proc. of the IEEE/CVF Winter Conf. on Applications of Computer Vision Workshops, 2024.
[88] S. Malla, C. Choi, I. Dwivedi et al., "DRAMA: Joint risk localization and captioning in driving," in Proc. of the IEEE Winter Conf. on Applications of Computer Vision, 2023.
[89] C. Pan, B. Yaman et al.
, “Vlp: V ision language planning for au- tonomous driving, ” in Proceedings of the IEEE/CVF Conference on Computer V ision and P attern Recognition , 2024, pp. 14 760–14 769. [90] N. Y okoyama, S. Ha, D. Batra et al. , “Vlfm: Vision-language frontier maps for zero-shot semantic navigation, ” in 2024 IEEE Int. Conf. on Robotics and Automation (ICRA) , 2024, pp. 42–48. [91] A. Y u, A. Foote, R. Mooney et al. , “Natural language can facilitate sim2real transfer, ” in The 4th International Combined W orkshop on Spatial Language Understanding and Grounded Communication for Robotics , 2024. [92] X. Tian, J. Gu, B. Li et al. , “Driv evlm: The con ver gence of au- tonomous driving and large vision-language models, ” arXiv preprint arXiv:2402.12289 , 2024. [93] J. Zhang, C. Xu, and B. Li, “Chatscene: Knowledge-enabled safety- critical scenario generation for autonomous vehicles, ” in Pr oceedings of the IEEE/CVF Conference on Computer V ision and P attern Recog- nition , 2024, pp. 15 459–15 469. [94] Y . W ei, Z. W ang, Y . Lu et al. , “Editable scene simulation for au- tonomous driving via collaborative llm-agents, ” in Proceedings of the IEEE/CVF Confer ence on Computer V ision and P attern Recognition , 2024, pp. 15 077–15 087. [95] NVIDIA Corporation, “Cosmos-transfer1: Conditional world generation with adapti ve multimodal control, ” arXiv pr eprint arXiv:2503.14492 , 2025. [96] Y . Gao, M. Piccinini, Y . Zhang et al. , “Foundation models in au- tonomous driving: A survey on scenario generation and scenario analysis, ” IEEE Open J. Intell. Tr ansp. Syst. , 2026. [97] A. Geiger, P . Lenz, and R. Urtasun, “ Are we ready for autonomous driving? the kitti vision benchmark suite, ” in 2012 IEEE Confer ence on Computer V ision and P attern Recognition . IEEE, 2012, pp. 3354– 3361. [98] F . Y u, H. Chen, X. W ang et al. 
, “Bdd100k: A diverse driving dataset for heterogeneous multitask learning, ” in Pr oceedings of the IEEE/CVF Confer ence on Computer V ision and P attern Recognition , 2020, pp. 2636–2645. [99] M. Cordts, M. Omran, S. Ramos et al. , “The cityscapes dataset for semantic urban scene understanding, ” in Pr oceedings of the IEEE Confer ence on Computer V ision and P attern Recognition , 2016, pp. 3213–3223. [100] H. Caesar, V . Bankiti, A. H. Lang et al. , “nuscenes: A multimodal dataset for autonomous dri ving, ” in Pr oceedings of the IEEE/CVF Confer ence on Computer V ision and P attern Recognition , 2020, pp. 11 621–11 631. [101] P . Sun, H. Kretzschmar , X. Dotiwalla et al. , “Scalability in perception for autonomous driving: W aymo open dataset, ” in Pr oceedings of the IEEE/CVF Confer ence on Computer V ision and P attern Recognition , 2020, pp. 2446–2454. [102] M.-F . Chang, J. Lambert, P . Sangkloy et al. , “ Argoverse: 3d tracking and forecasting with rich maps, ” in Pr oceedings of the IEEE/CVF Confer ence on Computer V ision and P attern Recognition , 2019, pp. 8748–8757. [103] C. Agnew , E. M. Grua, P . V an de V en et al. , “Pretraining instance segmentation models with bounding box annotations, ” Intell. Syst. Appl. , vol. 24, p. 200454, 2024. [104] W . Chen, A. Edgley , R. Hota et al. , “Rebound: An open-source 3d bounding box annotation tool for active learning, ” in Automa- tionXP@CHI , ser . CEUR W orkshop Proceedings, vol. 3394, 2023. [105] G. Rong, B. H. Shin, H. T abatabaee et al. , “Lgsvl simulator: A high fidelity simulator for autonomous driving, ” in 2020 IEEE 23rd Int. Conf. on Intelligent T ransportation Systems (ITSC) . IEEE, 2020, pp. 1–6. [106] N. Corporation, “NVIDIA DRIVE Sim, ” https://developer .n vidia.com/ driv e/simulation, 2024. [107] rFpro, “rfpro simulation software, ” https://rfpro.com/ simulation- software/, 2024. JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 
8, A UGUST 2021 19 [108] dSP ACE GmbH, “dSP ACE: Simulation and validation solutions for automotiv e de velopment, ” https://www .dspace.com/en/pub/home.cfm, 2024. [109] A VSimulation, “aiSim: Simulation platform for AD AS and autonomous driving, ” https://www .avsimulation.com/aisim/, 2024. [110] MORAI Inc., “Morai simulator: Simulation platform for autonomous vehicles, ” https://www .morai.ai/, 2024. [111] B. GmbH, “Beamng.tech, ” https://beamng.tech/, 2024. [112] Scenario Gym Contributors, “Scenario gym: A scenario-centric lightweight simulator, ” GitHub Repository , 2024, av ailable: https:// github .com/scenario- gym/scenario- gym. [113] W . Zimmer, C. Creß, H. T . Nguyen et al. , “Garchingsim: An au- tonomous driving simulator with photorealistic scenes and minimalist workflo w , ” arXiv preprint , 2024. [114] S. C. Lambertenghi and A. Stocco, “ Assessing quality metrics for neural reality gap input mitigation in autonomous driving testing, ” in 2024 IEEE Conference on Softwar e T esting, V erification and V alidation (ICST) , 2024, pp. 173–184. [115] L. Baresi, Q. Hu, A. Stocco et al. , “Ef ficient domain augmentation for autonomous dri ving testing using diffusion models, ” in Pr oceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE) . IEEE/ACM, 2025, pp. 743–755. [116] C. Samak, T . Samak, and V . Krovi, “T owards sim2real transfer of autonomy algorithms using autodrive ecosystem, ” IF AC-P apersOnLine , vol. 56, no. 3, pp. 277–282, 2023. [117] M. Salv ato, T . V alassakis, and J. Smith, “Sim-to-real transfer and reality gap modeling in model predictive control for autonomous dri ving, ” Appl. Intell. , pp. 1–15, 2022. [118] K. W ang, Z. Li, K. Nonomura et al. , “Smart mobility digital twin based automated vehicle navig ation system: A proof of concept, ” IEEE T rans. Intell. V eh. , 2024. [119] J. Duan, Y . W ang, J. Ding et al. 
, "Digital twin test method for autonomous vehicles based on PanoSim," in SAE 2023 Intelligent and Connected Vehicles Symposium, 2023.
[120] T. Valassakis, X. Wang, and E. Yurtsever, "Digital twin concepts for autonomous and electric vehicle testing," IEEE Trans. Veh. Technol., 2022.
[121] H. Suzuki, K. Takahashi, and M. Tanaka, "Edge computing and digital twins for real-time traffic monitoring and autonomous vehicle testing," IEEE Trans. Intell. Transp. Syst., 2023.
[122] J. Jiang, N. Song, J. Li et al., "RealEngine: Simulating autonomous driving in realistic context," arXiv preprint, 2025.
[123] M. Tóth, P. Kovács, Z. Bendefy et al., "Hybrid rendering for multimodal autonomous driving: Merging neural and physics-based simulation," arXiv preprint, 2025.
[124] T. Samak, C. Samak, S. Kandhasamy et al., "AutoDRIVE: A comprehensive, flexible and integrated digital twin ecosystem for autonomous driving research & education," Robotics, vol. 12, no. 3, p. 77, 2023.
[125] M. U. Shoukat, L. Yan, B. Zou et al., "Application of digital twin technology in the field of autonomous driving test," in 2022 Third Int. Conf. on Latest Trends in Electr. Engg. and Comp. Tech. (INTELLECT). IEEE, 2022, pp. 1–6.
[126] D. Li and O. Okhrin, "A platform-agnostic deep reinforcement learning framework for effective Sim2Real transfer towards autonomous driving," Commun. Eng., vol. 3, no. 1, p. 147, 2024.
[127] R. Klar, N. Arvidsson, and V. Angelakis, "Maturity of vehicle digital twins: From monitoring to enabling autonomous driving," arXiv preprint arXiv:2404.08438, 2024.
[128] K. L. Voogd, J. P. Allamaa, J. Alonso-Mora et al., "Reinforcement learning from simulation to real world autonomous driving using digital twin," IFAC-PapersOnLine, vol. 56, no. 2, pp. 1510–1515, 2023.
[129] Y. He, H. Chen, and W. Shi, "An advanced framework for ultra-realistic simulation and digital twinning for autonomous vehicles," arXiv preprint arXiv:2405.01328, 2024.
[130] A. While, S. Marvin, and M. Kovacic, "Urban robotic experimentation: San Francisco, Tokyo and Dubai," Urban Stud., vol. 58, no. 4, pp. 769–786, 2021.
[131] S. Y. Tan and A. Taeihagh, "Adaptive and experimental governance in the implementation of autonomous vehicles: The case of Singapore," in 4th Int. Conf. on Public Policy (ICPP4), 2019.
[132] City of Toronto, "West Rouge automated shuttle trial case study," City of Toronto, Tech. Rep., 2022.
[133] S. Bahrami and M. Roorda, "Autonomous vehicle parking policies: A case study of the City of Toronto," Transp. Res. Part A, vol. 155, pp. 283–296, 2022.
[134] C. McCarroll and F. Cugurullo, "No city on the horizon: Autonomous cars, artificial intelligence, and the absence of urbanism," Front. Sustain. Cities, vol. 4, p. 937933, 2022.
[135] C. Chmielewski, "Self-driving cars and rural areas: The potential for a symbiotic relationship," J. Law Commerce, vol. 37, no. 1, pp. 57–80, 2018.
[136] Connected Automated Driving, "Autonomous bus project trials in Hamburg," 2020. [Online]. Available: https://www.urban-transport-magazine.com/en/alike-hamburg-launches-trial-operation-with-autonomous-minibus-holon/
[137] International Transport Forum, "Shared mobility simulations for Helsinki," OECD Publishing, Tech. Rep., 2017.
[138] UK Telecoms Innovation Network, "Milton Keynes (MK5G Create) project final report," UK Telecoms Innovation Network, Tech. Rep., 2023.
[139] R. Dowling and P. McGuirk, "Autonomous vehicle experiments and the city," Urban Geogr., vol. 43, no. 3, pp. 409–426, 2022.
[140] Smart Columbus, "Smart Columbus program: SCC-J program final report," Smart Columbus, Tech. Rep., 2021.
[141] K. Randeree and N. Ahmed, "The social imperative in sustainable urban development: The case of Masdar City in the United Arab Emirates," Smart Sustain. Built Environ., vol. 8, no. 2, pp. 138–149, 2019.
[142] Transport Research Laboratory, "D3.2.8 RCA trial 1 workshop in-vehicle report," Transport Research Laboratory, Tech. Rep., 2016.
[143] E. Rebalski, M. Adelfio, F. Sprei et al., "Brace for impacts: Perceived impacts and responses relating to the state of connected and autonomous vehicles in Gothenburg," Case Stud. Transp. Policy, vol. 15, p. 101140, 2024.
[144] UK Autodrive Consortium, "UK Autodrive: Final report," Tech. Rep., 2018.
[145] N. Reed and A. Stevens, "The GATEway (Greenwich Automated Transport Environment) project: Practical testing of automated vehicles in Greenwich using the Department for Transport code of practice," Eng. Technol. Ref., 2016.
[146] W. Clayton, D. Paddeu, G. Parkhurst et al., "Autonomous vehicles: Who will use them, and will they share?" Transp. Plan. Technol., vol. 43, no. 4, pp. 343–364, 2020.
[147] E. Aittoniemi, T. Itkonen, and S. Innamaa, "Impacts of automated driving on energy demand and emissions in motorway traffic," Transp. Res. Interdiscip. Perspect., vol. 28, 2024.
[148] N. Kalra and S. M. Paddock, "Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability?" RAND Corporation, Tech. Rep., 2016.
[149] Waymo LLC, "Waymo Safety Report," https://waymo.com/safety/, 2021.
[150] M. C. D'Agostino, C. E. Michael, M. Ramos et al., "A blueprint for improving automated driving system safety," 2024.
[151] Zoox, "Zoox Voluntary Safety Self-Assessment: Built for Riders," Tech. Rep., 2020.
[152] Cruise LLC, "Cruise Safety Report," https://getcruise.com/safety/, 2021.
[153] NHTSA, "Automated driving systems 2.0: A vision for safety," 2017. [Online]. Available: https://www.nhtsa.gov/technology-innovation/automated-vehicles-safety
[154] 3GPP TS 23.286, Release 16: V2X services based on NR; Stage 2, 3GPP Std., 2020.
[155] U.S. Department of Transportation, "V2I Deployment Guidance," ITS Joint Program Office, Tech. Rep., 2020.
[156] K. Ansari, "Joint use of DSRC and C-V2X for V2X communications in the 5.9 GHz ITS band," IET Intell. Transp. Syst., vol. 15, no. 2, pp. 213–224, 2021.
[157] S. Chen, J. Hu, Y. Shi et al., "A vision of C-V2X: Technologies, field testing, and challenges with Chinese development," IEEE Internet Things J., vol. 7, no. 5, pp. 3872–3881, 2020.
[158] R. Xu, Y. Guo, X. Han et al., "OpenCDA: An open cooperative driving automation framework integrated with co-simulation," in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 1155–1162.
[159] Z. Zheng, R. Xu, H. Xiang et al., "OpenCDA-ROS: Enabling seamless integration of simulation and real-world cooperative driving automation," IEEE Trans. Intell. Veh., vol. 8, no. 7, pp. 3775–3780, 2023.
[160] Z. Zheng, X. Han, Y. Bao et al., "CDA-SimBoost: A unified framework bridging real data and simulation for infrastructure-based CDA systems," arXiv preprint arXiv:2507.19707, 2025.
[161] J. Cui, H. Qiu, D. Chen et al., "COOPERNAUT: End-to-end driving with cooperative perception for networked vehicles," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17252–17262.
[162] N. M. Mari, S. Arrigoni, F. Braghin et al., "A V2I communication framework of adaptive traffic lights and a prototype shuttle," in 2022 AEIT International Annual Conference (AEIT), 2022, pp. 1–6.
[163] C. Creß, Z. Bing, and A. C. Knoll, "Intelligent transportation systems using roadside infrastructure: A literature survey," IEEE Trans. Intell. Transp. Syst., vol. 25, no. 7, pp. 6309–6327, 2023.
[164] R. Meneguette, R. De Grande, J. Ueyama et al., "Vehicular edge computing: Architecture, resource management, security, and challenges," ACM Comput. Surv., vol. 55, no. 1, pp. 1–46, 2021.
[165] K. Sjoberg, P. Andres, T. Buburuzan et al., "Cooperative intelligent transport systems in Europe: Current deployment status and outlook," IEEE Veh. Technol. Mag., vol. 12, no. 2, pp. 89–97, 2017.
[166] T.-H. Wang, S. Manivasagam, M. Liang et al., "V2VNet: Vehicle-to-vehicle communication for joint perception and prediction," in European Conference on Computer Vision (ECCV). Springer, 2020, pp. 605–621.
[167] R. Xu, H. Xiang, Z. Tu et al., "V2X-ViT: Vehicle-to-everything cooperative perception with vision transformer," in European Conference on Computer Vision. Springer, 2022, pp. 107–124.
[168] R. Xu, Z. Tu, H. Xiang et al., "CoBEVT: Cooperative bird's eye view semantic segmentation with sparse transformers," in Proceedings of The 6th Conference on Robot Learning, vol. 205, 2023, pp. 989–1000.
[169] X. Gao, R. Xu, J. Li et al., "STAMP: Scalable task- and model-agnostic collaborative perception," in The Thirteenth International Conference on Learning Representations, 2025.
[170] ISO, "ISO 26262-1:2018 Road vehicles – Functional safety – Part 1: Vocabulary," 2018. [Online]. Available: https://www.iso.org/standard/68383.html
[171] ISO, "Road vehicles – Safety of the intended functionality," Tech. Rep. ISO 21448:2022, 2022. [Online]. Available: https://www.iso.org/standard/77490.html
[172] ISO/SAE, "ISO/SAE 21434:2021 Road vehicles – Cybersecurity engineering," 2021. [Online]. Available: https://www.iso.org/standard/70918.html
[173] UL, "ANSI/UL 4600 Ed. 3-2023: Standard for safety for the evaluation of autonomous products," 2023. [Online]. Available: https://webstore.ansi.org/standards/ul/ul4600ed2023
[174] S. Xing, H. Hua, X. Gao et al., "AutoTrust: Benchmarking trustworthiness in large vision language models for autonomous driving," Trans. Mach. Learn. Res., 2025.
[175] ASAM, "ASAM OpenSCENARIO XML 1.3.1," 2024. [Online]. Available: https://www.asam.net/standards/detail/openscenario-xml/
[176] ASAM, "ASAM OpenSCENARIO DSL," Association for Standardization of Automation and Measuring Systems, Tech. Rep. Version 2.0.0, 2022.
[177] ASAM, "ASAM OpenDRIVE 1.8.1," 2024. [Online]. Available: https://www.asam.net/standards/detail/opendrive/
[178] ASAM, "ASAM OpenODD: Concept Paper," 2021. [Online]. Available: https://www.asam.net/fileadmin/Standards/OpenODD/ASAM_OpenODD_Concept_Paper_2.html
[179] ASAM e.V., "ASAM OpenLABEL 1.0.0," 2021. [Online]. Available: https://www.asam.net/standards/detail/openlabel/
[180] NHTSA, "Standing General Order 2021-01: Incident reporting for automated driving systems and Level 2 advanced driver assistance systems," 2021. [Online]. Available: https://www.nhtsa.gov/laws-regulations/standing-general-order-crash-reporting
[181] J. M. Scanlon, K. D. Kusano, D. V. McGehee et al., "RAVE checklist: Recommendations for overcoming challenges in retrospective safety studies of automated driving systems," Traffic Inj. Prev., vol. 25, no. sup1, pp. S22–S32, 2024.
[182] United Nations Economic Commission for Europe, "UN Regulation No. 157: Uniform provisions concerning the approval of vehicles with regards to Automated Lane Keeping Systems (ALKS)," 2021.
[183] ——, "UN Regulation No. 155: Uniform provisions concerning the approval of vehicles with regards to cybersecurity and cybersecurity management system," 2021.
[184] ——, "UN Regulation No. 156: Uniform provisions concerning the approval of vehicles with regards to software update and software update management system," 2021.
[185] SAE International, "SAE J3016_202104: Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles," 2021. [Online]. Available: https://www.sae.org/standards/content/j3016_202104/
[186] VIRES, "OpenCRG 1.1.2," 2018. [Online]. Available: https://www.asam.net/standards/detail/opencrg/
[187] ASAM, "ASAM OSI 3.7.0: Open Simulation Interface," 2024. [Online]. Available: https://www.asam.net/standards/detail/osi/
[188] International Organization for Standardization, "ISO 34502:2022 Road vehicles – Test scenarios for automated driving systems – Scenario based safety evaluation framework," 2022. [Online]. Available: https://www.iso.org/standard/78951.html
[189] IEEE Standards Association, "IEEE Std 2846-2022: IEEE Standard for Assumptions in Safety-Related Models for Automated Driving Systems," 2022.
[190] D. Sadigh, S. Sastry, S. A. Seshia et al., "Planning for autonomous cars that leverage effects on human actions," in Robotics: Science and Systems, vol. 2, 2016, pp. 1–9.
[191] L. de Penning, A. Garcez, L. C. Lamb et al., "A neural-symbolic cognitive agent for online learning and reasoning," in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, vol. 2, 2011, pp. 1653–1658.
[192] M. Landajuela, B. K. Petersen, S. Kim et al., "Discovering symbolic policies with deep reinforcement learning," in International Conference on Machine Learning, 2021, pp. 5979–5989.
[193] C. Eom, D. Lee, and M. Kwon, "Selective imitation for efficient online reinforcement learning with pre-collected data," ICT Express, vol. 10, no. 6, pp. 1308–1314, 2024.
[194] D. Lee, C. Eom, and M. Kwon, "Ad4RL: Autonomous driving benchmarks for offline reinforcement learning with value-based dataset," in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 8239–8245.
