The Era of End-to-End Autonomy: Transitioning from Rule-Based Driving to Large Driving Models
Authors: Eduardo Nebot, Julie Stephany Berrio Perez
PREPRINT, MARCH 2026

Prof. Em. Eduardo Nebot, Fellow, IEEE, and J. Stephany Berrio Perez, Member, IEEE

Abstract: Autonomous driving is undergoing a shift from modular rule-based pipelines toward end-to-end (E2E) learning systems. This paper examines this transition by tracing the evolution from classical sense-perceive-plan-control architectures to large driving models (LDMs) capable of mapping raw sensor input directly to driving actions. We analyze recent developments including Tesla's Full Self-Driving (FSD) V12-V14, Rivian's Unified Intelligence platform, NVIDIA Cosmos, and emerging commercial robotaxi deployments, focusing on architectural design, deployment strategies, safety considerations, and industry implications. A key emerging product category is supervised E2E driving, often referred to as FSD (Supervised) or L2++, which several manufacturers plan to deploy from 2026 onwards. These systems can perform most of the Dynamic Driving Task (DDT) in complex environments while requiring human supervision, shifting the driver's role to safety oversight. Early operational evidence suggests that E2E learning handles the long-tail distribution of real-world driving scenarios and is becoming a dominant commercial strategy. We also discuss how similar architectural advances may extend beyond autonomous vehicles (AVs) to other embodied AI systems, including humanoid robotics.

Index Terms: Autonomous driving, end-to-end learning, large driving models, full self-driving supervised, robotaxi, neural networks

I. INTRODUCTION

The development of autonomous vehicles (AVs) has been one of the defining technological challenges of the twenty-first century.
For more than two decades, the dominant paradigm was a rule-based modular pipeline in which specialized subsystems handled sensing, perception, prediction, planning, and control independently [1]. Each module was individually engineered, extensively hand-tuned, and interconnected through carefully specified interfaces. Although this approach delivered impressive results in structured environments [2], [3], it became increasingly brittle when confronted with the combinatorial complexity of real-world urban traffic, the so-called long-tail problem [4].

By 2024-2025, a decisive architectural transition had taken shape. End-to-end (E2E) neural networks, trained on vast fleets of real-world driving data, had begun to outperform modular stacks in terms of performance and passenger-comfort metrics. Tesla's Full Self-Driving (FSD) systems, having transitioned to true E2E operation with version 12, further expanded the paradigm with versions 13 and 14, incorporating audio-based environmental awareness, multi-second temporal reasoning, and potentially a mixture-of-models design in the latest FSD software version [5]. In early 2026, NVIDIA unveiled an ecosystem for designing, simulating, and validating autonomous driving in urban scenarios. The platform includes a world foundation model, a large-scale evaluation dataset, and an open-source autonomous agent [6]. More recently, Rivian [7] presented its roadmap to autonomy, emphasizing end-to-end (E2E) approaches to training large driving models using data collected from its fleet.

In June 2025, Waymo and Tesla launched commercial robotaxi services in Austin, Texas. Waymo has been operating a sensor-rich approach over several years, whereas Tesla, for the first time, deployed production vehicles using a camera-only (E2E) strategy.

(Author affiliation: The Australian Centre for Robotics, School of Aeronautical, Mechanical and Mechatronic Engineering, The University of Sydney, Australia.)
Tesla's successful demonstrations helped pave the way for a larger deployment of a new capability often referred to as FSD (Supervised), commonly described as "L2++" [8]. In practical terms, the system can perform most of the Dynamic Driving Task (DDT) in urban environments, shifting the human role primarily to supervision. Furthermore, automotive OEMs have become increasingly proactive and have begun forming joint ventures with providers of E2E driving technology, such as Rivian-Volkswagen [9] and Mercedes-Benz-NVIDIA [6], with plans to start deploying L2++ for passenger cars in 2026 and potentially L4 in the near future.

This paper provides an analysis of the transition to E2E, including a discussion of the technology implications of the wide acceptance of supervised FSD for drivers and OEMs. Section II reviews the traditional autonomous driving architecture and its limitations. Section III introduces the E2E learning paradigm and large driving models. Section IV analyzes the robotaxi landscape, comparing Waymo's and Tesla's approaches, focusing on the recent deployment in Austin, Texas, in June 2025. Section V presents a full analysis of FSD Supervised as widely deployed by Tesla across the latest E2E FSD versions V13-V14; it also includes a discussion of similar approaches from other OEMs. Section VI explores the extension of E2E intelligence to humanoid robotics and simulation platforms, and Section VII offers conclusions.

II. TRADITIONAL AUTONOMOUS DRIVING ARCHITECTURE

A. The Modular Pipeline

The classical autonomous driving stack is typically organized as a sequential pipeline of functional modules (see Fig. 1a). The sensing layer collects raw data from onboard sensors such as cameras, LiDAR, radar, ultrasonic sensors, wheel encoders, inertial measurement units (IMUs), and GNSS receivers, providing complementary observations of the vehicle and its surroundings [10].
A mapping and localization layer integrates these measurements with prior map information to maintain a consistent spatial representation of the environment and estimate the precise pose of the ego vehicle relative to a local or high-definition (HD) map [11]. The perception module processes sensor data to detect and classify objects, identify drivable areas, and estimate the positions and velocities of surrounding agents. The prediction module then forecasts the future trajectories of these agents over a horizon of several seconds [12]. Using this information, the planning module generates a safe and comfortable trajectory for the ego vehicle, while the control layer converts the planned trajectory into actuation commands for the steering, throttle, and braking systems. In practice, each module may combine signal processing, computer vision, and machine learning methods. This modular architecture simplifies debugging, enables targeted improvements, and allows components to be replaced independently. HD maps often play a central role by providing detailed geometric and semantic information that reduces the perception burden on onboard algorithms and improves reliability.

Fig. 1. As defined in [16]: (a) the classical modular approach separates perception, prediction, and planning through intermediate representations such as bounding boxes and trajectories; (b) the end-to-end paradigm jointly learns interconnected modules, allowing information flow and backpropagation across perception, mapping, prediction, and planning components.

B. Limitations of Rule-Based Approaches

Despite its conceptual clarity, the modular rule-based architecture has several fundamental limitations [13]. First, the long-tail problem: the space of possible driving scenarios is effectively unbounded, making it impractical to hand-code rules for every edge case.
Engineers may spend years adding heuristics for rare situations that human drivers handle intuitively and conservatively. Second, sensor and infrastructure dependence: many systems rely on expensive sensor suites, particularly multi-beam LiDAR, radar, and continuously updated HD maps, which increases system cost and complexity [14]. Third, maintenance overhead: ensuring proper calibration [15] and operation of such complex systems requires significant ongoing effort. While these costs may be manageable for commercial fleets, they make large-scale deployment in consumer vehicles far less practical.

III. THE SHIFT TO END-TO-END LEARNING AND LARGE DRIVING MODELS

A. The End-to-End Paradigm

End-to-end learning approaches the driving task as a single optimization problem. Rather than decomposing the task into specialized modules, an E2E model receives raw sensor inputs (typically camera images, kinematic information, maps, and the desired trajectory, and in some architectures also radar, LiDAR, and audio) and directly outputs steering angles and acceleration commands. The entire model is trained jointly, allowing gradients to flow from the output loss all the way back to the raw input representations (see Fig. 1b). In principle, this allows the model to find intermediate representations that are optimally suited to the full driving task, rather than to any particular subtask. Early E2E approaches, such as the DAVE system [17] in 2004 and NVIDIA's successor [18] in 2016, demonstrated that a convolutional neural network could learn to steer a vehicle from camera input alone. However, these systems were limited to relatively simple highway scenarios.
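To make the "single optimization problem" concrete, the sketch below trains a toy pixels-to-controls policy with one loss at the control output. Everything here is invented for illustration: a linear map stands in for the deep network and the "human" labels are synthetic. It is not any production system's code, but it shows the defining E2E property that the gradient of a control-level loss reaches all the way back to the raw sensor input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sensor" input: a flattened 8x8 grayscale frame plus ego speed
# (65 features). Output: [steering_angle, acceleration]. A linear map
# stands in for the deep network; the training mechanics are the same.
n_in, n_out = 65, 2
W = rng.normal(0.0, 0.01, size=(n_out, n_in))   # the whole "driving model"

# Synthetic "human demonstrations": controls are a fixed function of the
# sensor input, unknown to the learner.
W_human = rng.normal(0.0, 0.1, size=(n_out, n_in))
X = rng.normal(size=(256, n_in))                # 256 logged frames
Y = X @ W_human.T                               # human steering/accel labels

def mse():
    return float(np.mean((X @ W.T - Y) ** 2))

mse_before = mse()
lr = 0.1
for _ in range(500):
    err = X @ W.T - Y                 # single loss at the control output...
    W -= lr * (err.T @ X) / len(X)    # ...whose gradient reaches the pixels

mse_after = mse()
print(mse_before, "->", mse_after)    # the control loss collapses toward zero
```

No module-level labels (boxes, lanes, trajectories) appear anywhere in the loop; the intermediate representation is whatever the optimizer finds useful, which is the point the paragraph above makes.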
The key insight driving the renaissance of E2E learning in the 2020s is scale: when trained on millions of hours of real-world driving data collected from large vehicle fleets, E2E models exhibit emergent behaviors that were never explicitly programmed, including appropriate responses to unusual pedestrian behavior, construction zones, emergency vehicles, and adverse weather conditions.

B. Large Driving Models

The concept of Large Driving Models (LDMs) draws an explicit analogy with Large Language Models (LLMs) in natural language processing. Just as GPT-style transformers acquire broad linguistic competence from internet-scale text and other data, LDMs acquire broad driving competence from fleet-scale driving data. Key architectural features include transformer-based sequence modeling for temporal reasoning over multiple seconds of driving history [19], tokenized or diffused scene representations that enable attention across spatial and temporal dimensions, and the use of imitation learning and reinforcement learning to align model behavior with human driver preferences and safety constraints.

LDMs offer several advantages over both modular systems and earlier small-scale E2E models. They generalize more robustly to novel scenarios because they have been exposed to a vastly larger distribution of real-world situations during training. They can leverage weak supervision signals at scale, such as intervention events where a human driver corrects the system, without requiring expensive manual labeling of every training frame. The model can be continuously fine-tuned as new data is collected from the deployed fleet, enabling a cycle of improvement through successive model versions. This is currently done progressively, using selected data that is automatically flagged as relevant for additional training.
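As a minimal illustration of the transformer-style temporal reasoning described above, the following sketch runs one causal self-attention head over a history of tokenized scene vectors. The token dimension, history length, and weights are all invented; production LDMs are vastly larger, multimodal, and multi-layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(tokens, Wq, Wk, Wv):
    """One attention head over a history of scene tokens.

    tokens: (T, d) array, one d-dimensional token per past timestep,
    oldest first. A causal mask keeps each timestep from attending
    to the future, mirroring online driving.
    """
    T, d = tokens.shape
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # future positions
    scores[mask] = -np.inf
    return softmax(scores, axis=-1) @ V               # (T, d) context

rng = np.random.default_rng(0)
d, T = 16, 36 * 3                 # e.g. 3 seconds of history at 36 Hz
Wq, Wk, Wv = (rng.normal(0, 0.1, (d, d)) for _ in range(3))
history = rng.normal(size=(T, d))                     # tokenized scenes

context = causal_self_attention(history, Wq, Wk, Wv)
print(context.shape)   # the last row summarizes the whole history
```

The last row of `context` is a learned summary of several seconds of driving, which is the kind of compressed temporal state a planning head can condition on.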
The system is then retrained using reinforcement learning to learn safer behaviors, and the updated policy is deployed to the fleet after the model is distilled.

1) Two-step training: from "good human driver" to safer-than-human robustness: End-to-end driving policies are commonly developed through a two-step training process: Phase 1, imitation learning, and Phase 2, reinforcement/edge-case learning (Fig. 2). The first stage uses imitation learning to rapidly build a strong baseline by copying competent human driving [20]. The second stage uses reinforcement learning and edge-case-focused training to systematically improve safety and robustness, especially in rare, high-consequence situations that are underrepresented in standard driving data.

Fig. 2. Training process for large driving models. Phase 1 uses a curated dataset of good driving behaviors to obtain a baseline model. In Phase 2, additional data, comprising fleet-collected edge cases and synthetically generated scenarios, are used to further train the policy via reinforcement learning, yielding an updated model with improved safety.

Phase 1: Imitation learning, creating a strong, human-like baseline policy. In the first stage, a large vehicle fleet collects real-world driving data while humans drive, often described as operating in shadow mode. The model is trained through behavior cloning, learning to predict human control actions such as steering and acceleration from visual input. In addition to raw fleet logs, labeled data and synthetic augmentation are used to expand coverage of variations in weather, lighting, and traffic density. The primary outcome of this stage is a baseline driving policy that performs well in common scenarios.
Because the objective of imitation learning is to match human actions, the resulting policy tends to behave like a good human driver: it adopts natural lane positioning, comfortable and predictable longitudinal control, socially compliant merging and yielding, and smooth interactions that align with how most drivers expect other vehicles to behave. This is a key reason why E2E systems can feel "natural" to ride with: they internalize the implicit conventions embedded in human driving.

Limitation (the long tail): the same mechanism that makes imitation learning effective under normal conditions also constrains it. The model can struggle with rare or unseen edge cases, simply because the relevant examples are sparse or absent in the dataset, and human responses in unusual situations may be inconsistent. In other words, Phase 1 produces a baseline that is broadly competent, but not yet optimized for the safety-critical "long tail."

Phase 2: Reinforcement and edge-case learning, pushing beyond imitation toward safer behavior. After establishing a strong baseline, the second stage targets the weaknesses of the policy by explicitly identifying edge cases where the model diverges from the desired driving, including situations where it differs from skilled human behavior or where the baseline is uncertain. Reinforcement learning and related policy-improvement methods then refine the policy using reward signals that directly encode desirable outcomes: larger safety margins, smoothness (managing speed changes under changing road conditions), and a reduced need for intervention. A central advantage of this phase is that it can use simulation and synthetic scenarios to generate additional rare or dangerous situations that would be difficult, slow, or unsafe to capture at scale in the real world.
These scenarios allow the policy to be tested and improved on events such as abrupt cut-ins, unusual construction configurations, atypical pedestrian behavior, complex right-of-way negotiations, and other low-frequency, high-risk interactions. This enables continuous improvement of the models once a new set of special situations is collected from the vehicle fleet. Furthermore, it could be an essential mechanism for adapting the system to different driving cultures around the world. Before deployment, candidate improvements can be validated through fleet replay in shadow mode, where new trajectories are ranked and assessed against safety and comfort criteria. The practical outcome is model updates that introduce safer behaviors and improve performance on a growing number of rare and critical events.

Why Phase 2 can be much safer and generalize better than imitation alone: Phase 2 differs fundamentally from Phase 1 in what it optimizes. Imitation learning optimizes similarity to human driving, so it tends to reproduce human-level performance and human-style biases. Reinforcement/edge-case learning optimizes explicit safety and robustness objectives, which can push the policy toward safer-than-human behavior, particularly in scenarios where humans are inconsistent, distracted, or statistically undertrained (rare events). By repeatedly focusing training on edge cases and rewarding outcomes that reduce risk (collision avoidance, correct yielding, stability, legality, minimal intervention), the system can potentially generalize beyond the patterns most commonly represented in human driving logs. This is the key mechanism by which an E2E policy can evolve from "drives like a good human driver" to "drives in a way that is potentially safer than human drivers," especially in the long tail.
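The two-phase recipe can be illustrated on a deliberately tiny toy problem: a one-dimensional car-following policy that first clones "good human" behavior and is then refined by a reward signal on rare close-gap edge cases. The state, reward, and update rule are all invented for the sketch; real systems use far richer policies and RL machinery, but the division of labor is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy state: gap to the lead vehicle (m), bucketed into 10 m bins.
# Toy action: target acceleration (m/s^2). Policy = one action per bucket.
n_buckets = 6                                  # gaps 0-60 m

def bucket(gap):
    return min(int(gap // 10), n_buckets - 1)

# --- Phase 1: imitation learning -------------------------------------------
# Fleet logs cover ordinary gaps (20-60 m); close gaps are the "long tail".
gaps = rng.uniform(20, 60, size=1000)
human_accel = 0.1 * (gaps - 30) + rng.normal(0, 0.2, size=1000)

policy = np.zeros(n_buckets)
for k in range(n_buckets):
    in_bucket = [a for g, a in zip(gaps, human_accel) if bucket(g) == k]
    # Unseen buckets fall back to the fleet-wide mean: the imitation
    # policy is simply undertrained in the long tail.
    policy[k] = np.mean(in_bucket) if in_bucket else np.mean(human_accel)

# --- Phase 2: reward-driven refinement on edge cases -----------------------
def reward(gap, accel):                        # penalize closing a tiny gap
    return -10.0 * accel if (gap < 10 and accel > 0) else 0.0

for _ in range(300):                           # simulated/replayed edge cases
    gap = rng.uniform(1, 10)
    k = bucket(gap)
    a, a_alt = policy[k], policy[k] + rng.normal(0, 0.5)
    if reward(gap, a_alt) > reward(gap, a):    # crude policy improvement:
        policy[k] += 0.2 * (a_alt - a)         # move toward the better action

print(policy[5])    # unchanged by Phase 2: still accelerates at 50-60 m gaps
print(policy[0])    # refined: no longer accelerates in the close-gap regime
```

Note that Phase 2 never touches the buckets where imitation was already competent; it spends all of its updates where the reward signal, not the human data, defines correct behavior.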
This second stage also introduces real challenges: reward design, simulation fidelity, and compute constraints can limit how quickly and reliably safety improvements translate to real-world gains. Nevertheless, the two-stage structure is powerful: Phase 1 delivers scalable competence and Phase 2 delivers targeted safety and robustness improvements, with particular impact on rare but consequential events.

IV. ROBOTAXI DEPLOYMENTS: THE ROAD TO LEVEL 4 AUTONOMY

A. Waymo

Waymo's commercial robotaxi service launched in Austin, Texas, in March 2025, representing the first Level 4 autonomous ride-hailing operation in that market. Waymo's vehicles employ a sensor-fusion architecture that combines cameras, LiDAR, and radar, and have accumulated a substantial safety record in their San Francisco and Phoenix operations. The Austin deployment leveraged Waymo's established operational design domain (ODD) methodology, beginning with a geofenced service area that was progressively expanded as operational data confirmed the performance of the system in local traffic conditions. Key metrics from the Austin operation include safety, reliability, and passenger satisfaction scores that compare favorably with traditional ride-hailing services [21]. Waymo's approach to safety validation emphasizes extensive simulation testing, structured road trials, and staged geographic expansion governed by quantitative performance thresholds.

B. Tesla

Tesla launched its robotaxi pilot in Austin in June 2025, initially deploying a limited fleet of vehicles operating with a safety driver in the passenger seat. Unlike Waymo's sensor-fusion approach, Tesla's robotaxi relies exclusively on cameras, with the FSD E2E architecture providing all perception and planning functions.
The absence of LiDAR is a deliberate design choice reflecting Tesla's claim that camera-only systems trained on sufficient real-world data can match or exceed the performance of sensor-fusion approaches at a fraction of the hardware cost. The Tesla robotaxi trial attracted significant attention from regulators, industry analysts, and the public, given the higher stakes of fully driverless operation. Tesla used a remote monitoring and intervention capability that allowed operators to observe vehicle behavior and intervene when needed. Early operational data indicated strong performance in the defined service area, and the system exhibited particular strengths in handling complex intersection geometry and mixed traffic environments.

C. Comparative Analysis and Common Challenges

Although Waymo and Tesla adopt distinct architectural philosophies, they face a broadly similar set of operational challenges. Both systems must handle degradation of sensor performance in adverse weather conditions, navigate construction zones with atypical road geometry, and respond appropriately to unpredictable behavior from pedestrians and other drivers. Public acceptance remains a key variable, and incidents involving either platform attract disproportionate media attention relative to the far larger number of uneventful miles accumulated. A critical point of differentiation is scalability. Waymo's sensor-rich vehicles carry an estimated hardware cost per vehicle that is an order of magnitude higher than that of a comparably capable Tesla, limiting the speed at which Waymo can expand its fleet. Tesla's vision-only approach, if validated at scale, would represent a substantial cost advantage and could enable robotaxi deployments in markets where the economics of LiDAR-equipped vehicles are prohibitive.

D. Implications for Scalable Supervised Autonomy

In general, the Austin deployments align with optimistic expectations.
Waymo performed strongly with a sensor-rich vehicle platform, using LiDAR-radar-camera redundancy and a tightly controlled ODD [22] to provide a reliable Level 4 ride-hailing service. At the same time, Tesla's FSD demonstrates that a camera-only system running on production hardware can also achieve robust performance within a defined service area, an outcome with important implications for scalability and cost. Most importantly, Tesla's results suggest a credible pathway to a widely deployable consumer technology that can perform nearly the full Dynamic Driving Task (DDT) in urban environments while retaining a human supervisor for attention and intervention. In this paper, we refer to this emerging product category as FSD Supervised, often characterized as "L2++". The commercial and technical traction of supervised E2E autonomy has not gone unnoticed: multiple companies beyond Tesla, most notably NVIDIA and Rivian, have also released strategies and product roadmaps to deploy supervised FSD, which are examined in the following section.

V. FSD SUPERVISED

FSD Supervised is an end-to-end driving mode in which the driver tells the car where to go and the vehicle performs the driving task. The driver remains responsible and is usually referred to as the supervisor. In practical terms, the system is designed so that, once engaged, the car can carry out 100% of the Dynamic Driving Task (DDT) to the selected destination, while the driver continuously monitors the operation and is ready to intervene if necessary (Fig. 3).

Fig. 3. FSD Supervised operation: the driver selects the destination and initiates the system; the vehicle then performs 100% of the Dynamic Driving Task (DDT) for the entire journey to the destination.

To use it on an urban trip, the workflow is simple. First, the driver selects a destination in the navigation system, exactly as with normal GPS guidance.
The driver then starts FSD Supervised; the system takes over steering, acceleration, braking, lane selection, and intersection handling as it progresses along the route to the destination (Fig. 3). Throughout the drive, the driver's job is active supervision: keep an eye on the road, understand what the car is doing, and be prepared to take control instantly, for example, if the system hesitates, behaves unexpectedly, or encounters a complex situation it does not handle well. In other words, the driver is not a passenger; the driver's role has changed from continuously controlling the vehicle to monitoring and intervening only when needed. This is a significant paradigm change for driving, since in the majority of cases the car will perform 100% of the DDT without any intervention from the driver. The following sections describe in detail how the technology works and why it feels so natural to the driver.

A. Tesla Full Self-Driving: Architecture and Deployment

Tesla's Full Self-Driving (FSD) programme provides one of the clearest real-world examples of the industry's shift toward end-to-end autonomy. This section summarises the architectural progression from V12 to V14 and links these advances to deployment outcomes, including robotaxi pilots and expansion into right-hand-drive (RHD) markets.

1) FSD V12: The First True End-to-End System: Tesla's Full Self-Driving V12, released in 2024, marked a pivotal inflection point not only for Tesla but for the wider autonomy industry, signaling to many developers that end-to-end architectures were becoming a credible path to scalable autonomous driving. For the first time, the entire driving stack, from camera inputs to vehicle actuation, was replaced by a single neural network, eliminating the thousands of lines of rule-based C++ code that had characterized previous FSD versions.
The model was trained end-to-end on driving data collected from the Tesla fleet of millions of vehicles, using human drivers' interventions as a training signal. FSD V12 demonstrated a step-change improvement in natural driving behavior, handling a wider range of scenarios without the abrupt and hesitant maneuvers characteristic of earlier versions.

2) FSD V13: Scale and Resolution: FSD version 13 extended the E2E architecture with higher-resolution video inputs, increasing the effective temporal sampling rate to approximately 36 Hz. This improvement allowed the model to better track fast-moving objects and to exploit finer temporal dynamics in the driving scene. Training data was substantially expanded, covering a broader distribution of geographic locations, weather conditions, and traffic patterns. Version 13 also introduced improvements to the model's handling of unprotected turns, construction zones, and interactions with cyclists and pedestrians, areas that had represented persistent challenges for earlier versions. This model was a significant step change in performance and was used in the initial robotaxi trials in Austin. It was also adapted for RHD driving and is currently being deployed in vehicles in Australia and New Zealand.

3) FSD V14: Extended Context and Multimodal Inputs: FSD V14 marks the most significant architectural step change so far, with the model scale increasing by roughly a factor of 30 compared to V12. Various reports claim that V14 is potentially based on a Mixture of Models (MoM) architecture, in which a learned routing mechanism selects among a set of specialized expert sub-networks depending on the current driving context. This approach allows the model to allocate computational resources efficiently while maintaining specialized competence in diverse scenarios, ranging from dense urban intersections to high-speed motorway overtaking maneuvers.
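The routing idea behind such a mixture can be sketched generically. Tesla has not published V14's internals, so the gating network, expert count, and dimensions below are illustrative assumptions in the style of standard mixture-of-experts layers, not a description of the production system.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d_ctx, n_experts, d_out = 8, 3, 2     # context dim, experts, [steer, accel]

# Each "expert" is a tiny linear policy standing in for a specialized
# sub-network (e.g. dense urban, highway, parking); weights are random
# here, where training would normally specialize them.
experts = [rng.normal(0, 0.1, size=(d_out, d_ctx)) for _ in range(n_experts)]
W_gate = rng.normal(0, 0.1, size=(n_experts, d_ctx))   # learned router

def drive(context, top_k=1):
    """Route a driving-context vector to the top-k experts and blend."""
    gate = softmax(W_gate @ context)
    top = np.argsort(gate)[-top_k:]           # only these experts run,
    weights = gate[top] / gate[top].sum()     # saving compute on the rest
    out = sum(w * (experts[i] @ context) for w, i in zip(weights, top))
    return out, top

context = rng.normal(size=d_ctx)              # embedded scene description
action, chosen = drive(context, top_k=1)
print(action.shape, chosen)
```

With `top_k=1`, only one expert's parameters are evaluated per step, which is the efficiency argument made above: total capacity grows with the number of experts while per-inference compute stays roughly constant.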
A particularly notable innovation in FSD V14 is the integration of audio processing. The model receives microphone input as an additional sensory modality, enabling it to detect emergency vehicle sirens, construction noise, and other auditory cues that are invisible to cameras but highly informative for driving decisions. The V14 architecture also supports multi-second temporal reasoning, maintaining a compressed latent representation of driving history that informs decisions in scenarios requiring anticipation of future traffic states. Operationally, V14 significantly reduces the frequency of intervention prompts directed at the driver, reflecting greater model confidence and smoother overall performance. This is the model currently deployed in the USA, Canada, and South Korea, and being demonstrated in various countries across Europe.

B. Other Similar FSD Supervised Technologies

1) Rivian: Fleet Data Flywheel and Large Driving Models: Rivian's autonomy roadmap is structured around a fleet learning loop in which real-world operation continuously generates the data required to improve an end-to-end (E2E) driving policy. In the "Rivian Data Flywheel," customer vehicles operate as a distributed sensing network, producing millions of miles of driving that can be curated for training and evaluation. A key element of this loop is the Autonomy Data Recorder (ADR), which selectively captures and uploads "important, interesting events," thereby concentrating bandwidth and labeling effort on the scenarios that matter most for model improvement. The resulting model is then distilled and rolled out to the fleet via over-the-air (OTA) updates, creating a compound improvement cycle. Rivian's technical narrative combines E2E learning with high-fidelity sensing and centralized computation.
Their current Gen 3 autonomy platform is described as using a multi-sensor "trinity" comprising 11 cameras (65 megapixels), a front-facing LiDAR, and five radars (one front imaging unit and four corner units), aiming to provide robust perception in diverse conditions. To process the resulting data stream and support increasingly complex models, Rivian emphasizes vertical integration at the compute layer, noting that it has chosen to build its own silicon. This platform is presented as the foundation for scaling model capacity and reducing dependence on supplier timelines. At the core of Rivian's approach is the Large Driving Model (LDM), trained end-to-end to map raw sensor input to vehicle trajectory. Importantly, the improvement stage is framed as reinforcement learning aligned not with generic "human values," but specifically with safe, performant, and smooth driving, i.e., optimization toward measurable driving objectives rather than pure imitation. Rivian's public "road to autonomy" is presented as a progression from wide-coverage hands-free driving to point-to-point capability and eventually "personal Level 4" operation, potentially in 2026. In parallel with the autonomy roadmap, Rivian positions software centralization as strategically important, highlighted by the Rivian-Volkswagen joint venture, which emphasizes movement from scattered ECUs to centralized computing, enabling efficient OTA updates and unification of vehicle functions, including autonomy, under a single software foundation. Taken together, these elements depict a coherent E2E strategy: a data flywheel to source edge cases at scale, transformer-based trajectory prediction, reinforcement-learning-based refinement toward safety and smoothness, and vertically integrated compute and software distribution to accelerate iteration cycles.
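The ADR's trigger logic has not been published; the sketch below shows one plausible shape for event-triggered capture, with invented trigger names and thresholds, to illustrate why selective upload concentrates bandwidth and labeling effort on informative events rather than routine miles.

```python
# Illustrative event-triggered recorder in the spirit of Rivian's Autonomy
# Data Recorder (ADR). The real trigger set and thresholds are not public,
# so these rules are invented for the sketch.

HARD_BRAKE_MS2 = -4.0        # hypothetical deceleration threshold (m/s^2)
LOW_CONFIDENCE = 0.6         # hypothetical model-confidence threshold

def upload_worthy(frame):
    """Decide whether a logged frame is an 'important, interesting event'."""
    triggers = []
    if frame.get("driver_intervention"):
        triggers.append("intervention")       # human corrected the system
    if frame.get("accel", 0.0) < HARD_BRAKE_MS2:
        triggers.append("hard_brake")
    if frame.get("policy_confidence", 1.0) < LOW_CONFIDENCE:
        triggers.append("low_confidence")
    return triggers                           # empty list -> don't upload

fleet_log = [
    {"t": 0, "accel": -0.5, "policy_confidence": 0.95},   # routine mile
    {"t": 1, "accel": -5.2, "policy_confidence": 0.9},    # hard brake
    {"t": 2, "accel": -0.2, "policy_confidence": 0.4},    # model unsure
    {"t": 3, "accel": -0.1, "driver_intervention": True}, # takeover
]
uploads = [f["t"] for f in fleet_log if upload_worthy(f)]
print(uploads)   # -> [1, 2, 3]: only interesting events leave the vehicle
```

Only three of the four frames are uploaded; at fleet scale the same filtering is what turns millions of routine miles into a curated stream of edge cases for the training loop described above.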
2) NVIDIA: World Foundation Models, "Infinite Simulation," and Dual-Stack Safety with Mercedes: NVIDIA's autonomy proposition is built around the concept of physical AI: large-scale models that learn reusable representations of the world and can be applied to prediction, generation, and closed-loop decision-making. NVIDIA developed Cosmos as a "world foundation model," described as an engine that learns the laws of physics. In this framing, Cosmos supports an infinite-simulation loop intended to enable testing and learning at extremely large scale, traveling trillions of miles inside a computer by repeatedly generating scenarios and evaluating agent responses in closed loop. The loop includes a reasoning/action step in which Cosmos "reasons through edge scenarios," analyzes what could happen next, and decomposes complex events into familiar physical interactions, thereby enabling interactive simulation without leaving the lab. Building on this simulation-centric approach, NVIDIA developed Alpamayo, also referred to as a "thinking" autonomous agent: an end-to-end agent in which raw sensor input is directly mapped to steering and braking through a single "reasoning model." Alpamayo is described as open source and on the order of 10B parameters [23], suggesting NVIDIA's intent to catalyze an ecosystem around a reference E2E driving agent rather than limiting innovation to proprietary stacks. NVIDIA also offers an open dataset with more than 1,700 hours of driving. A distinctive architectural idea in the NVIDIA approach is a dual-stack safety architecture that runs an E2E policy alongside a more classical, deterministic guardrail stack. The "Policy & Safety Evaluator" monitors confidence in real time and, if confidence drops, switches back to the classical guardrail.
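A minimal sketch of such an arbiter is shown below. NVIDIA has not published the evaluator's internals; the confidence thresholds and the hysteresis between them are invented here to illustrate the switching behavior.

```python
# Minimal sketch of a dual-stack arbiter in the spirit of NVIDIA's
# "Policy & Safety Evaluator": the E2E policy drives while its confidence
# stays high; otherwise a deterministic guardrail stack takes over.
# Thresholds and hysteresis are invented for illustration.

ENGAGE_E2E = 0.8     # confidence needed to (re)engage the learned policy
DISENGAGE = 0.5      # confidence below which we fall back to the guardrail

class DualStackArbiter:
    def __init__(self):
        self.active = "e2e"

    def select(self, confidence):
        # Hysteresis: separate engage/disengage thresholds prevent rapid
        # toggling between the learned policy and the rule-based guardrail.
        if self.active == "e2e" and confidence < DISENGAGE:
            self.active = "guardrail"
        elif self.active == "guardrail" and confidence >= ENGAGE_E2E:
            self.active = "e2e"
        return self.active

arbiter = DualStackArbiter()
trace = [0.9, 0.7, 0.4, 0.6, 0.7, 0.85, 0.9]   # evaluator confidence stream
stacks = [arbiter.select(c) for c in trace]
print(stacks)
# -> ['e2e', 'e2e', 'guardrail', 'guardrail', 'guardrail', 'e2e', 'e2e']
```

Note that confidence values of 0.6 and 0.7 keep the guardrail engaged even though they would not have triggered the fallback: the gap between the two thresholds is what makes the handover predictable rather than oscillatory.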
This diversity-and-redundancy design seeks to combine the strong performance of an E2E agent in complex scenes with the predictability and traceability of rule-based safeguards, providing a pragmatic path for incremental deployment under safety constraints.

C. Why the E2E system behaves so naturally: Tesla FSD

Implementations such as Tesla's E2E stack run at high frequencies, in this case 36 Hz, allowing the vehicle to acquire a 360-degree state of the world approximately every 27.8 milliseconds. This rapid sampling and response cycle far exceeds human reaction times, enabling the system to solve immediate problems close to the vehicle with an intuition that feels fluid to the driver.

The Driver Interface for Building Trust: A natural feel is not only mechanical; it is also psychological, because the driver is constantly forming expectations about what the vehicle will do next. If the system behaves predictably, communicates its intent, and matches common driving conventions, the driver's mental model aligns with the car's [24]. That alignment reduces uncertainty and anxiety and increases trust during supervision. The user interface (UI) serves as a bridge, communicating the vehicle's intentions to the human supervisor. Modern FSD interfaces build trust by providing the following:
• Explainability: High-performing models now predict interpretable outputs, such as 3D occupancy, flow, and traffic semantics.
• Predictive Visualization: The user interface communicates the next action of the vehicle before it occurs, reducing passenger anxiety (Fig. 4).
• Anticipation: In critical maneuvers, the system provides multimodal feedback to the driver and, in some cases, cues to other road users about its intentions. An example is merging onto a main road.
Before merging, the system provides progressively richer visual, haptic, and dynamic cues, such as slight steering-wheel movement and gentle acceleration, to alert the driver that the vehicle is preparing to move (Fig. 5).
• System Reasoning: Advanced agents, such as those pursued by Tesla, NVIDIA, and Rivian, can potentially decompose complex scenarios into plain-language explanations, clarifying why a particular decision (e.g., "gentle braking for a pedestrian") was made. Significant research is still needed to develop seamless ways of communicating this information to drivers and, where appropriate, to other road users.

Fig. 4. Visualization interface: The system provides the driver with clear situational awareness and communicates the vehicle's immediate intended action and trajectory. (a) Top left: the vehicle will exit the roundabout. (b) Top right: the vehicle will follow the road while turning right. (c) Bottom right: the vehicle will turn right at an intersection. (d) The vehicle will proceed straight, while the interface displays road infrastructure and nearby vehicles to indicate the vehicle's intended path and surrounding context.

Fig. 5. Visual–haptic–dynamic interface: (a) Top left: the vehicle is waiting for a gap in traffic to merge. (b) Top right, haptic: the steering wheel begins to rotate, indicating the vehicle has detected an opening. (c) Bottom left, visual/haptic/dynamic: the planned trajectory starts to appear, the steering wheel moves, and the vehicle accelerates slowly to alert the driver that it is about to move. (d) Visual/dynamic: the full trajectory is displayed in blue and the vehicle begins the merge.

In sum, E2E FSD feels natural because it behaves less like a rule-driven machine and more like a skilled driver that executes a continuous stream of small, confident corrections.
The high-frequency loop enables immediate responses and smooth control, while the data-driven policy captures the subtle social norms of driving learned from large-scale demonstrations. Reinforcement learning then concentrates on the rare, safety-critical situations where imitation alone is insufficient, producing behaviors that can be more conservative, more consistent, and potentially safer than typical human responses. Finally, the UI closes the trust gap by making intent legible through predictive trajectories, interpretable scene output, and multimodal cues, so that the human supervisor can anticipate actions rather than react to surprises. The combination of continuous control, fleet-scale learning, edge-case optimization, and transparent intent communication is what transforms supervised E2E autonomy from technically capable into something that feels intuitively competent to ride with and supervise.

D. Performance boundaries, strategic limitations, and the safety evaluation agenda (Tesla FSD Supervised)

Tesla's current E2E implementation can be interpreted through the lens of dual-process decision-making: it exhibits strong System 1 capabilities [25], fast, reactive, and highly fluent control, while still showing System 2 limitations related to longer-horizon planning and strategy. In close-range driving, the system performs particularly well in solving immediate problems around the vehicle, supported by high-rate sampling (36 Hz) and continuous estimation of a 360-degree state of the environment. This enables rapid reactions after each observation cycle and contributes to smooth lane changes, conservative safety margins, and effective interaction with nearby road users, including pedestrians, bicycles, and motorcycles, while maintaining awareness, rule compliance (e.g., speed control, stop signs), and priority behaviors aligned with human values, such as yielding to pedestrians.
Despite these strengths, the system can still improve in System 2 functions that require anticipatory planning over longer horizons and complex route intent. Reported limitations include route planning that is logically correct but sometimes difficult to execute under heavy traffic, late preparation for multi-lane positioning (e.g., needing to enter a main avenue shortly before multiple lane changes to turn), and inconsistent early lane selection for right- and left-hand turns, particularly in RHD countries, where right-hand turns present distinctive challenges. Additional behaviors noted for improvement include speed-sign interpretation and speed selection, often conservative and sometimes failing to account for variable, time-based speed zones, which can cause the vehicle to obey a lower posted limit when contextual timing rules should apply.

E. FSD Supervised: Deployment and Geographic Expansion

Tesla's FSD Supervised product makes E2E autonomous driving available to consumers while retaining the requirement of driver attention and readiness to intervene. By early 2026, FSD Supervised had accumulated tens of billions of miles of operation across North America, Canada, and South Korea, and had been deployed in right-hand-drive (RHD) markets, most notably Australia and New Zealand. The RHD deployment represents a significant engineering achievement, as it requires the model to generalize to mirror-image road geometry, different road markings, and different traffic conventions. The deployment also has important implications for original equipment manufacturers (OEMs) beyond Tesla. The availability of independent FSD providers and licensable LDM-based solutions is expected to accelerate industry-wide adoption of E2E architectures through 2026 and 2027. Rivian and the NVIDIA–Mercedes CLA will introduce supervised E2E driving in 2026 in the US, Europe, and potentially Asia.
Furthermore, this technology could also be incorporated into Volkswagen vehicles as part of the joint venture with Rivian. In summary, 2026 is likely to mark the start of large-scale consumer deployment of supervised E2E driving across multiple OEMs, moving the technology from a single-vendor offering toward a broader industry capability. This expansion will likely not be "one-size-fits-all": each rollout must be adapted to country-specific road geometry, signage and markings, traffic rules, driving culture, mapping conventions, and regulatory requirements, including differences between left- and right-hand-drive markets. As deployment scales, the main limiting factor will increasingly shift from model capability alone to safe operational integration, notably user training, clear supervision responsibilities, and consistent human–machine interaction design that teaches drivers how to monitor, when to intervene, and how to interpret the intent of the vehicle. These requirements imply that successful adoption in 2026–2027 will depend as much on localization and education programs as on continued improvements in end-to-end model performance.

F. Safety in FSD (Supervised): metrics, methodology, and evidence

Unlike Level 4 robotaxi services, FSD (Supervised) safety depends on (i) the ability of the end-to-end policy to execute the Dynamic Driving Task (DDT) and (ii) the human supervisor's ability to monitor and intervene when required [26]. A defensible safety case therefore treats driver monitoring, interface design, and intervention dynamics as part of the system.

Interpreting safety through baselines and outcome severity: A rigorous evaluation of safety must specify the baseline (e.g., average human driving) and separate outcomes by severity, minor vs. major collisions.
This is essential because supervised E2E systems can reduce certain types of crash while leaving others largely unchanged, and because early rollouts can be biased toward easier roads, better weather, or more attentive users. Obtaining the baseline is not a simple exercise. The limited set of recent safety disclosures from industry generally benchmarks performance against average human driving rather than against matched vehicle fleets or comparable driver populations [27], [28], [29]. This average includes poor and impaired drivers; and although we want a system to perform better than a good driver, the driver population spans very different skill levels and ages. It is also important to consider the safety technology fitted to the comparison vehicles when evaluating the baseline.

Where the system fails vs. where it excels: For supervised E2E, the most useful safety evidence is not only collisions but also accident precursors: near misses, harsh braking, unsafe gap acceptance, and late or abrupt maneuvers that force other road users to react. In parallel, the assessment should explicitly document: (i) where the system is prone to issues, (ii) where it outperforms a normal driver, and (iii) how these regions could change with software updates.

Interpreting Tesla's reported collision-rate snapshot: Tesla reports miles-driven-before-collision comparisons that indicate higher miles per collision when FSD (Supervised) is engaged than for broader benchmarks. The chart shows approximately 5.1 million miles per major collision with FSD (Supervised) engaged versus about 0.7 million for a U.S. benchmark, and approximately 1.5 million miles per minor
collision versus about 0.23 million for the same benchmark (Fig. 6). According to Tesla's conclusions, the system is approximately 7 times safer than an average driver.

Fig. 6. Miles driven before major/minor collisions (non-highway, worldwide), showing longer distances between collisions for Tesla vehicles using FSD (Supervised) compared with manual driving and the U.S. average. [30]

These figures should be interpreted as indicative rather than definitive safety evidence unless they are accompanied by transparent normalization for exposure and selection effects (e.g., who uses the feature, where and when it is used, operating conditions, collision definitions, and reporting methodology). In addition, the average-driver comparison includes many older cars without modern safety technology. A case could be made that a more appropriate comparison is with Teslas driven manually with current active-safety features; on that basis the system is still safer, but by a factor of about 1.7 rather than 7. Nevertheless, the fact that companies are starting to publish updated safety performance information is itself an important step [31].

Who benefits most (segmentation and policy relevance): The safety impact should be evaluated by user segment (e.g., older adults, young drivers, commuters, families, people with disabilities), since baseline risk, exposure, and supervision capacity vary substantially between populations. Segmentation is also important for public policy, because the net societal benefit of supervised E2E autonomy may be highest where it reduces high-risk exposure without introducing supervision risk.

In summary, a credible safety case for FSD (Supervised) must evaluate the combined human–automation system, using clearly defined baselines, severity-stratified outcomes, and leading indicators such as interventions and near-miss proxies.
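The baseline sensitivity discussed above can be made concrete with simple arithmetic. In the sketch below, the quoted figures (5.1 and 0.7 million miles per major collision) come from the text; the manual-Tesla figure of 3.0 is not stated anywhere and is back-derived from the quoted 1.7x ratio, so treat it as an assumption.

```python
def miles_per_collision_ratio(system: float, baseline: float) -> float:
    """How many times farther the system drives between collisions
    than the chosen baseline (same units for both arguments)."""
    return system / baseline

# Millions of miles per major collision, as quoted in the text.
FSD_SUPERVISED = 5.1
US_BENCHMARK = 0.7   # average driver: all vehicle ages, all safety technologies
TESLA_MANUAL = 3.0   # assumed: inferred from the reported 1.7x ratio

print(round(miles_per_collision_ratio(FSD_SUPERVISED, US_BENCHMARK), 1))  # 7.3
print(round(miles_per_collision_ratio(FSD_SUPERVISED, TESLA_MANUAL), 1))  # 1.7
```

The gap between roughly 7x and 1.7x from the same numerator illustrates why the choice of baseline dominates headline safety claims.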
Tesla's published fleet statistics provide useful early signals and help frame the discussion, but robust conclusions require careful normalization of exposure and fair comparisons between vehicle capabilities and driver populations. As supervised E2E deployment expands, safety assessment will increasingly depend on transparent reporting, continuous monitoring throughout the long tail, and segmentation to identify where the technology delivers the greatest net benefit.

VI. BEYOND VEHICLES: END-TO-END INTELLIGENCE IN ROBOTICS

The architectural and training advances that have accelerated end-to-end (E2E) autonomy in vehicles (large-scale imitation learning, reinforcement learning for edge cases, high-frequency closed-loop control, and fleet-scale data pipelines) are increasingly being transferred to robotics. The underlying problem is similar: mapping high-dimensional sensory inputs to safe and stable actions under uncertainty in complex open-world environments. This section outlines two representative trajectories toward scalable embodied intelligence: Tesla's transfer of autonomy-style E2E learning to humanoids, and NVIDIA's foundation-model approach with simulation at scale as a transferable technology for robot fleets.

A. Tesla Optimus

The same architectural innovations driving progress in AVs are being applied to humanoid robotics. Tesla's Optimus program develops bipedal robots trained using E2E imitation learning from human demonstrations and reinforcement learning in simulation. The robot uses a camera-based perception system that shares architectural elements with FSD, and its motor control policies are trained end-to-end on manipulation and locomotion tasks. By 2025, Optimus had been deployed in Tesla's Fremont factory performing quality inspection and light assembly tasks, and Tesla had announced plans to manufacture the robot at scale for external customers.

B.
NVIDIA Project GR00T

NVIDIA's Project GR00T (Generalist Robot 00 Technology) is a foundation-model initiative for humanoid robots. GR00T aims to provide a generalized sensorimotor policy that can be fine-tuned to specific robot platforms and tasks, analogous to the role played by large language models as a foundation for downstream natural language applications. The model is trained on a combination of human motion-capture data, simulation episodes generated by NVIDIA's Isaac Sim platform, and real-world robot demonstrations. GR00T's architecture is explicitly designed to leverage NVIDIA's Cosmos physical-world simulator, which generates photorealistic, physics-accurate training data at scales that would be prohibitively expensive to collect in the real world. Overall, these efforts suggest a convergence toward data-driven, end-to-end sensorimotor policies that can be improved iteratively through a combination of real-world experience and simulation. As these systems mature, key research challenges will focus on long-term robustness, safety validation, and effective human–robot interaction during supervision and deployment.

VII. CONCLUSION

This paper has argued that autonomous driving is entering an era in which end-to-end (E2E) learning and Large Driving Models (LDMs) are displacing traditional modular pipelines as the dominant engineering and commercial strategy. The limitations of sense–perceive–plan–control stacks, particularly brittleness in the long tail, high integration burden, and reliance on expensive sensors and map maintenance, have created strong incentives for unified learned policies trained at fleet scale. In contrast, modern E2E systems leverage two complementary training stages: imitation learning to achieve competent human-level driving, and reinforcement learning to concentrate improvement on edge cases, safety objectives, and robust generalization.
This shift is no longer theoretical; it is reflected in product deployments and roadmaps from Tesla (FSD V12–V14), Rivian's LDM program, and NVIDIA's physical-AI ecosystem, and it is reinforced by the emerging alignment between autonomy providers and OEM partners. A key conclusion is that the industry's near-term inflection point is supervised E2E autonomy, referred to here as FSD Supervised or L2++. The robotaxi pilots discussed in this paper illustrate a credible pathway: sensor-heavy Level 4 systems can deliver strong performance inside tightly defined operational design domains, while camera-only approaches running on production vehicles can also achieve robust operation in constrained deployments. The latter is strategically significant because it implies a cost structure compatible with mass-market vehicles. If validated and scaled, supervised E2E systems could execute most of the Dynamic Driving Task in urban environments, shifting the human role from continuous operator to safety supervisor, a change with implications comparable to other major transitions in transportation technology. However, wide deployment depends on more than model capability. The primary bottlenecks are increasingly operational and social: (i) localization to country-specific road rules, signage, infrastructure, and driving culture, including left- vs. right-hand-drive conventions; (ii) consistent human–machine interface design that makes intent legible; and (iii) user education and supervision protocols that reduce misuse and clarify responsibility. In parallel, safety assessment must mature from headline collision-rate comparisons toward a transparent framework that combines severity-stratified outcomes with leading indicators: interventions, near misses, conflict metrics, exposure normalization, and segmentation across user groups and operating conditions.
Manufacturer fleet statistics are valuable early signals, but robust conclusions require careful methodology and comparability. Finally, the paper highlights that the same architectural advances that power E2E driving, high-frequency closed-loop control, fleet-scale data pipelines, reinforcement learning for rare events, and simulation at scale, are now accelerating robotics. As autonomy and robotics converge toward general-purpose physical AI, the research agenda expands: scalable safety validation, sim-to-real transfer, interpretable intent communication, and implementations of continuous OTA updates will be as important as model accuracy. The evidence reviewed here supports a forward-looking conclusion: 2026–2027 is likely to mark the beginning of broad commercialization of supervised E2E autonomy, with success determined by a combination of technical robustness, safety governance, localization, and human adaptation.

REFERENCES

[1] H. Liu, Z. Cao, X. Yan, S. Feng, and Q. Lu, "Autonomous vehicles: A critical review (2004-2024) and a vision for the future," Authorea Preprints, 2025.
[2] H. Durrant-Whyte, D. Pagac, B. Rogers, M. Stevens, and G. Nelmes, "Field and service applications – an autonomous straddle carrier for movement of shipping containers: From research to operational autonomous systems," IEEE Robotics & Automation Magazine, vol. 14, no. 3, pp. 14–23, 2007.
[3] FutureBridge, "Autonomous haulage systems: The future of mining operations." https://www.futurebridge.com/industry/perspectives-industrial-manufacturing/autonomous-haulage-systems-the-future-of-mining-operations/, n.d. Report 2: Autonomous Mining Equipment.
[4] H. X. Liu and S. Feng, "Curse of rarity for autonomous vehicles," Nature Communications, vol. 15, p. 4808, 2024.
[5] A. Elluswamy, "Building an autonomous future." Proceedings of ICCV, October 2025. Conference talk, Oct 19–24, 2025. Available at: https://youtu.be/NqQv6GnDK4k.
[6] J. Huang, "NVIDIA AI technology for autonomous driving." CES 2026 Keynote, 2026. https://youtu.be/NqQv6GnDK4k.
[7] R. Scaringe, "Rivian autonomy & AI day." https://www.youtube.com/live/mIK1Y8ssXnU, 2025.
[8] Reuters, "How Tesla and Waymo's radically different robotaxi approaches will shape the industry," Reuters, Aug. 2025. Accessed: 2026-03-13.
[9] Rivian and Volkswagen Group, "Rivian Volkswagen Group joint venture." https://rivianvw.tech, 2025.
[10] J. Ibañez-Guzmán, C. Laugier, J.-D. Yoder, and S. Thrun, "Autonomous driving: Context and state-of-the-art," in Handbook of Intelligent Vehicles, Springer, 2012.
[11] B. Wijaya, K. Jiang, M. Yang, T. Wen, Y. Wang, X. Tang, Z. Fu, T. Zhou, and D. Yang, "High definition map mapping and update: A general overview and future directions," ArXiv, vol. abs/2409.09726, 2024.
[12] M. Liang, B. Yang, W. Zeng, Y. Chen, R. Hu, S. Casas, and R. Urtasun, "PnPNet: End-to-end perception and prediction with tracking in the loop," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11553–11562, 2020.
[13] E. Leong, "Bridging the gap between modular and end-to-end autonomous driving systems," technical report, University of California, Berkeley, Electrical Engineering and Computer Sciences, May 2022.
[14] M. Sadaf, Z. Iqbal, A. R. Javed, I. Saba, M. Krichen, S. Majeed, and A. Raza, "Connected and automated vehicles: Infrastructure, applications, security, critical challenges, and future aspects," Technologies, vol. 11, no. 5, 2023.
[15] P. Meruva, "Sensor calibration is critical to the future of automated vehicles." https://www.trucks.vc/blog/sensor-calibration-is-critical-to-the-future-of-automated-vehicles, Apr. 2020. Trucks VC Research Brief, accessed 2026-03-13.
[16] L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, "End-to-end autonomous driving: Challenges and frontiers," IEEE Trans.
Pattern Anal. Mach. Intell., vol. 46, pp. 10164–10183, Dec. 2024.
[17] Net-Scale Technologies, Inc., "Autonomous off-road vehicle control using end-to-end learning," tech. rep., Net-Scale Technologies, Inc., July 2004. Final Technical Report. Available at: http://net-scale.com/doc/net-scale-dave-report.pdf.
[18] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba, "End to end learning for self-driving cars," 2016.
[19] H. Shao, Y. Hu, L. Wang, G. Song, S. L. Waslander, Y. Liu, and H. Li, "LMDrive: Closed-loop end-to-end driving with large language models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15120–15130, 2024.
[20] S. Paniego Blanco et al., "Autonomous driving in traffic with end-to-end vision-based deep learning," Neurocomputing, vol. 594, p. 127855, 2024.
[21] T. Charmet, V. Cherfaoui, J. Ibañez-Guzmán, and A. Armand, "Overview of the operational design domain monitoring for safe intelligent vehicle navigation," in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pp. 5363–5370, IEEE, 2023.
[22] Y. Li and J. Ibañez-Guzmán, "LiDAR for autonomous driving: The principles, challenges, and trends for automotive LiDAR and perception systems," IEEE Signal Processing Magazine, vol. 37, no. 4, pp. 50–61, 2020.
[23] NVIDIA, Y. Wang, W. Luo, J. Bai, Y. Cao, T. Che, K. Chen, Y. Chen, J. Diamond, Y. Ding, W. Ding, L. Feng, G. Heinrich, J. Huang, P. Karkus, B. Li, P. Li, T.-Y. Lin, D. Liu, M.-Y. Liu, L. Liu, Z. Liu, J. Lu, Y. Mao, P. Molchanov, L. Pavao, Z. Peng, M. Ranzinger, E. Schmerling, S. Shen, Y. Shi, S. Tariq, R. Tian, T. Wekel, X. Weng, T. Xiao, E. Yang, X. Yang, Y. You, X. Zeng, W. Zhang, B. Ivanovic, and M.
Pavone, "Alpamayo-R1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail," 2026. Preprint.
[24] C. Olaverri-Monreal, "Promoting trust in self-driving vehicles," Nature Electronics, vol. 3, pp. 292–294, 2020.
[25] D. Kahneman, Thinking, Fast and Slow. New York: Farrar, Straus and Giroux, 2011.
[26] C. Olaverri-Monreal, "Automated driving: A literature review of the take over request in conditional automation," Electronics, vol. 9, no. 12, p. 2087, 2020.
[27] R. Izquierdo, J. Alonso, O. Benderius, M. Á. Sotelo, and D. Fernández-Llorca, "Pedestrian and passenger interaction with autonomous vehicles: Field study in a crosswalk scenario," International Journal of Human–Computer Interaction, pp. 1–19, 2024.
[28] M. M. Hussien, A. N. Melo, A. L. Ballardini, C. Salinas Maldonado, R. Izquierdo, and M. Á. Sotelo, "RAG-based explainable prediction of road users behaviors for automated driving using knowledge graphs and large language models," Expert Systems with Applications, vol. 265, p. 125914, 2025.
[29] W. Morales-Alvarez, M. Marouf, H. H. Tadjine, and C. Olaverri-Monreal, "Real-world evaluation of the impact of automated driving system technology on driver gaze behavior, reaction time and trust," in 2021 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2021.
[30] B. Templeton, "Tesla finally releases FSD crash data that appears more honest," Nov. 2025. Accessed 2026-02-27.
[31] Tesla, Inc., "Tesla full self-driving (FSD) safety report." https://www.tesla.com/en_au/fsd/safety, 2025.