NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map Construction

December 03, 2025

Reading time: 5 minutes


📝 Original Info

Title: NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map Construction
ArXiv ID: 2512.03317
Date: 2025-12-03
Authors: Thomas Monninger, Zihan Zhang, Steffen Staab, Sihao Ding

📝 Abstract

Accurate environmental representations are essential for autonomous driving, providing the foundation for safe and efficient navigation. Traditionally, high-definition (HD) maps provide this representation of the static road infrastructure to the autonomous system a priori. However, because the real world is constantly changing, such maps must be constructed online from on-board sensor data. Navigation-grade standard-definition (SD) maps are widely available, but their resolution is insufficient for direct deployment. Instead, they can be used as a coarse prior to guide the online map construction process. We propose NavMapFusion, a diffusion-based framework that performs iterative denoising conditioned on high-fidelity sensor data and on low-fidelity navigation maps. This paper strives to answer: (1) How can coarse, potentially outdated navigation maps guide online map construction? (2) What advantages do diffusion models offer for map fusion? We demonstrate that diffusion-based map construction provides a robust framework for map fusion. Our key insight is that discrepancies between the prior map and online perception naturally correspond to noise within the diffusion process; consistent regions reinforce the map construction, whereas outdated segments are suppressed. On the nuScenes benchmark, NavMapFusion conditioned on coarse road lines from OpenStreetMap data reaches a 21.4% relative improvement at a 100 m perception range, and even stronger improvements at larger perception ranges, while maintaining real-time capabilities. By fusing low-fidelity priors with high-fidelity sensor data, the proposed method generates accurate and up-to-date environment representations, supporting safer and more reliable autonomous driving. The code is available at https://github.com/tmonnin/navmapfusion

💡 Deep Analysis
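
The abstract reports a 21.4% relative improvement at the 100 m range. This excerpt states neither the metric nor the baseline score, so the following is only the usual definition of relative improvement. The symbols $m_{\text{base}}$ and $m_{\text{fused}}$ are placeholders (not from the paper) for the baseline score and the NavMapFusion score, assuming a metric where higher is better:

```latex
\[
\Delta_{\text{rel}} = \frac{m_{\text{fused}} - m_{\text{base}}}{m_{\text{base}}},
\qquad
\Delta_{\text{rel}} = 0.214
\;\Longleftrightarrow\;
m_{\text{fused}} = 1.214\, m_{\text{base}}.
\]
```

Under this reading the gain is multiplicative. A purely hypothetical baseline score of 50 would correspond to a fused score of 60.7, an absolute gain of 10.7 points rather than 21.4.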

📄 Full Content

Thomas Monninger (1,2), Zihan Zhang (*,3), Steffen Staab (2,4), Sihao Ding (1)

(1) Mercedes-Benz Research & Development North America, USA
(2) University of Stuttgart, Germany
(3) University of California, San Diego, USA
(4) University of Southampton, United Kingdom
(*) Work was done during an internship at Mercedes-Benz Research & Development North America.

arXiv:2512.03317v1 [cs.CV] 3 Dec 2025

© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Figure 1. Overview of our NavMapFusion approach. Diffusion-based map construction starts from random noise and is conditioned on camera images and an SD map to generate an HD map.

1. Introduction

Accurate knowledge of static road infrastructure, such as lanes, dividers, and crosswalks, is essential for decision making in autonomous vehicles. This knowledge must be extracted from sensor data to react to the actual environment around the vehicle in real time. However, limited range and occlusions impose limits on purely perception-based online mapping. Navigation maps offer complementary global context but lack the required resolution and may be outdated; consequently, they can be used as guidance [15, 22]. Leveraging coarse priors during online HD map construction can close perception gaps in occluded or distant regions, improving safety margins and planning performance.

Conflicts between the navigation prior and online sensor observations may stem from true environment changes (e.g., construction) or from a limited sensor view (e.g., occlusion). A fusion algorithm must therefore perform context-aware reasoning: retaining correct but currently invisible structures while discarding obsolete ones. This is particularly challenging because prior maps are mostly correct but occasionally wrong locally, for example due to roadwork. Another source of error is inaccurate localization, which causes systematic errors, drifts, or sudden jumps. The non-uniform spatial error profile of real-world maps renders heuristics-based map fusion unreliable.

Classical late-stage fusion pipelines treat perception output and prior maps as separate layers, deferring a hard decision until the end; this struggles when the inputs disagree. Recent learning-based approaches use neural network architectures to condition the online map construction on prior map information. Their deterministic fusion process makes it harder to discard stale information. In contrast, we embed the conditioning inside a diffusion framework, allowing the model to attenuate or amplify individual elements in a probabilistic manner. Experiments on nuScenes confirm that integrating prior maps through a diffusion process is effective for map fusion
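
To make the idea of denoising conditioned on both sensor data and an SD prior concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the function names, the linear noise schedule, and the rule-based `toy_denoiser` are all assumptions made for illustration. A real system would replace `toy_denoiser` with a trained network that learns when to trust the prior. The loop itself is a standard deterministic DDIM-style sampler that predicts the clean sample at each step.

```python
import numpy as np


def make_alpha_bar(num_steps: int) -> np.ndarray:
    """Assumed schedule: signal fraction falls linearly from 1 (clean) to ~0 (noise)."""
    return np.linspace(1.0, 1e-4, num_steps + 1)


def toy_denoiser(x_t, sensor, visible, sd_prior):
    """Hypothetical stand-in for a learned network that predicts the clean map.

    Visible points follow the sensor estimate, so outdated prior segments are
    overridden. Invisible points fall back to the SD prior. x_t is ignored
    here; a trained model would use it.
    """
    return np.where(visible[:, None], sensor, sd_prior)


def sample_map(sensor, visible, sd_prior, num_steps=50, seed=0):
    """Deterministic DDIM-style sampling (eta = 0) starting from Gaussian noise."""
    rng = np.random.default_rng(seed)
    alpha_bar = make_alpha_bar(num_steps)
    x = rng.standard_normal(sd_prior.shape)
    for t in range(num_steps, 0, -1):
        x0_hat = toy_denoiser(x, sensor, visible, sd_prior)
        # Noise implied by the current sample and the predicted clean map.
        eps_hat = (x - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
        # Step to the next lower noise level.
        x = np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat
    return x


# Toy lane centerline as (x, y) points in metres; the road was shifted by 3 m
# between x = 20 m and x = 40 m, which the SD prior does not yet reflect.
true_map = np.array([[0, 0], [20, 3], [40, 3], [60, 0], [80, 0], [100, 0]], dtype=float)
sd_prior = np.array([[0, 0], [20, 0], [40, 0], [60, 0], [80, 0], [100, 0]], dtype=float)
visible = np.array([True, True, True, True, False, False])  # sensor range ends at 60 m
sensor = np.where(visible[:, None], true_map, 0.0)  # invisible rows are ignored

fused = sample_map(sensor, visible, sd_prior)
print(fused)
```

In this toy setup the outdated prior segment inside sensor range is replaced by the observed geometry, while the points beyond sensor range are carried over from the prior. This mirrors the retain-or-discard behaviour described above only in spirit, since the hard visibility rule is exactly the kind of heuristic the paper argues a learned, probabilistic model should replace.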
