CAD-SLAM: Consistency-Aware Dynamic SLAM with Dynamic-Static Decoupled Mapping
Recent advances in neural radiance fields (NeRF) and 3D Gaussian-based SLAM have achieved impressive localization accuracy and high-quality dense mapping in static scenes. However, these methods struggle in dynamic environments, where moving objects violate the static-world assumption and introduce inconsistent observations that degrade both camera tracking and map reconstruction. This motivates two fundamental problems: robustly identifying dynamic objects and modeling them online. To address these limitations, we propose CAD-SLAM, a Consistency-Aware Dynamic SLAM framework with dynamic-static decoupled mapping. Our key insight is that dynamic objects inherently violate cross-view and cross-time scene consistency. We detect object motion by analyzing geometric and texture discrepancies between historical map renderings and real-world observations. Once a moving object is identified, we perform bidirectional dynamic object tracking (both backward and forward in time) to achieve complete sequence-wise dynamic recognition. Our consistency-aware dynamic detection module achieves category-agnostic, instantaneous dynamic identification, which effectively mitigates motion-induced interference during localization and mapping. In addition, we introduce a dynamic-static decoupled mapping strategy that employs a temporal Gaussian model for online incremental dynamic modeling. Experiments on multiple dynamic datasets demonstrate the flexible and accurate dynamic segmentation capabilities of our method, along with state-of-the-art performance in both localization and mapping.
💡 Research Summary
CAD‑SLAM addresses two fundamental challenges in dynamic SLAM: (1) detecting moving objects without relying on predefined semantic categories, and (2) simultaneously tracking and reconstructing both the static background and the dynamic entities in real time. The authors observe that a static‑world assumption guarantees consistency across viewpoints and timestamps; any violation of this consistency manifests as geometric and texture discrepancies between a rendered historical map and the current RGB‑D observation. Leveraging this insight, CAD‑SLAM builds a consistency‑aware dynamic (CAD) module that renders the current pose‑aligned 3D Gaussian Splatting map, computes a pixel‑wise inconsistency map, and thresholds it to obtain a dynamic mask. This mask is generated instantly, is category‑agnostic, and captures fine‑grained motion boundaries that optical‑flow or semantic‑segmentation pipelines often miss.
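The consistency check described above can be sketched as a per-pixel comparison between the rendered historical map and the current RGB-D frame. The following is a minimal illustration, not the paper's implementation: the residual forms and the thresholds `tau_photo` / `tau_geo` are assumptions chosen for clarity.

```python
import numpy as np

def dynamic_mask(rendered_rgb, rendered_depth, observed_rgb, observed_depth,
                 tau_photo=0.1, tau_geo=0.05):
    """Sketch of a consistency-aware dynamic check: pixels where the
    rendered historical map disagrees with the current observation are
    flagged as dynamic. Thresholds are illustrative, not from the paper."""
    # Photometric residual: mean absolute color difference per pixel.
    photo_err = np.abs(rendered_rgb - observed_rgb).mean(axis=-1)
    # Geometric residual: depth difference, restricted to valid depths.
    valid = (rendered_depth > 0) & (observed_depth > 0)
    geo_err = np.abs(rendered_depth - observed_depth)
    # A pixel is dynamic if either residual exceeds its threshold.
    return valid & ((photo_err > tau_photo) | (geo_err > tau_geo))
```

Because the test needs only a rendered map and one observation, the mask is available as soon as the frame arrives, which is what makes the detection category-agnostic and instantaneous.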
Once a dynamic region is identified, the system initiates a bidirectional tracklet: it propagates the object’s Gaussian representation backward and forward in time, thereby maintaining a continuous identity even through occlusions or rapid motion. Each dynamic object is modeled as a temporal Gaussian ensemble whose parameters (position, covariance, opacity, and spherical‑harmonic color coefficients) are incrementally updated. This temporal model captures both motion trajectories and appearance changes, enabling an online, incremental reconstruction of moving entities.
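A temporal Gaussian ensemble of this kind can be pictured as a per-object state whose parameters are blended with each new observation while the previous states are retained as a trajectory. The sketch below uses a simple linear blend with a hypothetical rate `lr`; the paper's actual optimization is more involved.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class TemporalGaussian:
    """Illustrative per-object state. Attribute names mirror standard 3DGS
    parameters; the update rule is an assumed blend, not the paper's."""
    mean: np.ndarray        # 3D position
    cov: np.ndarray         # 3x3 covariance
    opacity: float
    sh_color: np.ndarray    # spherical-harmonic color coefficients
    history: list = field(default_factory=list)

    def update(self, obs_mean, obs_cov, lr=0.3):
        # Keep the prior state so the motion trajectory stays queryable,
        # which supports backward as well as forward propagation.
        self.history.append((self.mean.copy(), self.cov.copy()))
        # Blend the new observation into the running estimate.
        self.mean = (1 - lr) * self.mean + lr * np.asarray(obs_mean)
        self.cov = (1 - lr) * self.cov + lr * np.asarray(obs_cov)
```

Storing the history is what lets the bidirectional tracklet revisit earlier frames and keep a continuous object identity through occlusions.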
The static background is treated separately. After masking out dynamic pixels, the remaining static points are fed into a conventional 3DGS‑based SLAM pipeline (frame‑to‑model tracking, depth‑based alignment, and a ConvGRU refinement step). The static map thus evolves without contamination from moving objects, and previously occluded static surfaces are gradually filled in as the dynamic objects move away. The overall architecture consists of three tightly coupled components: (i) a tracking module that provides an initial pose estimate and refines it, (ii) the CAD module that detects inconsistencies and drives bidirectional object tracking, and (iii) a mapping module that maintains two decoupled Gaussian maps—one static, one dynamic.
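The decoupling step amounts to routing each pixel to exactly one of the two maps using the dynamic mask. A minimal sketch, with a hypothetical helper name and the convention that zero depth marks an invalid pixel:

```python
import numpy as np

def split_observation(depth, dyn_mask):
    """Route depth pixels to the two decoupled maps: masked pixels feed
    the temporal Gaussian model, the rest feed the static 3DGS map.
    (Hypothetical helper; the real pipeline also splits color/normals.)"""
    static_depth = np.where(dyn_mask, 0.0, depth)    # 0 = invalid
    dynamic_depth = np.where(dyn_mask, depth, 0.0)
    return static_depth, dynamic_depth
```

Because dynamic pixels never enter the static map, previously occluded background regions are simply filled in from later frames once the object has moved on.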
Key contributions are: (1) a novel consistency‑based dynamic detection method that requires no semantic priors and works instantly; (2) a bidirectional temporal tracking scheme that yields complete sequence‑wise dynamic recognition; (3) a hybrid static‑dynamic mapping framework that employs a temporal Gaussian model for online dynamic object reconstruction while preserving a clean, progressively completed static map; and (4) extensive experiments on multiple real‑world dynamic datasets (e.g., TUM‑RGBD‑Dynamic, KITTI‑Raw) demonstrating state‑of‑the‑art pose accuracy (ATE < 2 cm) and high‑quality dense reconstructions (PSNR > 30 dB), surpassing recent NeRF‑SLAM, GS‑SLAM, DynaMoN, and WildGS‑SLAM baselines.
Limitations include reduced robustness for extremely fast, non‑rigid deformations (e.g., balloons) where Gaussian parameters may change abruptly, and the current reliance on RGB‑D input, which limits direct applicability to LiDAR‑only platforms. Future work aims to incorporate non‑rigid deformation models, multi‑sensor fusion, and hierarchical Gaussian management to scale to larger scenes while maintaining real‑time performance.