A Dual-Branch Framework for Semantic Change Detection with Boundary and Temporal Awareness
Semantic Change Detection (SCD) aims to detect and categorize land-cover changes from bi-temporal remote sensing images. Existing methods often suffer from blurred boundaries and inadequate temporal modeling, limiting segmentation accuracy. To address these issues, we propose a Dual-Branch Framework for Semantic Change Detection with Boundary and Temporal Awareness, termed DBTANet. Specifically, we utilize a dual-branch Siamese encoder where a frozen SAM branch captures global semantic context and boundary priors, while a ResNet34 branch provides local spatial details, ensuring complementary feature representations. On this basis, we design a Bidirectional Temporal Awareness Module (BTAM) to aggregate multi-scale features and capture temporal dependencies in a symmetric manner. Furthermore, a Gaussian-smoothed Projection Module (GSPM) refines shallow SAM features, suppressing noise while enhancing edge information for boundary-aware constraints. Extensive experiments on two public benchmarks demonstrate that DBTANet effectively integrates global semantics, local details, temporal reasoning, and boundary awareness, achieving state-of-the-art performance.
💡 Research Summary
Semantic Change Detection (SCD) aims to locate and label land‑cover changes between two remote‑sensing images. Existing approaches often produce blurred change boundaries and fail to capture rich temporal dependencies, which limits their segmentation accuracy. To overcome these two shortcomings, the authors propose DBTANet, a Dual‑Branch framework with Boundary and Temporal Awareness. The architecture consists of a Siamese encoder with two parallel branches: a frozen Segment‑Anything‑Model (SAM) branch that supplies global semantic context and strong boundary priors, and a lightweight ResNet‑34 branch that preserves fine‑grained local spatial details. Feature fusion is performed at shallow and deep levels using learnable gates (α for shallow, β for deep) that balance the contributions of the two streams.
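The gated fusion described above can be sketched as follows. This is a minimal, hypothetical interpretation: the summary only says the gates α (shallow) and β (deep) are learnable and balance the two streams, so the scalar-sigmoid parameterization and the 1×1 projection here are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse SAM and ResNet-34 features with a learnable gate (sketch).

    Assumption: each gate (alpha for shallow, beta for deep) is a single
    learnable logit squashed by a sigmoid; the paper may use a richer form.
    """
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(1))      # learnable gate logit
        self.proj = nn.Conv2d(channels, channels, 1)  # light projection after fusion

    def forward(self, f_sam, f_res):
        a = torch.sigmoid(self.gate)                  # gate value in (0, 1)
        return self.proj(a * f_sam + (1 - a) * f_res)

# separate instances play the roles of the shallow (alpha) and deep (beta) gates
alpha_fuse = GatedFusion(64)
x = alpha_fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(tuple(x.shape))
```

A convex combination like this keeps the fused feature on the same scale as its inputs, which makes the gate easy to interpret as a global-versus-local trade-off.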
Because SAM’s shallow features contain high‑frequency noise, the authors introduce a Gaussian‑Smoothed Projection Module (GSPM). GSPM applies three sequential depthwise Gaussian convolution blocks with decreasing standard deviations (σ = 1.0, 0.8, 0.6), followed by a 1×1 projection and a residual connection. This progressively suppresses noise while retaining and sharpening edge information, yielding cleaner boundary cues for the downstream tasks.
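A minimal GSPM sketch under stated assumptions: the σ schedule (1.0, 0.8, 0.6), the 1×1 projection, and the residual add come from the summary, while the 5×5 kernel size and the fixed (non-learned) Gaussian weights are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(sigma, size=5):
    """2-D Gaussian kernel, normalized to sum to 1."""
    ax = torch.arange(size) - (size - 1) / 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

class GSPM(nn.Module):
    """Gaussian-Smoothed Projection Module (sketch).

    Three sequential depthwise Gaussian blurs with decreasing sigma,
    then a 1x1 projection and a residual connection back to the input.
    Kernel size 5 is an assumption; the summary does not state it.
    """
    def __init__(self, channels, sigmas=(1.0, 0.8, 0.6)):
        super().__init__()
        for i, s in enumerate(sigmas):
            k = gaussian_kernel(s)[None, None].expand(channels, 1, 5, 5).clone()
            self.register_buffer(f"k{i}", k)      # fixed, non-learned blur weights
        self.n = len(sigmas)
        self.channels = channels
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        out = x
        for i in range(self.n):                   # progressively milder smoothing
            out = F.conv2d(out, getattr(self, f"k{i}"),
                           padding=2, groups=self.channels)
        return x + self.proj(out)                 # residual keeps original detail

m = GSPM(8)
y = m(torch.randn(1, 8, 16, 16))
print(tuple(y.shape))
```

Shrinking σ at each stage means early passes remove broad noise while later passes perturb edges less, which matches the stated goal of denoising without blurring boundary cues.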
Temporal modeling is handled by a Bidirectional Temporal Awareness Module (BTAM). Deep features from the two timestamps (t1, t2) are concatenated in both the forward (t1→t2) and reverse (t2→t1) orders and fed into a Multi‑Scale Aggregation (MSA) block. MSA consists of parallel 1×1, 3×3 dilated (dilation = 2), and 5×5 convolutions, enabling the network to capture both fine‑scale and large‑scale change patterns. The two directional representations are then fused via an Efficient Channel Attention (ECA) module, combined with the absolute feature difference, and processed through residual blocks. This symmetric design allows the model to reason about complex temporal relationships without relying on simple differencing.
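The BTAM pipeline above can be condensed into the following sketch. Hedged assumptions: the MSA branches are summed (the summary only says they run in parallel), the ECA kernel size is 3, and the trailing residual refinement blocks are omitted for brevity.

```python
import torch
import torch.nn as nn

class MSA(nn.Module):
    """Multi-Scale Aggregation: parallel 1x1, dilated 3x3 (d=2), and 5x5 convs.
    Summing the branch outputs is an assumption of this sketch."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c_out, 1)
        self.b3 = nn.Conv2d(c_in, c_out, 3, padding=2, dilation=2)
        self.b5 = nn.Conv2d(c_in, c_out, 5, padding=2)

    def forward(self, x):
        return self.b1(x) + self.b3(x) + self.b5(x)

class ECA(nn.Module):
    """Efficient Channel Attention: a 1-D conv over the globally pooled
    channel descriptor, with no dimensionality reduction."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        w = x.mean(dim=(2, 3))                    # (B, C) global average pool
        w = self.conv(w.unsqueeze(1)).squeeze(1)  # cross-channel interaction
        return x * torch.sigmoid(w)[:, :, None, None]

class BTAM(nn.Module):
    """Bidirectional Temporal Awareness Module (sketch): forward (t1->t2)
    and reverse (t2->t1) concatenations share one MSA; ECA fuses their sum,
    and the absolute feature difference is added back."""
    def __init__(self, c):
        super().__init__()
        self.msa = MSA(2 * c, c)
        self.eca = ECA()

    def forward(self, f1, f2):
        fwd = self.msa(torch.cat([f1, f2], dim=1))
        rev = self.msa(torch.cat([f2, f1], dim=1))
        return self.eca(fwd + rev) + torch.abs(f1 - f2)

m = BTAM(16)
out = m(torch.randn(1, 16, 8, 8), torch.randn(1, 16, 8, 8))
print(tuple(out.shape))
```

Sharing one MSA across both orders is what makes the design symmetric: swapping the two inputs permutes the directional terms but leaves their fused sum unchanged.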
Three task‑specific decoders are attached: (1) a semantic segmentation decoder for each timestamp, (2) a change detection decoder that incorporates the BTAM output and a task‑interaction module (using semantic differences as constraints and a similarity loss for consistency), and (3) a boundary detection decoder that employs a Sobel‑based operator as an auxiliary supervision signal.
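A Sobel-based boundary target like the one used for auxiliary supervision can be generated from a label mask as below. This is a hedged stand-in for the paper's operator: the gradient-magnitude threshold of 0.5 and the binary output are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def sobel_edges(mask):
    """Boundary map from a (B, 1, H, W) label mask via fixed Sobel filters.

    Assumption: edges are binarized by thresholding the gradient magnitude;
    the paper's exact post-processing is not specified in the summary.
    """
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    k = torch.stack([kx, ky]).unsqueeze(1)        # (2, 1, 3, 3) filter bank
    g = F.conv2d(mask.float(), k, padding=1)      # gradients along x and y
    mag = torch.sqrt((g ** 2).sum(dim=1, keepdim=True) + 1e-6)
    return (mag > 0.5).float()                    # binary boundary target

edges = sobel_edges(torch.zeros(1, 1, 8, 8))      # uniform mask -> no boundaries
print(tuple(edges.shape))
```

Because the target is derived from the ground-truth mask rather than the image, the boundary decoder receives clean supervision even when image edges are cluttered.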
Experiments are conducted on two public SCD benchmarks: Landsat‑SCD (8,468 image pairs, 416 × 416) and SECOND (4,662 pairs, 512 × 512). The network is trained with AdamW (lr = 0.001), batch size = 8, on a single RTX 4090. Evaluation metrics include Overall Accuracy (OA), mean Intersection‑over‑Union (mIoU), Separated Kappa (SeK), and F1‑score. DBTANet achieves state‑of‑the‑art results: on Landsat‑SCD it reaches OA = 96.92 %, mIoU = 90.84 %, SeK = 65.72 %, F1 = 90.90 %, surpassing the previous best (BT‑HRSCD) by +0.57 % OA, +1.36 % mIoU, +4.13 % SeK, and +1.52 % F1. On SECOND it obtains OA = 88.23 %, mIoU = 73.74 %, SeK = 24.12 %, F1 = 64.08 %, again outperforming all baselines.
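For reference, OA and mIoU among the reported metrics reduce to simple confusion-matrix arithmetic, sketched below. SeK and F1 are omitted because they follow benchmark-specific definitions (e.g. the SECOND benchmark's change/no-change split) not detailed in the summary.

```python
import numpy as np

def oa_miou(pred, gt, num_classes):
    """Overall Accuracy and mean IoU from a label-index confusion matrix."""
    cm = np.bincount(num_classes * gt.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    oa = np.diag(cm).sum() / cm.sum()                 # trace / total pixels
    union = cm.sum(0) + cm.sum(1) - np.diag(cm)       # TP + FP + FN per class
    iou = np.diag(cm) / np.maximum(union, 1)
    return oa, iou.mean()

# toy 2x2 prediction: one of four pixels is wrong
pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
oa, miou = oa_miou(pred, gt, 2)
print(round(oa, 2), round(miou, 2))  # 0.75 0.58
```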
Ablation studies on SECOND demonstrate the individual contributions: adding SAM alone improves OA and mIoU modestly; incorporating GSPM further raises SeK, indicating better boundary quality; finally, integrating BTAM yields the highest performance, confirming that global‑local feature fusion, noise‑reduced boundary refinement, and bidirectional multi‑scale temporal modeling are complementary. Qualitative visualizations show sharper, more accurate change masks, especially in cluttered or subtle‑change scenarios where conventional CNN backbones produce blurred or incomplete results.
In summary, DBTANet successfully merges (i) global semantic priors from a frozen SAM, (ii) local detail from ResNet‑34, (iii) Gaussian‑based boundary denoising, and (iv) symmetric multi‑scale temporal reasoning. This combination addresses the two major weaknesses of prior SCD methods and delivers robust, high‑precision change maps across diverse datasets, making it a promising foundation for future high‑resolution, multi‑temporal remote‑sensing applications.