Scene-aware SAR ship detection guided by unsupervised sea-land segmentation

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv paper.

Deep learning (DL)-based synthetic aperture radar (SAR) ship detection has clear advantages in many application areas. However, it still faces problems such as a lack of prior knowledge, which seriously degrades detection accuracy. To address this, we propose a scene-aware SAR ship detection method based on unsupervised sea-land segmentation. The method follows a classical two-stage framework and is enhanced by two modules: the unsupervised land and sea segmentation module (ULSM) and the land attention suppression module (LASM). Together, ULSM and LASM adaptively guide the network to reduce its attention on land according to the scene type (inshore or offshore) and inject prior knowledge (sea-land segmentation information) into the network. This directly reduces the network's attention to land and, in relative terms, enhances offshore detection performance, which increases ship detection accuracy and improves the interpretability of the model. Specifically, because existing deep learning-based SAR ship detection datasets lack sea-land segmentation labels, ULSM uses an unsupervised approach to classify input scenes as inshore or offshore and performs sea-land segmentation for inshore scenes. LASM then uses the sea-land segmentation information as prior knowledge to reduce the network's attention to land. Experiments on the publicly available SSDD dataset demonstrate the effectiveness of our network.


💡 Research Summary

This paper addresses a critical limitation of current deep‑learning‑based synthetic aperture radar (SAR) ship detection methods: the lack of explicit prior knowledge about the sea‑land layout, which often leads to false detections on land and reduced accuracy in coastal (inshore) scenes. To remedy this, the authors propose a scene‑aware two‑stage detection framework that integrates unsupervised sea‑land segmentation and a novel attention‑suppression mechanism. The pipeline builds upon the classic Faster‑RCNN architecture with a ResNet‑50 backbone, but inserts two new modules: an Unsupervised Land‑Sea Segmentation Module (ULSM) before feature extraction and a Land Attention Suppression Module (LASM) after feature extraction.

ULSM first extracts high‑level features from each SAR image using ResNet‑50, then applies K‑means clustering (k = 2) to automatically separate the dataset into “inshore” (coastal) and “offshore” (open‑sea) scenes. The smaller cluster is assumed to be inshore because coastal images are typically fewer in number. For the inshore subset only, the traditional Otsu thresholding method is employed to generate a binary sea‑land mask: pixels below the optimal gray‑level threshold are labeled as land (0), those above as sea (1). This unsupervised approach eliminates the need for costly pixel‑level annotations, which are scarce in public SAR ship datasets.
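
The two ULSM steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the `features` array stands in for precomputed ResNet-50 embeddings, and the function names (`kmeans_two_clusters`, `split_scenes`, `otsu_threshold`, `sea_land_mask`), the deterministic initialization, and the iteration cap are all hypothetical choices made for the example.

```python
import numpy as np


def kmeans_two_clusters(features, n_iter=50):
    """Plain Lloyd's k-means with k = 2 on an (N, D) feature matrix."""
    # Assumed deterministic init: first sample, then the sample farthest from it.
    c0 = features[0]
    c1 = features[np.argmax(np.linalg.norm(features - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def split_scenes(features):
    """True where an image falls in the smaller cluster (treated as inshore)."""
    labels = kmeans_two_clusters(features)
    counts = np.bincount(labels, minlength=2)
    return labels == int(counts.argmin())


def otsu_threshold(image):
    """Otsu's threshold for a uint8 image: maximize the between-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of gray levels <= t
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 1e-12                     # skip thresholds with an empty class
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(sigma_b.argmax())


def sea_land_mask(image):
    """Binary mask following the labeling above: 0 at or below the threshold, 1 above."""
    return (image > otsu_threshold(image)).astype(np.uint8)
```

In this sketch only the images flagged by `split_scenes` would be passed to `sea_land_mask`; offshore images would skip the thresholding step entirely.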

LASM takes the sea‑land mask and the corresponding feature map of an inshore image and computes a set of suppression weights. A 1×1 convolution first reduces the feature dimension to 256, and average pooling then yields a 7 × 7 × 256 tensor, which is flattened to a 12 544‑dimensional vector. Two fully‑connected layers with Tanh activations map this vector to a scalar λ, which modulates the mask M via the formula A = 1 − (M · λ). Under this formula, positions with M = 0 pass through unchanged (A = 1) and positions with M = 1 are scaled by 1 − λ, so the mask that enters the formula must mark land with 1 (the complement of a land = 0 encoding) for land to be attenuated. The intended effect is that land regions receive reduced attention while sea regions retain most of their original activation. The weighting map A is then up‑sampled to match the original spatial size and multiplied element‑wise with the initial feature map before the result is fed to the Region Proposal Network (RPN) and ROI heads. By suppressing land‑related activations at the feature level, the detector becomes more sensitive to ships on water and less prone to confusing land structures with vessels.
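
The LASM computation can be traced with a small NumPy forward pass. This is a hedged sketch rather than the paper's code: the hidden width of 64, the random weight initialization, the names `init_params` and `lasm_forward`, and the nearest-neighbour resizing of the mask to the feature grid (instead of up-sampling the weighting map) are all assumptions made for illustration. The formula A = 1 − (M · λ) is applied literally, so positions where the resized mask equals 1 are scaled by 1 − λ and positions where it equals 0 are left unchanged.

```python
import numpy as np


def init_params(in_channels, hidden=64, seed=0):
    """Random weights for the sketch; `hidden` is an assumed layer width."""
    rng = np.random.default_rng(seed)
    return {
        "w_conv": rng.normal(0.0, 0.05, (256, in_channels)),  # 1x1 conv kernel
        "w_fc1": rng.normal(0.0, 0.01, (hidden, 7 * 7 * 256)),
        "b_fc1": np.zeros(hidden),
        "w_fc2": rng.normal(0.0, 0.1, (1, hidden)),
        "b_fc2": np.zeros(1),
    }


def lasm_forward(feat, mask, params):
    """feat: (C, H, W) feature map with H, W multiples of 7; mask: (Hm, Wm) binary."""
    _, h, w = feat.shape
    # A 1x1 convolution is a per-pixel linear map over channels -> (256, H, W).
    reduced = np.einsum("oc,chw->ohw", params["w_conv"], feat)
    # Average pooling down to a 7 x 7 grid per channel.
    pooled = reduced.reshape(256, 7, h // 7, 7, w // 7).mean(axis=(2, 4))
    vec = pooled.reshape(-1)  # 7 * 7 * 256 = 12544 values
    hidden = np.tanh(params["w_fc1"] @ vec + params["b_fc1"])
    lam = float(np.tanh(params["w_fc2"] @ hidden + params["b_fc2"])[0])
    # Nearest-neighbour resize of the mask to the feature-map grid (assumption).
    rows = np.arange(h) * mask.shape[0] // h
    cols = np.arange(w) * mask.shape[1] // w
    m = mask[np.ix_(rows, cols)].astype(feat.dtype)
    attention = 1.0 - m * lam  # A = 1 - (M * lambda)
    return feat * attention[None], lam, attention
```

Because λ comes out of a Tanh, it lies strictly between −1 and 1, so a single scalar controls how strongly the masked region is attenuated for a given image.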

The authors evaluate the method on the publicly available SSDD dataset, resizing images to 512 × 512 pixels. Training uses a learning rate of 0.02, momentum of 0.9, weight decay of 0.001, and runs for 12 epochs on an RTX 3060 GPU. Three strong baselines serve as the comparison: Faster‑RCNN, Deformable Convolutional Networks (DCN), and Grid‑RCNN. Against these, the proposed approach achieves the highest overall mean Average Precision (mAP = 63.5 %) and the best mAP@50 (91.8 %). The second‑best mAP@50 is DCN's 89.8 %; the gain is stated as 2.6 points, although the two figures quoted here differ by 2.0 points. The method also leads on small‑ship detection (mAP_S = 63.8 %), which suggests that the attention‑suppression strategy does not come at the expense of small targets. Qualitative results show fewer false negatives and robust performance in both inshore (complex coastal clutter) and offshore (open sea) scenarios.
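
For readers unfamiliar with the metric, mAP@50 conventionally counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion; the corner-format boxes and the function names `iou` and `is_match_at_50` are assumptions for illustration, and the summary does not describe the evaluation code actually used.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """True when a predicted box overlaps the ground truth with IoU >= 0.5."""
    return iou(pred_box, gt_box) >= 0.5
```

A prediction shifted by half the width of a square ship box overlaps it with an IoU of only one third, so it would not count as a match at this threshold.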

An ablation study isolates the contribution of the K‑means clustering step. When Otsu thresholding is applied to the entire dataset without prior scene classification (ID‑2), performance drops to mAP = 62.2 % and mAP@50 = 88.4 %. The degradation is attributed to inappropriate sea‑land masks being generated for offshore images, where land segmentation is unnecessary and can mistakenly label large ships as land, thereby corrupting the attention mechanism.

In summary, the paper makes three key contributions: (1) an entirely unsupervised pipeline for generating scene‑specific sea‑land masks, (2) a mask‑guided attention suppression module that directly reduces land‑related activations in the feature space, and (3) comprehensive experiments demonstrating that these innovations yield measurable gains in SAR ship detection accuracy, especially in challenging coastal environments. Limitations include reliance on Otsu’s global threshold, which may struggle with noisy or highly heterogeneous sea‑land boundaries, and the fixed two‑cluster K‑means assumption that may not capture more nuanced scene variations. Future work could explore more sophisticated unsupervised clustering (e.g., deep embedded clustering), multi‑band SAR data, and model compression for real‑time deployment. Overall, the study offers a practical and effective strategy for incorporating prior geographic knowledge into deep SAR object detectors without requiring additional annotation effort.

