End-to-end Cloud Segmentation in High-Resolution Multispectral Satellite Imagery Using Deep Learning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original paper on arXiv.

Segmenting clouds in high-resolution satellite images is an arduous and challenging task due to the many types of geographies and clouds a satellite can capture. Therefore, it needs to be automated and optimized, especially for those who regularly process large amounts of satellite images, such as governmental institutions. To that end, the contribution of this work is twofold: we present the CloudPeru2 dataset, consisting of 22,400 images of 512x512 pixels and their respective hand-drawn cloud masks, and we propose an end-to-end segmentation method for clouds using a Convolutional Neural Network (CNN) based on the Deeplab v3+ architecture. On the test set, the method achieved an accuracy of 96.62%, precision of 96.46%, specificity of 98.53%, and sensitivity of 96.72%, which are superior to the results of the compared methods.


💡 Research Summary

This paper presents a comprehensive study on automating cloud segmentation in high-resolution multispectral satellite imagery using deep learning, addressing a critical need for institutions like Peru’s National Commission for Aerospace Research and Development (CONIDA) that process large volumes of daily satellite data.

The authors make a dual contribution. First, they introduce and publicly release the CloudPeru2 dataset. It consists of 22,400 image-mask pairs, created by extracting 2,800 unique 512x512-pixel patches from 153 PERUSAT-1 satellite scenes (covering deserts, snowy mountains, forests, urban areas, and oceans) and then applying data augmentation with rotations and horizontal flips, an eightfold expansion (2,800 × 8 = 22,400). Each patch has a hand-drawn cloud mask, which makes the dataset suitable for semantic segmentation, unlike the authors' previous CloudPeru dataset, which targeted classification.
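The eightfold expansion is consistent with combining the four right-angle rotations with an optional horizontal flip. The following is a minimal sketch under that assumption; the summary does not state the exact rotation angles, and the function name `augment_pair` and the random arrays are hypothetical stand-ins for real patches and masks.

```python
import numpy as np

def augment_pair(image, mask):
    """Return 8 (image, mask) variants: 4 right-angle rotations, each with
    and without a horizontal flip. Assumed scheme, not confirmed by the source."""
    pairs = []
    for k in range(4):
        # Rotate image and mask together so they stay aligned.
        rot_img = np.rot90(image, k, axes=(0, 1))
        rot_mask = np.rot90(mask, k, axes=(0, 1))
        pairs.append((rot_img, rot_mask))
        # Horizontal flip: reverse the column (width) axis.
        pairs.append((rot_img[:, ::-1], rot_mask[:, ::-1]))
    return pairs

# Hypothetical 4-band patch (H, W, C) and its binary cloud mask (H, W).
rng = np.random.default_rng(0)
patch = rng.random((512, 512, 4), dtype=np.float32)
cloud_mask = (rng.random((512, 512)) > 0.5).astype(np.uint8)

variants = augment_pair(patch, cloud_mask)
print(len(variants))         # 8
print(2800 * len(variants))  # 22400
```

Applying the same geometric transform to the image and its mask is what keeps the labels valid after augmentation.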

Second, they propose an end-to-end convolutional neural network (CNN) model for semantic cloud segmentation. The architecture is based on Deeplab v3+ and is adapted for satellite imagery by modifying the input layer to accept four spectral channels: Red, Green, Blue, and Near-Infrared (NIR). This lets the model use additional spectral information, such as the distinct reflectance of vegetation in the NIR band, to better discriminate clouds from other features. The network uses an encoder-ASPP-decoder structure, where ASPP stands for Atrous Spatial Pyramid Pooling, to capture multi-scale contextual information and refine object boundaries. It relies on efficient building blocks, namely Inverted Residual Units and Atrous Separable Convolutions, to balance accuracy and computational cost.
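To illustrate why atrous separable convolutions are cheap, here is a minimal NumPy sketch of one such layer applied to a four-band input. It is not the authors' network: the function name, shapes, dilation rate, and random weights are all assumptions chosen for illustration. The layer applies a dilated per-channel (depthwise) filter and then a 1x1 (pointwise) mix across channels.

```python
import numpy as np

def atrous_separable_conv(x, depthwise, pointwise, rate):
    """x: (H, W, C_in); depthwise: (k, k, C_in) with odd k;
    pointwise: (C_in, C_out). Stride 1, zero padding, same output size."""
    k = depthwise.shape[0]
    pad = rate * (k - 1) // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            # Kernel taps are spaced `rate` pixels apart (the "atrous" holes).
            di, dj = i * rate, j * rate
            out += xp[di:di + h, dj:dj + w, :] * depthwise[i, j, :]
    # The pointwise step mixes channels independently at each pixel.
    return out @ pointwise

rng = np.random.default_rng(0)
x = rng.random((64, 64, 4))          # hypothetical R, G, B, NIR patch
dw = rng.standard_normal((3, 3, 4))  # one 3x3 filter per input band
pw = rng.standard_normal((4, 32))    # 1x1 projection to 32 features
y = atrous_separable_conv(x, dw, pw, rate=2)

separable_params = dw.size + pw.size  # 3*3*4 + 4*32
standard_params = 3 * 3 * 4 * 32      # a full 3x3 convolution
print(y.shape, separable_params, standard_params)
```

With these example sizes the separable layer has 164 weights against 1,152 for a standard 3x3 convolution, while the dilation widens the receptive field without adding weights.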

The proposed method was evaluated against four other cloud detection techniques: a progressive refinement scheme, a traditional Artificial Neural Network (ANN) using handcrafted features, a superpixel-based CNN, and the recent CloudNet architecture. On the validation set, the proposed model achieved the best scores on all metrics: 97.50% Accuracy, 96.45% Precision, 98.46% Sensitivity/Recall, and 96.58% Specificity. Visual comparisons also showed that it handles difficult cases well, in particular distinguishing bright clouds from snow. On a held-out test set, it maintained high performance with 96.62% Accuracy. An important practical finding was that, despite having more parameters, the proposed model required less memory and computation during training than CloudNet, making it more suitable for large-scale applications.
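The four reported metrics follow from the pixel-level confusion matrix, with cloud treated as the positive class. The sketch below uses the standard definitions; the function name and the toy masks are hypothetical and are not taken from the paper's evaluation code.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for binary masks (True/1 = cloud)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # cloud predicted as cloud
    tn = int(np.sum(~pred & ~truth))  # clear predicted as clear
    fp = int(np.sum(pred & ~truth))   # clear predicted as cloud
    fn = int(np.sum(~pred & truth))   # cloud predicted as clear

    def ratio(num, den):
        # Undefined ratios (e.g. no cloud pixels at all) become NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
    }

# Toy 2x4 masks: 2 true positives, 4 true negatives, 1 FP, 1 FN.
truth = [[1, 1, 0, 0], [1, 0, 0, 0]]
pred = [[1, 0, 0, 0], [1, 1, 0, 0]]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

For the toy masks this gives an accuracy of 0.75 and a specificity of 0.8. Reporting sensitivity and specificity alongside accuracy matters here because cloud and clear pixels are rarely balanced within a scene.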

To process full-scale satellite scenes (often more than 6,000 pixels wide), the authors implemented a sliding-window approach with overlap. The model generates a probability mask for each 512x512 window. Where windows overlap, the predictions are fused by taking the per-pixel maximum probability, which avoids visible seams at window borders. The fused probability map is then thresholded at 0.5 to produce the final binary cloud mask.
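The sliding-window procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the 64-pixel overlap, the handling of the last window at each border, the `>=` comparison at the threshold, and the brightness-based `fake_predict` stand-in for the trained network are all hypothetical. The sketch also assumes the scene is at least one window in size.

```python
import numpy as np

def window_starts(length, window, stride):
    """Window offsets covering [0, length); the last window is shifted
    back so that it ends exactly at the border."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts

def segment_scene(scene, predict_fn, window=512, overlap=64, threshold=0.5):
    """scene: (H, W, C). predict_fn maps a tile to per-pixel cloud
    probabilities of shape (h, w). Returns a boolean cloud mask."""
    h, w = scene.shape[:2]
    stride = window - overlap
    prob = np.zeros((h, w), dtype=np.float32)
    for top in window_starts(h, window, stride):
        for left in window_starts(w, window, stride):
            tile = scene[top:top + window, left:left + window]
            region = prob[top:top + window, left:left + window]
            # Max fusion: keep the highest probability seen for each pixel.
            np.maximum(region, predict_fn(tile), out=region)
    return prob >= threshold

def fake_predict(tile):
    # Hypothetical stand-in for the CNN: mean brightness of the RGB bands.
    return tile[..., :3].mean(axis=-1)

scene = np.zeros((700, 900, 4), dtype=np.float32)
scene[100:300, 200:650, :] = 0.9  # one bright "cloud" block
mask = segment_scene(scene, fake_predict)
print(mask.shape, int(mask.sum()))
```

Taking the maximum rather than the average biases overlapping regions toward detecting cloud, so a pixel is flagged if any window that covers it is confident.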

In conclusion, this work demonstrates a practical, high-performance deep learning solution for cloud segmentation. It provides a public dataset, an efficient model architecture that outperforms the compared methods, and a complete processing pipeline for full scenes. The method has already been integrated into a tool for CONIDA that automates the processing of hundreds of satellite images, which puts the research into operational use.

