Fusing Multi- and Hyperspectral Satellite Data for Harmful Algal Bloom Monitoring with Self-Supervised and Hierarchical Deep Learning

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, refer to the original arXiv paper.

We present a self-supervised machine learning framework for detecting and mapping the severity and speciation of harmful algal blooms (HABs) using multi-sensor satellite data. By fusing reflectance data from operational polar-orbiting satellite-based instruments (VIIRS, MODIS, OLCI, and OCI) with TROPOMI solar-induced fluorescence (SIF), our framework, called SIT-FUSE, generates HAB severity and speciation products without requiring per-instrument labeled datasets. The framework employs self-supervised representation learning and hierarchical deep clustering to segment phytoplankton cell abundance and species into interpretable classes, validated against in-situ data from the Gulf of Mexico and Southern California (2018-2025). Results show strong agreement with total phytoplankton, Karenia brevis, and Pseudo-nitzschia spp. measurements. This work advances scalable HAB monitoring in environments where ground truth observations are limited and enables exploratory analysis via hierarchical embeddings, a critical step toward operationalizing self-supervised learning for global aquatic biogeochemistry.


💡 Research Summary

This paper introduces a self-supervised machine learning framework named SIT-FUSE (Segmentation, Instance Tracking, and data FUsion Using multi-SEnsor imagery), designed to improve the monitoring of Harmful Algal Blooms (HABs). The core challenge it addresses is the reliance of existing approaches on instrument-specific, manually tuned algorithms or on supervised deep learning models that require large amounts of labeled data, which are scarce and costly to obtain for global aquatic monitoring.

The SIT-FUSE framework overcomes these limitations by employing self-supervised representation learning. It trains on large volumes of unlabeled satellite data (specifically, atmospherically corrected surface reflectance) from multiple polar-orbiting sensors. For the primary validation period (2018-2019), data from multispectral instruments (VIIRS, MODIS, OLCI) were fused with coarser-resolution red solar-induced fluorescence (SIF) data from TROPOMI. A separate test case used hyperspectral data from the Ocean Color Instrument (OCI) on NASA's new PACE mission (2024-2025). All data were resampled to a common 7 km resolution. The framework uses deep learning encoders, such as Deep Belief Networks or Vision Transformers, to learn feature representations directly from this fused data stream without any per-pixel HAB labels.
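The resolution-harmonization step can be illustrated with a short sketch. This is a minimal example under stated assumptions and does not reproduce SIT-FUSE's actual preprocessing. It assumes that every input is already projected onto a grid aligned with the 7 km target. Finer inputs are assumed to have a cell size that divides 7 km evenly (for example, a hypothetical 1 km band averaged in 7 x 7 blocks). Coarser inputs, such as gridded SIF, are assumed to have cells that are whole multiples of 7 km and are replicated by nearest neighbour. The function names `block_mean`, `replicate`, and `fuse_channels` are hypothetical, and so is the per-channel z-score normalization.

```python
import warnings

import numpy as np

# Hypothetical helpers illustrating regridding to a common grid and channel fusion.
# Not the SIT-FUSE API; assumes inputs are already aligned to the target grid.


def block_mean(grid: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks, skipping NaN (masked) pixels."""
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid shape must be divisible by the block factor")
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    with warnings.catch_warnings():
        # Fully masked blocks legitimately become NaN; silence numpy's empty-slice warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(blocks, axis=(1, 3))


def replicate(grid: np.ndarray, factor: int) -> np.ndarray:
    """Upsample a coarser grid by copying each cell into a factor x factor patch."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)


def fuse_channels(layers: list[np.ndarray]) -> np.ndarray:
    """Stack co-registered 2-D layers into (rows, cols, channels) and z-score each channel."""
    if len({layer.shape for layer in layers}) != 1:
        raise ValueError("all layers must already be on the same grid")
    stack = np.stack(layers, axis=-1).astype(float)
    mean = np.nanmean(stack, axis=(0, 1))
    std = np.nanstd(stack, axis=(0, 1))
    std = np.where(std > 0, std, 1.0)  # constant channels would otherwise divide by zero
    return (stack - mean) / std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reflectance_1km = rng.random((14, 14))  # hypothetical 1 km band -> 7 x 7 blocks
    sif_14km = rng.random((1, 1))  # hypothetical 14 km SIF cell -> 2 x 2 copies
    fused = fuse_channels([block_mean(reflectance_1km, 7), replicate(sif_14km, 2)])
    print(fused.shape)  # (2, 2, 2)
```

NaN pixels, such as cloud, land, or glint masks, are skipped during averaging. A 7 km cell is therefore filled as long as at least one finer pixel inside it is valid. An operational pipeline would probably also require a minimum valid fraction per cell.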

Following representation learning, a hierarchical deep clustering technique segments the learned features into interpretable classes. This process automatically identifies and maps both HAB severity (phytoplankton cell abundance) and, crucially, speciation—distinguishing between different harmful phytoplankton taxa. The framework was applied and validated in two critical HAB-prone regions: the Gulf of Mexico (focusing on Karenia brevis red tides) and Southern California waters (focusing on toxic Pseudo-nitzschia spp.). Validation against in-situ phytoplankton cell count data from 2018 to 2025 showed strong agreement for total phytoplankton, K. brevis, and Pseudo-nitzschia spp., demonstrating the model’s effectiveness even in complex coastal waters.
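The sketch below shows the two-level idea behind hierarchical clustering, together with one possible way to give unlabeled clusters a severity meaning. It is a stand-in and not the paper's method. Plain k-means on generic embedding vectors takes the place of SIT-FUSE's hierarchical deep clustering of DBN or ViT features. Ranking coarse clusters by the mean in-situ cell count of their matched samples is an assumed labeling step. The functions `kmeans`, `hierarchical_labels`, and `rank_by_insitu` are hypothetical choices, as are the farthest-point initialization and the cluster counts.

```python
import numpy as np

# Hypothetical stand-in for hierarchical deep clustering: plain k-means applied
# twice (coarse, then within each coarse cluster). Not the SIT-FUSE implementation.


def _sq_dists(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance from each row of x to each centroid: shape (n, k)."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)


def _farthest_point_init(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Start from a random point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        nearest = _sq_dists(x, np.array(centroids)).min(axis=1)
        centroids.append(x[nearest.argmax()])
    return np.array(centroids, dtype=float)


def kmeans(x: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(x))
    centroids = _farthest_point_init(x, k, rng)
    for _ in range(iters):
        labels = _sq_dists(x, centroids).argmin(axis=1)
        # Move each centroid to the mean of its members; keep empty clusters in place.
        updated = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return _sq_dists(x, centroids).argmin(axis=1), centroids


def hierarchical_labels(x: np.ndarray, k_coarse: int, k_fine: int, seed: int = 0) -> np.ndarray:
    """Return an (n, 2) array of (coarse, fine) labels; fine labels are local to each coarse cluster."""
    coarse, _ = kmeans(x, k_coarse, seed=seed)
    fine = np.zeros(len(x), dtype=int)
    for c in np.unique(coarse):
        members = np.flatnonzero(coarse == c)
        sub_labels, _ = kmeans(x[members], k_fine, seed=seed)
        fine[members] = sub_labels
    return np.stack([coarse, fine], axis=1)


def rank_by_insitu(coarse: np.ndarray, matchup_idx: np.ndarray, cell_counts: np.ndarray) -> np.ndarray:
    """Hypothetical labeling step: order coarse clusters by the mean in-situ cell count of
    their matched samples (0 = lowest). Clusters without any match-up get -1."""
    matched = coarse[matchup_idx]
    mean_count = {int(c): cell_counts[matched == c].mean() for c in np.unique(matched)}
    rank = {c: r for r, c in enumerate(sorted(mean_count, key=mean_count.get))}
    return np.array([rank.get(int(c), -1) for c in coarse])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three synthetic, well-separated groups of 2-D "embeddings", 20 pixels each.
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
    emb = np.concatenate([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])
    labels = hierarchical_labels(emb, k_coarse=3, k_fine=2)
    # One made-up in-situ match-up per group (cells per liter, synthetic values).
    severity = rank_by_insitu(labels[:, 0], np.array([0, 20, 40]), np.array([1e2, 1e4, 1e6]))
    print(severity[[0, 20, 40]])  # [0 1 2]
```

The fine labels are nested inside the coarse ones. As a result, a coarse class such as a high-abundance group can be split into sub-classes and compared against species-specific counts without training a new model.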

The study makes four main contributions. First, it demonstrates instrument and resolution flexibility: a single framework design can be applied to diverse multispectral and hyperspectral sensors, a significant step toward sustainable, long-term monitoring as satellite constellations evolve. Second, it achieves label efficiency by bypassing the need for large, instrument-specific labeled datasets, which makes scalable global monitoring feasible. Third, by using surface reflectance directly alongside SIF, rather than only derived ocean color products such as chlorophyll-a, it can capture latent patterns that those products may miss. Finally, the hierarchical embeddings produced by the model support exploratory data analysis and allow scientists to discover new patterns and relationships within the data.

In summary, this work takes a concrete step toward operationalizing self-supervised learning for a critical Earth science application. The SIT-FUSE framework provides a scalable, flexible tool for HAB detection and speciation, and it could be extended to other global aquatic biogeochemistry parameters.

