Estimation of Fish Catch Using Sentinel-2, 3 and XGBoost-Kernel-Based Kernel Ridge Regression

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Oceanographic factors, such as sea surface temperature and upper-ocean dynamics, have a significant impact on fish distribution. Maintaining fisheries that contribute to global food security requires quantifying these connections. This study uses multispectral images from Sentinel-2 MSI and Sentinel-3 OLCI to estimate fish catch with an Extreme Gradient Boosting (XGBoost)-kernelized Kernel Ridge Regression (KRR) technique. In model evaluation, the XGBoost-KRR framework achieves the strongest correlation and the lowest prediction error for both sensors, suggesting an improved capacity to capture nonlinear ocean-fish relationships. Sentinel-2 MSI resolves finer-scale spatial variability, emphasizing localized ecological interactions, whereas Sentinel-3 OLCI displays smoother spectral responses associated with its coarser spatial resolution. By supporting sustainable ecosystem management and strengthening satellite-based fisheries assessment, the proposed approach advances SDGs 2 (Zero Hunger) and 14 (Life Below Water).


💡 Research Summary

This paper presents a novel approach for estimating fish catch by integrating multispectral satellite imagery from Sentinel‑2 MSI and Sentinel‑3 OLCI with an Extreme Gradient Boosting (XGBoost)‑based kernel applied to Kernel Ridge Regression (KRR). The authors argue that oceanographic variables such as sea‑surface temperature, chlorophyll concentration, and mixed‑layer depth strongly influence fish distribution, yet traditional statistical models tend to oversimplify the relationship, while tree‑based models produce piecewise‑constant predictions. To address this, they formulate KRR with a regularization term λ‖w‖² that balances bias and variance, and replace conventional linear or radial‑basis‑function (RBF) kernels with a data‑driven kernel derived from the leaf‑index embeddings of a trained XGBoost model. Each tree's leaf index is one‑hot encoded, the encodings are concatenated across all P trees, and the kernel matrix is computed as K = (1/P) ZZᵀ, where Z stacks the embeddings of all training samples. Because every sample falls into exactly one leaf per tree, the inner product of two embeddings counts the trees in which both samples land in the same leaf, so each kernel entry is the fraction of shared leaves (between 0 and 1, with ones on the diagonal). This construction transfers the complex nonlinear feature interactions learned by XGBoost into the reproducing kernel Hilbert space of KRR, enabling a more expressive regression without sacrificing KRR's closed‑form solution.
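The leaf-embedding kernel and the KRR fit described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it takes a precomputed integer array of leaf ids with shape (n_samples, P) as input (with the XGBoost Python package such an array can come from `Booster.predict(..., pred_leaf=True)` or `XGBRegressor.apply(X)`), and all function names, the toy leaf ids, and the choice to give leaves unseen during training an all-zero block are hypothetical. The fit uses the standard dual KRR solution α = (K + λI)⁻¹y, with predictions k(x)ᵀα.

```python
import numpy as np


def build_leaf_vocab(train_leaves):
    """Hypothetical helper: one dict per tree mapping each observed leaf id to a column index."""
    train_leaves = np.asarray(train_leaves)
    return [{leaf: j for j, leaf in enumerate(np.unique(col))} for col in train_leaves.T]


def embed_leaves(leaves, vocab):
    """Concatenate per-tree one-hot leaf encodings into the embedding matrix Z.

    leaves[i, t] is the leaf reached by sample i in tree t. A leaf id not seen
    during training yields an all-zero block for that tree (assumption).
    """
    leaves = np.asarray(leaves)
    blocks = []
    for t, mapping in enumerate(vocab):
        block = np.zeros((leaves.shape[0], len(mapping)))
        for i, leaf in enumerate(leaves[:, t]):
            j = mapping.get(leaf)
            if j is not None:
                block[i, j] = 1.0
        blocks.append(block)
    return np.hstack(blocks)


def leaf_kernel(Z_a, Z_b, n_trees):
    """K = (1/P) Z_a Z_b^T: fraction of trees in which two samples share a leaf."""
    return Z_a @ Z_b.T / n_trees


def fit_krr(K, y, lam):
    """Dual KRR coefficients alpha = (K + lam * I)^(-1) y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)


def predict_krr(K_test_train, alpha):
    """Predictions f(x) = sum_i alpha_i * k(x, x_i)."""
    return K_test_train @ alpha


# Toy example: 3 training samples, P = 2 trees (made-up leaf ids, not real data).
train_leaves = np.array([[1, 3], [1, 4], [2, 4]])
y_train = np.array([0.1, 0.5, 0.9])
vocab = build_leaf_vocab(train_leaves)
Z_train = embed_leaves(train_leaves, vocab)
K = leaf_kernel(Z_train, Z_train, n_trees=2)  # [[1, .5, 0], [.5, 1, .5], [0, .5, 1]]
alpha = fit_krr(K, y_train, lam=0.1)
Z_test = embed_leaves(np.array([[2, 3]]), vocab)
y_hat = predict_krr(leaf_kernel(Z_test, Z_train, n_trees=2), alpha)
```

Samples 1 and 2 share a leaf only in the first tree, so their kernel entry is 1/2; samples 1 and 3 share no leaf, so theirs is 0. Solving the linear system with `np.linalg.solve` rather than forming an explicit inverse is the usual, numerically safer way to obtain the closed-form solution.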

The empirical dataset consists of monthly logbook records of seabream fisheries in Taiwanese coastal waters (22°–27° N, 117°–126° E) up to December 2019, paired with coincident Sentinel‑2 (up to 10 m) and Sentinel‑3 (300 m) observations. After filtering for cloud cover and clipping the catch values to the 10th–90th percentile range, the authors select ten Sentinel‑2 reflectance bands (B2–B9) and six Sentinel‑3 radiance bands (Oa03–Oa10) that are most sensitive to oceanic biophysical properties. Violin plots show that Sentinel‑2 captures finer spatial heterogeneity, especially in the red‑edge and near‑infrared bands, whereas Sentinel‑3 exhibits smoother, broader distributions due to spatial averaging.
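The percentile clipping step can be illustrated with `np.percentile` and `np.clip`. This sketch assumes "clipping" means winsorizing (values outside the range are pulled to the 10th or 90th percentile and no records are dropped); if the paper instead discards out-of-range records, a boolean mask such as `catch[(catch >= lo) & (catch <= hi)]` would replace the `np.clip` call. The function name and the toy catch values are hypothetical.

```python
import numpy as np


def clip_catch(catch, lower_pct=10.0, upper_pct=90.0):
    """Winsorize catch values to the [lower_pct, upper_pct] percentile range (assumed interpretation)."""
    catch = np.asarray(catch, dtype=float)
    lo, hi = np.percentile(catch, [lower_pct, upper_pct])  # default linear interpolation
    return np.clip(catch, lo, hi)


# Toy monthly catches 1..11 (hypothetical values, not the paper's data).
catch = np.arange(1.0, 12.0)
clipped = clip_catch(catch)  # 1.0 is raised to 2.0 and 11.0 is lowered to 10.0
```

Clipping rather than dropping keeps every month in the time series, which matters when each catch record is paired with a satellite scene.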

Model performance is evaluated using the root mean square error (RMSE), Pearson correlation coefficient (ρ), p‑value, and Kolmogorov–Smirnov D‑value. For Sentinel‑2, the XGBoost‑kernel KRR (KRR‑XGB) achieves RMSE = 0.085, ρ = 0.924, and D‑value = 0.952, outperforming linear KRR (RMSE = 0.218, ρ ≈ ‑0.032) and RBF‑kernel KRR (RMSE = 0.210, ρ = 0.069) with 59–61 % lower RMSE and a far stronger correlation (ρ = 0.924 versus at most 0.069 for the baselines). For Sentinel‑3, KRR‑XGB yields RMSE = 0.116, ρ = 0.731, and D‑value = 0.771, again surpassing the linear and RBF baselines with RMSE reductions of 27–41 %. Correlation plots show tight agreement at low catch values and modest dispersion at higher catches, indicating reliable predictions across the observed range. Spatial maps of the estimated catch along Taiwan's eastern coast display a clear gradient, with higher predicted catches near the shoreline, consistent with the satellite‑derived spectral signatures being linked to biologically productive zones.
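The summary does not spell out how each metric is computed, so the sketch below assumes the textbook definitions: RMSE, Pearson's ρ via `np.corrcoef`, and the two-sample Kolmogorov–Smirnov statistic as the largest gap between two empirical CDFs. Under that standard KS convention, D = 0 means identical distributions and larger values mean a bigger mismatch, so how the reported D‑values should be read depends on the convention the paper actually uses. `scipy.stats.pearsonr` and `scipy.stats.ks_2samp` compute the same statistics together with p‑values. The last two lines check the 59–61 % figure against the reported Sentinel‑2 RMSEs; all function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pearson_rho(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def ks_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)| over all x."""
    a, b = np.sort(sample_a), np.sort(sample_b)
    grid = np.concatenate([a, b])  # the ECDF gap can only peak at a sample point
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def pct_reduction(baseline, model):
    """Relative error reduction of `model` versus `baseline`, in percent."""
    return 100.0 * (baseline - model) / baseline


# Reported Sentinel-2 RMSEs: KRR-XGB (0.085) versus the linear and RBF baselines.
print(round(pct_reduction(0.218, 0.085), 1))  # 61.0
print(round(pct_reduction(0.210, 0.085), 1))  # 59.5
```

A relative reduction is well defined for RMSE because it is always positive, whereas a percentage change in ρ is not meaningful when a baseline correlation is near zero or negative, as it is for linear KRR here.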

In summary, the study demonstrates that (1) embedding XGBoost leaf structures as a kernel dramatically enhances KRR’s ability to model nonlinear ocean‑fish relationships, (2) Sentinel‑2’s high spatial resolution captures localized ecological interactions while Sentinel‑3 provides broader, smoother context, and (3) the combined framework yields substantially lower prediction errors than conventional kernels. The authors highlight the method’s relevance for real‑time fisheries monitoring, sustainable ecosystem management, and progress toward United Nations Sustainable Development Goals 2 (Zero Hunger) and 14 (Life Below Water).
