Probabilistic NDVI Forecasting from Sparse Satellite Time Series and Weather Covariates

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Accurate short-term forecasting of vegetation dynamics is a key enabler for data-driven decision support in precision agriculture. Normalized Difference Vegetation Index (NDVI) forecasting from satellite observations, however, remains challenging due to sparse and irregular sampling caused by cloud coverage, as well as the heterogeneous climatic conditions under which crops evolve. In this work, we propose a probabilistic forecasting framework specifically designed for field-level NDVI prediction under clear-sky acquisition constraints. The method leverages a transformer-based architecture that explicitly separates the modeling of historical vegetation dynamics from future exogenous information, integrating historical NDVI observations with both historical and future meteorological covariates. To address irregular revisit patterns and horizon-dependent uncertainty, we introduce a temporal-distance weighted quantile loss that aligns the training objective with the effective forecasting horizon. In addition, we incorporate cumulative and extreme-weather feature engineering to better capture delayed meteorological effects relevant to vegetation response. Extensive experiments on European satellite data demonstrate that the proposed approach consistently outperforms a diverse set of statistical, deep learning, and recent time series baselines across both point-wise and probabilistic evaluation metrics. Ablation studies further highlight the central role of target history, while showing that meteorological covariates provide complementary gains when jointly exploited. The code is available at https://github.com/arco-group/ndvi-forecasting.


💡 Research Summary

Accurate short‑term forecasting of vegetation dynamics is a cornerstone for decision support in precision agriculture, where timely information on crop status can guide irrigation, fertilisation and stress mitigation. The Normalised Difference Vegetation Index (NDVI) derived from optical satellite imagery is widely used as a proxy for canopy greenness, yet its practical forecasting is hampered by two intertwined challenges: (1) cloud cover leads to irregular, sparse “clear‑sky” observations, especially for Sentinel‑2, and (2) the relationship between NDVI and weather is complex, with future meteorological conditions being uncertain.
The paper tackles these issues by proposing a probabilistic forecasting framework that explicitly models historical vegetation dynamics and future exogenous weather information using a dual‑branch transformer architecture. Historical NDVI values (often irregularly spaced) are first reconstructed with a time‑aware linear interpolation that respects the actual temporal gaps between successive satellite passes. This preserves the true observation timeline and avoids artificial regularisation.
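One way to picture the time-aware interpolation is the sketch below. It is an illustration under stated assumptions, not the authors' code: the function name `interpolate_ndvi`, the use of day offsets as the time axis, and the clamping of queries outside the observed range are all hypothetical choices.

```python
from bisect import bisect_right


def interpolate_ndvi(obs_days, obs_values, query_days):
    """Linearly interpolate NDVI at query_days using the real day offsets.

    obs_days must be strictly increasing. Queries outside the observed
    range are clamped to the nearest observation (an assumption made
    for this sketch).
    """
    result = []
    for q in query_days:
        if q <= obs_days[0]:
            result.append(obs_values[0])
            continue
        if q >= obs_days[-1]:
            result.append(obs_values[-1])
            continue
        j = bisect_right(obs_days, q)  # first index with obs_days[j] > q
        t0, t1 = obs_days[j - 1], obs_days[j]
        v0, v1 = obs_values[j - 1], obs_values[j]
        # The weight depends on the actual gap length, not on the index.
        w = (q - t0) / (t1 - t0)
        result.append(v0 + w * (v1 - v0))
    return result


# Clear-sky acquisitions on days 0, 5 and 15 (irregular gaps of 5 and 10 days).
filled = interpolate_ndvi([0, 5, 15], [0.2, 0.4, 0.8], [2, 5, 10])
```

Because the weight is computed from the day offsets, a query halfway through the 10-day gap lands halfway between the two bracketing NDVI values, regardless of how many observations precede it.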
Two transformer encoders are employed: a history encoder ingests the past NDVI sequence together with historical weather covariates, while a future encoder processes forecasted weather variables for the prediction horizon. Both branches share a common latent dimension after a linear embedding and receive positional encodings. Sparse temporal selection is applied in the future branch so that only embeddings corresponding to actual Sentinel‑2 acquisition dates are retained, thereby respecting the irregular sampling pattern.
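The sparse temporal selection step can be sketched as a plain indexing operation. This is a minimal illustration, assuming the future branch produces one embedding per forecast day and that acquisition dates are given as zero-based day offsets into the horizon; the helper name `select_acquisition_steps` is hypothetical and the summary does not specify the exact indexing convention.

```python
def select_acquisition_steps(daily_embeddings, acquisition_offsets):
    """Keep only the future-branch embeddings that fall on acquisition days.

    daily_embeddings: one vector per future day (index 0 = first forecast day).
    acquisition_offsets: day offsets on which a clear-sky acquisition exists.
    """
    horizon = len(daily_embeddings)
    for k in acquisition_offsets:
        if not 0 <= k < horizon:
            raise IndexError(f"offset {k} is outside the {horizon}-day horizon")
    return [daily_embeddings[k] for k in acquisition_offsets]


# Ten future days with toy 2-dimensional embeddings; acquisitions on days 4 and 9.
embeddings = [[float(d), 2.0 * d] for d in range(10)]
selected = select_acquisition_steps(embeddings, [4, 9])
```

The output has one row per target NDVI observation rather than one per calendar day, so the decoder only has to produce predictions where ground truth can exist.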
A key methodological contribution is the Temporal‑Distance Weighted Quantile Loss. The standard quantile (pinball) loss treats every forecast step equally, but the effective horizon of each step varies with the irregular revisit interval (τₜ₊ₕ − τₜ). The authors therefore weight each step's loss by its actual time distance, so that longer gaps, where uncertainty is higher, contribute more to the training objective, which they report results in better calibrated predictive distributions.
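The idea of weighting the pinball loss by temporal distance can be sketched as follows. The summary does not give the exact weighting formula, so the linear weights normalised to sum to one used here are an assumption, as are the function names `pinball` and `distance_weighted_quantile_loss`.

```python
def pinball(y, y_hat, q):
    """Pinball (quantile) loss of prediction y_hat for quantile level q."""
    diff = y - y_hat
    return max(q * diff, (q - 1.0) * diff)


def distance_weighted_quantile_loss(y_true, y_pred, quantiles, gaps_days):
    """Pinball loss averaged over quantiles, weighted per step by time distance.

    y_true:    H target values.
    y_pred:    H lists, each holding one prediction per quantile level.
    gaps_days: H positive temporal distances, one per forecast step.
    Weights are gaps_days normalised to sum to one (an assumed form).
    """
    total_gap = sum(gaps_days)
    loss = 0.0
    for y, preds, gap in zip(y_true, y_pred, gaps_days):
        step_loss = sum(pinball(y, p, q) for p, q in zip(preds, quantiles))
        step_loss /= len(quantiles)
        loss += (gap / total_gap) * step_loss
    return loss


quantile_levels = [0.1, 0.5, 0.9]
targets = [0.5, 0.7]
predictions = [[0.4, 0.5, 0.6], [0.5, 0.7, 0.9]]
weighted = distance_weighted_quantile_loss(targets, predictions, quantile_levels, [5, 15])
uniform = distance_weighted_quantile_loss(targets, predictions, quantile_levels, [1, 1])
```

With equal gaps the function reduces to the ordinary mean pinball loss; with gaps of 5 and 15 days, the second step carries three quarters of the total weight.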
Weather feature engineering is another pillar of the approach. The authors compute nine derived variables that capture cumulative and extreme conditions: (i) “between‑target” cumulative rainfall, cold‑day count and hot‑day count aggregated over the variable‑length interval between two consecutive NDVI observations, and (ii) rolling‑window cumulative rainfall, cold‑day and hot‑day counts over fixed 7‑day and 14‑day windows. These features are calculated for both historical and future covariates, allowing the model to learn delayed vegetation responses to weather events that may not align with the sparse NDVI timestamps.
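A rough sketch of how the nine derived variables could be computed from daily weather series is given below. The cold and hot thresholds (0 °C and 30 °C), the use of daily minimum and maximum temperature, the inclusive window boundaries, and all function and key names are assumptions made for illustration; the summary does not state them.

```python
def window_features(rain, tmin, tmax, start, end, cold_thr=0.0, hot_thr=30.0):
    """Aggregate daily weather over day indices start..end-1."""
    days = range(max(start, 0), end)
    return {
        "rain_sum": sum(rain[d] for d in days),
        "cold_days": sum(1 for d in days if tmin[d] < cold_thr),
        "hot_days": sum(1 for d in days if tmax[d] > hot_thr),
    }


def engineered_features(rain, tmin, tmax, prev_obs_day, obs_day):
    """Return the nine cumulative and extreme-weather features for one target."""
    feats = {}
    # Variable-length interval between two consecutive NDVI observations.
    between = window_features(rain, tmin, tmax, prev_obs_day + 1, obs_day + 1)
    for name, value in between.items():
        feats[f"between_{name}"] = value
    # Fixed 7-day and 14-day rolling windows ending on the target day.
    for width in (7, 14):
        rolling = window_features(rain, tmin, tmax, obs_day - width + 1, obs_day + 1)
        for name, value in rolling.items():
            feats[f"roll{width}_{name}"] = value
    return feats


# Twenty days of toy weather: 1 mm of rain per day, one frost day, one hot day.
rain = [1.0] * 20
tmin = [5.0] * 20
tmax = [20.0] * 20
tmin[10] = -2.0
tmax[12] = 35.0
features = engineered_features(rain, tmin, tmax, prev_obs_day=11, obs_day=15)
```

In this toy example the frost on day 10 falls outside the interval between the two observations (days 12 to 15) but inside both rolling windows, which is the kind of delayed effect the fixed windows are meant to expose.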
To reflect the growing uncertainty of weather forecasts with lead time, future meteorological covariates are perturbed by multiplicative Gaussian noise whose amplitude scales linearly with the temporal distance from the last observed point (up to a factor of two at the furthest horizon). This simple stochastic augmentation embeds forecast error directly into the training data.
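The augmentation can be sketched as below. The reading adopted here, that the noise standard deviation grows linearly from a base value at lead time zero to twice that value at the furthest horizon, is one interpretation of the description; the base value of 0.05 and the names `noise_scale` and `perturb_forecast` are hypothetical.

```python
import random


def noise_scale(lead_days, max_lead_days, base_sigma=0.05):
    """Noise standard deviation: base_sigma at lead 0, 2 * base_sigma at the horizon."""
    return base_sigma * (1.0 + lead_days / max_lead_days)


def perturb_forecast(values, lead_days, base_sigma=0.05, rng=None):
    """Multiply each future covariate by (1 + eps), with eps ~ N(0, sigma(lead)^2)."""
    rng = rng or random.Random()
    max_lead = max(lead_days)
    if max_lead <= 0:
        raise ValueError("lead_days must contain at least one positive value")
    return [
        v * (1.0 + rng.gauss(0.0, noise_scale(d, max_lead, base_sigma)))
        for v, d in zip(values, lead_days)
    ]


# Forecast temperatures for acquisitions 3, 8 and 14 days ahead.
noisy = perturb_forecast([18.0, 21.5, 25.0], [3, 8, 14], rng=random.Random(0))
```

Passing a seeded `random.Random` makes the perturbation reproducible, which is convenient when comparing training runs.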
All inputs are normalised using a robust arcsinh transformation based on global training statistics, which mitigates the influence of outliers and stabilises training.
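A minimal sketch of such a normalisation is shown below. The summary does not say which global statistics are used, so centring on the median and scaling by the interquartile range is an assumption, and the function names are hypothetical.

```python
import math
from statistics import median, quantiles


def fit_robust_stats(train_values):
    """Return (center, scale) estimated once on the training data."""
    q1, _, q3 = quantiles(train_values, n=4)
    iqr = q3 - q1
    return median(train_values), (iqr if iqr > 0 else 1.0)


def arcsinh_normalise(x, center, scale):
    """Roughly linear near the center, logarithmic in the tails."""
    return math.asinh((x - center) / scale)


def arcsinh_denormalise(z, center, scale):
    """Exact inverse of arcsinh_normalise."""
    return math.sinh(z) * scale + center


# Daily rainfall in mm with one extreme event.
train_rain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
center, scale = fit_robust_stats(train_rain)
z_extreme = arcsinh_normalise(100.0, center, scale)
```

Unlike a plain logarithm, arcsinh is defined for zero and negative inputs, so the same transform can be applied to temperature and rainfall alike, and the extreme value is compressed rather than clipped.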
The experimental evaluation uses the GreenEarthNet dataset, comprising 24,861 Sentinel‑2 data cubes over Europe (2017‑2022) with accompanying daily weather records. The authors train on 2017‑2019 and validate on 2020. They compare against a broad suite of baselines: classical statistical models (ARIMA, Prophet), machine‑learning regressors (Random Forest), recurrent networks (LSTM, ConvLSTM), graph‑based spatio‑temporal models, and the recent Temporal Fusion Transformer (TFT). Evaluation metrics include point‑forecast errors (MAE, RMSE) and probabilistic scores (Continuous Ranked Probability Score, Quantile Loss).
Results show that the proposed model consistently outperforms all baselines across horizons up to 14 days. For the 14‑day horizon, MAE improves by roughly 12 % and Quantile Loss by about 15 % relative to the best competing method (TFT). The gains are especially pronounced during periods of extreme weather (e.g., consecutive hot days or heavy rainfall), confirming the value of the cumulative and extreme‑weather engineered features.
Ablation studies dissect the contribution of each component: removing the future encoder, discarding the temporal‑distance weighting, or omitting the cumulative weather features each leads to measurable performance degradation, indicating that the architecture, loss weighting, and feature engineering each contribute to the final performance.
The paper acknowledges limitations: the current system predicts field‑level average NDVI rather than pixel‑wise maps, so extending to high‑resolution spatial forecasting would require additional spatial modules. Moreover, weather uncertainty is modelled with simple Gaussian noise; integrating full probabilistic weather forecasts or multi‑scenario ensembles could further enhance reliability. Finally, longer horizons and multi‑task learning with other vegetation indices (e.g., EVI, NDWI) are suggested as future directions.
In summary, the work delivers a robust, transformer‑based probabilistic NDVI forecasting pipeline that directly addresses irregular satellite sampling, incorporates future weather uncertainty, and leverages cumulative weather engineering. Its superior performance on a large European dataset demonstrates its potential as a practical decision‑support tool for precision agriculture, enabling farmers to anticipate short‑term crop conditions with calibrated uncertainty estimates.

