Decision-oriented benchmarking to transform AI weather forecast access: Application to the Indian monsoon
Artificial intelligence weather prediction (AIWP) models now often outperform traditional physics-based models on common metrics while requiring orders-of-magnitude less computing resources and time. Open-access AIWP models thus hold promise as transformational tools for helping low- and middle-income populations make decisions in the face of high-impact weather shocks. Yet, current approaches to evaluating AIWP models focus mainly on aggregated meteorological metrics without considering local stakeholders’ needs in decision-oriented, operational frameworks. Here, we introduce such a framework that connects meteorology, AI, and social sciences. As an example, we apply it to the 150-year-old problem of Indian monsoon forecasting, focusing on benefits to rain-fed agriculture, which is highly susceptible to climate change. AIWP models skillfully predict an agriculturally relevant onset index at regional scales weeks in advance when evaluated out-of-sample using deterministic and probabilistic metrics. This framework informed a government-led effort in 2025 to send 38 million Indian farmers AI-based monsoon onset forecasts, which captured an unusual weeks-long pause in monsoon progression. This decision-oriented benchmarking framework provides a key component of a blueprint for harnessing the power of AIWP models to help large vulnerable populations adapt to weather shocks in the face of climate variability and change.
💡 Research Summary
The paper introduces a decision‑oriented benchmarking framework that links meteorology, artificial‑intelligence weather prediction (AIWP) models, and development economics to evaluate forecast utility for real‑world decisions. Using the Indian monsoon as a case study, the authors focus on the timing of local monsoon onset, a critical variable for rain‑fed agriculture. They define a region‑specific onset index as the first five‑day wet spell that is not followed by a dry spell within the next 30 days. To suit operational constraints, the index anchors the 30‑day dry‑spell condition to the climatological median onset date declared by the India Meteorological Department (IMD).
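The onset rule described above can be sketched in a few lines. This is only an illustration: the summary does not give the rainfall thresholds or the exact dry-spell definition, so `wet_total_mm`, `dry_len`, and `dry_total_mm` are hypothetical parameters, and the IMD-anchored variant of the 30-day condition is not reproduced here.

```python
import numpy as np

def onset_day(rain_mm, window=5, lookahead=30,
              wet_total_mm=20.0, dry_len=10, dry_total_mm=5.0):
    """Index of the first day of the first `window`-day wet spell that is not
    followed by a dry spell within `lookahead` days, or None if there is none.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    rain = np.asarray(rain_mm, dtype=float)
    for start in range(rain.size - window + 1):
        # A wet spell: the window's rainfall total reaches the wet threshold.
        if rain[start:start + window].sum() < wet_total_mm:
            continue
        # Look for any dry spell in the days that follow the wet spell.
        follow = rain[start + window:start + window + lookahead]
        has_dry_spell = any(
            follow[i:i + dry_len].sum() < dry_total_mm
            for i in range(len(follow) - dry_len + 1)
        )
        if not has_dry_spell:
            return start
    return None

# Synthetic season: an early false onset, a long break, then sustained rain.
rain = np.zeros(80)
rain[5:10] = 10.0   # wet burst that is followed by a dry spell
rain[40:80] = 6.0   # sustained monsoon rain
print(onset_day(rain))
```

Near the end of a series the look-ahead window is truncated, so a late wet spell cannot be rejected by data that is not there; an operational version would need a rule for that case.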
Six state‑of‑the‑art global AIWP models (including NeuralGCM, GenCast, GraphCast, FuXi, FuXi‑S2S) and one leading numerical weather prediction (NWP) system (ECMWF IFS) are evaluated against IMD rain‑gauge observations over a long hindcast period (1965‑2024). The authors construct a baseline forecast using historical average onset dates on a 4°×4° grid, which represents a simple, locally understood reference that low‑resource agencies can readily implement.
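A climatological baseline of this kind amounts to averaging past onset dates in each grid cell. The sketch below assumes onset dates are stored as day-of-year values in an array of shape `(years, lat, lon)` and that the average is taken over a chosen set of training years; the array layout and the function name `climatology_baseline` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def climatology_baseline(onset_doy, train_years):
    """Mean onset day-of-year per grid cell over the selected training years.

    onset_doy   : array (years, lat, lon); NaN where no onset was detected.
    train_years : boolean mask of length `years` selecting the training sample.
    """
    onset = np.asarray(onset_doy, dtype=float)
    mask = np.asarray(train_years, dtype=bool)
    # nanmean ignores years in which a cell has no detected onset.
    return np.nanmean(onset[mask], axis=0)

# Three years on a 1 x 2 grid; the last year is held out for evaluation.
onset = np.array([[[150.0, 160.0]],
                  [[154.0, 164.0]],
                  [[149.0, 170.0]]])
baseline = climatology_baseline(onset, [True, True, False])
print(baseline)
```

Holding out the evaluation year keeps the baseline out-of-sample, which mirrors how the forecast models themselves are evaluated.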
Performance is assessed with a suite of decision‑relevant deterministic metrics: mean absolute error (MAE), miss rate (MR), and false‑alarm rate (FAR). Probabilistic skill is measured with the ranked probability score (RPS), Brier skill score (BSS), and area under the ROC curve (AUC). Deterministic results show that most AIWP models outperform the climatological baseline at lead times of 1‑15 days, reducing MAE by roughly two days while delivering lower MR and comparable FAR. At 16‑30 days, skill generally declines, but a few systems (notably IFS and GenCast) retain statistically significant advantages over the baseline. Probabilistic evaluation shows that the IFS ensemble and two AIWP models (GenCast and NeuralGCM) achieve positive BSS and competitive AUC values out to 15 days, indicating that ensemble forecasts can be reliably used for risk‑averse agricultural decision‑making.
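As a rough illustration of how such scores are computed, the sketch below implements MAE on onset dates, a miss rate and false-alarm rate on a binary "onset within the forecast window" event, and the Brier skill score against a reference forecast. The event definition and the choice of denominators (misses over observed events, false alarms over issued "yes" forecasts) are assumptions made here; the paper's exact definitions may differ.

```python
import numpy as np

def mae_days(pred_onset, obs_onset):
    """Mean absolute error between predicted and observed onset dates (days)."""
    pred = np.asarray(pred_onset, dtype=float)
    obs = np.asarray(obs_onset, dtype=float)
    return float(np.mean(np.abs(pred - obs)))

def miss_and_false_alarm(pred_event, obs_event):
    """Miss rate and false-alarm rate for binary onset-event forecasts."""
    pred = np.asarray(pred_event, dtype=bool)
    obs = np.asarray(obs_event, dtype=bool)
    hits = np.sum(pred & obs)
    misses = np.sum(~pred & obs)
    false_alarms = np.sum(pred & ~obs)
    # Assumed denominators: observed events for misses, "yes" forecasts for false alarms.
    miss_rate = misses / max(hits + misses, 1)
    false_alarm_rate = false_alarms / max(hits + false_alarms, 1)
    return float(miss_rate), float(false_alarm_rate)

def brier_skill_score(prob, obs_event, ref_prob):
    """BSS = 1 - BS / BS_ref; positive values beat the reference forecast."""
    obs = np.asarray(obs_event, dtype=float)
    bs = np.mean((np.asarray(prob, dtype=float) - obs) ** 2)
    bs_ref = np.mean((np.asarray(ref_prob, dtype=float) - obs) ** 2)
    return float(1.0 - bs / bs_ref)

print(mae_days([150, 160], [152, 157]))
print(miss_and_false_alarm([1, 1, 0, 0], [1, 0, 1, 0]))
print(brier_skill_score([0.9, 0.1], [1, 0], [0.5, 0.5]))
```

Here the reference probabilities passed as `ref_prob` play the role of the climatological baseline, so a positive BSS means the model's probabilities are more accurate than climatology for that event. The function does not guard against a perfect reference forecast, for which `bs_ref` would be zero.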
Because only one onset event occurs per year, the authors augment the test sample by generating hindcasts for the pre‑satellite era (1965‑1978). AIWP models can produce these hindcasts with the exact operational version, avoiding the configuration mismatches that plague traditional NWP hindcasts. This larger sample confirms that model skill is robust across decades, though baseline performance deteriorates for the earlier period, making relative skill appear stronger.
The benchmarking outcomes directly informed a government‑led initiative in 2025, where the Ministry of Agriculture and Farmers’ Welfare disseminated AI‑based monsoon onset forecasts to 38 million Indian farmers. The operational system blended two AIWP models using the identified skill metrics, successfully capturing an unusual multi‑week pause in monsoon progression and enabling timely agronomic advisories.
In sum, the study demonstrates that (1) a decision‑oriented benchmark, grounded in locally relevant indices and stakeholder‑centric error metrics, can objectively compare AIWP and NWP systems; (2) open‑source AIWP models can generate skillful short‑ to medium‑range onset forecasts at a fraction of the computational cost of traditional NWP; and (3) such benchmarks can guide large‑scale, impact‑focused deployments that improve climate adaptation for vulnerable populations in low‑ and middle‑income countries. The framework offers a replicable blueprint for other regions and sectors seeking to translate AI weather forecasts into actionable decision support.