Beyond Accuracy: A Stability-Aware Metric for Multi-Horizon Forecasting


Traditional time series forecasting methods optimize for accuracy alone. This objective neglects temporal consistency: how consistently a model predicts the same future event as the forecast origin changes. We introduce the forecast accuracy and coherence score (forecast AC score for short) for measuring the quality of probabilistic multi-horizon forecasts in a way that accounts for both multi-horizon accuracy and stability. The score additionally allows user-specified weights to balance accuracy and consistency requirements. As an example application, we implement the score as a differentiable objective function for training seasonal auto-regressive integrated models and evaluate it on the M4 Hourly benchmark dataset. Results demonstrate substantial improvements over traditional maximum likelihood estimation. Regarding stability, the AC-optimized model generated out-of-sample forecasts with 91.1% lower vertical variance than the MLE-fitted model. In terms of accuracy, the AC-optimized model achieved considerable improvements for medium-to-long-horizon forecasts: while one-step-ahead forecasts exhibited a 7.5% increase in MAPE, all subsequent horizons improved in MAPE by up to 26%. These results indicate that our metric trains models to produce more stable and more accurate multi-step forecasts in exchange for some degradation in one-step-ahead performance.


💡 Research Summary

The paper introduces a novel evaluation metric for probabilistic multi‑horizon forecasting that simultaneously accounts for predictive accuracy and temporal consistency, termed the forecast accuracy and coherence (AC) score. Traditional forecasting research has largely focused on accuracy‑only measures such as CRPS or MAPE, neglecting how forecasts evolve as the origin shifts. The authors formalize “forecast stability” in two dimensions: vertical stability (the variability of forecasts for the same target time across different origins) and horizontal stability (the smoothness of the forecast vector across horizons at a single origin).

Vertical stability is captured by arranging forecast samples into a matrix and examining anti‑diagonals, each representing a sequence of predictions for a fixed target time issued at successive origins. The variance of these sequences, averaged over all target times, quantifies vertical instability. Horizontal stability is measured analogously across rows, reflecting consistency of a multi‑step forecast vector. Both stability components are evaluated using the energy distance, a proper metric that incorporates location, spread, and shape of distributions.
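The anti-diagonal construction can be made concrete with a small sketch. Assuming a matrix `F` where `F[i, h]` is the point forecast issued at origin `i` for horizon `h + 1` (so the target time is `i + h` up to a fixed offset), each anti-diagonal of `F` collects every forecast of one target time, and averaging the per-diagonal variances gives a simple vertical-instability proxy. The function name and point-forecast simplification are illustrative, not from the paper:

```python
import numpy as np

def vertical_instability(F):
    """Average variance of anti-diagonals of the forecast matrix F.

    F[i, h] is the point forecast issued at origin i for horizon h + 1;
    entries on the same anti-diagonal (constant i + h) all predict the
    same target time from successive origins.
    """
    n, m = F.shape
    variances = []
    for t in range(n + m - 1):  # one anti-diagonal per target time
        preds = [F[i, t - i] for i in range(max(0, t - m + 1), min(n, t + 1))]
        if len(preds) > 1:  # need at least two origins to measure drift
            variances.append(np.var(preds))
    return float(np.mean(variances))
```

A perfectly consistent forecaster (identical predictions for each target time from every origin) scores zero; any revision between origins raises the score.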

Accuracy is measured with a weighted multivariate energy score. For each origin t, the true future vector y_true(t) (the next m observations) is compared to the predictive distribution f_t via the energy score, with a horizon‑specific weight vector w that can emphasize short‑term forecasts. The overall accuracy component Acc is the average of these scores across all origins.
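A sample-based estimator of this weighted energy score can be sketched as follows. Scaling each horizon coordinate by the square root of its weight before taking norms is one common convention for weighted energy scores; whether the paper uses exactly this form is an assumption here:

```python
import numpy as np

def energy_score(samples, y, w=None):
    """Sample-based (weighted) energy score.

    samples: (n, m) array of draws from the predictive distribution f_t
    y:       (m,) observed future vector y_true(t)
    w:       (m,) horizon weights; sqrt(w) scaling per coordinate is an
             assumed convention, e.g. to emphasize short-term horizons.
    """
    if w is None:
        w = np.ones(y.shape[0])
    ws = samples * np.sqrt(w)          # weight each horizon coordinate
    wy = y * np.sqrt(w)
    # E||X - y||: mean distance from samples to the realized vector
    term1 = np.mean(np.linalg.norm(ws - wy, axis=1))
    # 0.5 * E||X - X'||: mean pairwise distance between samples
    diffs = ws[:, None, :] - ws[None, :, :]
    term2 = 0.5 * np.mean(np.linalg.norm(diffs, axis=2))
    return term1 - term2
```

Averaging this score over all forecast origins t yields the accuracy component Acc; lower is better, with zero attained only by a degenerate forecast exactly at the realized vector.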

Stability is assessed by comparing successive predictive distributions after marginalizing non‑overlapping horizons, thereby isolating the common target times. The squared weighted energy distance between the marginalized distributions of f_t and f_{t+1} is computed, and the average over all consecutive origin pairs yields the stability component Stb.
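The marginalization step amounts to dropping the first horizon of f_t and the last horizon of f_{t+1}, so that both sample sets cover the same target times, before computing the squared energy distance. A minimal unweighted sketch (the per-horizon weighting from the accuracy component is omitted for brevity):

```python
import numpy as np

def energy_distance_sq(X, Y):
    """Squared energy distance between samples X (n, d) and Y (k, d):
    D^2 = 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||."""
    d_xy = np.mean(np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2))
    d_xx = np.mean(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    d_yy = np.mean(np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2))
    return 2.0 * d_xy - d_xx - d_yy

def stability_component(forecast_samples):
    """forecast_samples[t]: (n_samples, m) draws from f_t for horizons 1..m.

    f_t targets times t+1..t+m and f_{t+1} targets t+2..t+m+1, so
    dropping f_t's first horizon and f_{t+1}'s last horizon leaves the
    overlapping target times t+2..t+m on both sides.
    """
    dists = [energy_distance_sq(a[:, 1:], b[:, :-1])
             for a, b in zip(forecast_samples[:-1], forecast_samples[1:])]
    return float(np.mean(dists))
```

If consecutive predictive distributions agree on their shared target times, the energy distance vanishes and Stb is zero; any revision between origins contributes positively.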

The final AC score is a convex combination: Score = λ·Acc + (1−λ)·Stb, where λ ∈ [0, 1] controls the trade-off between accuracy and stability.
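Given the two components, the combination itself is a one-liner; a sketch assuming Acc and Stb have already been computed on comparable scales (lower is better for both):

```python
def ac_score(acc, stb, lam=0.5):
    """Convex combination of accuracy and stability components.

    lam in [0, 1] weights accuracy; 1 - lam weights stability.
    lam = 1 recovers a pure-accuracy objective, lam = 0 pure stability.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * acc + (1.0 - lam) * stb
```

Because both components are averages of (squared) energy distances over samples, the combined score remains differentiable in the model parameters, which is what allows it to serve directly as a training objective.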

