Clustering Methods for Identifying and Modelling Areas with Similar Temperature Variations
This paper proposes a novel data-driven approach for identifying and modelling areas with similar temperature variations through clustering and Space-Time AutoRegressive (STAR) models. Using annual temperature data from 168 countries (1901-2022), we apply three clustering methods based on (i) warming rates, (ii) annual temperature variations, and (iii) persistence of variation signs, using Euclidean and Hamming distances. These clusters are then employed to construct alternative spatial weight matrices for STAR models. Empirical results show that distance-based STAR models outperform classical contiguity-based ones, both in-sample and out-of-sample, with the Hamming distance-based STAR model achieving the best predictive accuracy. The study demonstrates that using statistical similarity rather than geographical proximity improves the modelling of global temperature dynamics, suggesting broader applicability to other environmental and socioeconomic datasets.
💡 Research Summary
The paper presents a data‑driven framework that combines clustering techniques with Space‑Time AutoRegressive (STAR) modeling to improve the representation and prediction of global temperature dynamics. Using an extensive dataset of annual average temperatures for 168 countries spanning 1901‑2022 (122 years, 20,496 observations), the author extracts three distinct features from each country’s time series: (i) the linear warming rate (slope of a simple regression), (ii) the magnitude of annual temperature fluctuations (first‑difference series), and (iii) the persistence of the sign of those fluctuations (binary increase/decrease series).
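The three features can be illustrated with a short Python sketch. It is not the author's code: the function names are hypothetical, the warming rate is taken as the ordinary least-squares slope of temperature on year, and a year with no change is coded as 0 in the sign series, which is an assumption because the summary does not say how ties are handled.

```python
from typing import List, Tuple


def warming_rate(temps: List[float], years: List[int]) -> float:
    """OLS slope of temperature on year (degrees per year)."""
    n = len(temps)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def first_differences(temps: List[float]) -> List[float]:
    """Year-to-year changes y_t - y_{t-1}."""
    return [cur - prev for prev, cur in zip(temps, temps[1:])]


def sign_series(diffs: List[float]) -> List[int]:
    """1 if temperature rose, 0 otherwise (assumption: no change is coded as 0)."""
    return [1 if d > 0 else 0 for d in diffs]


def extract_features(temps: List[float], years: List[int]) -> Tuple[float, List[float], List[int]]:
    """Return (warming rate, first differences, 0/1 sign series) for one country."""
    diffs = first_differences(temps)
    return warming_rate(temps, years), diffs, sign_series(diffs)
```

Applied to each of the 168 series, the slope feeds Clustering A, the 121-element difference vector feeds Clustering B, and the 121-element binary vector feeds Clustering C.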
For each feature a separate clustering is performed.
- Clustering A (warming rates) uses Euclidean distance on the estimated slopes. An agglomerative hierarchical algorithm with average linkage yields four meaningful groups—“very high”, “high”, “medium”, and “low” warming—plus a “null” group of six countries whose slopes are statistically indistinguishable from zero. The groups map neatly onto geography: Europe, Asia and the Eurasian block dominate the high‑rate clusters, while South America and Oceania contain most low‑rate cases.
- Clustering B (temperature variations) computes Euclidean distance on the vector of first differences. Five clusters emerge, including a large heterogeneous “Miscellaneous” cluster (119 countries) and three idiosyncratic nations (Canada, Iceland, Russia). The clusters reflect regional patterns such as a Middle‑East core, a Central‑Asia core, and two distinct European sub‑regions.
- Clustering C (sign persistence) encodes each year‑to‑year change as 0/1 and measures dissimilarity with the Hamming distance. This produces 12 relatively homogeneous clusters and 20 outliers, revealing clear geographic coherence (e.g., a pan‑European cluster, four Asian sub‑clusters, two African clusters, and a split between North‑American and South‑American groups).
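The two dissimilarity measures and an average-linkage agglomerative procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the names are hypothetical, the tree is cut at a fixed distance threshold `cut` (the summary does not say how the number of clusters was chosen), average linkage is assumed for Clusterings B and C as well as A, and the significance test that isolates the “null” group in Clustering A is not reproduced.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, e.g. between slopes (A) or first-difference vectors (B)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance: number of years in which two 0/1 sign series disagree (C)."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(
    items: Dict[str, Sequence[float]],
    dist: Callable[[Sequence, Sequence], float],
    cut: float,
) -> List[List[str]]:
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two closest clusters and stops once the smallest
    average inter-cluster distance exceeds `cut` (hypothetical stopping rule).
    """
    names = list(items)
    # Pairwise distances between countries, computed once and stored both ways.
    pair = {}
    for a, b in combinations(names, 2):
        pair[(a, b)] = pair[(b, a)] = dist(items[a], items[b])
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            link = sum(pair[(a, b)] for a in clusters[i] for b in clusters[j])
            link /= len(clusters[i]) * len(clusters[j])
            if best is None or link < best[0]:
                best = (link, i, j)
        link, i, j = best
        if link > cut:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

For instance, `average_linkage({"A": [0.030], "B": [0.028], "C": [0.010], "D": [0.012]}, euclidean, cut=0.005)` groups the two fast-warming and the two slow-warming series separately. In practice a library routine such as SciPy's hierarchical clustering would replace the quadratic loop, which is written out here only to make the linkage rule explicit.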
The clusters are then used to construct alternative spatial weight matrices for STAR models. Instead of the conventional binary contiguity matrix, the author builds weight matrices based on (i) Euclidean distance derived from the warming‑rate clusters and (ii) Hamming distance derived from the sign‑persistence clusters. The STAR specification is
yₜ = ρ W yₜ₋₁ + Φ yₜ₋₁ + εₜ,
where ρ captures spatial dependence, Φ the temporal autoregressive component, and εₜ white noise. Parameters are estimated by maximum likelihood.
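A minimal sketch of a distance-based weight matrix and of estimating the STAR specification above follows. Two hypotheses are made that the summary does not confirm: weights are inverse distances with a zero diagonal and rows normalised to sum to one, and Φ is a single scalar φ shared by all countries. With Gaussian errors and the first year treated as fixed, maximising the conditional likelihood over ρ and φ is equivalent to least squares on the two regressors W yₜ₋₁ and yₜ₋₁, which `fit_star` solves through its 2×2 normal equations; the author's exact likelihood may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def inverse_distance_weights(dist: Matrix) -> Matrix:
    """Hypothetical W: w_ij = 1/d_ij for i != j, zero diagonal, rows summing to 1.

    Pairs at distance 0 (e.g. identical sign series) get weight 0 to avoid
    division by zero, an arbitrary choice made for this sketch.
    """
    n = len(dist)
    w = [
        [0.0 if i == j or dist[i][j] == 0 else 1.0 / dist[i][j] for j in range(n)]
        for i in range(n)
    ]
    for row in w:
        total = sum(row)
        if total > 0:
            row[:] = [x / total for x in row]  # row-standardise
    return w


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def fit_star(y: Matrix, w: Matrix) -> Tuple[float, float]:
    """Least-squares (= conditional Gaussian ML) estimates of rho and phi in
    y_t = rho * W y_{t-1} + phi * y_{t-1} + e_t; y[t] holds all countries in year t."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for prev, cur in zip(y, y[1:]):
        spatial_lag = matvec(w, prev)
        for x1, x2, target in zip(spatial_lag, prev, cur):
            s11 += x1 * x1
            s12 += x1 * x2
            s22 += x2 * x2
            b1 += x1 * target
            b2 += x2 * target
    # Solve [[s11, s12], [s12, s22]] @ [rho, phi] = [b1, b2].
    det = s11 * s22 - s12 * s12
    rho = (s22 * b1 - s12 * b2) / det
    phi = (s11 * b2 - s12 * b1) / det
    return rho, phi


def forecast(y_last: List[float], w: Matrix, rho: float, phi: float) -> List[float]:
    """One-step-ahead forecast rho * W y_T + phi * y_T."""
    spatial_lag = matvec(w, y_last)
    return [rho * a + phi * b for a, b in zip(spatial_lag, y_last)]
```

Swapping the Euclidean or Hamming distance matrix for a 0/1 contiguity matrix (row-standardised the same way) gives the baseline model, so the competing specifications differ only in W. Multi-step forecasts follow by feeding each forecast back into `forecast`.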
Model performance is evaluated both in‑sample (goodness‑of‑fit, AIC/BIC) and out‑of‑sample (forecasting 2020‑2022). All distance‑based STAR models outperform the baseline contiguity‑based STAR model. The Hamming‑distance‑based STAR achieves the lowest mean squared error (MSE) and mean absolute error (MAE), indicating that the persistence of temperature‑change signs provides valuable information about spatial spillovers that pure geographic adjacency misses.
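The two error measures used in the out-of-sample comparison are plain averages over the forecast country-year pairs. The sketch below computes them and ranks competing models by MSE; the helper names are hypothetical, and the forecasts for each model are assumed to be flattened into one list aligned with the observed values.

```python
from typing import Dict, List, Tuple


def mse(actual: List[float], predicted: List[float]) -> float:
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rank_models(
    actual: List[float], forecasts: Dict[str, List[float]]
) -> List[Tuple[str, float, float]]:
    """Return (model name, MSE, MAE) tuples sorted by MSE, lowest first."""
    scores = [(name, mse(actual, f), mae(actual, f)) for name, f in forecasts.items()]
    return sorted(scores, key=lambda s: s[1])
```

Because MSE squares the errors, it penalises a few large misses more heavily than MAE does, so reporting both shows whether a model's advantage comes from avoiding occasional large errors or from being closer on average.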
The paper also analyses the composition of each cluster. High‑warming clusters are concentrated in Europe, Asia, and the Eurasian region, consistent with Arctic amplification and high greenhouse‑gas emissions. Low‑warming clusters are found in South America and Oceania. The sign‑persistence clusters further disaggregate the “Miscellaneous” group from Clustering B, showing that countries with similar year‑to‑year variations can still have very different directional dynamics.
Key contributions are: (1) a multi‑faceted clustering approach that captures distinct aspects of temperature dynamics; (2) the integration of statistically derived spatial weights into STAR models, leading to superior predictive performance; (3) empirical evidence that statistical similarity can be a more effective basis for spatial interaction than simple geographic contiguity. The study suggests that policymakers could use these statistically defined regions to design more targeted climate‑adaptation strategies.
Future research directions include extending the framework to other climate variables (precipitation, extreme events), incorporating socioeconomic indicators, and testing non‑linear or non‑stationary space‑time models such as spatial GARCH or graph neural networks to further enhance forecasting accuracy.