Machine Learning-Based Prediction of Heat Index in Selected U.S. Cities
Heat stress has harmful effects that impact communities across the United States, particularly when high temperatures are accompanied by high humidity. The combined impact of temperature and humidity can be summarized by the heat index (HI). Current state-of-the-art numerical weather prediction models are often biased when forecasting temperature and humidity even within a 24-hour forecast lead time. This study explores the ability of machine learning (ML) models to accurately predict the next-day heat index using Random Forest and single-layer Gated Recurrent Unit (GRU) models in four locations across the United States. We find that Random Forest and GRU models perform reasonably well at all four selected locations. Mean absolute HI error ranges from 4.5 to 6.6 °F. All model versions have an accuracy rate exceeding 80% in three of the four locations in terms of successfully forecasting an extreme heat day, as indicated by a high afternoon HI. The GRU model achieves over 95% accuracy in these three locations. Model performance details vary by location. In Minneapolis and Portland, which have relatively few days with high HI values, models’ accuracy is high, but the recall and precision are generally very low. In contrast, Dallas, a location with many high HI days, shows moderately high accuracy, as well as extremely high recall and precision. These differences are likely due to distinct causes of heatwaves in different climatological regions of the United States, as reflected in the feature importance scores output by Random Forest models. The ML models designed in this study can be used to assist with local heat index forecasting and extreme heat warning issuance at minimal computational cost.
💡 Research Summary
This paper investigates the feasibility of using machine‑learning (ML) techniques to forecast the next‑day afternoon heat index (HI) for four climatically distinct U.S. cities: Portland (Pacific Northwest), Minneapolis (Upper Midwest), Dallas (Gulf Plains), and Boston (Northeast). The motivation stems from documented biases in conventional numerical weather prediction (NWP) models for 2‑meter temperature and relative humidity within a 24‑hour lead time. These biases can translate into heat‑index errors of 10 °F or more and undermine local heat‑alert issuance.
Data and Predictors
The authors employ ERA5 reanalysis (1979‑2022) at hourly resolution, extracting variables at 10 a.m. local time on the day before the forecast day. Predictors are taken from four vertical levels: the surface and the 850 hPa, 700 hPa, and 500 hPa pressure levels. At the surface they use temperature, dew point, relative humidity (RH), wind speed and direction, and cumulative precipitation (3 a.m.–9 a.m.). At 850 hPa they include temperature, subsidence, and RH; at 700 hPa, subsidence and RH; at 500 hPa, geopotential height and subsidence. These variables capture local thermodynamics, boundary‑layer processes, and synoptic‑scale patterns (e.g., ridging, advection).
The target HI is computed for 4 p.m. using the National Weather Service (NWS) formulation (Rothfusz 1990). Each city is assigned one of two thresholds based on its empirical HI distribution: 80 °F (“Caution”) for Portland and Boston, and 90 °F (“Extreme Caution”) for Dallas and Minneapolis. This city‑specific binning reflects the large regional variability in typical heat‑index values (e.g., >90 °F on 64.9 % of Dallas summer days versus only 4.1 % in Portland).
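As an illustration of the target variable, the sketch below computes the heat index with the Rothfusz regression as published by the NWS, including the NWS low- and high-humidity adjustments and the simpler estimate used for mild conditions. It is a minimal sketch, not the authors' code: whether the study applied the adjustments, and whether "exceeds" means strictly greater than, are assumptions here. The names `heat_index_f`, `THRESHOLDS_F`, and `is_extreme_heat_day` are hypothetical.

```python
import math


def heat_index_f(temp_f: float, rh_pct: float) -> float:
    """Heat index (°F) from air temperature (°F) and relative humidity (%)."""
    # Simple estimate, averaged with the air temperature (NWS procedure).
    simple = 0.5 * (temp_f + 61.0 + (temp_f - 68.0) * 1.2 + rh_pct * 0.094)
    hi = 0.5 * (simple + temp_f)
    if hi < 80.0:
        return hi  # the full regression is only applied at 80 °F or above

    t, r = temp_f, rh_pct
    # Rothfusz (1990) multiple regression.
    hi = (-42.379 + 2.04901523 * t + 10.14333127 * r
          - 0.22475541 * t * r - 0.00683783 * t * t
          - 0.05481717 * r * r + 0.00122874 * t * t * r
          + 0.00085282 * t * r * r - 0.00000199 * t * t * r * r)

    # NWS adjustments for very dry or very humid conditions.
    if r < 13.0 and 80.0 <= t <= 112.0:
        hi -= ((13.0 - r) / 4.0) * math.sqrt((17.0 - abs(t - 95.0)) / 17.0)
    elif r > 85.0 and 80.0 <= t <= 87.0:
        hi += ((r - 85.0) / 10.0) * ((87.0 - t) / 5.0)
    return hi


# City-specific thresholds as described in the summary.
THRESHOLDS_F = {"Portland": 80.0, "Boston": 80.0,
                "Dallas": 90.0, "Minneapolis": 90.0}


def is_extreme_heat_day(city: str, temp_f: float, rh_pct: float) -> bool:
    """Binary label: does the 4 p.m. HI exceed the city's threshold?"""
    # Assumption: strict inequality; the source does not specify.
    return heat_index_f(temp_f, rh_pct) > THRESHOLDS_F[city]
```

For example, `heat_index_f(90, 50)` evaluates to roughly 94.6 °F, which would count as an extreme heat day under the 90 °F threshold used for Dallas.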
Modeling Approach
Two families of ML models are trained separately for each city:
- Random Forest (RF): both a regression model (predicting continuous HI) and a classifier (predicting whether the next‑day HI exceeds the city‑specific threshold). RF is chosen for its robustness, interpretability, and ability to rank predictor importance. Class imbalance (few extreme‑heat days) is addressed by oversampling the minority class and/or applying class weights.
- Single‑layer Gated Recurrent Unit (GRU): a recurrent neural network with gating mechanisms that mitigates vanishing gradients. Although GRUs are often used with multi‑step sequences, the authors employ a single‑step input (the 10 a.m. predictor set) because the forecast horizon is only one day and the predictors lack strong temporal autocorrelation. The GRU thus functions similarly to a feed‑forward network but retains the capacity to learn nonlinear interactions.
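To make the single‑step GRU point concrete, the sketch below writes out one GRU step in plain Python, assuming a zero initial hidden state and the gate convention used by PyTorch (h = (1 − z)·n + z·h_prev). With h_prev = 0 every recurrent matrix product drops out, leaving a gated feed‑forward layer. The parameter names (`W_z`, `b_z`, `c_n`, and so on) and the toy weights are hypothetical; this is not the authors' architecture or trained model.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gru_single_step(x, p):
    """One GRU step starting from a zero hidden state.

    Because h_prev = 0, the recurrent terms U·h_prev vanish and only the
    recurrent bias of the candidate gate (c_n) survives:
        z = sigmoid(W_z x + b_z)          # update gate
        r = sigmoid(W_r x + b_r)          # reset gate
        n = tanh(W_n x + b_n + r * c_n)   # candidate state
        h = (1 - z) * n                   # z * h_prev is zero
    """
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), p["b_z"])]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), p["b_r"])]
    n = [math.tanh(a + b + ri * c)
         for a, b, ri, c in zip(matvec(p["W_n"], x), p["b_n"], r, p["c_n"])]
    return [(1.0 - zi) * ni for zi, ni in zip(z, n)]


# Toy parameters: one hidden unit, one (standardized) predictor.
toy_params = {
    "W_z": [[0.0]], "b_z": [0.0],
    "W_r": [[0.0]], "b_r": [0.0],
    "W_n": [[1.0]], "b_n": [0.0], "c_n": [0.0],
}
hidden = gru_single_step([1.0], toy_params)
```

In a full model, a dense output layer would map `hidden` to the HI value or to an exceedance probability; that layer is omitted here.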
Performance Metrics
Model skill is evaluated using mean absolute error (MAE) for regression, and accuracy, recall, and precision for classification. Across all cities, MAE ranges from 4.5 °F to 6.6 °F, indicating that the ML forecasts are within a few degrees of the ERA5‑derived “truth.” Classification accuracy exceeds 80 % in three of the four locations; the GRU achieves >95 % accuracy in those three. However, recall and precision vary markedly with the prevalence of extreme‑heat days.
- Portland & Minneapolis (few extreme‑heat days): high overall accuracy but low recall/precision, reflecting the difficulty of detecting rare events.
- Dallas (many extreme‑heat days): both recall and precision are >90 %, demonstrating that abundant training examples enable the models to capture the underlying patterns.
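The contrast between high accuracy and low recall/precision follows directly from how the metrics are defined. The sketch below computes all three from binary labels and applies them to a synthetic 100‑day sample with 4 extreme‑heat days; the sample is made up for illustration and is not data from the paper. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = extreme-heat day)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hits
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # misses
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Synthetic example: 4 extreme days out of 100.
# The hypothetical model catches 1 of them and raises 1 false alarm.
y_true = [1, 1, 1, 1] + [0] * 96
y_pred = [1, 0, 0, 0] + [1] + [0] * 95
accuracy, precision, recall = classification_metrics(y_true, y_pred)
```

Here accuracy is 0.96 even though precision is 0.5 and recall is only 0.25, because the 95 correctly predicted non‑extreme days dominate the score. This is why accuracy alone says little about skill in locations where extreme‑heat days are rare.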
Variable Importance and Physical Interpretation
RF importance scores reveal distinct regional drivers:
- Dallas: surface temperature and relative humidity dominate, with subsidence and 500 hPa geopotential height as secondary contributors, consistent with a heat‑wave regime driven by strong surface heating and upper‑level ridging.
- Portland & Boston: wind direction and speed emerge as key, indicating that advection of maritime air masses modulates heat stress.
- Minneapolis: vertical temperature gradients and subsidence are most influential, reflecting the role of synoptic‑scale high‑pressure blocking and associated subsidence in the Upper Midwest.
These insights align with climatological knowledge that heat‑wave mechanisms differ across the United States (e.g., moisture‑limited desert Southwest versus humidity‑rich Gulf Plains).
Computational Efficiency and Operational Potential
Both RF and GRU models are lightweight; training can be completed on a standard workstation, and inference requires negligible CPU time. This makes them attractive for local weather service offices that need rapid, low‑cost heat‑index forecasts to support public warnings.
Limitations and Future Work
The study acknowledges several constraints:
- Wind direction handling: treating 0° and 360° as far apart can introduce artifacts, though wind‑rose analysis suggested minimal overlap for the cities examined.
- Class imbalance: extreme‑heat days are scarce in some locations, limiting the classifier’s ability to learn rare patterns. Techniques such as SMOTE, cost‑sensitive learning, or synthetic extreme‑event generation could improve performance.
- Temporal depth: only a single prior‑day snapshot is used; incorporating multi‑day sequences or lagged variables might capture antecedent soil‑moisture or atmospheric memory effects.
- Model ensemble: combining RF and GRU outputs could leverage the strengths of both tree‑based interpretability and deep‑learning flexibility.
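Two of the limitations above have common, simple remedies, sketched below. Neither is claimed to be what the authors did. The first replaces the raw wind direction with its sine and cosine, so that 0° and 360° map to the same point. The second computes per‑class weights with the n_samples / (n_classes × class_count) heuristic, the same formula scikit‑learn uses for `class_weight="balanced"`, as one form of cost‑sensitive learning. The function names are hypothetical.

```python
import math
from collections import Counter


def encode_wind_direction(deg: float):
    """Map a direction in degrees to (sin, cos) so the encoding is cyclic."""
    rad = math.radians(deg % 360.0)
    return math.sin(rad), math.cos(rad)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# 359° and 1° are 358 apart as raw numbers but neighbors on the circle.
s1, c1 = encode_wind_direction(359.0)
s2, c2 = encode_wind_direction(1.0)
cyclic_gap = math.hypot(s1 - s2, c1 - c2)

# Synthetic labels: 4 extreme-heat days (1) among 100 days.
weights = balanced_class_weights([0] * 96 + [1] * 4)
```

With these synthetic labels the rare class receives a weight of 12.5 against roughly 0.52 for the common class, so each missed extreme‑heat day costs the classifier about 24 times as much as a misclassified ordinary day.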
Conclusions
The paper demonstrates that relatively simple ML models, trained on high‑resolution reanalysis predictors, can reliably forecast next‑day heat index values and exceedance categories for diverse U.S. cities. Accuracy is comparable to, and in some cases surpasses, that of traditional NWP products for the same lead time, especially when local climatology provides sufficient extreme‑heat examples. Variable importance analysis offers physically meaningful explanations of regional heat‑wave drivers, supporting targeted mitigation strategies. The authors suggest that extending the approach to additional locations, incorporating richer temporal inputs, and integrating the models into operational warning pipelines represent promising next steps.