Child Mortality Prediction in Bangladesh: A Decade-Long Validation Study
Machine learning models that predict child mortality are often inaccurate when applied to future populations because the randomization used in standard cross-validation introduces look-ahead bias. This paper uses Demographic and Health Surveys (DHS) data from Bangladesh for 2011–2022 (n = 33,962). We trained models on the 2011–2014 data, validated them on the 2017 cohort, and tested them on the 2022 cohort, imposing an eight-year gap between training and test data. Under this protocol, a genetic algorithm-based Neural Architecture Search found a single-hidden-layer network (64 units) superior to XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). A detailed fairness audit also revealed a "Socioeconomic Predictive Gradient": AUC fell as division-level wealth rose (r = -0.62), so the model performed best in the least affluent divisions (AUC 0.74) and declined sharply in the wealthiest (AUC 0.66). These findings suggest the model is most accurate precisely where intervention need is greatest. At a 10% screening threshold, our model would identify approximately 1,300 more at-risk children annually than a Gradient Boosting model. Calibrated with Platt scaling and interpreted with SHAP values, it provides a robust, production-ready computational phenotype for targeted maternal and child health interventions.
💡 Research Summary
This study tackles the persistent problem of “look‑ahead bias” in machine‑learning models for child mortality by employing a strict temporal validation framework using Bangladesh Demographic and Health Survey (DHS) data from 2011, 2014, 2017, and 2022. A total of 33,962 birth records (with under‑five mortality as a binary outcome) were split so that training used only 2011‑2014 data (≈14,380 births), validation used the 2017 cohort (≈8,044 births), and testing was performed on the completely unseen 2022 cohort (≈11,538 births). This eight‑year separation mimics real‑world deployment where models are built on historical data and later applied to future populations, thereby eliminating the optimistic performance inflation typical of random k‑fold cross‑validation.
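The temporal split described above can be sketched in a few lines of pandas; the column names (`survey_year`, `died_under5`) and the toy data are illustrative assumptions, not the DHS recode variables.

```python
import pandas as pd

# Illustrative stand-in for the pooled 2011-2022 birth records.
births = pd.DataFrame({
    "survey_year": [2011, 2014, 2017, 2022] * 3,
    "died_under5": [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
})

# Strict temporal partition: no future information leaks into training.
train = births[births["survey_year"].isin([2011, 2014])]  # fit models here only
val = births[births["survey_year"] == 2017]               # tune hyperparameters
test = births[births["survey_year"] == 2022]              # evaluate once, at the end
```

Because the partition is by survey wave rather than by random row assignment, the 2022 test metrics approximate what a model deployed in 2014 would actually have achieved eight years later.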
Feature engineering was guided by epidemiologic knowledge: raw DHS items (>50) were collapsed into 31 clinically meaningful categories covering maternal age, education, household wealth quintile, urban/rural residence, antenatal care adequacy, delivery location, skilled birth attendance, preceding birth interval, birth order, and perceived birth size. This domain‑driven reduction preserved interpretability while providing sufficient non‑linear signal for machine learning.
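A minimal sketch of this domain-driven reduction is shown below for three of the 31 categories. The bin edges and thresholds (e.g. the 24-month short-interval cutoff and the 4-visit antenatal-care criterion) are common epidemiologic conventions used here for illustration, not necessarily the paper's exact coding.

```python
import pandas as pd

def collapse_features(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse raw survey items into clinically meaningful categories."""
    out = pd.DataFrame(index=df.index)
    # Maternal age binned into standard risk groups.
    out["maternal_age_group"] = pd.cut(
        df["maternal_age"], bins=[0, 19, 34, 50], labels=["<20", "20-34", "35+"]
    )
    # Preceding birth interval under 24 months flagged as short.
    out["short_birth_interval"] = (df["preceding_interval_months"] < 24).astype(int)
    # Antenatal care deemed adequate at 4+ visits (a common guideline threshold).
    out["anc_adequate"] = (df["anc_visits"] >= 4).astype(int)
    return out

sample = pd.DataFrame({
    "maternal_age": [17, 28, 41],
    "preceding_interval_months": [18, 36, 30],
    "anc_visits": [1, 4, 6],
})
engineered = collapse_features(sample)
```

Each engineered column maps directly to an epidemiologic risk factor, which keeps downstream model explanations (e.g. SHAP values) readable to public-health practitioners.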
Model development leveraged a genetic‑algorithm‑based Neural Architecture Search (NAS). The search space included 1‑5 hidden layers, layer widths of 16, 32, 64, or 128 units, four activation functions (ReLU, ELU, SELU, tanh), dropout rates sampled uniformly between 0 and 0.5, and optional batch normalization. Starting with a population of 20 architectures, each candidate was trained for 30 epochs and ranked by AUROC on the 2017 validation set, and the top five were kept as elites. Crossover and mutation (10% mutation rate) generated new candidates over 15 generations. The optimal architecture turned out to be surprisingly simple: a single hidden layer with 64 units, ELU activation, 30% dropout, and batch normalization.
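The evolutionary loop above can be sketched as follows. The search-space constants (population 20, 15 generations, 5 elites, 10% mutation) follow the text; the `fitness` function is a random placeholder standing in for the real 30-epoch training and validation-AUROC evaluation, so the returned architecture here is arbitrary.

```python
import random

random.seed(0)

LAYERS = [1, 2, 3, 4, 5]
WIDTHS = [16, 32, 64, 128]
ACTS = ["relu", "elu", "selu", "tanh"]

def random_arch() -> dict:
    """Sample one candidate architecture from the search space."""
    return {
        "n_layers": random.choice(LAYERS),
        "width": random.choice(WIDTHS),
        "activation": random.choice(ACTS),
        "dropout": random.uniform(0.0, 0.5),
        "batch_norm": random.choice([True, False]),
    }

def fitness(arch: dict) -> float:
    # Placeholder: in the study this is validation AUROC after 30 epochs.
    return random.random()

def crossover(a: dict, b: dict) -> dict:
    """Uniform crossover: each gene inherited from either parent."""
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(arch: dict, rate: float = 0.10) -> dict:
    """Resample each gene independently with probability `rate`."""
    child = dict(arch)
    for k in child:
        if random.random() < rate:
            child[k] = random_arch()[k]
    return child

def search(pop_size: int = 20, generations: int = 15, n_elite: int = 5) -> dict:
    population = [random_arch() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:n_elite]  # top five survive unchanged
        children = [
            mutate(crossover(*random.sample(elites, 2)))
            for _ in range(pop_size - n_elite)
        ]
        population = elites + children
    return max(population, key=fitness)

best = search()
```

Elitism guarantees the best architectures found so far are never lost, while crossover and mutation keep exploring the roughly 5 × 4 × 4 × 2 discrete grid (plus the continuous dropout rate).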
For comparison, six baseline models were tuned under the same computational budget: logistic regression (L2-regularized), random forest (500 trees), XGBoost, LightGBM, TabNet, and ResNet. On the held‑out 2022 test set, the NAS model achieved AUROC 0.766, outperforming the strongest baseline, XGBoost (AUROC 0.73; p < 0.01).