Compatibility of Missing Data Handling Methods across the Stages of Producing Clinical Prediction Models
Missing data is a challenge when developing, validating and deploying clinical prediction models (CPMs). Traditionally, decisions concerning missing data handling during CPM development and validation haven't accounted for whether missingness is allowed at deployment. We hypothesised that the missing data approach used during model development should optimise model performance upon deployment, whilst the approach used during model validation should yield unbiased predictive performance estimates upon deployment; we term this compatibility. We aimed to determine which combinations of missing data handling methods across the CPM life cycle are compatible. We considered scenarios where CPMs are intended to be deployed with missing data allowed or not, and we evaluated the impact of that choice on earlier modelling decisions. Through a simulation study and an empirical analysis of thoracic surgery data, we compared CPMs developed and validated using combinations of complete case analysis, mean imputation, single regression imputation, multiple imputation, and pattern sub-modelling. If planning to deploy a CPM without allowing missing data, then development and validation should use multiple imputation when required. Where missingness is allowed at deployment, the same imputation method must be used during development and validation. Commonly used combinations of missing data handling methods result in biased predictive performance estimates.
💡 Research Summary
This paper addresses a critical yet under‑explored aspect of clinical prediction model (CPM) development: the compatibility of missing‑data handling methods across the entire model life‑cycle, from development through validation to deployment. The authors introduce two guiding principles. First, the method used to handle missing predictor values during model development should minimise degradation of model performance when the model is deployed under a specific missing‑data strategy. Second, the method employed during validation should provide an unbiased estimate of the model’s predictive performance under the same deployment conditions. When both principles are satisfied, the authors deem the combination of methods “compatible.”
Five missing‑data strategies are examined: (1) complete case analysis (CCA), (2) mean/mode imputation, (3) single deterministic regression imputation (RI), (4) multiple imputation by chained equations (MI) – both with and without inclusion of the outcome in the imputation model, and (5) pattern sub‑modelling (PSM), which fits separate CPMs for each observed predictor pattern. The authors construct a comprehensive simulation framework that varies the missingness mechanism (MCAR, MAR, MNAR), the proportion of missing data (10%–40%), and the intended deployment scenario (allowing missing data versus requiring complete data). For each simulated dataset, CPMs are built using each of the five methods, then validated under all admissible combinations of handling strategies. Performance is assessed via discrimination (AUROC), calibration (calibration slope and intercept), and overall accuracy (Brier score). Bias is quantified as the difference between the true performance (known from the data‑generating process) and the performance estimated in the validation step.
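To make the contrast between two of these strategies concrete, the following minimal sketch applies mean imputation and single deterministic regression imputation to a toy dataset with one partially missing predictor. The data, missingness rate, and variable names are illustrative assumptions, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x2 depends on x1, then ~30% of x2 is made missing
# (an illustrative setup, not the authors' data-generating process).
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)
miss = rng.random(n) < 0.3
x2_obs = np.where(miss, np.nan, x2)

# (2) Mean imputation: replace missing x2 with the observed mean.
x2_mean_imp = np.where(miss, np.nanmean(x2_obs), x2_obs)

# (3) Single deterministic regression imputation: regress x2 on x1
# using complete cases, then predict the missing values.
obs = ~miss
X = np.column_stack([np.ones(obs.sum()), x1[obs]])
beta, *_ = np.linalg.lstsq(X, x2_obs[obs], rcond=None)
x2_ri = np.where(miss, beta[0] + beta[1] * x1, x2_obs)

# Regression imputation preserves the x1-x2 association, whereas
# mean imputation flattens it for every imputed case.
print(float(np.corrcoef(x1, x2_mean_imp)[0, 1]))
print(float(np.corrcoef(x1, x2_ri)[0, 1]))
```

The design difference matters for the paper's findings: mean imputation discards the predictor correlations that RI retains, which is one reason it distorts calibration under MAR and MNAR mechanisms.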
Key simulation findings are: (i) when the model will be deployed in a setting that does not permit missing predictor values, the only compatible approach is to use multiple imputation (including the outcome) both at development and validation. All other combinations produce systematic over‑ or under‑estimation of performance. (ii) When the deployment strategy allows missing predictors, compatibility requires that the same imputation technique be applied consistently in development and validation. For example, a model built with single regression imputation remains unbiased only if the same RI model is used to impute missing values in the validation set; mixing MI and RI introduces appreciable bias. (iii) Pattern sub‑models perform well when the number of missingness patterns is moderate and each pattern contains sufficient cases, but they become unstable with sparse patterns. (iv) Mean/mode imputation, while simple, leads to substantial bias under MAR and MNAR mechanisms because it ignores variability and can distort calibration.
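Finding (ii) — that the validation-side imputation method must match the development-side method — can be illustrated with a small numeric sketch: the same fixed model, scored on the same validation cohort, yields a different Brier score depending on how the validation data are imputed. All data and coefficients here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic validation cohort (illustration only): binary outcome y
# depends on predictors x1 and x2; x2 is ~30% missing.
n = 2000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)
p = 1 / (1 + np.exp(-(x1 + x2)))
y = rng.random(n) < p
miss = rng.random(n) < 0.3
x2m = np.where(miss, np.nan, x2)

def brier(x2_imp):
    """Brier score of a fixed logistic CPM with linear predictor x1 + x2."""
    phat = 1 / (1 + np.exp(-(x1 + x2_imp)))
    return float(np.mean((phat - y) ** 2))

# Validation imputed with mean imputation...
mean_imp = np.where(miss, np.nanmean(x2m), x2m)

# ...versus validation imputed with regression imputation.
obs = ~miss
X = np.column_stack([np.ones(obs.sum()), x1[obs]])
b, *_ = np.linalg.lstsq(X, x2m[obs], rcond=None)
reg_imp = np.where(miss, b[0] + b[1] * x1, x2m)

# The two choices give different performance estimates for the same
# model and cohort -- so a validation method mismatched to the
# development/deployment method reports a biased figure.
print(brier(mean_imp), brier(reg_imp))
```

The point is not which score is "right" in isolation: the unbiased estimate is the one produced by the same handling method the model will face at deployment.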
To illustrate these principles in a real‑world context, the authors analyse a thoracic surgery registry comprising patients undergoing lung resection. They develop CPMs for 30‑day mortality using each missing‑data method and then externally validate them on a temporally distinct cohort. The empirical results mirror the simulation: under a “missing‑allowed” deployment plan, using different handling methods between development and validation (e.g., MI at development, CCA at validation) inflates AUROC by up to 0.07 and mis‑calibrates risk estimates. Conversely, applying the identical method in both stages yields performance estimates within 0.01 of the true values.
The discussion highlights practical considerations. Multiple imputation, though statistically optimal for the “no‑missing‑allowed” scenario, demands considerable computational resources and may be infeasible for real‑time clinical decision support systems. In such cases, deterministic regression imputation, pre‑trained on the development data and frozen for deployment, offers a pragmatic compromise provided the deployment strategy tolerates missingness. Conversely, when a health system mandates complete data entry before risk calculation, embedding the MI procedure (including the outcome) within the model package ensures congeniality and unbiased performance reporting.
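The "frozen" deterministic regression imputation described above can be sketched as follows: the imputation coefficients are estimated once on the development data and stored alongside the CPM, so a deployed prediction for a patient with a missing predictor uses exactly the development-time imputation model. The predictor names and all coefficients are hypothetical stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic development data (illustration only).
n = 400
age = rng.normal(60, 10, n)
fev1 = 100 - 0.5 * age + rng.normal(scale=8, size=n)

# Freeze the imputation model at development time: regress fev1 on
# age and store the coefficients with the CPM artefact.
X = np.column_stack([np.ones(n), age])
imp_coef, *_ = np.linalg.lstsq(X, fev1, rcond=None)

# Hypothetical CPM coefficients (intercept, age, fev1) -- stand-ins,
# not estimates from the thoracic surgery registry.
cpm_coef = np.array([-4.0, 0.05, -0.02])

def predict_risk(age_i, fev1_i):
    """Logistic CPM; imputes fev1 from the frozen model if missing."""
    if np.isnan(fev1_i):
        fev1_i = imp_coef[0] + imp_coef[1] * age_i
    lp = cpm_coef @ np.array([1.0, age_i, fev1_i])
    return 1.0 / (1.0 + np.exp(-lp))

# At deployment, a patient with missing fev1 still receives a risk
# estimate, computed with the development-time imputation model.
print(predict_risk(65.0, np.nan))
```

Because the imputation step is a single deterministic prediction, it adds negligible latency at deployment, which is the pragmatic advantage the discussion contrasts with the computational cost of multiple imputation.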
Overall, the study warns against the common practice of mixing missing‑data strategies across CPM stages—a practice that can lead to misleading performance claims and suboptimal patient care. The authors recommend that investigators explicitly define the intended deployment missing‑data policy at the outset and align development and validation methods accordingly. Future work should extend the compatibility framework to high‑dimensional predictors, time‑to‑event outcomes, and non‑tabular data (e.g., imaging), as well as develop automated pipelines that enforce compatible missing‑data handling throughout the CPM lifecycle.