Missing At Random as Covariate Shift: Correcting Bias in Iterative Imputation
Accurate imputation of missing data is critical to downstream machine learning performance. We formulate missing data imputation as a risk minimisation problem, which highlights a covariate shift between the observed and unobserved data distributions. This covariate-shift-induced bias is not accounted for by popular imputation methods and leads to suboptimal performance. In this paper, we derive theoretically valid importance weights that correct for the induced distributional bias. Furthermore, we propose a novel imputation algorithm that jointly estimates both the importance weights and the imputation models, enabling bias correction throughout the imputation process. Empirical results across benchmark datasets show reductions in root mean squared error and Wasserstein distance of up to 7% and 20%, respectively, compared to otherwise identical unweighted methods.
💡 Research Summary
The paper “Missing At Random as Covariate Shift: Correcting Bias in Iterative Imputation” reframes the problem of imputing missing data as a risk‑minimisation task and shows that, under the Missing‑At‑Random (MAR) assumption, the mechanism that generates missingness itself creates a covariate shift between the distribution of observed entries and that of the unobserved entries. This shift is analogous to the classic covariate‑shift setting in which the training marginal p_train(x) differs from the test marginal p_test(x) while the conditional p(y|x) remains unchanged. In the context of imputation, the “output” is the missing variable X_i, the “predictors” are the observed components X_obs together with the missingness indicators of the other variables R_{¬i}, and the two sub‑populations correspond to rows where X_i is observed (R_i=1) and rows where it is missing (R_i=0).
The authors derive an importance‑weighting factor
w_i(x_obs, r_{¬i}) = p(x_obs, r_{¬i} | R_i=0) / p(x_obs, r_{¬i} | R_i=1)
which exactly re‑weights the observed data so that empirical risk minimisation on the weighted observed sample targets the true risk on the missing‑value distribution. They prove that, for mean‑squared‑error loss, the overall imputation risk decomposes into a sum of coordinate‑wise risks J_i(g_i) and that each J_i can be expressed using only observed data if the weights w_i are applied. Ignoring these weights, as most existing iterative imputation methods (MICE, MissForest, HyperImpute) do, implicitly assumes no covariate shift and therefore yields biased imputations.
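Writing η_i(x_obs, r_{¬i}) = P(R_i=1 | x_obs, r_{¬i}) for the observation probability, the density ratio above can be rewritten with Bayes' rule — a short derivation (standard in the covariate-shift literature) of the classifier-based form the algorithm estimates:

```latex
w_i(x_{\mathrm{obs}}, r_{\neg i})
  = \frac{p(x_{\mathrm{obs}}, r_{\neg i} \mid R_i = 0)}{p(x_{\mathrm{obs}}, r_{\neg i} \mid R_i = 1)}
  = \frac{P(R_i = 0 \mid x_{\mathrm{obs}}, r_{\neg i})}{P(R_i = 1 \mid x_{\mathrm{obs}}, r_{\neg i})}
    \cdot \frac{P(R_i = 1)}{P(R_i = 0)}
  \;\propto\; \frac{1 - \eta_i(x_{\mathrm{obs}}, r_{\neg i})}{\eta_i(x_{\mathrm{obs}}, r_{\neg i})}.
```

The constant factor P(R_i=1)/P(R_i=0) does not depend on the row, so an estimate of η_i from any probabilistic classifier suffices to recover the weights up to a harmless proportionality constant.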
To operationalise this insight, the paper proposes a weighted iterative imputation algorithm that fits naturally into the round‑robin framework used by popular methods. The algorithm proceeds as follows:
- Initialisation – Fill missing entries with a simple rule (e.g., column means) to obtain a fully completed dataset.
- Weight Estimation (Algorithm 1) – For each target column i, train a binary classifier η_i that predicts the probability of being observed, η_i( x_{¬i} ) ≈ P(R_i=1 | x_{¬i}). The importance weight for each row is then estimated as \hat w_i = (1‑η_i) / η_i, which is proportional to the true density‑ratio w_i by Bayes’ rule. This step uses standard density‑ratio estimation techniques (logistic regression, tree‑based classifiers, etc.).
- Weighted Model Fitting – Using the current imputed covariates as predictors, fit a conditional model g_i (linear regression, random forest, neural net, etc.) that minimises the weighted MSE loss

  Ĵ_i(g_i) = E[ ŵ_i(X_{¬i}) · (X_i − g_i(X_{¬i}))² | R_i = 1 ],

  estimated in practice by the weighted empirical average over the rows where X_i is observed. The fitted g_i is then used to update the imputed values in the rows where X_i is missing, and the weight‑estimation and model‑fitting steps are repeated round‑robin over the columns.
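The three steps above can be sketched in a compact, hypothetical implementation. This is a minimal illustration, not the paper's reference code: for brevity it uses only the currently imputed covariates as classifier features (omitting the missingness indicators r_{¬i}), fixes logistic and linear models for η_i and g_i, and clips the estimated weights for numerical stability.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def weighted_iterative_impute(X, n_iters=5, clip=50.0):
    """Sketch of weighted round-robin imputation.

    X: 2-D float array with np.nan marking missing entries.
    Returns a completed copy of X; observed entries are left untouched.
    """
    X = np.asarray(X, dtype=float)
    R = ~np.isnan(X)                       # R[j, i] = 1 iff X[j, i] is observed
    Xc = X.copy()
    col_means = np.nanmean(X, axis=0)
    for i in range(X.shape[1]):            # Initialisation: fill with column means
        Xc[~R[:, i], i] = col_means[i]

    for _ in range(n_iters):
        for i in range(X.shape[1]):        # round-robin over target columns
            obs, mis = R[:, i], ~R[:, i]
            if not mis.any() or not obs.any():
                continue
            others = np.delete(Xc, i, axis=1)
            # Weight estimation: classifier for eta_i = P(R_i = 1 | x_{-i}),
            # then w = (1 - eta) / eta, proportional to the density ratio.
            eta = LogisticRegression().fit(others, obs.astype(int)) \
                                      .predict_proba(others)[:, 1]
            w = np.clip((1.0 - eta) / np.maximum(eta, 1e-6), 0.0, clip)
            # Weighted model fitting on the observed rows only.
            g = LinearRegression().fit(others[obs], Xc[obs, i],
                                       sample_weight=w[obs])
            Xc[mis, i] = g.predict(others[mis])  # update imputations
    return Xc
```

Clipping the weights trades a little bias for variance, a common practical safeguard when the estimated η_i gets close to zero; the paper's own estimator may handle this differently.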