Correcting Measurement Error and Zero Inflation in Functional Covariates for Scalar-on-Function Quantile Regression
Wearable devices collect time-varying biobehavioral data, offering opportunities to investigate how behaviors influence health outcomes. However, these data often contain measurement error and excess zeros (due to nonwear, sedentary behavior, or connectivity issues), each characterized by subject-specific distributions. Current statistical methods fail to address these issues simultaneously. We introduce a novel modeling framework for zero-inflated and error-prone functional data by incorporating a subject-specific time-varying validity indicator that explicitly distinguishes structural zeros from intrinsic values. We iteratively estimate the latent functional covariates and zero-inflation probabilities via maximum likelihood, using basis expansions and linear mixed models to adjust for measurement error. To assess the effects of the recovered latent covariates, we apply joint quantile regression across multiple quantile levels. Through extensive simulations, we demonstrate that our approach significantly improves estimation accuracy over methods that only address measurement error, and joint estimation yields substantial improvements compared with fitting separate quantile regressions. Applied to a childhood obesity study, our approach effectively corrects for zero inflation and measurement error in step counts, yielding results that closely align with energy expenditure and supporting their use as a proxy for physical activity.
💡 Research Summary
The paper tackles a pervasive problem in modern digital health research: wearable devices generate time‑varying functional data that are simultaneously contaminated by measurement error and an excess of zeros (zero‑inflation) arising from non‑wear periods, sedentary behavior, or connectivity failures. Existing statistical tools typically address either measurement error or zero‑inflation, but not both, leading to biased inference when these phenomena co‑occur.
To resolve this, the authors propose a unified modeling framework that introduces a subject‑specific, time‑varying validity indicator (V_{ij}(t)). The observed measurement (W_{ij}(t)) is expressed as (W_{ij}(t)=V_{ij}(t)W^{*}_{ij}(t)), where (V_{ij}(t)\sim\text{Bernoulli}(1-\pi_i(t))). The probability (\pi_i(t)) captures structural zeros (device off or non‑wear) and is allowed to vary across subjects and be piecewise constant over pre‑defined time segments. Conditional on the device being “on”, the latent surrogate measurement (W^{*}_{ij}(t)) follows an exponential‑family distribution with mean equal to the true underlying functional covariate (X_i(t)). This construction yields a measurement error term (U_{ij}(t)=W^{*}_{ij}(t)-X_i(t)) with zero conditional mean, while preserving the possibility of correlated, heteroscedastic error structures across time and subjects.
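To make the data-generating mechanism concrete, the following minimal sketch simulates one subject's replicate curves under the model described above. It is an illustration under stated assumptions, not the authors' code: the Poisson distribution is assumed as the exponential-family member, errors are drawn independently across time (the paper allows correlated errors), and the function name `simulate_observed` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_observed(x_true, pi, n_reps, rng):
    """Simulate W_ij(t) = V_ij(t) * W*_ij(t) for one subject i.

    x_true : (T,) latent curve X_i(t), must be positive (Poisson mean).
    pi     : (T,) structural-zero probabilities pi_i(t).
    n_reps : number of replicate curves j = 1..n_reps.
    """
    x_true = np.asarray(x_true, dtype=float)
    pi = np.asarray(pi, dtype=float)
    shape = (n_reps, x_true.shape[0])

    # ASSUMPTION: Poisson surrogate with mean X_i(t), so E[W* - X] = 0.
    w_star = rng.poisson(lam=x_true, size=shape)

    # Validity indicator: V = 1 (valid reading) with probability 1 - pi.
    v = rng.binomial(1, 1.0 - pi, size=shape)

    # Observed data: structural zeros wherever V = 0.
    w = v * w_star
    return w, v, w_star


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 24)                    # e.g. 24 hourly bins
    x_true = 50.0 + 30.0 * np.sin(2.0 * np.pi * t)   # hypothetical latent curve
    pi = np.where(t < 0.5, 0.4, 0.1)                 # piecewise-constant pi_i(t)
    w, v, w_star = simulate_observed(x_true, pi, n_reps=7, rng=rng)
    print("observed shape:", w.shape)
    print("share of zeros:", float((w == 0).mean()))
```

Note that a zero in the observed data can come either from the validity indicator (a structural zero) or from the surrogate itself (an intrinsic zero); only the observed product is available to the analyst, which is why the zero-inflation probability has to be estimated.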
The latent functional covariate (X_i(t)) and the zero‑inflation probabilities (\pi_i(t)) are estimated jointly via an iterative maximum‑likelihood algorithm akin to an EM procedure. Basis expansions (e.g., B‑splines or Fourier bases) combined with linear mixed‑effects models provide a flexible representation of (X_i(t)) and naturally accommodate within‑subject temporal correlation and subject‑specific error variance components.
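The flavor of the iterative step can be illustrated with the textbook EM algorithm for a zero-inflated Poisson sample. This is a deliberately simplified sketch, not the authors' estimator: it treats a single subject and a single time segment with a constant mean, assumes a Poisson surrogate, and omits the basis expansion and mixed-model components entirely. The function name `zip_em` is hypothetical.

```python
import numpy as np


def zip_em(w, n_iter=200, tol=1e-10):
    """EM for a zero-inflated Poisson sample.

    Returns (pi_hat, lam_hat): the estimated structural-zero probability
    and the estimated Poisson mean (a scalar stand-in for X_i(t)).
    """
    w = np.asarray(w, dtype=float)
    is_zero = w == 0

    # Crude starting values.
    pi = 0.5 * is_zero.mean()
    lam = w[~is_zero].mean() if (~is_zero).any() else 1.0

    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        p_struct = pi / (pi + (1.0 - pi) * np.exp(-lam))
        z = np.where(is_zero, p_struct, 0.0)

        # M-step: closed-form updates of the complete-data likelihood.
        pi_new = z.mean()
        lam_new = w.sum() / (1.0 - z).sum()

        converged = abs(pi_new - pi) + abs(lam_new - lam) < tol
        pi, lam = pi_new, lam_new
        if converged:
            break
    return pi, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    valid = rng.binomial(1, 0.7, size=5000)      # true pi = 0.3
    w = valid * rng.poisson(4.0, size=5000)      # true mean = 4
    pi_hat, lam_hat = zip_em(w)
    print("pi_hat:", pi_hat, "lam_hat:", lam_hat)
```

In the framework summarized above, the scalar mean would be replaced by a basis-expanded curve with subject-specific random effects, and the zero-inflation probability would be estimated separately for each subject and time segment.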
Having recovered the latent functional predictor, the authors embed it in a scalar‑on‑function quantile regression (SoFQR) model:
\