Investigating symptom duration using current status data: a case study of post-acute COVID-19 syndrome
For infectious diseases, characterizing symptom duration is of clinical and public health importance. Symptom duration may be assessed by surveying infected individuals and querying symptom status at the time of survey response. For example, in a SARS-CoV-2 testing program at the University of Washington, participants were surveyed at least $28$ days after testing positive and asked to report current symptom status. This study design yielded current status data: outcome measurements for each respondent consisted only of the time of survey response and a binary indicator of whether symptoms had resolved by that time. Such a study design benefits from limited risk of recall bias, but analyzing the resulting data necessitates tailored statistical tools. Here, we review methods for current status data and describe a novel application of modern nonparametric techniques to this setting. The proposed approach is valid under weaker assumptions than existing methods, allows use of flexible machine learning tools, and handles potential survey nonresponse. From the university study, under an assumption that the survey response time is conditionally independent of symptom resolution time within strata of measured covariates, we estimate that 19% of participants experienced ongoing symptoms 30 days after testing positive, decreasing to 7% at 90 days. We assess the sensitivity of these results to deviations from conditional independence, finding the estimates to be more sensitive to assumption violations at 30 days than at 90 days. Female sex, fatigue during acute infection, and higher viral load were associated with slower symptom resolution.
💡 Research Summary
The paper addresses the problem of estimating the distribution of COVID‑19 symptom duration using “current status” data, where each participant provides only the time of a single survey response and a binary indicator of whether symptoms have resolved by that time. This design, employed in the Husky Coronavirus Testing (HCT) study at the University of Washington, minimizes recall bias but poses statistical challenges because the exact symptom resolution time is never observed and many participants never respond. Traditional analysis of current status data relies on the non‑parametric maximum likelihood estimator (NPMLE), which assumes that the survey response time carries no information about the underlying event time—a strong assumption often violated in practice.
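To make the classical baseline concrete: when response times are independent of resolution times, the NPMLE of the distribution function coincides with the isotonic (nondecreasing) regression of the resolution indicator on the response time, which can be computed with the pool-adjacent-violators algorithm. The sketch below is illustrative only. The function names `pava` and `current_status_npmle` are hypothetical, the data are simulated rather than taken from the HCT study, and response times are assumed to be distinct.

```python
import random


def pava(values, weights=None):
    """Pool-adjacent-violators: least-squares nondecreasing fit to `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block holds [weighted mean, total weight, number of points pooled].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def current_status_npmle(response_times, resolved):
    """Sorted response times and the estimate of P(T <= y) at each of them.

    Assumes distinct response times and that response time is independent
    of the (unobserved) resolution time T.
    """
    order = sorted(range(len(response_times)), key=lambda i: response_times[i])
    ys = [response_times[i] for i in order]
    deltas = [resolved[i] for i in order]
    return ys, pava(deltas)


if __name__ == "__main__":
    # Simulated illustration (hypothetical parameters, not study data).
    rng = random.Random(0)
    n = 500
    t = [rng.expovariate(1 / 20) for _ in range(n)]   # latent resolution times
    y = [rng.uniform(28, 120) for _ in range(n)]      # survey response times
    delta = [int(ti <= yi) for ti, yi in zip(t, y)]   # observed indicator only
    ys, f_hat = current_status_npmle(y, delta)
    for day in (30, 60, 90):
        # Step-function estimate: value at the latest response time <= day.
        at_or_before = [f for yy, f in zip(ys, f_hat) if yy <= day]
        est = at_or_before[-1] if at_or_before else 0.0
        print(f"day {day}: estimated P(resolved) = {est:.3f}")
```

Only `y` and `delta` enter the estimator; the latent times `t` are used solely to generate the indicator, mirroring what a current status design records.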
The authors propose replacing the NPMLE with a causal isotonic regression (CIR) framework, originally developed for estimating monotone dose‑response curves, which remains valid under weaker conditions. The key assumption is conditional independence: given baseline covariates $W$, the survey response time $Y$ and the unobserved symptom resolution time $T$ are independent. This is more plausible than unconditional independence because it allows the response time to be related to observable factors such as age, comorbidities, or vaccination status.
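A short derivation may help show why conditional independence is enough to identify the distribution of symptom duration. The notation $\Delta = 1\{T \le Y\}$ for the observed resolution indicator is introduced here for illustration and is not taken from the summary. The argument also assumes that every response time of interest has positive probability density within each covariate stratum. Under $T \perp Y \mid W$,

$$
E[\Delta \mid Y = y, W = w] = P(T \le y \mid Y = y, W = w) = P(T \le y \mid W = w),
$$

and averaging over the marginal distribution of $W$ gives

$$
P(T \le t) = E\big[\, P(T \le t \mid W) \,\big] = E\big[\, E[\Delta \mid Y = t, W] \,\big].
$$

The right-hand side involves only observable quantities and is nondecreasing in $t$. It has the same form as a covariate-adjusted dose-response curve with $Y$ in the role of the exposure, which is why a monotone estimator designed for that problem can be applied here.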
CIR proceeds by first estimating two nuisance functions: (1) (\mu(y,w)=E