Continuous-Time Learning of Probability Distributions: A Case Study in a Digital Trial of Young Children with Type 1 Diabetes

Antonio Álvarez-López (Universidad Autónoma de Madrid) and Marcos Matabuena (Mohamed bin Zayed University of Artificial Intelligence)

March 26, 2026

Abstract

Understanding how biomarker distributions evolve over time is a central challenge in digital health and chronic disease monitoring. In diabetes, changes in the distribution of glucose measurements can reveal patterns of disease progression and treatment response that conventional summary measures miss. Motivated by a 26-week clinical trial comparing the closed-loop insulin delivery system t:slim X2 with standard therapy in children with type 1 diabetes, we propose a probabilistic framework to model the continuous-time evolution of time-indexed distributions using continuous glucose monitoring (CGM) data collected every five minutes. We represent the glucose distribution as a Gaussian mixture, with time-varying mixture weights governed by a neural ODE. We estimate the model parameters using a distribution-matching criterion based on the maximum mean discrepancy. The resulting framework is interpretable, computationally efficient, and sensitive to subtle temporal distributional changes. Applied to CGM trial data, the method detects treatment-related improvements in glucose dynamics that are difficult to capture with traditional analytical approaches.

Keywords: Continuous glucose monitoring; Digital health; Distribution dynamics; Neural ODEs; Gaussian mixture models; Maximum Mean Discrepancy.

1 Introduction

Characterizing the distribution of a random variable is a classical problem in statistics Silverman (2018) and remains a central challenge in modern machine learning LeCun et al.
(2015), where accurate distribution representations are essential for tasks such as text generation and automated reporting Meskó & Topol (2023). More broadly, many scientific questions require understanding not only individual observations but also how the full distribution of a process evolves over time.

This perspective is particularly relevant in clinical applications. In digital health, estimating the distribution of individual physiological time-series data over specific time periods enables the construction of individual representations that capture their underlying physiological processes with high precision Matabuena et al. (2021), Matabuena & Petersen (2023), Ghosal et al. (2023). Recent studies show that, when used properly, such representations can reveal clinically relevant patterns that traditional (non-digital) biomarkers do not detect Katta et al. (2024), Matabuena, Ghosal, Aguilar, Keshet, Wagner, Fernández Merino, Sánchez Castro, Zipunnikov, Onnela & Gude (2025), Park et al. (2025), Matabuena et al. (2026).

In this paper, motivated by digital health applications, we study the problem of continuously estimating a time-indexed distribution $t \mapsto F_t$ from sequentially observed data. The goal is to learn how the underlying distribution evolves and to represent that evolution in a way that is both flexible and interpretable. Standard approaches are often unsatisfactory in this setting. Extending classical kernel density estimators (KDEs) Chacón & Duong (2018) to include time typically leads to a strong sensitivity to tuning parameters and to the curse of dimensionality, while flow-based generative models Papamakarios et al. (2021) may be less interpretable and can require substantial training effort. Semiparametric alternatives offer partial remedies.
For example, time-varying models such as Generalized Additive Models for Location, Scale, and Shape (GAMLSS) Rigby & Stasinopoulos (2005) alleviate some of these issues, but most implementations are designed for scalar responses and may impose rigid functional forms. More recent multilevel functional approaches based on functional-quantile representations Matabuena & Crainiceanu (2026) offer interpretability but rely on linear dynamics, which limits their ability to capture complex non-linear relationships and multivariate distributions. To address these limitations, we propose a continuous-time Gaussian mixture framework in which distributional dynamics are represented through time-varying mixture weights governed by a neural ODE.

Problem formulation

Let $T > 0$. For each $t \in [0, T]$, let $X_t \in \mathbb{R}^d$ denote a random vector representing the quantity of interest at time $t$. Its (cumulative) distribution function is
\[
F_t(x) = \mathbb{P}(X_t \le x) = \int_{-\infty}^{x} p_t(u)\,\mathrm{d}u, \qquad x \in \mathbb{R}^d, \tag{1}
\]
where inequality and integral are taken component-wise when $d > 1$, and $p_t$ is the probability density function (when it exists) at time $t$. Our target object is $t \mapsto F_t$, or equivalently, the density curve $t \mapsto p_t$, from which $F_t$ can be recovered through (1).

In practice, however, the process is not observed continuously in time. Instead, data are available on a discrete time grid
\[
\tau = \{t_1 < t_2 < \cdots < t_m\} \subset [0, T]. \tag{2}
\]
At each $t_j \in \tau$, we observe a sample drawn from the distribution $\mu_{t_j}$,
\[
X_{t_j,1}, \dots, X_{t_j,n_j} \sim \mu_{t_j}, \qquad t_j \in \tau. \tag{3}
\]
Depending on the application, these observations may be treated either as approximately independent snapshots across time or as part of a longitudinal setting with temporal dependence. In both cases, the statistical problem is to recover a coherent continuous-time representation of the underlying distributional dynamics from these discrete observations.
We model each $p_t$ as a Gaussian mixture with $K$ components,
\[
p_t(x) = \sum_{k=1}^{K} \alpha_k(t)\, \mathcal{N}(x;\, m_k, \Sigma_k),
\]
where $m_k \in \mathbb{R}^d$ and $\Sigma_k \in \mathbb{R}^{d \times d}$ are the mean vector and covariance matrix of the $k$th Gaussian component, respectively, and the weight vector $\alpha(t) = (\alpha_1(t), \dots, \alpha_K(t))$ lies in the probability simplex
\[
\alpha(t) \in \Delta^{K-1} := \Big\{ a \in \mathbb{R}^K : a_k \ge 0,\ \sum_{k=1}^{K} a_k = 1 \Big\}. \tag{4}
\]
The component means and covariance matrices are shared over time, while the mixture weights vary continuously with $t$. This shared-dictionary representation is natural in our motivating application, where the Gaussian components may be viewed as latent glycemic regimes whose locations and scales remain relatively stable over the study period, while their relative prevalence changes over time. At the same time, this is a strong structural assumption: by allowing temporal variation only through the mixture weights, we trade some modeling flexibility for interpretability and a more parsimonious characterization of distributional change. We model the resulting weight dynamics through a neural ODE. As $K$ increases, Gaussian mixtures provide substantial approximation flexibility, whereas for moderate values of $K$ the resulting weight trajectories remain smooth, interpretable and statistically tractable.

Digital health motivation and distributional data analysis

Our motivation comes from the analysis of glucose distributions in longitudinal diabetes trials Battelino et al. (2023), where glucose is recorded continuously using continuous glucose monitoring (CGM). Under free-living conditions, individual glucose time series cannot be directly aligned, making the raw temporal stochastic process difficult to compare between participants Ghosal & Matabuena (2024), Matabuena et al. (2021), Matabuena, Ghosal, Aguilar, Keshet, Wagner, Fernández Merino, Sánchez Castro, Zipunnikov, Onnela & Gude (2025).
In this setting, the time-varying probability distribution provides a more natural biomarker to characterize the evolution of glucose metabolism Katta et al. (2024), Park et al. (2025), Matabuena et al. (2021). Compared to conventional CGM summary statistics, such as mean glucose or time-in-range metrics, this representation conveys richer information by capturing the full spectrum of low, moderate and high glucose values within a unified functional profile Matabuena et al. (2021), Katta et al. (2024).

More broadly, distributional data analysis Ghosal et al. (2023), Szabó et al. (2016) is an emerging area that treats probability distributions, or collections of them, as statistical objects for unsupervised and supervised learning, including the prediction of clinical outcomes Matabuena et al. (2021). Biomedical applications are among its most prominent use cases. In digital health, measurements collected by continuous glucose monitors, accelerometers, or imaging modalities such as functional magnetic resonance imaging (fMRI) are increasingly represented through empirical distributions that serve as latent descriptions of underlying physiological processes Ghosal et al. (2025), Ghosal & Matabuena (2024), Matabuena, Ghosal, Aguilar, Keshet, Wagner, Fernández Merino, Sánchez Castro, Zipunnikov, Onnela & Gude (2025), Matabuena et al. (2022). In recent years, several regression frameworks have been proposed in which predictors, responses, or both are represented as probability distribution functions Matabuena & Petersen (2023), Ghosal et al. (2025), Ghosal & Matabuena (2024), Ghosal et al. (2026), Matabuena, Ghosal, Meiring & Petersen (2025). A related line of work represents probability distributions as random objects in metric spaces and develops statistical procedures for that setting (see, e.g., Lugosi & Matabuena (2024)).
Despite this progress, there remains no general framework for modeling moderate- to high-dimensional distributions that simultaneously offers flexibility and interpretability. The methodology introduced here is intended to help bridge this gap.

Contributions

This paper develops a statistical framework to model the continuous-time evolution of probability distributions from longitudinal data and shows its practical value in a case study of digital health. Our main contributions are as follows:

1. We propose a general framework for modeling the dynamics of multivariate probability distributions in continuous time by combining Gaussian mixture representations with neural ODE smoothing, yielding an interpretable estimator.

2. We introduce an estimation procedure based on a Maximum Mean Discrepancy objective Gretton et al. (2012). This avoids the need to specify and optimize a full likelihood under temporal dependence, while producing a simple differentiable loss with closed-form expressions for Gaussian mixtures under Gaussian kernels. At each time point, the empirical loss function takes the form of a V-statistic Serfling (2009).

3. We demonstrate the practical utility of the proposed methods in a biomedical application by analyzing continuous glucose monitoring data from a longitudinal trial in young children with type 1 diabetes Wadwa et al. (2023). Our approach provides clinically meaningful insights on glucose dynamics and the benefits of the new closed-loop insulin system compared to standard therapy, including treatment-related changes that are less apparent from conventional analytical approaches.

4. We provide theoretical support for the first stage of the procedure, prior to temporal smoothing.
In particular, we establish an approximation result for the shared-dictionary representation and a finite-sample bound for the minimum-MMD estimator of the mixture weights at each observed time point. The mathematical results and the corresponding proofs are given in the Supplementary Material (Section A).

2 Case study: closed-loop insulin delivery in young children with type 1 diabetes

Treatment of type 1 diabetes in young children remains a particularly challenging task Schoelwer et al. (2024), Ware et al. (2024). In children younger than 6 years of age, insulin doses are small, while food intake, meal timing, and physical activity are often unpredictable, making dosing decisions especially difficult. Young children may also exhibit greater glycemic variability than older children and adults. As a result, treatment strategies and therapeutic goals are often harder to define in this population, and only a limited number of hybrid closed-loop systems Kitagawa et al. (2025), Hughes et al. (2023) have received formal approval from the U.S. Food and Drug Administration for children under 6 years of age.

This case study is motivated by a randomized clinical trial evaluating the t:slim X2 insulin pump with Control-IQ Technology (Tandem Diabetes Care) in young children with type 1 diabetes Wadwa et al. (2023). The t:slim X2 system is a hybrid closed-loop device that uses continuous glucose monitoring (CGM) measurements to guide automated insulin delivery through basal rate adjustments and correction boluses every five minutes. Although this technology has been extensively studied in older children, adolescents and adults Beck et al. (2023), Stahl-Pehe et al. (2025), the evidence in children under 6 years of age has remained relatively limited. The trial Wadwa et al.
(2023) enrolled 102 children aged 2–5 years and randomized them in a 2:1 ratio to closed-loop control ($n = 68$) or standard care ($n = 34$) for 26 weeks. Our goal in revisiting these data is not simply to reassess treatment efficacy using conventional endpoints, but to examine whether a distributional representation of CGM measurements can reveal treatment-related changes in glucose regulation that are less apparent from standard summary measures.

This question is motivated by the growing need for analytical tools that characterize glucose behavior beyond conventional CGM summaries, such as mean glucose, time in range, and related compositional metrics. Although these summaries are clinically useful, they do not fully capture the richness of CGM data, which contain information on multiple time scales, including short-term fluctuations and rate of change. These features may reflect clinically significant differences in glucose regulation and response to treatment.

Recent work has proposed glucodensity as a functional representation of a glucose time series through the marginal distribution of CGM measurements Matabuena et al. (2021). This framework can capture aspects of glucose behavior that are not fully summarized by standard scalar metrics. However, a univariate glucodensity does not directly capture temporal dynamics, such as whether glucose levels change rapidly or more gradually over time. To address this limitation, multivariate extensions incorporate dynamic features Matabuena, Ghosal, Aguilar, Keshet, Wagner, Fernández Merino, Sánchez Castro, Zipunnikov, Onnela & Gude (2025), including the rate of change and, potentially, acceleration, through joint density representations such as $(G, G')$ or $(G, G', G'')$. These multivariate representations provide a natural framework for studying how glucose levels and glucose dynamics evolve over time.
In this paper, we reanalyze the trial data Wadwa et al. (2023) using a multivariate glucodensity framework Matabuena, Ghosal, Aguilar, Keshet, Wagner, Fernández Merino, Sánchez Castro, Zipunnikov, Onnela & Gude (2025) designed to capture both the distribution of glucose values and key aspects of glucose dynamics. We summarize each participant's time-varying distribution using a Gaussian mixture model with $K$ components and a shared component dictionary over time. The weights of the participant-specific mixture then define longitudinal trajectories that characterize the evolution of individual glucose distributions and, in the multivariate setting, the corresponding glucose dynamics during follow-up.

Figure 1 illustrates the structure of the data and the proposed representation for three representative participants (two in the intervention group and one in the control group). For each participant, we show the raw CGM time series, the estimated trajectories of the mixture weights, and the fitted bivariate densities based on glucose and its rate of change at the beginning and end of the trial.

Our primary objective in this case study is to assess whether this distributional framework yields clinically interpretable comparisons between treatment groups and whether incorporating glucose dynamics into the glucodensity representation provides additional insight beyond conventional summary measures. In the original article, Wadwa et al. (2023) found that glucose levels were within the target range for a greater proportion of time under the closed-loop system than in standard care. However, the authors did not find clear differences in the time spent in other glycemic ranges such as hypoglycemia or, for some subgroups of individuals, in other common diabetes biomarkers such as glycated hemoglobin (HbA1c).
We analyze this dataset from a distributional perspective with the goal of enriching and extending the original analysis.

3 Methodology

We introduce a Gaussian mixture framework for learning time-varying distributions from longitudinal data observed on a discrete time grid. The method represents distributional dynamics in continuous time, yields interpretable subject-specific trajectories, and supports distributional comparisons across groups (intervention arms) and downstream statistical inference. Although motivated by digital health applications, the framework is broadly applicable to settings in which an underlying distribution evolves continuously over time.

[Figure 1 panels. Rows: Participant 20 (Treatment), Participant 58 (Treatment), Participant 82 (Control). Per row: glucose (mg/dL) against time slot (5-min intervals); mixture weights against normalized time; two contour plots of glucose rate of change (mg/dL/time) against glucose concentration (mg/dL).]

Figure 1: Individual participant analysis for the bivariate model (glucose and its first derivative) using $K = 5$ Gaussian components. Each row corresponds to one participant (IDs 20, 58, and 82). Left: raw CGM time series showing glucose concentration (mg/dL) over the observation period. Middle left: estimated weight trajectories $\alpha_1, \dots, \alpha_K$ learned by the neural ODE, representing the evolution of the mixture proportions over normalized time $t \in [0, 1]$. Middle right and right: contours of the fitted bivariate Gaussian mixture density at the initial and final times, summarizing the joint distribution of glucose level and its rate of change.

3.1 Background

Our proposed framework combines two mathematical ingredients: maximum mean discrepancy, which we use to fit the model at the observed time points, and neural ODEs, which model the continuous-time evolution of the underlying distribution.

Maximum Mean Discrepancy. To compare probability distributions $\mu$ and $\nu$ on a common measurable space $\mathcal{X}$, we use the maximum mean discrepancy (MMD) Gretton et al. (2012), Muandet et al. (2017). To do so, MMD represents each distribution as an element in a Hilbert space. Let $\mathcal{H}$ be the reproducing kernel Hilbert space (RKHS) induced by a positive definite kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. If $\mu$ is a probability distribution such that $\int_{\mathcal{X}} \sqrt{k(x, x)}\,\mathrm{d}\mu(x) < \infty$ (e.g., if $k$ is bounded), then the kernel mean embedding of $\mu$, denoted by $\Phi_\mu \in \mathcal{H}$, is well defined:
\[
\Phi_\mu := \int_{\mathcal{X}} k(x, \cdot)\,\mathrm{d}\mu(x).
\]
Formally, MMD evaluates the distance between two distributions as the $\mathcal{H}$-norm of the difference between their embeddings:
\[
\mathrm{MMD}^2(\mu, \nu) = \|\Phi_\mu - \Phi_\nu\|_{\mathcal{H}}^2 = \mathbb{E}[k(X, X')] - 2\,\mathbb{E}[k(X, Y)] + \mathbb{E}[k(Y, Y')], \tag{5}
\]
where $X, X' \overset{\text{i.i.d.}}{\sim} \mu$ and $Y, Y' \overset{\text{i.i.d.}}{\sim} \nu$.
If the mapping $\mu \mapsto \Phi_\mu$ is injective, equivalently if the kernel $k$ is characteristic, then MMD defines a valid metric, satisfying $\mathrm{MMD}(\mu, \nu) = 0$ if and only if $\mu = \nu$; see Sriperumbudur et al. (2011), Sejdinovic et al. (2013). Throughout the paper, we fix $\mathcal{X} = \mathbb{R}^d$ and use the Gaussian kernel:
\[
k(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right), \tag{6}
\]
where $\sigma > 0$ is a bandwidth parameter. Since the Gaussian kernel is bounded and characteristic, the kernel mean embedding is well-defined, and the associated MMD defines a metric on probability distributions. In practice, the choice of $\sigma$ strongly impacts the sensitivity of the metric to different scales. We set it using the median heuristic Garreau et al. (2017): $\sigma$ is chosen as the median of the pairwise Euclidean distances between sample points.

Intuition. MMD measures how far apart two distributions are in the feature space induced by the kernel $k$. In our setting, at each observed time point $t_j$ in the grid (2), we use MMD to compare the empirical distribution associated with the observations in (3) to a fitted Gaussian mixture distribution.

Why MMD? (i) With a Gaussian kernel, the discrepancy between an empirical distribution and a Gaussian mixture admits closed-form terms, which leads to stable and computationally efficient updates. (ii) Characteristic kernels yield well-posed fitting objectives. (iii) Empirical evidence suggests that MMD-based procedures are more robust than likelihood-based methods under temporal dependence and model misspecification Chérief-Abdellatif & Alquier (2022), Alquier & Gerber (2024), Gao et al. (2021), Alquier et al. (2023).

Neural ODEs. We replace discrete layers by the continuous evolution of a hidden state $z(t) \in \mathbb{R}^p$,
\[
\dot{z}(t) = f_\varphi(z(t), t), \qquad t \in [0, T],
\]
where $f_\varphi : \mathbb{R}^p \times [0, T] \to \mathbb{R}^p$ is a learnable vector field, typically parameterized by a multilayer perceptron.
The trajectory $z$ is computed numerically, and gradients with respect to $\varphi$ can be obtained using adjoint methods Massaroli et al. (2020). Neural ODEs have been used to model latent trajectories in continuous time Kidger et al. (2020), Rubanova et al. (2019), Jia & Benson (2019), including in biomedical applications Qian et al. (2021). In our framework, the latent trajectory is used to parameterize the mixture weights over time. This provides a smooth continuous-time interpolation of the weights estimated on the discrete grid (2), without imposing a rigid parametric form on their evolution.

Why a neural ODE? Our object of interest is the continuous-time distribution of CGM in free-living environments, where measurements are irregular and not directly aligned across participants. Modeling the weight trajectories $\alpha$ through a neural ODE is advantageous for several reasons. (i) It naturally accommodates irregularly sampled data without requiring ad hoc grid alignment, while its continuous nature mitigates sensor noise. (ii) It provides a parameter-efficient framework (a single vector field encodes arbitrary depth) to generate smooth, continuous latent trajectories. (iii) By contrast, discrete sequence models Wang (2024), Wu et al. (2023) are effective in forecasting raw traces but do not directly target the continuous-time distributional dynamics central to our aims.

3.2 Our model

We model the continuous-time density $p_t$ as a Gaussian mixture:
\[
p_{\theta(t)}(x) = \sum_{k=1}^{K} \alpha_k(t)\, \mathcal{N}(x;\, m_k, \Sigma_k), \tag{7}
\]
where the component means $m_k \in \mathbb{R}^d$ and covariance matrices $\Sigma_k \succ 0$ are shared across time, while the mixing weights $\alpha(t)$ vary continuously with $t$ (within the simplex). This shared-dictionary structure makes the representation comparable across both time points and individuals.
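The two ingredients described so far, a latent state that follows an ODE and a shared-dictionary mixture whose weights are read off that state, can be sketched together in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the vector field is a tiny MLP with random, untrained parameters; the ODE is integrated with a fixed-step Runge-Kutta scheme rather than an adaptive solver with adjoint gradients; the latent state is mapped to the simplex with a softmax; and the dictionary is univariate with made-up means and standard deviations. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, HIDDEN = 3, 8                                  # components, hidden width (assumed)
W1, b1 = 0.5 * rng.standard_normal((HIDDEN, K + 1)), np.zeros(HIDDEN)
W2, b2 = 0.5 * rng.standard_normal((K, HIDDEN)), np.zeros(K)

def vector_field(z, t):
    """Small MLP f(z, t); parameters are random here, not trained."""
    hidden = np.tanh(W1 @ np.append(z, t) + b1)
    return W2 @ hidden + b2

def integrate_rk4(z0, t_grid):
    """Classical fourth-order Runge-Kutta solution of dz/dt = f(z, t)."""
    zs = [np.asarray(z0, dtype=float)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt, z = t1 - t0, zs[-1]
        k1 = vector_field(z, t0)
        k2 = vector_field(z + 0.5 * dt * k1, t0 + 0.5 * dt)
        k3 = vector_field(z + 0.5 * dt * k2, t0 + 0.5 * dt)
        k4 = vector_field(z + dt * k3, t1)
        zs.append(z + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0)
    return np.stack(zs)

def softmax(z):
    """Row-wise softmax; maps each latent state to a point of the simplex."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mixture_density(x, alpha, means, sds):
    """Univariate shared-dictionary mixture: sum_k alpha_k N(x; m_k, s_k^2)."""
    x = np.asarray(x, dtype=float)[:, None]
    comps = np.exp(-0.5 * ((x - means) / sds) ** 2) / (sds * np.sqrt(2.0 * np.pi))
    return comps @ alpha

t_grid = np.linspace(0.0, 1.0, 51)
alphas = softmax(integrate_rk4(np.zeros(K), t_grid))   # one weight vector per time
means = np.array([80.0, 140.0, 220.0])                 # illustrative dictionary only
sds = np.array([15.0, 25.0, 40.0])
x = np.linspace(-200.0, 600.0, 4001)
p_start = mixture_density(x, alphas[0], means, sds)
p_end = mixture_density(x, alphas[-1], means, sds)
```

Because only `alphas` changes with time while `means` and `sds` stay fixed, `p_start` and `p_end` are directly comparable component by component, which is the point of the shared dictionary.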
This modeling choice is theoretically grounded in Wiener–Tauberian approximation arguments Wiener (1932). As detailed in Theorem A.1, under mild regularity conditions, these shared-dictionary mixtures can uniformly approximate any continuous curve of densities in $L^1(\mathbb{R}^d)$. Specifically, for any $\varepsilon > 0$, choosing a sufficiently large $K$ ensures:
\[
\sup_{t \in [0, T]} \big\| p_{\theta(t)} - p_t \big\|_{L^1(\mathbb{R}^d)} < \varepsilon.
\]
In practice, we fix a moderate $K$ to preserve interpretability, selecting its value based on the specific application (cf. Section 5).

Formally, our model is defined by the time-dependent parameter vector:
\[
\theta(t) = \big(\alpha(t),\, m_1, \dots, m_K,\, \Sigma_1, \dots, \Sigma_K\big),
\]
whose effective dimension at any fixed $t$ is $(K - 1) + Kd + K\,d(d+1)/2$.

3.2.1 Discrete-time MMD fitting

Given the observation grid $\tau = \{t_1 < \cdots < t_m\}$ and the sample $X_{t_j,1}, \dots, X_{t_j,n_j} \sim \mu_{t_j}$ (recall (2) and (3)), we define the empirical distribution by
\[
\hat{\mu}_{t_j} = \frac{1}{n_j} \sum_{i=1}^{n_j} \delta_{X_{t_j,i}}.
\]
At each discrete time step $t_j$, we fit a static Gaussian mixture
\[
p_{\theta_{t_j}}(x) = \sum_{k=1}^{K} \alpha_{t_j,k}\, \mathcal{N}(x;\, m_k, \Sigma_k), \qquad \alpha_{t_j} = (\alpha_{t_j,1}, \dots, \alpha_{t_j,K}) \in \Delta^{K-1},
\]
by minimizing $\mathrm{MMD}^2(\hat{\mu}_{t_j}, \mu_{\theta_{t_j}})$, where $\mu_{\theta_{t_j}}$ is the distribution with density $p_{\theta_{t_j}}$. We use a Gaussian kernel $k_{\sigma_{t_j}}$ of the form (6), with bandwidth $\sigma_{t_j}$ selected by the median heuristic from the sample $\{X_{t_j,i}\}_{i=1}^{n_j}$.
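As a concrete illustration of the two tools just invoked, the sketch below computes a median-heuristic bandwidth and a V-statistic estimate of the squared MMD under the Gaussian kernel. It is a simplified stand-in rather than the fitting procedure itself: it compares two finite samples, whereas the method compares a sample with a Gaussian mixture, and the function names and simulated data are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X, Y):
    """Matrix of squared Euclidean distances between rows of X and rows of Y."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)

def median_heuristic(X):
    """Median of the pairwise Euclidean distances between distinct sample points."""
    dists = np.sqrt(pairwise_sq_dists(X, X))
    upper = np.triu_indices(len(X), k=1)       # each unordered pair once
    return float(np.median(dists[upper]))

def gaussian_kernel(X, Y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows."""
    return np.exp(-pairwise_sq_dists(X, Y) / (2.0 * sigma ** 2))

def mmd2_vstat(X, Y, sigma):
    """V-statistic estimate of MMD^2 between the empirical laws of X and Y."""
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))        # simulated data, for illustration
Y = rng.normal(0.5, 1.0, size=(200, 2))        # same law shifted by 0.5 per coordinate
sigma = median_heuristic(X)
same = mmd2_vstat(X, X, sigma)                 # identical samples
shifted = mmd2_vstat(X, Y, sigma)              # shifted sample
```

The V-statistic keeps the diagonal kernel terms, so it is the exact squared RKHS distance between the two empirical embeddings and is never negative.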
By expanding the squared MMD, the objective function reduces to a convenient quadratic form in the mixing weights $\alpha_{t_j}$:
\[
\mathrm{MMD}^2(\hat{\mu}_{t_j}, \mu_{\theta_{t_j}}) = \alpha_{t_j}^\top A_{t_j}\, \alpha_{t_j} - 2\, \alpha_{t_j}^\top b_{t_j} + c_{t_j}, \tag{8}
\]
where $c_{t_j} = \frac{1}{n_j^2} \sum_{i=1}^{n_j} \sum_{i'=1}^{n_j} k_{\sigma_{t_j}}(X_{t_j,i}, X_{t_j,i'})$ depends solely on the empirical data and can therefore be omitted from the minimization, while the matrix $A_{t_j} \in \mathbb{R}^{K \times K}$ and the vector $b_{t_j} \in \mathbb{R}^{K}$ admit closed-form expressions for their entries:
\[
(A_{t_j})_{k\ell} = \frac{\sigma_{t_j}^{d}}{\sqrt{\det(\Sigma_k + \Sigma_\ell + \sigma_{t_j}^2\, \mathrm{Id})}} \exp\!\Big(-\tfrac{1}{2} (m_k - m_\ell)^\top (\Sigma_k + \Sigma_\ell + \sigma_{t_j}^2\, \mathrm{Id})^{-1} (m_k - m_\ell)\Big),
\]
\[
(b_{t_j})_{k} = \frac{1}{n_j} \sum_{i=1}^{n_j} \frac{\sigma_{t_j}^{d}}{\sqrt{\det(\Sigma_k + \sigma_{t_j}^2\, \mathrm{Id})}} \exp\!\Big(-\tfrac{1}{2} (X_{t_j,i} - m_k)^\top (\Sigma_k + \sigma_{t_j}^2\, \mathrm{Id})^{-1} (X_{t_j,i} - m_k)\Big).
\]

Optimization. We minimize (8) using the following alternating scheme:

1. Initialization. We run $k$-means clustering Jain (2010) on the pooled sample $\{X_{t_j,i}\}_{j,i}$ to initialize the parameters: the means $m_k$ are the centroids of the identified clusters; $\Sigma_k$ are the empirical covariance matrices of the points within each cluster; and the initial weights $\alpha_{t_j,k}$ correspond to the proportion of data points at time $t_j$ assigned to cluster $k$.

2. Local update. For each $t_j$, keeping means and covariances fixed, we update the weights $\alpha_{t_j} \in \Delta^{K-1}$ via the quadratic program:
\[
\hat{\alpha}_{t_j} \in \operatorname*{argmin}_{\alpha \in \Delta^{K-1}} \Big\{ \alpha^\top A_{t_j}\, \alpha - 2\, \alpha^\top b_{t_j} + \sum_{k=1}^{K} \lambda_k \alpha_k^2 \Big\}, \tag{9}
\]
where $\lambda = (\lambda_1, \dots, \lambda_K)$ is a vector of ridge hyperparameters that improves numerical conditioning and stabilizes the solution when Gaussian components become nearly collinear in the RKHS feature space.

3. Global update. Update $m_k$ and $\Sigma_k$ iteratively via (Adam) gradient descent on the MMD objective, keeping the current weights $\hat{\alpha}_{t_j}$ fixed.

3.2.2 Continuous-time weight evolution

Once the discrete-time weights $\hat{\alpha}_{t_j}$ are fitted, we use a continuous-time model for their evolution.
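Before moving to the continuous-time stage, the local weight update of Section 3.2.1 can be illustrated numerically. The sketch below builds the closed-form quantities from a sample and a fixed dictionary, assuming the Gaussian kernel with exponent $-\|x-y\|^2/(2\sigma^2)$, and then minimizes the ridge-penalized quadratic over the simplex. Several choices here are assumptions made for illustration and are not taken from the paper: projected gradient descent is swapped in for a quadratic-programming solver, a single scalar `lam` replaces the vector of ridge hyperparameters, the bandwidth is fixed at 1 instead of using the median heuristic, and the data are simulated. All names are hypothetical.

```python
import numpy as np

def gauss_overlap(diff, cov, sigma):
    """sigma^d / sqrt(det(cov + sigma^2 I)) * exp(-0.5 diff^T (cov + sigma^2 I)^{-1} diff)."""
    d = cov.shape[0]
    S = cov + sigma ** 2 * np.eye(d)
    diff = np.atleast_2d(diff)
    quad = np.einsum("ij,ij->i", diff, np.linalg.solve(S, diff.T).T)
    return sigma ** d / np.sqrt(np.linalg.det(S)) * np.exp(-0.5 * quad)

def build_A_b(X, means, covs, sigma):
    """Closed-form kernel expectations: component-component (A), data-component (b)."""
    K = len(means)
    A, b = np.empty((K, K)), np.empty(K)
    for k in range(K):
        b[k] = gauss_overlap(X - means[k], covs[k], sigma).mean()
        for l in range(K):
            A[k, l] = gauss_overlap(means[k] - means[l], covs[k] + covs[l], sigma)[0]
    return A, b

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def fit_weights(A, b, lam=1e-6, n_iter=2000):
    """Projected gradient descent on a^T A a - 2 a^T b + lam ||a||^2 over the simplex."""
    K = len(b)
    H = A + lam * np.eye(K)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H).max())   # inverse Lipschitz constant
    a = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        a = project_simplex(a - step * (2.0 * H @ a - 2.0 * b))
    return a

# Simulated univariate sample from a two-component mixture with weights (0.3, 0.7).
rng = np.random.default_rng(0)
n = 2000
from_second = rng.random(n) < 0.7
X = np.where(from_second, 5.0, 0.0)[:, None] + rng.standard_normal((n, 1))
means = [np.array([0.0]), np.array([5.0])]
covs = [np.eye(1), np.eye(1)]
A, b = build_A_b(X, means, covs, sigma=1.0)
alpha_hat = fit_weights(A, b)
```

With well-separated components the matrix `A` is close to diagonal, so the problem is well conditioned and the ridge term matters little; it becomes relevant when two dictionary components nearly coincide.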
On $[0, T]$, we solve:
\[
\dot{z}(t) = f_\varphi(z(t), t), \qquad z(0) = \alpha_0, \tag{10}
\]
where $\alpha_0$ denotes the initial weight vector and $f_\varphi : \mathbb{R}^K \times [0, T] \to \mathbb{R}^K$ is a multilayer perceptron (architecture and solver hyperparameters are reported in Table 2), and then map $z(t)$ to valid mixture weights by a simplex normalization operator. Define
\[
\alpha(t) = \frac{z(t)_+}{\mathbf{1}^\top z(t)_+} \tag{11}
\]
(with the convention that if $\mathbf{1}^\top z(t)_+ = 0$, we replace $z(t)_+$ by $z(t)_+ + \varepsilon \mathbf{1}$ for a small $\varepsilon > 0$). The parameters $\varphi$ are optimized by matching the ODE predictions to the fitted weights $\hat{\alpha}_{t_j}$:
\[
\hat{\varphi} \in \operatorname*{argmin}_{\varphi} \sum_{j} \big\| \alpha(t_j; \varphi) - \hat{\alpha}_{t_j} \big\|^2 + \nu \|\varphi\|^2, \tag{12}
\]
where $\alpha(t_j; \varphi)$ is obtained by integrating (10) and applying (11) at time $t_j$, and $\nu > 0$ is a ridge hyperparameter.

Permutation symmetry. Because $(m_k, \Sigma_k)_{k=1}^{K}$ are shared over time and kept fixed after the global fit, component labels are anchored. The neural ODE stage only evolves $\alpha(t)$, so trajectories cannot exchange labels, removing permutation ambiguity.

Remark 3.1. As an alternative to the simplex normalization based on the positive part (see (11)), one may evolve logits $u(t) \in \mathbb{R}^K$ and set
\[
\dot{u}(t) = f_\varphi(u(t), t), \qquad \alpha(t) = \operatorname{softmax}(u(t)).
\]
If $\alpha_0 \in \Delta^{K-1}$ has strictly positive components, then choosing $u(0) = \log \alpha_0 + c\,\mathbf{1}$ ($c \in \mathbb{R}$ arbitrary) ensures $\alpha(0) = \alpha_0$, since softmax is invariant under shifts by $c\,\mathbf{1}$. If some components of $\alpha_0$ are zero, one may initialize with $\alpha_0^{\varepsilon} \propto \alpha_0 + \varepsilon \mathbf{1}$ for a small $\varepsilon > 0$ and set $u(0) = \log \alpha_0^{\varepsilon}$.

4 Simulation study

We benchmark finite-sample performance against representative baselines. Unlike competing approaches, our method prioritizes interpretability through the time-varying weights $\alpha(t)$ for $t \in [0, T]$.
The results indicate that this emphasis on interpretability does not come at the expense of accuracy: the proposed method remains competitive in statistical error and, in several multivariate settings, outperforms the alternatives. Additional details are provided in the Supplementary Material (Section C).

5 Case Study: CGM Trial

From a distributional data analysis perspective, the objective of this case study is to show that the proposed methodology can leverage the thousands of glucose measurements recorded by continuous glucose monitoring (CGM) more effectively than conventional scalar summaries. Standard CGM metrics are naturally embedded in the glucodensity framework Matabuena et al. (2021); however, our objective here is to show that multivariate functional representations can also reveal clinically significant aspects of glucose regulation that are not fully captured by standard summaries alone.

From a modeling perspective, our goal is to illustrate the interpretability of the proposed framework to characterize longitudinal differences in glycemic profiles between the two study arms. In particular, we focus on identifying distributional differences between treatment and control over time and on assessing whether incorporating glucose dynamics through rates of change improves the characterization of these differences between groups over the course of follow-up.

Preliminaries and scientific questions

As described in Section 2, our analysis is motivated by data from the randomized clinical trial published in the New England Journal of Medicine entitled “Trial of Hybrid Closed-Loop Control in Young Children with Type 1 Diabetes” Wadwa et al. (2023).¹ This study evaluated hybrid closed-loop control in children under 6 years of age and represents an important clinical setting in which to assess longitudinal changes in glucose regulation. A total of 102 participants Wadwa et al.
(2023) with type 1 diabetes mellitus were randomized in a 2:1 ratio to a closed-loop treatment arm or to a control arm receiving standard diabetes care. The clinical background and additional details of the study were presented in Section 2.

To illustrate the continuous-time model introduced in Section 3, we consider the case $d = 2$, in which glucose is treated as the first coordinate and the rate of glucose change as the second coordinate. Our analysis addresses the following questions:

1. Are there statistically significant differences between the treatment and control groups in their glucodensity representations from baseline to the end of follow-up?
2. How do these differences evolve over time, including at intermediate time points, and do they reveal temporal response patterns that are not captured by endpoint summaries alone?
3. Do the two groups differ not only in the distribution of glucose values, but also in glucose dynamics, as reflected by the rate of glucose change? More broadly, does incorporating rate-of-change information improve the detection or characterization of differences between groups?

¹ Data are publicly available at https://public.jaeb.org/datasets/diabetes.

Modeling the bivariate distribution of glucose trajectories

Let $G_i(t)$ denote the CGM measurement of participant $i$ at time $t \in \tau$, and let $\dot G_i(t)$ denote its rate of change. In practice, CGM is observed on a discrete time grid, and our analysis is carried out over longitudinal windows indexed by $j$ (for example, weekly or 10-day intervals). For participant $i$, let $W_{ij}$ denote a generic analysis window and let $n_{ij}$ be the number of CGM measurements recorded in that window.
For each participant $i$ and window $W_{ij}$, we consider the bivariate sample

$$\{(G_{ij\ell}, R_{ij\ell})\}_{\ell=1}^{n_{ij}},$$

where $G_{ij\ell}$ is the glucose measurement at the $\ell$th observation in the window and $R_{ij\ell}$ is the corresponding finite-difference rate of change (the difference between consecutive glucose readings divided by $\Delta$), with $\Delta$ denoting the CGM sampling interval. Our target is the joint distribution

$$F_{ij}(g, r) = \mathbb{P}\big(G_{ij} \le g,\; R_{ij} \le r\big),$$

together with its associated density $f_{ij}$. To obtain a representation that is both computationally tractable and directly comparable across participants, we approximate $f_{ij}$ using a dynamic Gaussian mixture model,

$$f_{ij}(x) \approx \sum_{k=1}^{K} \alpha_{ijk}\, \mathcal{N}(x; m_k, \Sigma_k),$$

where $\mathcal{N}(\cdot\,; m_k, \Sigma_k)$ denotes the bivariate Gaussian density with mean $m_k \in \mathbb{R}^2$ and positive-definite covariance matrix $\Sigma_k \in \mathbb{R}^{2\times 2}$. The component-specific parameters $(m_k, \Sigma_k)_{k=1}^K$ are shared across participants and time windows, whereas the mixing weights $\alpha_{ij}$ are allowed to vary across participants and over time, subject to

$$\alpha_{ijk} \ge 0, \qquad \sum_{k=1}^{K} \alpha_{ijk} = 1.$$

For the bivariate analysis, we fix $K = 5$ to balance flexibility and interpretability. This choice is rich enough to capture heterogeneous joint patterns in glucose level and short-term glucose dynamics while preserving a common reference structure across participants. Consequently, the global component parameters $(m_k, \Sigma_k)_{k=1}^K$ are estimated once from a common reference sample, whereas only the weights $\alpha_{ij}$ vary across participants and over time.

From a clinical point of view, the five components may be interpreted as a dictionary of glycemic regimes, defined by their estimated locations and covariance structure in the (glucose, rate-of-change) space.
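The construction described above (a bivariate sample of glucose level and finite-difference rate of change, evaluated under a mixture with shared Gaussian components) can be sketched in a few lines of NumPy. This is only an illustration: the function names are hypothetical, a forward difference is assumed for the rate of change, and the 5-minute default for the sampling interval follows the CGM sampling frequency stated in the abstract.

```python
import numpy as np

def bivariate_sample(glucose, delta_min=5.0):
    """Stack glucose readings with their finite-difference rate of change.

    A forward difference is assumed here; the last reading is dropped
    because it has no successor in the window.
    """
    g = np.asarray(glucose, dtype=float)
    rate = np.diff(g) / delta_min            # mg/dL per minute
    return np.column_stack([g[:-1], rate])   # shape (n - 1, 2)

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density evaluated at the rows of x."""
    d = mean.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T
    quad = np.sum(diff * solved, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def mixture_density(x, weights, means, covs):
    """Density of a mixture with shared components and given weights."""
    return sum(w * gaussian_pdf(x, m, S) for w, m, S in zip(weights, means, covs))

sample = bivariate_sample([100.0, 110.0, 105.0, 105.0])
```

In this sketch only `weights` would change across participants and windows, while `means` and `covs` play the role of the fixed dictionary.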
These regimes can be grouped into broader profiles ranging from more favorable to less favorable glucose control, thereby providing a parsimonious clinical summary while preserving the finer resolution of the $K = 5$ representation. Table 1 reports the global means and covariance matrices of the five Gaussian components.

Overview of findings

Throughout follow-up, the treatment arm exhibits a gradual redistribution of mixture weight toward more favorable glycemic regimes, whereas the control arm remains comparatively stable or shifts toward less favorable profiles. These differences between groups are modest at early follow-up, but become more pronounced toward the end of the intervention. The bivariate representation further shows that the effect of treatment is not limited to glucose levels alone, but also involves changes along the rate-of-change dimension, suggesting reduced short-term glucose fluctuations. Finally, quantile-based summaries indicate that these changes are heterogeneous between participants and are primarily driven by a subset of individuals rather than by a uniform shift throughout the cohort.

Temporal evolution of mixture weights between groups

To assess whether differences between arms emerge gradually during follow-up, Figure 2 displays the estimated trajectories of the component weights in the treatment and control arms for the five components of the bivariate model. In general, the trajectories are smooth and fairly stable, indicating that the underlying glucose distribution evolves gradually rather than abruptly. This temporal regularity suggests that treatment-related differences are unlikely to be fully captured by baseline-versus-endpoint comparisons alone and motivates the use of a continuous longitudinal representation.

Although the group-average trajectories remain relatively close for some components, several systematic differences emerge.
In particular, component 1 tends to carry a larger average weight in the treatment arm, whereas components 2 and 5 tend to be somewhat more prominent in the control arm. By contrast, components 3 and 4 show weaker separation at the level of group means. Since higher weights indicate that a participant spends more time in the glycemic regime represented by the corresponding component, these patterns suggest a gradual redistribution of time spent in clinically distinct glucose profiles over the course of follow-up.

From a clinical perspective, the most relevant descriptive pattern is the increasing weight of the component associated with more favorable glucose control in the treatment arm, together with the relative persistence of less favorable components in the control arm. Thus, even at the descriptive level, the dynamic mixture representation suggests that the intervention is associated with a progressive shift toward improved glycemic regulation.

Redistribution in the bivariate glucose space

To examine how the joint glucose distribution changes over time, Figure 3 summarizes the evolution of the bivariate densities in the (glucose, rate-of-change) space over six-week intervals, for example, between weeks 20 and 26 at the group level. In the control arm, the initial and final densities remain relatively similar, and the corresponding difference surface is spatially heterogeneous, with alternating positive and negative regions and no clear dominant direction of change. In addition, the baseline density in the control arm appears to assign relatively more mass to hypoglycemic regions and to higher glucose concentrations than in the treatment arm. By contrast, the treatment arm displays a more structured mass redistribution in the bivariate space, characterized by a marked positive band in the mid-to-high glucose range and compensatory negative regions elsewhere.
Taken together, these patterns suggest that the intervention is associated with a more systematic modification of the joint glucose distribution than is observed in the control arm. Importantly, the observed redistribution is not confined to the glucose axis alone: changes along the rate-of-change axis indicate that the intervention also affects short-term glucose dynamics, with a pattern consistent with reduced glucose fluctuations over time.

This figure directly addresses the third scientific question. Compared with a univariate glucodensity analysis, the bivariate representation reveals how treatment-related changes are jointly organized in glucose level and glucose dynamics, providing a richer characterization of the evolving metabolic profile.

Temporal inference for between-group differences

To formally assess whether mixture-weight trajectories differ between groups, we apply the exploratory wild bootstrap MMD procedure described in the Supplementary Material (Section B). For each component-specific trajectory $t \mapsto \alpha_k(t)$ (with $k = 1, \dots, 5$ and $d = 2$), the test evaluates whether the treatment and control groups differ at the distributional level over time, while adjusting for the sequential structure and serial dependence of the longitudinal CGM data.

The resulting time-varying $p$-values are reported in Figure 4. The strongest evidence of between-group differences is observed for components 3 and 4, whose $p$-values remain below the 0.05 threshold for most of the follow-up period. Component 1 approaches the threshold during the middle of follow-up and becomes clearly significant near the end, whereas components 2 and 5 remain non-significant throughout.

These findings show that the treatment effect is not uniformly distributed across the component representation but is instead concentrated in a subset of glycemic regimes.
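As a rough illustration of how a kernel two-sample comparison between arms operates, the sketch below computes a biased Gaussian-kernel MMD² statistic and calibrates it by permutation. This is a deliberately simpler stand-in, not the wild bootstrap procedure of Supplementary Section B: it assumes exchangeable observations and therefore ignores the serial dependence that the wild bootstrap is designed to handle. The function names, bandwidth, and number of permutations are illustrative assumptions.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD."""
    return (gaussian_gram(x, x, bandwidth).mean()
            + gaussian_gram(y, y, bandwidth).mean()
            - 2.0 * gaussian_gram(x, y, bandwidth).mean())

def mmd_permutation_pvalue(x, y, bandwidth=1.0, n_perm=200, seed=0):
    """Permutation p-value for H0: both samples share one distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
        exceed += stat >= observed
    return (exceed + 1) / (n_perm + 1)

# Synthetic weights for two arms with a 2:1 allocation (illustrative only).
rng = np.random.default_rng(1)
treated = rng.normal(0.0, 1.0, size=(40, 1))
control = rng.normal(3.0, 1.0, size=(20, 1))
p_value = mmd_permutation_pvalue(treated, control)
```

Applying such a test separately at each time point and for each component would yield curves analogous in spirit to those in Figure 4, although the calibration used in the paper differs.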
More generally, the inferential evidence strengthens toward the end of the intervention, consistent with the gradual separation seen in the estimated weight trajectories. This temporal pattern suggests that the effect of the closed-loop intervention accumulates over time rather than appearing immediately after the initiation of treatment.

Heterogeneity of response

To characterize the heterogeneity of the response to treatment, Figure 5 reports, for each component $k = 1, \dots, 5$, the empirical quantile curves of the centered weight trajectories

$$D_{ik}(t) = \alpha_{ik}(t) - \alpha_{ik}(t_0)$$

in the two arms. For each fixed time $t$, these curves describe the cross-sectional distribution of deviations from baseline within each treatment arm.

In most components, the pointwise median trajectory remains close to zero during follow-up, indicating that the typical participant experiences only a modest change relative to baseline. However, the interquartile envelopes and the outer quantile bands widen with time, showing that the dispersion of $D_{ik}(t)$ is substantial and that the temporal redistribution of the mixture weights is driven primarily by a subset of individuals.

The clearest separation between groups is observed for components 1 and 2. In component 1, which is associated with a more favorable glucose-control region, the treatment arm remains more tightly concentrated around zero, while its upper quantiles become positive at later follow-up times. This indicates that a subgroup of treated participants gradually shifts weight toward this favorable component. By contrast, the control arm exhibits a more pronounced negative displacement of its central quantiles, indicating a reduction relative to baseline.
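For reference, the centered trajectories and pointwise quantile bands discussed here can be computed along the following lines. The sketch assumes that one arm's weights for a single component are stored as an array with one row per participant and one column per time point, with the first column serving as baseline; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def centered_quantile_curves(weights, probs=(0.25, 0.5, 0.75)):
    """Pointwise quantiles of weight changes relative to baseline.

    weights: array of shape (participants, time points); column 0 is
    treated as the baseline value for each participant.
    Returns an array of shape (len(probs), time points).
    """
    w = np.asarray(weights, dtype=float)
    centered = w - w[:, [0]]                  # subtract each participant's baseline
    return np.quantile(centered, probs, axis=0)

# Three hypothetical participants observed at three time points.
toy_weights = [[0.2, 0.3, 0.4],
               [0.5, 0.5, 0.3],
               [0.1, 0.2, 0.1]]
curves = centered_quantile_curves(toy_weights)
```

The middle row of `curves` is the pointwise median and the outer rows form the 25%–75% band; a median near zero together with a widening band is the pattern interpreted in the text as subject-specific heterogeneity.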
For component 2, the control arm shows a stronger positive shift in both the central and upper quantiles, suggesting increasing weight in a less favorable glycemic profile over time. For the remaining components, the pointwise quantile curves are more similar across groups, although their spread still indicates appreciable subject-specific variability.

This heterogeneity analysis complements the MMD results. Although the MMD test detects global distributional differences over time and is most sensitive for components 3 and 4, the quantile summaries reveal that the most clinically interpretable subgroup-level separation occurs in components 1 and 2. The two analyses therefore highlight different aspects of the treatment effect: one at the level of global distributional inference and the other at the level of subject-specific response heterogeneity.

In general, these quantile summaries indicate that the treatment effect is not well described by a homogeneous location shift that acts uniformly across participants. Rather, the effect is heterogeneous, with the main temporal redistribution concentrated in specific subpopulations and in a limited subset of mixture components. In particular, the treatment arm shows evidence of an increase in weight in the component associated with the normoglycemic range, whereas the control arm tends to shift toward a less favorable profile. This supports the view that responses to the intervention are individualized, while still revealing an overall trend toward improved glucose regulation under treatment.

Summary of the results

Together, the bivariate glucodensity analysis yields four main conclusions. First, the treatment and control arms differ not only in their endpoint distributions but also in the way their glycemic profiles evolve over time.
Second, these differences become more apparent toward the end of follow-up, indicating a progressive treatment effect rather than an immediate separation after the initiation of treatment. Third, incorporating rate-of-change information reveals treatment-related changes in glucose dynamics that would not be visible from a purely marginal analysis of glucose values alone. Fourth, the effect of treatment is heterogeneous between participants, with the clearest improvements concentrated in a subset of children who move toward mixture components associated with more favorable glycemic regulation.

Our results complement and extend those of Wadwa et al. (2023) by highlighting the importance of glucose rate-of-change information to detect differences between treatment arms in children younger than six years of age. The role of the glucose rate-of-change Richardson (2025) remains relatively underexplored in clinical trials, particularly in the pediatric population considered here. More broadly, these findings provide a more holistic distributional view of the data and extend the conclusions that can be drawn from conventional automated summary measures based on glucodensity representations.

6 Final remarks

We developed an interpretable statistical framework to model the dynamics of time-indexed probability distributions in longitudinal digital health studies. The proposed methodology combines a shared mixture representation with continuous-time evolution, allowing complex distributional changes to be tracked over follow-up while preserving a clinically interpretable low-dimensional structure. The simulation results reported in the Supplementary Material (Section C) further indicate that this approach achieves an estimation precision that is
competitive with existing alternatives.

Figure 2: Comparison of weight trajectory dynamics between Treatment (green) and Control (red) groups for the $d = 2$ bivariate model with $K = 5$ mixture components. Each panel shows the evolution of component weights $\alpha_k(t)$ between weeks 20–26 over normalized time $t \in [0, 1]$. Group means are shown as thick dashed lines. The shaded bands represent a statistical envelope around the mean (e.g., between the 5th and 95th percentiles). Panels (a)–(e) correspond to components 1–5; the horizontal axis is Time.

In the CGM application, the framework provides a distributional view of the response to treatment that goes beyond conventional scalar summaries Battelino et al. (2022). Rather than reducing each subject trajectory to a small set of isolated metrics, the proposed approach captures how the full glucose distribution and its short-term dynamics evolve jointly over time. This yields a richer characterization of glycemic regulation and makes it possible to distinguish global distributional changes from subject-specific response heterogeneity within a common modeling framework.
Figure 3: Predicted glucose density distributions between weeks 20 and 26, comparing Treatment (green) and Control (red) groups for the $d = 2$ model with $K = 5$ mixture components. The marginal density over glucose concentration and speed is computed by drawing samples from each participant’s GMM, estimating the sliced-Wasserstein barycenter in 2D, and converting the barycenter samples into a smooth density via a Gaussian KDE on the grid. The top row corresponds to the Control group and the bottom row to the Treatment group. In both rows, the first column shows the initial distribution at week 20, the second column displays the final distribution at week 26, and the third column presents the difference between the two distributions. Panels: (a) Control: Initial, (b) Control: Final, (c) Control: Difference, (d) Treatment: Initial, (e) Treatment: Final, (f) Treatment: Difference. Axes: glucose concentration (mg/dL) versus glucose rate of change (mg/dL/time).

More broadly, the present work illustrates the value of multivariate continuous-time distributional modeling for modern digital health data Matabuena, Ghosal, Aguilar, Keshet, Wagner, Fernández Merino, Sánchez Castro, Zipunnikov, Onnela & Gude (2025), where repeated dense measurements are becoming increasingly common.
The proposed framework is particularly appealing in settings where both interpretability and temporal resolution are important, since it enables changes in clinically meaningful latent regimes to be followed continuously throughout an intervention.

Figure 4: Wild bootstrap MMD test $p$-values comparing Treatment vs. Control groups over time for the $d = 2$ model with $K = 5$ mixture components. The dashed black line indicates the significance threshold $\alpha = 0.05$. The colors correspond to the different components: blue (component 1), orange (component 2), green (component 3), red (component 4), and violet (component 5). Horizontal axis: Weeks (19–26); vertical axis: $p$-value.

Several limitations and directions remain for future work. First, the framework should be evaluated across a broader range of digital health studies and intervention settings in order to better assess its robustness and generalizability. Second, scalable online and distributed implementations would enhance its utility in large-scale epidemiological studies. Third, extensions to higher-dimensional distributional representations and to functional biomarkers, including those arising in biomechanics Matabuena et al. (2023), could substantially broaden its applicability. In general, this work shows that distributional modeling can yield clinically significant insights beyond conventional scalar summaries and provides a foundation for further statistical methodology in digital health.

References

Alquier, P., Chérief-Abdellatif, B.-E., Derumigny, A. & Fermanian, J.-D.
(2023), ‘Estimation of copulas via maximum mean discrepancy’, Journal of the American Statistical Association 118(543), 1997–2012.

Figure 5: Quantile curves (median and 25%–75% bands) of the change in GMM mixture weights for each of the $K = 5$ components over time, relative to their initial value, for Treatment (green) and Control (red) groups. Each panel shows the temporal evolution of a component’s weight deviation from baseline. Panels (a)–(e) correspond to components 1–5; the horizontal axis is Time.

Alquier, P. & Gerber, M. (2024), ‘Universal robust regression via maximum mean discrepancy’, Biometrika 111(1), 71–92.

Battelino, T., Alexander, C. M., Amiel, S. A., Arreaza-Rubin, G., Beck, R. W., Bergenstal, R. M., Buckingham, B. A., Carroll, J., Ceriello, A. & Chow, E. (2022), ‘Continuous glucose monitoring and metrics for clinical trials: an international consensus statement’, The Lancet Diabetes & Endocrinology.

Battelino, T., Alexander, C. M., Amiel, S. A., Arreaza-Rubin, G., Beck, R. W., Bergenstal, R. M., Buckingham, B. A., Carroll, J., Ceriello, A. & Chow, E. (2023), ‘Continuous glucose monitoring and metrics for clinical trials: an international consensus statement’, The Lancet Diabetes & Endocrinology 11(1), 42–57.

Beck, R. W., Kanapka, L. G., Breton, M. D., Brown, S. A., Wadwa, R. P., Buckingham, B. A., Kollman, C. & Kovatchev, B.
(2023), ‘A meta-analysis of randomized trial outcomes for the t:slim X2 insulin pump with Control-IQ technology in youth and adults from age 2 to 72’, Diabetes Technology & Therapeutics 25(5), 329–342.

Chacón, J. E. & Duong, T. (2018), Multivariate kernel smoothing and its applications, CRC Press.

Chérief-Abdellatif, B.-E. & Alquier, P. (2022), ‘Finite-sample properties of parametric MMD estimation: Robustness to misspecification and dependence’, Bernoulli 28(1), 181–213.

Chwialkowski, K., Sejdinovic, D. & Gretton, A. (2014), A wild bootstrap for degenerate kernel tests, in Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence & K. Weinberger, eds, ‘Advances in Neural Information Processing Systems’, Vol. 27, Curran Associates, Inc.

Gao, R., Liu, F., Zhang, J., Han, B., Liu, T., Niu, G. & Sugiyama, M. (2021), Maximum mean discrepancy test is aware of adversarial attacks, in ‘Proceedings of the International Conference on Machine Learning (ICML)’, ML Research Press, pp. 3564–3575.

Garreau, D., Jitkrittum, W. & Kanagawa, M. (2017), ‘Large sample analysis of the median heuristic’, arXiv preprint arXiv:1707.07269.

Ghosal, R., Cho, S. E. & Matabuena, M. (2026), ‘Survival on image regression with application to partially functional distributional representation of physical activity’, Statistical Analysis and Data Mining: An ASA Data Science Journal 19(1), e70068.

Ghosal, R., Ghosh, S. K., Schrack, J. A. & Zipunnikov, V. (2025), ‘Distributional outcome regression via quantile functions and its application to modelling continuously monitored heart rate and physical activity’, Journal of the American Statistical Association pp. 1–20.

Ghosal, R. & Matabuena, M. (2024), ‘Multivariate scalar on multidimensional distribution regression with application to modeling the association between physical activity and cognitive functions’, Biometrical Journal 66(7), e202400042.
Ghosal, R., Varma, V. R., Volfson, D., Hillel, I., Urbanek, J., Hausdorff, J. M., Watts, A. & Zipunnikov, V. (2023), ‘Distributional data analysis via quantile functions and its application to modeling digital biomarkers of gait in Alzheimer’s disease’, Biostatistics 24(3), 539–561.

Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B. & Smola, A. (2012), ‘A kernel two-sample test’, The Journal of Machine Learning Research 13(1), 723–773.

Hughes, M. S., Addala, A. & Buckingham, B. (2023), ‘Digital technology for diabetes’, New England Journal of Medicine 389(22), 2076–2086.

Jain, A. K. (2010), ‘Data clustering: 50 years beyond k-means’, Pattern Recognition Letters 31(8), 651–666.

Jia, J. & Benson, A. R. (2019), Neural jump stochastic differential equations, in ‘Advances in Neural Information Processing Systems (NeurIPS)’, Vol. 32, Curran Associates, Inc.

Katta, S., Parikh, H., Rudin, C. & Volfovsky, A. (2024), Interpretable causal inference for analyzing wearable, sensor, and distributional data, in ‘International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 3340–3348.

Kidger, P., Morrill, J., Foster, J. & Lyons, T. (2020), Neural controlled differential equations for irregular time series, in ‘Advances in Neural Information Processing Systems (NeurIPS)’, Vol. 33, Curran Associates, Inc., pp. 6696–6707.

Kitagawa, H., Munekage, M., Seo, S. & Hanazaki, K. (2025), ‘Artificial pancreas: the past and the future’, Journal of Artificial Organs 28(4), 514–521.

LeCun, Y., Bengio, Y. & Hinton, G. (2015), ‘Deep learning’, Nature 521(7553), 436–444.

Lehmann, E. L. & Romano, J. P. (2005), Testing statistical hypotheses, Springer.

Leucht, A. & Neumann, M. H. (2013), ‘Dependent wild bootstrap for degenerate U- and V-statistics’, Journal of Multivariate Analysis 117, 257–280.

Lugosi, G. & Matabuena, M. (2024), ‘Uncertainty quantification in metric spaces’, arXiv preprint arXiv:2405.05110.
Massaroli, S., Poli, M., Park, J., Yamashita, A. & Asama, H. (2020), Dissecting neural ODEs, in ‘Advances in Neural Information Processing Systems (NeurIPS)’, Vol. 33, Curran Associates, Inc., pp. 3952–3963.

Matabuena, M. & Crainiceanu, C. M. (2026), ‘Multilevel functional distributional models with applications to continuous glucose monitoring in diabetes clinical trials’, The Annals of Applied Statistics 20(1), 476–495. URL: https://doi.org/10.1214/26-AOAS2139

Matabuena, M., Félix, P., Hammouri, Z. A. A., Mota, J. & del Pozo Cruz, B. (2022), ‘Physical activity phenotypes and mortality in older adults: a novel distributional data analysis of accelerometry in the NHANES’, Aging Clinical and Experimental Research 34(12), 3107–3114.

Matabuena, M., Ghosal, A., Meiring, W. & Petersen, A. (2025), ‘Predicting distributions of physical activity profiles in the national health and nutrition examination survey database using a partially linear Fréchet single index model’, Biostatistics 26(1), kxaf013.

Matabuena, M., Ghosal, R., Aguilar, J. E., Keshet, A., Wagner, R., Fernández Merino, C., Sánchez Castro, J., Zipunnikov, V., Onnela, J.-P. & Gude, F. (2025), ‘Glucodensity functional profiles outperform traditional continuous glucose monitoring metrics’, Scientific Reports 15(1), 33662. URL: https://doi.org/10.1038/s41598-025-18119-2

Matabuena, M., Karas, M., Riazati, S., Caplan, N. & Hayes, P. R. (2023), ‘Estimating knee movement patterns of recreational runners across training sessions using multilevel functional regression models’, The American Statistician 77(2), 169–181. URL: https://doi.org/10.1080/00031305.2022.2105950

Matabuena, M. & Petersen, A. (2023), ‘Distributional data analysis of accelerometer data from the NHANES database using nonparametric survey regression models’, Journal of the Royal Statistical Society Series C: Applied Statistics 72(2), 294–313.

Matabuena, M., Petersen, A., Vidal, J.
C. & Gude, F. (2021), ‘Glucodensities: A new representation of glucose profiles using distributional data analysis’, Statistical Methods in Medical Research 30(6), 1445–1464.

Matabuena, M., Straczkiewicz, M., Calcagno, N., Burke, K. M., Royse, T. B., Iyer, A., Carney, K. T., Hall, S., Berry, J. D. & Onnela, J.-P. (2026), ‘Exploratory analysis of smartphone-based step counts as a digital biomarker for survival in ALS patients’, Frontiers in Digital Health 7, 1705368.

Meskó, B. & Topol, E. J. (2023), ‘The imperative for regulatory oversight of large language models (or generative AI) in healthcare’, NPJ Digital Medicine 6(1), 120.

Muandet, K., Fukumizu, K., Sriperumbudur, B. & Schölkopf, B. (2017), ‘Kernel mean embedding of distributions: A review and beyond’, Foundations and Trends® in Machine Learning 10(1-2), 1–141.

Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S. & Lakshminarayanan, B. (2021), ‘Normalizing flows for probabilistic modeling and inference’, Journal of Machine Learning Research 22(57), 1–64. URL: https://jmlr.org/papers/v22/19-1028.html

Papamakarios, G., Pavlakou, T. & Murray, I. (2017), Masked autoregressive flow for density estimation, in ‘Advances in Neural Information Processing Systems (NeurIPS)’, Vol. 30, Curran Associates, Inc.

Park, J., Kok, N. & Gaynanova, I. (2025), ‘Beyond fixed thresholds: optimizing summaries of wearable device data via piecewise linearization of quantile functions’, arXiv preprint arXiv:2501.11777.

Qian, Z., Zame, W., Fleuren, L., Elbers, P. & van der Schaar, M. (2021), Integrating expert ODEs into neural ODEs: pharmacology and disease progression, in ‘Advances in Neural Information Processing Systems (NeurIPS)’, Vol. 34, pp. 11364–11383.

Richardson, R. R.
(2025), ‘Normal reference range for glucose rates of change in nondiabetic individuals using continuous glucose monitoring’, Diabetes Technology & Therapeutics p. 15209156251390822.

Rigby, R. A. & Stasinopoulos, D. M. (2005), ‘Generalized additive models for location, scale and shape’, Journal of the Royal Statistical Society Series C: Applied Statistics 54(3), 507–554.

Rubanova, Y., Chen, R. T. Q. & Duvenaud, D. K. (2019), Latent ordinary differential equations for irregularly-sampled time series, in ‘Advances in Neural Information Processing Systems (NeurIPS)’, Vol. 32, Curran Associates, Inc.

Schoelwer, M. J., DeBoer, M. D. & Breton, M. D. (2024), ‘Use of diabetes technology in children’, Diabetologia 67(10), 2075–2084.

Sejdinovic, D., Sriperumbudur, B., Gretton, A. & Fukumizu, K. (2013), ‘Equivalence of distance-based and RKHS-based statistics in hypothesis testing’, The Annals of Statistics pp. 2263–2291.

Serfling, R. J. (2009), Approximation theorems of mathematical statistics, John Wiley & Sons.

Silverman, B. W. (1986), Density estimation for statistics and data analysis, Vol. 26, CRC Press.

Silverman, B. W. (2018), Density estimation for statistics and data analysis, Routledge.

Sriperumbudur, B. K., Fukumizu, K. & Lanckriet, G. R. (2011), ‘Universality, characteristic kernels and RKHS embedding of measures’, Journal of Machine Learning Research 12(7).

Stahl-Pehe, A., Shokri-Mashhadi, N., Wirth, M., Schlesinger, S., Kuss, O., Holl, R. W., Bächle, C., Warz, K.-D., Bürger-Büsing, J., Spörkel, O. et al. (2025), ‘Efficacy of automated insulin delivery systems in people with type 1 diabetes: a systematic review and network meta-analysis of outpatient randomised controlled trials’, EClinicalMedicine 82(103190).

Szabó, Z., Sriperumbudur, B. K., Póczos, B. & Gretton, A.
(2016), ‘Learning theory for distribution regression’, Journal of Machine Learning Research 17(152), 1–40.

Tsybakov, A. B. (2008), Introduction to Nonparametric Estimation, 1st edn, Springer Publishing Company, Incorporated.

Wadwa, R. P., Reed, Z. W., Buckingham, B. A., DeBoer, M. D., Ekhlaspour, L., Forlenza, G. P., Schoelwer, M., Lum, J., Kollman, C., Beck, R. W. et al. (2023), ‘Trial of hybrid closed-loop control in young children with type 1 diabetes’, New England Journal of Medicine 388(11), 991–1001.

Wang, S. (2024), Timemixer: Decomposable multiscale mixing for time series forecasting, in ‘International Conference on Learning Representations (ICLR)’.

Ware, J., Allen, J. M., Boughton, C. K., Wilinska, M. E., Hartnell, S., Thankamony, A., de Beaufort, C., Campbell, F. M., Fröhlich-Reiterer, E., Fritsch, M. et al. (2024), ‘Eighteen-month hybrid closed-loop use in very young children with type 1 diabetes: A single-arm multicenter trial’, Diabetes Care 47(12), 2189–2195.

Wiener, N. (1932), ‘Tauberian theorems’, Annals of Mathematics 33(1), 1–100.

Wu, H., Hu, T., Liu, Y., Zhou, H., Wang, J. & Long, M. (2023), Timesnet: Temporal 2d-variation modeling for general time series analysis, in ‘International Conference on Learning Representations (ICLR)’.

Supplementary Material

A Statistical theory and proofs

We provide theoretical guarantees for the discrete-time MMD fitting step at each observed time point $t_m \in \tau$, prior to the neural-ODE smoothing stage. The results below formalize (i) approximation by a shared Gaussian dictionary and (ii) finite-sample stability of the weight estimator in (9).

Theorem A.1 (Universality). Let $\{f_t\}_{t \in [0, T]} \subset L^1(\mathbb{R}^d)$ be a family of probability densities. Assume:

1. For every $\eta > 0$ there exists $R > 0$ such that $\int_{|x| > R} f_t(x)\, dx < \eta$ for all $t \in [0, T]$;
2. $\lim_{|h| \to 0} \sup_{t \in [0, T]} \| f_t(\cdot + h) - f_t \|_{L^1} = 0$.
Then for every $\varepsilon > 0$ there exist $K \in \mathbb{N}$, $\sigma > 0$ and centers $\mu_1,\dots,\mu_K \in \mathbb{R}^d$ such that, for each $t \in [0,T]$, one can choose $\alpha(t) \in \Delta^{K-1}$ with
\[
\sup_{t\in[0,T]} \Big\| p_t - \sum_{k=1}^{K} \alpha_k(t)\,\mathcal{N}(\mu_k, \sigma^2\,\mathrm{Id}_d) \Big\|_{L^1} \le \varepsilon .
\]
If, in addition, $t \mapsto p_t$ is continuous in $L^1(\mathbb{R}^d)$, then $t \mapsto \alpha(t)$ can be chosen continuous.

Since Gaussian location mixtures with common variance form a subclass of the shared-dictionary Gaussian mixture model introduced in Section 3.2, this result provides theoretical support for the approximation capacity of the proposed representation.

Theorem A.2 (Finite-sample stability). Fix $t \in \mathcal{T}$ and consider the quadratic program (9), with solution $\hat\alpha_t$ based on the sample $X_1,\dots,X_n \sim p_t$. Let $b \in \mathbb{R}^K$ be defined by
\[
b_k := \mathbb{E}_{X\sim p_t}\Big[\int \kappa(X,y)\,\mathcal{N}(y;\mu_k,\Sigma_k)\,\mathrm{d}y\Big], \qquad k = 1,\dots,K,
\]
and, in addition, define $\alpha_t^\star$ by
\[
\alpha_t^\star \in \operatorname*{argmin}_{\alpha\in\Delta^{K-1}} \Big\{ \alpha^\top G\,\alpha - 2\,\alpha^\top b + \sum_{k=1}^{K} \lambda_k \alpha_k^2 \Big\},
\]
where $G$ is the Gram matrix of the dictionary appearing in (9). Assume $\lambda_{\min} := \min_k \lambda_k > 0$ and $\sup_{x,y} |\kappa(x,y)| \le 1$. Then, for any $\delta \in (0,1)$, with probability at least $1-\delta$,
\[
\|\hat\alpha_t - \alpha_t^\star\|_2 \le \frac{1}{\lambda_{\min}} \sqrt{\frac{2K\log(2K/\delta)}{n}} .
\]
Moreover, with probability at least $1-\delta$,
\[
\max_{t\in\mathcal{T}} \|\hat\alpha_t - \alpha_t^\star\|_2 \le \frac{1}{\lambda_{\min}} \sqrt{\frac{2K\log(2K|\mathcal{T}|/\delta)}{n_{\min}}},
\]
where $n_{\min}$ is the smallest per-time sample size.

Remark A.3. For fixed $(\mu_k,\Sigma_k)_{k=1}^{K}$, the mixture density is linear in the weights. Hence, writing $p_{\hat\theta}$ and $p_{\theta^\star}$ for the mixture densities with weights $\hat\alpha$ and $\alpha^\star$,
\[
\|p_{\hat\theta} - p_{\theta^\star}\|_{L^1} \le \|\hat\alpha - \alpha^\star\|_1 \le \sqrt{K}\,\|\hat\alpha - \alpha^\star\|_2 .
\]
Moreover, for the componentwise distribution function defined in (1), we have
\[
|F_{\hat\theta}(x) - F_{\theta^\star}(x)| \le \|\hat\alpha - \alpha^\star\|_1 \quad \text{for all } x \in \mathbb{R}^d .
\]

Proofs

Proof of Theorem A.1. Fix $\varepsilon > 0$. By assumption 1, choose $R > 0$ such that
\[
\sup_{t\in[0,T]} \int_{|x|>R} p_t(x)\,\mathrm{d}x \le \varepsilon .
\]
Let $\varphi_\sigma$ be the Gaussian mollifier and set $p_t^\sigma := p_t * \varphi_\sigma$. By the approximate-identity property and assumption 2, for $\sigma$ sufficiently small we have
\[
\sup_{t\in[0,T]} \|p_t^\sigma - p_t\|_{L^1} \le \varepsilon .
\]
Fix such a value of $\sigma$.
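Partition $B_R := \{x : |x| \le R\}$ into finitely many sets $A_1,\dots,A_K$ of diameter at most $h$, and pick $\mu_k \in A_k$. Define
\[
\tilde\alpha_k(t) := \int_{A_k} p_t(y)\,\mathrm{d}y, \qquad \sum_{k=1}^{K} \tilde\alpha_k(t) = \int_{B_R} p_t(y)\,\mathrm{d}y \ge 1 - \varepsilon,
\]
and then set $\alpha(t) := \tilde\alpha(t) / \sum_{j} \tilde\alpha_j(t)$. We decompose
\[
p_t^\sigma(x) = \sum_{k=1}^{K} \int_{A_k} p_t(y)\,\varphi_\sigma(x-y)\,\mathrm{d}y + \int_{|y|>R} p_t(y)\,\varphi_\sigma(x-y)\,\mathrm{d}y .
\]
The last term has $L^1$-norm bounded by $\int_{|y|>R} p_t(y)\,\mathrm{d}y \le \varepsilon$. For the first term, use
\[
\|\varphi_\sigma(\cdot - y) - \varphi_\sigma(\cdot - \mu_k)\|_{L^1} \le \|\nabla\varphi_\sigma\|_{L^1}\,|y - \mu_k| \le \frac{C_d\,h}{\sigma}, \qquad y \in A_k,
\]
with $C_d := \|\nabla\varphi_1\|_{L^1}$, to obtain
\[
\sup_{t\in[0,T]} \Big\| p_t^\sigma - \sum_{k=1}^{K} \tilde\alpha_k(t)\,\varphi_\sigma(\cdot - \mu_k) \Big\|_{L^1} \le \varepsilon + \frac{C_d\,h}{\sigma} .
\]
Passing from $\tilde\alpha$ to $\alpha$ changes the $L^1$-error by at most $\|\alpha(t) - \tilde\alpha(t)\|_1 = 1 - \sum_k \tilde\alpha_k(t) \le \varepsilon$, since $\|\varphi_\sigma\|_{L^1} = 1$. Choosing $h = \sigma\varepsilon / C_d$ and combining the bounds yields
\[
\sup_{t\in[0,T]} \Big\| p_t - \sum_{k=1}^{K} \alpha_k(t)\,\varphi_\sigma(\cdot - \mu_k) \Big\|_{L^1} \le 4\varepsilon .
\]
Since $\varphi_\sigma(\cdot - \mu_k)$ is the density of $\mathcal{N}(\mu_k, \sigma^2\,\mathrm{Id}_d)$ and $\varepsilon > 0$ is arbitrary, this proves the approximation claim.

Finally, if $t \mapsto p_t$ is continuous in $L^1(\mathbb{R}^d)$, then each map $t \mapsto \tilde\alpha_k(t) = \int_{A_k} p_t(y)\,\mathrm{d}y$ is continuous, because
\[
|\tilde\alpha_k(t) - \tilde\alpha_k(s)| \le \int_{A_k} |p_t(y) - p_s(y)|\,\mathrm{d}y \le \|p_t - p_s\|_{L^1} .
\]
Since $\sum_k \tilde\alpha_k(t) \ge 1 - \varepsilon > 0$, the normalization $t \mapsto \alpha(t) = \tilde\alpha(t) / \sum_j \tilde\alpha_j(t)$ is also continuous.

Proof of Theorem A.2. Let $H := G + \operatorname{diag}(\lambda)$, which satisfies $H \succeq \lambda_{\min}\,\mathrm{Id}$ because $G \succeq 0$ and $\lambda_{\min} > 0$. The optimality conditions for minimizers over $\Delta^{K-1}$ give
\[
\langle H\hat\alpha_t - \hat b,\ \alpha - \hat\alpha_t \rangle \ge 0, \qquad \langle H\alpha_t^\star - b,\ \alpha - \alpha_t^\star \rangle \ge 0 \qquad \text{for all } \alpha \in \Delta^{K-1} .
\]
Taking $\alpha = \alpha_t^\star$ in the first inequality and $\alpha = \hat\alpha_t$ in the second, and then adding, yields
\[
(\hat\alpha_t - \alpha_t^\star)^\top H\,(\hat\alpha_t - \alpha_t^\star) \le \langle \hat b - b,\ \hat\alpha_t - \alpha_t^\star \rangle .
\]
Using $H \succeq \lambda_{\min}\,\mathrm{Id}$ and Cauchy–Schwarz, we obtain
\[
\|\hat\alpha_t - \alpha_t^\star\|_2 \le \lambda_{\min}^{-1}\,\|\hat b - b\|_2 .
\]
For each $k = 1,\dots,K$, define
\[
g_k(x) := \int \kappa(x,y)\,\mathcal{N}(y;\mu_k,\Sigma_k)\,\mathrm{d}y .
\]
Then $\hat b_k = \frac{1}{n}\sum_{i=1}^{n} g_k(X_i)$, and the variables $g_k(X_i)$ are i.i.d. with mean $b_k$.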
Since $|\kappa| \le 1$ and $\mathcal{N}(\cdot\,;\mu_k,\Sigma_k)$ integrates to $1$, we have $|g_k| \le 1$. By Hoeffding's inequality and a union bound over $k = 1,\dots,K$, with probability at least $1-\delta$,
\[
\max_{1\le k\le K} |\hat b_k - b_k| \le \sqrt{\frac{2\log(2K/\delta)}{n}} .
\]
Therefore,
\[
\|\hat b - b\|_2 \le \sqrt{K}\,\max_{1\le k\le K} |\hat b_k - b_k| \le \sqrt{\frac{2K\log(2K/\delta)}{n}} .
\]
Combining the last display with the previous bound yields
\[
\|\hat\alpha_t - \alpha_t^\star\|_2 \le \frac{1}{\lambda_{\min}} \sqrt{\frac{2K\log(2K/\delta)}{n}},
\]
which proves the first claim. For the uniform-in-$t$ bound, apply the same argument together with a union bound over $(k,t) \in \{1,\dots,K\}\times\mathcal{T}$, and use $n_t \ge n_{\min}$.

B Inference based on estimated weight trajectories

We now turn to statistical inference and describe how the estimated mixture-weight trajectories can be used to compare treatment arms in a randomized clinical trial. For simplicity, suppose that there are two study arms, indexed by $g \in \{0,1\}$, where $g = 0$ denotes the control group and $g = 1$ the treatment group. Let $n_g$ be the number of subjects in arm $g$, with total sample size $n = n_0 + n_1$, and assume that the subjects are independent within and between groups.

For subject $i = 1,\dots,n_g$ in arm $g$, let $t \mapsto \hat\alpha^{(g)}_{i,k}(t)$, $t \in [0,T]$, denote the estimated weight trajectory of mixture component $k$. At any fixed time $t$, the quantity $\hat\alpha^{(g)}_{i,k}(t)$ represents the estimated contribution of component $k$ to the subject-specific latent distribution. Consequently, the trajectory $t \mapsto \hat\alpha^{(g)}_{i,k}(t)$ summarizes how the subject's distributional profile evolves over time. Comparing these trajectories across treatment arms provides a natural and interpretable way to assess treatment-related distributional differences, while also allowing for heterogeneity within each arm.

Let $t_1 < \dots < t_J$ be a common grid of time points in $[0,T]$ at which inference is made.
For each component $k \in \{1,\dots,K\}$ and time point $t_j$, we consider the two samples
\[
\{\hat\alpha^{(0)}_{i,k}(t_j)\}_{i=1}^{n_0} \quad\text{and}\quad \{\hat\alpha^{(1)}_{i,k}(t_j)\}_{i=1}^{n_1},
\]
corresponding to the estimated weights of component $k$ in the control and treatment groups, respectively.

To formalize the comparison, let $P^{(g)}_{k,j}$ denote the distribution of the random mixture coefficient $\alpha^{(g)}_k(t_j)$ in arm $g$. For each pair $(k,j)$, we test the pointwise null hypothesis
\[
H_0^{(k,j)} : P^{(0)}_{k,j} = P^{(1)}_{k,j} .
\]
Equivalently,
\[
H_0^{(k,j)} : \alpha^{(0)}_k(t_j) \overset{d}{=} \alpha^{(1)}_k(t_j) .
\]
Thus, under $H_0^{(k,j)}$, the distribution of the weight of the $k$th mixture component at time $t_j$ is the same in the two study arms. This is a fully distributional hypothesis, not merely a comparison of means, and is therefore sensitive to differences in spread, skewness, or multimodality in addition to location shifts. In practice, inference is based on the plug-in estimates $\hat\alpha^{(g)}_{i,k}(t_j)$.

B.0.0.1 Two-sample MMD statistic.
Fix $k \in \{1,\dots,K\}$ and $j \in \{1,\dots,J\}$, and write
\[
X_i := \hat\alpha^{(0)}_{i,k}(t_j),\ i = 1,\dots,n_0, \qquad\text{and}\qquad Y_i := \hat\alpha^{(1)}_{i,k}(t_j),\ i = 1,\dots,n_1 .
\]
To compare the distributions of $\{X_i\}$ and $\{Y_i\}$, we use the maximum mean discrepancy (MMD) Gretton et al. (2012) with the Gaussian radial basis function kernel
\[
\kappa_\sigma(u,v) = \exp\!\Big(-\frac{(u-v)^2}{2\sigma^2}\Big),
\]
where the bandwidth $\sigma > 0$ is selected using the median heuristic on the pooled sample $\{X_i\} \cup \{Y_i\}$. The empirical MMD in V-statistic form is
\[
\widehat{\mathrm{MMD}}^2_{k,j} = \frac{1}{n_0^2}\sum_{i=1}^{n_0}\sum_{i'=1}^{n_0} \kappa_\sigma(X_i, X_{i'}) + \frac{1}{n_1^2}\sum_{i=1}^{n_1}\sum_{i'=1}^{n_1} \kappa_\sigma(Y_i, Y_{i'}) - \frac{2}{n_0 n_1}\sum_{i=1}^{n_0}\sum_{i'=1}^{n_1} \kappa_\sigma(X_i, Y_{i'}) .
\]
We use the scaled statistic
\[
T_{k,j} = \frac{n_0 n_1}{n_0 + n_1}\,\widehat{\mathrm{MMD}}^2_{k,j} .
\]
Large values of $T_{k,j}$ indicate stronger evidence against $H_0^{(k,j)}$, corresponding to a greater discrepancy between the two arm-specific distributions of $\alpha_k(t_j)$.

B.0.0.2 Wild bootstrap calibration.
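As a rough illustration of how the V-statistic and a multiplier-bootstrap $p$-value of the kind discussed in this subsection might be computed for a single component and time point, the following NumPy sketch may help. It is not the authors' implementation: the function names, the use of standard normal multipliers centered within each arm, the scaling factor `n0 * n1 / (n0 + n1)`, and the fallback bandwidth for degenerate samples are all assumptions made for this example.

```python
import numpy as np


def gaussian_gram(z, sigma):
    """Gram matrix of the Gaussian RBF kernel exp(-(u - v)^2 / (2 sigma^2))."""
    diff = z[:, None] - z[None, :]
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))


def median_bandwidth(z):
    """Median heuristic: median pairwise distance in the pooled sample."""
    dist = np.abs(z[:, None] - z[None, :])
    med = np.median(dist[np.triu_indices(len(z), k=1)])
    return float(med) if med > 0 else 1.0  # fallback value is an assumption


def mmd_wild_bootstrap(x, y, n_boot=500, seed=0):
    """Scaled V-statistic MMD^2 and a multiplier-bootstrap p-value (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n0, n1 = len(x), len(y)
    z = np.concatenate([x, y])
    gram = gaussian_gram(z, median_bandwidth(z))

    # Signed weights: w^T K w equals the V-statistic form of MMD^2.
    w = np.concatenate([np.full(n0, 1.0 / n0), np.full(n1, -1.0 / n1)])
    scale = n0 * n1 / (n0 + n1)  # assumed scaling of the test statistic
    stat = scale * (w @ gram @ w)

    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Assumed multiplier scheme: N(0, 1) draws, centered within each arm.
        xi = rng.standard_normal(n0 + n1)
        xi[:n0] -= xi[:n0].mean()
        xi[n0:] -= xi[n0:].mean()
        v = xi * w
        boot[b] = scale * (v @ gram @ v)

    # Proportion of replicates at least as large as the observed statistic.
    p_value = float(np.mean(boot >= stat))
    return float(stat), p_value
```

Calling `mmd_wild_bootstrap(x, y)` on two arrays of estimated weights returns the scaled statistic and the bootstrap $p$-value. The observed data stay fixed and only the multipliers are redrawn, which is the property that makes this style of calibration convenient for kernel statistics.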
The null distribution of $T_{k,j}$ is generally not available in closed form, especially in finite samples, and remains difficult to derive even asymptotically under dependence. We therefore approximate it using a multiplier (wild) bootstrap Chwialkowski et al. (2014), following Leucht & Neumann (2013). Specifically, for each pair $(k,j)$, we generate bootstrap replicates $T^{*(1)}_{k,j},\dots,T^{*(B)}_{k,j}$ under $H_0^{(k,j)}$, where $B$ is chosen sufficiently large. The bootstrap $p$-value is then
\[
\hat p_{k,j} = \frac{1}{B}\sum_{b=1}^{B} \mathbf{1}\big\{T^{*(b)}_{k,j} \ge T_{k,j}\big\} .
\]
Hence, $\hat p_{k,j}$ is the proportion of bootstrap replicates that are at least as large as the observed test statistic. Small values of $\hat p_{k,j}$ indicate that the observed discrepancy between treatment arms would be unlikely under the null. Consequently, for a nominal significance level $\alpha$, we reject $H_0^{(k,j)}$ whenever $\hat p_{k,j} \le \alpha$.

The appeal of the wild bootstrap in this setting is that it perturbs the statistic through auxiliary mean-zero multipliers while keeping the observed sample fixed. This is particularly convenient for kernel-based statistics such as MMD and is also well suited to extensions in which weak dependence must be taken into account.

B.0.0.3 Relation to permutation calibration.
When observations are independent and identically distributed among subjects at a fixed time point $t_j$, permutation calibration is also valid and may be used as a simpler alternative that yields an exact test Lehmann & Romano (2005). We prefer the wild bootstrap because it extends more naturally to settings in which a dependence-aware calibration is desired.

B.0.0.4 Temporal interpretation.
Repeating the above test on the time grid $t_1,\dots,t_J$ yields, for each component $k$, a sequence of $p$-values $\hat p_{k,1},\dots,\hat p_{k,J}$.
Plotting these values as a function of time produces a significance curve that indicates when the treatment and control groups differ in the distribution of the weight of the $k$th mixture component. In this way, the trajectory $t \mapsto \alpha_k(t)$ serves as an interpretable temporal marker of distributional differences between treatment arms.

C Simulation study

Below, we describe the synthetic data, the competing methods against which we benchmark our proposed approach, and the simulation results.

C.0.0.1 Data-generating process.
Fix the dimension $d$ and the time horizon $T$. The target density $p_t$ is a 3-component Gaussian mixture with time-varying means $\mu_k(t)$ and a common time-varying variance, with components $\mathcal{N}(\mu_k(t), \sigma(t)^2\,\mathrm{Id}_d)$, $k = 1, 2, 3$ (13), where $\sigma(t)$ and the means $\mu_k(t)$ are specified in (14). If $d > 1$, then each $\mu_k(t)$ has identical coordinates given by (14). This data-generating process captures both multimodal and unimodal regimes over time.

We evaluate the models on a regular grid of time points $t_j$ and generate independent replicates. At each $t_j \in \mathcal{T}$, we draw $n$ independent observations, for several sample sizes $n$.

This simulation scenario is intentionally more general than the working model in Section 3.2, because both the component means and the variance vary with time. It is included to evaluate how well the method approximates smoothly evolving distributions beyond the ideal shared-dictionary setting.
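To make the snapshot-sampling design concrete, here is a minimal NumPy sketch of drawing samples from a 3-component Gaussian mixture with time-varying means and a common time-varying standard deviation. The specific mean and variance trajectories, the equal mixture weights, the grid of 11 time points, and the sample size are hypothetical choices made only for illustration; they are not the paper's equations (13) and (14).

```python
import numpy as np


def component_means(t):
    """Hypothetical scalar mean trajectories (assumption, not equation (14))."""
    # The three means coincide at t = 0 and t = 1 and separate in between,
    # so the mixture moves between unimodal and multimodal regimes.
    spread = 2.0 * np.sin(np.pi * t)
    return np.array([-spread, 0.0, spread])


def common_std(t):
    """Hypothetical common standard deviation as a function of time."""
    return 0.5 + 0.25 * t


def sample_snapshot(t, n, d, rng):
    """Draw n points in R^d from the mixture at time t (equal weights assumed)."""
    means = component_means(t)
    labels = rng.integers(0, 3, size=n)  # component label for each draw
    # Every coordinate of a component mean shares the same scalar value.
    centers = np.repeat(means[labels][:, None], d, axis=1)
    return centers + common_std(t) * rng.standard_normal((n, d))


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 11)  # regular time grid (size is an assumption)
snapshots = [sample_snapshot(t, n=200, d=2, rng=rng) for t in grid]
```

Each entry of `snapshots` is an independent cross-sectional sample at one grid time, which mirrors the setting in which densities are estimated from snapshot data rather than from tracked individual paths.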
Competitors

We compare the proposed estimator with three baselines for estimating the time-indexed density $p_t$ from snapshot samples observed on the discrete time grid: (i) a univariate generalized additive model for location, scale, and shape (GAMLSS) Rigby & Stasinopoulos (2005); (ii) a time-conditional kernel density estimator (KDE), see Silverman (1986), Chacón & Duong (2018), Tsybakov (2008); (iii) a conditional masked autoregressive flow (MAF), see Papamakarios et al. (2017, 2021). Hyperparameters for KDE and MAF are reported in Table 2.

Generalized additive models for location, scale, and shape. We fit a univariate Gaussian distributional regression model with time-varying mean and variance,
\[
X \mid t \sim \mathcal{N}\big(m(t), \sigma(t)^2\big),
\]
where $m$ and $\log\sigma$ are modeled as smooth spline functions of $t$ and estimated using the gamlss package in R, see Rigby & Stasinopoulos (2005). We include this baseline only for $d = 1$, as multivariate extensions would require additional strong modeling assumptions on the dependence structure.

Time-conditional kernel density estimator. At each observed time $t_j \in \mathcal{T}$, we estimate $p_{t_j}$ using a KDE:
\[
\hat p^{\mathrm{KDE}}_{t_j}(x) = \frac{1}{n h^d}\sum_{i=1}^{n} \mathcal{K}\Big(\frac{x - X_i^{(j)}}{h}\Big),
\]
using a Gaussian kernel $\mathcal{K}$ and Scott's bandwidth rule for $h$ Silverman (1986), Chacón & Duong (2018). For intermediate times $t \in (t_j, t_{j+1})$, we linearly interpolate the endpoint densities:
\[
\hat p^{\mathrm{KDE}}_{t} = (1-\lambda)\,\hat p^{\mathrm{KDE}}_{t_j} + \lambda\,\hat p^{\mathrm{KDE}}_{t_{j+1}}, \qquad \lambda = \frac{t - t_j}{t_{j+1} - t_j} .
\]
As a fully nonparametric method, KDE is inherently affected by the curse of dimensionality, see Tsybakov (2008).

Masked autoregressive flow. We model $p_t$ with a conditional normalizing flow $p_\theta(x \mid t)$ built from an invertible map $x = f_\theta(z; t)$ with base noise $z \sim \mathcal{N}(0, \mathrm{Id}_d)$.
The conditional density is given by the change-of-variables formula:
\[
p_\theta(x \mid t) = \mathcal{N}\big(f_\theta^{-1}(x; t);\, 0, \mathrm{Id}_d\big)\,\Big|\det \frac{\partial f_\theta^{-1}(x; t)}{\partial x}\Big| .
\]
In MAF (Papamakarios et al. (2017)), the inverse map is autoregressive (implemented via masked networks), rendering the Jacobian triangular and the log-likelihood tractable. Parameters $\theta$ are learned via conditional maximum likelihood over the full sample of time-stamped observations $\{(t_j, X_i^{(j)})\}$, see Papamakarios et al. (2021).

Results

We report results for a low-dimensional ($d = 1$) and a higher-dimensional setting.

Low dimension ($d = 1$). Figure 6 shows that our model is competitive with the baselines. While MAF can achieve slightly smaller errors in certain sample-size regimes, our approach remains accurate while simultaneously providing directly interpretable weight trajectories $\alpha(t)$.

High dimension. Figure 7 shows that our model performs better overall. The KDE performs the worst, which is consistent with its well-known sensitivity to high dimensionality Tsybakov (2008).

Figure 6: Pointwise error over time for $d = 1$. We compare our model, KDE, MAF, and GAMLSS. Curves represent averages over independent runs. Errors corresponding to the discrete-time MMD stage are shown as ■. Errors are approximated by Monte Carlo integration.

D Case study for univariate model with $K = 3$.

To examine how the analysis changes with fewer mixture components, and thus with reduced model expressiveness, we applied the method to univariate probability distributions ($d = 1$) with a smaller number of components, namely $K = 3$.
Overall, the conclusions differ somewhat: (i) statistical power decreases and the differences become more borderline; (ii) the responder analysis becomes more heterogeneous, with fewer clear differences, likely because more components are needed to adequately capture the complexity of glucose dynamics over time; and (iii) while differences between the densities across treatment arms remain, they appear less substantial than those reported in the main paper. These findings indicate that increasing the number of components in the mixture is important to improve the expressive capacity of the models and to ensure their practical relevance in this type of digital health application. Taken together, these results support the use of the richer bivariate representation in the main text.

Figure 7: Pointwise error over time in the higher-dimensional setting. We compare our model, KDE and MAF. Curves represent averages over independent runs. Errors corresponding to the discrete-time MMD stage are shown as ■.

Figure 8: Comparison of weight trajectory dynamics between Treatment (green) and Control (red) groups for the univariate model ($d = 1$) with $K = 3$ mixture components. Panels: (a) Component 1, (b) Component 2, (c) Component 3; horizontal axis: Time. Each panel shows the evolution of component weights $\alpha_k(t)$ between weeks 20–26 over normalized time $t \in [0,1]$. Group means are shown as thick dashed lines. The shaded bands represent a statistical envelope around the mean, where darker shading indicates a higher density of trajectories.
Figure 9: Quantile curves (median and 25%–75% bands) of the change in GMM mixture weights for each of the $K = 3$ components over time, relative to their initial value, for Treatment (green) and Control (red) groups in the univariate model ($d = 1$). Panels: (a) Component 1, (b) Component 2, (c) Component 3; horizontal axis: Time. Each panel shows the temporal evolution of a component's weight deviation from baseline.

E Implementation details

We report the main hyperparameters used in our experiments. Unless stated otherwise, we reuse the same settings in simulations and in the diabetes case study.

Figure 10: Predicted glucose density distributions for the univariate model ($d = 1$) with $K = 3$ mixture components. Panels: (a) Initial (Week 20): Treatment vs. Control; (b) Final (Week 26): Treatment vs. Control; (c) Difference: Control; (d) Difference: Treatment. Horizontal axis: glucose concentration (mg/dL); vertical axis in panels (c) and (d): density difference (final - initial). The curves represent the marginal density over glucose concentration, computed as the Fréchet mean (1D Wasserstein barycenter) across individuals within each group. The top row compares the Treatment (green) and Control (red) groups at the initial observation (week 20, left) and final observation (week 26, right). The bottom row displays the difference between the final and initial distributions for the Control group (left) and the Treatment group (right).

Figure 11: Wild bootstrap MMD test $p$-values comparing Treatment vs. Control groups over time for the univariate model ($d = 1$) with $K = 3$ mixture components. Horizontal axis: Weeks (19 to 26); vertical axis: $p$-value. The dashed black line indicates the significance threshold $\alpha$. The colors correspond to the different components: blue (component 1), orange (component 2), and green (component 3).

Table 1: Calibrated global statistics for univariate ($d = 1$) and bivariate ($d = 2$) data.

Table 2: Hyperparameters for the proposed estimator and baselines.

Proposed Two-Stage Estimator
  Step 1: per-time MMD mixture fit
    Mixture components ($K$):
    Kernel bandwidth ($\sigma$):          median heuristic
    Ridge regularization ($\lambda$):
    Adam learning rate:
    Iterations / inner grad. steps:
  Step 2: time-series neural ODE fit
    Vector field MLP:                     2 layers, width 64
    Integration horizon ($T$):            1.0
    Solver step size:                     0.01
    Adam learning rate:
    Max epochs:                           2000
    Ridge regularization:

Normalizing Flow Baseline (MAF)
    Context embedding MLP:                128
    Flow transform hidden:                64
    Blocks (Coupling/AR):                 6
    Stacked blocks:                       2
    Epochs:                               50
    Batch size:                           64
    Optimizer:                            Adam
    Learning rate:

Time-Conditional KDE Baseline
    Context embedding MLP:                128
    Bandwidth selection:                  scott
