E-values for Adaptive Clinical Trials: Anytime-Valid Monitoring in Practice
Adaptive clinical trials rely on interim analyses, flexible stopping, and data-dependent design modifications that complicate statistical guarantees when fixed-horizon test statistics are repeatedly inspected or reused after adaptations. E-values and e-processes provide anytime-valid tests and confidence sequences that remain valid under optional stopping and optional continuation without requiring a prespecified monitoring schedule. This paper is a methodology guide for practitioners. We develop the betting-martingale construction of e-processes for two-arm randomized controlled trials, show how e-values naturally handle composite null hypotheses and support futility monitoring, and provide guidance on when e-values are appropriate, when established alternatives are preferable, and how to integrate e-value monitoring with group sequential and Bayesian adaptive workflows. A numerical study compares five monitoring rules – naive and calibrated versions of frequentist, Bayesian, and e-value approaches – in a two-arm binary-endpoint trial. Naive repeated testing and naive posterior thresholds inflate Type I error substantially under frequent interim looks. Among the valid methods, the calibrated group sequential rule achieves the highest power, the e-value rule provides robust anytime-valid control with moderate power, and the calibrated Bayesian rule is the most conservative. Extended simulations show that the power gap between group sequential and e-value methods depends on the monitoring schedule and reverses under continuous monitoring. The methodology, including futility monitoring, platform trial multiplicity control, and hybrid strategies combining e-values with established methods, is implemented in the open-source R package evalinger and situated within the regulatory framework of the January 2026 FDA draft guidance on Bayesian methodology.
💡 Research Summary
This paper presents a practitioner‑focused methodology for using e‑values and e‑processes to achieve anytime‑valid monitoring in adaptive clinical trials. The authors begin by highlighting the tension between the frequent interim looks, protocol amendments, and sample‑size re‑estimations that are now commonplace in modern trials and the traditional fixed‑sample statistical guarantees that break down under repeated inspection. They then introduce the theoretical foundation of e‑values—non‑negative statistics with expectation at most one under the null—and e‑processes, the wealth trajectories of betting strategies, which remain valid under optional stopping and optional continuation. By invoking Ville's inequality and the betting‑martingale framework, they show how a carefully chosen betting fraction (the growth‑rate‑optimal, or GRO, fraction) maximizes the expected growth rate of evidence, and hence the power to stop early, while preserving a uniform Type I error bound of α at every time point.
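To make the betting picture concrete, here is a minimal Python sketch (illustrative only, not the paper's evalinger package, which is implemented in R). A likelihood ratio is the simplest e-value factor: its running product is a non-negative martingale under the null, and Ville's inequality licenses rejecting whenever the wealth crosses 1/α, no matter when or how often we look. The specific null (fair coin) and alternative (p = 0.7) are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Null: fair coin (p0 = 0.5). Bet on the alternative p1 = 0.7.
# Each likelihood-ratio factor has expectation 1 under the null,
# so the running product ("wealth") is a non-negative martingale.
p0, p1 = 0.5, 0.7
x = rng.binomial(1, 0.5, size=1000)   # data generated under the null

wealth = 1.0
max_wealth = 0.0
for xi in x:
    wealth *= (p1 if xi else 1 - p1) / (p0 if xi else 1 - p0)
    max_wealth = max(max_wealth, wealth)

# Ville's inequality: P(sup_t wealth_t >= 1/alpha) <= alpha under the
# null, so rejecting whenever wealth >= 20 controls Type I error at
# alpha = 0.05 uniformly over all (data-dependent) stopping times.
alpha = 0.05
print("rejected at some look:", max_wealth >= 1 / alpha)
```

Because the guarantee is uniform in time, the monitoring schedule never needs to be prespecified: checking the wealth after every observation costs nothing in validity.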
The methodological core is a construction of e‑processes for two‑arm randomized trials with binary outcomes. The betting strategy is made predictable (i.e., measurable with respect to past data), allowing the e‑process to be used as an evidential ledger regardless of how many interim analyses are performed or how the design is adapted (e.g., sample‑size changes, enrichment, arm dropping). Composite null hypotheses—such as “treatment effect ≤ 0” or “effect size is smaller than a clinically meaningful margin”—are handled naturally by incorporating the null into the betting odds.
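The following Python sketch shows one way such a predictable betting e-process can look for the two-arm binary setting. The pairing of treatment and control observations, the clipped-running-mean betting fraction, and all parameter values are illustrative assumptions on my part, not the paper's exact construction; the key properties it does faithfully reproduce are predictability (the bet at step t depends only on past data) and validity for the composite null "treatment effect ≤ 0".

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-arm trial, binary outcomes; composite null H0: p_treat <= p_ctrl.
# Pair each treatment outcome with a control outcome and bet on the
# signed difference y_t = x_treat - x_ctrl in {-1, 0, +1}.
# Under H0, E[y_t] <= 0, so the factor 1 + lam_t * y_t with a
# predictable lam_t in [0, 0.9] has expectation at most 1: the running
# product is an e-process for the composite null.
p_treat, p_ctrl = 0.7, 0.5          # true effect: alternative holds
n_pairs = 400
y = rng.binomial(1, p_treat, n_pairs) - rng.binomial(1, p_ctrl, n_pairs)

alpha = 0.05
wealth, lam, past_mean = 1.0, 0.0, 0.0
stopped_at = None
for t, yt in enumerate(y, start=1):
    wealth *= 1.0 + lam * yt        # lam depends only on y_1..y_{t-1}
    if stopped_at is None and wealth >= 1 / alpha:
        stopped_at = t              # anytime-valid rejection of H0
    # plug-in betting fraction: clipped running mean of past differences
    past_mean += (yt - past_mean) / t
    lam = min(max(past_mean, 0.0), 0.9)

print("final wealth:", wealth, "stopped at pair:", stopped_at)
```

Because `lam` is updated only after the current observation has been priced in, the wealth process remains a supermartingale under every distribution in the composite null, which is what makes design adaptations mid-trial harmless.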
Futility monitoring is addressed through two complementary tools. First, a confidence sequence derived from the e‑process provides an anytime‑valid interval for the treatment effect; stopping for futility can be declared when the upper bound falls below the minimum clinically important difference. Second, a “reciprocal” e‑process tests the opposite hypothesis (effect ≥ margin); a large value of this reciprocal process signals strong evidence against a meaningful benefit. Both approaches are implemented in the accompanying R package evalinger.
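The reciprocal-e-process idea can be sketched as follows in Python. This is my own illustrative rendering under the same paired-difference setup as above, not the evalinger implementation: the margin `delta`, the betting rule, and the effect sizes are all assumed for the demo. The direction of the bet is simply flipped, so wealth accumulates when the data look inconsistent with "effect ≥ margin".

```python
import numpy as np

rng = np.random.default_rng(2)

# Futility via a "reciprocal" e-process: test H0': effect >= delta,
# the assumed minimum clinically important difference. Large wealth
# here is strong evidence *against* a meaningful benefit.
delta = 0.20
p_treat, p_ctrl = 0.52, 0.50        # true effect far below the margin
n_pairs = 600
y = rng.binomial(1, p_treat, n_pairs) - rng.binomial(1, p_ctrl, n_pairs)

alpha = 0.05
wealth, lam, mean = 1.0, 0.0, 0.0
futile_at = None
for t, yt in enumerate(y, start=1):
    # Under H0', E[y_t - delta] >= 0, so E[1 - lam*(y_t - delta)] <= 1
    # for any predictable lam in [0, 0.9]: a valid e-process for H0'.
    wealth *= 1.0 - lam * (yt - delta)
    if futile_at is None and wealth >= 1 / alpha:
        futile_at = t               # anytime-valid futility declaration
    mean += (yt - mean) / t
    # bet harder the further the running mean sits below the margin
    lam = min(max(delta - mean, 0.0), 0.9)

print("futility wealth:", wealth, "declared at pair:", futile_at)
```

Running the efficacy and futility e-processes side by side gives a monitoring panel in which either boundary can be crossed at any time without inflating error rates.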
A comprehensive simulation study compares five monitoring rules in a two‑arm binary‑endpoint trial with 20 equally spaced looks: (1) naive repeated p‑values, (2) an O'Brien–Fleming‑type group‑sequential boundary, (3) the betting‑martingale e‑process, (4) a naive Bayesian posterior‑probability threshold, and (5) a simulation‑calibrated Bayesian threshold. Naive methods inflate Type I error to 13–15%. Among the valid methods, the calibrated group‑sequential design achieves the highest power (86.1%), the e‑process attains moderate power (72.3%), and the calibrated Bayesian rule is the most conservative (68.8%). Additional simulations reveal that the power gap between group‑sequential and e‑process methods narrows or even reverses under continuous (real‑time) monitoring, underscoring the advantage of anytime‑valid procedures when the monitoring schedule cannot be fixed in advance.
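The mechanism behind the naive method's inflation is easy to reproduce. The Monte Carlo sketch below (illustrative Python; the simulation sizes and per-arm sample size are my assumptions, not the paper's exact settings) applies the fixed-sample 5% z-test at 20 equally spaced looks under the null and records how often at least one look rejects.

```python
import numpy as np

rng = np.random.default_rng(3)

# Why naive repeated testing breaks: simulate null trials
# (p_treat = p_ctrl = 0.5), take 20 equally spaced looks, and reject
# if the pooled two-proportion z statistic ever exceeds the
# fixed-sample two-sided 5% critical value (1.96).
n_sims, n_per_arm, n_looks = 2000, 400, 20
looks = np.linspace(n_per_arm // n_looks, n_per_arm, n_looks, dtype=int)

xt = rng.binomial(1, 0.5, (n_sims, n_per_arm))
xc = rng.binomial(1, 0.5, (n_sims, n_per_arm))
st, sc = xt.cumsum(axis=1), xc.cumsum(axis=1)

rejected = np.zeros(n_sims, dtype=bool)
for n in looks:
    pt, pc = st[:, n - 1] / n, sc[:, n - 1] / n
    pooled = (st[:, n - 1] + sc[:, n - 1]) / (2 * n)
    se = np.sqrt(np.clip(2 * pooled * (1 - pooled) / n, 1e-12, None))
    rejected |= np.abs(pt - pc) / se > 1.96   # naive look-by-look test
print("naive Type I error over 20 looks:", rejected.mean())
```

The familywise rejection rate lands well above the nominal 5%, which is exactly the pathology that group-sequential boundaries, calibrated Bayesian thresholds, and e-processes each repair in their own way.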
Regulatory considerations are discussed in the context of the January 2026 FDA draft guidance on Bayesian methodology. The authors argue that e‑values are especially suitable for platform trials, pandemic‑driven rapid‑response studies, and any setting where interim looks are irregular or unplanned. Conversely, for confirmatory trials with pre‑specified information fractions and standard endpoints, traditional group‑sequential designs remain the preferred option due to their higher power and established regulatory familiarity.
The paper culminates in the release of the open‑source evalinger R package and an interactive web dashboard. These tools provide real‑time computation of e‑processes, visualization of confidence sequences, futility assessment utilities, multiplicity adjustments for platform trials, and side‑by‑side comparison with group‑sequential and Bayesian monitoring. By integrating these resources with the regulatory framework, the authors offer a complete, end‑to‑end solution for trialists seeking robust, flexible, and auditable interim monitoring.
In summary, the work demonstrates that e‑values and e‑processes constitute a powerful, flexible alternative to classical group‑sequential and Bayesian adaptive methods, delivering anytime‑valid error control without sacrificing interpretability. The detailed theoretical exposition, extensive simulation evidence, practical guidance, and publicly available software together make this paper a valuable reference for statisticians and clinicians designing and operating modern adaptive clinical trials.