Time-uniform conformal and PAC prediction
Given that machine learning algorithms are increasingly deployed to aid high-stakes decision-making, uncertainty quantification methods that wrap around black-box models, such as conformal prediction, have received much attention in recent years. In sequential settings, where data are observed or generated in a streaming fashion, traditional conformal methods provide no guarantee unless the sample size is fixed in advance. More importantly, traditional conformal methods cannot cope with sequentially updated predictions. We therefore develop an extension of conformal prediction and the related probably approximately correct (PAC) prediction framework to sequential settings where the number of data points is not fixed in advance. The resulting prediction sets are anytime-valid in that their expected coverage is at the required level at any time chosen by the analyst, even if this choice depends on the data. We present theoretical guarantees for our proposed methods and demonstrate their validity and utility on simulated and real datasets.
💡 Research Summary
The paper addresses a fundamental limitation of traditional conformal prediction and probably approximately correct (PAC) prediction: both frameworks assume a fixed sample size n that is known in advance. In many modern applications—online services, medical monitoring, autonomous systems—data arrive sequentially, and analysts may stop collecting data at any time based on the information observed so far. Under such “anytime” or “stopping‑time” scenarios, the classical guarantees (coverage ≥ 1‑α for a pre‑specified n) no longer hold, and the methods cannot be updated as new observations become available.
To overcome this, the authors introduce two new objectives: time‑uniform conformal (TUC) prediction and time‑uniform PAC (TUPAC) prediction. The TUC goal requires that for any random stopping time \(T\), the prediction set \(C_{T,\alpha}\) constructed from the first \(T\) observations satisfies \(P(Z \in C_{T,\alpha}) \ge 1-\alpha\). Equivalently, the sequence of fixed‑time prediction sets \(\{C_{t,\alpha}\}_{t\ge 1}\) must have an expected minimum coverage of at least \(1-\alpha\) (Proposition 1). The TUPAC goal adds a second confidence level \(\delta\), demanding that the event "conditional coverage \(\ge 1-\alpha\)" occurs with probability at least \(1-\delta\), again uniformly over all possible stopping times.
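In symbols, the two goals can be stated roughly as follows (notation partly ours; \(\mathcal{D}_T\) denotes the data observed up to the stopping time \(T\)):

```latex
% (TUC) marginal coverage at any stopping time T:
\[
  P\big(Z \in C_{T,\alpha}\big) \;\ge\; 1-\alpha
  \quad \text{for every stopping time } T,
\]
% (TUPAC) conditional coverage holds with high probability, uniformly over T:
\[
  P\Big( P\big(Z \in C_{T,\alpha} \,\big|\, \mathcal{D}_T\big) \;\ge\; 1-\alpha \Big)
  \;\ge\; 1-\delta
  \quad \text{for every stopping time } T.
\]
```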
The theoretical backbone relies on a time‑uniform version of the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality and on the confidence sequences (CS) of Howard and Ramdas (2022). By coupling these tools with the exchangeability of IID data, the authors prove that their algorithms achieve the desired guarantees without any assumptions on the distribution of T. Propositions 2 and 3 formalize the TUC and TUPAC guarantees, respectively.
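For intuition, here is the fixed-sample DKW inequality together with a crude time-uniform variant obtained by a union bound over time (our notation; the paper's actual time-uniform bound is sharper):

```latex
% Fixed-sample DKW: for the empirical CDF \hat F_t of t IID scores,
\[
  P\Big(\sup_{x}\,\big|\hat F_t(x) - F(x)\big| > \varepsilon\Big)
  \;\le\; 2e^{-2t\varepsilon^2}.
\]
% Union bound over t: allocate \delta_t = 6\delta/(\pi^2 t^2)
% (so that \sum_{t\ge 1} \delta_t = \delta) and set
\[
  \varepsilon_t \;=\; \sqrt{\tfrac{1}{2t}\,\log\tfrac{2}{\delta_t}},
  \qquad
  P\Big(\exists\, t \ge 1:\ \sup_x \big|\hat F_t(x) - F(x)\big| > \varepsilon_t\Big)
  \;\le\; \delta.
\]
```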
Algorithmically, the paper first presents a “split” construction that mirrors classic split‑conformal prediction but modifies the quantile selection step. Instead of using the empirical quantile of the non‑conformity scores at each time t, the method selects a conservative upper bound derived from the time‑uniform DKW inequality. This ensures that the quantile is large enough to protect against the worst‑case time point, thereby delivering simultaneous coverage for all t. The approach is memory‑efficient: only a fixed‑size hold‑out set (or an initial batch) is needed to compute the non‑conformity scores, and the prediction set can be updated online with O(1) additional storage per new observation.
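A minimal sketch of this quantile-inflation step, using the crude union-bound radius above as a stand-in for the paper's sharper time-uniform DKW bound (function names and the allocation \(\delta_t = 6\delta/(\pi^2 t^2)\) are ours):

```python
import math
import numpy as np

def time_uniform_dkw_radius(t: int, delta: float) -> float:
    # Crude time-uniform DKW radius: allocate delta_t = 6*delta / (pi^2 * t^2)
    # to time t (these sum to delta over t >= 1), then apply fixed-sample DKW.
    # The paper's bound is tighter; this is only an illustrative surrogate.
    delta_t = delta * 6.0 / (math.pi ** 2 * t ** 2)
    return math.sqrt(math.log(2.0 / delta_t) / (2.0 * t))

def split_tuc_quantile(scores, alpha: float, delta: float) -> float:
    # Conservative threshold: inflate the quantile level by the DKW radius so
    # the empirical quantile upper-bounds the true (1 - alpha)-quantile
    # simultaneously over time.
    t = len(scores)
    eps = time_uniform_dkw_radius(t, delta)
    level = min(1.0, 1.0 - alpha + eps)
    return float(np.quantile(scores, level, method="higher"))
```

The inflated threshold is never smaller than the plain empirical \(1-\alpha\) quantile, and the inflation \(\varepsilon_t\) shrinks as more scores accumulate.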
The authors also extend the method to a fully online setting where the non‑conformity transformation itself can evolve as more data are collected, eliminating the need for any pre‑computed hold‑out data. In this regime, the algorithm maintains a running estimate of the score distribution and updates the quantile bound on the fly, still guaranteeing TUC/TUPAC properties.
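One way such a running threshold might be maintained is sketched below; the class name, the sorted-list bookkeeping, and the union-bound radius are our assumptions, not the paper's exact algorithm:

```python
import bisect
import math

class OnlineTUCThreshold:
    """Hypothetical online wrapper: keeps a sorted list of observed
    non-conformity scores and returns a conservative threshold at each step."""

    def __init__(self, alpha: float, delta: float):
        self.alpha = alpha
        self.delta = delta
        self.scores = []  # kept sorted

    def update(self, score: float) -> None:
        # O(t) insertion into the sorted score list (a sketch; a tree or
        # quantile sketch would make this cheaper).
        bisect.insort(self.scores, score)

    def threshold(self) -> float:
        # Same quantile inflation as in the split construction, recomputed
        # on the fly from the current score list.
        t = len(self.scores)
        delta_t = self.delta * 6.0 / (math.pi ** 2 * t ** 2)
        eps = math.sqrt(math.log(2.0 / delta_t) / (2.0 * t))
        level = min(1.0, 1.0 - self.alpha + eps)
        k = min(t - 1, math.ceil(level * t) - 1)  # conservative order statistic
        return self.scores[max(k, 0)]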
Empirical evaluation consists of two parts. First, a synthetic experiment with a one‑dimensional standard normal stream demonstrates that ordinary split‑conformal intervals achieve the nominal coverage only at a fixed n, while their minimum coverage across time falls well below 1‑α. In contrast, the proposed split‑TUC intervals maintain a minimum coverage that exceeds the target for all α values tested, confirming the theoretical claim. The split‑TUPAC intervals and CS‑based intervals also satisfy the uniform guarantees, though the latter are more conservative because they protect simultaneously over both time and α.
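The failure mode described above can be illustrated with a toy Monte Carlo stand-in for the paper's synthetic experiment (details such as the checkpoint grid are assumed): for a N(0,1) stream with absolute-value scores, the probability that a test point is covered by the plain split-conformal interval at *every* checkpoint falls below the nominal level.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, reps = 0.1, 200
checkpoints = range(5, 500, 10)  # times at which the interval is inspected

always_covered = 0
for _ in range(reps):
    scores = np.abs(rng.standard_normal(500))  # calibration scores |Z_i|
    test = abs(rng.standard_normal())          # score of one test point
    # Plain split-conformal threshold at each checkpoint, no inflation:
    if all(test <= np.quantile(scores[:t], 1 - alpha, method="higher")
           for t in checkpoints):
        always_covered += 1

# Fraction covered at *all* checkpoints: below 1 - alpha, since an
# adversarial stopping time can exploit any single under-covering moment.
frac_always_covered = always_covered / reps
```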
Second, real‑world case studies (medical risk prediction, credit scoring, and autonomous‑driving perception) illustrate the practical benefits. The online TUC procedure can be wrapped around any online learning algorithm (e.g., stochastic gradient descent) and automatically adapts to concept drift. When the underlying data distribution shifts abruptly, the coverage of batch‑trained conformal sets collapses, whereas the TUC sets quickly recover, preserving the 1‑α guarantee.
Finally, the paper proves an asymptotic optimality result: as the number of observations grows, the width of the TUC/TUPAC intervals converges to that of the oracle interval that knows the true data distribution, showing that the method is not only valid but also efficient.
In summary, this work extends conformal and PAC prediction to the sequential, anytime‑valid regime by introducing time‑uniform objectives, providing rigorous guarantees via time‑uniform DKW bounds and confidence sequences, and delivering practical, online‑compatible algorithms that are both memory‑efficient and asymptotically optimal. The contributions are highly relevant for any application where data are collected and decisions are made in a streaming fashion, and they open several avenues for future research, including non‑IID extensions, multi‑test corrections, and applications to high‑dimensional or non‑tabular data.