Quantifying Epistemic Predictive Uncertainty in Conformal Prediction
We study the problem of quantifying epistemic predictive uncertainty (EPU) – that is, uncertainty faced at prediction time due to the existence of multiple plausible predictive models – within the framework of conformal prediction (CP). To expose the implicit model multiplicity underlying CP, we build on recent results showing that, under a mild assumption, any full CP procedure induces a set of closed and convex predictive distributions, commonly referred to as a credal set. Importantly, the conformal prediction region (CPR) coincides exactly with the set of labels to which all distributions in the induced credal set assign probability at least $1-α$. As our first contribution, we prove that this characterisation also holds in split CP. Building on this connection, we then propose a computationally efficient and analytically tractable uncertainty measure, based on \emph{Maximum Mean Imprecision}, to quantify the EPU by measuring the degree of conflicting information within the induced credal set. Experiments on active learning and selective classification demonstrate that the quantified EPU provides substantially more informative and fine-grained uncertainty assessments than reliance on CPR size alone. More broadly, this work highlights the potential of CP to serve as a principled basis for decision-making under epistemic uncertainty.
💡 Research Summary
The paper tackles the largely overlooked problem of quantifying epistemic predictive uncertainty (EPU) within the conformal prediction (CP) framework. While CP is celebrated for its finite‑sample coverage guarantees and has been widely adopted, practitioners have traditionally relied on the size of the conformal prediction region (CPR) as a proxy for uncertainty. This size, however, is a coarse summary of total predictive uncertainty and fails to isolate the uncertainty arising from the existence of multiple plausible predictive models—i.e., epistemic uncertainty.
The authors build on recent work that showed a full (transductive) CP procedure implicitly defines a closed and convex set of predictive distributions, known as a credal set. They first extend this result to the more commonly used split (inductive) CP, proving that under a mild “consonance” assumption (the existence of at least one label with conformal p‑value equal to one) split CP also induces a credal set. Consonance can be enforced by a simple re‑scaling of the largest p‑value, which preserves the coverage guarantee and even yields a stronger Type‑II validity property.
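The summary does not spell out the exact re-scaling transform, but one minimal way to obtain consonance, sketched below as an illustrative assumption, is to lift the largest p-value to exactly one. Since raising a p-value can only enlarge prediction regions, the marginal coverage guarantee is unaffected:

```python
def enforce_consonance(p_values):
    """Enforce consonance: ensure at least one label has p-value exactly 1.

    Illustrative sketch, not the paper's exact transform: the maximal
    p-value is re-scaled (lifted) to 1 while all other p-values are kept
    as-is. Raising a p-value can only enlarge the conformal prediction
    region, so marginal coverage is preserved.
    """
    p = list(p_values)
    i = max(range(len(p)), key=p.__getitem__)  # index of the largest p-value
    p[i] = 1.0
    return p
```

After this step the p-value vector can be read as a possibility distribution over labels, which is what the credal-set construction below requires.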
Given a consonant transducer πₓ(·), they define an upper probability (plausibility measure) Pₓ(A)=sup_{y∈A}πₓ(y). The core of this upper probability, M(Pₓ), is exactly the implicit predictive credal set of the CP procedure, and the CPR C_α(x)= {y | πₓ(y)>α} coincides with the imprecise highest‑density region of M(Pₓ). Thus, the CPR can be interpreted as the set of outcomes to which every distribution in the credal set assigns probability at least 1‑α.
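With the p-value vector treated as the possibility distribution πₓ, both objects above are a few lines of code; this is a minimal sketch following the definitions in the text (the example p-values are made up):

```python
def upper_probability(pi, event):
    """Plausibility Pₓ(A) = sup_{y ∈ A} πₓ(y) for an event A (set of labels),
    where pi is the vector of conformal p-values of a consonant transducer."""
    return max(pi[y] for y in event)

def conformal_region(pi, alpha):
    """CPR C_α(x) = {y : πₓ(y) > α}, read off directly from the p-values."""
    return {y for y, p in enumerate(pi) if p > alpha}
```

For example, with p-values `[1.0, 0.7, 0.2]` and α = 0.5, the CPR is `{0, 1}`, and the plausibility of the event `{1, 2}` is 0.7.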
To quantify the epistemic uncertainty encoded in this credal set, the paper adapts the Maximum Mean Imprecision (MMI) functional, originally proposed for general imprecise‑probability models. MMI measures the largest discrepancy between the optimistic (upper) and pessimistic (lower) expectations over a prescribed class of test functions, using Choquet integration. For classification, choosing the class of indicator functions reduces MMI to the total variation distance between the upper and lower probabilities, i.e., the maximal absolute difference |P(A)−P̲(A)| over all events A.
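The total-variation form can be checked by brute force over events. Under consonance the maximising event separates the two largest possibility values, which (by my reading of the definitions above, an assumption of this sketch rather than a formula quoted from the paper) collapses the computation to π₍₁₎ + π₍₂₎ − 1, i.e. the second-largest p-value when the largest equals one:

```python
from itertools import combinations

def mmi_brute_force(pi):
    """Total-variation form of MMI: max over proper nonempty events A of
    P̄(A) − P̲(A), with plausibility P̄(A) = max_{y∈A} π(y) and conjugate
    lower probability (necessity) P̲(A) = 1 − max_{y∉A} π(y)."""
    labels = range(len(pi))
    best = 0.0
    for r in range(1, len(pi)):            # proper nonempty subsets only
        for A in combinations(labels, r):
            comp = [y for y in labels if y not in A]
            diff = max(pi[y] for y in A) - (1 - max(pi[y] for y in comp))
            best = max(best, diff)
    return best

def mmi_closed_form(pi):
    """Conjectured linear-time shortcut: the optimal event puts the two
    largest possibility values on opposite sides, giving π₍₁₎ + π₍₂₎ − 1
    (= the second-largest p-value when π₍₁₎ = 1 under consonance)."""
    top2 = sorted(pi, reverse=True)[:2]
    return top2[0] + top2[1] - 1.0
```

On `[1.0, 0.7, 0.2]` both routes give 0.7, illustrating why the measure is linear in the number of classes despite being defined as a maximum over exponentially many events.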
The resulting metric, dubbed MMI‑CP, can be computed in linear time with respect to the number of classes because it only requires the conformal p‑values for each label. Importantly, MMI‑CP captures the “conflict” among the plausible predictive models that underlie the CPR, offering a finer‑grained uncertainty signal than CPR size alone.
Empirical evaluation is performed on two downstream tasks. In active learning, selecting instances with high MMI‑CP leads to faster reduction of test error than selecting by CPR cardinality, demonstrating that the metric effectively identifies samples whose labels are most informative under model uncertainty. In selective classification (learning to reject), a rejection rule based on MMI‑CP achieves lower error at comparable coverage and substantially outperforms a baseline that rejects solely on large CPRs. The authors also present a synthetic example (Figure 1) in which two instances share identical CPRs but have markedly different conformal p‑value distributions; MMI‑CP correctly reflects the higher epistemic uncertainty of the more “skewed” instance.
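The selective-classification rule can be sketched as follows. Both the uncertainty score (the second-largest p-value, which under consonance tracks the conflict between the upper and lower probabilities discussed above) and the threshold `tau` are illustrative assumptions of this sketch, not values taken from the paper:

```python
def selective_predict(pi, tau=0.3):
    """Reject-option rule in the spirit of the paper's selective-
    classification experiment (score and threshold are illustrative):
    abstain when the epistemic-conflict score exceeds tau, otherwise
    predict the label with the highest conformal p-value."""
    ranked = sorted(range(len(pi)), key=lambda y: pi[y], reverse=True)
    if pi[ranked[1]] > tau:       # strong runner-up: conflicting models
        return None               # abstain / reject
    return ranked[0]              # confident: return the top label
```

For instance, `[1.0, 0.7, 0.2]` triggers abstention (a plausible model still favours label 1), while `[1.0, 0.1, 0.05]` yields a confident prediction of label 0, even though both instances could produce the same CPR at a loose α.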
Overall, the paper provides a solid theoretical bridge between conformal prediction and imprecise probability theory, extends the credal‑set interpretation to split CP, and introduces a computationally efficient, axiomatically consistent measure of epistemic uncertainty. By doing so, it positions CP as a principled foundation for decision‑making under model uncertainty, opening avenues for its use in safety‑critical domains, risk‑aware AI, and any application where distinguishing epistemic from aleatoric uncertainty is crucial. Future work may explore extensions to regression, alternative transducers, and integration with Bayesian model averaging for hybrid uncertainty quantification.