CAOS: Conformal Aggregation of One-Shot Predictors


One-shot prediction enables rapid adaptation of pretrained foundation models to new tasks using only one labeled example, but lacks principled uncertainty quantification. While conformal prediction provides finite-sample coverage guarantees, standard split conformal methods are inefficient in the one-shot setting due to data splitting and reliance on a single predictor. We propose Conformal Aggregation of One-Shot Predictors (CAOS), a conformal framework that adaptively aggregates multiple one-shot predictors and uses a leave-one-out calibration scheme to fully exploit scarce labeled data. Despite violating classical exchangeability assumptions, we prove that CAOS achieves valid marginal coverage using a monotonicity-based argument. Experiments on one-shot facial landmarking and RAFT text classification tasks show that CAOS produces substantially smaller prediction sets than split conformal baselines while maintaining reliable coverage.


💡 Research Summary

The paper tackles uncertainty quantification for one‑shot prediction, where each labeled example induces its own predictor (π_i) from a large pretrained foundation model. Standard split‑conformal methods are statistically inefficient in this regime because they require a disjoint calibration set, and naïvely aggregating or selecting among many one‑shot predictors breaks the exchangeability assumption that underlies conformal guarantees.

CAOS (Conformal Aggregation of One‑Shot Predictors) resolves these issues with two key mechanisms. First, it aggregates non‑conformity scores across all reference examples but only sums the k smallest scores for each candidate label y, a “k‑min aggregation” that emphasizes the most informative references while discarding noisy ones. Second, it employs a leave‑one‑out (LOO) calibration scheme: for each labeled pair (X_i, Y_i) the aggregation is computed using the remaining n − 1 references, producing n calibration scores S_i^{CAOS}. The empirical (1‑α)(1 + 1/n) quantile of these scores defines a threshold \hat q^{CAOS}. A new test input X_{n+1} receives a prediction set C^{CAOS}(X_{n+1}) consisting of all labels whose aggregated score does not exceed \hat q^{CAOS}.
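The two mechanisms above can be sketched in a few lines of NumPy. This is an illustrative reconstruction from the description in this summary, not the authors' code; the function names (`kmin_aggregate`, `caos_threshold`, `caos_prediction_set`) and the score-matrix layout are assumptions.

```python
import numpy as np

def kmin_aggregate(scores, k):
    """k-min aggregation: sum the k smallest non-conformity scores
    across the reference predictors for one (input, label) pair."""
    return np.sort(scores)[:k].sum()

def caos_threshold(score_matrix, k, alpha):
    """Leave-one-out calibration threshold q_hat.

    score_matrix[i, j] is the non-conformity score that the one-shot
    predictor pi_j assigns to the labeled pair (X_i, Y_i).  The LOO
    scheme drops predictor pi_i when scoring its own reference pair.
    """
    n = score_matrix.shape[0]
    loo_scores = np.empty(n)
    for i in range(n):
        others = np.delete(score_matrix[i], i)  # leave out pi_i itself
        loo_scores[i] = kmin_aggregate(others, k)
    # Empirical (1 - alpha)(1 + 1/n) quantile of the n calibration scores.
    level = min(1.0, (1 - alpha) * (1 + 1 / n))
    return np.quantile(loo_scores, level, method="higher")

def caos_prediction_set(test_scores_by_label, k, q_hat):
    """All candidate labels whose aggregated score is at most q_hat.

    test_scores_by_label maps each label y to the array of n
    per-reference scores for the test input X_{n+1}.
    """
    return {y for y, s in test_scores_by_label.items()
            if kmin_aggregate(s, k) <= q_hat}
```

Note that only a sort and a partial sum are needed per candidate label, which is what keeps the method computationally cheap relative to full conformal prediction.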

Theoretically, the authors prove that despite the lack of exchangeability, CAOS attains finite‑sample marginal coverage. The proof hinges on two observations: (i) the aggregated score is monotone with respect to the reference set, and (ii) CAOS prediction sets contain those of a full‑conformal variant that does satisfy exchangeability. By reduction, CAOS inherits the exact coverage guarantee of full conformal prediction while remaining computationally cheap (only k minima need to be summed).
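The reduction argument above can be written compactly. Using $C^{\mathrm{full}}$ as a label (my notation) for the prediction set of the exchangeable full-conformal variant:

```latex
% Full conformal prediction satisfies finite-sample marginal coverage
% under exchangeability:
\Pr\!\left( Y_{n+1} \in C^{\mathrm{full}}(X_{n+1}) \right) \;\ge\; 1 - \alpha .
% Monotonicity of the aggregated score with respect to the reference
% set gives set containment:
C^{\mathrm{full}}(X_{n+1}) \;\subseteq\; C^{\mathrm{CAOS}}(X_{n+1}) ,
% so the CAOS set inherits the guarantee:
\Pr\!\left( Y_{n+1} \in C^{\mathrm{CAOS}}(X_{n+1}) \right) \;\ge\; 1 - \alpha .
```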

Empirically, the method is evaluated on two one‑shot tasks. In facial landmark transfer, a vision model’s patch embeddings are used; the non‑conformity score is 1 − cosine similarity between the test patch and the reference landmark patch. In one‑shot text classification, a large language model is prompted with a single demonstration, and the average negative log‑likelihood of each candidate label serves as the score. Across both domains, with n ≈ 10–20 labeled examples and k = 3, CAOS yields prediction sets that are 30–45% smaller than those from split‑conformal baselines while maintaining the target 95% coverage (actual coverage 93–96%). The advantage is especially pronounced when data are scarce, as split‑conformal becomes overly conservative.
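The two non-conformity scores described above are simple to state in code. This is a minimal sketch assuming dense embedding vectors and per-token log-probabilities are already available; the function names are mine, not the paper's.

```python
import numpy as np

def landmark_score(test_patch_emb, ref_landmark_emb):
    """Vision task: non-conformity = 1 - cosine similarity between the
    test patch embedding and the reference landmark patch embedding."""
    cos = np.dot(test_patch_emb, ref_landmark_emb) / (
        np.linalg.norm(test_patch_emb) * np.linalg.norm(ref_landmark_emb))
    return 1.0 - cos

def nll_score(label_token_logprobs):
    """Text task: non-conformity = average negative log-likelihood of a
    candidate label's tokens under the prompted language model."""
    return -np.mean(label_token_logprobs)
```

An identical embedding gives a vision score of 0, orthogonal embeddings give 1; for text, more probable labels (log-probs closer to 0) receive lower scores.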

In summary, CAOS provides a principled, data‑efficient conformal framework for aggregating multiple one‑shot predictors, preserving rigorous finite‑sample guarantees without sacrificing predictive sharpness. The work opens avenues for adaptive k selection, extensions to non‑categorical outputs, and online one‑shot scenarios.

