Benchmarking Uncertainty Quantification of Plug-and-Play Diffusion Priors for Inverse Problems Solving
Plug-and-play diffusion priors (PnPDP) have become a powerful paradigm for solving inverse problems in scientific and engineering domains. Yet current evaluations of reconstruction quality emphasize point-estimate accuracy metrics computed on a single sample, which reflect neither the stochastic nature of PnPDP solvers nor the intrinsic uncertainty of inverse problems, both of which are critical for scientific tasks. This creates a fundamental mismatch: in inverse problems, the desired output is typically a posterior distribution, and most PnPDP solvers induce a distribution over reconstructions, yet existing benchmarks evaluate only a single reconstruction, ignoring distributional characteristics such as uncertainty. To address this gap, we conduct a systematic study benchmarking the uncertainty quantification (UQ) of existing diffusion inverse solvers. Specifically, we design a rigorous toy-model simulation to evaluate the uncertainty behavior of various PnPDP solvers and propose a UQ-driven categorization. Through extensive experiments on toy simulations and diverse real-world scientific inverse problems, we observe uncertainty behaviors consistent with our taxonomy and theoretical justification, providing new insights for evaluating and understanding the uncertainty of PnPDP solvers.
💡 Research Summary
This paper addresses a critical gap in the evaluation of Plug‑and‑Play Diffusion Priors (PnPDP) for solving inverse problems. While recent PnPDP methods have demonstrated impressive reconstruction quality, existing benchmarks focus almost exclusively on point‑estimate metrics such as PSNR or SSIM obtained from a single sample. This approach ignores two fundamental aspects of inverse problems: (1) the true objective is to recover the posterior distribution p(x|y), which often contains multiple plausible solutions, and (2) PnPDP solvers are inherently stochastic, producing a distribution of reconstructions rather than a deterministic output. Consequently, a high PSNR does not guarantee that a method faithfully captures posterior uncertainty, leading to what the authors call an “Accuracy Trap.”
To remedy this, the authors propose a systematic study of uncertainty quantification (UQ) for PnPDP solvers. Their contributions are threefold. First, they introduce a UQ‑driven taxonomy that groups existing solvers into three families based on their ability to approximate the Bayesian posterior: (i) posterior‑targeting solvers (e.g., MCG‑Diff, FPS‑SMC, PnP‑DM) that have theoretical convergence guarantees; (ii) heuristic solvers (e.g., DPS, DAPS) that inject stochasticity through algorithmic heuristics but lack provable posterior consistency; and (iii) MAP‑like solvers (e.g., DDRM, DiffPIR) that collapse to one or a few modes and provide little uncertainty information. This categorization complements the algorithm‑structure taxonomy of InverseBench and offers a user‑centric view of reliability.
Second, the paper defines a practical UQ metric: the pixel‑wise empirical variance Var(j) = 1/(K−1) · Σ_{k=1}^{K} (x^{(k)}(j) − x̄(j))², computed from K independent runs of a solver. By running each method multiple times with different random seeds, the variance map approximates the solver‑induced posterior variance, which reflects both aleatoric uncertainty (measurement noise) and epistemic uncertainty (information loss due to ill‑posed operators). The authors emphasize that larger variance is not inherently better; instead, the variance pattern should align with the true posterior’s structure.
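The variance map above is straightforward to compute. The following sketch (function names and shapes are illustrative, not from the paper's released code) shows the unbiased pixel-wise estimator over K independent solver runs:

```python
import numpy as np

def empirical_variance_map(samples):
    """Pixel-wise unbiased empirical variance over K independent runs.

    samples: array of shape (K, H, W), one reconstruction per random seed.
    Returns an (H, W) map: Var(j) = 1/(K-1) * sum_k (x_k(j) - xbar(j))^2.
    """
    samples = np.asarray(samples, dtype=np.float64)
    K = samples.shape[0]
    xbar = samples.mean(axis=0)                      # pixel-wise sample mean
    return ((samples - xbar) ** 2).sum(axis=0) / (K - 1)

# Toy usage: 8 hypothetical runs scattered around a common reconstruction.
rng = np.random.default_rng(0)
x_hat = rng.random((16, 16))
runs = x_hat[None] + 0.1 * rng.standard_normal((8, 16, 16))
var_map = empirical_variance_map(runs)
```

This is equivalent to `np.var(runs, axis=0, ddof=1)`; the `K − 1` denominator gives the unbiased estimator, which matters when the number of runs K is small.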
Third, they validate the metric through a rigorously designed toy simulation where the ground‑truth posterior is analytically known. Two regimes are examined: (a) full observation (A=I) to test aleatoric calibration, and (b) a decoupled forward operator that induces strong epistemic uncertainty. Results show that posterior‑targeting methods recover the true variance pattern and achieve high correlation with the analytical posterior, heuristic methods display mixed over‑/under‑estimation, and MAP‑like methods produce near‑zero variance, effectively hiding uncertainty.
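The value of an analytically known posterior is that solver variance maps can be checked exactly. As a hedged sketch (a generic linear-Gaussian model, not necessarily the paper's exact toy construction), the two regimes can be illustrated with conjugate Gaussian algebra, where the posterior covariance is (I + AᵀA/σ²)⁻¹:

```python
import numpy as np

# Linear-Gaussian toy model with a closed-form posterior:
#   prior:       x ~ N(0, I_d)
#   likelihood:  y = A x + n,  n ~ N(0, sigma^2 I)
#   posterior covariance: (I + A^T A / sigma^2)^{-1}
def posterior_cov(A, sigma):
    d = A.shape[1]
    return np.linalg.inv(np.eye(d) + A.T @ A / sigma**2)

d, sigma = 4, 0.5

# Regime (a): full observation, A = I. Uncertainty is purely aleatoric:
# every coordinate's variance shrinks uniformly below the prior's 1.0.
cov_full = posterior_cov(np.eye(d), sigma)

# Regime (b): a rank-deficient operator observing only the first two
# coordinates. The unobserved coordinates retain the full prior variance,
# i.e., epistemic uncertainty from information loss in the operator.
A_partial = np.zeros((2, d))
A_partial[0, 0] = A_partial[1, 1] = 1.0
cov_partial = posterior_cov(A_partial, sigma)

print(np.diag(cov_full))     # uniformly reduced below 1
print(np.diag(cov_partial))  # observed coords reduced, unobserved stay at 1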
Beyond synthetic experiments, the authors evaluate a diverse set of real scientific inverse problems—including sparse‑view CT, accelerated MRI, astronomical imaging, and oceanographic wavefield inversion. For each task they report both accuracy scores and variance maps. Consistently, posterior‑targeting solvers provide calibrated uncertainty that highlights regions of high measurement ambiguity, while heuristic solvers either exaggerate uncertainty in structured areas or miss subtle ambiguities. MAP‑like solvers, despite achieving competitive PSNR, generate overly confident reconstructions that could mislead downstream risk‑sensitive decisions.
The paper argues that reliable UQ is not an optional add‑on but a prerequisite for deploying PnPDP methods in safety‑critical domains such as medical diagnosis or geophysical exploration. It also points out that increasing computational budget (more particles, longer MCMC chains) improves posterior fidelity for posterior‑targeting methods, whereas heuristic and MAP‑like methods show limited gains.
Finally, the authors release code, pretrained diffusion priors, and benchmark datasets to promote reproducibility. Their UQ‑driven taxonomy and variance‑based evaluation framework lay the groundwork for standardized assessment of future diffusion‑based inverse solvers, encouraging the community to prioritize calibrated posterior sampling alongside traditional reconstruction accuracy.