Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method
Large language models (LLMs) exhibit strong generative capabilities but remain vulnerable to confabulations: fluent yet unreliable outputs that vary arbitrarily even under identical prompts. Leveraging a quantum tensor-network pipeline, we propose a quantum-physics-inspired uncertainty quantification framework that accounts for the aleatoric uncertainty in token-sequence probabilities used for semantic-equivalence clustering of LLM generations. This offers a principled and interpretable scheme for hallucination detection. We further introduce an entropy-maximization strategy that prioritizes high-certainty, semantically coherent outputs and highlights entropy regions where LLM decisions are likely to be unreliable, offering practical guidance on when human oversight is warranted. We evaluate the robustness of our scheme under different generation lengths and quantization levels, dimensions overlooked in prior studies, and demonstrate that our approach remains reliable even in resource-constrained deployments. A total of 116 experiments on TriviaQA, NQ, SVAMP, and SQuAD across multiple architectures, including Mistral-7B, Mistral-7B-instruct, Falcon-rw-1b, LLaMA-3.2-1b, LLaMA-2-13b-chat, LLaMA-2-7b-chat, LLaMA-2-13b, and LLaMA-2-7b, show consistent improvements in AUROC and AURAC over state-of-the-art baselines.
💡 Research Summary
The paper tackles the persistent problem of hallucinations in large language models (LLMs)—outputs that are fluent yet factually incorrect or fabricated, often varying arbitrarily even when the same prompt is repeated. Existing detection methods rely heavily on supervised classifiers, complex semantic‑syntactic analyses, or Bayesian deep learning, all of which either incur high latency, struggle to capture semantic instability, or fail to reflect the aleatoric uncertainty inherent in token‑level probability distributions.
The authors propose a physics‑inspired uncertainty quantification (UQ) framework that treats the token‑sequence probability distribution P(s|y) as a quantum wavefunction. By embedding this distribution into a reproducing kernel Hilbert space (RKHS) using a Gaussian kernel, they obtain a kernel mean embedding (KME) ψ_y(x). This KME is then interpreted as an eigenmode of a quantum tensor network (QTN) Hamiltonian H. Applying first‑order perturbation theory to H yields corrections to eigenvalues and eigenfunctions; the magnitude of these corrections quantifies local sensitivity of the token probabilities to infinitesimal perturbations. Concretely, the authors compute mode‑wise “spectrograms” V^(1)_m(x) based on the Laplacian of the perturbed eigenfunctions and aggregate them into an uncertainty score UQ(p_s) (Equation 6). Large UQ indicates unstable, high‑variance probability regions, while small UQ signals locally stable predictions.
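The curvature-based sensitivity idea is easiest to see on a one-dimensional toy distribution. The sketch below is an illustrative simplification, not the paper's QTN construction: the function names (`gaussian_kme`, `uq_score`), the evaluation grid, the fixed bandwidth, and the use of a squared discrete Laplacian of the embedding as a stand-in for the perturbation spectrograms V^(1)_m(x) are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kme(samples, grid, bandwidth=0.1):
    """Kernel mean embedding of an empirical 1-D distribution:
    psi(x) = (1/N) * sum_i exp(-(x - x_i)^2 / (2 h^2)), evaluated on `grid`."""
    diffs = grid[:, None] - samples[None, :]
    k = np.exp(-diffs**2 / (2.0 * bandwidth**2))
    return k.mean(axis=1)

def uq_score(samples, grid, bandwidth=0.1):
    """Toy sensitivity score: integrate the squared discrete Laplacian of the
    embedding. This is only a curvature proxy for the paper's perturbation-
    theoretic corrections, not the actual QTN quantity."""
    psi = gaussian_kme(samples, grid, bandwidth)
    dx = grid[1] - grid[0]
    lap = np.gradient(np.gradient(psi, dx), dx)  # second derivative of psi
    return float((lap**2).sum() * dx)            # crude quadrature of lap^2
```

A real implementation would build the embedding from token-sequence probabilities P(s|y) and derive the score from the perturbed eigenfunctions of the Hamiltonian; this sketch only conveys the "instability of the embedding drives the score" intuition.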
Having obtained a per‑token uncertainty estimate, the method proceeds to adjust the original token probabilities via a maximum‑entropy objective. The adjusted probabilities p*_s are derived by maximizing the order‑2 Rényi entropy (−log ∑ p²) while penalizing the Kullback‑Leibler divergence between p*_s and the original p_s, weighted inversely by the uncertainty UQ(p_s) (Equation 7). The hyperparameter λ balances entropy expansion against fidelity to the model’s raw logits. In high‑uncertainty zones the KL term is down‑weighted, allowing the distribution to become more uniform (higher entropy); in low‑uncertainty zones the KL term dominates, preserving the model’s confidence.
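The trade-off in Equation 7 can be sketched as gradient ascent over the probability simplex. This is a toy reconstruction under stated assumptions: the softmax parameterization, the damped step size, and the solver itself (`adjust_probs`) are illustrative choices, not the paper's optimization procedure; only the objective shape (Rényi-2 entropy minus a λ/UQ-weighted KL anchor) follows the text.

```python
import numpy as np

def adjust_probs(p, uq, lam=1.0, steps=500, lr=0.1):
    """Maximize J(q) = -log(sum q^2) - (lam/uq) * KL(q || p) over the simplex.
    High uq weakens the KL anchor so q flattens; low uq keeps q close to p."""
    p = np.asarray(p, dtype=float)
    w = lam / uq                  # fidelity weight: small when uncertainty is high
    step = lr / (1.0 + w)         # damp the step so a strong KL anchor stays stable
    z = np.log(p)                 # optimize logits; softmax keeps q on the simplex
    for _ in range(steps):
        q = np.exp(z - z.max()); q /= q.sum()
        # dJ/dq: Rényi-2 entropy gradient minus weighted KL gradient
        grad_q = -2.0 * q / np.sum(q**2) - w * (np.log(q / p) + 1.0)
        # pull the gradient back through the softmax Jacobian
        grad_z = q * (grad_q - np.dot(q, grad_q))
        z += step * grad_z
    q = np.exp(z - z.max()); q /= q.sum()
    return q
```

With a very large UQ the anchor vanishes and the result approaches the uniform distribution; with a very small UQ the KL term dominates and the output stays near the model's original probabilities, matching the behavior described above.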
The calibrated probabilities are then fed into a semantic clustering stage. Output sequences generated from repeated prompts are grouped using bidirectional entailment scores from a DeBERTa model. For each cluster c_j, the authors compute a cluster probability p_cj as the sum of adjusted token probabilities over all sequences in that cluster. The semantic Rényi entropy SE_R(y) = −log ∑_j p_cj² serves as a global measure of semantic diversity; higher SE_R correlates with a higher likelihood of hallucination.
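The clustering and entropy steps above can be sketched in a few lines. The `entails` predicate stands in for the DeBERTa bidirectional-entailment check (stubbed here with a trivial string comparison), and the greedy single-representative grouping is a simplifying assumption, not necessarily the authors' exact clustering rule; the entropy formula itself follows the text.

```python
import numpy as np

def cluster_by_entailment(answers, entails):
    """Greedy clustering by bidirectional entailment: two answers share a
    cluster iff each entails the other. `entails(a, b)` would be backed by
    an NLI model (e.g. DeBERTa) in practice; here it is user-supplied."""
    clusters = []
    for a in answers:
        for c in clusters:
            rep = c[0]                         # compare against one representative
            if entails(a, rep) and entails(rep, a):
                c.append(a)
                break
        else:
            clusters.append([a])
    return clusters

def semantic_renyi_entropy(cluster_probs):
    """SE_R(y) = -log sum_j p_cj^2; higher values mean more semantic spread
    across clusters, i.e. a higher likelihood of hallucination."""
    p = np.asarray(cluster_probs, dtype=float)
    p = p / p.sum()                            # normalize cluster masses
    return float(-np.log(np.sum(p**2)))
```

A single dominant cluster gives SE_R near 0, while probability mass split evenly over k clusters gives SE_R = log k, the maximum for that cluster count.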
Empirically, the framework is evaluated on four benchmark QA/reading‑comprehension datasets (TriviaQA, Natural Questions, SVAMP, SQuAD) across eight LLM architectures, including Mistral‑7B, Falcon‑rw‑1b, and various LLaMA variants. A total of 116 experimental configurations are reported. The proposed method consistently outperforms state‑of‑the‑art baselines such as Semantic Entropy, Kernelized Likelihood Entropy, Semantic Nearest‑Neighbour Entropy, and Semantic Density, achieving average AUROC and AURAC improvements of 4–7 percentage points. Importantly, the authors test robustness to model quantization (4‑bit, 8‑bit) and to generation length (from short phrases to full sentences). Their approach maintains performance under these constraints, while computational overhead is reduced to a single perturbation pass plus an entropy‑maximization step—approximately 30 % of the cost of sampling‑based uncertainty methods.
Key contributions and strengths include: (1) a physically interpretable uncertainty metric derived from quantum perturbation theory, offering a clear link between wavefunction instability and model confidence; (2) a deterministic, one‑shot UQ procedure that avoids costly Monte‑Carlo sampling; (3) integration of uncertainty into a principled entropy‑maximization scheme that adaptively smooths probabilities based on local confidence; (4) demonstrated robustness to quantization and generation length, making the method suitable for edge or resource‑constrained deployments.
Limitations are acknowledged: the construction of the QTN Hamiltonian is heuristic and may not capture all higher‑order token dependencies; the method currently relies on a single Gaussian kernel and first‑order perturbations, which could be extended to richer kernels or higher‑order corrections. Future work could explore multi‑mode perturbations, non‑Gaussian kernels, and coupling with external knowledge graphs to further enhance hallucination detection, especially for domain‑specific or multi‑modal LLMs.
Overall, the paper introduces a novel, theoretically grounded, and practically efficient framework for quantifying and mitigating hallucinations in LLMs, bridging quantum‑inspired uncertainty modeling with semantic entropy‑based detection.