Dealing with Uncertainty in Contextual Anomaly Detection
Contextual anomaly detection (CAD) aims to identify anomalies in a target (behavioral) variable conditioned on a set of contextual variables that influence the normalcy of the target but are not themselves indicators of anomaly. In this work, we propose a novel framework for CAD, the normalcy score (NS), that explicitly models both aleatoric and epistemic uncertainty. Built on heteroscedastic Gaussian process regression, our method treats the Z-score as a random variable, providing confidence intervals that reflect the reliability of each anomaly assessment. Through experiments on benchmark datasets and a real-world application in cardiology, we demonstrate that NS outperforms state-of-the-art CAD methods in both detection accuracy and interpretability. Moreover, the confidence intervals enable an adaptive, uncertainty-driven decision-making process, which is particularly important in domains such as healthcare.
💡 Research Summary
The paper addresses the problem of Contextual Anomaly Detection (CAD), where the goal is to identify anomalous observations of a behavioral variable y conditioned on a set of contextual variables x. Traditional CAD methods typically rely on a simple Z‑score computed from a conditional mean predictor and a fixed variance estimate, flagging points whose absolute Z‑score exceeds a threshold. This approach suffers from two major shortcomings: (i) it ignores heteroscedasticity, i.e., the fact that the variability of y often depends on x, leading to over‑ or under‑estimation of anomalies in regions with low or high variance; and (ii) it does not account for epistemic uncertainty, which arises when the training data sparsely cover certain regions of the context space, making the model’s predictions unreliable there.
To overcome these limitations, the authors propose a novel framework called Normalcy Score (NS). The core idea is to model the conditional distribution p(y|x) as a Gaussian whose mean and variance are both functions of x, learned via two independent Gaussian Process (GP) regressors: f₁(x) for the mean and f₂(x) for the log‑standard deviation. Modeling the log‑standard deviation guarantees positivity and naturally captures heteroscedasticity. Because both GPs are Bayesian, the Z‑score becomes a random variable:
NS(x, y) = (y − f₁(x)) · e^{−f₂(x)}.
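Because the two GP posteriors induce a distribution over NS, its behavior can be explored by Monte Carlo. The following sketch uses illustrative Gaussian posterior moments (the values `m1`, `s1`, `m2`, `s2`, and `y` are hypothetical, not taken from the paper) to draw samples of the normalcy score:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior moments at a single query context x:
# f1 ~ N(m1, s1^2) is the GP posterior over the conditional mean,
# f2 ~ N(m2, s2^2) is the GP posterior over the log-standard deviation.
m1, s1 = 5.0, 0.3
m2, s2 = 0.1, 0.2
y = 7.5  # observed behavioral value

# Since f1 and f2 are random, NS(x, y) = (y - f1) * exp(-f2) is itself
# a random variable; sampling exposes its full distribution.
f1 = rng.normal(m1, s1, size=10_000)
f2 = rng.normal(m2, s2, size=10_000)
ns = (y - f1) * np.exp(-f2)

print(ns.mean(), ns.std())
```

The spread of `ns` reflects how uncertain the model is about the anomaly assessment at this context, which is exactly what a point Z-score cannot convey.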
The expected value of this random variable yields an anomaly score s(x, y) that reduces to a closed‑form expression involving the posterior means m₁(x), m₂(x) and the posterior variance σ₂²(x) of the log‑variance GP. Aleatoric uncertainty (AU) is directly represented by σ₂²(x), reflecting intrinsic variability of y given x, while epistemic uncertainty (EU) is captured by the posterior variances of both GPs, which inflate in sparsely populated regions of the context space.
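If the two GP posteriors are independent Gaussians, the expectation of NS has a simple closed form, because e^{−f₂} is log-normal with mean e^{−m₂ + σ₂²/2}. The sketch below (same hypothetical moments as above, not values from the paper) computes this expression and checks it against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior moments (illustrative values, not from the paper).
m1, m2, s2 = 5.0, 0.1, 0.2
y = 7.5

# Closed form: E[(y - f1) * e^{-f2}] = (y - m1) * E[e^{-f2}], and since
# f2 is Gaussian, e^{-f2} is log-normal with mean e^{-m2 + s2^2 / 2}.
score = (y - m1) * np.exp(-m2 + 0.5 * s2**2)

# Monte Carlo check of the log-normal identity.
mc = ((y - rng.normal(m1, 0.3, 100_000))
      * np.exp(-rng.normal(m2, s2, 100_000))).mean()

print(round(score, 4), round(mc, 4))
```

Note how σ₂²(x) enters the score directly: larger aleatoric-model uncertainty inflates the e^{σ₂²/2} factor, consistent with the closed form described above.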
A key contribution is the quantification of EU through a 95 % Highest‑Density Interval (HDI) of the NS distribution. Since the distribution of NS is not analytically tractable, the authors draw samples from the joint posterior of f₁ and f₂, estimate the density via kernel methods, and numerically integrate to obtain the HDI. The length of the interval, denoted i(x, y), serves as a calibrated measure of model confidence: larger i means higher EU and can trigger a “defer” decision rather than a hard anomaly flag, which is especially valuable in high‑risk domains such as healthcare.
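The HDI computation can be approximated directly from posterior samples. The sketch below uses the shortest interval covering 95% of the draws, a common sample-based stand-in for the kernel-density-plus-integration route described in the paper (posterior moments are again hypothetical):

```python
import numpy as np

def hdi_from_samples(samples, prob=0.95):
    """Shortest interval containing `prob` of the samples
    (a sample-based stand-in for KDE + numerical integration)."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.ceil(prob * n))           # points the interval must cover
    widths = s[k - 1:] - s[: n - k + 1]  # width of every candidate interval
    i = np.argmin(widths)                # index of the shortest one
    return s[i], s[i + k - 1]

rng = np.random.default_rng(2)
# NS samples from the joint posterior (illustrative moments, as before).
m1, s1, m2, s2, y = 5.0, 0.3, 0.1, 0.2, 7.5
ns = (y - rng.normal(m1, s1, 20_000)) * np.exp(-rng.normal(m2, s2, 20_000))

lo, hi = hdi_from_samples(ns)
interval_length = hi - lo  # i(x, y): larger => higher epistemic uncertainty
print(lo, hi, interval_length)
```

A decision rule can then compare `interval_length` against a threshold: below it, trust the anomaly score; above it, defer to a human expert, as suggested for high-risk domains.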
Scalability is achieved with a sparse variational GP approximation using M ≪ N inducing points, reducing the computational complexity from O(N³) to O(M²N). Training employs natural‑gradient descent for the variational parameters and the Adam optimizer for kernel hyper‑parameters.
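The complexity gain from inducing points can be illustrated with a minimal Nyström-style sketch: the N×N kernel matrix is approximated through an M-point subset, so only M×M systems are ever solved. This is a simplified stand-in for the paper's variational scheme (which optimizes the inducing locations rather than sampling them, and which this sketch does not implement):

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between row-vector inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

# N training contexts, M << N inducing points (here a random subset; the
# paper instead learns them variationally with natural gradients, and the
# kernel hyper-parameters with Adam).
N, M = 500, 25
X = rng.uniform(-3, 3, size=(N, 1))
Z = X[rng.choice(N, M, replace=False)]

Knm = rbf(X, Z)                        # O(NM) entries
Kmm = rbf(Z, Z) + 1e-6 * np.eye(M)     # jitter for numerical stability
Knn_approx = Knm @ np.linalg.solve(Kmm, Knm.T)  # forms in O(M^2 N)

# Inverting the full N x N kernel would cost O(N^3); here only the
# M x M matrix is factorized. Error shrinks as M grows.
err = np.abs(rbf(X, X) - Knn_approx).max()
print(err)
```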
The authors evaluate NS on three fronts: (1) a synthetic dataset derived from WHO growth curves that exhibits strong heteroscedasticity and context sparsity; (2) several public CAD benchmarks (including KDD‑Cup, NASA‑SMAP, and NAB) where NS consistently outperforms state‑of‑the‑art methods such as ROCOD, QCAD, Isolation Forest, and Deep SVDD in both ROC‑AUC and PR‑AUC; (3) a real‑world cardiology cohort measuring aortic diameters, where contextual variables include weight, height, age, and sex. In the medical case study, NS not only achieves higher detection accuracy for clinically relevant outliers (e.g., abnormal aortic enlargement) but also provides HDI‑based confidence intervals that help clinicians decide when additional testing is warranted.
Overall, the paper makes four major contributions: (i) explicit disentanglement of aleatoric and epistemic uncertainties via dual GPs; (ii) a Bayesian reformulation of the Z‑score that yields calibrated confidence intervals; (iii) an efficient variational inference scheme suitable for moderate‑size datasets; and (iv) demonstration of practical impact in a high‑stakes medical application. Limitations include sensitivity to kernel choice and inducing‑point placement in high‑dimensional context spaces, and the need for further engineering to support real‑time streaming scenarios. Future work is suggested on deep kernel learning, online variational updates, and extensions to multivariate behavioral variables.