A feature-stable and explainable machine learning framework for trustworthy decision-making under incomplete clinical data

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the [Original Paper Viewer] below or the original arXiv source.

Machine learning models are increasingly applied to biomedical data, yet their adoption in high-stakes domains remains limited by poor robustness, limited interpretability, and instability of learned features under realistic data perturbations, such as missingness. In particular, models that achieve high predictive performance may still fail to inspire trust if their key features fluctuate when data completeness changes, undermining reproducibility and downstream decision-making. Here, we present CACTUS (Comprehensive Abstraction and Classification Tool for Uncovering Structures), an explainable machine learning framework explicitly designed to address these challenges in small, heterogeneous, and incomplete clinical datasets. CACTUS integrates feature abstraction, interpretable classification, and systematic feature stability analysis to quantify how consistently informative features are preserved as data quality degrades. Using a real-world haematuria cohort comprising 568 patients evaluated for bladder cancer, we benchmark CACTUS against widely used machine learning approaches, including random forests and gradient boosting methods, under controlled levels of randomly introduced missing data. We demonstrate that CACTUS achieves competitive or superior predictive performance while maintaining markedly higher stability of top-ranked features as missingness increases, including in sex-stratified analyses. Our results show that feature stability provides information complementary to conventional performance metrics and is essential for assessing the trustworthiness of machine learning models applied to biomedical data. By explicitly quantifying robustness to missing data and prioritising interpretable, stable features, CACTUS offers a generalizable framework for trustworthy data-driven decision support.


💡 Research Summary

The paper introduces CACTUS (Comprehensive Abstraction and Classification Tool for Uncovering Structures), a machine‑learning framework specifically engineered for small, heterogeneous clinical datasets that often contain missing values. CACTUS consists of three tightly coupled components: (1) feature abstraction, (2) interpretable classification, and (3) systematic feature‑stability analysis. In the abstraction stage, raw clinical variables (blood and urine biomarkers, demographics, etc.) are grouped using domain‑driven clustering and dimensionality‑reduction techniques, producing a compact set of “abstract features” that preserve the underlying biological relationships while reducing redundancy. This step mitigates over‑fitting, a common problem when training models on limited data.
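The paper does not spell out the exact abstraction algorithm, but the idea of collapsing redundant raw variables into compact "abstract features" can be illustrated with a minimal, hypothetical sketch: greedily group columns whose pairwise correlation exceeds a threshold and replace each group with its per-sample mean. The function names, threshold, and toy biomarkers below are illustrative assumptions, not CACTUS's actual procedure.

```python
import statistics

def correlation(xs, ys):
    """Pearson correlation between two equal-length value lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def abstract_features(columns, threshold=0.8):
    """Greedily group columns whose |correlation| >= threshold and
    replace each group with its sample-wise mean (one "abstract feature")."""
    names = list(columns)
    groups, used = [], set()
    for i, a in enumerate(names):
        if a in used:
            continue
        group, _ = [a], used.add(a)
        for b in names[i + 1:]:
            if b not in used and abs(correlation(columns[a], columns[b])) >= threshold:
                group.append(b)
                used.add(b)
        groups.append(group)
    # each abstract feature = mean of its member columns, per sample
    return {
        "+".join(g): [statistics.fmean(row) for row in zip(*(columns[c] for c in g))]
        for g in groups
    }

# toy example: two highly correlated biomarkers collapse into one abstract feature
cols = {
    "marker_a": [1.0, 2.0, 3.0, 4.0],
    "marker_b": [1.1, 2.1, 2.9, 4.2],   # tracks marker_a almost exactly
    "age":      [60.0, 45.0, 70.0, 55.0],
}
print(sorted(abstract_features(cols)))  # → ['age', 'marker_a+marker_b']
```

Grouping before classification reduces the effective dimensionality, which is one way to curb over-fitting on a cohort of only a few hundred patients.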

The classification stage employs models that are intrinsically interpretable (e.g., decision trees or sparse linear models) on the abstract features. Post‑hoc explanation tools such as SHAP or LIME are then applied to provide per‑prediction attribution, allowing clinicians to see exactly which biomarkers drive each decision.
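For an intrinsically interpretable model such as a sparse linear classifier, per-prediction attribution can be read directly from the model: each feature's contribution to the logit is its weight times its value. The weights, bias, and biomarker values below are invented for illustration and are not taken from the paper.

```python
def explain_prediction(weights, bias, sample):
    """Decompose a sparse linear model's logit into per-feature
    contributions (weight * value), ranked by absolute magnitude."""
    contributions = {name: weights[name] * sample[name] for name in weights}
    logit = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return logit, ranked

# hypothetical weights and patient values, purely illustrative
weights = {"hematuria": 1.4, "clusterin": -0.8, "age": 0.02}
sample = {"hematuria": 1.0, "clusterin": 0.5, "age": 65.0}
logit, ranked = explain_prediction(weights, bias=-1.0, sample=sample)
print(ranked[0][0])  # → hematuria (largest single contribution here)
```

Unlike post-hoc tools such as SHAP or LIME, this decomposition is exact for linear models, which is part of the appeal of keeping the classifier intrinsically interpretable.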

The novel contribution of CACTUS lies in its feature‑stability analysis. The authors simulate realistic data degradation by randomly removing 10 %, 20 %, and 30 % of entries under a Missing Completely At Random (MCAR) assumption. For each perturbed dataset, they compute feature‑importance rankings for every model and quantify stability using two metrics: (i) the average relative change in importance for the top‑10 features across missingness levels, and (ii) the standard deviation of these changes. Low values indicate that a model’s key features remain consistent despite increasing missingness, which the authors argue is a crucial proxy for trustworthiness in high‑stakes medical decision support.
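The perturbation-and-stability loop described above can be sketched in a few lines: mask a fraction of entries completely at random, recompute feature importances, and summarise the top-k features' relative importance change by its mean and standard deviation. The helper names and toy importance values are assumptions for illustration; the paper's exact computation may differ.

```python
import random

def mask_mcar(rows, fraction, seed=0):
    """Set a given fraction of cells to None, Missing Completely At Random."""
    rng = random.Random(seed)
    masked = [list(r) for r in rows]
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    for i, j in rng.sample(cells, int(round(fraction * len(cells)))):
        masked[i][j] = None
    return masked

def stability(base_importance, perturbed_importances, top_k=10):
    """Mean and SD of the relative change in the top-k features'
    importances across perturbed datasets; lower = more stable."""
    top = sorted(base_importance, key=base_importance.get, reverse=True)[:top_k]
    changes = [
        abs(p[f] - base_importance[f]) / base_importance[f]
        for p in perturbed_importances
        for f in top
    ]
    mean = sum(changes) / len(changes)
    sd = (sum((c - mean) ** 2 for c in changes) / len(changes)) ** 0.5
    return mean, sd

# toy run: mask 30% of a 4x5 table, then score an illustrative importance shift
rows = [[float(i + j) for j in range(5)] for i in range(4)]
masked = mask_mcar(rows, 0.3)
print(sum(v is None for r in masked for v in r))  # → 6 of 20 cells masked

base = {"a": 0.5, "b": 0.3, "c": 0.2}
perturbed = [{"a": 0.45, "b": 0.33, "c": 0.22}]
mean, sd = stability(base, perturbed, top_k=3)
print(round(mean, 3), round(sd, 3))  # → 0.1 0.0
```

A model whose `(mean, sd)` pair stays flat as the masked fraction grows from 0.1 to 0.3 is, by this criterion, the more trustworthy one, regardless of small AUC differences.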

The framework is evaluated on a real‑world haematuria cohort (HaBio) comprising 568 patients screened for bladder cancer. The dataset includes roughly 30 biomarkers measured in blood and urine, together with age, sex, smoking history, and other clinical covariates. The task is binary classification (bladder cancer vs. non‑cancer). CACTUS is benchmarked against widely used algorithms—Random Forest (RF), CatBoost, and LightGBM—across the four missingness conditions (the complete dataset plus the three MCAR levels), using five‑fold cross‑validation. Performance metrics (AUC, accuracy, F1‑score) show that CACTUS achieves AUC values between 0.86 and 0.89, comparable to or slightly better than the baselines (RF 0.84–0.87, CatBoost/LGBM 0.83–0.86).

Crucially, CACTUS dramatically outperforms the baselines in feature stability. For the complete dataset, the average relative change for the top‑10 features is 0.04 with a standard deviation of 0.02; under 30 % missingness, these numbers remain essentially unchanged. In contrast, RF exhibits an average change of 0.12 (SD 0.07), CatBoost 0.09 (SD 0.05), and LGBM 0.10 (SD 0.06). The authors also conduct sex‑stratified analyses. CACTUS maintains a nearly identical set of top biomarkers (e.g., hematuria, clusterin, N‑serine) for both male and female subsets, whereas CatBoost’s top features diverge markedly for females, and LGBM’s stability drops substantially.

The paper discusses several implications. First, it demonstrates that predictive accuracy alone is insufficient for clinical adoption; stability of learned features across data perturbations provides complementary evidence of model robustness. Second, the abstraction‑classification pipeline yields interpretable models that clinicians can audit, addressing the “black‑box” criticism of many modern AI systems. Third, by quantifying robustness to missing data, CACTUS aligns with real‑world clinical workflows where not every test is performed on every patient.

Limitations are acknowledged. The stability analysis assumes MCAR missingness, whereas clinical data often exhibit Missing At Random (MAR) or Missing Not At Random (MNAR) patterns. The abstraction step relies on domain knowledge for clustering, which may need re‑tuning for other diseases. Future work is proposed to extend the framework to MAR/MNAR scenarios, automate hyper‑parameter selection for feature abstraction, and test CACTUS on additional biomedical domains.

In summary, CACTUS offers a unified solution that simultaneously delivers high predictive performance, interpretable decision rules, and quantified feature stability under incomplete data. By doing so, it provides a concrete pathway toward trustworthy, data‑driven decision support in medicine, especially in settings where data quality cannot be guaranteed.

