Bayesian Integration of Nonlinear Incomplete Clinical Data

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Multimodal clinical data are characterized by high dimensionality, heterogeneous representations, and structured missingness, posing significant challenges for predictive modeling, data integration, and interpretability. We propose BIONIC (Bayesian Integration of Nonlinear Incomplete Clinical data), a unified probabilistic framework that integrates heterogeneous multimodal data under missingness through a joint generative-discriminative latent architecture. BIONIC uses pretrained embeddings for complex modalities such as medical images and clinical text, while incorporating structured clinical variables directly within a Bayesian multimodal formulation. The proposed framework enables robust learning in partially observed and semi-supervised settings by explicitly modeling modality-level and variable-level missingness, as well as missing labels. We evaluate BIONIC on three multimodal clinical and biomedical datasets, demonstrating strong and consistent discriminative performance compared to representative multimodal baselines, particularly under incomplete data scenarios. Beyond predictive accuracy, BIONIC provides intrinsic interpretability through its latent structure, enabling population-level analysis of modality relevance and supporting clinically meaningful insight.


💡 Research Summary

The paper introduces BIONIC (Bayesian Integration of Nonlinear Incomplete Clinical data), a unified probabilistic framework designed to tackle the pervasive challenges of multimodal clinical datasets: high dimensionality, heterogeneous representations, and structured missingness at the view, variable, and label levels. Rather than training end‑to‑end deep encoders on raw images, histopathology slides, or free‑text reports, BIONIC leverages frozen embeddings extracted from large‑scale foundation models (e.g., CLIP, MedicalNet, LLaVA). These embeddings become “views” in a Bayesian multi‑view model, alongside traditional structured clinical variables.

The core architecture consists of two latent spaces. A generative latent variable gₙ (standard normal prior) captures shared structure across all modalities; each observed view x⁽ᵐ⁾ₙ is generated as a linear projection V⁽ᵐ⁾·gₙ plus Gaussian noise. Sparsity‑inducing Automatic Relevance Determination (ARD) priors on the loading matrices V⁽ᵐ⁾ automatically shrink irrelevant dimensions, effectively reducing the high‑dimensional embeddings to a low‑dimensional subspace appropriate for the limited cohort size.
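
The generative pathway can be sketched in a few lines of NumPy. All shapes, noise scales, and hyperparameters below are illustrative assumptions, not the paper's settings; the point is only how ARD precisions shrink latent dimensions in the per-view loadings:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 100, 8                # patients, latent dimensions (illustrative sizes)
view_dims = [512, 768, 20]   # e.g. image embedding, text embedding, tabular view

# ARD precisions alpha_k: a large alpha_k shrinks the k-th latent dimension
# across all views, effectively pruning it from the model.
alpha = rng.gamma(shape=2.0, scale=1.0, size=k)

# Shared generative latent g_n ~ N(0, I)
G = rng.normal(size=(n, k))

views = []
for d in view_dims:
    # Loading columns drawn with variance 1/alpha_k (ARD prior on V^(m))
    V = rng.normal(size=(d, k)) / np.sqrt(alpha)
    noise = 0.1 * rng.normal(size=(n, d))
    views.append(G @ V.T + noise)    # x_n^(m) = V^(m) g_n + eps
```

A dimension with a large ARD precision contributes near-zero loadings to every view, which is how the effective latent dimensionality adapts to the cohort size.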

A discriminative latent variable zₙ aggregates task‑relevant information. It is modeled as a Gaussian whose mean is a linear combination of the observed views via loadings W⁽ᵐ⁾, again regularized by ARD priors. The two latent spaces are coupled through an intermediate variable tₙ = U·zₙ + V⁽ᵀ⁾·gₙ + εₜ, which feeds into a Bayesian logistic regression to produce the final class label yₙ. This coupling enables semi‑supervised learning: unlabeled patients contribute to the posterior of gₙ and zₙ through the generative pathway, while labeled patients drive the discriminative pathway.
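
The coupling step can be illustrated as a forward simulation. Latent dimensions, noise scale, and the binary-label setup below are assumptions for the sketch; the actual model infers these quantities jointly rather than sampling them:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k_g, k_z = 100, 8, 4

G = rng.normal(size=(n, k_g))    # generative latents g_n
Z = rng.normal(size=(n, k_z))    # discriminative latents z_n

U = rng.normal(size=(1, k_z))    # couples z_n to the label pathway
V_T = rng.normal(size=(1, k_g))  # couples g_n to the label pathway (V^(T))

# t_n = U z_n + V^(T) g_n + eps_t
T = Z @ U.T + G @ V_T.T + 0.1 * rng.normal(size=(n, 1))

# Logistic link produces the class probability for y_n
p = 1.0 / (1.0 + np.exp(-T))
y = (rng.uniform(size=(n, 1)) < p).astype(int)
```

Because tₙ depends on both latents, gradients of the label likelihood flow into gₙ only for labeled patients, while the generative likelihood constrains gₙ for everyone.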

Missing data are handled intrinsically. If a variable or an entire view is absent for a patient, the corresponding likelihood term is simply omitted, and inference marginalizes over the unobserved dimensions. Because the generative component provides a full probabilistic model of the data, imputation is performed jointly with prediction, preserving consistency and avoiding the bias introduced by ad‑hoc preprocessing.
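
The "omit the likelihood term" mechanism can be made concrete with a small masked Gaussian log-likelihood. The helper below is a hypothetical illustration, not the paper's implementation; missing entries simply contribute nothing to the sum:

```python
import numpy as np

def masked_gaussian_loglik(x, mean, var, observed):
    """Sum the Gaussian log-density over observed entries only;
    terms for missing entries are omitted entirely."""
    ll = -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
    return np.sum(np.where(observed, ll, 0.0))

x = np.array([1.0, np.nan, 0.5])     # second variable is missing
mean = np.zeros(3)
observed = ~np.isnan(x)

# nan_to_num only neutralizes the masked-out slot; it never enters the sum
ll = masked_gaussian_loglik(np.nan_to_num(x), mean, 1.0, observed)
```

Because inference then marginalizes the unobserved dimensions under the generative model, imputation falls out of the same posterior rather than being a separate preprocessing step.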

Inference is performed with a mean‑field variational approximation, yielding closed‑form updates for all parameters. This makes the method computationally tractable even with multiple high‑dimensional views.
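
To show what "closed-form mean-field updates" look like in practice, here is the textbook coordinate-ascent VI for a conjugate Normal–Gamma model (unknown mean μ and precision τ), not BIONIC's actual update equations; the hyperparameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.0, size=500)
n, xbar = len(x), x.mean()

# Prior hyperparameters (hypothetical)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0  # initial guess for E_q[tau]
for _ in range(50):
    # q(mu) = N(mu_n, 1/lam_n): closed form given E_q[tau]
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    # q(tau) = Gamma(a_n, b_n): closed form given the moments of q(mu)
    a_n = a0 + 0.5 * (n + 1)
    b_n = b0 + 0.5 * (
        lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)   # E_q[(mu - mu0)^2]
        + np.sum((x - mu_n) ** 2) + n / lam_n       # E_q[sum (x_i - mu)^2]
    )
    E_tau = a_n / b_n
```

Each factor's update depends only on expectations under the other factors, which is what keeps every step analytic and the whole scheme tractable for multiple high-dimensional views.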

Interpretability is a built‑in feature. Because the discriminative pathway is linear after the orthogonal rotation of the embedding space, the sensitivity of the expected output with respect to each view's embedding can be derived analytically as S⁽ᵐ⁾ = R⁽ᵐ⁾·(D⁽ᵐ⁾)⁻¹·W⁽ᵐ⁾·U. This global sensitivity vector can be projected onto individual patient embeddings to obtain sample‑level relevance scores, offering clinicians a clear view of which modality (e.g., CT, MRI, text) drives a particular prediction.
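
Mechanically, the sensitivity vector is just a chain of matrix products. The shapes below are assumptions chosen so the product chains as written, and R is a random orthogonal stand-in for the rotation; the real matrices come from the fitted model:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k_z = 16, 4   # embedding dim of view m, discriminative latent dim (assumed)

# Orthogonal rotation R^(m) (stand-in, obtained here via QR decomposition)
R, _ = np.linalg.qr(rng.normal(size=(d, d)))
# Inverse of a diagonal scaling D^(m)
D_inv = np.diag(1.0 / rng.uniform(0.5, 2.0, size=d))
W = rng.normal(size=(d, k_z))   # discriminative loadings W^(m)
U = rng.normal(size=(k_z, 1))   # coupling to t_n

# Global sensitivity of the output w.r.t. view m's embedding:
# S^(m) = R^(m) (D^(m))^-1 W^(m) U
S = R @ D_inv @ W @ U           # shape (d, 1)

# Sample-level relevance: project one patient's embedding onto S
x_m = rng.normal(size=(d,))
relevance = float(x_m @ S)
```

The same S⁽ᵐ⁾ is computed once per view at the population level, so per-patient scores reduce to a dot product at prediction time.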

The authors evaluate BIONIC on three publicly available multimodal clinical datasets: (1) MMIST (618 renal‑cell carcinoma patients, five modalities, 12‑month survival classification, with a semi‑supervised subset of 47 unlabeled patients providing only WSI features), (2) MOTUM (67 brain‑tumor patients, four MRI modalities), and (3) an additional biomedical dataset with heterogeneous biomarkers. Across fully observed, partially missing (30‑70% missingness), and semi‑supervised scenarios, BIONIC consistently outperforms state‑of‑the‑art baselines such as early‑fusion concatenation, late‑fusion ensembles, and transformer‑based multimodal networks. Performance gains are especially pronounced when view‑level data are scarce or when unlabeled data are incorporated, demonstrating the robustness of the joint generative‑discriminative formulation.

In summary, BIONIC advances multimodal clinical learning by (i) integrating frozen high‑dimensional embeddings with structured variables in a Bayesian framework, (ii) automatically selecting relevant dimensions and modalities via ARD priors, (iii) modeling missingness directly rather than as a preprocessing step, and (iv) providing analytically tractable, clinically meaningful interpretability. These contributions make BIONIC a practical solution for real‑world healthcare settings where data are heterogeneous, incomplete, and cohort sizes are limited.

