Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The goal of personalized medicine is to discover a treatment regimen that optimizes a patient’s clinical outcome based on their personal genetic and environmental factors. However, candidate treatments cannot be arbitrarily administered to the patient to assess their efficacy; we often instead have access to an in silico surrogate model that approximates the true fitness of a proposed treatment. Unfortunately, such surrogate models have been shown to fail to generalize to previously unseen patient-treatment combinations. We hypothesize that domain-specific prior knowledge, such as medical textbooks and biomedical knowledge graphs, can provide a meaningful alternative signal of the fitness of proposed treatments. To this end, we introduce LLM-based Entropy-guided Optimization with kNowledgeable priors (LEON), a mathematically principled approach to leverage large language models (LLMs) as black-box optimizers without any task-specific fine-tuning, taking advantage of their ability to contextualize unstructured domain knowledge to propose personalized treatment plans in natural language. In practice, we implement LEON via ‘optimization by prompting,’ which uses LLMs as stochastic engines for proposing treatment designs. Experiments on real-world optimization tasks show LEON outperforms both traditional and LLM-based methods in proposing individualized treatments for patients.


💡 Research Summary

The paper frames personalized medicine as a conditional black‑box optimization problem: given a patient’s covariates z (genomics, environment, clinical history) and a space X of possible treatment designs x (drug selections, dosages, schedules), the true clinical outcome f(x; z) is inaccessible for direct evaluation. Practitioners therefore rely on a surrogate model \hat f(x; z) (e.g., a digital twin or ML predictor) trained on historical data that often differs in distribution from the target patient population, leading to severe out‑of‑distribution (OOD) errors.

To mitigate these issues, the authors propose LEON (LLM‑based Entropy‑guided Optimization with Knowledgeable priors). LEON leverages large language models (LLMs) as zero‑shot stochastic generators of treatment proposals, conditioning them on domain‑specific priors such as medical textbooks, PubMed abstracts, and biomedical knowledge graphs. The LLM is prompted with (i) the patient’s context z, (ii) a concise summary of relevant medical knowledge, and (iii) a cache of previously proposed designs and their surrogate scores. By sampling multiple outputs for the same prompt, LEON estimates the entropy of the LLM’s response distribution; low entropy indicates high model confidence in a particular design class.
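The entropy estimate described above reduces to computing the Shannon entropy of the empirical distribution of design classes across repeated samples from the same prompt. A minimal sketch (the class labels below are hypothetical examples, not from the paper):

```python
from collections import Counter
import math

def coarse_grained_entropy(samples):
    """Shannon entropy (in nats) of the empirical distribution over
    the design classes assigned to sampled LLM proposals."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical example: 8 proposals sampled from one prompt, each
# mapped to an equivalence class of treatment designs.
classes = ["kinase-inhibitor"] * 6 + ["immunotherapy"] * 2
h = coarse_grained_entropy(classes)  # low entropy => concentrated, confident proposals
```

A value of `h` near zero means nearly all samples fall into one design class, which the method reads as high model confidence.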

Two constraints are imposed on the evolving design distribution q(x) to keep the search grounded:

  1. 1‑Wasserstein distance constraint – Using an auxiliary source‑critic c* trained to discriminate between real historical designs D_src and generated designs, LEON bounds the 1‑Wasserstein distance between q(x) and D_src by a threshold W₀. This limits extrapolation into regions where the surrogate \hat f is unreliable.

  2. ∼‑coarse‑grained entropy constraint – An equivalence relation ∼ partitions the treatment space into N classes. The coarse‑grained entropy H∼(q) of q over these classes is bounded by H₀, preventing the optimizer from scattering across too many disparate design families and encouraging consistent, knowledge‑aligned proposals.
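The two constraints can be sketched as simple feasibility checks. Below is a minimal, hypothetical implementation: the Wasserstein bound is estimated via the Kantorovich–Rubinstein dual form, assuming the source critic c* is (approximately) 1-Lipschitz, and the entropy bound reuses coarse-grained Shannon entropy over design classes. The function names and scalar-valued critic are illustrative assumptions, not the paper's API:

```python
import math
from collections import Counter

def wasserstein_estimate(critic, real_designs, generated_designs):
    """Kantorovich-Rubinstein dual estimate of the 1-Wasserstein distance,
    assuming `critic` (standing in for c*) is roughly 1-Lipschitz."""
    e_real = sum(critic(x) for x in real_designs) / len(real_designs)
    e_gen = sum(critic(x) for x in generated_designs) / len(generated_designs)
    return e_real - e_gen

def entropy_over_classes(designs, class_of):
    """Coarse-grained entropy H~(q) of the batch over equivalence classes."""
    counts = Counter(class_of(x) for x in designs)
    n = len(designs)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def satisfies_constraints(critic, class_of, d_src, q_samples, w0, h0):
    """Both the Wasserstein (W0) and entropy (H0) bounds must hold."""
    return (wasserstein_estimate(critic, d_src, q_samples) <= w0
            and entropy_over_classes(q_samples, class_of) <= h0)
```

In practice the critic would be a trained network discriminating D_src from generated designs; here it is left as an arbitrary callable.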

Optimization proceeds via “optimization‑by‑prompting”: in each iteration, the LLM receives the current context and generates a batch of b candidate treatments. Each candidate is scored by the surrogate \hat f and penalized if it violates the Wasserstein or entropy constraints (as measured by c* and the class distribution). The highest‑scoring, constraint‑satisfying candidates are added to the cache, influencing the next prompt. The process repeats until the surrogate objective converges under the constraints.
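The iteration described above can be sketched as a short loop. `llm_propose`, `surrogate`, and `feasible` are hypothetical stand-ins for the paper's components (the LLM sampler, \hat f, and the combined constraint check), not its actual interface:

```python
def optimize(llm_propose, surrogate, feasible, n_iters=10, batch=4, keep=2):
    """Optimization-by-prompting sketch: propose, score, filter, cache."""
    cache = []  # (design, score) pairs fed back into the next prompt
    for _ in range(n_iters):
        candidates = llm_propose(cache, batch)            # b proposals from the LLM
        scored = [(x, surrogate(x)) for x in candidates if feasible(x)]
        scored.sort(key=lambda t: t[1], reverse=True)
        cache.extend(scored[:keep])                       # best feasible designs
    return max(cache, key=lambda t: t[1]) if cache else None
```

The cache serves as the in-context “memory” of past designs and scores, which is what steers subsequent LLM proposals.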

Theoretical analysis shows that, when the constraints hold, the generalization gap between \hat f and the true objective f is bounded by the Wasserstein distance and the entropy term, providing a formal guarantee of robustness against OOD surrogate errors.
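Schematically, a bound of the kind described above takes the following form, where \alpha and \beta are placeholder constants (the exact constants and assumptions are stated in the paper, not this summary):

```latex
\left| \mathbb{E}_{x \sim q}\!\left[ f(x; z) - \hat{f}(x; z) \right] \right|
\;\le\; \alpha \, W_1\!\left(q, \mathcal{D}_{\mathrm{src}}\right) + \beta \, H_{\sim}(q)
\;\le\; \alpha \, W_0 + \beta \, H_0 .
```

Holding both constraints thus caps how far the surrogate's optimum can mislead the search away from the true objective.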

Empirically, LEON is evaluated on five real‑world personalized treatment design tasks, including oncology (mutation‑guided drug selection), autoimmune disease (combination immunosuppressants), and metabolic disorders (gene‑expression‑guided nutraceuticals). Baselines include traditional Bayesian optimization, reinforcement‑learning‑based design, prior LLM zero‑shot optimization, and digital‑twin‑driven optimization. Across all tasks, LEON achieves an average rank of 1.2 (compared to baseline ranks of 3–5) and improves simulated clinical outcome scores by 15–23 %. Notably, LEON’s performance degrades far less when the patient distribution diverges from the surrogate’s training data, demonstrating its robustness. Patient‑specific data z never appear in the LLM prompt, preserving privacy.

Limitations identified are (i) residual hallucination risk of LLMs, (ii) dependence on the quality and representativeness of D_src, and (iii) sensitivity to hyper‑parameters W₀ and H₀. Future work proposes integrating multimodal LLMs (text + images + graphs), online updating with real‑time patient feedback, and learning the constraint thresholds via differentiable Lagrangian methods.

In summary, LEON offers a principled, knowledge‑augmented black‑box optimization framework that turns large language models into effective, privacy‑preserving designers of personalized treatment regimens, outperforming both conventional surrogate‑driven optimizers and prior LLM‑only approaches.

