DeepDTF: Dual-Branch Transformer Fusion for Multi-Omics Anticancer Drug Response Prediction

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

Cancer drug response varies widely across tumors due to multi-layer molecular heterogeneity, motivating computational decision support for precision oncology. Despite recent progress in deep cancer drug response (CDR) models, robust alignment between high-dimensional multi-omics profiles and chemically structured drugs remains challenging due to cross-modal misalignment and limited inductive bias. We present DeepDTF, an end-to-end dual-branch Transformer fusion framework for joint log(IC50) regression and drug sensitivity classification. The cell-line branch uses modality-specific encoders for multi-omics profiles with Transformer blocks to capture long-range dependencies, while the drug branch represents compounds as molecular graphs and encodes them with a GNN-Transformer to integrate local topology with global context. Omics and drug representations are fused by a Transformer-based module that models cross-modal interactions and mitigates feature misalignment. On public pharmacogenomic benchmarks under 5-fold cold-start cell-line evaluation, DeepDTF consistently outperforms strong baselines across omics settings, achieving RMSE = 1.248, R² = 0.875, and AUC = 0.987 with full multi-omics inputs, while reducing classification error (1-ACC) by 9.5%. Beyond accuracy, DeepDTF provides biologically grounded explanations via SHAP-based gene attributions and pathway enrichment with pre-ranked GSEA.


💡 Research Summary

DeepDTF introduces a dual‑branch Transformer architecture designed to predict anticancer drug response by jointly modeling high‑dimensional multi‑omics profiles of cancer cell lines and the chemical structure of drugs. The cell‑line branch first tokenizes each omics modality (gene expression, mutation, copy‑number variation, proteomics, proteomics‑difference, DNA methylation) using a CNN‑Attention module that captures local patterns and re‑weights channels, producing a compact token sequence. These tokens are then processed by a standard Transformer encoder to learn long‑range dependencies across modalities. The drug branch converts SMILES strings into molecular graphs via RDKit, assigns categorical node and edge features, and applies a Graph Neural Network (GNN) message‑passing scheme to capture local chemical context. The resulting node embeddings are further refined by a Transformer encoder, allowing global substructure interactions to be modeled.
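The drug branch's message-passing step can be illustrated with a minimal sketch. This is an assumption-laden toy (sum aggregation over an adjacency matrix, ReLU update, no learned weights or edge features), not the paper's actual GNN layer:

```python
import numpy as np

def gnn_message_pass(node_feats, adj, n_layers=2):
    # Toy message passing over a molecular graph: each layer sums the
    # features of a node's neighbors (adj @ h) and applies a ReLU update.
    # A real GNN layer would also apply learned weight matrices and
    # incorporate the categorical edge features mentioned above.
    h = node_feats
    for _ in range(n_layers):
        msgs = adj @ h                 # aggregate neighbor features
        h = np.maximum(h + msgs, 0.0)  # residual update + ReLU
    return h

# Three-atom path "graph" (e.g. a C-C-C fragment) with one-hot node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h_out = gnn_message_pass(np.eye(3), adj)
print(h_out)
```

The refined node embeddings would then be handed to the Transformer encoder so that distant substructures can still attend to each other.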
Both token sets are concatenated and fed into a Fusion‑Transformer, which computes self‑attention over the combined token collection. This cross‑modal attention dynamically aligns drug substructures with context‑dependent cellular states, mitigating semantic misalignment that plagues simple concatenation or static weighting schemes. The fused representation is pooled into a fixed‑dimensional vector and passed to two task‑specific heads: a regression head trained with mean‑squared error to predict log(IC50) and a classification head trained with focal loss to predict binary sensitivity (sensitive vs. resistant). The overall loss is a weighted sum of the regression and classification components plus L2 regularization.
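The multi-task objective described above can be sketched as follows. The focal-loss form and the weighting scheme match the summary's description, but the specific coefficients (`alpha`, `beta`, `gamma`, `l2`) are placeholders, not values reported by the paper:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    # Focal loss down-weights well-classified examples via (1 - p_t)^gamma,
    # which helps with the sensitive/resistant class imbalance.
    pt = np.where(y == 1, p, 1 - p)
    return -np.mean((1 - pt) ** gamma * np.log(pt + 1e-8))

def multitask_loss(ic50_pred, ic50_true, sens_prob, sens_label,
                   alpha=1.0, beta=0.5, l2=0.0, params=None):
    mse = np.mean((ic50_pred - ic50_true) ** 2)              # regression head
    fl = focal_loss(sens_prob, sens_label)                    # classification head
    reg = l2 * sum(np.sum(w ** 2) for w in (params or []))    # L2 penalty
    return alpha * mse + beta * fl + reg
```

With near-perfect predictions the combined loss approaches zero; a mispredicted log(IC50) or an uncertain class probability inflates the respective term.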
Experiments use integrated data from GDSC2 and CCLP, comprising 782 cell lines, 256 drugs, and 164,165 drug‑cell line pairs. Multi‑omics features are rigorously pre‑processed (RMA normalization for expression, binary encoding for mutations, ternary encoding for CNV, additive weighting for proteomics, coverage filtering for methylation). A 5‑fold cold‑start cross‑validation (cell lines held out) evaluates generalization to unseen cellular contexts. DeepDTF is benchmarked against several strong baselines: CDRscan, tCNN, DeepCDR, DeepTTA, and GraTransDRP. Across all metrics—RMSE, R², Pearson correlation for regression; AUC, accuracy, sensitivity, specificity for classification—DeepDTF consistently outperforms the competitors. With the full multi‑omics input it achieves RMSE = 1.248, R² = 0.875, PCC = 0.938, AUC = 0.987, and reduces classification error by 9.5% relative to the best baseline.
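The key property of a cold-start split is that every pair involving a given cell line lands in the same fold, so test folds contain only unseen cellular contexts. A minimal group-wise splitter (the function name and shuffling scheme are illustrative, not from the paper) might look like:

```python
import numpy as np

def cold_start_folds(cell_ids, n_folds=5, seed=0):
    # Assign whole cell lines (not individual drug-cell pairs) to folds,
    # so no cell line appears in both train and test of any fold.
    cells = np.unique(cell_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(cells)
    cell_fold = {c: i % n_folds for i, c in enumerate(cells)}
    folds = np.array([cell_fold[c] for c in cell_ids])
    return [(np.where(folds != k)[0], np.where(folds == k)[0])
            for k in range(n_folds)]

# Toy dataset: 10 cell lines x 3 drugs = 30 pairs.
pairs = np.repeat(np.arange(10), 3)
for train_idx, test_idx in cold_start_folds(pairs):
    assert set(pairs[train_idx]).isdisjoint(set(pairs[test_idx]))
```

In practice the same effect can be obtained with scikit-learn's `GroupKFold`, using cell-line IDs as the `groups` argument.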
Interpretability is addressed by computing SHAP values for each gene feature, producing a signed ranking of importance. This ranking feeds into pre‑ranked Gene Set Enrichment Analysis (GSEA), revealing that top‑ranked genes are enriched in canonical cancer pathways such as p53, MAPK, and PI3K‑AKT signaling, thereby linking model predictions to biologically meaningful mechanisms.
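The pre-ranked GSEA step reduces to a weighted Kolmogorov-Smirnov running sum over the SHAP-ranked gene list. The sketch below uses the standard weighted form (weight p = 1); it is a simplified stand-in for a full GSEA implementation, with no permutation-based significance testing:

```python
import numpy as np

def enrichment_score(ranked_genes, gene_set, scores):
    # Walk down the SHAP-ranked list: step up (weighted by |score|) at
    # genes inside the set, step down uniformly at genes outside it.
    # The enrichment score is the signed extremum of the running sum.
    in_set = np.isin(ranked_genes, list(gene_set))
    hit_w = np.abs(scores) * in_set
    hits = np.cumsum(hit_w) / (hit_w.sum() + 1e-12)
    misses = np.cumsum(~in_set) / max((~in_set).sum(), 1)
    running = hits - misses
    return running[np.argmax(np.abs(running))]

# Toy ranking: a pathway concentrated at the top of the list scores near 1.
genes = np.array([f"g{i}" for i in range(100)])
shap_rank = np.linspace(2.0, -2.0, 100)   # signed SHAP ordering
es = enrichment_score(genes, set(genes[:10]), shap_rank)
print(round(float(es), 3))
```

A high positive score indicates the gene set clusters among the genes that most increase predicted sensitivity, which is how pathways such as p53 or PI3K-AKT would surface in the authors' analysis.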
In summary, DeepDTF advances drug response prediction by (1) preserving modality‑specific inductive biases through dedicated CNN‑Attention and GNN encoders, (2) employing a cross‑modal Transformer fusion that learns dynamic alignments between drug chemistry and cellular molecular states, (3) jointly optimizing regression and classification objectives in a multi‑task setting, and (4) providing transparent, pathway‑level explanations via SHAP‑GSEA. The authors suggest future work on transfer learning for limited‑sample domains, integration with clinical cohorts, and deployment in real‑time precision oncology decision support systems.

