ATTNSOM: Learning Cross-Isoform Attention for Cytochrome P450 Site-of-Metabolism

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

Identifying the sites at which cytochrome P450 enzymes metabolize small-molecule drugs is essential for drug discovery. Although many computational approaches have been proposed for site-of-metabolism prediction, they typically ignore cytochrome P450 isoform identity or model each isoform independently, and therefore fail to fully capture inherent cross-isoform metabolic patterns. In addition, prior evaluations often rely on top-k metrics, in which false-positive atoms may still rank among the top predictions, underscoring the need for complementary metrics that more directly assess binary atom-level discrimination under severe class imbalance. We propose ATTNSOM, an atom-level site-of-metabolism prediction framework that integrates intrinsic molecular reactivity with cross-isoform relationships. The model combines a shared graph encoder, molecule-conditioned atom representations, and a cross-attention mechanism to capture correlated metabolic patterns across cytochrome P450 isoforms. The model is evaluated on two benchmark datasets annotated with site-of-metabolism labels at atom resolution. Across these benchmarks, the model achieves consistently strong top-k performance across multiple cytochrome P450 isoforms. Relative to ablated variants, it also yields a higher Matthews correlation coefficient, indicating improved discrimination of true metabolic sites. These results support the importance of explicitly modeling cross-isoform relationships for site-of-metabolism prediction. The code and datasets are available at https://github.com/dmis-lab/ATTNSOM.


💡 Research Summary

The paper addresses a critical gap in cytochrome P450 (CYP) site‑of‑metabolism (SOM) prediction: most existing models either ignore isoform identity or treat each isoform independently, thereby missing the substantial overlap in metabolic patterns across isoforms. By analyzing the publicly available Zaretzki dataset, the authors demonstrate high Jaccard similarities (0.66–0.89) between pairs of CYP isoforms, confirming that many substrates are metabolized at the same atoms by multiple enzymes. This observation motivates a unified model that can learn and exploit cross‑isoform relationships.
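The cross-isoform overlap described above rests on Jaccard similarity between the SOM annotations of two isoforms. As a minimal sketch (the atom indices below are hypothetical, not drawn from the Zaretzki dataset), the measure can be computed over sets of annotated atom indices for the same substrate:

```python
def jaccard(som_a, som_b):
    """Jaccard similarity between two sets of annotated SOM atom indices."""
    a, b = set(som_a), set(som_b)
    if not a and not b:
        return 0.0  # no annotated sites for either isoform
    return len(a & b) / len(a | b)

# Hypothetical substrate: atoms 3 and 7 are metabolized by both isoforms,
# atom 12 only by the first.
print(jaccard({3, 7, 12}, {3, 7}))  # prints 0.6666666666666666
```

Values near the reported 0.66–0.89 range mean that most annotated sites are shared between a pair of isoforms, which is the empirical basis for a single model with shared cross-isoform structure.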

ATTNSOM (Attention‑based SOM) is built around four tightly integrated components.

First, a shared graph encoder based on GraphCliff processes each molecule as a graph, producing atom‑level embeddings (n_i) and a global molecular vector (g). GraphCliff’s gated short‑ and long‑range message passing mitigates oversmoothing, preserving the atom‑specific reactivity cues essential for SOM prediction.

Second, the model applies feature‑wise linear modulation (FiLM) to the atom embeddings. The global vector g is transformed into FiLM parameters (γ, β) via a small MLP; each atom embedding is then modulated as n′_i = (1 + tanh(γ)) ⊙ n_i + β. This step allows the same atom to exhibit different reactivity profiles depending on its molecular context.

Third, a cross‑attention mechanism links atom representations to learnable CYP isoform embeddings (c_k). Atoms act as queries, while isoform embeddings serve as keys and values. Attention scores α_ik are computed with scaled dot‑product attention, normalized across all K isoforms. The attended atom representation n_attn_i = Σ_k α_ik W_v c_k captures how strongly each isoform contributes to the reactivity of a given atom.

Finally, for a target isoform t, the concatenated vector
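The FiLM modulation and cross-attention equations above can be sketched in NumPy. This is an illustrative toy, not the authors' implementation: the dimensions, random weights, and the single linear map standing in for the small MLP are all assumptions; only the formulas n′_i = (1 + tanh(γ)) ⊙ n_i + β and n_attn_i = Σ_k α_ik W_v c_k come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, d = 5, 9, 16                   # atoms, CYP isoforms, embedding dim (illustrative)

n = rng.normal(size=(N, d))          # atom embeddings n_i from the graph encoder
g = rng.normal(size=(d,))            # global molecular vector g

# --- FiLM: g -> (gamma, beta); one linear map stands in for the small MLP ---
W_film = rng.normal(size=(d, 2 * d)) * 0.1
gamma, beta = np.split(g @ W_film, 2)
n_mod = (1.0 + np.tanh(gamma)) * n + beta        # n'_i = (1 + tanh(γ)) ⊙ n_i + β

# --- Cross-attention: modulated atoms query learnable isoform embeddings c_k ---
c = rng.normal(size=(K, d))                      # isoform embeddings c_k
W_q = rng.normal(size=(d, d)) * 0.1              # query projection
W_k = rng.normal(size=(d, d)) * 0.1              # key projection
W_v = rng.normal(size=(d, d)) * 0.1              # value projection

scores = (n_mod @ W_q) @ (c @ W_k).T / np.sqrt(d)    # (N, K) scaled dot products
alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)            # softmax over the K isoforms

n_attn = alpha @ (c @ W_v)                           # n_attn_i = Σ_k α_ik W_v c_k
print(n_attn.shape)                                  # prints (5, 16)
```

Each row of alpha sums to one across the K isoforms, so α_ik can be read directly as the relative contribution of isoform k to atom i's attended representation.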

