DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics
Data-independent acquisition mass spectrometry (DIA-MS) has established itself as a cornerstone of proteomic profiling and large-scale systems biology, offering unparalleled depth and reproducibility. Current DIA analysis frameworks, however, require semi-supervised training within each run for peptide-spectrum match (PSM) re-scoring. This approach is prone to overfitting and lacks generalizability across diverse species and experimental conditions. Here, we present DIA-CLIP, a pre-trained model that shifts the DIA analysis paradigm from semi-supervised training to universal cross-modal representation learning. By integrating a dual-encoder contrastive learning framework with an encoder-decoder architecture, DIA-CLIP establishes a unified cross-modal representation for peptides and their corresponding spectral features, achieving high-precision, zero-shot PSM inference. Extensive evaluations across diverse benchmarks demonstrate that DIA-CLIP consistently outperforms state-of-the-art tools, yielding up to a 45% increase in protein identifications while achieving a 12% reduction in entrapment identifications. Moreover, DIA-CLIP holds immense potential for diverse practical applications, such as single-cell and spatial proteomics, where its enhanced identification depth facilitates the discovery of novel biomarkers and the elucidation of intricate cellular mechanisms.
💡 Research Summary
Data‑independent acquisition mass spectrometry (DIA‑MS) provides deep, reproducible proteome coverage, yet current analysis pipelines rely on run‑specific semi‑supervised learning to re‑score peptide‑spectrum matches (PSMs). This per‑run training is prone to over‑fitting, limits generalizability across species and experimental conditions, and hampers applications such as single‑cell or spatial proteomics where data are sparse.
The authors introduce DIA‑CLIP, a universal, pre‑trained framework that shifts DIA analysis from run‑specific calibration to zero‑shot inference using cross‑modal contrastive learning. The architecture consists of three core components: (1) a dual‑encoder that separately processes peptide sequences (via a Transformer‑based encoder) and extracted ion chromatograms (XICs) (via a specialized spectral encoder); (2) a contrastive learning objective that aligns the two modalities in a shared latent space while explicitly treating true PSMs and entrapment (negative) samples as opposite classes; and (3) an encoder‑decoder module that refines the aligned latent vectors into high‑dimensional discriminative features for final PSM scoring.
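The paper is summarized here without source code, but the described architecture maps naturally onto the CLIP recipe. Below is a minimal PyTorch sketch of component (1) together with the shared-space alignment: a Transformer peptide encoder, a convolutional XIC encoder, and a symmetric InfoNCE loss. Every name, layer choice, and dimension (`PeptideEncoder`, `XICEncoder`, `clip_loss`, `d_model=256`, and so on) is an illustrative assumption, not the authors' implementation.

```python
# Minimal CLIP-style dual-encoder sketch for peptide/XIC alignment.
# All names, layers, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeptideEncoder(nn.Module):
    """Transformer encoder over tokenized peptide sequences."""
    def __init__(self, vocab_size=30, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):                   # tokens: (B, L)
        h = self.encoder(self.embed(tokens))     # (B, L, d_model)
        return h.mean(dim=1)                     # pooled sequence embedding

class XICEncoder(nn.Module):
    """1-D convolutional encoder over extracted ion chromatograms."""
    def __init__(self, n_fragments=12, d_model=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_fragments, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, d_model, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )

    def forward(self, xic):                      # xic: (B, n_fragments, T)
        return self.net(xic)                     # (B, d_model)

def clip_loss(pep_emb, xic_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (peptide, XIC) pairs are positives,
    all other pairings in the batch act as negatives."""
    pep = F.normalize(pep_emb, dim=-1)
    xic = F.normalize(xic_emb, dim=-1)
    logits = pep @ xic.t() / temperature         # (B, B) similarity matrix
    targets = torch.arange(len(pep), device=pep.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

In this setup each batch pairs a peptide with its own XIC as the positive and treats all other in-batch pairings as negatives; the next paragraph describes how entrapment peptides add harder, known-false negatives on top of this.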
Training data comprise over 28 million high‑confidence PSMs collected from multiple species (human, yeast, E. coli, mouse) and diverse instruments (Orbitrap Astral, TripleTOF). By incorporating entrapment peptides as negative examples during pre‑training, the model learns subtle spectral cues that distinguish genuine signals from false matches. After pre‑training, DIA‑CLIP can be applied directly to new DIA datasets without any additional fine‑tuning, delivering true zero‑shot performance.
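How exactly entrapment negatives enter the objective is not specified in this summary. One plausible mechanism, sketched below under the assumptions of the previous code block, is to embed entrapment XICs with the same spectral encoder and append them as extra negative columns in the similarity matrix; the function name `clip_loss_with_entrapment` and the batch layout are hypothetical.

```python
# Continues the imports/encoders from the previous sketch (hypothetical).
def clip_loss_with_entrapment(pep_emb, xic_emb, entrap_xic_emb,
                              temperature=0.07):
    """Contrastive loss where embeddings of entrapment (known-false) XICs
    are appended as extra negative columns: each peptide must still pick
    out its own XIC against both in-batch and entrapment distractors."""
    pep = F.normalize(pep_emb, dim=-1)
    xic = F.normalize(torch.cat([xic_emb, entrap_xic_emb], dim=0), dim=-1)
    logits = pep @ xic.t() / temperature         # (B, B + n_entrap)
    targets = torch.arange(len(pep), device=pep.device)  # true XICs come first
    return F.cross_entropy(logits, targets)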
Benchmarking was performed on four heterogeneous scenarios: (i) HeLa cell lysates across 30–240 min LC gradients, (ii) a multi‑species consortium measured on a next‑generation Orbitrap Astral instrument, (iii) clinical breast‑cancer samples, and (iv) ultra‑low‑input single‑cell preparations. In all cases, DIA‑CLIP outperformed leading tools such as DIA‑NN, MaxDIA, and MSFragger‑DIA. For the 90‑minute HeLa gradient, DIA‑CLIP increased peptide identifications by 6.5 % and protein identifications by 3.7 % while maintaining a 1 % FDR. In the Astral multi‑species dataset, it achieved a 1 % absolute gain in precursor identifications and, in the high‑precision regime (CV < 5 %), delivered roughly threefold more precursors and twice as many proteins compared with DIA‑NN.
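The "high‑precision regime (CV < 5 %)" refers to precursors whose coefficient of variation across replicate injections stays below 5 %. A minimal NumPy sketch of such a filter follows; the array layout and the `ddof=1` (sample standard deviation) choice are assumptions about the analysis, not details given in the paper.

```python
import numpy as np

def high_precision_mask(intensities, cv_cutoff=0.05):
    """intensities: (n_precursors, n_replicates) quantified values.
    Returns a boolean mask of precursors whose coefficient of variation
    (sample std / mean across replicates) is below the cutoff."""
    mean = intensities.mean(axis=1)
    std = intensities.std(axis=1, ddof=1)
    # Precursors with non-positive mean intensity are excluded (CV -> inf).
    cv = np.divide(std, mean, out=np.full_like(mean, np.inf), where=mean > 0)
    return cv < cv_cutoff
```

For example, `high_precision_mask(mat).sum()` counts the precursors quantified at CV < 5 % across replicates, which is the quantity compared between tools above.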
Entrapment experiments using a mixed‑species library demonstrated superior error control: at a stringent decoy FDR of 0.001, DIA‑CLIP reduced entrapment counts by ~30 % relative to competitors. Visual inspection of XICs for uniquely identified peptides showed clear peak symmetry, co‑eluting fragment ions, and high signal‑to‑noise ratios, confirming that the model captures genuine biochemical signals rather than artifacts.
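For context on these entrapment numbers: a common lower‑bound estimator of the false discovery proportion (FDP) scales the observed entrapment hits by the ratio of target to entrapment sequences in the combined search space. The sketch below shows that estimator for intuition only; several variants exist in the literature, and the paper does not state which was used.

```python
def entrapment_fdp_lower_bound(n_entrap_hits, n_accepted,
                               n_target_seqs, n_entrap_seqs):
    """Lower-bound FDP estimate from an entrapment experiment: assume false
    matches distribute over the search space in proportion to its size, so
    the observed entrapment hits are scaled up by the target/entrapment
    sequence ratio and divided by the total accepted identifications."""
    scale = n_target_seqs / n_entrap_seqs
    return n_entrap_hits * scale / max(n_accepted, 1)
```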
Quantitative accuracy was evaluated on the multi‑species mixture with known mixing ratios. DIA‑CLIP’s measured precursor and protein intensities closely matched theoretical ratios, exhibiting tighter variance and comparable median values to DIA‑NN. Even for peptides and proteins identified exclusively by DIA‑CLIP, quantitative reproducibility remained high, underscoring the robustness of the learned representations.
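In a known‑ratio multi‑species benchmark of this kind, quantitative accuracy is typically summarized as the deviation of observed log2 intensity ratios from the expected mixing ratio per species. The small helper below is a hypothetical illustration of that comparison, not code from the paper.

```python
import numpy as np

def ratio_deviation(intensity_a, intensity_b, expected_ratio):
    """Median absolute deviation of observed log2 ratios from the known
    mixing ratio for one species between two conditions; tighter values
    indicate more accurate quantification."""
    observed = np.log2(np.asarray(intensity_a) / np.asarray(intensity_b))
    return np.median(np.abs(observed - np.log2(expected_ratio)))
```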
Finally, the authors present an early application to spatial proteomics, running DIA‑CLIP directly on tissue‑section DIA data. The model identified more region‑specific proteins than conventional pipelines, illustrating its potential for complex, spatially resolved studies without the need for bespoke training.
In summary, DIA‑CLIP demonstrates that (1) large‑scale pre‑training can produce universal cross‑modal embeddings for peptide sequences and DIA spectra, (2) contrastive learning with entrapment negatives yields strong discrimination between true and false matches, and (3) an encoder‑decoder refinement stage provides high‑resolution scoring suitable for zero‑shot inference. This paradigm eliminates the bottleneck of run‑specific model training, expands proteome depth, improves quantitative fidelity, and opens new avenues for low‑input and spatial proteomics, positioning DIA‑CLIP as a transformative tool for next‑generation proteomic research.