Systematic review of self-supervised foundation models for brain network representation using electroencephalography

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Automated analysis of electroencephalography (EEG) has recently undergone a paradigm shift. The introduction of transformer architectures and self-supervised learning (SSL) pretraining has led to the development of EEG foundation models. These models are pretrained on large amounts of unlabeled data and can be adapted to a range of downstream tasks. This systematic review summarizes recent SSL-trained EEG foundation models that learn whole-brain representations from multichannel EEG rather than representations derived from a single channel. We searched PubMed, IEEE Xplore, Scopus, and arXiv through July 21, 2025. Nineteen preprints and peer-reviewed articles met inclusion criteria. We extracted information regarding pretraining datasets, model architectures, pretraining SSL objectives, and downstream task applications. While pretraining data relied heavily on the Temple University EEG corpus, model architectures and training objectives varied substantially across studies. Transformers were the predominant pretraining architecture, with state-space models such as MAMBA and S4 as emerging alternatives. Among SSL objectives, masked auto-encoding was most common, while other studies incorporated contrastive learning. Downstream tasks varied widely and employed diverse fine-tuning strategies, which made direct comparison challenging. Furthermore, most studies used single-task fine-tuning, and a generalizable EEG foundation model remains lacking. In conclusion, the field is advancing rapidly but is still constrained by limited dataset diversity and the absence of standardized benchmarks. Progress will likely depend on larger and more diverse pretraining datasets, standardized evaluation protocols, and multi-task validation. Such developments will advance EEG foundation models towards robust, general-purpose tools relevant to both basic and clinical applications.


💡 Research Summary

This systematic review surveys the emerging field of self‑supervised learning (SSL)–based foundation models for electroencephalography (EEG) that learn whole‑brain representations from multichannel recordings. The authors searched PubMed, IEEE Xplore, Scopus, and arXiv up to July 21 2025, identifying 1,343 records; after duplicate removal and screening, 19 papers met the inclusion criteria (English language, SSL pre‑training on large‑scale EEG, whole‑brain representation, cross‑task evaluation).

Pre‑training datasets
The majority of studies (12/19) relied heavily on the Temple University Hospital EEG (TUEG) corpus, with total pre‑training durations often exceeding 2,000 hours and five studies using more than 10,000 hours. Dataset diversity is limited; only a few works combined multiple public BCI datasets or proprietary collections. Channel counts ranged from 8 to 128, and most papers down‑sampled the data to 125–256 Hz after applying a band‑pass filter (typically 0.1/0.5 Hz – 100 Hz).
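The review does not include code; as a minimal, illustrative sketch of the preprocessing pipeline it describes (band-pass filtering followed by downsampling), the snippet below uses a crude zero-phase FFT mask in place of a proper filter design. Real pipelines would typically use dedicated filter routines (e.g., `scipy.signal`); all function names and parameter values here are hypothetical.

```python
import numpy as np

def bandpass_fft(x, fs, lo=0.5, hi=100.0):
    """Crude zero-phase band-pass via FFT masking (illustrative only)."""
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spec = np.fft.rfft(x, axis=-1)
    spec[..., (freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)

def preprocess(x, fs_in=500, fs_out=250, lo=0.5, hi=100.0):
    """Band-pass filter, then decimate to the target sampling rate."""
    x = bandpass_fft(x, fs_in, lo, hi)
    step = fs_in // fs_out  # assumes an integer downsampling ratio
    return x[..., ::step]

# 19 channels, 10 s at 500 Hz -> 250 Hz after preprocessing
rng = np.random.default_rng(0)
eeg = rng.standard_normal((19, 5000))
out = preprocess(eeg)
print(out.shape)  # (19, 2500)
```

Decimating after low-pass filtering (here at 100 Hz, well below the 125 Hz Nyquist limit of the 250 Hz target rate) avoids aliasing, which is why the filter-then-downsample order matters.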

Model architectures
Seventeen of the nineteen models employed a transformer‑based backbone, while three used state‑space models (SSMs) such as MAMBA and S4. Within the transformer family, four were vanilla transformers, three vision‑transformer variants, six hybrid CNN‑transformer designs, and four tokeniser‑plus‑transformer pipelines. Attention mechanisms varied: six models applied attention only temporally, one only spatially, and ten employed spatiotemporal attention (six joint, four separate). Spatial information was encoded in fourteen models via fixed channel embeddings, learnable embeddings, head‑model based positioning, or 2‑D convolutions. Model sizes spanned 3 M to 540 M trainable parameters, with many papers reporting several scale variants.
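The channel-embedding idea above can be sketched concretely. The toy code below, with hypothetical names and sizes, shows one common pattern: project each fixed-length time patch to a token, then add a learnable per-electrode embedding plus a temporal position embedding before feeding the tokens to a transformer (omitted here).

```python
import numpy as np

rng = np.random.default_rng(0)
n_ch, n_patch, patch_len, d = 19, 8, 200, 64  # toy sizes, not from any paper

# Toy "learnable" parameters (randomly initialised for illustration)
W_patch = rng.standard_normal((patch_len, d)) * 0.02  # patch projection
chan_emb = rng.standard_normal((n_ch, d)) * 0.02      # one vector per electrode
time_emb = rng.standard_normal((n_patch, d)) * 0.02   # one vector per time patch

def tokenize(eeg):
    """eeg: (n_ch, n_patch * patch_len) -> tokens: (n_ch * n_patch, d)."""
    patches = eeg.reshape(n_ch, n_patch, patch_len)
    tok = patches @ W_patch                            # (n_ch, n_patch, d)
    tok = tok + chan_emb[:, None, :] + time_emb[None, :, :]
    return tok.reshape(n_ch * n_patch, d)

tokens = tokenize(rng.standard_normal((n_ch, n_patch * patch_len)))
print(tokens.shape)  # (152, 64)
```

Fixed channel embeddings or head-model-based positions would replace `chan_emb` with precomputed vectors; the additive composition stays the same.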

SSL objectives
Fourteen models used a single‑stage pre‑training loss; five of these relied on masked reconstruction (continuous signal or value prediction), two on autoregressive prediction, and three on contrastive learning. Four models combined multiple objectives (e.g., masked reconstruction + contrastive loss, spatiotemporal alignment, autoregression, or power‑band estimation). Five studies adopted a two‑stage scheme: first discretising EEG into codebooks, then training a masked token predictor on the discrete symbols. Masked reconstruction—either of raw waveforms or quantised tokens—was the most common objective (12 models).

Fine‑tuning and downstream tasks
All studies fine‑tuned the pretrained models on downstream tasks. Eighteen of nineteen performed single‑task fine‑tuning; only the ALFEE study explored multi‑task adaptation. Fifteen papers fine‑tuned the entire network, three fine‑tuned selected modules, and twelve updated only the classification head. Comparative experiments (in eight papers) showed full‑parameter fine‑tuning generally outperformed partial updates. Downstream applications were diverse: seizure detection, sleep‑stage classification, abnormal EEG event detection, motor imagery/execution, emotion recognition, ERP decoding, subject identification, and artifact detection. The most frequently used benchmarks were motor imagery (10 studies), emotion recognition (8), seizure detection (8), sleep staging (7), and abnormal event classification (10).

Key findings and challenges
The review highlights three major limitations of the current landscape: (1) dataset homogeneity—over‑reliance on TUEG limits demographic and recording‑hardware diversity; (2) methodological heterogeneity—wide variance in architectures, SSL losses, and spatial encoding hampers reproducibility and direct performance comparison; (3) lack of a truly generalizable foundation model—most works fine‑tune for a single downstream task, preventing assessment of multi‑task transferability. Additionally, transformer models suffer from quadratic complexity with respect to sequence length, making long‑duration EEG analysis computationally expensive; SSMs (MAMBA, S4) offer linear‑time alternatives but have yet to demonstrate clear superiority on EEG benchmarks.
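The quadratic-versus-linear cost gap mentioned above can be made concrete with back-of-the-envelope FLOP counts. The formulas below are standard asymptotic estimates, not measurements from any reviewed model; `state` is a hypothetical SSM state dimension.

```python
def attention_flops(L, d):
    """Self-attention score computation + value mixing: O(L^2 * d)."""
    return 2 * L * L * d

def ssm_flops(L, d, state=16):
    """Linear recurrent scan of a state-space model: O(L * d * state)."""
    return 2 * L * d * state

# EEG at 256 Hz: token count L grows with recording duration
d = 256
for seconds in (10, 60, 600):
    L = seconds * 256
    ratio = attention_flops(L, d) / ssm_flops(L, d)
    print(f"{seconds:4d} s  ->  attention/SSM cost ratio {ratio:.0f}x")
```

Because the ratio itself grows linearly with sequence length, the advantage of SSMs widens exactly in the long-duration regime (sleep recordings, continuous monitoring) where transformers become prohibitive.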

Future directions
The authors argue that progress will depend on (i) assembling larger, more diverse, and publicly available EEG corpora spanning multiple clinical populations and recording setups; (ii) establishing standardized evaluation protocols, including shared benchmark datasets and uniform metrics; (iii) exploring multi‑task and continual‑learning fine‑tuning strategies to realize genuinely general-purpose EEG foundation models; and (iv) further investigating efficient architectures (e.g., SSMs, sparse attention) that can handle long recordings without prohibitive computational cost.

In conclusion, self‑supervised EEG foundation models have rapidly emerged, driven by transformer and emerging state‑space architectures, and have shown promise across a range of clinical and BCI tasks. However, the field remains constrained by limited data diversity, lack of benchmark standardization, and the absence of a universally applicable model. Addressing these gaps will be essential for translating these powerful representations into robust, real‑world neurotechnology and clinical decision‑support tools.

