Cognitive Load Estimation Using Brain Foundation Models and Interpretability for BCIs
Accurately monitoring cognitive load in real time is critical for Brain-Computer Interfaces (BCIs) that adapt to user engagement and support personalized learning. Electroencephalography (EEG) offers a non-invasive, cost-effective modality for capturing neural activity, though traditional methods often struggle with cross-subject variability and task-specific preprocessing. We propose leveraging Brain Foundation Models (BFMs), large pre-trained neural networks, to extract generalizable EEG features for cognitive load estimation. We adapt BFMs for long-term EEG monitoring and show that fine-tuning a small subset of layers yields improved accuracy over the state-of-the-art. Despite their scale, BFMs allow for real-time inference with a longer context window. To address often-overlooked interpretability challenges, we apply Partition SHAP (SHapley Additive exPlanations) to quantify feature importance. Our findings reveal consistent emphasis on prefrontal regions linked to cognitive control, while longitudinal trends suggest learning progression. These results position BFMs as efficient and interpretable tools for continuous cognitive load monitoring in real-world BCIs.
💡 Research Summary
This paper tackles the challenge of continuously estimating cognitive load from electroencephalography (EEG) in real‑time Brain‑Computer Interface (BCI) applications. Traditional EEG‑based approaches rely on handcrafted spectral or connectivity features and require extensive, task‑specific preprocessing, which limits their ability to generalize across users and sessions. To overcome these limitations, the authors leverage recent Brain Foundation Models (BFMs)—large, self‑supervised neural networks pre‑trained on diverse EEG corpora—to extract high‑resolution, subject‑agnostic representations.
Two state‑of‑the‑art BFMs are evaluated: LaBraM, which uses a convolutional temporal encoder, trainable spatio‑temporal positional embeddings, and a stack of 12 transformer layers; and CBraMod, which combines temporal and frequency encoders with cross‑attention. Both models produce per‑second, per‑channel embeddings of size 200. Raw EEG recordings (500 Hz, down‑sampled to 200 Hz) from a VR flight‑simulator study are segmented into 90‑second excerpts, which are further divided into ten 16‑second windows with 50 % overlap. Within each window, the per‑second embeddings are averaged, yielding a feature tensor of shape (10 windows × channels × 200).
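The windowing step above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function name and the dummy 26‑channel input are assumptions, while the 16‑second windows, 50 % overlap, and 200‑dimensional per‑second embeddings come from the summary.

```python
import numpy as np

def window_embeddings(embeddings, win_len=16, n_windows=10, overlap=0.5):
    """Average per-second BFM embeddings within overlapping windows.

    embeddings: array of shape (seconds, channels, 200), one embedding
    per second and channel for a 90-second excerpt.
    Returns an array of shape (n_windows, channels, 200).
    """
    stride = int(win_len * (1 - overlap))  # 8-second hop for 50% overlap
    out = []
    for w in range(n_windows):
        start = w * stride
        # Mean over the seconds inside this window
        out.append(embeddings[start:start + win_len].mean(axis=0))
    return np.stack(out)

# 90 s of dummy embeddings for a hypothetical 26-channel cap
emb = np.random.randn(90, 26, 200)
feats = window_embeddings(emb)
print(feats.shape)  # (10, 26, 200)
```

With an 8‑second stride, the tenth window spans seconds 72–88, so all ten windows fit inside the 90‑second excerpt.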
Because participants used heterogeneous dry‑electrode caps (26, 28, or 32 channels), the authors introduce two spatial pooling strategies to obtain a fixed‑size representation: (1) “group‑average” pooling, where electrodes are averaged within nine anatomically defined regions (frontal, central, parietal, etc.), preserving neuroscientific interpretability; and (2) “intersection” pooling, which selects a single electrode per region to minimize dimensionality. Temporal pooling is explored via three schemes: global pooling (concatenating all time steps), mean pooling, and mean‑standard‑deviation (MeanStd) pooling, the latter stacking the mean and standard deviation across windows.
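Both pooling stages can be sketched compactly. This is an illustrative NumPy version under stated assumptions: the region map below is a hypothetical three‑region toy (the paper uses nine anatomical regions), and function names are mine; the global/mean/MeanStd schemes follow the description above.

```python
import numpy as np

# Hypothetical region map: electrode indices per anatomical group.
# The paper uses nine regions; three are shown here for brevity.
REGIONS = {
    "prefrontal": [0, 1, 2],
    "frontal":    [3, 4, 5, 6],
    "central":    [7, 8, 9],
}

def group_average_pool(feats, regions):
    """Average channel embeddings within each region.
    feats: (windows, channels, emb) -> (windows, n_regions, emb)."""
    return np.stack(
        [feats[:, idx].mean(axis=1) for idx in regions.values()], axis=1
    )

def temporal_pool(feats, mode="meanstd"):
    """Pool the window axis: 'global' concatenates all windows,
    'mean' averages them, 'meanstd' stacks mean and std."""
    if mode == "global":
        return feats.reshape(-1)
    if mode == "mean":
        return feats.mean(axis=0).ravel()
    if mode == "meanstd":
        return np.concatenate(
            [feats.mean(axis=0).ravel(), feats.std(axis=0).ravel()]
        )
    raise ValueError(mode)

feats = np.random.randn(10, 26, 200)         # (windows, channels, emb)
pooled = group_average_pool(feats, REGIONS)  # (10, 3, 200) with this toy map
vec = temporal_pool(pooled, "meanstd")
print(vec.shape)  # (1200,) = 3 regions x 200 dims x {mean, std}
```

Intersection pooling would instead index one electrode per region, which is why it discards the cross‑electrode information that group averaging retains.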
The pooled features are fed to three downstream regressors: an L1‑regularized linear layer, a two‑layer dense neural network (DNN) with batch normalization and ReLU, and a support‑vector machine (SVM) with a fixed radial‑basis‑function kernel. Model training minimizes mean‑squared error against continuous cognitive‑load scores provided by an Adaptive Training System (ATS) for pilots, which integrates flight performance, task difficulty, and learning rate into a single scalar. Performance is measured by Pearson correlation between predicted and ATS scores.
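As a minimal sketch of the evaluation metric, the snippet below fits a plain least‑squares linear model on synthetic features and scores it by Pearson correlation, as the paper does against ATS scores. Note the assumptions: the paper's linear estimator is L1‑regularized (and it also uses a DNN and an RBF‑kernel SVM), whereas this toy uses unregularized least squares purely to show the scoring step; all data here is synthetic.

```python
import numpy as np

def pearson_r(pred, target):
    """Pearson correlation between predictions and target scores."""
    return np.corrcoef(pred, target)[0, 1]

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 40))        # 60 trials, 40 pooled features
w_true = rng.standard_normal(40)
y = X @ w_true + 0.1 * rng.standard_normal(60)  # synthetic ATS-like scores

# Unregularized least squares stands in for the paper's L1 estimator
w, *_ = np.linalg.lstsq(X, y, rcond=None)
r = pearson_r(X @ w, y)
print(r > 0.9)  # True on this nearly noise-free toy problem
```

In the paper this correlation is computed between predicted load and the held‑out ATS trajectory, so values such as 0.281 reflect cross‑subject generalization rather than in‑sample fit as in this toy.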
Cross‑subject generalization is evaluated using nested cross‑validation: Cohort E (five participants who completed all 90 trials) serves as the held‑out validation and test set, while Cohort D (11 participants, 32‑channel caps) is used for training. Additional experiments augment the training set with participants from Cohorts A‑C, which feature varying channel layouts, to assess robustness to data heterogeneity.
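The cohort‑level split can be expressed as a simple filter. This is a schematic sketch, assuming trial records carry a cohort label; the record structure and function name are illustrative, while the Cohort E hold‑out and the Cohort D (plus optional A‑C) training pool follow the protocol described above.

```python
def cohort_split(records, test_cohort="E", train_cohorts=("D",)):
    """Split trial records by cohort: Cohort E is always held out,
    while the training pool can grow from D to D + A-C."""
    train = [r for r in records if r["cohort"] in train_cohorts]
    test = [r for r in records if r["cohort"] == test_cohort]
    return train, test

# Two dummy trials per cohort A-E
records = [{"cohort": c, "trial": i} for c in "ABCDE" for i in range(2)]
train, test = cohort_split(records, train_cohorts=("D", "A", "B", "C"))
print(len(train), len(test))  # 8 2
```

Because subjects never appear in both pools, the reported correlations measure transfer to unseen participants.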
Results show that BFMs dramatically outperform conventional baselines. LaBraM with group‑average spatial pooling and global temporal pooling achieves an average Pearson correlation of 0.281, compared with 0.120 for power‑spectral‑density (PSD) features and 0.143–0.157 for end‑to‑end deep models such as EEGNet and EEGConformer. CBraMod also improves over baselines (0.133 correlation) but lags behind LaBraM, likely due to a smaller parameter count and different positional encoding. Temporal pooling choice has a modest impact; mean and MeanStd reduce dimensionality while preserving performance, whereas global pooling retains slightly more information. Spatial pooling, however, is decisive: group‑average consistently outperforms intersection across all downstream estimators, indicating that BFMs capture complementary information across electrodes that is lost when only a single electrode per region is retained.
When training data size is increased by adding heterogeneous cohorts, both BFMs maintain or improve performance, with LaBraM showing the steepest gains. The linear regressor proves most resistant to over‑fitting across cohort additions, whereas SVM and DNN exhibit larger fluctuations at cohort transition points.
Interpretability is addressed through Partition SHAP, a model‑agnostic method that respects feature correlations by constructing a hierarchical grouping of electrodes. By perturbing input EEG and measuring output changes, the authors generate SHAP importance maps per participant and per day. The resulting heatmaps consistently highlight prefrontal regions—particularly dorsolateral prefrontal cortex—as the most influential for cognitive‑load prediction. Longitudinal SHAP analysis reveals a growing emphasis on these frontal areas over successive training days, mirroring the observed decrease in overall cognitive‑load scores and aligning with neuroscientific theories linking prefrontal activity to working memory and executive control.
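The idea behind grouped attributions can be illustrated with exact Shapley values over a few electrode groups. This is a didactic sketch, not the Partition SHAP algorithm itself: Partition SHAP approximates these values efficiently via a hierarchy of correlated feature groups, whereas the brute‑force version below enumerates all coalitions and is only feasible for a handful of groups. The toy linear "load" model, weights, and group layout are all assumptions.

```python
import itertools
import math
import numpy as np

def masked_input(x, baseline, groups, coalition):
    """Keep x's values for groups in the coalition; baseline elsewhere."""
    z = baseline.copy()
    for g in coalition:
        z[groups[g]] = x[groups[g]]
    return z

def group_shapley(model, x, baseline, groups):
    """Exact Shapley values over feature groups (e.g. electrode regions)."""
    n = len(groups)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in itertools.combinations(others, k):
                # Classic Shapley coalition weight |S|!(n-|S|-1)!/n!
                w = (math.factorial(k) * math.factorial(n - k - 1)
                     / math.factorial(n))
                with_i = model(masked_input(x, baseline, groups, S + (i,)))
                without = model(masked_input(x, baseline, groups, S))
                phi[i] += w * (with_i - without)
    return phi

# Toy linear model over 3 electrode groups of 2 features each;
# group 0 plays the role of a dominant "prefrontal" region.
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
w = np.array([2.0, 2.0, 0.5, 0.5, 0.0, 0.0])
model = lambda z: float(z @ w)
phi = group_shapley(model, np.ones(6), np.zeros(6), groups)
print(np.round(phi, 2))  # [4. 1. 0.] -- the "prefrontal" group dominates
```

For a linear model each group's Shapley value reduces to the sum of its weights times the input/baseline difference, which is why the dominant group here receives the largest attribution; the paper's heatmaps apply the same logic, via Partition SHAP, to the BFM pipeline.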
The paper’s contributions are fourfold: (1) a scalable pipeline that leverages BFM‑derived EEG features for long‑duration cognitive‑load estimation; (2) an anatomically informed group‑average pooling method that mitigates heterogeneity in electrode layouts; (3) an adaptation of Partition SHAP for EEG, providing neuroscientifically grounded explanations; and (4) a longitudinal study demonstrating that BFMs capture learning‑related dynamics in real‑world BCI scenarios.
Limitations include the lack of explicit benchmarking of real‑time inference latency on edge hardware, and the fact that SHAP explanations are limited to electrode‑level importance without dissecting underlying frequency‑band contributions. Moreover, the ground‑truth labels stem from a specific VR pilot training system, so external validation on other domains (e.g., classroom learning, industrial task monitoring) remains an open avenue.
In summary, by integrating large‑scale pre‑trained brain models, anatomically motivated pooling, and rigorous interpretability analysis, the authors present a robust, interpretable solution for continuous cognitive‑load monitoring. This work paves the way for adaptive BCIs that can dynamically adjust content, difficulty, or assistance based on reliable, neurophysiologically grounded estimates of mental workload.