Towards Automated EEG-Based Epilepsy Detection Using Deep Convolutional Autoencoders
Epilepsy is one of the most common neurological disorders and calls for reliable, efficient seizure detection methods. Electroencephalography (EEG) is the gold standard for seizure monitoring, but manual analysis is time-consuming and requires expert knowledge, and no well-defined features exist that would allow fully automated analysis. Existing deep learning approaches struggle to achieve high sensitivity while maintaining a low false alarm rate per hour (FAR/h), and there is no consensus on the optimal EEG input representation, whether in the time or frequency domain. To address these issues, we propose a Deep Convolutional Autoencoder (DCAE) that extracts low-dimensional latent representations preserving essential EEG signal features. The model's ability to preserve relevant information was evaluated by comparing reconstruction errors on both time-series and frequency-domain representations. Several autoencoders with different time- and frequency-based loss functions were trained and evaluated to determine how well they reconstruct EEG features. Our results show that the DCAE trained with a loss combining both time-series and frequency terms achieved the best reconstruction performance, indicating that deep neural networks trained on a single representation may not preserve all relevant signal properties. This work provides insight into how deep learning models process EEG data and examines whether frequency information is captured when time-series signals are used as input.
💡 Research Summary
This paper addresses the persistent challenge of reliable, fully automated seizure detection from scalp electroencephalography (EEG) recordings. While EEG remains the clinical gold standard for diagnosing epilepsy, manual inspection is labor‑intensive, requires expert knowledge, and suffers from variability in labeling, artifacts, and severe class imbalance. Recent deep‑learning approaches, especially convolutional neural networks (CNNs), have shown promise but still struggle to achieve both high sensitivity and low false‑alarm rates per hour (FAR/h). Moreover, there is no consensus on whether the optimal input representation should be in the time domain, the frequency domain, or a combination of both.
To tackle these issues, the authors propose a Deep Convolutional Autoencoder (DCAE) that learns a compact latent representation of multi‑channel EEG while explicitly preserving both temporal and spectral characteristics. The architecture processes 23 scalp channels, each windowed to 512 samples (2 s at 256 Hz). The encoder consists of three convolutional blocks with varying kernel sizes, batch normalization, dropout (20 % in the first block), and max‑pooling, followed by a fully connected layer that maps the extracted features to a 500‑dimensional latent vector. The decoder mirrors this structure with transposed convolutions and up‑sampling to reconstruct the original signal.
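The tensor shapes implied by this architecture can be checked with simple arithmetic. The sketch below propagates the stated input (23 channels × 512 samples) through a hypothetical three-block encoder; the filter counts, kernel sizes, and pooling factors are assumptions, since the summary only specifies "varying kernel sizes", and only the input size and the 500-dimensional latent vector come from the paper.

```python
# Sketch: shape propagation through a hypothetical 3-block 1-D encoder.
# Only the input (23 ch x 512 samples) and the 500-d latent are from the
# paper; filters, kernels, and pooling factors below are assumptions.

def conv1d_len(n, kernel, stride=1, pad=0):
    """Output length of a 1-D convolution."""
    return (n + 2 * pad - kernel) // stride + 1

def pool1d_len(n, factor=2):
    """Output length after max-pooling by `factor`."""
    return n // factor

length, channels = 512, 23          # 2 s at 256 Hz, 23 scalp channels
filters = [32, 64, 128]             # assumed filter counts per block
kernels = [7, 5, 3]                 # assumed "varying kernel sizes"

for f, k in zip(filters, kernels):
    length = conv1d_len(length, k, pad=k // 2)  # 'same' padding
    length = pool1d_len(length)                 # halve with max-pool
    channels = f

flat = length * channels            # features entering the dense layer
latent = 500                        # latent dimension from the paper
print(flat, "->", latent)           # e.g. 8192 -> 500
```

The decoder would mirror this arithmetic in reverse, with transposed convolutions and up-sampling doubling the length at each block until the original 512-sample window is recovered.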
A key contribution is the systematic exploration of loss functions that balance reconstruction fidelity in the time and frequency domains:
- TS‑loss – pure mean absolute error (MAE) on the raw time series.
- TS‑FT‑loss – a weighted sum of time‑domain MAE and MAE on the Fourier transform (FT) of the signal, with the FT term multiplied by 20 to equalize its magnitude. Only the alpha‑beta band (8–30 Hz) is emphasized because these frequencies carry important seizure‑related information.
- TS‑STFT‑loss – similar to TS‑FT‑loss but replaces the FT with a short‑time Fourier transform (STFT) spectrogram (window = 64 points, hop = 8 points) to capture localized time‑frequency dynamics.
All frequency‑domain losses are normalized to the 99th percentile of the original spectrum to reduce the influence of outliers.
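A minimal numpy sketch of the three losses, for a single-channel window of 512 samples at 256 Hz: the FT weight of 20, the 8–30 Hz band, the STFT parameters (window = 64, hop = 8), and the 99th-percentile normalization are taken from the summary, while normalizing both spectra by the original signal's 99th percentile is our reading of that description, and the hand-rolled spectrogram (no window function, no overlap-add) is a simplification.

```python
import numpy as np

# Sketch of the three reconstruction losses described above.
# Weight 20, the 8-30 Hz band, and the STFT parameters come from the
# summary; the normalization convention is an assumption.

FS, N = 256, 512
FT_WEIGHT = 20.0          # equalizes the magnitudes of the two terms

def mae(a, b):
    return np.mean(np.abs(a - b))

def ts_loss(x, y):
    """TS-loss: pure time-series MAE."""
    return mae(x, y)

def ts_ft_loss(x, y):
    """TS-FT-loss: time MAE plus weighted MAE on the alpha-beta band."""
    fx, fy = np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / FS)
    band = (freqs >= 8.0) & (freqs <= 30.0)
    norm = np.percentile(fx, 99)      # 99th pct of original spectrum
    return mae(x, y) + FT_WEIGHT * mae(fx[band] / norm, fy[band] / norm)

def stft_mag(x, win=64, hop=8):
    """Hand-rolled magnitude spectrogram (no window function)."""
    frames = np.array([x[i:i + win]
                       for i in range(0, x.size - win + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1))

def ts_stft_loss(x, y):
    """TS-STFT-loss: time MAE plus weighted MAE on spectrograms."""
    sx, sy = stft_mag(x), stft_mag(y)
    norm = np.percentile(sx, 99)      # 99th pct of original spectrogram
    return mae(x, y) + FT_WEIGHT * mae(sx / norm, sy / norm)

rng = np.random.default_rng(0)
x = rng.standard_normal(N)
y = x + 0.1 * rng.standard_normal(N)  # imperfect "reconstruction"
print(ts_loss(x, y), ts_ft_loss(x, y), ts_stft_loss(x, y))
```

In practice these would be computed per channel inside the training loop (e.g. with a differentiable FFT), but the structure of each term is the same.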
The dataset used is the publicly available TUH EEG Seizure Corpus (TUSZ), comprising 7 361 recordings from 675 patients (≈48 h of seizure activity). The authors split the data into training (4 664 files), development (1 832 files), and evaluation (865 files). Pre‑processing includes: removal of non‑scalp electrodes, resampling to 256 Hz, band‑pass filtering (0.5–70 Hz) with a 60 Hz notch, average montage selection of 23 channels, 2‑second windowing with plausibility checks (standard‑deviation thresholds), channel‑wise histogram scaling to the range
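The 2-second windowing with plausibility checks can be sketched as follows for a single 256 Hz channel; the exact standard-deviation thresholds are not given in the summary, so `LOW` and `HIGH` below are placeholders.

```python
import numpy as np

# Sketch of 2-second windowing with a standard-deviation plausibility
# check. LOW/HIGH are assumed thresholds (flat-line vs. artifact); the
# paper's actual values are not given in the summary.

FS = 256
WIN = 2 * FS             # 2-second windows -> 512 samples
LOW, HIGH = 1e-3, 500.0  # placeholder std thresholds

def window_signal(x):
    """Split a signal into non-overlapping 2 s windows, dropping
    implausible ones (near-constant or artifact-dominated)."""
    windows = []
    for start in range(0, len(x) - WIN + 1, WIN):
        w = x[start:start + WIN]
        if LOW < np.std(w) < HIGH:
            windows.append(w)
    return np.stack(windows) if windows else np.empty((0, WIN))

rng = np.random.default_rng(1)
sig = np.concatenate([
    rng.standard_normal(WIN),        # plausible window
    np.zeros(WIN),                   # flat line -> rejected
    1e4 * rng.standard_normal(WIN),  # gross artifact -> rejected
])
print(window_signal(sig).shape)      # only the plausible window survives
```

The surviving windows would then be scaled channel-wise (the histogram scaling mentioned above) before being fed to the autoencoder.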