Optimizing Channel Selection for Seizure Detection

V. Shah, M. Golmohammadi, S. Ziyabari, E. Von Weltin, I. Obeid and J. Picone
Neural Engineering Data Consortium, Temple University, Philadelphia, Pennsylvania, USA
{vinitshah, meysam, saeedeh, eva.vonweltin, obeid, picone}@temple.edu

Abstract: Interpretation of electroencephalogram (EEG) signals can be complicated by obfuscating artifacts. Artifact detection plays an important role in the observation and analysis of EEG signals. Spatial information contained in the placement of the electrodes can be exploited to accurately detect artifacts. However, when fewer electrodes are used, less spatial information is available, making it harder to detect artifacts. In this study, we investigate the performance of a deep learning algorithm, CNN-LSTM, on several channel configurations. Each configuration was designed to minimize the amount of spatial information lost compared to a standard 22-channel EEG. Systems using a reduced number of channels ranging from 8 to 20 achieved sensitivities between 33% and 37% with false alarms in the range of [38, 50] per 24 hours. False alarms increased dramatically (e.g., over 300 per 24 hours) when the number of channels was further reduced. Baseline performance of a system that used all 22 channels was 39% sensitivity with 23 false alarms per 24 hours. Since the 22-channel system was the only system that included referential channels, the rapid increase in the false alarm rate as the number of channels was reduced underscores the importance of retaining referential channels for artifact reduction. This cautionary result is important because one of the biggest differences between various types of EEGs administered is the type of referential channel used.

I. INTRODUCTION

An electroencephalogram (EEG) is a very popular non-invasive tool for recording signals and diagnosing brain-related illnesses [1].
The 10-20 electrode configuration is by far the most popular standard worldwide for conducting EEG tests [2] and provides clinicians an adequate amount of information about the signal to make a diagnosis. Higher density EEGs are popular in research communities for their superior localization capabilities, but are still not common in clinical practice. Though the increased density of the electrode grid does provide additional information, the gain is not significant enough to justify the additional disk space required to archive the data.

The TUH EEG Corpus (TUEEG) [3], which is the subject of this study, is the world’s largest publicly accessible archive of clinical EEG recordings. It contains over 40 unique channel configurations. Many of these configurations were created to assist in the diagnosis of specific diseases. The most striking difference in these configurations is the manner in which ground and reference are used when a differential montage is constructed [4][5]. Since EEG signals are very low in voltage and quite noisy, grounding and/or referencing plays a key role in one’s ability to collect clean signals.

In this paper, we focus on an important subset of TUEEG known as the TUH EEG Seizure Corpus (TUSZ) [6]. Over 90% of these files use the 19-channel configuration shown in Figure 1 [7]. We have applied a combination of longitudinal and transverse bipolar montages, referred to as a TCP montage [7], to create 22 differential channels with a focus on focal regions of the scalp. This montage is also summarized in Figure 1. TUSZ has been manually annotated for diverse morphologies of seizure events.

We have introduced a deep learning architecture [9] that achieves a very low false positive rate (FPR).
This system integrates convolutional neural networks (CNNs) with recurrent neural networks (RNNs) to deliver state-of-the-art performance. This doubly deep recurrent convolutional structure models both spatial relationships (e.g., cross-channel dependencies) and temporal dynamics (e.g., events such as spikes). The integration of CNNs and long short-term memory (LSTM) units does a much better job of rejecting artifacts. Artifacts and events such as wicket spikes, rectus muscle artifacts and electrode-pop artifacts are easily confused with spike and wave discharges because they often appear on only a few channels, similar to the way seizure events present themselves. The depth of the convolutional network is important since the top convolutional layers tend to learn generic features while the deeper layers learn dataset-specific features. The convolutional LSTM architecture with proper initialization and regularization delivers 30% sensitivity at 6 false alarms per 24 hours [10].

Figure 1. Electrode locations for a standard 10-20 system with a defined 22-channel TCP montage.

Feature extraction typically relies on time-frequency representations of the signal. Though we can replace traditional model-based feature extraction with deep learning-based approaches that operate directly on the sampled data, in this work we focus on the use of traditional cepstral-based features. In our current system, we use a traditional feature extraction approach based on linear frequency cepstral coefficients (LFCCs) [5][8]. We also use first and second derivatives of the features since these improve performance. Deep learning-based approaches that operate directly on the sampled data, and more advanced discriminative features, have not yet produced substantial improvements in performance for this application.
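The first and second derivatives mentioned above can be appended to each frame's feature vector with a simple finite-difference scheme. The paper does not specify how the derivatives are computed, so this minimal sketch assumes a central difference over neighboring frames with edge replication; `delta` and `add_deltas` are hypothetical helper names, not part of the authors' software.

```python
def delta(frames):
    """Central-difference derivative along time.

    `frames` is a list of equal-length feature vectors, one per frame.
    Assumption: edges are handled by replicating the first/last frame.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_deltas(frames):
    """Concatenate base features with first and second derivatives."""
    d1 = delta(frames)   # first derivative
    d2 = delta(d1)       # second derivative (derivative of the first)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy example: one base feature per frame, four frames.
base = [[0.0], [1.0], [4.0], [9.0]]
augmented = add_deltas(base)
```

Each output vector is three times the length of the base vector: the base features followed by their first and second derivatives.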
Neurologists typically review EEGs in 10 sec windows and identify events with a temporal resolution of approximately 1 sec. Following this approach, we chose to analyze the signal in 1 sec epochs, and further divide this interval into 10 frames of 0.1 secs each so that features are computed every 0.1 seconds (referred to as the frame duration) using 0.2 second analysis windows (referred to as the window duration). The output of our feature extraction process is a feature vector of dimension 26 for each of the 22 channels, with a frame duration of 0.1 secs. This optimized system produces 39% sensitivity and 90% specificity with 23 false alarms (FA) per 24 hours [9]. This will be our baseline system.

Our focus in this study is to optimize the selection of channels. This serves two purposes. First, it reduces the dimensionality of the problem. Second, and more importantly, our goal is to find a minimal number of channels that are common across all EEGs that can provide reasonable levels of performance. Otherwise, the system will have to adapt to the unique channel configuration of each EEG or clinical site, and this is an extremely complex process.

The results presented in this paper use the Any Overlap scoring method [11] in which true positives are counted when the hypothesis overlaps with one or more reference annotations. False positives correspond to events in which the hypothesis annotations do not overlap with any of the reference annotations. This method of scoring is popular in the EEG research community. The relative rankings of the systems are not sensitive to the scoring method, though the absolute numbers do change slightly.

II. CHANNEL SELECTION

EEGs are used to diagnose a wide variety of pathologies. Applications include obvious things like seizure detection and prediction.
But an EEG today is also being used to diagnose psychological disorders, sleep disorders and head injuries. Further, an EEG is used to monitor the impact of drug interventions. For each specific task, spatial information plays a major role. For example, electrodes placed near the occipital lobe capture brain activity related to vision whereas mid-parietal region electrodes collect information related to waking consciousness. In this study, we have focused on seizure detection. We emphasize the importance of using domain knowledge in the selection of channel configurations instead of using an ad hoc selection process. An overview of the channel selection process is given in Figure 2.

Figure 2. An overview of the channel selection strategies that were employed to reduce the number of channels: (a) 22 channels, (b) 20 channels, (c) 16 channels, (d) 8 channels, (e) 4 channels, (f) 2 channels.

When reducing the number of channels from 22 to 20, we removed the referential channels A1 and A2. These are attached to the patient’s ears and are generally very susceptible to noise. Additionally, all brain events occurring on those channels can also be observed on electrodes T3 and T4. Frontal Polar (FP1 & FP2) channels are mostly ignored because only 36% of frontal seizures can be observed on scalp EEGs, making automatic detection of frontal lobe seizures very difficult [12]. The CZ electrode is utilized throughout all configurations because its location at the center of the scalp and its connection to 6 adjacent electrodes in the TCP montage allow it to detect seizures occurring in both hemispheres better than any other single electrode. Only one of the occipital (O1 & O2) electrodes has been considered in the 4 and 2 channel configurations because the occipital electrodes are always placed close to each other.
Consequently, it is likely that seizure events occurring near one of the occipital electrodes will appear on the other as well.

III. EXPERIMENTAL DESIGN AND ANALYSIS

For this study, we have used a baseline system that integrates CNNs and LSTMs, as shown in Figure 3. The input tensors are fed to a CNN stage that typically consists of three 2D convolutional layers with 16 kernels of size 3×3 and max-pooling layers of size 2×2 to effectively reduce the dimensionality of the input. Dropout layers are added at the end of each layer except the very last one to avoid overfitting. The output is then flattened and fed to a 1D CNN which acts as a fully connected network. The output of this pass is fed to a bidirectional LSTM stage. Exponential Linear Units (ELU) are used as the activation functions for all stages except the last stage, which uses a sigmoid activation function. A mean-square error loss function and the Adam optimizer are also used. Postprocessing is used on the system output to reduce the false alarm rate.

In Table 1 we summarize the results for each of the channel configurations shown in Figure 2. The system with the 22-channel configuration, as expected, outperforms the other systems. The 20-channel, 16-channel and 8-channel configurations produce moderate reductions in performance. The 4-channel and 2-channel configurations perform poorly because these configurations lack spatial context.

Unfortunately, the typical system defined here cannot be applied identically for all the channel configurations that we have defined for this study because dimensionality reduction on a small number of channels is a problem. Applying max-pooling with a 2×2 matrix on all the layers when using 2, 4, and 8 channels is not possible.
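The constraint described above can be illustrated by tracing how a single tensor axis shrinks through a stack of convolution and pooling layers. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the channel axis is one of the two axes seen by the 3×3 kernels and the 2×2 pooling, and `trace_axis` and its `padding` parameter are hypothetical names. The paper does not state the padding mode or the tensor layout, so the traced sizes are indicative only.

```python
def trace_axis(size, n_layers, kernel=3, pool=2, padding="valid"):
    """Trace one axis through up to `n_layers` conv + max-pool layers.

    Returns the axis length after each layer that could be fully applied;
    a list shorter than `n_layers` means the axis ran out of samples.
    Assumptions: stride-1 convolution, non-overlapping pooling with floor.
    """
    if padding not in ("valid", "same"):
        raise ValueError("padding must be 'valid' or 'same'")
    sizes = []
    for _ in range(n_layers):
        if padding == "valid":
            if size < kernel:          # kernel no longer fits
                break
            size = size - kernel + 1   # unpadded convolution shrinks the axis
        if size < pool:                # pooling window no longer fits
            break
        size //= pool
        sizes.append(size)
    return sizes


# Channel counts studied in the paper, traced through three layers.
for n_channels in (22, 8, 4, 2):
    print(n_channels, trace_axis(n_channels, n_layers=3))
```

With unpadded ("valid") convolutions, only the 22-channel axis survives all three layers in this sketch; with "same" padding the 8-channel axis survives as well, so which configurations actually fail depends on details the text leaves unspecified.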
To make a fair comparison and to understand the behavior of a system on low-dimensional tensors, we have used two separate approaches for low-dimensional channel configurations. First, we simply keep the dimensionality of the channel tensor intact. Second, we remove one or more CNN layers whenever we face dimensionality reduction issues. The modification in the number of CNN layers can be observed in the second column of Table 1.

Figure 3. A block diagram of the baseline system.

Table 1. Performance vs. channel configuration

| Ch. | 2D CNN Layers | Sensitivity (%) | Specificity (%) | FA/24 Hours |
|-----|---------------|-----------------|-----------------|-------------|
| 22  | 3             | 39.15           | 90.37           | 22.83       |
| 20  | 3             | 34.54           | 82.07           | 49.25       |
| 16  | 3             | 36.54           | 80.48           | 53.99       |
| 8   | 3             | 33.44           | 85.51           | 38.19       |
| 4   | 3             | 33.11           | 39.32           | 325.54      |
| 8   | 2             | 30.66           | 88.79           | 28.57       |
| 4   | 1             | 34.09           | 39.00           | 332.15      |
| 2   | 3             | 31.15           | 40.82           | 308.74      |

An ROC curve, which depicts the true positive rate (TPR) vs. the false positive rate (FPR), is shown in Figure 4. We compare four systems: 22, 16, 8 and 4 channels. The 22-channel system clearly outperforms the other three reduced-channel configurations. The performance differences are greatest for low values of FPR, which is the region of most interest in this application. On the other hand, when the FPR is high, the performance difference between these systems is minimal. We also observe that the performance differences between the 16-channel and 8-channel configurations are small, with the 8-channel system performing slightly better when the FPR is low. This seems to validate the process used to select these channel configurations, which was based on significant amounts of domain knowledge.

Figure 4. ROC curves for 22, 16, 8 and 4 channels.

Next, we conducted an experiment to investigate the importance of including the referential channels A1 and A2, referred to collectively as Ax. Table 2 presents a comparison of the 2, 4, 8 and 16 channel configurations to the same configurations with Ax added. We also provide an ROC curve in Figure 5.
The ROC curves demonstrate that the gap in performance between the 18-channel system and the 10-channel system is much greater than that achieved without the additional channels. Further, overall performance with Ax is better than without. To further probe this, in Figure 6, we compare an 18-channel configuration with Ax channels to a 16-channel configuration without the Ax channels. The system using referential channels performs better at low FPR than the system without referential channels, and this improvement in performance is not simply due to the increased channel count. Instead, it is an indication that the referential channels are providing meaningful information, especially at low FPRs.

IV. SUMMARY

In this paper, we have investigated the impact of referential channels on seizure detection performance. We have explored this using a framework based on a hybrid CNN-LSTM deep learning system. Not surprisingly, using all channels from a 10-20 EEG configuration gave the best performance: 39.15% sensitivity and 90.37% specificity with 22.83 FA per 24 hours. Selecting a moderately reduced number of channels (e.g., 16 and 8) resulted in a small but measurable degradation in performance. Adding referential channels to these configurations improved performance, particularly in the low FPR region of primary interest in this application.

Deep learning systems are extremely sensitive to training conditions. Initialization of models and randomization of the data play a far too significant role in the overall performance. This complicates these types of parameter studies because the system must be individually optimized for each condition. This is an ongoing issue that we will address in future research. Also, we demonstrated that network architectures needed to change for the low-order systems. For example, the 4-channel system in Table 1 used only one 2D CNN layer.
These types of optimizations are another reason these parametric studies must be carefully designed. Finally, since the use of referential channels varies significantly across EEG type, clinical site, neurologist, etc., better techniques to reduce the sensitivity of performance to these referential channels are needed.

Table 2. A comparison of performance demonstrating the impact of including the Ax channels

| No. Chan. w/ Ax | No. Chan. w/o Ax | Sensitivity (%) w/ Ax | Sensitivity (%) w/o Ax | FA/24 Hours w/ Ax | FA/24 Hours w/o Ax |
|-----------------|------------------|-----------------------|------------------------|-------------------|--------------------|
| 22              | 20               | 39.15                 | 34.54                  | 22.83             | 49.25              |
| 18              | 16               | 36.65                 | 36.54                  | 37.33             | 53.99              |
| 10              | 8                | 30.94                 | 33.44                  | 283.18            | 38.19              |
| 6               | 4                | 34.36                 | 34.09                  | 58.15             | 332.15             |
| 4               | 2                | 33.06                 | 31.15                  | 47.53             | 308.74             |

Figure 5. ROC curves for 22, 18 and 10 channel configurations that include the Ax channels.

Figure 6. 18 channels w/ Ax vs. 16 channels w/o Ax.

ACKNOWLEDGMENTS

Research reported in this publication was most recently supported by the National Human Genome Research Institute of the National Institutes of Health under award number U01HG008468. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This material is also based in part upon work supported by the National Science Foundation under Grant No. IIP-1622765. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

[1] T. Yamada and E. Meng, Practical guide for clinical neurophysiologic testing: EEG. Philadelphia, Pennsylvania, USA: Lippincott Williams & Wilkins, 2009.
[2] W. Tatum, A. Husain, S. Benbadis, and P. Kaplan, Handbook of EEG Interpretation. New York City, New York, USA: Demos Medical Publishing, 2007.
[3] I. Obeid and J. Picone, “The Temple University Hospital EEG Data Corpus,” Front. Neurosci. Sect. Neural Technol., vol. 10, p. 196, 2016.
[4] I. Obeid and J. Picone, “Machine Learning Approaches to Automatic Interpretation of EEGs,” in Biomedical Signal Processing in Big Data, 1st ed., E. Sejdik and T. Falk, Eds. Boca Raton, Florida, USA: CRC Press, 2017 (in press).
[5] S. Lopez, M. Golmohammadi, I. Obeid, and J. Picone, “An Analysis of Two Common Reference Points for EEGs,” Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium, 2016, pp. 1–4.
[6] M. Golmohammadi, V. Shah, S. Lopez, S. Ziyabari, S. Yang, J. Camaratta, I. Obeid, and J. Picone, “The TUH EEG Seizure Corpus,” Proceedings of the American Clinical Neurophysiology Society Annual Meeting, 2017, p. 1.
[7] ACNS, “Guideline 6: A Proposal for Standard Montages to Be Used in Clinical EEG,” Milwaukee, Wisconsin, USA, 2006 (available at http://www.acns.org/pdf/guidelines/Guideline-6.pdf).
[8] A. Harati, M. Golmohammadi, S. Lopez, I. Obeid, and J. Picone, “Improved EEG Event Classification Using Differential Energy,” Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium, 2015, pp. 1–4.
[9] M. Golmohammadi, S. Ziyabari, V. Shah, I. Obeid, and J. Picone, “Deep Architectures for Automated Seizure Detection in Scalp EEGs,” submitted to the AAAI Conference on Artificial Intelligence, 2018, pp. 1–8 (available at https://www.isip.piconepress.com/publications/unpublished/conferences/2018/aaai/deep_learning/).
[10] M. Golmohammadi, S. Ziyabari, V. Shah, I. Obeid, and J. Picone, “Gated Recurrent Networks for Seizure Detection,” submitted to the IEEE Signal Processing in Medicine and Biology Symposium, 2017, pp. 1–5 (available at https://www.isip.piconepress.com/publications/unpublished/conferences/2017/ieee_spmb/rnn/).
[11] S. Ziyabari, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “An Analysis of Objective Performance Metrics for Automatic Interpretation of EEG Signal Events,” submitted to the Journal of Clinical Neurophysiology, 2017 (available at https://www.isip.piconepress.com/publications/unpublished/journals/2017/jcn/metrics/).
[12] W. Deburchgraeve et al., “Automated neonatal seizure detection mimicking a human observer reading EEG,” Clin. Neurophysiol., vol. 119, no. 11, pp. 2447–2454, 2008.
