Indoor Sound Source Localization with Probabilistic Neural Network

Yingxiang Sun, Student Member, IEEE, Jiajia Chen*, Chau Yuen, Senior Member, IEEE, and Susanto Rahardja, Fellow, IEEE

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS

Manuscript received August 9, 2017; revised November 17, 2017; accepted December 6, 2017. This work was supported by the SUTD-MIT International Design Center under Grant IDG31700104 and NSFC 61750110529. Yingxiang Sun, Jiajia Chen (Corresponding Author*) and Chau Yuen are with the Pillar of Engineering Product Development, Singapore University of Technology and Design, Singapore (e-mail: jiajia_chen@sutd.edu.sg). Susanto Rahardja is with the School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, P.R. China, and STMIK Raharja, Tangerang, Indonesia (e-mail: susantorahardja@ieee.org).

Abstract: It is known that adverse environments such as high reverberation and low signal-to-noise ratio (SNR) pose a great challenge to indoor sound source localization. To address this challenge, in this paper we propose a sound source localization algorithm based on a probabilistic neural network, namely the Generalized cross correlation Classification Algorithm (GCA). Experimental results for adverse environments with reverberation time $T_{60}$ up to 600 ms and SNR as low as -10 dB show that the average azimuth angle error and elevation angle error of GCA are only 4.6° and 3.1° respectively. Compared with three recently published algorithms, GCA significantly increases the success rate of direction of arrival estimation, with good robustness to environmental changes. These results show that the proposed GCA can localize accurately and robustly in diverse indoor applications where the site acoustic features can be studied prior to the localization stage.

Index Terms: Sound source localization (SSL), direction of arrival (DOA), generalized cross correlation (GCC), probabilistic neural network (PNN), machine learning.

I. INTRODUCTION

Localization techniques have been widely used in both outdoor environments [1] and indoor environments [2]. Diverse types of sensors, including acoustic sensors, electromagnetic sensors, and optical sensors, have been adopted for localization. Sensor nodes with low-power acoustic microphones [3] were used in wireless sensor networks. In [4], localization on the basis of dense passive radio-frequency identification tags was proposed. A laser range finder [5] was installed on a mobile robot to localize in an environment surrounded by glass walls. An RGB-depth camera, in light of the two-dimensional light detection and ranging technique, was used for localization in [6]. In contrast to the common device-enabled technology [7], device-free technology [8], which localizes targets that do not carry any device, has also appeared.

Among localization techniques, indoor sound source localization (SSL) has important applications in a wide range of scenarios. For example, robots can localize a sound source to help detect unknown defects in a smart factory. Furthermore, in a smart hospital, robots can attend to patients by localizing the sound source. Moreover, a camera can be automatically steered for speaker localization in a smart meeting room. In terms of security monitoring, robots can go on patrol and look for sound sources caused by people breaking in. Therefore, indoor SSL has received a lot of attention [9] [10] in the past decades.

The existing SSL technologies can be categorized into three groups: the time delay estimation method, the beamforming method, and the machine learning method. The time delay estimation method is based on computing the time difference of arrival (TDOA). One widely used technique for TDOA is the generalized cross correlation (GCC). As reverberation and noise cause ambiguities in TDOA estimation, many efforts were made to address this problem. These works employed various types of microphone arrays, such as linear arrays [11], circular arrays [12], distributed arrays [13], and arbitrarily-shaped non-coplanar arrays [14]. The second class is the beamforming method, which can be classified into subspace approaches and beamscan approaches. Subspace approaches exploit the orthogonality between the signal and noise subspaces. Two famous subspace algorithms are multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance technique (ESPRIT). Beamscan approaches steer the array signals toward one specific direction. A well-known technique is the steered response power phase transform (SRP-PHAT), which is adopted by many beamscan approaches [15]-[18]. The machine learning methods are emerging approaches, and only a few attempts have been made in the literature. Most of these works are supervised learning methods, including the support vector machine [19], the multilayer perceptron neural network [20], and the Gaussian mixture model [21]. Besides, a semi-supervised learning algorithm based on manifold regularization [22] was proposed.

Although the above works have proposed effective localization algorithms, two major challenges still need to be addressed further. The first issue is the accuracy of direction of arrival (DOA) estimation in highly reverberant environments. As indoor environments are echoic, the reverberation caused by multipath propagation introduces spectral distortions and therefore severely deteriorates DOA estimation. Secondly, the spectral characteristics of undesired background noise can be the same as those of the source signal. As such, the DOA estimation accuracy is severely degraded in low signal-to-noise-ratio (SNR) environments. Therefore, more effort is needed to improve the DOA estimation accuracy for SSL in these adverse environments.

Among the applications, an important category exists where the acoustic features of the physical rooms can be pre-studied before localization. In this case, the acoustic features, including the room impulse response (RIR), can be evaluated before any localization is performed, which makes machine learning methods the right tools. This kind of data-driven training method can be more effective, especially when the environment is too complex to be modeled.

In this paper, we propose a probabilistic neural network (PNN) based SSL algorithm for the applications where a pre-localization site survey is possible. Compared with other existing machine learning methods, the most important advantage of PNN is that it does not require any iterative training. In addition, the GCC feature is adopted to robustly represent the sound source position, making the training procedure effective in reverberant and noisy environments.
Finally, the proposed weighted location decision method improves the accuracy of the DOA estimation by revisiting and assessing the probabilities of the adjacent clusters. Owing to these novelties, the results show that the proposed algorithm can perform more accurate SSL than existing methods in adverse environments. The performance is also shown to be robust when the room environment and/or geometry varies.

II. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we present the SSL problem to be addressed. We consider the problem of stationary single-source localization inside a 3-dimensional rectangular enclosed room. The location of the source is arbitrary inside the room. A stationary microphone array consisting of $M$ microphones is used to receive sound signals inside the same room. Through these fixed microphones, we receive the signal transmitted from the source directly as well as the delayed replicas of the source reflected by the room surfaces. The $m$-th microphone is represented as $M_m$ with $m \in [1, M]$. When the sound wave hits a surface such as a wall, a floor or a ceiling, part of the wave is absorbed by the surface while the rest is reflected back into the room. We assume that the sound wave is reflected by the surfaces with the angle of incidence equal to the angle of reflection. Therefore, the received signal at each microphone is a mixed signal, consisting of the signal transmitted from the source directly and the delayed replicas of the source, which are reflected and attenuated. If the source signal is $s(t)$, the received signal $x_m(t)$ at the $m$-th microphone can be expressed as

\[ x_m(t) = h_m(t) \otimes s(t) + n_m(t), \tag{1} \]

where $\otimes$ denotes convolution.
$n_m(t)$ is the noise at the $m$-th microphone, which is uncorrelated with $s(t)$ and with the noise at the other microphones [9]. $h_m(t)$ is the RIR, which contains the multipath propagation and attenuation information between the sound source and the $m$-th microphone. $h_m(t)$ varies with the positions of the sound source and the $m$-th microphone. Defining the received signal set $X = [x_1, x_2, \ldots, x_M]^T$, the RIR set $H = [h_1, h_2, \ldots, h_M]^T$ and the noise set $N = [n_1, n_2, \ldots, n_M]^T$, (1) can be written as

\[ X = H \otimes s(t) + N. \tag{2} \]

If we divide a room into a set of space clusters whose volumes are small enough, each space cluster can be represented by a unique 3-dimensional coordinate inside it. To cope with the high computational burden, the regressive SSL inside a 3-dimensional room can be transformed into a likelihood-based nonlinear classification problem. The classifier then decides which particular cluster the source belongs to, as shown in Fig. 1. In this classification problem, each space cluster is a category and a total number of $K$ categories can be created, i.e., $C = [c_1, c_2, \ldots, c_K]$ with $c_i \in \mathbb{R}^3$ and $i \in [1, K]$. The complexity of the classification grows with $K$, while finer-grained clusters lead to a more accurate localization if the classification is successful. All the $K$ categories are possible solutions, and each possible solution $c_i$ has a set of features $feature_i$ that decides the probability of $c_i$ being the final solution. Based on the features and the received signal set $X$, a dedicated classifier classifies the source into one cluster $c_s$, whose unique coordinate representative is $[d_{x,s}, d_{y,s}, d_{z,s}]$.
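The mapping between a position and its space cluster can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cubic clusters of side `dim` aligned with the room axes, takes the cluster centre as the representative coordinate, and uses the 4.0 m room and 0.25 m cluster size from the synthetic experiments as defaults. The function names are hypothetical.

```python
def cluster_counts(room, dim):
    """Number of clusters along each axis for a room split into cubes of side `dim`."""
    return tuple(int(round(size / dim)) for size in room)


def cluster_index(point, room=(4.0, 4.0, 4.0), dim=0.25):
    """Return the flat index (0 .. K-1) of the cluster containing `point`."""
    nx, ny, nz = cluster_counts(room, dim)
    ix, iy, iz = (
        min(max(int(coord // dim), 0), n - 1)  # clamp points lying on a far wall
        for coord, n in zip(point, (nx, ny, nz))
    )
    return (ix * ny + iy) * nz + iz


def cluster_center(index, room=(4.0, 4.0, 4.0), dim=0.25):
    """Return the representative coordinate (assumed here: the centre) of a cluster."""
    nx, ny, nz = cluster_counts(room, dim)
    ix, rest = divmod(index, ny * nz)
    iy, iz = divmod(rest, nz)
    return ((ix + 0.5) * dim, (iy + 0.5) * dim, (iz + 0.5) * dim)
```

With these defaults the room holds 16 clusters per axis, so $K = 4096$, and `cluster_index(cluster_center(i))` returns `i` for every valid index.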
The cluster $c_s$ is the solution of the classification problem, while $[d_{x,s}, d_{y,s}, d_{z,s}]$ is the solution of the SSL problem. With a classifier function $classify(\cdot)$, this classification problem can be summarized as

\[ c_s = classify\Big(X, \sum_{\forall i} feature_i\Big). \tag{3} \]

Assume the actual source location is $[s_x, s_y, s_z]$ inside the cluster $c_{source}$. Even if the classification is wrong, i.e., $c_s \neq c_{source}$, the regression localization error $\varepsilon$ can be evaluated as

\[ \varepsilon = \sqrt{(d_{x,s} - s_x)^2 + (d_{y,s} - s_y)^2 + (d_{z,s} - s_z)^2}. \tag{4} \]

The DOA results in terms of $\theta$ and $\phi$ can be obtained from

\[ d_{x,s} = x_m + r \sin\theta \cos\phi; \quad d_{y,s} = y_m + r \sin\theta \sin\phi; \quad d_{z,s} = z_m + r \cos\theta, \tag{5} \]

where $x_m$, $y_m$ and $z_m$ are the coordinates of the microphone array, and $r$ denotes the distance between the cluster $c_s$ and the array center. $\theta \in [-90°, +90°]$ is the elevation angle, measured from $r$'s orthogonal projection onto the $xy$-plane towards the positive $z$-axis. $\phi \in (-180°, +180°]$ is the azimuth angle, measured from the positive $x$-axis towards the positive $y$-axis, in terms of $r$'s orthogonal projection onto the $xy$-plane.

Fig. 1. Space cluster classification for SSL.

If the longest diagonal inside one cluster is $l$, $\varepsilon$ is bounded by $l$ when the classification is correct. If the classification is incorrect but $c_s$ is an adjacent cluster of $c_{source}$, it is still possible to have $\varepsilon$ bounded by $l$. Therefore, the bound on the localization error depends on the probability that the classification is correct:

\[ P(\varepsilon \le l \mid c_s = c_{source}) = 1; \quad P(\varepsilon \le l) \ge P(c_s = c_{source}). \tag{6} \]

To minimize $\varepsilon$, therefore, we need an efficient and accurate classifier with a high classification correctness rate and affordable computational complexity.
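The error measure in (4), the cluster diagonal $l$ that bounds it, and the conversion from an estimated position to a DOA can be sketched as below. This is a hedged illustration only: it assumes the verbal angle definitions given above (elevation measured from the $xy$-plane towards the positive $z$-axis, azimuth measured from the positive $x$-axis towards the positive $y$-axis) and recovers the angles with `atan2`. The names `array_center` and `doa_from_position` are hypothetical.

```python
import math


def localization_error(estimate, source):
    """Euclidean distance between the estimated and the actual source position, as in (4)."""
    return math.dist(estimate, source)


def cluster_diagonal(dim):
    """Longest diagonal l of a rectangular cluster with side lengths `dim`."""
    return math.sqrt(sum(side * side for side in dim))


def doa_from_position(estimate, array_center):
    """Return (elevation, azimuth) in degrees of `estimate` as seen from the array centre.

    Assumed convention: elevation in [-90, 90] measured from the xy-plane,
    azimuth in (-180, 180] measured from the +x axis towards the +y axis.
    """
    dx, dy, dz = (e - c for e, c in zip(estimate, array_center))
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return elevation, azimuth
```

For a cubic cluster of side 0.25 m, `cluster_diagonal((0.25, 0.25, 0.25))` is about 0.433 m, which is the bound on $\varepsilon$ whenever the classification is correct.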
In the next section, we will present the details of the proposed SSL algorithm based on the PNN classifier.

III. THE PROPOSED ALGORITHM

The relationship between the source position and the signals recorded at the microphone array is nonlinear. We adopt PNN [23] as the classifier, because PNN is well suited to nonlinear multi-class classification problems. PNN contains four layers in succession: the input layer, the pattern layer, the summation layer and the decision layer. With this classifier, we propose a GCC classification algorithm (GCA) to solve the classification problem formulated in Section II.

A. GCC Feature Extraction

In order to generate the input vector space $I$ for the PNN, we need to extract features from the signals at the microphone array. The feature of each received signal is unique. Meanwhile, as each sound source is located at a unique position, there is a one-to-one correspondence between the received signals and the source positions. For a machine learning algorithm to provide a good solution, it is essential to select well-defined features prudently for the training. The reason is that the probability densities of the category patterns are unknown initially, and the derivation of these probability densities relies solely on the selected features. GCC is an ideal candidate feature, since it contains all the information needed for DOA estimation and is reliable in reverberant and noisy environments. GCC varies across different frames. Taking a silent frame as an example, its GCC is mainly due to the noise. In this case, if we directly use the GCC of a single frame as the feature, it is not representative.
Therefore, the GCC from higher-SNR frames needs to be given a higher weight, while the rest is given lower weights or even neglected. In our method, the GCCs from all frames are weighted and summed to form the feature [20], namely the GCC feature. As the GCC and the weights for each frame are different, the GCC feature is unique. The length $L$ of each frame is selected as a compromise between good spectral resolution and small bias and variance. Therefore, a vector consisting of $L$ GCCs can be extracted for each frame. Assume that the source signal consists of $F$ frames in total. We use $GCC_f^l$ to represent the $l$-th GCC corresponding to the $f$-th frame of the source signal, with $l \in [1, L]$ and $f \in [1, F]$. The GCC feature corresponding to one sound source can be expressed as

\[ GCC = \sum_{f=1}^{F} \sum_{l=1}^{L} w_f \cdot GCC_f^l, \tag{7} \]

where

\[ w_f = \frac{\sum_{l=1}^{L} \left(GCC_f^l\right)^{\gamma}}{\sum_{f=1}^{F} \sum_{l=1}^{L} \left(GCC_f^l\right)^{\gamma}} \tag{8} \]

denotes the weight of the $f$-th frame and $\gamma$ is a tuning parameter. To localize a sound source with an $M$-microphone array with $M \ge 2$, we can compute a total of $M(M-1)/2$ GCC features using (7), each corresponding to one microphone combination. These $M(M-1)/2$ GCC features are grouped together to form the complete GCC set corresponding to the sound source. Therefore, more accurate SSL can be achieved with more microphone combinations, but at the expense of higher computational complexity in GCC feature extraction.

B. Training

At the beginning of the training, the enclosed room is divided into $K$ equal-dimension rectangular clusters, namely $c_1, c_2, \ldots, c_i, \ldots, c_K$ with $i \in [1, K]$. This dividing procedure is defined as $cluster(Dim, K)$, where the dimension $Dim$ of each cluster depends on the required localization accuracy.
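The frame weighting of Section III-A, equations (7) and (8), can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: each frame is represented by a list of its $L$ GCC values for one microphone pair, absolute values are taken before raising to the power $\gamma$ so that negative correlations and non-integer $\gamma$ are handled, and the weighted sum over frames is kept per lag so that the feature retains one value per lag (as the 16-values-per-pair feature in the experiments suggests). The function names are hypothetical.

```python
def frame_weights(frames, gamma=2.0):
    """Weight of each frame: its share of the total |GCC|**gamma over all frames."""
    energies = [sum(abs(value) ** gamma for value in frame) for frame in frames]
    total = sum(energies)
    if total == 0.0:
        # All frames are silent: fall back to uniform weights (assumption).
        return [1.0 / len(frames)] * len(frames)
    return [energy / total for energy in energies]


def gcc_feature(frames, gamma=2.0):
    """Weighted sum of the per-frame GCC vectors, keeping one value per lag."""
    weights = frame_weights(frames, gamma)
    n_lags = len(frames[0])
    return [
        sum(w * frame[lag] for w, frame in zip(weights, frames))
        for lag in range(n_lags)
    ]
```

A frame dominated by noise has small GCC values and therefore receives a small weight, which is the behaviour the weighting is meant to produce.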
Assuming that $n_i$ is the total number of training samples taken inside the $i$-th cluster, we can define four vector spaces, namely $X = \{X_{i,j}\}$, $S = \{S_{i,j}\}$, $GCC = \{GCC_{i,j}\}$ and $H = \{H_{i,j}\}$. Each $X_{i,j}$ represents the signals produced at the microphone array $M$ when the $j$-th training sample sound source $S_{i,j}$ is placed inside the $i$-th cluster, with $j \in [1, n_i]$. $GCC_{i,j}$ is the corresponding GCC feature extracted from $X_{i,j}$. $H_{i,j}$ represents the corresponding RIR between $M$ and $S_{i,j}$. Given the sampling frequency of the sound signal ($f_{sample}$), the absorption coefficient of the room ($\alpha_c$), the sound velocity in the air ($v_c$), the reverberation time ($T_{60}$) and the noise in the room ($N$), the RIRs between the microphone array and the sources can be computed [24]. This procedure is defined as $RIR(f_{sample}, v_c, T_{60}, N, \alpha_c, M, S)$. By convolving $H$ with $S$ and adding $N$, we can produce the signal vector space $X$. After that, the GCC features $GCC$ are extracted using (7). We define this procedure as $GF(X, \gamma)$.

Upon completion of the feature extraction, all features are supplied to the PNN as the input vector space $I$. The number of neurons in the input layer is equal to the dimension of the input GCC feature vector. In the pattern layer, the number of neurons equals the total number of training samples used to train the PNN. Therefore, there are $\sum_{i=1}^{K} n_i$ neurons in the pattern layer. The neurons of the pattern layer map the input GCC feature vector to a high-dimensional space and estimate the corresponding probability density with a Gaussian kernel represented as

\[ \varphi_{i,j}(GCC) = \frac{1}{(2\pi)^{D/2} \sigma^{D}} \exp\left( -\frac{(GCC - GCC_{i,j})^T (GCC - GCC_{i,j})}{2\sigma^2} \right), \tag{9} \]

where $\varphi_{i,j}(GCC)$ is the Gaussian kernel function.
$\sigma$ is the spread parameter, which represents the width of the Gaussian kernel. $T$ denotes the transpose. $GCC$ is the $D$-dimensional input GCC feature vector. $GCC_{i,j}$ is the center of the kernel. The output of each neuron in the pattern layer is generated using (9), and all outputs are transmitted to the summation layer, in which the number of neurons equals $K$. By averaging the outputs of all neurons that belong to the same cluster $c_i$, the summation layer computes the probability $p_i(GCC)$ of the input GCC feature being classified into the $i$-th cluster as

\[ p_i(GCC) = \frac{1}{(2\pi)^{D/2} \sigma^{D}} \cdot \frac{1}{n_i} \sum_{j=1}^{n_i} \exp\left( -\frac{(GCC - GCC_{i,j})^T (GCC - GCC_{i,j})}{2\sigma^2} \right). \tag{10} \]

Assume the prior probability of occurrence of every cluster $c_i$ is $h_i$, and the loss caused by a misclassification decision for each cluster $c_i$ is $co_i$. The decision layer neuron classifies the input GCC feature into cluster $c_s$ according to the Bayes decision rule [23] as

\[ h_s \times co_s \times p_s(GCC) > h_i \times co_i \times p_i(GCC), \quad \forall i \neq s, \tag{11} \]

where $p_s(GCC)$ is the probability of $GCC$ being classified into cluster $c_s$. We assume $h_i$ and $co_i$ are identical for all the clusters, so that the GCC feature is classified into cluster $c_s$ as

\[ c_s = \arg\max_{c_i} \{ p_i(GCC) \}. \tag{12} \]

We define this procedure as $DA(GCC)$. $p_i(GCC)$ is also the probability of each training sample being classified into the $i$-th cluster, as there is a one-to-one correspondence between $GCC$ and $S$. The output layer has only one neuron, as only the most probable class is chosen by the PNN.

C. Localization

Once the PNN is trained with the GCC features, the GCA continues to the second stage to localize the unknown sound source $S_u$ into one of the $K$ clusters.
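The pattern, summation and decision layers of Section III-B, equations (9), (10) and (12), can be sketched as below. This is a hedged illustration, not the authors' Matlab implementation: it assumes identical priors and losses for all clusters, stores the training features in a plain dictionary keyed by cluster index, and evaluates the kernels in the log domain, an implementation choice made here so that the factor $(2\pi)^{D/2}\sigma^{D}$ does not overflow for large $D$. All names are hypothetical.

```python
import math


def log_kernel(x, center, sigma):
    """Logarithm of the Gaussian kernel in (9) for input x and kernel centre `center`."""
    d = len(x)
    squared = sum((a - b) ** 2 for a, b in zip(x, center))
    return -squared / (2.0 * sigma ** 2) - 0.5 * d * math.log(2.0 * math.pi) - d * math.log(sigma)


def log_cluster_probability(x, samples, sigma):
    """Logarithm of p_i in (10): the average kernel output over one cluster's samples."""
    logs = [log_kernel(x, center, sigma) for center in samples]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(samples))


def classify(x, training, sigma):
    """Decision rule (12): return the cluster with the largest p_i.

    `training` maps a cluster index to the list of its training feature vectors.
    """
    scores = {cid: log_cluster_probability(x, samples, sigma)
              for cid, samples in training.items()}
    return max(scores, key=scores.get)
```

No iterative optimisation appears anywhere in this sketch: "training" only stores the feature vectors as kernel centres, which matches the non-iterative property of PNN noted in the Introduction.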
As presented in Section III-B, the probability of $S_u$ being classified into every cluster can be computed by the PNN using (10), according to the GCC feature of $S_u$. The decision layer can therefore classify $S_u$ into one of the $K$ clusters, $c_s$, using (12) with those computed probabilities. However, when the volume of the space clusters is small, it is difficult to distinguish which cluster the source actually belongs to, and hence the rate of misclassification becomes higher. The situation gets worse when the actual source in $c_{source}$ is close to the boundary of two adjacent clusters. To solve this problem, we propose a weighted location decision method (WLDM) in GCA instead of using the PNN decision layer to classify directly, which is presented below.

To guarantee $\sum_{a=1}^{K} p_a = 1$, the softmax function is adopted as the transfer function between the pattern layer and the summation layer. Thereby we can normalize the categorical probability distribution so that each probability lies in the range (0, 1) and they add up to 1. With the probabilities of all clusters computed, we select the $\zeta$ most probable clusters whose probability sum does not exceed a cluster-size-dependent threshold $THR$, i.e., $\sum_{a=1}^{\zeta} p_a \le THR$. The selection starts from the cluster with the top probability, follows the descending order, and stops before the additional cluster that would cause the probability sum to exceed the threshold. After these $\zeta$ adjacent clusters are selected, we perform the localization through the following two steps, which are preliminary estimation and sample points estimation. Let $P_a$ denote the central point chosen for the $a$-th cluster, with $a \in [1, \zeta]$ and Cartesian coordinates $x_a$, $y_a$ and $z_a$.
The preliminarily estimated source position $P_s$, with Cartesian coordinates $x_s$, $y_s$ and $z_s$, is computed as

\[ x_s = \sum_{a=1}^{\zeta} p_a \cdot x_a; \quad y_s = \sum_{a=1}^{\zeta} p_a \cdot y_a; \quad z_s = \sum_{a=1}^{\zeta} p_a \cdot z_a. \tag{13} \]

This procedure is defined as $PE(p_a, P_a)$. With (13), we can compute the distance $l_a$ between the representative point of the $a$-th adjacent cluster and the estimated source position by

\[ l_a = \sqrt{(x_s - x_a)^2 + (y_s - y_a)^2 + (z_s - z_a)^2}. \tag{14} \]

A longer distance indicates that the actual source position is more likely to be far away from that particular cluster, and hence its probability should be reduced. Therefore, a new weight of the $a$-th cluster, which decreases with the distance, can be derived as

\[ w_a = \frac{(1/l_a)^{\lambda}}{\sum_{a=1}^{\zeta} (1/l_a)^{\lambda}}, \tag{15} \]

where $w_a$ is the new weight of the $a$-th cluster and $0 < \lambda < 1$ denotes the controlling parameter. This procedure is defined as $weight_{cluster}(l_a, \lambda)$.

In order to reduce the error further, we adjust the localization with more sample points in the second step. In each adjacent cluster, $\beta$ sample points are selected to represent the cluster position more accurately. Similar to the new weights of the clusters, the $\beta$ sample point weights can be computed by

\[ w_{a,t} = \frac{(1/l_{a,t})^{\rho}}{\sum_{t=1}^{\beta} (1/l_{a,t})^{\rho}}, \tag{16} \]

where $l_{a,t}$ is the distance from $P_s$ to the $t$-th sample point in the $a$-th cluster with $t \in [1, \beta]$, $w_{a,t}$ denotes the weight of the $t$-th sample point in the $a$-th cluster, and $0 < \rho < 1$ is the controlling parameter. This procedure is defined as $weight_{sp}(l_{a,t}, \rho)$.
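The two WLDM steps, (13) to (16), together with the final combination in which each cluster's weighted sample points are themselves weighted by $w_a$, can be sketched as follows. This is an illustrative sketch under assumptions that go beyond the text: the probabilities of the $\zeta$ selected clusters are renormalised to sum to 1 before (13) so that the preliminary estimate is a proper weighted average, and a small `eps` guards against a zero distance in (15) and (16). The function names are hypothetical.

```python
import math


def inverse_distance_weights(distances, exponent, eps=1e-12):
    """Normalised weights proportional to (1/l)**exponent, as in (15) and (16)."""
    raw = [(1.0 / max(d, eps)) ** exponent for d in distances]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_point(weights, points):
    """Weighted sum of 3-D points."""
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(3))


def wldm(probs, centers, samples, lam=0.25, rho=0.25):
    """Weighted location decision over the selected clusters.

    probs   : probabilities p_a of the selected clusters
    centers : central points P_a of the selected clusters
    samples : for each cluster, the list of its sample points P_{a,t}
    """
    total = sum(probs)
    p = [v / total for v in probs]                  # assumption: renormalise p_a
    prelim = weighted_point(p, centers)             # preliminary estimate, (13)
    l_a = [math.dist(prelim, c) for c in centers]   # distances, (14)
    w_a = inverse_distance_weights(l_a, lam)        # cluster weights, (15)
    refined = []
    for points in samples:
        l_at = [math.dist(prelim, q) for q in points]
        w_at = inverse_distance_weights(l_at, rho)  # sample point weights, (16)
        refined.append(weighted_point(w_at, points))
    return weighted_point(w_a, refined)             # final weighted combination
```

Because both sets of weights are normalised, the returned position is a convex combination of the sample points and therefore always lies inside their convex hull.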
Therefore, we can decide the localization of $c_s$ through $WLDM(w_a, w_{a,t}, P_{a,t})$:

\[ d_{x,s} = \sum_{a=1}^{\zeta} \Big( w_a \cdot \sum_{t=1}^{\beta} w_{a,t} \cdot x_{a,t} \Big); \quad d_{y,s} = \sum_{a=1}^{\zeta} \Big( w_a \cdot \sum_{t=1}^{\beta} w_{a,t} \cdot y_{a,t} \Big); \quad d_{z,s} = \sum_{a=1}^{\zeta} \Big( w_a \cdot \sum_{t=1}^{\beta} w_{a,t} \cdot z_{a,t} \Big), \tag{17} \]

where $x_{a,t}$, $y_{a,t}$ and $z_{a,t}$ are the Cartesian coordinates of the $t$-th sample point in the $a$-th cluster.

TABLE I
THE PSEUDO CODE OF THE PROPOSED GCA

GCA(Train, Localize)
begin
    Train(M, Dim, f_sample, v_c, T_60, N, α_c, K, n_i, γ, S)        // training stage of GCA
    begin
        C = cluster(Dim, K);                                          // divide the room into K clusters
        for all i ∈ [1, K]
            for all j ∈ [1, n_i]
                H_{i,j} = RIR(f_sample, v_c, T_60, N, α_c, M, S_{i,j});   // compute the RIR
                X_{i,j} = H_{i,j} ⊗ S_{i,j} + N;                      // obtain the signal at microphone array
                GCC_{i,j} = GF(X_{i,j}, γ);                           // extract GCC feature
            end
        end
        p_i(GCC) = DA(GCC);                                           // train PNN
    end
    Localize(S_u, γ, ζ, β, λ, ρ, THR)                                 // localization stage of GCA
    begin
        GCC = GF(S_u, γ);
        p_i(GCC) = DA(GCC);                                           // compute the probability
        for all a ∈ [1, ζ]
            P_s = PE(p_a, P_a);                                       // obtain preliminary estimation of source position
            w_a = weight_cluster(l_a, λ);                             // compute weights of clusters
            for all t ∈ [1, β]
                w_{a,t} = weight_sp(l_{a,t}, ρ);                      // compute weights of sample points
            end
        end
        c_s = WLDM(w_a, w_{a,t}, P_{a,t});                            // obtain final source position
    end
    return DOA = [θ, ϕ];
end

Fig. 2. Flow chart of the proposed GCA. Training stage: labeled signal samples at microphone array, GCC feature extraction, PNN training. Localization stage: received signal samples from unknown positions, GCC feature extraction, probability estimation and classification, preliminary estimation, sample points estimation, final location.

The located position $c_s = [d_{x,s}, d_{y,s}, d_{z,s}]$ is the solution of the GCA in Cartesian coordinates, and $[\theta, \phi]$ is the DOA result, which can be computed by (5). The pseudo code of the proposed GCA is summarized in Table I. The function GCA(Train, Localize) consists of two sub-functions, Train($M$, $Dim$, $f_{sample}$, $v_c$, $T_{60}$, $N$, $\alpha_c$, $K$, $n_i$, $\gamma$, $S$) and Localize($S_u$, $\gamma$, $\zeta$, $\beta$, $\lambda$, $\rho$, $THR$), representing the two stages of GCA respectively. Finally, the DOA ($\theta$, $\phi$) is returned as the output. The flow chart of the proposed GCA is depicted in Fig. 2.

IV. SYNTHETIC EXPERIMENTAL RESULTS AND DISCUSSION

In this section, synthetic experiments are conducted to evaluate the performance of the proposed GCA, with three other recently published algorithms, presented in [14], [18], and [19], employed as the competing methods.

A. Synthetic Experimental Setup

A typical medium-size meeting room with dimensions 4.0 m × 4.0 m × 4.0 m is simulated. The microphone array consists of six microphones, which are placed at $M_1$ = (1.8 m, 2.0 m, 2.0 m), $M_2$ = (2.2 m, 2.0 m, 2.0 m), $M_3$ = (2.0 m, 1.8 m, 2.0 m), $M_4$ = (2.0 m, 2.2 m, 2.0 m), $M_5$ = (2.0 m, 2.0 m, 1.8 m) and $M_6$ = (2.0 m, 2.0 m, 2.2 m). The source is placed on a sphere centered at the centroid of the room, with three different radius values: 0.5 m, 1.0 m and 1.5 m.
On each of the three spherical surfaces, the sound source is placed at 21 different azimuth values from -160° to +160° and at 9 different elevation values from -60° to +60°, both with even intervals. In total, the sound source is placed at 567 different positions distributed in the room. In our experiments, omnidirectional microphones are adopted, with a frequency response from 20 Hz to 20 kHz and a dynamic range of 87 dB. We use six microphones, rather than another number, with this spatial distribution mainly for four reasons. Firstly, if microphones are distributed along each dimension of the space, the position of the source can be better determined, as the sound propagates along each dimension of the room. Secondly, we use only 2 microphones along each of the three dimensions to minimize computational complexity. Thirdly, considering the tradeoff between computational complexity and the validity of the information obtained from cross correlations, we set the maximum distance between any two microphones to 40 cm. In addition, considering that a test source can be placed anywhere in the room, and by referring to the setup of the competing method TDE [14], the center of the microphone array is placed at the center of the room.

A clean speech signal sampled at 8 kHz, as in [25], is adopted as the sound source. The 2.7-second speech (from 220 Hz to 3.4 kHz) is from the NOIZEUS database, in American English. The sound source is also omnidirectional in the setup. The reverberation time $T_{60}$, which measures the time for the original sound to decay by 60 dB, is set to five different levels: 0 ms, 100 ms, 200 ms, 400 ms and 600 ms.
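The grid of 567 test positions described above can be generated with a short sketch like the following. It is only an illustration: it assumes that elevation is measured from the horizontal plane through the room centroid and azimuth from the positive $x$-axis, and the function names and default arguments are hypothetical.

```python
import math


def linspace(start, stop, count):
    """`count` evenly spaced values from `start` to `stop`, both included."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


def source_positions(center=(2.0, 2.0, 2.0), radii=(0.5, 1.0, 1.5), n_az=21, n_el=9):
    """Test positions on spheres around `center`: radii x azimuths x elevations."""
    positions = []
    for r in radii:
        for az_deg in linspace(-160.0, 160.0, n_az):
            for el_deg in linspace(-60.0, 60.0, n_el):
                az, el = math.radians(az_deg), math.radians(el_deg)
                positions.append((
                    center[0] + r * math.cos(el) * math.cos(az),
                    center[1] + r * math.cos(el) * math.sin(az),
                    center[2] + r * math.sin(el),
                ))
    return positions
```

With 3 radii, 21 azimuth values (a 16° step) and 9 elevation values (a 15° step), the list has 3 × 21 × 9 = 567 entries, and every position stays inside the 4.0 m room because the largest radius is 1.5 m.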
A longer $T_{60}$ represents higher reverberation in the room. The SNR in the room is set to four different levels, 10 dB, 0 dB, -5 dB and -10 dB, where the noise is additive. The duration of each frame of the speech signal is chosen to be 0.064 s, and the overlap rate between two frames is set to 62.5%. As the maximum distance between any two microphones of the array is 0.4 m, the maximum possible time delay is 1.17 ms, assuming a sound speed in the air of 343 m/s. As the sampling rate is 8 kHz, the maximum delay in samples is 10. Therefore, for a microphone pair, the first 10 cross correlations contain the valid information. However, to avoid missing any valid information, we select the first 16 cross correlations as the feature. As there are 15 microphone combinations in total for cross correlation computing, the dimension of the GCC feature vector applied to the input layer is 240. Therefore, the input layer consists of 240 neurons.

In the training stage of our synthetic experiments, the room is divided into 4096 equal-dimension rectangular space clusters with dimensions 0.25 m × 0.25 m × 0.25 m each. The sound source is randomly and successively placed in each cluster only once, i.e., $n_i = 1$, as the cluster volume is small. Therefore, a total of 4096 complete GCC feature sets are extracted. In this case, both the pattern layer and the summation layer consist of 4096 neurons. For the spread parameter $\sigma$, a small $\sigma$ will cause overfitting while a large $\sigma$ will result in underfitting. In practice, by referring to [23], $\sigma$ can be selected from 3 to 10. In our experiments, we set its value to 5.
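The numbers quoted in this setup follow from simple arithmetic, which the sketch below reproduces. The helper names are hypothetical; the inputs (0.4 m spacing, 343 m/s, 8 kHz, 0.064 s frames, 62.5% overlap, six microphones, 16 lags per pair, 0.25 m clusters) are taken from the text.

```python
import math


def max_delay_samples(max_spacing_m, fs_hz, speed_m_s=343.0):
    """Largest possible inter-microphone delay, rounded up to whole samples."""
    return math.ceil(max_spacing_m / speed_m_s * fs_hz)


def feature_dimension(n_mics, lags_per_pair):
    """Input dimension: one block of lags for each of the M(M-1)/2 microphone pairs."""
    return n_mics * (n_mics - 1) // 2 * lags_per_pair


def frame_length_samples(frame_s, fs_hz):
    """Frame length in samples."""
    return round(frame_s * fs_hz)


def hop_samples(frame_len, overlap):
    """Shift between consecutive frames for a given overlap rate."""
    return round(frame_len * (1.0 - overlap))


def num_clusters(room, dim):
    """Number of cubic clusters of side `dim` that tile the room."""
    return math.prod(round(size / dim) for size in room)
```

For the values above these give a maximum delay of 10 samples (0.4 / 343 s is about 1.17 ms, i.e. 9.33 samples at 8 kHz), a 240-dimensional input (15 pairs × 16 lags), 512-sample frames with a 192-sample shift, and 16 × 16 × 16 = 4096 clusters.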
For the WLDM, we select the 15 most likely clusters whose probability sum is less than 0.004. For the controlling parameters, both λ and ρ are set to 0.25 while γ is set to 2. In each adjacent cluster, 8 vertices are selected as sample points to represent the cluster position more accurately.

B. Implementation

We perform the synthetic experiments to compare our results with three recent methods: the time delay estimation method TDE [14], the beamforming method TL-SSC [18], and the machine learning method LS-SVM [19]. In the synthetic experiments, Dim, T60, α_c, and SNR are all required by our proposed method and the competing methods. As the author-shared codes of TDE and TL-SSC are available online at [26] and [27] respectively, we select these two algorithms as competing methods. This avoids potential errors introduced when an algorithm is re-implemented by non-authors, so that the comparison is fair and valid. As TL-SSC is an improved version of the widely used SRP-PHAT algorithm, we do not adopt the original SRP-PHAT algorithm as a competing method. For LS-SVM, we collect the TDOA features for training as in its original paper [19]. In addition, as the LS-SVM algorithm transforms localization into a pure classification problem, we assume that the estimated sound source position is at the centroid of the cluster into which it is classified. Furthermore, the performance of these competing methods degrades if we apply our microphone array setup to them. To make a fair comparison, therefore, the microphone arrays for the three competing methods are set up in the same way as in their original papers, [14], [18], and [19] respectively.
Moreover, to improve RIR computation efficiency, the fast image method [24] is adopted; its source code is available online at [28]. All four methods are implemented in MATLAB and run on a workstation with 32 GB RAM and dual Intel Xeon E5-2630 v3 2.4 GHz processors.

C. Results and Discussion

Validation on Feature Extraction

The first experiment examines the effectiveness of the GCC features. Simulations are performed in four different environments with SNR decreasing from 10 dB to −10 dB, when T60 = 0 ms, as demonstrated by (a) to (d) of Fig. 3. Within the acoustic environment of each subplot, we repeated the extraction three times; the repetitions are separated by the two black lines. It can be seen that the computed GCC features, shown as yellow patterns, are well representative of the testing clusters. The contrast between the yellow patterns and the blue regions becomes more distinct when SNR rises from −10 dB to 10 dB. This shows that the GCC feature representation is more reliable when SNR is high. A similar regularity can be observed when SNR is fixed and T60 varies, where the GCC feature is more reliable when T60 is low.

Fig. 3. GCC features extracted under different SNRs when T60 = 0 ms, panels (a) to (d).

Impact of Reverberation and SNR

With the validated GCC features, we perform SSL using GCA and compare the performance. Because most SSL applications aim to localize the source direction, we adopt the DOA as the performance metric. The results are summarized in Table II. A total of 20 different acoustic environments are created by varying SNR from 10 dB to −10 dB and T60 from 0 ms to 600 ms.
For each environment, the DOA estimation error (DEE), in terms of the mean error and standard deviation of ϕ and θ, is collected from the 567 localized positions. To evaluate the performance under different accuracy requirements, the localization successful rate for DOA estimation (SRDE) is defined. SRDE(α°) is the percentage of localizations, out of the 567, in which both the ϕ error and the θ error are within ±α°. In Table II, the results of SRDE(10°), SRDE(20°) and SRDE(30°) are provided.

When SNR is fixed and T60 varies from 0 ms to 600 ms, the accuracy generally drops for every algorithm. A similar trend can be observed when T60 is fixed and SNR decreases. However, SRDE does not always increase with SNR. In some scenarios, such as very long T60, SRDEs may not strictly increase with the SNR. This shows that when reverberation is severe, a small variation of SNR does not affect the SRDE significantly.

Comparing across the algorithms, the proposed GCA outperforms the other three algorithms significantly. SRDE(30°) of GCA is 100% when T60 is low, regardless of SNR, and drops to 69.8% in the worst case of T60 = 600 ms and SNR = −10 dB. For TDE and TL-SSC, SRDE(30°) reaches 54.9% and 64.2% in the best case of T60 = 0 ms and SNR = 10 dB. In adverse environments, however, the SRDE of TDE and TL-SSC drops, which shows that high reverberation and low SNR affect the localization effectiveness of these two algorithms. When α° is small, such as 10°, the baseline successful rate of random localization should be (20°/360°) × (20°/180°) = 0.62%.
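The SRDE definition and the random-localization baseline above can be written down directly. The sketch below is a minimal illustration, assuming the angular errors are already given in degrees and that azimuth errors have been wrapped into [−180°, 180°]; the function names and the sample error values are hypothetical.

```python
import numpy as np


def srde(az_err_deg, el_err_deg, alpha_deg):
    """Percentage of localizations with both |azimuth error| and
    |elevation error| no larger than alpha_deg."""
    az = np.abs(np.asarray(az_err_deg, dtype=float))
    el = np.abs(np.asarray(el_err_deg, dtype=float))
    return 100.0 * np.mean((az <= alpha_deg) & (el <= alpha_deg))


def random_baseline(alpha_deg):
    """Success rate (%) of a uniformly random guess over 360 deg of azimuth
    and 180 deg of elevation, following the formula in the text."""
    return 100.0 * (2.0 * alpha_deg / 360.0) * (2.0 * alpha_deg / 180.0)


# Hypothetical errors for four localizations (not taken from the paper).
az_errors = [5.0, -15.0, 25.0, 8.0]
el_errors = [3.0, 5.0, -40.0, 12.0]

print(srde(az_errors, el_errors, 10))   # 25.0
print(srde(az_errors, el_errors, 20))   # 75.0
print(round(random_baseline(10), 2))    # 0.62
```

Because both angular errors must pass the threshold at once, SRDE is stricter than a success rate computed on either angle alone.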
In the most adverse environment, TDE provides a low SRDE(10°) that is only slightly better than this baseline rate. Nevertheless, this still reasonably shows that TDE performance drops in more adverse environments. For LS-SVM, the results for T60 = 600 ms are left blank in Table II, as the provided source code of the fast image method encounters errors in this case. To avoid an inappropriate implementation of LS-SVM, we present and discuss its results only for T60 from 0 ms to 400 ms. The results show that LS-SVM performs similarly to TDE and TL-SSC in terms of accuracy for low reverberation and high SNR, but its accuracy drops in very adverse environments. When SNR = −10 dB, the SRDEs of the competing algorithms with longer T60 are sometimes slightly higher than those with shorter T60. This shows that these algorithms are more sensitive to the extremely low SNR. When the signals are very weak, the algorithms are significantly affected by the noise.

The averages of the DEEs and SRDEs over the twenty different environments are computed for each algorithm. They are plotted in Fig. 4(a) and (b) respectively. In Fig. 4(a), the averages of the mean errors of the azimuth angle and elevation angle by GCA are only 4.6° and 3.1° respectively, indicating that it can estimate DOA very accurately. In contrast, the DEEs of the other algorithms are significantly higher. Compared with the best performance among the three competing algorithms, GCA reduces the average ϕ error and θ error by 88.6% and 83.8% respectively over the 20 acoustic environments. In Fig. 4(b), the average SRDE(10°), SRDE(20°) and SRDE(30°) by GCA reach 87.5%, 94.4% and 96.9% respectively.
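As a worked check, several of the averages quoted above can be recomputed from the GCA rows of Table II. The snippet below is only a sketch of that arithmetic; the lists are transcribed from Table II and the variable names are hypothetical.

```python
# GCA rows of Table II, ordered as SNR = -10, -5, 0, 10 dB and, within each
# SNR, T60 = 0, 100, 200, 400, 600 ms.
GCA_AZ_MEAN_ERR = [1.8, 2.4, 5.1, 12.7, 22.8,
                   1.2, 1.3, 2.1, 5.4, 10.8,
                   1.1, 1.1, 1.5, 3.5, 6.4,
                   1.1, 1.0, 1.3, 3.5, 6.7]
GCA_EL_MEAN_ERR = [1.3, 1.7, 3.6, 8.7, 14.3,
                   0.9, 1.2, 1.5, 3.6, 7.0,
                   0.9, 0.9, 1.1, 2.3, 4.0,
                   0.9, 0.8, 1.0, 1.8, 3.8]
GCA_SRDE20 = [100, 100, 97.9, 75.1, 55.2,
              100, 100, 100, 96.0, 82.7,
              100, 100, 100, 99.0, 92.1,
              100, 100, 100, 98.8, 91.0]
GCA_SRDE30 = [100, 100, 99.8, 88.7, 69.8,
              100, 100, 100, 98.4, 90.8,
              100, 100, 100, 99.5, 97.2,
              100, 100, 100, 99.1, 94.5]


def average(values):
    """Arithmetic mean over the 20 acoustic environments."""
    return sum(values) / len(values)


print(f"{average(GCA_AZ_MEAN_ERR):.1f}")  # 4.6
print(f"{average(GCA_EL_MEAN_ERR):.1f}")  # 3.1
print(f"{average(GCA_SRDE20):.1f}")       # 94.4
print(f"{average(GCA_SRDE30):.1f}")       # 96.9
```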
On the other hand, the average SRDEs of the other three methods over the 20 different acoustic environments are significantly lower. Compared with the best performance among the three algorithms, GCA improves the averages of SRDE(10°), SRDE(20°) and SRDE(30°) by 81.1, 74.1 and 60.3 percentage points respectively.

Fig. 4. (a) DOA estimation errors. (b) SRDEs with different requirements.

However, it should be noted that the significantly increased SRDEs of GCA are mainly due to several factors. The first is the pre-localization site survey effort to collect the features, which is not needed by TDE and TL-SSC. Secondly, the 567 different positions tested in this experiment cover most of the positions in the room. Because the performance of the other competing algorithms may vary across test positions, the consistent performance of GCA becomes significant when comparing the average SRDEs over the 567 tests. Moreover, for LS-SVM, when the volume of a space cluster is small, it is difficult to decide which cluster the sound source belongs to. This is one reason why LS-SVM cannot achieve higher SSL accuracy; it also verifies the contribution of the proposed WLDM in GCA, which can solve this problem. In addition, the GCC features used in GCA are more robust than the TDOA features used in LS-SVM.

Robustness Validation

In practice, when the room geometry and acoustic features change, for example due to people moving or doors opening and closing, the validity of the collected training data varies. In this case, we need to ensure that the DOA estimated by GCA is still accurate and robust even when the environment changes after training.
Moreover, we also expect the proposed GCA to perform well with sound sources at different frequencies.

For the first experiment, we evaluate the robustness of GCA with respect to changes in reverberation time. We collect four groups of training data with SNR = −10 dB, −5 dB, 0 dB and 10 dB respectively, when T60 = 200 ms. Next, for each group, we vary T60 to 0 ms, 100 ms, 400 ms and 600 ms to reflect the T60 actually encountered during localization, and collect the testing data from the 567 test positions. The results are summarized in Table III(A). The T60 = 200 ms results, matching the training data, are the benchmark and are highlighted in grey. Compared to the benchmarks, the worst SRDE(30°) drops are 22.7%, 6.2%, 1.1% and 0.5% respectively for the four groups. This shows that, in terms of SRDE(30°), GCA is robust to T60 even when T60 varies significantly, except in the very adverse environment where SNR = −10 dB and T60 = 600 ms.

For the second experiment, we validate the robustness of GCA with respect to changes in SNR. Similarly, we collect five groups of results with T60 of 0 ms, 100 ms, 200 ms, 400 ms and 600 ms respectively when SNR = 0 dB, as the training data. Next, we vary SNR to −10 dB, −5 dB and 10 dB to reflect the SNR actually encountered during localization, and collect the testing data from the 567 test positions. The results are summarized in Table III(B). The results with SNR = 0 dB are the benchmark and are highlighted in grey. Compared to the benchmarks, the worst SRDE(30°) drops are 5.7%, 6.7%, 5.1%, 29.4% and 36.2% respectively for the five groups.
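The worst-case drops quoted above follow from the SRDE(30°) rows of Table III(A) and III(B): each is the benchmark value (the condition used for training) minus the lowest value among the mismatched conditions. The sketch below reproduces that arithmetic; the dictionaries are transcribed from Table III and the helper name is hypothetical.

```python
# Table III(A), SRDE(30 deg) in %: key = SNR in dB, values ordered by
# localization T60 = 0, 100, 200, 400, 600 ms (trained at 200 ms, index 2).
SRDE30_VARY_T60 = {
    -10: [99.5, 98.9, 99.8, 82.5, 77.1],
    -5: [99.1, 99.5, 100.0, 99.5, 93.8],
    0: [100.0, 100.0, 100.0, 100.0, 98.9],
    10: [100.0, 100.0, 100.0, 100.0, 99.5],
}

# Table III(B), SRDE(30 deg) in %: key = T60 in ms, values ordered by
# localization SNR = -10, -5, 0, 10 dB (trained at 0 dB, index 2).
SRDE30_VARY_SNR = {
    0: [94.3, 99.3, 100.0, 100.0],
    100: [93.3, 100.0, 100.0, 99.5],
    200: [94.9, 99.8, 100.0, 99.8],
    400: [70.1, 94.7, 99.5, 99.4],
    600: [61.0, 88.7, 97.2, 96.6],
}


def worst_drop(values, benchmark_index):
    """Benchmark value minus the lowest value among mismatched conditions."""
    others = [v for i, v in enumerate(values) if i != benchmark_index]
    return round(values[benchmark_index] - min(others), 1)


drops_t60 = [worst_drop(v, 2) for v in SRDE30_VARY_T60.values()]
drops_snr = [worst_drop(v, 2) for v in SRDE30_VARY_SNR.values()]
print(drops_t60)  # [22.7, 6.2, 1.1, 0.5]
print(drops_snr)  # [5.7, 6.7, 5.1, 29.4, 36.2]
```

These differences are in percentage points of SRDE(30°).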
Therefore, GCA is robust to SNR except in the very adverse environments where SNR = −10 dB and T60 = 400 ms or 600 ms.

For the third experiment, we illustrate the impact of frequency change on the localization accuracy of GCA. We use two new sound sources [29] rather than human speech, i.e. machinery sound and telephone ring, whose frequencies are different from those of the preceding human speech source. We conduct the experiment under the conditions where SNR = −10 dB and 10 dB, with T60 varying from 0 ms to 600 ms. All the setups are the same as those of the human speech scenario. The results are summarized in Table III(C). Compared to the results of the human speech scenario in Table II, when T60 = 0 ms, 100 ms, and 200 ms, SRDE(30°) is almost unchanged for both new sources. When T60 becomes higher with SNR = −10 dB, SRDE(30°) of the machinery sound increases a little while that of the telephone ring decreases slightly. When T60 becomes higher with SNR = 10 dB, the SRDE(30°) of both the machinery sound and the telephone ring drops slightly faster. However, on average, 94.9% and 88.7% accuracy can still be achieved in terms of SRDE(30°) for these two new sound sources respectively. Therefore, we can conclude that the localization accuracy of the proposed method is only slightly affected by changes of frequency, and good performance can still be achieved when the frequency changes.

Impact by Different Test Set

To evaluate the performance of the four algorithms with a different test set, we re-compute the SRDEs of the four algorithms with 378 positions.
To make sure we evaluate the competing algorithms correctly, we use the author-shared source codes of TDE and TL-SSC. The 378 positions are obtained by removing the source positions on the spherical surface with radius = 1.0 m from the previous 567 positions. During the experiment, the SNR is set to −10 dB, −5 dB, 0 dB, and 10 dB, while T60 varies from 0 ms to 600 ms. The results are summarized in Table IV. It can be observed that the three competing algorithms perform relatively well when SNR is higher, i.e. SNR = 0 dB and 10 dB, compared with the cases where SNR is lower, i.e. SNR = −10 dB and −5 dB. However, GCA still outperforms them even at higher SNR. This shows that GCA performs better than the other three algorithms whether the sound source is close to or far away from the microphone array in adverse environments. This better performance results from several factors, such as the study of acoustic features, the robust GCC feature, the proposed WLDM method, and the consistent localization capability at different test positions in a room.

Complexity Comparisons

The computational complexities of the four algorithms at 567 positions are summarized in Table V. For each T60, the presented CPU time and real processing time are the averages of the results for SNR = −10 dB, −5 dB, 0 dB and 10 dB. As more than one core is used during computing, the CPU time is higher than the real processing time. From Table V, the machine learning based GCA and LS-SVM are dominated by offline training, which consists of RIR computing, feature extraction and training. However, this drawback can be mitigated by taking further approximations with the fast image method and by using fewer training samples.
In addition, offline training in the machine learning algorithms is performed only once for a fixed room. For GCA, the CPU time of the remaining online localization accounts for only 0.88%, 0.55%, 0.32%, 0.13%, and 0.06% of the total CPU time for T60 = 0 ms, 100 ms, 200 ms, 400 ms, and 600 ms respectively. This makes GCA especially suitable for real-time localization applications when the pre-localization site survey has already been done. In contrast, LS-SVM is computationally inefficient when the number of classification categories is large, and its offline training costs more than 367334.3 seconds of CPU time. This validates the advantage of GCA in training speed compared with LS-SVM. TDE costs at least 32911.0 seconds of CPU time to generate the results of localizing 567 positions. In contrast, TL-SSC is the most computationally efficient algorithm among the four methods. The pre-localization look-up table (LUT) computing costs about 906.9 seconds of CPU time and the localization costs only around 755.4 seconds. Compared with the time cost of TL-SSC, GCA spends more CPU time. However, this computational overhead is acceptable considering the significant improvements of 82.6, 74.1, and 60.3 percentage points by GCA over TL-SSC for SRDE(10°), SRDE(20°), and SRDE(30°) respectively.

TABLE II. RESULTS OF THE FOUR ALGORITHMS AT 567 TEST POSITIONS
(Errors are given as mean/deviation in degrees. Columns are T60 in ms. A dash marks results that are not available.)

| SNR/dB | Algorithm | Metric | 0 | 100 | 200 | 400 | 600 |
|---|---|---|---|---|---|---|---|
| −10 | GCA (Proposed) | ϕ error | 1.8/1.6 | 2.4/2.0 | 5.1/5.0 | 12.7/14.0 | 22.8/28.2 |
| −10 | GCA (Proposed) | θ error | 1.3/1.1 | 1.7/1.3 | 3.6/3.2 | 8.7/9.3 | 14.3/15.0 |
| −10 | GCA (Proposed) | SRDE(10°) | 100% | 99.6% | 84.3% | 46.7% | 27.7% |
| −10 | GCA (Proposed) | SRDE(20°) | 100% | 100% | 97.9% | 75.1% | 55.2% |
| −10 | GCA (Proposed) | SRDE(30°) | 100% | 100% | 99.8% | 88.7% | 69.8% |
| −10 | TDE [14] | ϕ error | 87.2/53.1 | 86.9/54.1 | 82.4/54.8 | 83.2/51.5 | 86.7/54.0 |
| −10 | TDE [14] | θ error | 43.7/32.9 | 44.7/32.1 | 43.3/31.5 | 42.8/30.4 | 43.1/31.5 |
| −10 | TDE [14] | SRDE(10°) | 0.9% | 1.8% | 1.4% | 1.4% | 0.9% |
| −10 | TDE [14] | SRDE(20°) | 3.5% | 4.8% | 4.4% | 3.4% | 4.4% |
| −10 | TDE [14] | SRDE(30°) | 9.7% | 9.0% | 7.9% | 8.6% | 10.2% |
| −10 | TL-SSC [18] | ϕ error | 47.6/40.3 | 51.4/40.7 | 58.1/43.2 | 67.1/44.5 | 70.2/45.1 |
| −10 | TL-SSC [18] | θ error | 19.0/11.4 | 19.0/11.4 | 19.0/11.5 | 19.1/11.6 | 19.1/11.7 |
| −10 | TL-SSC [18] | SRDE(10°) | 4.8% | 5.5% | 5.5% | 3.5% | 4.8% |
| −10 | TL-SSC [18] | SRDE(20°) | 18.3% | 15.3% | 12.5% | 7.9% | 7.2% |
| −10 | TL-SSC [18] | SRDE(30°) | 36.9% | 31.9% | 25.6% | 18.0% | 16.2% |
| −10 | LS-SVM [19] | ϕ error | 46.0/39.9 | 57.5/43.7 | 70.2/48.5 | 58.9/43.0 | - |
| −10 | LS-SVM [19] | θ error | 36.2/26.8 | 36.6/26.1 | 39.0/28.2 | 39.2/27.7 | - |
| −10 | LS-SVM [19] | SRDE(10°) | 3.9% | 1.8% | 1.6% | 1.6% | - |
| −10 | LS-SVM [19] | SRDE(20°) | 13.4% | 7.1% | 6.5% | 6.9% | - |
| −10 | LS-SVM [19] | SRDE(30°) | 23.3% | 15.3% | 12.5% | 13.2% | - |
| −5 | GCA (Proposed) | ϕ error | 1.2/0.9 | 1.3/1.1 | 2.1/1.8 | 5.4/6.6 | 10.8/15.3 |
| −5 | GCA (Proposed) | θ error | 0.9/0.7 | 1.2/0.9 | 1.5/1.2 | 3.6/3.9 | 7.0/8.3 |
| −5 | GCA (Proposed) | SRDE(10°) | 100% | 100% | 99.5% | 81.8% | 56.3% |
| −5 | GCA (Proposed) | SRDE(20°) | 100% | 100% | 100% | 96.0% | 82.7% |
| −5 | GCA (Proposed) | SRDE(30°) | 100% | 100% | 100% | 98.4% | 90.8% |
| −5 | TDE [14] | ϕ error | 72.8/57.3 | 68.4/54.2 | 66.7/53.0 | 76.2/55.4 | 79.7/56.7 |
| −5 | TDE [14] | θ error | 38.7/29.6 | 39.7/31.6 | 39.3/31.4 | 39.7/29.5 | 40.7/29.6 |
| −5 | TDE [14] | SRDE(10°) | 3.2% | 3.2% | 3.5% | 3.0% | 2.3% |
| −5 | TDE [14] | SRDE(20°) | 11.5% | 10.9% | 11.3% | 8.6% | 7.6% |
| −5 | TDE [14] | SRDE(30°) | 20.6% | 18.3% | 18.7% | 17.6% | 13.1% |
| −5 | TL-SSC [18] | ϕ error | 37.8/37.3 | 41.9/38.4 | 50.1/40.3 | 59.8/42.6 | 64.1/43.3 |
| −5 | TL-SSC [18] | θ error | 18.9/11.5 | 18.8/11.4 | 18.7/11.3 | 18.9/11.5 | 18.9/11.5 |
| −5 | TL-SSC [18] | SRDE(10°) | 3.9% | 4.9% | 3.0% | 3.0% | 3.7% |
| −5 | TL-SSC [18] | SRDE(20°) | 27.0% | 23.3% | 16.0% | 9.7% | 9.7% |
| −5 | TL-SSC [18] | SRDE(30°) | 43.9% | 40.2% | 33.0% | 23.6% | 21.3% |
| −5 | LS-SVM [19] | ϕ error | 26.4/31.4 | 27.6/29.6 | 52.7/45.2 | 60.9/46.7 | - |
| −5 | LS-SVM [19] | θ error | 33.6/24.3 | 34.5/25.9 | 35.9/25.5 | 39.3/29.2 | - |
| −5 | LS-SVM [19] | SRDE(10°) | 9.0% | 7.8% | 2.5% | 3.0% | - |
| −5 | LS-SVM [19] | SRDE(20°) | 24.5% | 20.3% | 12.2% | 7.2% | - |
| −5 | LS-SVM [19] | SRDE(30°) | 41.4% | 35.3% | 22.1% | 16.1% | - |
| 0 | GCA (Proposed) | ϕ error | 1.1/0.9 | 1.1/1.0 | 1.5/1.2 | 3.5/6.2 | 6.4/8.7 |
| 0 | GCA (Proposed) | θ error | 0.9/0.7 | 0.9/0.7 | 1.1/0.9 | 2.3/3.9 | 4.0/4.8 |
| 0 | GCA (Proposed) | SRDE(10°) | 100% | 100% | 100% | 93.5% | 77.8% |
| 0 | GCA (Proposed) | SRDE(20°) | 100% | 100% | 100% | 99.0% | 92.1% |
| 0 | GCA (Proposed) | SRDE(30°) | 100% | 100% | 100% | 99.5% | 97.2% |
| 0 | TDE [14] | ϕ error | 51.8/55.0 | 58.8/57.7 | 56.2/55.8 | 60.6/55.3 | 60.7/57.2 |
| 0 | TDE [14] | θ error | 31.5/26.8 | 33.8/27.9 | 35.1/28.9 | 36.7/29.6 | 35.9/27.2 |
| 0 | TDE [14] | SRDE(10°) | 11.3% | 6.9% | 7.2% | 4.9% | 4.1% |
| 0 | TDE [14] | SRDE(20°) | 24.5% | 20.8% | 20.8% | 17.3% | 15.2% |
| 0 | TDE [14] | SRDE(30°) | 37.9% | 33.3% | 33.2% | 27.0% | 26.5% |
| 0 | TL-SSC [18] | ϕ error | 29.2/33.9 | 34.0/36.2 | 42.7/39.5 | 53.3/41.1 | 59.2/42.2 |
| 0 | TL-SSC [18] | θ error | 18.9/11.6 | 18.7/11.4 | 18.6/11.4 | 18.7/11.4 | 18.8/11.5 |
| 0 | TL-SSC [18] | SRDE(10°) | 6.7% | 5.1% | 3.9% | 3.5% | 3.3% |
| 0 | TL-SSC [18] | SRDE(20°) | 35.8% | 31.4% | 22.4% | 12.5% | 10.8% |
| 0 | TL-SSC [18] | SRDE(30°) | 51.0% | 49.7% | 39.5% | 28.4% | 24.5% |
| 0 | LS-SVM [19] | ϕ error | 24.3/28.0 | 22.8/27.7 | 39.9/43.5 | 53.7/51.9 | - |
| 0 | LS-SVM [19] | θ error | 34.2/25.2 | 33.1/24.5 | 36.6/28.8 | 39.2/29.0 | - |
| 0 | LS-SVM [19] | SRDE(10°) | 12.2% | 10.4% | 7.2% | 3.0% | - |
| 0 | LS-SVM [19] | SRDE(20°) | 24.2% | 26.3% | 18.9% | 11.1% | - |
| 0 | LS-SVM [19] | SRDE(30°) | 37.0% | 40.7% | 34.7% | 23.1% | - |
| 10 | GCA (Proposed) | ϕ error | 1.1/0.9 | 1.0/0.9 | 1.3/1.1 | 3.5/10.1 | 6.7/13.2 |
| 10 | GCA (Proposed) | θ error | 0.9/0.7 | 0.8/0.7 | 1.0/0.8 | 1.8/2.9 | 3.8/6.3 |
| 10 | GCA (Proposed) | SRDE(10°) | 100% | 100% | 100% | 94.5% | 81.7% |
| 10 | GCA (Proposed) | SRDE(20°) | 100% | 100% | 100% | 98.8% | 91.0% |
| 10 | GCA (Proposed) | SRDE(30°) | 100% | 100% | 100% | 99.1% | 94.5% |
| 10 | TDE [14] | ϕ error | 33.5/51.9 | 46.9/56.5 | 53.9/27.5 | 58.2/57.3 | 57.1/57.4 |
| 10 | TDE [14] | θ error | 24.0/23.7 | 27.6/25.0 | 31.0/27.5 | 31.9/26.4 | 33.9/27.4 |
| 10 | TDE [14] | SRDE(10°) | 26.1% | 16.1% | 12.5% | 9.4% | 8.1% |
| 10 | TDE [14] | SRDE(20°) | 35.5% | 28.2% | 25.4% | 20.6% | 19.9% |
| 10 | TDE [14] | SRDE(30°) | 54.9% | 47.4% | 42.9% | 35.6% | 36.5% |
| 10 | TL-SSC [18] | ϕ error | 20.2/28.6 | 22.8/32.3 | 30.5/36.0 | 44.3/39.8 | 52.7/41.8 |
| 10 | TL-SSC [18] | θ error | 19.5/11.2 | 19.2/11.3 | 18.6/11.3 | 18.6/11.4 | 18.7/11.3 |
| 10 | TL-SSC [18] | SRDE(10°) | 10.6% | 9.5% | 6.5% | 3.5% | 2.3% |
| 10 | TL-SSC [18] | SRDE(20°) | 39.5% | 41.6% | 33.9% | 18.5% | 13.6% |
| 10 | TL-SSC [18] | SRDE(30°) | 64.2% | 61.4% | 54.7% | 36.2% | 31.2% |
| 10 | LS-SVM [19] | ϕ error | 29.3/38.9 | 27.7/35.5 | 24.2/28.9 | 29.1/29.1 | - |
| 10 | LS-SVM [19] | θ error | 35.4/25.6 | 31.7/24.1 | 35.4/26.1 | 36.4/27.7 | - |
| 10 | LS-SVM [19] | SRDE(10°) | 9.4% | 11.1% | 10.6% | 5.3% | - |
| 10 | LS-SVM [19] | SRDE(20°) | 21.2% | 25.9% | 24.7% | 21.2% | - |
| 10 | LS-SVM [19] | SRDE(30°) | 38.0% | 42.2% | 37.6% | 35.5% | - |

TABLE III(A). ROBUSTNESS VALIDATION FOR PROPOSED GCA WITH FIXED SNR AND VARYING T60 AT 567 TEST POSITIONS
(Training T60 = 200 ms for all groups. Columns are the T60 in ms used during localization; the 200 ms column is the benchmark.)

| SNR/dB | Metric | 0 | 100 | 200 (benchmark) | 400 | 600 |
|---|---|---|---|---|---|---|
| −10 | SRDE(10°) | 67.2% | 64.9% | 84.3% | 44.3% | 35.1% |
| −10 | SRDE(20°) | 96.3% | 95.1% | 97.9% | 75.0% | 64.2% |
| −10 | SRDE(30°) | 99.5% | 98.9% | 99.8% | 82.5% | 77.1% |
| −5 | SRDE(10°) | 74.4% | 76.9% | 99.5% | 76.4% | 66.7% |
| −5 | SRDE(20°) | 97.5% | 98.2% | 100% | 96.5% | 91.0% |
| −5 | SRDE(30°) | 99.1% | 99.5% | 100% | 99.5% | 93.8% |
| 0 | SRDE(10°) | 69.0% | 76.4% | 100% | 83.8% | 79.5% |
| 0 | SRDE(20°) | 96.8% | 98.2% | 100% | 99.0% | 97.0% |
| 0 | SRDE(30°) | 100% | 100% | 100% | 100% | 98.9% |
| 10 | SRDE(10°) | 81.1% | 86.1% | 100% | 94.2% | 91.9% |
| 10 | SRDE(20°) | 98.2% | 98.8% | 100% | 99.8% | 99.1% |
| 10 | SRDE(30°) | 100% | 100% | 100% | 100% | 99.5% |

TABLE III(B). ROBUSTNESS VALIDATION FOR PROPOSED GCA WITH FIXED T60 AND VARYING SNR AT 567 TEST POSITIONS
(Training SNR = 0 dB for all groups. Columns are the SNR in dB used during localization; the 0 dB column is the benchmark.)

| T60/ms | Metric | −10 | −5 | 0 (benchmark) | 10 |
|---|---|---|---|---|---|
| 0 | SRDE(10°) | 76.9% | 86.8% | 100% | 96.0% |
| 0 | SRDE(20°) | 87.1% | 98.4% | 100% | 99.8% |
| 0 | SRDE(30°) | 94.3% | 99.3% | 100% | 100% |
| 100 | SRDE(10°) | 76.2% | 95.1% | 100% | 83.4% |
| 100 | SRDE(20°) | 85.8% | 100% | 100% | 98.8% |
| 100 | SRDE(30°) | 93.3% | 100% | 100% | 99.5% |
| 200 | SRDE(10°) | 64.4% | 92.4% | 100% | 68.8% |
| 200 | SRDE(20°) | 86.8% | 99.8% | 100% | 96.8% |
| 200 | SRDE(30°) | 94.9% | 99.8% | 100% | 99.8% |
| 400 | SRDE(10°) | 34.2% | 72.8% | 93.5% | 72.7% |
| 400 | SRDE(20°) | 59.1% | 90.8% | 99.0% | 95.6% |
| 400 | SRDE(30°) | 70.1% | 94.7% | 99.5% | 99.4% |
| 600 | SRDE(10°) | 25.1% | 58.9% | 77.8% | 68.4% |
| 600 | SRDE(20°) | 47.1% | 80.1% | 92.1% | 92.2% |
| 600 | SRDE(30°) | 61.0% | 88.7% | 97.2% | 96.6% |

TABLE III(C). RESULTS OF PROPOSED GCA BY SOUND SOURCES AT DIFFERENT FREQUENCIES WITH 567 TEST POSITIONS
(Columns are T60 in ms.)

| Source | SNR/dB | Metric | 0 | 100 | 200 | 400 | 600 |
|---|---|---|---|---|---|---|---|
| Machinery sound | −10 | SRDE(10°) | 98.9% | 98.8% | 87.7% | 50.8% | 26.6% |
| Machinery sound | −10 | SRDE(20°) | 100% | 100% | 98.9% | 81.8% | 55.9% |
| Machinery sound | −10 | SRDE(30°) | 100% | 100% | 99.5% | 91.4% | 71.6% |
| Machinery sound | 10 | SRDE(10°) | 99.3% | 99.8% | 99.1% | 84.1% | 69.7% |
| Machinery sound | 10 | SRDE(20°) | 100% | 100% | 100% | 94.0% | 84.0% |
| Machinery sound | 10 | SRDE(30°) | 100% | 100% | 100% | 96.8% | 90.1% |
| Telephone ring | −10 | SRDE(10°) | 100% | 98.4% | 71.6% | 27.7% | 12.7% |
| Telephone ring | −10 | SRDE(20°) | 100% | 100% | 93.3% | 58.9% | 37.0% |
| Telephone ring | −10 | SRDE(30°) | 100% | 100% | 98.1% | 76.4% | 52.9% |
| Telephone ring | 10 | SRDE(10°) | 97.4% | 97.7% | 90.8% | 47.6% | 28.2% |
| Telephone ring | 10 | SRDE(20°) | 100% | 100% | 98.2% | 77.3% | 56.4% |
| Telephone ring | 10 | SRDE(30°) | 100% | 100% | 99.3% | 88.7% | 72.0% |

TABLE IV. RESULTS OF THE FOUR ALGORITHMS AT 378 TEST POSITIONS
(Columns are T60 in ms. A dash marks results that are not available.)

| SNR/dB | Algorithm | Metric | 0 | 100 | 200 | 400 | 600 |
|---|---|---|---|---|---|---|---|
| −10 | GCA (Proposed) | SRDE(10°) | 100% | 99.5% | 82.0% | 48.9% | 31.7% |
| −10 | GCA (Proposed) | SRDE(20°) | 100% | 100% | 96.8% | 73.8% | 57.9% |
| −10 | GCA (Proposed) | SRDE(30°) | 100% | 100% | 99.7% | 87.3% | 70.1% |
| −10 | TDE [14] | SRDE(10°) | 0.8% | 2.1% | 1.3% | 1.6% | 1.1% |
| −10 | TDE [14] | SRDE(20°) | 3.7% | 5.0% | 4.0% | 3.7% | 5.6% |
| −10 | TDE [14] | SRDE(30°) | 10.1% | 10.1% | 7.4% | 9.8% | 10.6% |
| −10 | TL-SSC [18] | SRDE(10°) | 4.5% | 5.6% | 5.6% | 4.0% | 5.0% |
| −10 | TL-SSC [18] | SRDE(20°) | 16.7% | 13.2% | 12.4% | 7.4% | 7.4% |
| −10 | TL-SSC [18] | SRDE(30°) | 35.4% | 30.4% | 24.9% | 18.0% | 15.9% |
| −10 | LS-SVM [19] | SRDE(10°) | 4.5% | 2.4% | 1.1% | 1.1% | - |
| −10 | LS-SVM [19] | SRDE(20°) | 14.6% | 8.5% | 6.1% | 6.6% | - |
| −10 | LS-SVM [19] | SRDE(30°) | 25.7% | 16.7% | 11.9% | 11.6% | - |
| −5 | GCA (Proposed) | SRDE(10°) | 100% | 100% | 99.2% | 76.5% | 57.9% |
| −5 | GCA (Proposed) | SRDE(20°) | 100% | 100% | 100% | 94.4% | 79.4% |
| −5 | GCA (Proposed) | SRDE(30°) | 100% | 100% | 100% | 97.6% | 87.8% |
| −5 | TDE [14] | SRDE(10°) | 3.2% | 3.2% | 3.4% | 2.6% | 1.9% |
| −5 | TDE [14] | SRDE(20°) | 11.9% | 10.6% | 11.9% | 8.5% | 8.2% |
| −5 | TDE [14] | SRDE(30°) | 20.4% | 16.1% | 20.1% | 17.2% | 13.2% |
| −5 | TL-SSC [18] | SRDE(10°) | 3.4% | 4.2% | 2.4% | 3.2% | 3.4% |
| −5 | TL-SSC [18] | SRDE(20°) | 24.9% | 21.4% | 13.5% | 9.0% | 9.0% |
| −5 | TL-SSC [18] | SRDE(30°) | 42.6% | 38.6% | 31.0% | 23.0% | 19.8% |
| −5 | LS-SVM [19] | SRDE(10°) | 10.3% | 8.7% | 2.1% | 2.4% | - |
| −5 | LS-SVM [19] | SRDE(20°) | 25.9% | 21.7% | 13.0% | 5.8% | - |
| −5 | LS-SVM [19] | SRDE(30°) | 44.7% | 38.1% | 22.5% | 14.6% | - |
| 0 | GCA (Proposed) | SRDE(10°) | 100% | 100% | 100% | 91.8% | 72.5% |
| 0 | GCA (Proposed) | SRDE(20°) | 100% | 100% | 100% | 98.4% | 90.0% |
| 0 | GCA (Proposed) | SRDE(30°) | 100% | 100% | 100% | 99.2% | 96.3% |
| 0 | TDE [14] | SRDE(10°) | 11.6% | 7.9% | 7.1% | 5.0% | 5.0% |
| 0 | TDE [14] | SRDE(20°) | 25.9% | 20.6% | 20.4% | 18.8% | 16.9% |
| 0 | TDE [14] | SRDE(30°) | 40.2% | 32.5% | 32.5% | 28.0% | 27.5% |
| 0 | TL-SSC [18] | SRDE(10°) | 6.1% | 3.7% | 3.7% | 3.2% | 3.2% |
| 0 | TL-SSC [18] | SRDE(20°) | 33.9% | 27.3% | 20.4% | 11.4% | 8.5% |
| 0 | TL-SSC [18] | SRDE(30°) | 48.2% | 46.6% | 36.5% | 26.2% | 22.5% |
| 0 | LS-SVM [19] | SRDE(10°) | 12.4% | 11.4% | 7.4% | 3.2% | - |
| 0 | LS-SVM [19] | SRDE(20°) | 22.2% | 24.6% | 20.4% | 12.4% | - |
| 0 | LS-SVM [19] | SRDE(30°) | 40.0% | 39.4% | 37.3% | 23.3% | - |
| 10 | GCA (Proposed) | SRDE(10°) | 100% | 100% | 100% | 93.4% | 80.1% |
| 10 | GCA (Proposed) | SRDE(20°) | 100% | 100% | 100% | 98.9% | 91.3% |
| 10 | GCA (Proposed) | SRDE(30°) | 100% | 100% | 100% | 99.2% | 93.9% |
| 10 | TDE [14] | SRDE(10°) | 27.5% | 15.1% | 15.1% | 11.6% | 9.8% |
| 10 | TDE [14] | SRDE(20°) | 36.5% | 27.8% | 28.3% | 24.1% | 19.8% |
| 10 | TDE [14] | SRDE(30°) | 55.0% | 48.9% | 44.7% | 38.7% | 37.0% |
| 10 | TL-SSC [18] | SRDE(10°) | 7.1% | 6.6% | 4.5% | 3.1% | 2.6% |
| 10 | TL-SSC [18] | SRDE(20°) | 37.0% | 39.2% | 29.9% | 16.4% | 14.1% |
| 10 | TL-SSC [18] | SRDE(30°) | 62.8% | 58.0% | 50.7% | 33.8% | 28.9% |
| 10 | LS-SVM [19] | SRDE(10°) | 10.6% | 10.9% | 12.4% | 5.3% | - |
| 10 | LS-SVM [19] | SRDE(20°) | 23.0% | 27.8% | 26.7% | 22.8% | - |
| 10 | LS-SVM [19] | SRDE(30°) | 42.2% | 41.6% | 40.2% | 37.0% | - |

TABLE V. THE COMPUTATIONAL COMPLEXITY BY CPU TIME (S) AND REAL TIME (S) AT 567 TEST POSITIONS
(Each cell gives CPU time / real time in seconds. Columns are T60 in ms. A dash marks results that are not available.)

| Algorithm | Stage | 0 | 100 | 200 | 400 | 600 |
|---|---|---|---|---|---|---|
| GCA (Proposed) | Offline training | 2747.7 / 1188.8 | 3950.4 / 2370.4 | 6012.1 / 4618.5 | 12865.6 / 11586.4 | 31067.4 / 29298.3 |
| GCA (Proposed) | Online localization | 24.5 / 9.6 | 21.7 / 9.4 | 19.3 / 9.2 | 17.2 / 11.8 | 20.1 / 9.8 |
| TDE [14] | Online localization | 32911.0 / 30149.3 | 37023.8 / 34708.9 | 42052.5 / 39121.1 | 55222.1 / 51748.0 | 58945.9 / 55107.0 |
| TL-SSC [18] | Offline LUT computing | 906.9 / 653.1 | 894.7 / 643.6 | 898.4 / 643.4 | 941.9 / 717.3 | 927.1 / 716.2 |
| TL-SSC [18] | Online localization | 743.5 / 583.4 | 741.9 / 576.9 | 755.4 / 580.5 | 843.9 / 699.3 | 810.4 / 686.9 |
| LS-SVM [19] | Offline training | 374300.9 / 28594.9 | 367334.3 / 29065.0 | 380892.9 / 32471.0 | 391769.3 / 37791.7 | - |
| LS-SVM [19] | Online localization | 21.4 / 9.6 | 18.2 / 8.5 | 18.5 / 8.8 | 19.9 / 8.4 | - |

TABLE VI. TRADEOFF BETWEEN LOCALIZATION ACCURACY AND COMPUTATIONAL COMPLEXITY AT 567 TEST POSITIONS

| K | 512 | 4096 | 32768 |
|---|---|---|---|
| SRDE(10°) | 86.9% | 99.6% | 99.8% |
| SRDE(20°) | 99.8% | 100% | 100% |
| SRDE(30°) | 100% | 100% | 100% |
| Offline training, CPU time (s) | 1891.8 | 3900.1 | 140462.3 |
| Offline training, real time (s) | 252.5 | 2255.5 | 21071.5 |
| Online localization, CPU time (s) | 3.8 | 23.1 | 3595.8 |
| Online localization, real time (s) | 0.6 | 9.8 | 1815.5 |

Tradeoff Strategy

The computational complexity grows with the number of clusters K. When K is small, the computational complexity is low, but the features become inconsistent, which degrades the localization accuracy. In contrast, if K is large, the features become consistent, but the computational complexity becomes expensive or even unaffordable. Therefore, the number of space clusters should be determined by making a tradeoff between localization accuracy and computational complexity. To illustrate this tradeoff, we conduct experiments by varying K in the environment where T60 = 100 ms and SNR = −10 dB.
K is set to 512, 4096, and 32768, corresponding to cluster volumes of 0.5 m × 0.5 m × 0.5 m, 0.25 m × 0.25 m × 0.25 m, and 0.125 m × 0.125 m × 0.125 m. The results are summarized in Table VI. From the results, it can be observed that the complexity increases with K. When K = 4096, compared to the case of K = 512, SRDE(10°) improves significantly by 12.7 percentage points, reaching 99.6%, at the cost of some increase in complexity. When K increases from 4096 to 32768, SRDE(10°) can hardly be improved further, while the complexity becomes very expensive. Therefore, we divide the room into 4096 clusters.

V. CONCLUSION

In this paper, we address the problem of SSL in challenging high-reverberation and low-SNR environments by proposing a novel machine learning based algorithm, GCA. With the GCC feature, the proposed GCA transforms the SSL problem into a likelihood-based nonlinear classification problem by utilizing a PNN, which is especially suitable for multiclass classification problems. In order to overcome misclassification and estimate DOA more accurately, we propose WLDM in GCA. The experimental results have shown that GCA achieves more accurate DOA estimation. The averages of the mean azimuth angle estimation error and the mean elevation angle estimation error of GCA are only 4.6° and 3.1° respectively. Compared with three recently published algorithms, GCA improves on the best average SRDE(10°), SRDE(20°) and SRDE(30°) by 81.1, 74.1 and 60.3 percentage points respectively. In addition, GCA performs robustly in different acoustic environments.
This validates that the proposed GCA can localize very effectively in applications where the acoustic features of the physical site can be accessed before the localization stage. This data-driven training method is especially suitable for industrial environments which are too complex to be modeled.

REFERENCES

[1] H. Guo, K. S. Low, and H. A. Nguyen, “Optimizing the localization of a wireless sensor network in real time based on a low-cost microcontroller,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 741–749, Mar. 2011.
[2] B. Wang, S. Zhou, W. Liu, and Y. Mo, “Indoor localization based on curve fitting and location search using received signal strength,” IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 572–582, Jan. 2015.
[3] F. Deng et al., “Energy-based sound source localization with low power consumption in wireless sensor networks,” IEEE Trans. Ind. Electron., vol. 64, no. 6, pp. 4894–4902, Jun. 2017.
[4] P. Yang and W. Wu, “Efficient particle filter localization algorithm in dense passive RFID tags environment,” IEEE Trans. Ind. Electron., vol. 61, no. 10, pp. 5641–5651, Oct. 2014.
[5] J. Kim and W. Chung, “Localization of a mobile robot using a laser range finder in a glass-walled environment,” IEEE Trans. Ind. Electron., vol. 63, no. 6, pp. 3616–3627, Jun. 2016.
[6] H. Song, W. Choi, and H. Kim, “Robust vision-based relative-localization approach using an RGB-depth camera and LiDAR sensor fusion,” IEEE Trans. Ind. Electron., vol. 63, no. 6, pp. 3725–3736, Jun. 2016.
[7] J. Wang, Q. Gao, Y. Yu, H. Wang, and M. Jin, “Toward robust indoor localization based on Bayesian filter using chirp-spread-spectrum ranging,” IEEE Trans. Ind. Electron., vol. 59, no. 3, pp. 1622–1629, Mar. 2012.
[8] J. Wang et al., “Transferring compressive-sensing-based device-free localization across target diversity,” IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2397–2409, Apr. 2015.
[9] J. Chen, J. Benesty, and Y. Huang, “Time delay estimation in room acoustic environments: An overview,” EURASIP J. Appl. Signal Process., vol. 2006, pp. 1–19, Jan. 2006.
[10] S. Argentieri, P. Danès, and P. Souères, “A survey on sound source localization in robotics: From binaural to array processing methods,” Comput. Speech Lang., vol. 34, no. 1, pp. 87–112, Nov. 2015.
[11] H. He, L. Wu, J. Lu, X. Qiu, and J. Chen, “Time difference of arrival estimation exploiting multichannel spatio-temporal prediction,” IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 3, pp. 463–475, Mar. 2013.
[12] D. Pavlidi, A. Griffin, M. Puigt, and A. Mouchtaris, “Real-time multiple sound source localization and counting using a circular microphone array,” IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 10, pp. 2193–2206, Oct. 2013.
[13] A. Canclini, E. Antonacci, A. Sarti, and S. Tubaro, “Acoustic source localization with distributed asynchronous microphone networks,” IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 2, pp. 439–443, Feb. 2013.
[14] X. Alameda-Pineda and R. Horaud, “A geometric approach to sound source localization from time-delay estimates,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 22, no. 6, pp. 1082–1095, Jun. 2014.
[15] J. Velasco, C. J. Martín-Arguedas, J. Macias-Guarasa, D. Pizarro, and M. Mazo, “Proposal and validation of an analytical generative model of SRP-PHAT power maps in reverberant scenarios,” Signal Process., vol. 119, pp. 209–228, Feb. 2016.
[16] B. Mungamuru and P. Aarabi, “Enhanced sound localization,” IEEE Trans. Syst. Man Cybern. B Cybern., vol. 34, no. 3, pp. 1526–1540, Jun. 2004.
[17] J. Dmochowski, J. Benesty, and S. Affes, “A generalized steered response power method for computationally viable source localization,” IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 8, pp. 2510–2526, Nov. 2007.
[18] D. Yook, T. Lee, and Y. Cho, “Fast sound source localization using two-level search space clustering,” IEEE Trans. Cybern., vol. 46, no. 1, pp. 20–26, Jan. 2016.
[19] H. Chen and W. Ser, “Acoustic source localization using LS-SVMs without calibration of microphone arrays,” in Proc. IEEE Int. Symp. Circuits Syst., Taipei, Taiwan, May 2009, pp. 1863–1866.
[20] X. Xiao et al., “A learning-based approach to direction of arrival estimation in noisy and reverberant environments,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Brisbane, Australia, Apr. 2015, pp. 76–80.
[21] X. Li and H. Liu, “Sound source localization for HRI using FOC-based time difference feature and spatial grid matching,” IEEE Trans. Cybern., vol. 43, no. 4, pp. 1199–1212, Aug. 2013.
[22] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, “Semi-supervised sound source localization based on manifold regularization,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 24, no. 8, pp. 1393–1407, Aug. 2016.
[23] D. F. Specht, “Probabilistic neural networks,” Neural Netw., vol. 3, no. 1, pp. 109–118, Jan. 1990.
[24] E. Lehmann and A. Johansson, “Diffuse reverberation model for efficient image-source simulation of room impulse responses,” IEEE Trans. Audio Speech Lang. Process., vol. 18, no. 6, pp. 1429–1439, Aug. 2010.
[25] Y. Hu and P. Loizou. NOIZEUS database. [Online]. Available: http://ecs.utdallas.edu/loizou/speech/noizeus.
[26] X. Alameda-Pineda and R. Horaud. The gtde MATLAB toolbox. [Online]. Available: https://team.inria.fr/perception/research/geometric-sound-source-localization.
[27] T. Lee. TLSSC code. [Online]. Available: https://github.com/LeeTaewoo/fast_sound_source_localization_using_TLSSC.
[28] E. Lehmann. Fast ISM code. [Online]. Available: http://www.eric-lehmann.com.
[29] FindSounds database. [Online]. Available: http://www.findsounds.com.

Yingxiang Sun (S’16) received his B.Eng. degree in Electronic and Information Engineering from Xidian University, Xi’an, China, in 2009. He received his M.Eng. degree in Electromagnetic Field and Microwave Technology from the 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, China, in 2012. Currently, he is working towards the Ph.D. degree at the Pillar of Engineering Product Development, Singapore University of Technology and Design, Singapore. His current research interests include digital signal processing and machine learning.

Jiajia Chen received his B.Eng. (Hons) and Ph.D. from Nanyang Technological University, Singapore, in 2004 and 2010, respectively. Since April 2012, he has been with Singapore University of Technology and Design, where he is currently a Senior Lecturer. His research interests include computational transformations of low-complexity digital filters, image fusion and audio signal processing. Dr. Chen served as Web Chair of the Asia-Pacific Computer Systems Architecture Conference 2005, Technical Program Committee member of the European Signal Processing Conference 2014 and the Third IEEE International Conference on Multimedia Big Data 2017, and has been an Associate Editor of the Springer EURASIP Journal on Embedded Systems since 2016.
Chau Yuen (S’02–M’08–SM’12) received the B.Eng. and Ph.D. degrees from Nanyang Technological University, Singapore, in 2000 and 2004, respectively. In 2005, he was a Post-Doctoral Fellow with Lucent Technologies Bell Labs, Murray Hill, NJ, USA. In 2008, he was a Visiting Assistant Professor with Hong Kong Polytechnic University, Hong Kong. From 2006 to 2010, he was a Senior Research Engineer with the Institute for Infocomm Research, Singapore, where he was involved in an industrial project developing an 802.11n wireless local area network system and actively participated in the Third Generation Partnership Project Long-Term Evolution (LTE) and LTE-A standardization. In 2010, he joined the Singapore University of Technology and Design, Singapore, as an Assistant Professor. He has authored over 300 research papers in international journals or conferences. He holds two U.S. patents. He received the IEEE Asia-Pacific Outstanding Young Researcher Award in 2012. He serves as an Editor of the IEEE TRANSACTIONS ON COMMUNICATIONS and IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY.
Susanto Rahardja (F’11) received the B.Eng. degree from National University of Singapore in 1991, and the M.Eng. and Ph.D. degrees, all in Electronic Engineering, from Nanyang Technological University, Singapore, in 1993 and 1997, respectively. He is currently a Chair Professor at the Northwestern Polytechnical University (NPU) under the Thousand Talent Plan of People's Republic of China. His research interests are in multimedia, signal processing, wireless communications, discrete transforms and signal processing algorithms, implementation and optimization. Dr. Rahardja was the recipient of numerous awards, including the IEE Hartree Premium Award, the Tan Kah Kee Young Inventors' Open Category Gold award, the Singapore National Technology Award, A*STAR Most Inspiring Mentor Award, Finalist of the 2010 World Technology & Summit Award, the Nokia Foundation Visiting Professor Award and the ACM Recognition of Service Award.
