Clustered Calibration: An Improvement to Radio Interferometric Direction Dependent Self-Calibration

Mon. Not. R. Astron. Soc. 000 , 1–18 (2007) Printed 4 June 2018 (MN L A T E X style ﬁle v2.2) Clustered Calibratio n: An Impro vement to Radio Interferometric Dir ection Dependen t Self-Calib ration S. Kazemi 1 ⋆ , S. Y ata watta 2 , S. Zaroubi 1 1 Kapte yn Astr onomical Institute , Univer sity of Gr oning en, P .O. Box 800, 9700 A V Gr oning en, the Netherl ands 2 ASTR ON, P ostbus 2, 79 90 AA Dwingel oo, the Netherla nds 4 June 2018 ABSTRA CT The new generation of radio synth esis array s, such as LOF AR an d SKA, ha ve been designed to surpass existing ar rays in terms of s ensitivity , angular resolution and frequ ency coverage. This ev olu tion h as led to the de velopm ent of advanced calibration techniques tha t ensure the deliv- ery of accur ate results at the lowest possible computational cost. Howe ver, the perf ormance of such ca libration techniques is still limited by the com pact, bright sou rces in the sky , u sed as c alibrators. It is importan t to have a b right eno ugh sou rce that is well d istinguished fr om the backg round no ise level in order to ac hiev e satisfactory r esults in calibratio n. This paper presents “clustered calibration ” as a m odiﬁcation to trad itional radio inter ferometr ic calibra- tion, in ord er to accommoda te f aint sources that are alm ost below the background noise le vel into the calib ration process. T he main id ea is to employ the information of the bright s ources’ measured sign als as an aid to calibrate fainter sou rces that are nearby the bright so urces. I n the case where we do n ot hav e br ight en ough sour ces, a sou rce cluster could a ct a s a b right source that can be distinguished from b ackgrou nd noise. For this purpo se, we co nstruct a nu mber of sou rce clu sters assuming that the signals of the sources belong ing to a single cluster are corrup ted by almost the same errors. Under this assump tion, each cluster is calibrated as a single sou rce, u sing th e combined co herencies of its sour ces simultaneou sly . This upgrad es the power of an in dividual faint s ource by the effective p ower of its clu ster . The solutio ns thus obtained for every cluster a re assigned to each individual source in the cluster . W e giv e per- forman ce analysis of clustered calibratio n to show the superio rity of this appro ach compared to the tr aditional u n-clustered calibration . W e a lso provide ana lytical criteria to c hoose th e optimum number of clusters for a gi ven observation in an efﬁcient m anner . Key words: methods: statistical, methods: numerical, technique s: interferome tric 1 INTRODUCTION Low frequency radio astronomy is undergoing a rev olution as a ne w generation of radio interferometers with increased sensitiv- ity an d resolution, such as the LOw F requenc y ARray (LOF AR) 1 , the Murchison W ide-ﬁel d Array (MW A) 2 and the Square Kilome- ter Array (SKA) 3 are being de vised and some are al ready beco m- ing operational. These arrays form a large ef fective aperture by the combination of a large nu mber of antennas usin g aperture syn the- sis (Thompson et al. 2001). In order to achiev e the full potential of such an interferometer , it is essential that the observ ed data is properly calibrated before any imaging is done. Calibration of ra- dio interferometers refers to t he estimation and reduction of errors introduced by the atmosphere and also by the receiv er hardwa re, ⋆ E-mail: kazemi@a stro.rug.nl 1 http:/ /www .lofar .org 2 http:/ /www .mwatelesc ope.org 3 http:/ /www .skatelescope .org before imaging. W e also consider the remov al of bright sources from the observ ed data part of calibration, t hat enable imaging the weak background sources. For low frequenc y observations with a wide ﬁ eld of view , calibration ne eds to be done alon g multiple di- rections in the sky . Proper calibration across the full ﬁeld of v iew i s required to achie ve the interferometer’ s desired precision and sen- sitivity givin g us high dynamic range images. Early radio astronomy used external (classical) calibration which estimates the instrumental unkno wn parameters using a ra- dio source with kno wn properties. This method has a relati vely lo w computational cost and a fast ex ecution ti me, but it g enerates results strongly af fected by the accurac y with which the source properties are known. In addition, existence of an isolated bright source, which is the most important requirement of external cal- ibration, in a very wide ﬁeld of vie w i s almost impractical, and e ven when it does, external calibration would only gi ve i nforma- tion along t he direction of the calibrator . The external calibration was then improv ed by self-calibration (Pearson & Readhead 1984). Self-calibration has the advantag e of estimating both the source c  2007 RAS 2 Kazemi et al. and instrumental unkn own s, without the need of having a prior kno wledge of the sky , only u tilizing the instrument’ s observed data. Moreo ver , it can achie ve a superior precision by iterating between the sky and the instrumental parame ters. The accuracy of any calibration scheme, r egard less of the used technique, is determined by the level of Signal to Noise Ra- tio (SNR). This limits the feasibility of an y calibration scheme to only bright sources that have a high enough S NR to be distin- guished from th e backgro und noise (Bernardi et al. 2010; Liu et al. 2009; Pindo r et al. 201 0). Note that interferometric imag es are pro- duced using the data observed during the total observation (inte- gration) time. Howe ver , calibration is done at shorter time inter- v als of that total duration. This increases the noise le vel of the data for which calibration is executed compared to the one in the im- ages. In other words, there are so me f ai nt sources that appear well abov e the noise in the images while they are well belo w the noise lev el during calibration. It has been a great challenge to calibrate such faint sources hav ing a ve ry low SNR. In this pa per we present clustered self-calibration, i ntroduced in Kaze mi et al. (2011a), and emphasize i ts better perfo rmance compared to un-clustered ca libra- tion below the calibration noise l e vel. Existing un-clustered calibra- tion appro aches (Intema et al. 2009; Smirno v 2011) can only han- dle a handful of d irections whe re strong enough sou rces are present and we improve this, in particular for subtraction of thousands of sources over hundreds of directi ons in t he sky , as in the case for LOF AR (Bregm an 2012). The i mplementation of clustered calibration is performed by clustering the sou rces i n the sk y , assumin g that all the sou rces in a single cluster ha ve the same corruptions, and calibrating each clus- ter as a single source. At the end , the obtained c alibration solutions for e very source cluster is a ssigned to all the so urces in tha t cluster . This pro cedure impro ves the information used by calibration by in- corporating the total of signals observed at each cluster instead of each indiv idual source’ s signal. Therefore, when S NR is very low , it provid es a considerably better result compared to un-clustered cal- ibration. Moreov er , cali brating for a set of source clusters instead of for all the individu al sources in the sky reduces the number of directions along which calibration has t o be performed, thus con- siderably cutting down the computational cost. Howe ver , there is one d rawback in clustered cali bration: The corruptions of each i n- di vidual source i n one gi ven cluster will almost surely be slightly differe nt from the corruptions estimated for the whole cluster by calibration. W e call this additional error as “clustering error” and in order to get the best performance in clustered cali bration, we should ﬁnd the right balan ce between the improv ement in S NR as opposed to deg radation by clustering error . Recently , clustering metho ds h av e gained a lot of popularity in dealing w ith l arge data sets (Kau fman & Rousseeuw 1990). There are v arious clustering approaches that co uld be app lied to calibra- tion based o n their speciﬁc cha racteristics. An overvie w of differen t clustering methods is given in Xu & W unsch (2008). C lustering of radio sources should take into account (i) their physical distance from each other and (ii) their indi vidual intensity . The smaller the angular separations of sources are, the higher the likelihood t hat they share the same corruptions in their radiated signals. More- ov er, in order to get the best accurac y in the calibration results, there should be a balance between ef fective intensities of different clusters. Thu s, in the clustering procedure, e very source should be weighted suitably according to its brightness intensity . The brightness distribution of radio source in the sky is a po w er law and the spatial dist ribution is P oisson. Therefore, clus- tering the sources via probabilistic clustering approaches is com- putationally complex. Hierarchical clustering (Johnson 1967) is a well-kno wn clustering approach with a straigh t forwa rd impleme n- tation suitable for our case. But, its computational cost gro ws ex- ponentially with the size of its target data set which can be a dis- adv antage when the num ber of sou rces is huge. W ei ghted K-me ans clustering (K erdprasop et al. 2005; MacQueen 1967) is also one of the most used of clustering schemes app licable in clustered ca libra- tion. The advantag e of this c lustering techniqu e is its low computa- tional cost, which i s proportional to t he number of clusters and the size o f the target d ata set. N onetheless, we emphasize that the co m- putational time taken by any of t he aforementioned clustering al- gorithms is negligible compared with the compu tational time tak en by the actual calibration. T herefore, we pursue all clustering ap- proaches in this p aper . Ho wev er, the use of Fuzzy C-means cluster - ing (Bezdek 1981) in clustered calibration requires major changes in the calibration data model and will be exp lored in future work. This paper is organized as f ollo ws: F irst, in secti on 2 we present the general data model u sed in clustered calibration. In sec- tion 3 , we present modiﬁed weighted K -means and div isiv e hier- archical clustering f or clustering sources in the sky . Next, in sec- tion 4 , we focus on analyzing the performance of clustered cali- bration, with an a priori clustered sk y model, and compare the re- sults with un-clu stered calibration. In clustered calibration, there is contention between the impro vemen t of SNR b y clustering sources and the errors introduced by t he clustering of sources i tself. Thus, we relate the clustered calibration’ s performance to the effecti ve Signal to Interference plus Noise Ratio (SINR) obtained for each cluster . For this purpose, we use statistical estimation theory and the Cramer -Rao Lo wer Bounds (CRLB) (Kay 1993). In section 5, we deriv e criteria for ﬁnding the optimum number of clusters for a gi ven sky . W e use the S INR ana lysis and adopt Akaik e’ s Informa- tion Criterion (AIC) (Akaik e 1973) and the Like lihood Ratio T est (LR T) (Gr av es 1978) to esti mate t he op timum numbe r of clusters. W e present simulation results in section 6 to show the superiority of clustered calibration to un-clustered calibration and the perfor- mance of t he presented criteria in detecting the optimum number of clusters. Finally , we dra w our conclusions in section 7. Through this paper , calibration is executed by the Space Alternating Gen- eralized Expectation maximization (SAGE) (Fessler & Hero 1994; Y atawatta et al. 2009; Kazemi et al. 2011b) algorithm. Moreo ver , in our simulations, radio sources are considered to be uniformly distributed in the sky and their ﬂux intensities follo w Raleigh dis- tribution, which i s t he worst case scenario. In real sky models, the re usually exist only a few (two or three) number of bright sources which dominate the emission. In the presence of these sources and the background noise, it is impractical to solve for the other faint sources in the ﬁeld of vie w ind ividu ally . Therefore, obtaining a bet- ter calibration performance via the clus tered calibration app roach, compared to the un-clustered one, is guaranteed. Therefore, in sec- tion 6 we il lustrate this using simulations in which the brightness distribution of sources is a po wer law with a v ery steep slope. On top of t hat, Kazemi et al. (2012); Y ataw att a et al. (submitted 2012) also present the perfo rmance of clustered calibration on real o bser- v ations using LOF AR. The following notations are used in this paper: B old, lower - case letters refer to column vectors, e.g., y . Upper case b old letters refer to matrices, e.g., C . All parameters are comple x numbers, un- less st ated otherwise. The inv erse, transpose, Hermitian transpose, and co njugation of a matrix are presented by ( . ) − 1 , ( . ) T , ( . ) H , and ( . ) ∗ , respecti vely . The stati stical exp ectation operator is referred to as E { . } . The matrix Kroneck er product and the p roper (strict) sub- set are denoted by ⊗ and ( , respectiv ely . I n is the n × n identity c  2007 RAS, MNRAS 000 , 1–18 Cluster ed Radio Interfer om etric C alibration 3 matrix an d ∅ is the empty set. The Kroneck er delta fun ction is pre- sented by δ ij . R and C are the sets of Real and Comple x numbers, respecti vely . The Fr obenius norm is sho wn by || . || . Estimated pa- rameters are deno ted by a hat, c ( . ) . All logarithmic calculations are to the base e . The multiv ari ate Gaussian and Complex Gaussian distributions are den oted by N and C N , respectiv ely . 2 CLUSTERED SELF-CALIBRA T ION D A T A M ODEL In this section, we present the measurement equation of a po- larimetric clustered calibration in detail (Hamaker et al. 1 996; Hamaker 2006). Suppose we have a radio interferometer consist- ing of N polarimetric antennas where each antenna is composed of two orthogonal du al-polarization feeds that observe K compact sources in the sky . Every i -t h source signal, i ∈ { 1 , 2 , . . . , K } , causes an induced voltage of e v pi = [ v X pi v Y pi ] T at X and Y dipoles of e very p -th antenna, p ∈ { 1 , 2 , . . . , N } . In practice, e v pi = J pi e i , (1) where e i = [ e X i e Y i ] T is the source’ s electric ﬁeld vector and J pi represents the 2 × 2 Jones matrix (Hamaker et al. 1996) cor- responding to the direction-dependent gain co rruptions in the radi- ated signal. These corrup tions are originated from the instrumental (the beam shape, system frequency response, etc.) and the prop- agation (tropospheric and ionospheric distortions, etc.) properties which later on, in this section, will be exp lained in more detail. The signal v p obtained at every antenna p is a linear su- perposition of the K sources corrupted signals, e v pi where i ∈ { 1 , 2 , . . . , K } , plus the antenna’ s thermal noise. T he multitude of ignored fainter sources also contrib utes to this additi ve noise. The voltages collected at the instrument antennas get corrected for geometric delays, based on the location of their antennas, and some instrume ntal effects, l ike t he an tenna clock phase s and elec- tronic gains. Then, they are correlated in the array’ s correlator to generate visibilities (Hamaker et al. 1996). The visibility matrix of the baseline p − q , E { v p ⊗ v H q } , is gi ven by V pq = G p K X i =1 J pi ( θ θ θ ) C i { pq } J H qi ( θ θ θ ) ! G H q + N pq . (2) In Eq. (2), θ θ θ ∈ C P , P = 4 K N , is the unkno wn instrumental and sky parameter vector , N pq is the additi ve 2 × 2 noise matrix of t he baseline p − q , and C i { pq } is the Fourier transform of the i -th source coherency matrix C i = E { e i ⊗ e H i } (Born & W olf 1999; Hamaker et al. 1996). If the i -th source radiation intensity is I i , then C i = I i 2 I 2 . Considering this source to hav e equatorial coordinate, (Right Ascension α , Declination δ ), equal to ( α i , δ i ) , and the geometric components of baseline p − q to be ( u, v , w ) , then C i { pq } = e − 2 π ( ul + vm + w ( √ 1 − l 2 − m 2 − 1)) C i , (3) where  2 = − 1 and, l = sin ( α i − α 0 ) cos ( δ i ) , m = cos ( δ 0 ) sin ( δ i ) − cos ( α i − α 0 ) cos ( δ i ) sin ( δ 0 ) , are the source d irection components corresp onding t o the observ a- tion phase reference of ( α 0 , δ 0 ) (Thompson et al. 2001). The errors common to all directions, such as the receiv er delay and amplitude errors, are giv en by G p and G q . Initial calibration at a ﬁ ner time and frequency resolution is performed to esti mate and correct for G p -s and the corrected visibilities are obtained as e V pq = G − 1 p V pq G − H q . (4) The remaining errors are unique to a giv en direction, but residual errors in G p -s are also absorbed into these errors, which are de- noted by J pi in the usual notation. V ectorizing Eq. (4), the ﬁnal visibili ty vector of t he baseline p − q is giv en b y v pq = K X i =1 J ∗ qi ( θ θ θ ) ⊗ J pi ( θ θ θ ) vec ( C i { pq } ) + n pq . (5) Stacking up all the cross correlations (measured visibil ities) and noise vectors in y and n , resp ectiv ely , the un-clustered self- calibration measurement equation is giv en by y = K X i =1 s i ( θ θ θ ) + n . (6) In Eq. (6), y , n ∈ C M , M = 2 N ( N − 1) , the noise ve ctor is con- sidered to hav e a zero mean Gaussian distrib ution wi th cova riance Π Π Π , n ∼ N (0 , Π Π Π M × M ) , and the n onlinear function s i ( θ θ θ ) is deﬁned as s i ( θ θ θ ) ≡      J ∗ 2 i ( θ θ θ ) ⊗ J 1 i ( θ θ θ ) vec ( C i { pq } ) J ∗ 3 i ( θ θ θ ) ⊗ J 1 i ( θ θ θ ) vec ( C i { pq } ) . . . J ∗ N i ( θ θ θ ) ⊗ J ( N − 1) i ( θ θ θ ) vec ( C i { pq } )      . (7) Calibration is essentially the Maximum Likeliho od (ML) esti- mation of the unkno wn parameters θ θ θ ( P complex valu es or 2 P real v alues), or of the Jones matrices J ( θ θ θ ) , from Eq. (6) and remo val of the K sources. Note that c alibration methods could also be applied to t he uncorrected visibilities of (4) to estimate G p and G q errors as well. The Jones matrix J pi , for eve ry i -th direction and at every p -th antenna, is gi ven as J pi ≡ E pi Z pi F pi . (8) In Eq. (8), E pi , Z pi , and F pi are the antenna’ s voltage pattern, ionospheric phase ﬂuctuation, and Faraday Rotation Jones matri- ces, respectiv el y . In practice, the E , Z , and F Jones matrices ob- tained for nearby directions and for a gi ven antenna are almost the same. Thus, for e very antenna p , if the i -th an d j -th sources hav e a small angular separation from each other , we may consider J pi ∼ = J pj . (9) This is the underlying assumption for clustered calibration. Clustered calibration ﬁrst assigns source clusters, L i for i ∈ { 1 , 2 , . . . , Q } where Q ≪ K , on which the sky v ariation is consid- ered to be uniform. Then, it assumes there exists a unique e J pi which is shared by all the sources of the i -th cluster L i , i ∈ { 1 , 2 , . . . , Q } , at recei ver p , p ∈ { 1 , 2 , . . . , N } . Based o n that, th e visibility a t e v- ery baseline p − q , given by Eq. (4), is reformulated as e V pq = Q X i =1 e J pi ( e θ θ θ ) { X l ∈ L i C l { pq } } e J H qi ( e θ θ θ ) + e N pq . (10) In Eq. (10), e N pq is the clustered calibration’ s effectiv e noise at baseline p − q which will be e xplicitly discussed at s ection 4. Note that cl ustered cali bration estimates the ne w unkno wn parameter c  2007 RAS, MNRAS 000 , 1–18 4 Kazemi et al. e θ θ θ ∈ C e P where e P = 4 QN . Denoting the effecti ve signal of ev- ery i -th cluster at baseline p − q by e C i { pq } ≡ X l ∈ L i C l { pq } , (11) the clustered calibration visibility vector at this baseline (vectorized form of Eq. (10)) is v pq = Q X i =1 e J ∗ qi ( e θ θ θ ) ⊗ e J pi ( e θ θ θ ) vec ( e C i { pq } ) + e n pq . (12) Finally , stacking up the visibilities of all the instrument’ s baselines in vector y , clustered calibration’ s general measurement equation is resulted as y = Q X i =1 e s i ( e θ θ θ ) + e n . (13) In Eq. (13) e s i is deﬁned similar to s i in Eq. (7) where J an d C are replaced by e J and e C , respecti vely . Because of the similarity between the clustered and the un- clustered calibration’ s measurement equations presented by Eq. (13) and Eq. (6), respecti vely , they could utilize t he same cali- bration techniques. Thus, the only dif ference between these two types of calibration i s that clustered calibration solves for clusters of so urces instead of for the individ ual ones. That u pgrades the sig- nals which should be calibrated for from Eq. (3) to Eq. (11). 3 CLUSTERING ALGORITHMS Clustering i s grouping a set of data so that the members of the same group (cluster) have some similarities (Kaufman & Rousseeuw 1990). This similarity is deﬁned based on the application of the clustering method. W e need to deﬁne clustering schemes in which two radio sources merge to a single cluster based on t he similarit y in their direction dependent gain errors ( Eq. (8) ). Radiations of sources that are close enough to each other in the sky are assumed to be affe cted by almost the same corruptions ( E q. ( 9) ). Based on that, we aim to design source clusters wi th small angular diameters. On the other hand, e very cluster’ s intensity is the su m of the intensities of its membe rs ( Eq. (11) ). In ord er to keep a ba lance between dif- ferent clusters’ intensities, we intend to apply weighted clustering techniques in which the sources are weighted proportional to their intensities. Suppose that the K sources, x 1 , x 2 , . . . , x K hav e ( α 1 , δ 1 ) , ( α 2 , δ 2 ) , . . . , ( α K , δ K ) equatorial coordinates, re- specti vely . The aim is to provid e Q clusters so that the objectiv e function f = P Q q =1 D ( L q ) is minimized. D ( L q ) is the angular diameter of cluster L q , for q ∈ { 1 , 2 , . . . , Q } , deﬁned as D ( L q ) ≡ max { d ( x i , x j ) | x i , x j ∈ L q } , (14) and d ( ., . ) is the angular separation between any two points on the celestial sp here. Hav ing two radio sources a and b with equatorial coordinates ( α a , δ a ) and ( α b , δ b ) , respecti vely , the angular separa- tion d ( a, b ) , in radians, is obtained by tan − 1 p cos 2 δ b sin 2 ∆ α + [ cos δ a sin δ b − sin δ a cos δ b cos ∆ α ] 2 sin δ a sin δ b + cos δ a cos δ b cos ∆ α , (15) where ∆ α = α b − α a . For deﬁning the centroids, we associate a weight to every source x i , as w i = w ( x i ) ≡ I i I ∗ , for i ∈ { 1 , 2 , . . . , K } , (16) where I i is the source’ s intens ity and I ∗ = min { I 1 , I 2 , . . . , I K } . Applying the weights to the clustering proce dure, the centroids of the clusters lean mostly to wards t he brightest sources. That causes a tendency in faint sources to gather wit h brighter sources close to them i nto one cluster . Thus, their weak signals are promoted being added up with some brighter sou rces’ signals. Moreov er, very strong sources will be isolated such that their signals are calibrated indi vidually , without being affec ted by the other faint sources. W e cluster radio so urces using weighted K-means (Kerd prasop et al. 2005) and div isiv e hierarchical clustering (Johnson 1967) algorithms. S ince the source clustering for cal- ibration is performed of ﬂine, its computational comple xity is negligible co mpared with the calibration procedure itself . 3.1 W eighted K-means clustering Step 1. Select the Q brightest sources, x 1 ∗ , x 2 ∗ , . . . , x Q ∗ , and ini- tialize the centroids of Q clusters by their locations as c q ≡ [ α q ∗ , δ q ∗ ] , for q ∈ { 1 , 2 , . . . , Q } , q ∗ ∈ { 1 ∗ , 2 ∗ , . . . , Q ∗ } . (17) Step 2. Assign each source to the cluster with the closest centroid, deﬁning the membership function m L q ( x i ) = ( 1 , if d ( x i , c q ) = min { d ( x i , c j ) | j = 1 , . . . , Q } 0 , Otherwise Step 3. Update the centroids by c q = P K i =1 m L q ( x i ) w i x i P K i =1 m L q ( x i ) w i , for q ∈ { 1 , 2 , . . . , Q } . (18) Repeat steps 2 and 3 until there are no reassignments of sources to clusters. 3.2 Divisiv e h ierarchical clustering Step 1. Initialize the cluster counter Q ′ to 1 , assign all the K sources to a single cluster L 1 and ∅ to a set of null clusters A . Step 2. Cho ose cluster L q ∗ , for q ∗ ∈ { 1 , 2 , . . . , Q ′ } − A , with the largest an gular diameter D ( L q ∗ ) = max { D ( L q ) | q ∈ { 1 , 2 , . . . , Q ′ } − A } . (19) Step 3. Apply the presen ted weighted K-means clustering t ech- nique to split L q ∗ into two clusters, L ′ q ∗ and L ′′ q ∗ . Step 4. If D ( L ′ q ∗ ) + D ( L ′′ q ∗ ) < D ( L q ∗ ) , then set Q ′ = Q ′ + 1 , L q ∗ ≡ L ′ q ∗ , L Q ′ ≡ L ′′ q ∗ , an d A = ∅ , otherwise set A = A ∪ { q ∗ } . Repeat steps 2, 3, and 4 until Q ′ = Q . 3.3 Comparison of clustering methods Hierarchical clustering method tends to design clusters w ith al- most the same angular diameters, whereas, the K-means clustering method t ends to keep the same level of intensity at all its clusters. In practice, si nce hiera rchical clustering method makes less erro rs in dedicating the sam e solutions to so urces in small clusters, it per- forms better than W eighted K-means clustering in a clustered cali- bration procedure. But, when the number of sources (and cl usters) c  2007 RAS, MNRAS 000 , 1–18 Cluster ed Radio Interfer om etric C alibration 5 270 272 274 276 278 35 36 37 38 39 40 41 42 α [deg.] δ [deg.] Figure 1. A simulate d 8 by 8 degr ees sky of ﬁfty point sources with inten- sities belo w 3 Jy . The source positions and their brightness are follo wing uniform and Rayleigh distributi ons, respecti vely . The marker sizes are pro- portiona l to sources inten sities. is very large ( Q > 100 ), its prohib itiv e computational costs makes the fast K-means clustering method preferab le. 3.3.1 Example 1: W eighted K-means and hierar chical clustering W e simulate an 8 b y 8 degrees sky with ﬁfty point sources with in- tensities belo w 3 Jansky (Jy). The source p ositions and their bright- ness follow uniform and Rayleigh distributions, respectively . The result is sho wn by F ig. 1 in which the symbol sizes are proportional to intensities of sources. W eighted K-means and di visiv e hierarchi- cal clustering methods are applied to cluster the ﬁfty sources into ten source clusters. The results are presented in Fig. 2 and Fig. 3, respecti vely . Fig. 2 sho ws t hat the W eighted K-means clustering could design source clusters with co nsiderably large angular diam- eters. Assigning the same calibration solutions to the sources of these larg e clusters co uld cause signiﬁcant errors. Ho wev er, as Fig. 3 sho ws, this is not the case for the hierarchical clustering and it constructs clusters with almost the same angular diameters. Since the number of sources in this simulation is n ot that large ( K = 50 ), t he dif ference between e xecution time of the two clus - tering methods is not signiﬁcant. Hence i n such a case, the use of hi- erarchical clustering metho d, rather than the W eighted K-means, is advised. Ho wev er, this is not th e case when we hav e a large number of source s, and subseq uently a large number of source clusters, in the sky . T o demon strate this, we use the t wo clustering techniques for clustering thous and of sources ( K = 100 0 ) into Q source clus- ters, Q ∈ { 3 , 4 , . . . , 100 } . T he methods’ computational ti mes ver - sus the number of clusters are plotted in Fig. 4. As Fig. 4 shows, for large Q s, the c omputational cost of W eighted K-means is muc h cheaper t han t he on e of the hierarchical clustering. That can make the W eighted K-means clustering method more suitable than the hierarchical clustering for such a case. 4 PERFORMANCE ANAL YSIS In this section, we explain t he reasons for clustered calibration’ s better performance, compared to un-clustered calibration, at a low 270 272 274 276 278 35 36 37 38 39 40 41 42 α [deg.] δ [deg.] Figure 2. Fifty poi nt sources are clustered into ten s ource clusters by W eighte d K-means cluste ring techni que. There is not a good balance be- tween dif ferent clusters angular diameters. 270 272 274 276 278 35 36 37 38 39 40 41 42 α [deg.] δ [deg.] Figure 3. Fift y point sources are clustered into ten source clusters via hier- archic al clustering method. Dif ferent clusters hav e almost the same ang ular diamete rs. SNR (Kazemi et al. 2012). T his superiority is in t he sense of achie ving an unprecedented precision i n solutions with a consider- ably lo w computational comple xity , giv en the optimum clustering scheme. In the next section, we present differen t criteria for ﬁnding the optimum number of clusters at which the clustered calibration performs the best. 4.1 Cramer -Rao Lower Bound s The most fundamental assumption in clustered calibration is that the sources at the same cluster have ex actly t he same corruptions in their radiated signals. This assumption is of course incorrect, nonetheless, it provides us with a stronger signal, the sum of t he signals in the whole cluster . W e present an analytic comparison of clustered and un-clustered calibration where we use the Cramer- c  2007 RAS, MNRAS 000 , 1–18 6 Kazemi et al. 0 20 40 60 80 100 0 200 400 600 800 Number of clusters Computational time (s) Weighted K−means Divisive Hierarchical Figure 4. W eighted K-means and di visi ve hierarch ical cluste ring methods computat ional costs. For small number of source clusters, the re is no di f- ference between exe cution times of the two clusteri ng methods. But, when the number of source clusters is large, the computati onal cost of W eighte d K-means becomes much cheape r than the one of th e hiera rchica l clusteri ng. Rao Lower Bound (CRLB) (Kay 1993) as a tool to measure the performance of the calibration. 4.1.1 Estimations of CRLB for two sour ces a t a single cluster For simplicity , ﬁrst consider observing two point so urces at a single baseline, lets say baseline p − q . Based on Eq. (4), t he visibilities are giv en by e V pq = J p 1 C 1 { pq } J H q 1 + J p 2 C 2 { pq } J H q 2 + N pq , (20) in the un-clustered calibration strategy . V ectorizing e V pq , the visi- bility vec tor is y = J ∗ q 1 ⊗ J p 1 vec ( C 1 { pq } ) + J ∗ q 2 ⊗ J p 2 vec ( C 2 { pq } ) + n pq . Assuming n pq ∼ C N ( 0 , σ 2 I 4 ) , we hav e y ∼ C N ( s ( θ θ θ ) , σ 2 I 4 ) , (21) where s ( θ θ θ ) ≡ X i =1 , 2 J ∗ qi ( θ θ θ ) ⊗ J pi ( θ θ θ ) vec ( C i { pq } ) . (22) Using Eq. (21), t he log-likeliho od function of the visibility v ector y is giv en by L ( θ θ θ | y ) = − 4 ln { π σ 2 } − σ − 2 ( y − s ( θ θ θ )) H ( y − s ( θ θ θ )) . (2 3) CRLB i s a t ight lower bound on the error variance of an y un- biased parameter estimators (Kay 199 3). Based on its deﬁnition, if the log-lik elihood function of the random vector y , L ( θ θ θ | y ) , satis- ﬁes the “regu larity” condition E y  ∂ ∂θ θ θ L ( θ θ θ | y )  = 0 , for all θ θ θ, (24) then for any unbiased estimator of θ θ θ , b θ θ θ , v ar ( b θ θ θ i ) > [ I − 1 ( θ θ θ ) ] ii , for i ∈ { 1 , . . . , M } , (25) where I ( θ θ θ ) is the Fisher informa tion matrix deﬁned as I ( θ θ θ ) = − E y  ∂ 2 L ( θ θ θ | y ) ∂θ θ θ∂ θ θ θ T  . (26) In other words, the variance of any unbiased estimator of the un- kno wn parameter v ector θ θ θ is bound ed from belo w by the diagonal elements of [ I ( θ θ θ )] − 1 . Using (23) and (26), the Fisher informa tion matrix of the vis- ibility vecto r y is obtained as I ( θ θ θ ) = 2 σ − 2 Re ( J H s J s ) , (27) where J s is the Jacobian matrix of s wi th respect to θ θ θ J s ( θ θ θ ) = 2 X i =1 ∂ ∂θ θ θ { J ∗ qi ⊗ J pi } [ I 4 ⊗ vec ( C i { pq } )] . (28) Thus, variations of any unbiased esti mator of parameter vector θ θ θ , lets say b θ θ θ , is bounde d from be low by the CRLB as V ar ( b θ θ θ ) > [2 σ − 2 Re ( J H s J s )] − 1 . (29) Lets try to bound the error v ariati ons of the clustered calibra- tion parameters assuming that the two sources construct a single cluster , called cluster number 1 . W e reform Eq. (20) as V pq = e J p 1 ( C 1 { pq } + C 2 { pq } ) e J H q 1 + Γ 1 { pq } + Γ 2 { pq } + N pq , (30) where Γ i { pq } , referred to as the “clustering error” matrices, are gi ven by Γ i { pq } = J pi C i { pq } J H qi − e J p 1 C i { pq } e J H q 1 , (31) and e J p 1 ( e θ θ θ ) is the clustered calibration solution at receiv er p . Eq. (30) implies that what is co nsidered as the no ise matrix e N pq in the clustered calibration data model, Eq. (10), is in fact e N pq ≡ Γ 1 { pq } + Γ 2 { pq } + N pq . (32) V ectorizing E q. ( 30), the clustered calibration visibility vector i s obtained by y = e J ∗ q 1 ⊗ e J p 1 vec ( C 1 { pq } + C 2 { pq } ) + e n pq , (33) where e n pq = v ec ( e N pq ) . W e point out t hat dependin g on the observation as well as the positions of the tw o sources on t he sky , the clustering error Γ i { pq } will hav e different properties. Ho wev er, in order to study the per- formance of the clustered calibration in a st atistical sense, and to simplify our analysis, we make the follo wing assumptions. (i) Consider statisti cal expectation ove r different observations and ov er different sky realizations where the sou rces are randomly distributed on t he sky . In that case, almost surely E { e J } → E { J } and consequ ently E { Γ i { pq } } → 0 . (34) In other words , we assume the clustering error to have zero mean ov er many observation s of different parts of the sky . (ii) W e assume that the closer the sources are together i n the sk y , the smaller the errors introduced b y clustering would be. Therefore, gi ven a set of sources, the clustering error will reduce as t he nu m- ber of clusters increase. In fac t this error introduced by clustering v anishes when the number of clusters is equal to t he number of sources (each cluster contains only one source). Therefore, giv en a set of sources, the v ariance of Γ i { pq } will decreas e as the number of clusters increase. Using Eq. (34) and bearing in mind that E { N pq } = 0 , we can consider e n pq ∼ C N ( 0 , e σ 2 I 4 ) where E { e n pq e n H pq } = e σ 2 I 4 . Therefore, y ∼ C N ( e s , e σ 2 I 4 ) , c  2007 RAS, MNRAS 000 , 1–18 Cluster ed Radio Interfer om etric C alibration 7 0 1000 2000 3000 4000 5000 0 500 1000 1500 2000 Clustered calibration effective noise power CRLB Clustered calibration Un−clustered calibration Figure 5. C lustered a nd un -clustere d calibrati ons CRLB. When the effec - ti ve no ise powe r of the clustered calibra tion, || e N || 2 , is small enough, then its CRLB is lower than of the un-cluste red calibrat ion’ s and it rev eals a superior performanc e. e s ≡ e J ∗ q 1 ⊗ e J p 1 vec ( C 1 { pq } + C 2 { pq } ) , and similar to Eq. (29), we hav e V ar ( b e θ θ θ ) > [2 e σ − 2 Re ( J H e s J e s )] − 1 . (35) W e use num erical simulations to compa re t he un-clustered and clustered calibrations performances via their CRLBs which are gi ven by Eq. (29) and Eq. (35), respecti vely . 4.1.2 Example 2: CRLB for two sour ces and one clus ter W e simulate a twelve hour observation of two point sources with intensities I 1 = 11 . 25 and I 2 = 2 . 01 Jy at sk y coordinates ( l , m ) equal to ( − 0 . 014 , − 0 . 005) and ( − 0 . 011 , − 0 . 010) radians, respec- tiv ely . W e use t he uv-cov erage of W esterbork Synthesis Radio T ele- scope (WSR T) with 14 receiv ers in th is simulation. W e Consider the J Jones matrices in Eq. (20) to be diagonal. Their amplitude and phase elements follow U ( 0 . 75 , 0 . 95) and U (0 . 003 , 0 . 004) distributions, respectiv ely . The background noise is N ∼ C N ( 0 , 10 I ) . Jones matrices of the clustered calibration, e J p 1 for p = 1 , 2 , are obtained as J p 1 + U (0 . 02 , 0 . 40) e j U (0 . 5 , 2) . For 20 realizations of e J matrices, we calculated CRLB of the un- clustered and clustered calibrations using Eq. (29) and Eq. (35), respecti vely . The results are presented in Fig. 5. A s sho wn in this ﬁgure, for small enough errors matri ces Γ Γ Γ of E q. (31), the clus- tered calibration’ s CRLB stands below the un-clustered calibra- tion’ s CR LB. On t he other hand, with i ncreasing power of error ma- trices, or the power of effectiv e noise e N , the u n-clustered calibra- tion’ s CRLB becomes lo wer than the c lustered ca libration’ s CRLB. 4.1.3 Analysis of CRLB Generally , if source 1 is considerably brighter than source 2 , || C 1 { pq } || ≫ || C 2 { pq } || , and if the weak source po wer is much lo wer than the noise lev el, || C 2 { pq } || ≪ || N pq || , then clustered calibration’ s performan ce is better than un-clustered calibration. Note that the worst performance of both calibrations is at the faintest source and we are more con cerned to compare the CRLBs for this source. The CRLBs obtained for the un-clustered and clustered cal- ibrations i n E q. (29) and Eq. (35 ), respective ly , are both almost equal to the i n verse of the Signal to Interference plus Noise Ratio (SINR), SINR − 1 . In un-clustered calibration, the effecti ve signal for the faintest source is C 2 { pq } where the noise i s N pq . There- fore, SINR for this source is SINR 2 = || C 2 { pq } || 2 || N pq || 2 . ( 36) But, i n clustered calibration, the effecti ve signal and noise are e C { pq } ≡ C 1 { pq } + C 2 { pq } and e N pq , respecti vely . T hus, SIN R for the cluster is SINR c = || e C { pq } || 2 || e N pq || 2 . (37) Clustered calibration has an improv ed performance when SINR c ≫ SINR 2 . (38) Consider the t wo possib le extremes i n a clustered cali bration procedure: (i) Clustering many sources in a large ﬁeld of view to a very small number of clusters. In this case, the angular diameter of a cluster is probably too large for t he assumption of uniform corrup- tions t o apply . S ubsequen tly , dedicating a single solution to all the sources of e very cluster by clustered calibration introdu ces cluster- ing error matrices Γ Γ Γ with a large variance (see Eq. (31)). Having high interference power , the clustered calibration effec tiv e noise e N of Eq. (32) becomes v ery large. Therefore, clustered calibration SINR will b e very lo w and it does not pro duce high quality results. (ii) Clustering sources in a small ﬁeld of view to a very large number of clusters. In this ca se, the v ariance of Γ Γ Γ matr ices are al- most zero while the signal p owers of source clusters are almo st as lo w as of the indi vidual sources. Therefore, the SI NR of clustered calibration is almost equal to the un -clustered calibration SINR and the calibration performance is expected to be almost the same as well. Thus, the best efﬁcienc y of clustered calibration is o btained at the smallest number o f clusters for which Eq. (38) i s satisﬁed. W e use the SINR of Eq. (38) as an efﬁcient criterion for detecting the opti- mum number of clusters. 4.1.4 Genera lization to many sour ces and many clusters For the visibility v ector y of un-clustered calibration’ s general data model, presented by Eq. (6), we hav e y ∼ C N ( K X i =1 s i ( θ θ θ ) , Π Π Π) . (39) Therefore, the CRLB of un-clustered calibration is v ar ( θ θ θ ) >  2 Re  ( K X i =1 J s i ( θ θ θ ) ) H Π Π Π − 1 ( K X i =1 J s i ( θ θ θ ) )  − 1 , where J s i is the Jacobian matrix of s i with respect to θ θ θ . Computing the exact C RLB is more complicated when we hav e c lustered calibration. In the clustered calibration measurement equation, gi ven by E q. (13), we hav e e n ≡ K X i =1 Γ Γ Γ i + n , (40) c  2007 RAS, MNRAS 000 , 1–18 8 Kazemi et al. where n is the un-clustered calibration’ s noise vector , Γ Γ Γ i = [ v ec ( Γ Γ Γ i { 12 } ) T . . . vec ( Γ Γ Γ i { ( N − 1) N } ) T ] T , (41) and Γ Γ Γ i { pq } is giv en by Eq. (31). Due to the existence of the nuisance parameters Γ Γ Γ i in the clustered calibration data model, calculation of its con ventional CRLB is impractical. This leads us to the use of Cramer-Rao like bounds devised in t he presence of the nu isance pa- rameters (Gini & Reggiannini 2000). W e apply the Modiﬁed CRLB (MCRLB) (Gini et al. 1998) t o the performance of clustered cali - bration. The MCRL B f or estimating the errors of b ˜ θ θ θ in the presence of the nuisance parameters Γ Γ Γ (clustering error) is deﬁned as v ar ( b ˜ θ θ θ ) >  E y , Γ Γ Γ  − E y | Γ Γ Γ  ∂ ∂ ˜ θ θ θ ∂ ∂ ˜ θ θ θ T ln { P ( y | Γ Γ Γ; ˜ θ θ θ ) }  − 1 , (42) where P ( y | Γ Γ Γ; ˜ θ θ θ ) i s the Probability Density Function (PDF) of the visibility vector y assuming that the Γ Γ Γ matrices of E q. ( 41) are a priori kno wn. Since n ∼ C N ( 0 , Π Π Π) , from Eq. (13) we hav e y | Γ Γ Γ ∼ C N ([ Q X i =1 ˜ s i + K X i =1 Γ Γ Γ i ] , Π Π Π) , (43) and therefore i n Eq. (42), − E y | Γ Γ Γ  ∂ ∂ ˜ θ θ θ ∂ ∂ ˜ θ θ θ T ln { P ( y | Γ Γ Γ; ˜ θ θ θ ) }  , which is called the modiﬁed Fisher information, is equal to 2 Re { [ Q X i =1 J ˜ s i ( ˜ θ θ θ ) + K X i =1 J Γ Γ Γ i ( ˜ θ θ θ ) ] H Π Π Π − 1 [ Q X i =1 J ˜ s i ( ˜ θ θ θ ) + K X i =1 J Γ Γ Γ i ( ˜ θ θ θ ) ] } . Note that E y , Γ Γ Γ in Eq. (42) could be estimated by Monte-Carlo method. As a rule of thumb, reducing the heavy computational cost of MCRLB, one can interpret the S INR test o f E q. (38) as follo ws: If in average the ef fective SINR of clustered ca libration, S INR c , gets higher than the ef fective SINR of un-clustered calibration obtained for the weakest obse rved signal, SINR w , E { SINR c } ≫ E { SINR w } , (44) then clustered calibration can achiev e a better results. In Eq. (44), the expec tation is taken with respect to the n oise N , error matrices Γ Γ Γ , and all the baselines. 4.1.5 Example 3: MCRLB and SINR estimations W e simulate WS R T including N = 14 receiv ers which observe ﬁfty sources with intensities belo w 15 Jy . The source positions and their brightness foll o w uniform and Ray leigh distributions, re- specti vely . The background noise is N ∼ C N ( 0 , 15 I M ) , where M = 2 N ( N − 1) = 364 . W e cluster sources using divisi ve hierarchical clustering, into Q number of clusters where Q ∈ { 3 , 4 , . . . , 50 } . Clustered calibration’ s Jones matrices, e J , are gener- ated as U (0 . 9 , 1 . 1) e j U (0 , 0 . 2) . Since fo r smaller numb er of clusters, we expect larg er interferenc e (errors) in clustered calibration’ s so- lutions, for ev ery Q , we co nsider P 50 i =1 Γ i ∼ C N ( 0 , 150 Q I M ) . The choice of the complex Gaussian distribution for the error matri ces Γ Γ Γ is due to the central limit theorem and the assumptions mad e in section 4.1.1. W e proceed to calculate the clustered calibration’ s MCRLB, gi ven by Eq. (42), and E { SINR c } , utilizing the Monte-Carlo method. Jacobian matrices for MCRLB are calculated numerically and in computation of E { SINR c } , signal po wer of ev ery cluster is obtained o nly using the clus ter’ s brightest and faintest sources. T he 0 10 20 30 40 50 1 2 3 4 5 x 10 −10 Number of clusters MCRLB Clustered calibration Un−clustered calibration Figure 6. Cluste red calibra tion’ s MCRLB. For very small Q , where the ef fect of interferenc e is lar ge, MCRLB is high. By increasi ng the number of clusters, MCRLB decreases and reaches its m inimum where the best performanc e of the clustered calibrat ion is expecte d. After that, due to the dominant ef fect of the background noise, MCRLB starts to increase until it reache s the un-clust ered calibr ation CRLB. 0 10 20 30 40 50 3 4 5 6 7 8 9 10 x 10 −6 Number of clusters SINR Clustered calibration Un−clustered calibration Figure 7. Clustere d calibrat ion’ s SINR. SINR is low for small Q , when the interference is larg e. By increasing the number of clusters the SINR increa ses and gets its highe st lev el for which the best performance of the calib ration is e xpecte d. After that, i t decre ases due to the do minant ef fect of the back ground noise, and con verge s to the un-cluster ed calibrat ion SINR. estimated results of MCRLB and E { SINR c } are presented by Fig. 6 and Fig. 7, respecti vely . As we can see in Fig. 6, for very sma ll Q , wh ere the effect of interference is large, MCRLB is high. By in- creasing the number of clusters, MCRLB decreases and reaches its minimum where the best performance of the clustered calibration is expec ted. After that, du e to the dominan t ef fect of the background noise, MCRLB starts to increase until it reaches t he CRLB of un- clustered calibration. The same result i s derived f rom E { SINR c } plot of Fig. 7. As Fi g. 7 shows, E { SINR c } is low for very small Q , when the interferen ce (i.e., the error due to clustering) is large. By increasing the number of clusters, E { SINR c } increases and gets its peak for which the clustered calibration performs the best. After that, it decreases and con verges to the E { SIN R } of un-clustered calibration. c  2007 RAS, MNRAS 000 , 1–18 Cluster ed Radio Interfer om etric C alibration 9 4.2 Computational cost In the measureme nt equation of un-clustered calibration, presen ted in Eq. (6), we hav e M = 2 N ( N − 1) constraints giv en by the visibility vector y , and need to solve for P = 4 K N unkno wn parameters θ θ θ . If P > M , then Eq. (6) wi ll be an under-d etermined non-linear system. This clariﬁes the ne ed of ha ving a small enough N (number of antennas) and a lar ge enough K (numbe r of sources) for estimating θ θ θ . Howe ver , clustered calibration, Eq. (13), has the adv antage of decreasing the number of directions, K , relativ e to the number of sourc e clusters, Q ≪ K . This considerably cuts do wn the number of un kno wn parameters P that needs to be calibrated, thus reducing the computationa l cost of calibration. 5 SELECTION OF NUMBER OF CLUSTERS Consider a clus tered calibration proce dure with a predeﬁned clus- tering scheme. There is no guarantee that the calibration results for Q number of clusters, where Q ∈ { 1 , 2 , . . . , K } is randomly chosen, is the most accurate. Thus, we seek the optimum number of clusters at which the clustered calibration performs the be st. In this section, we describe the use of: (i) Akaike’ s Information Cri- terion (AI C) (Akaike 1973), as well as (ii) Likelihood-Ratio T est (LR T) (Gr av es 1978), in ﬁnding this optimum Q for a gi ven ob- serv ation. Some other alt ernati ve criteria could also be found in W ax & Kailath (1985). Note that for different clustering schemes the optimum Q is not necessarily the same. 5.1 Akaike’ s In f ormation Criterion (AIC) W e utilize Akaike’ s Information Crit erion (AIC) to ﬁnd the opti- mum Q for clustered calibration. Consider ha ving e n ∼ C N ( 0 , e σ 2 I M ) in the general data model of clustered calibration, Eq. (13). Then, the log-likeliho od of the visi- bility vec tor y is gi ven by L ( e θ θ θ | y ) = − M log π − M log e σ 2 − 1 e σ 2 ( y − Q X i =1 e s i ( e θ θ θ ) ) H ( y − Q X i =1 e s i ( e θ θ θ ) ) . (45) The maximum likelihood estimation of the noise v ariance e σ 2 is c e σ 2 = 1 M ( y − Q X i =1 e s i ( e θ θ θ ) ) H ( y − Q X i =1 e s i ( e θ θ θ ) ) . (46) Substituting Eq. (46) in Eq. (45), we arri ve at the maximum likeli- hood estimation of e θ θ θ , L ( b e θ θ θ | y ) = − M log π − M − M log { 1 M ( y − Q X i =1 e s i ( e θ θ θ )) H ( y − Q X i =1 e s i ( e θ θ θ ) ) } . (47) Using Eq. (47), the AIC is gi ven by AIC ( Q ) = − 2 L ( b e θ θ θ | y ) + 2(2 e P ) . (48) The optimum Q is selected as the one that minimizes AIC ( Q ) . 5.2 Likelihood-Ratio T est (LR T) Errors in clustered calibration originate from the system (sky and instrumental) noise, “clustering errors” introduc ed in section 4.1.1, and “solver noise” which is referred to as errors produced by the calibration algorithm itself. W e assume tha t the true Jo nes matrices along differen t directions (clusters) at the same antenna are sta- tistically uncorrelated. Note that this assumption is only made for the L R T to produce reasonable r esults and this assumption is not needed for clustered calibration to work. Therefore, if such corre- lations exist, they are caused by the aforementioned errors. Con- sequently , the more accu rate the clustered calibration solutions are, the small er their statistical similarities would be. Bas ed on this gen- eral statement, the best nu mber of clusters in a clustered calibration procedure is the one which provides us wi th the minimum correla- tions in the calibrated solutions. Note that for a ﬁxed measurement, the correlation due to the system noise i s ﬁxed. Therefore, differ- ences in the statistical si milarities of solutions obtained b y dif ferent clustering sch emes are only d ue to “clustering errors” and “solver noise”. T o in vestigate the statistical interac tion betwee n the gain solu- tions we apply the Likelihoo d-Ratio T est (LR T). Consider the clustered calibration so lution e J pi ( e θ θ θ ) for directions i , i ∈ { 1 , 2 , . . . , Q } , at antennas p , where p ∈ { 1 , 2 , . . . , N } , e J pi = " e J 11 ,p e J 12 ,p e J 21 ,p e J 22 ,p # i . (49) Then, the parameter vector e θ θ θ pi (corresponding to the i -th direction and p -th antenna) is obtained by e θ θ θ pi = [ Re ( e J 11 ,p ) Im ( e J 11 ,p ) . . . Re ( e J 22 ,p ) Im ( e J 22 ,p )] T i . (50) Let us deﬁne for each antenna p and each pair of directions k and l , where k and l are belong to { 1 , 2 , ..., Q } , a vector z pkl as z pkl = [ e θ θ θ T pk e θ θ θ T pl ] T . (51) In fact, we are concatenating the solutions of the same antenn a f or two dif f erent directions (clusters) together . W e deﬁne the null H 0 model as H 0 : z pkl ∼ N ( m , Σ 0 ) . (52) where m = [ ¯ m ( e θ θ θ pk ) T ¯ m ( e θ θ θ pl ) T ] T , (53 ) and Σ 0 = " s 2 ( e θ θ θ pk ) 0 0 s 2 ( e θ θ θ pl ) # . (54) In E q. (53) and Eq. (54 ), ¯ m and s 2 are denoting sample mean and sample va riance, respectiv ely . Note that having a large number of samples in han d, the assumption of havin g a Gaussian distri bution for solutions is justiﬁed according to the Central Limit theorem. The structure of the va riance matri x Σ 0 tells us that the statistical correlation be tween the componen ts of the random v ector z qk l , or between the solutions e θ θ θ pk and e θ θ θ pl , is zero. T his is the ideal case i n which there are no estimation errors. T o in vestigate the v alidity of the nu ll mo del compared wit h the case in which there e xists some correlation between the so lutions, we deﬁne the alternativ e H 1 model as H 1 : z pkl ∼ N ( m , Σ 1 ) , (55) where the v ariance matrix Σ 1 is giv en by Σ 1 =   s 2 ( e θ θ θ pk ) Cov ( e θ θ θ pk , e θ θ θ pl ) Cov ( e θ θ θ pk , e θ θ θ pl ) T s 2 ( e θ θ θ pl )   . (56) c  2007 RAS, MNRAS 000 , 1–18 10 Kazemi et al. Cov ( e θ θ θ pk , e θ θ θ pl ) in Eq. (56) is the 8 × 8 samp le cov ariance matri x. Using the abov e mo dels, the Likelihood-Ratio is deﬁned as Λ = − 2 ln  Likelihood for nu ll model Likelihood for alternati ve model  , (57) which has a χ 2 distribution with 64 degrees of freedom. As Λ be- comes smaller, the null model, i n which the statistical correlation between the solutions is zero , beco mes more acceptable rather t han the alternativ e model. T herefore, the smaller the Λ is, the less the clustered calibration’ s errors are, and vice-versa. 6 SIMULA TION STUDIES W e use simulations t o compare the performance of un-clustered and clustered calibration. W orking with simulations has the adv an- tage of having the t rue solutions available, which is not t he case in real observations. That makes the comparison much more objec- tiv e. Nev ertheless, the better p erformance of the clustered calibra- tion i n comparison with the un-clustered on es in calibrating for real observ ations of LOF AR is also sho wn by Kazemi et al. (2011a); Y atawatta et al. (su bmitted 2012). W e simulate an East-W est radio synthesis array including 14 antennas (simil ar to WSRT) and an 8 by 8 degrees sky with ﬁfty sources with low intensities, below 3 Jy . The so urce positions and their brightness follo w uniform and Rayleigh distributions, respec- tiv ely . The single channel simulated observ ation at 355 MHz is sho wn in Fig. 8. W e proceed to add gain errors, multiplying source coherencies by the Jo nes matrices, as it is sho wn i n Eq. (2), to our si mulation. The amplitude and pha se of the Jones matrices’ elements a re gener- ated using linear combina tion of sine and cosine fu nctions. W e aim at simulating a sky wit h almost uniform variations on small angular scales. In other words, we p rovide v ery similar Jo nes matrices for sources with small angular se parations. T o a ccomplish this goal, for e very antenna, we ﬁrst choose a single direction as a reference and simulate its Jones matrix as it is explained before. Then, for the remaining f orty nine sources, at that antenna, the Jones matrices (amplitude and phase terms) are that initial Jone s matrix m ultiplied by the i n verse of their correspondin g angular distances from that reference direction. T he result of adding such gain errors to our simulation is sho wn in Fig. 9. 6.1 Perf ormance comparison of the Clu stered and un-clustered calibrations at SNR=2 W e add no ise n ∼ C N ( 0 , σ 2 I ) with σ 2 = 28 , as it is sho wn in Eq. (6), to our simu lation. The r esult has a S N R = 2 and i s presented in Fig. 10. W e have chosen to present the case of S N R = 2 since for this particular simulated observ ati on both the div isiv e h ierarchi- cal and the weighted K-means clustered calibrations achie ve their best p erformances at the same number o f clusters, as will be shown later in this section. W e ap ply un-clustered and clustered calibration on the simu- lation to compare their efﬁciencies. The ﬁfty sources are grouped into Q ∈ { 3 , 4 , . . . , 49 } number of clusters, using the proposed di visiv e hierarchical an d w eighted K-means clustering algorithms. Self-calibration is implemented via S pace Alternating General- ized E xpectation Maximization (SA GE) algorithm (F essler & Hero 1994; Y atawatta et al. 2009; Kazemi et al. 2011b) with nine it era- tions. Plots of the av eraged Frobenius distance between the simu- lated (true) Jones matrices and the obtained solutions is sho wn i n Fig. 11 . As we can see i n F ig. 1 1, for both clustering schemes, in- creasing the nu mber of clusters decreases this distance and th e min- imum is reached at approximately thirty three clusters ( Q = 33 ). Beyon d this number of clusters, it increases until the ﬁfty individual sources become ind ividu al clusters. This sho ws that the best perfo r- mance of both the di visiv e hierarc hical an d the weighted K-mean s clustered calibrations i s approximately at thirty three clusters and is superior to that of the un-clustered calibration. The Frobenius distance curves in Fig. 11, the MCRLB curve in Fig. 6 , and the SINR curv e in Fig. 7 illustrate that clustered calibra- tion with an e xtremely low numb er of clusters does not nece ssarily perform better than the un-clustered cali bration. The reason is that when there are only a small number of clu sters, the interference, o r the so -called “clustering errors” introduced in section 4.1.1, is rela- tiv ely large. There fore, the ef fect of this interference domin ates the clustering of signals. On the other hand, we reach the theoretical performance limit approximately after twenty ﬁv e number of clus- ters and therefore increasing the number beyon d this point giv es highly va riable results, mainly because we are limited by the num- ber of constraints as opposed to the number o f parameters t hat we need to solv e for . But, this is not the case for th e plots in Fig. 6 and Fig. 7. The reason is t hat the MCRLB results of Fig. 6 as well as the SINR results of F ig. 7 are obtained by Monte-Carlo method with iterations ov er ﬁfty different sky and noise realizations. Howe ver , Fig. 1 1 is li mited to the presented speciﬁc simulation with only one sky and one no ise realization. The residual images of t he un-clustered calibration as well as the divisi ve hierarchical and weighted K-means clustered calibra- tions for Q = 33 are shown by Fig. 12 and Fig. 13, respectiv ely . As i t is sho wn by Fig. 12 , in the resu lt of un-clustered calibration, the sources are almost not subtracted at all and there is a si gniﬁ- cant residual error remaining. The residuals hav e asymmetric Gaus- sian distribution with va riance σ 2 = 82 . 2 9 which is much larger than the simulated (true) noise variance σ 2 = 10 . 85 . On t he other hand, the sources hav e been perfectly subtracted in the case o f clus- tered calibration, F ig. 13, and the residuals con verge to the simu- lated background noise distribution. The residuals of the div isiv e hierarchical and W eighted K-means clustered calibrations follo w symmetric zero mean Gaussian distribution s with σ 2 = 20 . 17 and σ 2 = 18 . 76 , respectiv ely . These v ariances are closer to the simu- lated one σ 2 = 10 . 85 and this indicates th e prom ising pe rformance of clustered calibration . As we can see , hierarchical clustered cal- ibration pro vides a slightly better r esult comp ared to the K-means one. This is due to the fact that hierarchical clustering constructs clusters of smaller angular diameters and thus it assigns the same calibration solutions to the sources with smaller angular separa- tions. W e also calculate the Root Mean Squared Error of Prediction (RMSEP) to asse ss the performance of clustered and un -clustered calibrations’ non-linear regressions. The results of l og(RMSEP), presented by Fig. 14, also justify that the best ef ﬁciencies of the hi- erarchical and K-means clustered calibrations are obtained at thirty three number of clusters. But, note that t here is a difference be- tween the behavior of log(RMSEP) plot of Fig. 14 and the plots of MCRLB, SINR, and Frobenius distance between the simulated Jones matri ces and solutions in Fig. 6, Fi g. 7, and Fig. 11, respec- tiv ely . In Fig. 14, log(RMSEP) of clustered cali bration is less than that of un-clustered calibration, e ven for extreme ly lo w numbe r of clusters. This means that even with those low nu mber of clusters, c  2007 RAS, MNRAS 000 , 1–18 Cluster ed Radi o Interfer ometric Calibration 11 Jy −0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Figure 8. Single channel simulated observ ation of ﬁfty sources, with intensities below 3 Jy . The source position s and their brightness are follo wing uniform and Rayle igh distrib utions, respecti vely . The image size is 8 by 8 de grees at 355 MHz. There are no gain errors and noise in the simulation . Jy −0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 Figure 9. Simulated imag e with added gain errors. The errors, the c omplex 2 × 2 Jones mat rices, are g enerate d as line ar combinati ons of sin and cos func tions. The v ariatio n of the sky is almost uniform on small angula r scales. c  2007 RAS, MNRAS 000 , 1–18 12 Kazemi et al. Jy −0.2 −0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 Figure 10. Simula ted image of sky , corrupte d by gain errors and by noise, as in Eq. (6). The simulated noise vec tor n has zero m ean compl ex Gaussian distrib ution and the SNR is equal to 2. Jy −0.1 −0.08 −0.06 −0.04 −0.02 0 0.02 0.04 0.06 0.08 0.1 Figure 12. Re sidual image of the un-cl ustered SA GE calibrat ion for ﬁft y sources. The sources ar e almost not subtract ed at all an d there are signiﬁcant resid ual errors around them. The residua ls hav e asymm etric Gaussian distributi on with va riance σ 2 = 82 . 29 which is m uch larger than the true noise va riance σ 2 = 10 . 85 . c  2007 RAS, MNRAS 000 , 1–18 Cluster ed Radi o Interfer ometric Calibration 13 Jy −0.1 −0.08 −0.06 −0.04 −0.02 0 0.02 0.04 0.06 0.08 0.1 Figure 13. The residual images of the clustered cali bration using hierarchic al (right) and W eighted K-means (left) cluste ring methods with thirty three source clusters. Calibrati on is implemented by SA GE algorith m. The sources are subtracte d perfectly and the residual s con verge to the simulated backgro und noise distr ibuti on. The hierarc hical clustering and W eighted K-means resi duals follo w symmetric zero mea n Gaussia n distrib utions with σ 2 = 20 . 17 and σ 2 = 18 . 76 , respecti vely , where the simulate d noise distrib ution is, C N ( 0 , 10 . 85 I ) . 0 10 20 30 40 50 0 5 10 15 20 25 30 35 Number of Clusters ||True Jones−Calibration solutions|| Weighted K−means Divisive Hierarchical un−clustered Figure 11. The ave rage Frobenius distance between the simulated (true) Jones matrices and the solutions of clustered and un-clustered calibrat ions. The two curv es represent clustere d calibrat ion via di visiv e hi erarchi cal and weighte d K-means c lustering a lgorithms. By increasing the number of clus- ters, for both c lusterin g methods, t his dista nce is decre ased and gets its min - imum approximat ely at thirty three c lusters. After that, it i s increased t ill the ﬁfty indiv idual sources. That sho ws that the best pe rformance of the clus- tered cali bration is at around thirty three cluste rs. clustered calibration performs better than the un-clustered calibra- tion. This is some what in disagreement wi th th e scenarios sho wn in Fig. 6, Fig. 7, and Fig. 11. For a better understanding of the reason behind t his contrast, ﬁrst lets see ho w the residual errors are origi- nated. Based on Eq. (13) and Eq. (40), in the clustered calibration strategy , we hav e y = Q X i =1 e s i ( e θ θ θ ) + K X i =1 Γ Γ Γ i + n , (58) After ex ecuting a calibration for the above data model, there is a distance between the target parameters e θ θ θ and the esti mated solu- tions b e θ θ θ . This is t he so-called “solver noise”, mentioned in section 5.2. Thus, the residuals are gi ven as y − Q X i =1 e s i ( b e θ θ θ ) = Q X i =1 { e s i ( e θ θ θ ) − e s i ( b e θ θ θ ) } + K X i =1 Γ Γ Γ i + n . (59 ) From Eq. ( 59), we immediately see that the background noise n is ﬁxed and the “clustering errors” are calculated for all the sources as P K i =1 Γ Γ Γ i . Ho wev er, since we solve only for Q directions and not for all the K sources i ndi vidually , the “solv er noise” part, P Q i =1 { e s i ( e θ θ θ ) − e s i ( b e θ θ θ ) } , is also calculated only for Q clusters and not for all the K sources. It is clear that for a very small Q , this term could be much less than for Q ≃ K . Therefore, in Fig. 11, the result of RMSEP at a very lo w numb er of clusters is still less than the ones at Q = K . When Q is not at the two extremes of being very small or very lar ge (almost equal t o the number of indi vidual sources), then the result of RMSEP is promising. Mo reov er , ap ply- ing more accurate calibration methods or increasing t he number of iterations, the “solver noise” will decrease and subsequently , we expec t the RMSEP curve to ha ve the same behav ior as the curves of Fig. 7 and Fig. 11. c  2007 RAS, MNRAS 000 , 1–18 14 Kazemi et al. 0 10 20 30 40 50 −3 −2 −1 0 1 2 3 Number of clusters log(RMSEP) Weighted K−means Divisive Hierarchical un−clustered Figure 14 . The RMSEP f or cluste red and un-cluste red calibration s. The re- sults are obtained using a ba se ten logarit hmic scal e. The two curv es are cor - responding to cluste red calibra tion via divisi ve hierarch ical and weighted K-means clust ering algorithms. By increa sing the number of cluster s, the results are dec reased and the minimum resul t is obtai ned at around thirty three clusters. After that, the results are increased till the ﬁfty individ ual sources. That shows the superior performance of th e clustere d calibrat ion compared to the un-clustered one. The best performance of the clustere d calib ration for both of the applied clustering methods is at around thirty three number of clusters. 6.2 Optimum number of clusters f or SNR=2 W e uti lize AIC and L R T to select the optimum number of clus- ters for which clustered calibration achiev es its best performance. The methods are applied to our simulati on f or the case of SNR=2. The AIC and LR T results are shown in Fi g. 15 and F ig. 16, re- specti vely . They both agree on Q = 33 as the optimum number of clusters for the divisi ve hierarchical and W eighted K-means clus- tered calibrations. Lik elihood-Ratio plot of Fig. 16 has almost the same beha vior as the plot of Frobenius distance between the sim- ulated Jones matrices and the obtained solutions presented b y Fig. 11. T he reason i s that the results of the both plots are obtained using the solutions them selves as t he input data. Ho wev er, since AIC re- sults are computed using the re sidual errors as inputs, which is also the case for o btaining the RMSEP curves o f Fig. 14, AIC curves o f Fig. 1 5 are slightly steep er than t he Frobenius distance between the simulated Jones matri ces and solutions and the Li kelihoo d-Ratio curves of Fig. 11 and Fig. 16, respecti vely . 6.3 Clustered calibration’ s efﬁciency at different SNRs W e start changing the noise in our simulation to see how it ef- fects t he clustered calibration’ s efﬁcienc y . W e simulate the cases for which SNR ∈ { 1 , 2 , . . . , 15 } and apply clustered calibration on them. Since the sky model does not change, the clusters obtained b y di visiv e hierarchical and weighted K-means methods fo r the case of SNR = 2 remain the same. Fig. 17 sho ws the optimum number of clusters, on which the best performances of clustered calibrations are obtained for those different SNRs. As we can see in Fi g. 17 , for lo w SNRs, the optimum Q is small. By increasing the SNR, the optimum Q is i ncreased till it be comes equal to the num ber of all the i ndi vidual so urces that we ha ve in the sky , i.e., K . This means that when the S NR is very low , the beneﬁt of improving signals by clustering sources is much higher than the payoff of introducing “clustering errors” in a clustered calibration procedure. T herefore, clustered calibration for Q ≪ K has a su perior performance com- 0 10 20 30 40 50 −3 −2.5 −2 −1.5 −1 −0.5 x 10 6 Number of clusters AIC Weighted K−means Divisive Hierarchical un−clustered Figure 15. AIC plot for clustered and un-clustere d calibrati ons. Both the weighte d K-means and divi si ve hierarchi cal clustered calibra tions get their minimum AIC at about thirt y three clusters. T his illustrates th at the ir be st performanc es are obta ined at this number of cl usters. Also, their AIC re- sults at thirty th ree cluste rs is lo wer than the un-clust ered calibratio n’ s AIC, which shows the ir bette r performance s compared to the un-cluste red cali- bration . 0 10 20 30 40 50 200 400 600 800 1000 1200 1400 1600 1800 Number of clusters Likelihood−Ratio Weighted K−means Divisive Hierarchical un−clustered Figure 16 . Likelihoo d-ratio of the gain solutions obtaine d by clustered and un-cluste red calibrati ons. In the both cases of weighte d K-mea ns and divi- si ve hierarchical clustered calibration s, the minimum Likelih ood-ratio v al- ues belong to approxi mately thirty three number of clusters. These min- imums are also lower than the un-clustered calibration ’ s L ike lihood-ra tio result. Therefore, clustere d calibrat ion via both the clustering methods per- forms better than the un-clustered calibration and it achie ves the best accu- rac y in its solutions at thirty three clusters. pared to the un-clustered calibration. While for a h igh enough SNR, the sit uation becomes the opposite. In this case, t he un-clustered calibration achiev es be tter resu lts c ompared to clustered c alibration having th e disadv antage of introducing “clustering errors”. 6.3.1 Empirical estimation of SINR Having the results of Fig. 17 in hand, we could ﬁnd an empirical model for estimating the optimum number of clusters for various SNRs. Note that by changing the observ ation and the instrument characteristics, this model will also be changed. As it is explained in section 4.1.3, the best performance of clustered calibration is obtained when the S INR is at its highest c  2007 RAS, MNRAS 000 , 1–18 Cluster ed Radi o Interfer ometric Calibration 15 0 5 10 15 15 20 25 30 35 40 45 50 55 SNR Optimum number of clusters (Q) Weighted K−means Divisive Hierarchical un−clustered Figure 17. The opti mum n umber o f clusters, on which the best performanc e of cl ustered cali bration is obta ined, at dif ferent SNRs. F or lo w SNRs, the ef ﬁcienc y of the clustered calibrati on is superior to the un-clustered cali- bration . As the SNR gets higher , clustered calibration s achie ve their best solutions utiliz ing a higher number of clusters. Finally , when the SNR is high enough, the performance of un-clustere d cali bration becomes better than the clustere d one. lev el. Fig. 17 sho ws the nu mber of clusters on which the maximum SINR was obtained, where the signal (sky) and the noise powers are kno wn a priori. Thus, the only unkno wn for estimating t he SINR is the interference, or the “clustering err ors”, for w hich we need to have a prediction model. After estimating t he SINR using this model, ﬁnding the optimum Q will be straightforwa rd. Consider the deﬁnition of “clustering errors” giv en by Eq. (31). It is logical that for ev ery source, the differen ce between its true Jones ma trix a nd th e clustered calibration solution, || J − e J || , is a function of the angular distance between the source and the cen- troid of the cluster that it belongs to. Based on this and using E q. (31) and Eq. (34), fo r the interference of the i -th cluster at baseline p − q we assume that X l ∈ L i Γ l { pq } ∼ C N ( 0 , η || e C i { pq } || 2 { D ( L i ) } ν I 2 ) , (60) where η and ν are unkno wns. E q. (60 ), in fact, conside rs an inter - ference power (variance ) of η || e C i { pq } || 2 { D ( L i ) } ν for ev ery i -th cluster , i ∈ { 1 , 2 , . . . , Q } , at baseline p − q . Assuming the interfer- ences of dif ferent clusters to be statistically in dependent from each other , and bearing in mind that the baseline’ s additive noise N pq has also a complex Gaussian distri bution independen t from those interferences’, then the noise plus interference po wer for the i -th cluster at baseline p − q is obtained by η || e C i { pq } || 2 { D ( L i ) } ν + || N pq || 2 . (61) Fitting suitable η and ν to Eq. (61), the SINR for the i -th cluster at baseline p − q is equal to the cluster’ s signal power , || e C i { pq } || 2 , di vided by t he result of Eq. (61). Subsequently , estimation of E { SINR c } will be straight forward where the expe ctation is cal- culated with respect to all the source clusters and all the base- lines. Note that simulation provides us with the true noise po wer , || N pq || 2 . In the case of hav ing a real observ ation, t his po wer could be estimated by Eq. (46). Fig. 18 sho ws the number of clusters on which di visiv e hierar- chical and weighted K-means clustered calibrations achiev e their maximum estimated E { SINR c } . T he results are calculated for SNR ∈ { 1 , 2 , . . . , 15 } . For the hierarchical clustering η = 1550 0 5 10 15 10 20 30 40 50 SNR Divisive Hierarchical optimum Q True value SINR estimation 0 5 10 15 10 20 30 40 50 SNR Weighted K−means optimum Q True value SINR estimation Figure 18. Optimum number of clusters at which the div isi ve hierarchi- cal (top) and W eighted K-m eans (bottom) clustered calibratio ns perform the best. For both of the c lustering m ethods, the results obtained by SINR estimati ons mostly match the true optimum number of clusters. and ν = 0 . 3 , and for the K-means clustering η = 2500 and ν = 0 . 003 . As we can see, for both the clustering methods, these maximum E { SINR c } s are mostly obtained at the true optimum number of clusters for which clustered calibration performed the best. Introducing more reﬁned mo dels compared to Eq. ( 60) could e ven impro ve the current result. 6.4 Realistic sky models So far , we have l imited our studies to sky models in which the brightness and position of the radio sources f ollo w Rayleigh and uniform distributions, respectiv ely . These characteristics provide us with a smo oth and uniform variation of ﬂux intensities in our sim- ulated skies. In such a case, the effects of the ba ckground noise on the faintest and t he strongest signals are almost the same. There- fore, if clustered cali bration performs better than the un-clustered calibration that would be only based on upgrading the signals against the noise. Although, in nature, we mostly deal with the sky models in which the distribution of the ﬂ ux intensities is a po wer law , wi th a steep slope, and t he spatial distribution i s Poisson. Hence, there exist a fe w number of very bright sou rces, whose sig- nals are considerably stronger than the others, and they are sparse in the ﬁeld of view . The corruptions of the background noise plu s the interferences of the strong signals of those few bright sources make c  2007 RAS, MNRAS 000 , 1–18 16 Kazemi et al. the calibration of the other faint point sources impractical. Thus, there is the need for utilizing the clustered calibration which applies the solutions of the brigh t sources to their closed by fainter ones or solves for upgraded si gnals obtained b y adding up a group of f aint signals t ogether . This has been sho wn by Kazemi et al. (2011a); Y atawatta et al. (submitted 2012), w hen comparing the efﬁcienc y of the clustered and un-clustered calibrations on LOF AR real ob- serv ations. In this section, using simulations, we also rev eal the superiority of clustered calibration compared to the un-clustered calibration for such sky mode ls. W e simulate a sky of 52 radio point sources which are obtained by modiﬁed Jeli ´ c et al. (2008) foregro und model. T he brightness distribution of the po int sources follo ws the source co unt function obtained at 151 MHz (Willott et al. 2001), while t he ang ular clus- tering o f the sources are characterized by a typical t wo-point corre- lation function, ρ ( d ) = Ad − 0 . 8 . ( 62) In (6 2), ρ is the t wo point correlation fu nction, d is the angular se p- aration, and A is the normalization amplitude of ρ . The ﬂux cut off is 0.1 Jy . W e co rrupt the signals with gain errors which are linear co mbina- tions of sin and cos functions, as in our previous simulations. At the end, a zero mean Gaussian thermal noise with a standard d ev i- ation of 3 mJy is ad ded to the simulated data. The resu lt is shown in Fig. 19. In Fig. 19, all the brigh t sources are concentrated on the right side of the image, rather than being uniformly distributed in the ﬁeld of view , and the rest of the sources are so faint that are almost in visible. W e ap ply the clustered and un -clustered calibrations on Q ∈ { 3 , 4 , . . . 51 } nu mber of clusters and K = 52 number of ind ivid- ual sources, r especti vely . The clustering method used is the divi- siv e hierarchical and the calibrations are executed via S A GE al- gorithm with nine number of iterations. T he residual noise v ari- ances o btained are demonstrated in Fig. 20. As Fig. 20 sho ws, the lev el of the residu al noise obtained by the clustered cali bration for Q ∈ { 15 , 16 , . . . 45 } number of clusters is alw ays below the result of the un-clustered calibration. T his proves the better performance of the c lustered calibration. The best result of the clustered c alibra- tion, with the minimum noise lev el, is achie ved for Q = 27 number of clusters. The residual images of the clus tered calibration with Q = 27 number of source clusters, and the un-clustered calibration for K = 52 indi vidual sources are shown by Fig. 21. In the right side of t he residual image of the un-clustered calibration there e xist ar- tiﬁcial stripes caused by over and under estimating the brightest sources of the ﬁeld of view . That shows the problematic perfor- mance of t he un -clustered calibration. Ho we ver , clustered calibra- tion has gen erated mu ch less artiﬁcial effects after subtracting these sources. On top of that, t he zoomed i n windo w in the left side of the images of Fig. 21 show that the f aint sources are n ot remov ed by the un-clustered calibration at all, while being almost perfectly subtracted by the clustered calibration. Moreov er , the residual noise of the clustered calibration follows a symmetric zero mean Gaus- sian distributions with a standard deviation of 4.2 mJy , while the one from the un-clustered calibration has an asymmetric Gaussian distribution with mean an d standard de viation equal to -1.2 an d 5.3 mJy , respectiv ely . T aking to account that the simulated noise dis- tribution is a zero mean Gaussian distribution with a variance of 3 mJy , the superior performance of the clustered calibration com- pared to the un-clustered one is e vident. As the ﬁnal conclusion of this simulation, calibrating below 0 10 20 30 40 50 4 5 6 7 8 Number of clusters Noise variance Divisive Hierarchical Un−clustered Figure 20. The noise v ariance s of the residual images, obtaine d by clus- tered and un-clustered c alibrat ions, in mJy . The lev el of the residual noise obtaine d by the cl ustered ca librati on for Q ∈ { 15 , 16 , . . . 45 } number of clusters stands bel ow the resul t of the un-cluster ed cali bration. That re veals the superior performanc e of the clust ered cali bration in compari son wi th t he un-cluste red one. The best result o f the cluster ed calibr ation which a chie ves the minimum noise le vel is at Q = 27 number of clusters. the noise level, clustered calibration always performs better t han the un-clustered calibration. This is regardless of the sky model and is only based on the fact that solving for i ndi vidual sources with very poor signals is impractical. Nev ertheless, whe n some sources are very close to each other , t he sky corruption on their signals would be ex actly the same and there is no point in solving for ev ery of them indi vidually . 7 CONCLUSIONS In this paper , we demonstrate t he superior performance of “clus- tered calibration” compared to un-clustered calibration especially in calibrating sources that are belo w the calibration no ise le vel. The superiority is in the sense of having more accurate results by the enhanceme nt of SNR as well as by the improvemen t of computa- tional efﬁcienc y by reducing the number of directions alon g which calibration has to be performed. In a “clustered calibration” procedure, sky sources are grouped into so me clusters and every cluster i s calibrated as a sin- gle source . That replaces the coheren cies of indi vidual sources by the t otal coherenc y of the cluster . Clustered calibration is applied to these new coherencies that carry a higher le vel of information com- pared with the individual ones. Thus, f or the calibration of sources belo w the noise lev el it has a considerably better performance com- pared to un-clustered calibration. A nother way of look ing at clus- tering is to consider the distribution of source ﬂux densities, i.e. the n umber of sources vs. th e source ﬂux density cu rve. Re gardless of the intrinsic ﬂux den sity distribution, clustering mak es the num- ber of c lusters vs. the cluster ﬂux density cu rve more u niform, thus yielding superior performance. An analytical proof of t his superi- ority , for an arbitrary sky model, is presented using MCRLB and SINR analysis. KLD and LR T are utilized to detect the optimum number of clusters, for w hich the clustered calibration accomplishes its best performance. A model for estimating SINR of clustered calibration is also presented by which we could ﬁ nd the optimum number of clusters at lo w co mputational cost. c  2007 RAS, MNRAS 000 , 1–18 Cluster ed Radi o Interfer ometric Calibration 17 Jy −0.2 0 0.2 0.4 0.6 0.8 Figure 19. Simula ted observ ation of ﬁfty two sources which ar e obtaine d by modiﬁe d Jeli ´ c et al. (2008) foreground model . The co rrupting gain errors are generat ed as linear combinatio ns of sin and cos functions. The image size is 8 by 8 degrees and the additi ve thermal noise is a zero m ean Gaussia n noise with a standard de viatio n of 3 mJy . Divisi ve hierarchical as well as W eighted K-means clustering methods are used to exploit the spatial proximity of the sources. Simulation studies revea l clustered calibration’ s improved perfor- mance at a low S NR, util izing these clu stering algo rithms. Bo th the clustering methods are hard clustering techniques which di vide data to distinct clusters. Howe ver , we expect more accu rate results using fuzzy (so ft) clustering, which constructs ov erlapping clusters with uncertain boundaries. Application and performance of this type of clustering for clustered calibration will be e xplored in future work. A CKNO WLEDGM ENT The ﬁrst autho r would like t o gratefully ac kno wledge NWO grant 436040 and to thank N . Monshizadeh for the useful discussions. W e also thank the referee for a careful revie w and v aluable commen ts that helped enhance this paper . REFERENCES Akaike H., 1973, in Second International Symposium on Infor- mation Theory (Tsahkadsor , 1971), Akad ´ emiai Kiad ´ o, Budapest, pp. 267–28 1 Bernardi G., Mitchell D. A., Ord S. M., Greenhill L. J., Pindor B., W ayth R. B., W yithe J. S. B., 2010, ArXiv e-prints Bezdek J. C., 1981, Pattern Recognition w ith Fuzzy Objecti ve Function Algorithms. Kluwer Academic Publishers, Norwell, MA, USA Born M., W olf E., 1999 , Principles of Optics. Cambridge Unive r- sity Press Bregman J. D., 201 2, P hD Thesis, Uni versity of Groninge n Fessler J. A., Hero A. O. , 1994, IEEE Transactions on Signal Pro- cessing, 42, 2664 Gini F ., Reggiann ini R., 2000, IEEE Tran sactions on C ommuni- cations, 48, 2120 Gini F ., Re ggiannini R., Mengali U., 1998, IEEE, 46, 52 Grav es S., 1978, British J. Philos. Sci., 29, 1 Hamaker J. P ., 2006 , A&A, 456, 395 Hamaker J. P ., Bregm an J. D., Sault R. J., 1996, A&AS, 117, 137 Intema H. T ., van der T ol S., Cotton W . D., Cohen A. S., v an Bemmel I. M., R ¨ ottgering H. J. A., 200 9, A&A, 501, 1185 Jeli ´ c V ., Zaroub i S., Labropoulos P ., Thomas R. M., Bernardi G., Brentjens M. A., de Bruyn A. G., Ciardi B., Harker G. , Koo p- mans L. V . E., Pande y V . N., Schaye J., Y atawa tta S., 2008, MN- RAS, 389, 1319 Johnson S. C., 1967, Psychometrika, 32, 241 Kaufman L., Rousseeuw P ., 199 0, Finding Groups in Data An In- troduction to Cluster Analysis. W iley Intersc ience, New Y ork Kay S., 1993, Fundamentals of Statisti cal Signal Processing: Es- timation theory , Prentice Hall signa l processing series. P rentice- Hall PTR Kazemi S., Y ataw att a S., Zaroubi, 2012, IEE E International Con- ference on Acoustics, Speech and Signal P rocessing ( ICASSP), 2533 Kazemi S., Y atawatta S., Zaroubi S ., 2011a, IEEE W orkshop on StatisticalSignal Processing, SSP2011, 414, 597 Kazemi S., Y atawatta S ., Z aroubi S ., de Bruyn A. G., K oopmans L. V . E., Noordam J., 2011b, MNRAS, 414, 1656 Ke rdprasop K ., Kerdp rasop N., Sattayatham P ., 2005, in In DaW aK, pp. 488–497 Liu A., T egmark M., Zaldarriaga M., 2009, MNRAS, 394, 1575 MacQueen J. B., 1967, in Proc. of the ﬁfth Berkeley Symposium on Mathematical Statistics and Probab ility , Cam L. M. L., Ney- c  2007 RAS, MNRAS 000 , 1–18 18 Kazemi et al. Jy −0.015 −0.01 −0.005 0 0.005 0.01 0.015 0.02 0.025 0.03 Figure 21. T he residual images of the clustered calibrati on for Q = 27 number of clusters (right) and the un-clustered calibra tion for K = 52 (left). Calibra tion is implemented by SA GE algorithm with nine number of iterations and the clustering method applied is di visiv e hierarchical . In the right side of the residual image of the un-cluste red calibrat ion we see stripe s of ove r and under estimations due to problematic performanc e of the un-clust ered calibra tion in subtracting the brightest sources. The zoomed in windo w in the left side of the image al so sho ws tha t the faint sources are not re move d a t all . Ho wev er , clustere d calib ration could remov e all the faint sources almost perfec tly and has generat ed much less artef acts after subtract ing for the brightest sources in the right side of the ﬁeld of view . More ov er , the residual noise of the clustered calibrati on foll ows a symm etric zero mean Gaussian distributi ons with a standard de viation of 4.2 m Jy , while the one from the un-cluste red calibra tion has an asymmetric Gaussian distribu tion with mean and standard devia tion equal to -1.2 and 5.3 m Jy , respecti vely . T aking to account that the simulated noise distrib ution is a zero mean Gaussian distribution with a standard devia tion of 3 mJy , the bette r performance of the clustered calibr ation compared to the un-cluste red one is evid ent. man J., eds., V ol. 1, Uni versity of California Press, pp. 281–2 97 Pearson T . J., Readhead A. C. S., 1984, ARA&A, 22, 97 Pindor B ., W yithe J. S. B., Mitchell D. A., Ord S. M., W ayth R. B., Greenhill L. J., 2010, P ASA, 28, 46 Smirnov O. M., 2011 , A&A, 527, A108 Thompson A. R., Moran J. M., Swenson G. W ., 2001, Interferom- etry and Synthesis in Radio Astronomy , 2n d edn. Wile y-VCH W ax M., Kailath T ., 1985, Acoustics, Speech and S ignal Process- ing, IEEE T ransactions on, 33, 387 W illott C. J., Ra wl ings S ., Bl undell K. M., Lacy M., Eales S. A., 2001, MNRAS, 322, 536 Xu R., W unsch D., 2008, Clustering. John Wile y and Sons Y atawatta S., de Bruyn A. G., LOF AR Collaboration, submitted 2012, Astronomy and Astrophysics Y atawatta S., Zaroubi S., de Bruy n G., Ko opmans L., Noordam J., 2009, Digital Signal Processing W orkshop and 5th IEEE Signal Processing Education W orkshop, 2009. DS P/SPE 2009. IEEE 13th, 150 This paper has been typeset from a T E X/ L A T E X ﬁle prepared by the author . c  2007 RAS, MNRAS 000 , 1–18

Clustered Calibration: An Improvement to Radio Interferometric Direction Dependent Self-Calibration

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment