A random walk on image patches
In this paper we address the problem of understanding the success of algorithms that organize patches according to graph-based metrics. Algorithms that analyze patches extracted from images or time series have led to state-of-the-art techniques for c…
Authors: Kye M. Taylor, Francois G. Meyer
© xxxx Society for Industrial and Applied Mathematics, Vol. xx, pp. x-x

A random walk on image patches∗

Kye M. Taylor† and François G. Meyer‡

Abstract. In this paper we address the problem of understanding the success of algorithms that organize patches according to graph-based metrics. Algorithms that analyze patches extracted from images or time series have led to state-of-the-art techniques for classification, denoising, and the study of nonlinear dynamics. The main contribution of this work is to provide a theoretical explanation for the above experimental observations. Our approach relies on a detailed analysis of the commute time metric on prototypical graph models that epitomize the geometry observed in general patch graphs. We prove that a parametrization of the graph based on commute times shrinks the mutual distances between patches that correspond to rapid local changes in the signal, while the distances between patches that correspond to slow local changes expand. In effect, our results explain why the parametrization of the set of patches based on the eigenfunctions of the Laplacian can concentrate patches that correspond to rapid local changes, which would otherwise be shattered in the space of patches. While our results are based on a large sample analysis, numerical experimentations on synthetic and real data indicate that the results hold for datasets that are very small in practice.

Key words. image patches, diffusion maps, Laplacian eigenmaps, graph Laplacian, commute time

AMS subject classifications. 62H35, 05C12, 05C81, 05C90

1. Introduction.

Problem statement and motivation. In this paper we address the problem of understanding the success of algorithms that organize patches according to graph-based metrics. Patches are local portions, or snippets, of a signal or an image. The set of patches can be organized by constructing a graph that connects patches that are similar.
Indeed, it is reasonably straightforward to measure the similarity between patches that are alike. The graph can then be used to extend the notion of similarity to patches that are very different. For instance, one can measure the distance between two visually different patches by computing the number of edges of the shortest path (geodesic) connecting them. In this work we explore a distance defined by the commute time associated with a random walk defined on the graph. Algorithms that analyze patch data using graph-based metrics have led to state-of-the-art techniques for classification [36, 41], denoising [5, 7, 22, 24, 33, 37, 38], and studying dynamics [4, 25, 43]. The graph provides a new perspective from which to analyze the similarities between patches, and consequently, the local signal or image content they contain. For example, in [4], properties of the graph's geometry, such as the distribution of clustering coefficients and the average geodesic distance between two vertices, are used to separate chaos and noise, or different types of chaos. In [36, 37, 38, 41], the geometry of the graph is analyzed by studying a random walk on it. Specifically, the diffusion distance [14] (or spectral distance [32]), and the commute time distance [6] (which is equivalent to the resistance distance [21]) are two related graph metrics that are derived from the random walk, and that can be used to parametrize the graph's geometry.

∗ This work was partially supported by National Science Foundation Grants DMS 0941476, ECS 0501578, and a contract from Sandia National Laboratories.
† Department of Applied Mathematics, 526 UCB, University of Colorado, Boulder, CO 80309-0526 (taylorkm@colorado.edu).
‡ Department of Electrical Engineering, 425 UCB, University of Colorado, Boulder, CO 80309-0425 (fmeyer@colorado.edu).
These metrics can be used to efficiently organize patches in a manner that reveals the local behavior of the associated signal or image. In our previous work [36, 41], we have noticed that metrics based on a diffusion, or a random walk, concentrate patches that contain rapid changes in the signal or image data. These patches contain changes associated with singularities (edges), rapid changes in frequency (textures, oscillations), or energetic transients contained in the underlying function. Furthermore, patches that contain only the smooth parts of the image are more spread out according to such graph metrics.

Outline of our approach and results. The main contribution of this work is to provide a theoretical explanation for the above experimental observations. Our approach relies on a detailed analysis of the commute time metric on prototypical graph models that epitomize the geometry observed in general patch-graphs. We assume that the set of patches is composed of two broad classes: patches within which the function varies smoothly and slowly, and patches where the function exhibits anomalies: singularities, very rapid change in local frequency, etc. We prove that a parametrization of the graph based on commute times shrinks the mutual distances between vertices that correspond to rapid local changes relative to the distances between vertices that correspond to slow local changes. In effect, our results explain why the parametrization of the set of patches based on the eigenfunctions of the Laplacian [37, 38] can concentrate anomalous patches, which would otherwise be shattered in the space of patches. This concentration phenomenon can then be exploited for further processing of the patches (e.g. denoising, classification, etc).
While our results are based on a large sample analysis, numerical experimentations on synthetic and real data indicate that the results hold for datasets that are very small in practice.

Organization. This paper is organized as follows. In the next section, we describe the patch-based representation of a signal, and the associated patch-graph. We develop some intuition about the graph of patches by studying several examples in section 3. In section 4, we describe the embedding of the graph of patches based on commute time. The prototypical graph models that allow us to study the parametrization are defined in section 5. The main theoretical results about the embedding of the graph models are presented in section 5.3. Numerical experiments confirming our theoretical analysis are presented in section 6. We finish with a discussion in section 7.

2. Preliminaries and Notation. For simplicity and without loss of generality, we assume that the signal of interest is formed by a sequence of samples, $\{x_n\}_{n=1}^{N_0}$. Because we want to extract $N = N_0 - (d-1)$ patches from this sequence, we need $d-1$ extra samples at the end (hence the $N_0$ samples). We first define the notion of a patch.

Definition 2.1. We define a patch as a vector in $\mathbb{R}^d$ formed by a subsequence of $d$ contiguous samples extracted from the sequence $\{x_n\}_{n=1}^{N_0}$,
$$\mathbf{x}_n = \begin{bmatrix} x_n & x_{n+1} & \cdots & x_{n+(d-1)} \end{bmatrix}^T, \quad \text{for } n = 1, 2, \ldots, N. \tag{2.1}$$
As we collect all the patches, we form the patch-set in $\mathbb{R}^d$.

Definition 2.2. The patch-set is defined as the set of patches extracted from the sequence $\{x_n\}_{n=1}^{N_0}$,
$$\text{patch-set} = \{\mathbf{x}_n,\ n = 1, 2, \ldots, N\}. \tag{2.2}$$
A main objective of this paper is to understand the organization of the patch-set and relate this organization to the presence of local changes in the signal or the image.
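Definitions 2.1 and 2.2 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `extract_patches` and the test signal are hypothetical, and the sketch assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_patches(x, d):
    """Return the patch-set as an (N, d) array, with N = N0 - (d - 1).

    Row n (0-based) holds the d contiguous samples x[n], ..., x[n + d - 1],
    i.e. one patch in the sense of Definition 2.1.
    """
    x = np.asarray(x, dtype=float)
    return sliding_window_view(x, d)

# Hypothetical test signal with N0 = 2072 samples and patches of length d = 25.
N0, d = 2072, 25
signal = np.sin(0.05 * np.arange(N0))
patch_set = extract_patches(signal, d)
print(patch_set.shape)
```

With $N_0 = 2072$ and $d = 25$, the patch-set contains $N = 2072 - 24 = 2048$ patches, each a point in $\mathbb{R}^{25}$.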
We note that the concept of patch is related to the concept of time-delay embedding. Specifically, if the sequence comprises measurements of a dynamical system, then Takens' embedding theorem [35, 39] allows us to replace the unknown phase space of the dynamical system with a topologically equivalent phase space formed by the patch-set (2.2). While in this work we do not assume that the sequence $\{x_n\}$ is an observable of a dynamical system, we are nevertheless interested in a similar goal: the organization of patches in $\mathbb{R}^d$.

Throughout this paper, we think about a patch, $\mathbf{x}_n$, in several different ways. Originally, $\mathbf{x}_n$ is simply a snippet of the time series. Then, we think about $\mathbf{x}_n$ as a point in $\mathbb{R}^d$. Later, we also regard $\mathbf{x}_n$ as a vertex of a graph. Keeping these three perspectives in mind is critical to our approach and understanding. In order to study the discrete structure formed by the patch-set (2.2), we connect patches together (using their nearest neighbors) and define a graph (or network) that we call the patch-graph.

Definition 2.3. The patch-graph, $\Gamma$, is a weighted graph defined as follows.
1. The vertices of $\Gamma$ are the patches $\{\mathbf{x}_n,\ n = 1, \ldots, N\}$.
2. Each vertex $\mathbf{x}_n$ is connected to its $\nu$ nearest neighbors using the metric
$$\rho(\mathbf{x}_n, \mathbf{x}_m) = \left\| \frac{\mathbf{x}_n}{\|\mathbf{x}_n\|} - \frac{\mathbf{x}_m}{\|\mathbf{x}_m\|} \right\|. \tag{2.3}$$
3. The weight $w_{n,m}$ along the edge $\{\mathbf{x}_n, \mathbf{x}_m\}$ is given by
$$w_{n,m} = \begin{cases} e^{-\rho^2(\mathbf{x}_n, \mathbf{x}_m)/\sigma^2} & \text{if } \mathbf{x}_n \text{ is connected to } \mathbf{x}_m, \\ 0 & \text{otherwise.} \end{cases} \tag{2.4}$$

The edges of the patch-graph encode the similarities between its $N$ vertices. We work with the metric $\rho$ (defined in (2.3)) because it is not sensitive to changes in the local energy of the signal (measured by $\|\mathbf{x}_n\|$). The metric $\rho$ allows us to detect changes in the signal's local frequency content, or local smoothness. The parameter $\sigma$ controls the scaling of the similarity $\rho(\mathbf{x}_n, \mathbf{x}_m)$ between $\mathbf{x}_n$ and $\mathbf{x}_m$ when defining the edge weight $w_{n,m}$.
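The construction in Definition 2.3 can be sketched as follows. Several choices here are assumptions that the text does not specify: the edge set is symmetrized (two patches are connected if either one is among the other's $\nu$ nearest neighbors), self-connections are excluded, and patches with zero norm are not handled. The function name `patch_graph_weights`, the test signal, and the values of `nu` and `sigma` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patch_graph_weights(patches, nu, sigma):
    """Dense weight matrix W of the patch-graph, following (2.3) and (2.4)."""
    # Normalize each patch so that rho ignores the local energy ||x_n||.
    unit = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    # For unit vectors, ||u - v||^2 = 2 - 2 <u, v>.
    rho2 = np.clip(2.0 - 2.0 * (unit @ unit.T), 0.0, None)
    np.fill_diagonal(rho2, np.inf)             # assumption: no self-connections
    n = len(unit)
    nearest = np.argsort(rho2, axis=1)[:, :nu]  # nu nearest neighbors per vertex
    connected = np.zeros((n, n), dtype=bool)
    connected[np.repeat(np.arange(n), nu), nearest.ravel()] = True
    connected |= connected.T                   # assumption: symmetrize the edges
    return np.where(connected, np.exp(-rho2 / sigma**2), 0.0)

# Hypothetical signal: a slow oscillation plus a faster one.
t = np.arange(300)
signal = np.sin(0.05 * t) + 0.3 * np.sin(0.9 * t)
patches = sliding_window_view(signal, 25)
W = patch_graph_weights(patches, nu=8, sigma=0.5)
```

Because the patches are normalized before the distances are computed, rescaling a patch by a positive constant leaves its row of `W` unchanged, which is the energy invariance of $\rho$ noted above.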
In particular, $w_{n,m}$ will drop rapidly to zero as $\rho(\mathbf{x}_n, \mathbf{x}_m)$ becomes larger than $\sigma$.

An important remark about the way we measure distances on the graph is in order here. We use $\rho$ to define the graph topology defined by the edges: which patch is connected to which patch. This is appropriate since we can compare patches that are similar using $\rho$ (e.g. two patches containing the same edge, but at different locations). On the other hand, as explained in section 4.2, we use the commute time to analyze the global geometry of the patch-graph. Indeed, the distance defined by $\rho$ becomes useless when we need to compare very different patches (e.g. a patch of a uniform region vs a patch that contains an edge). As explained in section 4.2, the global organization of the patches can be discovered by studying the speed at which a random walk propagates along the graph (via hitting times). Finally, we note that the weighted graph is fully characterized by its weight matrix.

Definition 2.4. The weight matrix $W$ is the $N \times N$ matrix with entries $W_{n,m} = w_{n,m}$. The degree matrix is the $N \times N$ diagonal matrix $D$ with entries $D_{n,n} = \sum_{l=1}^{N} w_{n,l}$.

Figure 1. A, B, C: time series composed of $N_0 = 2072$ samples. The color of the signals A and B encodes the local variance (large = red, low = blue). C: seismogram; the color indicates the temporal proximity to a seismic arrival, identified by vertical black lines. See text for more details.

Figure 2. D, E, and F: images of size 128 × 128, 128 × 128, and 240 × 240 pixels respectively. The color of the pixel at the center of each patch encodes the local variance of the image intensity.

3. Warm up: A first look at the patch-set. The goal of this section is to provide the reader with some intuition about the geometry of the patch-set and the associated patch-graph.
This will help us motivate our graph models and the analysis of their geometry. At the end of the section, we provide a sketch of our plan of attack.

3.1. Examples of signals and images. We construct the patch-set associated with some examples of signals and images. Because it is not practical to visualize the patch-set in $\mathbb{R}^d$ when $d = 25$, we display the projection of the patch-set onto the three-dimensional space that captures the largest variance in the patch-set (computed using principal component analysis). Figure 1 displays three signals $\{x_n\}$, $n = 1, \ldots, N_0$, with $N_0 = 2072$. Patches of size $d = 25$ samples are extracted around each time sample, which results in the maximum overlap between patches. Signal A is a chirp, signal B is a row of the image Lenna (shown in Fig. 2-D), and signal C is a seismogram [41]. In order to quantify the local regularity of signals A and B, we compute the variance over each patch, and color the curve according to the magnitude of the local variance: hot (red) for large variance and cold (blue) for low variance. The color of signal C encodes the temporal proximity to the arrival of a seismic wave associated with an earthquake: hot color indicates close proximity, while cold corresponds to baseline activity. Identifying arrival-times is necessary for purposes such as locating an earthquake's epicenter. This example illustrates the application of the present work to the problem of detecting seismic waves [41]. Figure 2 displays three images. We extract patches of size 5 × 5. Here, the patches are not maximally overlapping: we collect every third patch in the horizontal and vertical directions for images D and E, while we collect every fifth patch in each direction for image F. This results in patch-sets of size 42 × 42 for images D and E, and of size 48 × 48 for image F.
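The subsampled patch extraction and the three-dimensional projection described above can be sketched as follows. The function names `image_patches` and `pca_project` are hypothetical, random arrays stand in for the actual images, and the principal component analysis is computed here with a plain singular value decomposition, which is one common choice and not necessarily the authors'.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def image_patches(image, size=5, stride=3):
    """Collect size x size patches every `stride` pixels in each direction."""
    windows = sliding_window_view(image, (size, size))[::stride, ::stride]
    rows, cols = windows.shape[:2]
    return windows.reshape(rows, cols, size * size)

def pca_project(points, k=3):
    """Project the rows of `points` onto their k leading principal directions."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(0)
image_d = rng.random((128, 128))   # placeholder for a 128 x 128 image
image_f = rng.random((240, 240))   # placeholder for a 240 x 240 image

patches_d = image_patches(image_d, size=5, stride=3)
patches_f = image_patches(image_f, size=5, stride=5)
coords_d = pca_project(patches_d.reshape(-1, 25), k=3)
print(patches_d.shape[:2], patches_f.shape[:2])
```

The grid sizes follow from counting window positions: a 128 × 128 image admits 124 positions of a 5 × 5 window per axis, and keeping every third one gives 42; a 240 × 240 image admits 236 positions, and keeping every fifth one gives 48.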
As before, the color of a pixel in the images encodes the local variance within the patch centered at that pixel.

3.2. Projections of the patch-sets. Figure 3 shows the projections of each of the six patch-sets. Distances in Figure 3 correspond to the normalized distance $\rho$. We observe that patches with high variance (red-orange) appear to be scattered all over $\mathbb{R}^d$. These patches correspond to regions where the image intensity varies rapidly. Patches with low variance (blue-green), which correspond to regions where the signal is smooth and varies very little, tend to be concentrated along one-dimensional curves (for time series) and two-dimensional surfaces (for images). These visual observations can be confirmed when computing the actual mutual distances between patches (data not shown).

The organization of the patches in the patch-set can be explained using simple arguments. Let us assume that the sequence $\{x_n\}$ corresponds to the sampling of an underlying differentiable function $x(t)$, and assume that $x'(t)$, the derivative of $x(t)$, remains small over the interval of interest. In this case, if two patches $\mathbf{x}_n$ and $\mathbf{x}_m$ overlap significantly (i.e. $|n - m|$ is small), then they will be close to one another in $\mathbb{R}^d$. Indeed, the values of the coordinates of patches $\mathbf{x}_n$ and $\mathbf{x}_m$ will be very similar, since the signal $x(t)$ varies slowly. In principle, if the sampling is fast enough, the patches should lie along a one-dimensional curve in $\mathbb{R}^d$. By the same argument, when $x(t)$ exhibits rapid changes, the magnitude of the derivative, $|x'(t)|$, can be very large, and therefore temporally neighboring patches are not guaranteed to be spatial neighbors in $\mathbb{R}^d$. This argument allows us to understand the distribution of the patches in the signal B, or the image F.
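The argument above can be illustrated numerically. The sketch below compares temporally adjacent patches for a slowly varying sinusoid and for a rapidly oscillating one; the two test signals, their frequencies, and the function name `consecutive_rho` are hypothetical choices, and the distance used is the normalized distance $\rho$ rather than the plain Euclidean distance.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def consecutive_rho(signal, d=25):
    """Distance rho between each patch and its temporal successor."""
    patches = sliding_window_view(np.asarray(signal, dtype=float), d)
    unit = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    return np.linalg.norm(unit[1:] - unit[:-1], axis=1)

n = np.arange(500)
slow = consecutive_rho(np.sin(0.01 * n))   # small derivative between samples
fast = consecutive_rho(np.sin(2.0 * n))    # large change between samples
print(slow.mean(), fast.mean())
```

For the slow signal, adjacent patches stay close and trace a curve; for the fast signal, adjacent patches land far apart even though they overlap in all but one sample.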
Figure 3. Principal component analysis of patch-sets associated with the time series A-C and the images D-F. Each point represents a patch; the color encodes the variance within the patch (see Figures 1 and 2).

Instead of characterizing patches according to the local smoothness of the underlying function, we can also analyze the distribution of the patches according to the function's local frequency information. This will help us understand the structure of the patch-set for signal A. For this type of signal, it is appropriate to measure the distance between the normalized patches, $\mathbf{x}_n / \|\mathbf{x}_n\|$ and $\mathbf{x}_m / \|\mathbf{x}_m\|$, after computing the Fourier transforms (a simple rotation of $\mathbb{R}^d$) of the respective patches. This process is akin to the concept of time-frequency analysis. We expect that regions of the signal with little local frequency change will cluster in $\mathbb{R}^d$: this is the case for the blue patches of the chirp A. On the contrary, when the local frequency content changes rapidly (as in the middle of the chirp A), the corresponding (red/orange) patches will be at a large distance from one another in $\mathbb{R}^d$ (see Figure 3-A).

Finally, we can try to understand the organization of the patch-set for the seismogram C. Let us assume that $\{x_n\}$ is obtained by sampling a function of the form $x(t) = b(t) + w(t)$, where $w(t)$ represents a seismic wave and $b(t)$ represents baseline activity. We can expect that $w(t)$ is a rapidly oscillating transient with a rich frequency content, while $b(t)$ is varying slowly. Now consider two patches $\mathbf{x}_n$ and $\mathbf{x}_m$. It can be shown that if both patches $\mathbf{x}_n$ and $\mathbf{x}_m$ are extracted from the baseline function, $b(t)$, and do not contain any part of the energetic transient, then their mutual distance is expected to be small.
In addition, if $\mathbf{x}_n$ contains part of the energetic transient $w(t)$ and $\mathbf{x}_m$ is extracted from the baseline $b(t)$, then their mutual distance is expected to be large. Finally, if $\mathbf{x}_n$ and $\mathbf{x}_m$ are composed of two different parts of $w(t)$, then their mutual distance is also expected to be large (provided the patches are sufficiently long and $w(t)$ oscillates sufficiently fast). More generally, one can expect that two patches extracted from two different energetic transients $w_1(t)$ and $w_2(t)$ will be at a large distance from one another [41].

Figure 4. The weight matrices $W$ associated with signals A-F are displayed as images: $w_{n,m}$ is encoded as a grayscale value, from white ($w_{n,m} = 0$) to black ($w_{n,m} = 1$). Dark structures along the diagonal of the $W$ matrix associated with the time series A-C indicate that patches that are close in time are also close in $\mathbb{R}^d$.

3.3. From the patch-set to the patch-graph: the weight matrix $W$. Having gained some understanding about the organization of the patch-set, we now move to the structure of the patch-graph and its weight matrix $W$. Figure 4 displays the weight matrices built from the patch-sets that correspond to the time series A-C (top) and the images D-F (bottom). Note that when processing time series A-C, the columns (or equivalently, the rows) of $W$ can be identified with temporally-ordered time-samples. Therefore, a large main diagonal in the weight matrix corresponds to patches that are close in time and also close in $\mathbb{R}^d$. For instance, consider the time series A and its associated weight matrix. The dark bands near the top-left and bottom-right of the diagonal correspond to the slowly-varying oscillations near the beginning and end of the chirp (see Figure 1). Indeed, large entries in the diagonal of $W$ are a direct consequence of relatively little variation in the time series.
On the other hand, the columns of $W$ corresponding to portions of the time series that exhibit rapid local changes (center of Figure 4-A) tend to lack such prominent diagonal structures. For such regions of the matrix $W$, the entries are no longer concentrated along the diagonal, and are shattered across all rows and columns (see the center of $W$ in Figure 4-A; the columns correspond to the fastest oscillations at the center of the chirp). The large distances between these patches are also apparent in the lighter pixel intensities, representing relatively smaller edge-weights. Note that the patches extracted from the seismic data are very far apart, as indicated by the much lighter shades of gray.

It is more difficult to relate the ordering of an image's weight matrix to locations in the image itself. For the weight matrices associated with images D-F, the ordering of the columns is equivalent to the order in which the patches were collected from the image plane: first left-to-right, then top-to-bottom (similar to a raster scan, or how one would read pages of a book). Hence, periodically repeating dark blocks in the weight matrices associated with images D-F are indicative of image patches that are close in $\mathbb{R}^d$ and close in the image-plane as a result of relatively little change in the image's local content. For example, the dark square-like structure that appears near the main diagonal of $W$ in Figure 4-E, and which spans roughly one fifth of the number of columns, corresponds to the mirror's smooth, light border in image E.

3.4. Summary of the experiments and our plan of attack. The experiments in sections 3.1 and 3.3 highlight the fact that regions of an image, or of a signal, that contain anomalies (e.g. singularities, edges, rapid changes in the frequency content, etc.)
are scattered all over the patch-set, making their detection and identification extremely difficult (see Figures 3-A and 3-F). In contrast, patches from smooth regions appear to cluster along low-dimensional curves or surfaces. Because the anomalous patches are usually the most interesting ones, we need to find a new parametrization of the patch-set that concentrates the anomalies and separates them from the smooth baseline part of the image. The structure of $W$ in the “rough regions” suggests that patches that contain anomalies appear to be very well connected (see the center of Figure 4-D, which corresponds to the boa on the hat of Lenna). This concept can be quantified by studying how fast a random walk would reach all patches in these rough regions of $W$, and suggests that we should consider studying the hitting times associated with a random walk on the patch-graph. In the next section we formalize this concept and propose a parametrization of the patch-set in terms of commute time. A theoretical analysis of this approach is provided in section 5.

4. Parametrizing the patch-graph.

4.1. The fast and slow patches. We first introduce the concept of fast and slow patches. We have noticed that patches that contain anomalies (discontinuities, edges, fast changes in frequency, etc.) in the original signal lead to regions of the matrix $W$ where the nonzero entries are scattered all around. We call such patches fast patches because, as we will see in the following, a random walk will diffuse extremely fast in such regions of the patch-graph. Conversely, smooth regions of the signal lead to slow patches that are associated with a small number of large entries in $W$, which are concentrated along the diagonal. A random walk initialized in the slow patch region of the patch-graph will diffuse very slowly.

4.2. A better metric on the graph: the commute time.
As explained previously, we propose to replace the Euclidean distance, which leads to the scattering of the fast patches seen in Figure 3, by a notion of distance that quantifies the speed at which a random walk diffuses on the patch-graph. We propose to use the commute time. Parametrizing the graph using its commute time distance is closely related to parametrizing the graph using its diffusion distance [14, 23] (see Section 4.2.2). Although the works [7, 37, 38] do not explicitly embed vertices of the patch-graph based on the diffusion distance, they also study a random walk on the patch-graph, and define the diffusion distance in terms of this walk. In these studies, noise is removed by evolving the diffusion process for a small time. A detailed comparison of our approach with the seminal work of [37] is provided in section 7.3. We note that the notion of first-passage time associated with a diffusion (which is equivalent to the hitting time associated with a random walk) has been used extensively to characterize the geometry of complex networks, and random media (e.g. [3, 15] and references therein). It is therefore natural to analyze the patch-set with this distance.

4.2.1. A random walk on the patch-graph. In order to define the commute time between two vertices, we first need to define a random walk on the graph. In our problem, the random walk does not correspond to a physical process, but will lead to a notion of global proximity between patches. We consider a first-order homogeneous Markov process, $Z_k$, defined on the vertices of the patch-graph, $\Gamma$, and evolving with the transition probability matrix $P$ given by
$$P_{n,m} = \mathrm{Prob}(Z_{k+1} = \mathbf{x}_m \mid Z_k = \mathbf{x}_n) \triangleq \frac{w_{n,m}}{\sum_l w_{n,l}} = \frac{W_{n,m}}{D_{n,n}}. \tag{4.1}$$
Consider a slow patch $\mathbf{x}_n$ extracted from a regular/smooth part of the signal.
If the random walk starts at $\mathbf{x}_n$, then it can only travel along the low-dimensional structure that corresponds to the temporal neighbors of $\mathbf{x}_n$ (see e.g. Figure 3-A). The existence of this narrow bottleneck is also visible in the $W$ matrix (see Figure 4-A): a random walk initialized within the fat diagonal of the upper left corner of $W$ (the low frequency part of the chirp) is trapped in this region of the matrix, and can only travel along this fat diagonal. As a result, it will take many steps for the random walk to reach another slow patch $\mathbf{x}_m$ if $|n - m|$ is large. This notion can be quantified by computing the average hitting-time, $h(\mathbf{x}_n, \mathbf{x}_m)$, which measures the expected minimum number of steps that it takes for the random walk, started at vertex $\mathbf{x}_n$, to reach the vertex $\mathbf{x}_m$ [6],
$$h(\mathbf{x}_n, \mathbf{x}_m) = E_n \min\{j \ge 0 : Z_j = \mathbf{x}_m\},$$
where the expectation $E_n$ is computed when the random walk is initialized at vertex $\mathbf{x}_n$, i.e. when $Z_0 = \mathbf{x}_n$. The commute time [6] provides a symmetric version of $h$, and is defined by
$$\kappa(\mathbf{x}_n, \mathbf{x}_m) = h(\mathbf{x}_n, \mathbf{x}_m) + h(\mathbf{x}_m, \mathbf{x}_n). \tag{4.2}$$

4.2.2. Spectral representation of the commute time. When the random walk is reversible and the graph is fully connected, the commute time can be expressed using the eigenvectors $\phi_1, \ldots, \phi_N$ of the symmetric matrix $D^{-1/2} W D^{-1/2} = D^{1/2} P D^{-1/2}$. The corresponding eigenvalues can be labeled such that $-1 < \lambda_N \le \ldots \le \lambda_2 < \lambda_1 = 1$. Each eigenvector $\phi_k$ is a vector with $N$ components, one for each vertex of the graph. Hence, we write
$$\phi_k = \begin{bmatrix} \phi_k(\mathbf{x}_1) & \phi_k(\mathbf{x}_2) & \cdots & \phi_k(\mathbf{x}_N) \end{bmatrix}^T,$$
to emphasize the fact that we consider $\phi_k$ to be a function sampled on the vertices of $\Gamma$. The commute time can be expressed as
$$\kappa(\mathbf{x}_n, \mathbf{x}_m) = \sum_{k=2}^{N} \frac{1}{1 - \lambda_k} \left( \frac{\phi_k(\mathbf{x}_n)}{\sqrt{\pi_n}} - \frac{\phi_k(\mathbf{x}_m)}{\sqrt{\pi_m}} \right)^2, \tag{4.3}$$
where $\pi_n = \sum_{m=1}^{N} w_{n,m} \big/ \sum_{j,l=1}^{N} w_{j,l}$ is the stationary distribution associated with the transition probability matrix $P$ [27, 36].

4.2.3. The relationship to diffusion maps. The diffusion distance [14] between vertices $\mathbf{x}_m$ and $\mathbf{x}_n$, $D_t(\mathbf{x}_m, \mathbf{x}_n)$, measures the distance between the transition probability distributions (computed at time $t$) of two random walks initialized at $\mathbf{x}_n$ and $\mathbf{x}_m$, $\sum_{l=1}^{N} |P^{(t)}_{n,l} - P^{(t)}_{m,l}|^2$. The diffusion distance can also be decomposed in terms of the eigenvectors $\phi_k$ [14],
$$D_t^2(\mathbf{x}_m, \mathbf{x}_n) = \frac{1}{V} \sum_{k=2}^{N} \lambda_k^{2t} \left( \frac{\phi_k(\mathbf{x}_m)}{\sqrt{\pi_m}} - \frac{\phi_k(\mathbf{x}_n)}{\sqrt{\pi_n}} \right)^2, \tag{4.4}$$
where $V = \sum_{m',n'} w_{m',n'}$ is the volume of the graph. It follows that the commute time is a scaled sum of the squares of diffusion distances computed at all times,
$$\kappa(\mathbf{x}_m, \mathbf{x}_n) = V \sum_{t=0}^{\infty} D^2_{t/2}(\mathbf{x}_m, \mathbf{x}_n). \tag{4.5}$$
The significance of this equation is that the commute time includes the short term evolution ($t \approx 0$) as well as the asymptotic regime ($t \to \infty$) of the random walk. We will come back to this analysis in section 5.4.

4.3. Parametrizing the patch-graph. Equation (4.3) suggests the following embedding $\Psi$ of the patch-graph $\Gamma$ into $\mathbb{R}^{N-1}$,
$$\Psi : \mathbf{x}_n \longrightarrow \frac{1}{\sqrt{\pi_n}} \begin{bmatrix} \dfrac{\phi_2(\mathbf{x}_n)}{\sqrt{1 - \lambda_2}} & \dfrac{\phi_3(\mathbf{x}_n)}{\sqrt{1 - \lambda_3}} & \cdots & \dfrac{\phi_N(\mathbf{x}_n)}{\sqrt{1 - \lambda_N}} \end{bmatrix}^T, \quad n = 1, 2, \ldots, N. \tag{4.6}$$
If we agree to measure the distance on the graph $\Gamma$ using the square root of the commute time, then the mutual Euclidean distance after embedding is equal to the original distance on the graph,
$$\|\Psi(\mathbf{x}_n) - \Psi(\mathbf{x}_m)\| = \sqrt{\kappa(\mathbf{x}_n, \mathbf{x}_m)}. \tag{4.7}$$
The result is a direct consequence of (4.4) and (4.5). Similar ideas were first proposed in [32] to embed manifolds and are the foundation of the parametrizations given in [2, 14]. In practice, we need not use all the $N - 1$ coordinates in the embedding defined by (4.6).
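The identities (4.2), (4.3) and (4.7) can be checked numerically on a small graph. The sketch below computes the commute time in two ways: from hitting times obtained by solving the linear system $h(n,m) = 1 + \sum_{l \ne m} P_{n,l}\, h(l,m)$, and from the spectral embedding $\Psi$. The function names and the banded test matrix are hypothetical, and the dense solves are only practical for small $N$.

```python
import numpy as np

def commute_time_embedding(W):
    """Embedding Psi of (4.6): row n holds the N - 1 coordinates of vertex n."""
    degree = W.sum(axis=1)
    pi = degree / degree.sum()                    # stationary distribution
    S = W / np.sqrt(np.outer(degree, degree))     # D^{-1/2} W D^{-1/2}
    lam, phi = np.linalg.eigh(S)                  # eigenvalues in ascending order
    lam, phi = lam[::-1], phi[:, ::-1]            # reorder so lam[0] = lambda_1 = 1
    return phi[:, 1:] / np.sqrt(1.0 - lam[1:]) / np.sqrt(pi)[:, None]

def hitting_times_to(P, m):
    """Expected number of steps to first reach vertex m, from every start vertex."""
    N = P.shape[0]
    others = np.arange(N) != m
    # h(n, m) = 1 + sum_{l != m} P[n, l] h(l, m) for n != m, and h(m, m) = 0.
    A = np.eye(N - 1) - P[np.ix_(others, others)]
    h = np.zeros(N)
    h[others] = np.linalg.solve(A, np.ones(N - 1))
    return h

# Hypothetical example: a small banded weight matrix (a connected graph).
N = 10
idx = np.arange(N)
W = np.where(np.abs(idx[:, None] - idx[None, :]) <= 2, 0.5, 0.0)
P = W / W.sum(axis=1, keepdims=True)              # transition matrix (4.1)

H = np.column_stack([hitting_times_to(P, m) for m in range(N)])
kappa_walk = H + H.T                              # definition (4.2)

Psi = commute_time_embedding(W)
kappa_spec = ((Psi[:, None, :] - Psi[None, :, :]) ** 2).sum(axis=2)  # (4.3), (4.7)
print(np.allclose(kappa_walk, kappa_spec))
```

Keeping only the first few columns of `Psi` gives a truncated embedding of the kind used in the remainder of the paper.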
Indeed, since $\lambda_N \le \cdots \le \lambda_2 < \lambda_1$, we have that
$$\frac{1}{\sqrt{1 - \lambda_N}} \le \cdots \le \frac{1}{\sqrt{1 - \lambda_3}} \le \frac{1}{\sqrt{1 - \lambda_2}},$$
and therefore, if we can accept some approximation error, we can use only the first $d'$ coordinates of $\Psi$. As we will see in section 5.4, this dimension reduction further improves the separation between slow patches and fast patches. In the remainder of the paper we will work with the embedding of $\Gamma$ into $\mathbb{R}^{d'}$ defined by
$$\Phi : \mathbf{x}_n \longrightarrow \frac{1}{\sqrt{\pi_n}} \begin{bmatrix} \dfrac{\phi_2(\mathbf{x}_n)}{\sqrt{1 - \lambda_2}} & \cdots & \dfrac{\phi_{d'+1}(\mathbf{x}_n)}{\sqrt{1 - \lambda_{d'+1}}} \end{bmatrix}^T. \tag{4.8}$$
We note that we can always choose $d'$ such that the embedding $\Phi$ almost preserves the commute time,
$$\|\Phi(\mathbf{x}_n) - \Phi(\mathbf{x}_m)\|^2 \approx \kappa(\mathbf{x}_n, \mathbf{x}_m). \tag{4.9}$$
In fact, our experiments indicate that this approximation holds for small values of $d'$.

Figure 5. Scatter plot of the patch-set shown in Figure 3 after parametrizing using $\Phi$ in (4.8), with $d' = 3$. The fast patches (red and orange) are now concentrated and have been lumped together. The slow patches (blue-green) remain aligned along curves (for time-series) and surfaces (for images).

4.4. Examples (revisited). Figure 5 displays the embedding of the patch-sets associated with signals and images A-F using the map $\Phi$ (4.8), where $d' = 3$. The blue curve in Figure 5-A corresponds to the slow patches (low frequencies of the chirp) that are connected according to their temporal proximity. On the other hand, red and orange patches extracted from the high frequency part of the chirp are now concentrated in a relatively small region (compare to Figure 3-A). Similar features are seen in the parametrizations of the patch-graphs associated with signals B-F.

5. A model for the patch-graph and the analysis of its embedding.

5.1. Our approach.
The embedding of the patch-graph $\Gamma$ defined by $\Phi$, in (4.8), should lead to a representation of the patch-set in $\mathbb{R}^{d'}$ where distances correspond to commute times measured along the graph before embedding. Our goal is to explain the concentration of the fast patches created by the embedding $\Phi$ (see e.g. Figure 5). Our approach is based on a theoretical analysis of a graph model that epitomizes the characteristic features observed in patch-graphs composed of a mixture of fast and slow patches. This model is composed of two subgraphs: a subgraph of slow patches, which are extracted from the smooth regions of the signal, and a subgraph of fast patches, which are extracted from the regions of the signal that contain singularities, changes in frequency, or energetic transients. We confirm our theoretical analysis with numerical experimentations using synthetic signals in section 6, and we demonstrate that our conclusions are in fact applicable to a larger class of patch-graphs. The graph models are introduced in section 5.2. Our theoretical analysis of the embedding of the graph models is given in section 5.3. We evaluate the performance of the embedding $\Phi$ when $d'$ is small in section 5.4.

5.2. The prototypical graph models. We define the graph models in terms of the nonzero entries in the associated weight matrix $W$. Without loss of generality, we assume that the number of vertices $N$ is even.

The slow graph model. The weight matrix $W$ of a patch-graph composed only of slow patches will have large entries when $|n - m|$ is small¹: temporal/spatial proximity implies proximity in patch-space (see e.g. Figure 4-A, top corner). We therefore define the slow graph model as follows.

Definition 5.1. The slow graph $S(N, L)$ is a weighted graph composed of $N$ vertices, $\mathbf{x}_1, \ldots, \mathbf{x}_N$.
The weight on the edge $\{x_n, x_m\}$ is defined by
$$w_{n,m} = \begin{cases} w_S & \text{if } |n - m| \le L, \\ 0 & \text{otherwise,} \end{cases} \qquad \text{for } 1 \le n, m \le N \text{ and } 2L + 1 \le N. \tag{5.1}$$
The weight $w_S$ is a positive real number that models the distance between two temporally adjacent patches. The parameter $L$ characterizes the thickness of the diagonal in $W$. The slow graph is connected and each vertex has at most $2L$ neighbors, not including self-connections (see Figure 6). Hence, we require that $2L + 1 \le N$. Finally, note that the slow graph is distinct from a regular ring, since the first and last vertices are not connected. We do not consider a regular ring since it would imply that the underlying signal is periodic.

The fast graph model. We now consider the model for a patch-graph built from a patch-set comprising only fast patches. As demonstrated in section 3, most of the entries in $W$ have similar sizes, and appear to be scattered throughout the matrix: temporal/spatial proximity does not correlate with proximity in patch space. In fact, fast patches are all far away from one another. We therefore define the fast graph model as follows.

Definition 5.2. The fast graph $F(N, p)$ is a random weighted graph composed of $N$ vertices, $x_1, \ldots, x_N$. The weight on the edge $\{x_n, x_m\}$ is defined by
$$w_{n,m} = w_{m,n} = \begin{cases} w_F & \text{with probability } p, \\ 0 & \text{with probability } 1 - p, \end{cases} \qquad \text{if } 1 \le n < m \le N, \quad \text{and } w_{n,m} = 1 \text{ if } n = m.$$
The weight $w_F$ is a positive real number that models the distance between two fast patches. The fast graph model is equivalent to a weighted version of the Erdős-Rényi graph model [19], except that $F(N, p)$ contains self-connections and has edge weights possibly less than one. The parameter $p$ controls the density of the edges; $p = 1$ corresponds to a fully connected graph (clique).

¹ We assume that the rows/columns of $W$ are ordered according to increasing index $n$ of the sequence $\{x_n\}$.
This assumption does not affect the graph's parametrization nor our theoretical conclusions, but allows us to interpret the structure in $W$.

Figure 6. The fused graph model $\Gamma^*(N)$ is composed of a slow graph $S(N/2, L)$ (blue) and a fast graph $F(N/2, p)$ (orange), connected by random edges (green).

Figure 7. The weight matrix $W$ of the fused graph model $\Gamma^*(256)$ is displayed as an image: $w_{n,m}$ is encoded as a grayscale value, from white ($w_{n,m} = 0$) to black ($w_{n,m} = 1$). The entries of $W$ associated with the slow graph appear in the upper-left quadrant of $W$. Entries associated with the fast graph appear in the lower-right quadrant. Random edges between the fast graph and slow graph appear in the upper-right and lower-left quadrants.

The fused graph model. The fused graph model exemplifies the patch-set associated with a signal, or an image, which exhibits regions of fast and slow changes. The fused graph combines a slow and a fast subgraph of equal size (see Figure 6).

Definition 5.3. The fused graph $\Gamma^*(N)$ is a weighted graph composed of a slow subgraph $S(N/2, L)$ and a fast subgraph $F(N/2, p)$. In addition, edges between $S(N/2, L)$ and $F(N/2, p)$ are created randomly and independently with probability $q$ and assigned the edge weight $w_c > 0$.

Edges between $S(N/2, L)$ and $F(N/2, p)$ ensure that $\Gamma^*(N)$ is connected (a requirement for the validity of the parametrization (4.6)). These edges allow us to model patches that are extracted from regions of the image that combine edges/transients and smooth intensity. If $q$ is so small that no edges are created between the two subgraphs, then an edge is placed at random between the two subgraphs to ensure that the final fused graph is connected.
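Definitions 5.1 to 5.3 can be turned into weight matrices directly. The following is a minimal NumPy sketch, not the authors' code: the function names, the default weights of 1, and the use of `numpy.random.default_rng` are assumptions made for illustration. The slow subgraph occupies the first $N/2$ indices, as in Figure 7.

```python
import numpy as np

def slow_graph(n, L, w_s=1.0):
    """Weight matrix of S(n, L): w_s when |i - j| <= L (self-loops included), else 0."""
    idx = np.arange(n)
    return np.where(np.abs(idx[:, None] - idx[None, :]) <= L, w_s, 0.0)

def fast_graph(n, p, w_f=1.0, rng=None):
    """Weight matrix of F(n, p): each edge {i, j}, i < j, is present with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    upper = np.triu(rng.random((n, n)) < p, k=1)   # independent draws above the diagonal
    W = w_f * (upper | upper.T).astype(float)      # symmetrize: w_{n,m} = w_{m,n}
    np.fill_diagonal(W, 1.0)                       # self-connections have weight 1
    return W

def fused_graph(N, L, p, q, w_s=1.0, w_f=1.0, w_c=1.0, rng=None):
    """Weight matrix of the fused graph: slow block, fast block, random cross edges."""
    rng = np.random.default_rng() if rng is None else rng
    h = N // 2
    cross = w_c * (rng.random((h, h)) < q)         # slow-to-fast edges, probability q
    if not cross.any():                            # no cross edge drawn: place one at random
        cross[rng.integers(h), rng.integers(h)] = w_c
    W = np.zeros((N, N))
    W[:h, :h] = slow_graph(h, L, w_s)
    W[h:, h:] = fast_graph(h, p, w_f, rng)
    W[:h, h:] = cross
    W[h:, :h] = cross.T
    return W

# Example call; p = 0.1 is an arbitrary illustrative value.
W = fused_graph(256, 12, 0.1, 1 / 256, rng=np.random.default_rng(0))
```

Drawing the random edges only above the diagonal and then symmetrizing keeps $W$ symmetric, which the spectral decomposition of $D^{-1/2} W D^{-1/2}$ relies on.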
The true patch-graph is always constructed using a $\nu$ nearest neighbor rule (see section 2): each patch is connected to $\nu$ other patches. In order to mimic a true patch-graph, we adjust the thickness $L$ of the slow subgraph to the density of the edge connection, $p$, in the fast subgraph, so that on average, each vertex in the fused graph is connected to $2L$ vertices. We know that the number of edges between distinct vertices in $F(N, p)$ is a binomial random variable with expectation $\frac{N(N-1)}{2} p$. Since the total number of edges between distinct vertices of $S(N, L)$ is equal to²
$$\sum_{j=1}^{L} (N - j) = NL - \frac{L(L+1)}{2}, \tag{5.2}$$
we choose
$$p = \frac{2L}{N-1} - \frac{L(L+1)}{N(N-1)}. \tag{5.3}$$
This choice of $p$ guarantees that the expected number of edges in $F(N, p)$ is equal to the number of edges in $S(N, L)$. Furthermore, provided that $L = O(\ln(N))$, a short computation shows that, for large values of $N$, this choice of $p$ also ensures that the expected degree of a vertex in $F(N, p)$ is equal to the average degree of a vertex in $S(N, L)$.

Figure 7 shows the nonzero entries in the weight matrix associated with one realization of the fused graph model using parameters $N = 256$, $L = \lceil 2 \ln N \rceil = 12$ and $q = \frac{1}{N}$. Vertices $x_n$ with $n \le 128$ are only connected to other vertices $x_m$ if $|n - m| \le L$. This connectivity mimics the spatial (temporal) connectivity present in the smooth parts of an image (signal).

5.3. The main result.
Our goal is to understand the effect of the embedding $\Phi$ defined by (4.8) on the fused graph. It turns out that studying the embedding of each individual subgraph (slow and fast) separately is much more tractable than considering the entire fused graph.
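The edge count (5.2) and the choice of $p$ in (5.3) from section 5.2 are easy to check numerically. The sketch below uses only the standard library; the function names are hypothetical, and the parameter values are those quoted for Figure 7.

```python
import math

def slow_edge_count(N, L):
    """Number of edges between distinct vertices of S(N, L), left-hand side of (5.2)."""
    return sum(N - j for j in range(1, L + 1))

def fast_edge_probability(N, L):
    """Edge probability p of (5.3), matching the expected edge count of F(N, p) to S(N, L)."""
    return 2 * L / (N - 1) - L * (L + 1) / (N * (N - 1))

N = 256
L = math.ceil(2 * math.log(N))              # thickness of the diagonal band
p = fast_edge_probability(N, L)

slow_edges = slow_edge_count(N, L)          # deterministic count in S(N, L)
expected_fast_edges = N * (N - 1) / 2 * p   # mean of the binomial edge count in F(N, p)
print(slow_edges, expected_fast_edges, p)
```

With these values the two edge counts coincide up to floating-point rounding, which is exactly what (5.3) was chosen to achieve.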
To complement our theoretical study of the fast and the slow subgraphs, we provide numerical evidence in section 5.4 that indicates that our understanding of the embedding of the subgraphs can be used to analyze the embedding of the fused graph. In section 6, we confirm that our theoretical analysis can be applied to true patch-graphs.

Instead of studying $\Phi$ directly, we take advantage of the fact that the embedding $\Phi$ almost preserves the commute time (see (4.9)). We can therefore understand the effect of the embedding on the distribution of mutual distances $\|\Phi(x_n) - \Phi(x_m)\|$ within a subgraph by studying the distribution of the commute times $\kappa(x_n, x_m)$ on that subgraph. While it would appear that it is a straightforward affair to compute the commute time on the slow graph, the computation rapidly becomes intractable. For this reason we provide lower and upper bounds for the average commute time on the slow and fast subgraphs, respectively. This is sufficient for our needs, since the two bounds rapidly separate even for low values of $N$.

To estimate these bounds, we rely on the connection between commute times on a graph and effective resistance on the corresponding electrical network [10, 16]. Specifically, we map a graph to an electrical circuit as follows: each edge with weight $w_{n,m}$ becomes a resistor with resistance $1/w_{n,m}$. The vertices of the graph are the connections in the circuit. Given two vertices, $x_n$ and $x_m$ in the circuit, one can compute the effective resistance between these nodes, $R_{n,m}$. The key result [10] is that $\kappa(x_n, x_m) = V R_{n,m}$, where $V$ is the volume of the graph.

Before stating the main Lemma, let us take a moment to compute some rough estimates of the commute times on the slow and fast graphs. To get some quick answers, we consider the simplest versions of the two graph models. When $L = 1$, the slow graph $S(N, 1)$ is a path
with self-connections. On a path of $N$ vertices without self-connections, the commute time between vertex $x_n$ and $x_m$ is equal to $2(N-1)|m-n|$. Therefore, the average commute time (computed over all pairs of vertices) on a path of length $N$ is $O(N^2)$. While it would make sense that adding edges to a path should decrease the commute time, this is usually not true [27]. Nevertheless, the presence of edges that allow the random walk to move forward by a distance $L$ at each time step leads us to conjecture that the average commute time on $S(N, L)$ should be of the order $\frac{1}{L} O(N^2)$. In fact, as we will see in Lemma 5.6, the average commute time of the slow graph is of the order $\frac{1}{L^2} O(N^2)$.

With regard to the fast graph, we can analyze the case where the density of edges $p = 1$. In this case, the fast graph $F(N, 1)$ is a complete graph, or clique, and every vertex is connected to every other vertex. In a complete graph, the average commute time is $O(N)$. Since the fast graph can be regarded as a complete graph whose edges have been removed with probability $1 - p$, we expect the commute time to be slightly larger than $O(N)$, since removing edges restricts the random walker's options to get from one vertex to another. Again, in agreement with our intuition, Lemma 5.6 asserts that in the fast graph, the commute time is of the order of $[L \ln(N) / \ln(L)]\, O(N)$.

² This is equivalent to the number of entries along the first $L$ upper diagonals of the matrix $W$.

We are now ready to state the main lemma. Our results will be stated in terms of the "average behavior" of the commute time on each graph, a concept that we need to define properly. In the case of the slow graph, which is deterministic, we consider the average commute time computed over all pairs of vertices.

Definition 5.4.
Let $\kappa_S$ be the average commute time between vertices in the slow graph $S(N, L)$,
$$\kappa_S \triangleq \frac{2}{N(N-1)} \sum_{1 \le m < n \le N} \kappa(x_m, x_n).$$
[…] 1, and the upper bound on $\kappa_F$ is negligible relative to the lower bound on $\kappa_S$, when $N$ is large.

Corollary 5.7. Assume that $L = c \ln N$ for some constant $c > 1$. It follows that, as $N \to \infty$, the lower bound on $\kappa_S$ grows like $\left(\frac{N}{\ln N}\right)^2$, and the upper bound on $\kappa_F$ grows like $\frac{N (\ln N)^2}{\ln \ln N}$. Furthermore, the lower bound on $\kappa_S$ grows faster than the upper bound on $\kappa_F$, and so with a probability that approaches one as $N \to \infty$,
$$\lim_{N \to \infty} \frac{\kappa_F}{\kappa_S} = 0.$$

Proof. Notice that $\kappa_S$ is bounded away from zero. Because the choice of $L$ guarantees that the fast graph is connected with a probability approaching one, $\kappa_F$ is finite with probability approaching one. Therefore the ratio $\kappa_F / \kappa_S$ is bounded below by zero and from above by a ratio of the bounds from Lemma 5.6. The ratio of bounds goes to zero, which follows from a simple, but lengthy, limit calculation.

We can translate the corollary in terms of the mutual distance between vertices of the subgraphs after the embedding $\Phi$: $\Phi(F(N, p))$ will be more concentrated than $\Phi(S(N, L))$.

5.4. Spectral decomposition of commute times on the graph models.
The results of section 5.3 apply to the exact commute times on the graph models. However, as mentioned in section 4.3, it is more practical to use a truncated version of the spectral expansion of the commute time, defined by Equation (4.3). We also noticed that the commute time encompasses the short term evolution ($t \approx 0$) as well as the asymptotic regime ($t \to \infty$) of the behavior of the random walk. Neglecting the eigenvectors $\phi_k$ for large $k$ emphasizes the long term behavior of the random walk, and we expect that it should further increase the difference between the slow and fast graphs.
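The two routes to the commute time used in this paper, the electrical identity $\kappa = V R$ and the spectral expansion built from the eigenpairs of $D^{-1/2} W D^{-1/2}$, can be compared on a small graph. The sketch below is illustrative only: the function names are assumptions, the effective resistance is obtained from the pseudoinverse of the Laplacian $D - W$ (a standard identity, not a construction taken from this paper), and the spectral form is written with the terms (5.9), so that keeping the first $d'$ terms gives $\|\Phi(x_n) - \Phi(x_m)\|^2$ for $\Phi$ in (4.8).

```python
import numpy as np

def commute_times_resistance(W):
    """kappa = V * R, with R read off the pseudoinverse of the Laplacian D - W."""
    d = W.sum(axis=1)
    V = d.sum()                                    # volume of the graph
    Lp = np.linalg.pinv(np.diag(d) - W)
    r = np.diag(Lp)
    return V * (r[:, None] + r[None, :] - 2 * Lp)  # V * effective resistance

def commute_times_spectral(W, n_terms=None):
    """Sum of the terms (5.9) over k = 2, ..., n_terms + 1 (all k >= 2 if n_terms is None)."""
    d = W.sum(axis=1)
    pi = d / d.sum()                               # stationary distribution
    lam, phi = np.linalg.eigh(W / np.sqrt(np.outer(d, d)))
    order = np.argsort(lam)[::-1]                  # lambda_1 = 1 first
    lam, phi = lam[order], phi[:, order]
    k_max = len(lam) if n_terms is None else n_terms + 1
    # Rows of `coords` are the embedding coordinates Phi(x_n) with d' = k_max - 1.
    coords = phi[:, 1:k_max] / np.sqrt(pi)[:, None] / np.sqrt(1.0 - lam[1:k_max])
    sq = (coords ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * coords @ coords.T

# Path of N vertices without self-connections: kappa(x_n, x_m) = 2 (N - 1) |m - n|.
N = 8
W_path = np.diag(np.ones(N - 1), 1)
W_path = W_path + W_path.T
K_exact = commute_times_resistance(W_path)
K_full = commute_times_spectral(W_path)
K_trunc = commute_times_spectral(W_path, n_terms=3)
```

Because every term in the expansion is nonnegative, the truncated matrix `K_trunc` never exceeds `K_exact` entrywise; the gap between the two is the approximation error accepted in (4.9).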
In this section, we confirm experimentally that approximating the commute times by truncating the expansion (4.3) actually emphasizes the separation between the fast subgraph and the slow subgraph in the fused graph model. In all the numerical experiments in this section, unless otherwise stated, we fix $N = 1024$, $L = \lceil 2 \ln(N) \rceil$, $p$ is chosen according to (5.3), $q = 1/N$, and $w_S = w_F = w_c = 1$. In all experiments, we compute the eigenvalues $\{\lambda_k\}$ of the matrix $D^{-1/2} W D^{-1/2}$ associated with the fast graph, the slow graph, and the fused graph.

Slow and fast subgraphs: two different dynamics revealed by the spectral decomposition. We first provide a back-of-the-envelope computation of the spectrum of the slow and fast graphs. As we have noticed before, the slow graph model is a "fat" path. We know that the spectrum of a path without self-connections [11] is given by $\cos[\pi(k-1)/(N-1)]$, $k = 1, 2, \ldots, N$. We expect therefore that the eigenvalues associated with the slow graph will decay slowly away from one for small $k$. Figure 8 (inset) displays the eigenvalues associated with the slow graph model. As expected, the spectrum is flat around $k = 0$ and exhibits the slowest decay of all the graph models. We use the similarity between the fast graph model and the Erdős-Rényi graph to predict the spectrum of the fast graph. Except for $\lambda_1 = 1$, all the other eigenvalues of an Erdős-Rényi graph asymptotically follow the Wigner semicircle distribution [12]. Our numerical experiments confirm this prediction: as shown in Figure 8-right, the eigenvalues of the fast graph appear to be distributed along a semicircle.

The decay of the spectrum has a direct influence on the dynamics of the random walk.
Specifically, the spectral gap controls the mixing rate, which measures the expected number of time-steps that are necessary to reduce the distance between the probability distribution after $t$ steps $P^{(t)}_{n,m}$ and the stationary distribution $\pi_m$ by a certain factor [42]. This concept is justified by the fact that the convergence of $P^{(t)}_{n,m}$ is exponential [17], and is given by
$$\max_{n,m} \left| \frac{P^{(t)}_{n,m}}{\pi_m} - 1 \right| \le \frac{\lambda_{\max}^t}{\pi_{\min}}, \qquad t = 1, \ldots \tag{5.8}$$
where $\lambda_{\max} = \max\{\lambda_2, |\lambda_N|\}$ (which is related to the spectral gap), and $\pi_{\min}$ is the smallest entry of the stationary distribution. Since $\lambda_2$ is much larger in the slow graph than in the fast graph, we expect that convergence to the associated stationary distribution will take longer on the slow graph than on the fast graph.

The dynamic of the fused graph is enslaved by the slow subgraph. We now consider a random walk on the fused graph. If this random walk begins at $x_n$ in the fast subgraph of the fused graph, then after a small number of steps, $t_0$, the probability of finding the random walker at any other vertex $x_m$ in the fast subgraph is close to the stationary distribution, $P^{(t_0)}_{n,m} \approx \pi_m$. On the other hand, during the same number of steps, a random walk initialized in the slow subgraph will only explore a small section of the slow subgraph, and consequently, the transition probabilities will still be similar to their initial values, $P^{(t_0)}_{n,m} \approx P_{n,m}$. As a result, the restriction imposed by the geometry of the slow subgraph is expected to decrease the convergence rate of the transition probabilities on the fused graph.

We confirm this analysis with experimental results. Figure 8 (inset) shows that for $k < 23$ the eigenvalues associated with the fused graph and the eigenvalues associated with the slow graph exhibit slow decay away from one, thereby increasing the bound in (5.8) and slowing the convergence.
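The bound (5.8) can be evaluated on toy graphs to see how $\lambda_{\max}$ governs the convergence. The following sketch is a hedged illustration: the function name and the two small test graphs (a banded graph with self-loops standing in for a slow graph, and a clique standing in for a fast graph with $p = 1$) are assumptions, not the experiments reported in Figure 8.

```python
import numpy as np

def mixing_bound(W, t):
    """Return (left-hand side, right-hand side) of (5.8) for P = D^{-1} W after t steps."""
    d = W.sum(axis=1)
    pi = d / d.sum()                                   # stationary distribution
    P = W / d[:, None]                                 # transition matrix
    lam = np.sort(np.linalg.eigvalsh(W / np.sqrt(np.outer(d, d))))[::-1]
    lam_max = max(lam[1], abs(lam[-1]))                # max{lambda_2, |lambda_N|}
    Pt = np.linalg.matrix_power(P, t)
    lhs = np.max(np.abs(Pt / pi[None, :] - 1.0))
    rhs = lam_max ** t / pi.min()
    return lhs, rhs

# Hypothetical toy graphs, both with self-loops.
N, L = 40, 2
idx = np.arange(N)
W_slow = (np.abs(idx[:, None] - idx[None, :]) <= L).astype(float)
W_clique = np.ones((N, N))

for t in (1, 5, 25):
    print(t, mixing_bound(W_slow, t), mixing_bound(W_clique, t))
```

On the banded graph $\lambda_2$ is close to one, so the right-hand side decays slowly with $t$; on the clique the walk reaches the stationary distribution after a single step.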
For $25 \le k \le 400$, the eigenvalues of the fused graph decay at a rate similar to that of the fast graph. Finally, for $k \ge 400$ the eigenvalues of the fused graph join those of the slow graph (see also the histogram in Figure 8-right). We have observed experimentally that these transitions in the behavior of the spectrum of the fused graph are not affected by varying the parameters $N$, $L$, and $q$. We conclude that the slow subgraph has the largest influence on the first few (small $k$) eigenvalues $\lambda_k$ of the fused graph.

Figure 8. The eigenvalues $\lambda_k$ of the matrix $D^{-1/2} W D^{-1/2}$ associated with the fused (green), slow (blue), and fast (orange) graphs. Left: $\lambda_k$ as a function of $k$; right: histogram of the $\lambda_k$.

Figure 9. The eigenvectors $\{\phi_1, \phi_2, \phi_8, \phi_{16}, \phi_{32}\}$ associated with the slow (left), fast (center), and fused (right) graphs. Right: the large amplitude of the eigenvectors $\phi_k$ on the first half of vertices (blue) belonging to the slow subgraph leads to a larger separation between the fast and slow subgraphs when truncating the commute time expansion.

The eigenvectors of the fused graph and their impact on the commute time. The transition exhibited in the spectrum of the fused graph can also be detected in the corresponding eigenvectors $\phi_k$. Figure 9 shows the eigenvectors $\{\phi_1, \phi_2, \phi_8, \phi_{16}, \phi_{32}\}$ corresponding to the three graph models. The first eigenvector $\phi_1$ has entries equal to the square root of the stationary distribution, $\phi_1(x_n) = \sqrt{\pi_n}$, and is not used in the expansion of the commute time (4.3).
As expected, the random walk spends most of its time inside the slow subgraph of the fused graph, as indicated by the larger values of $\phi_1$ for the first (blue) $N/2$ vertices (see Figure 9-right). The eigenvectors $\{\phi_2, \phi_8, \phi_{16}\}$ of the fused graph exhibit large amplitude oscillations over the vertices belonging to the slow subgraph (first half, shown in blue, of the plots in Figure 9-right), which resemble those found in the eigenvectors associated with the slow graph (Figure 9-left). As $k$ increases, the eigenvectors $\phi_k$ of the fused graph become more and more similar to the eigenvectors of the fast graph.

The impact of the eigenvectors $\phi_k$ on the commute time on the fused graph can be analyzed by estimating the size of the terms
$$\frac{1}{1 - \lambda_k} \left( \frac{\phi_k(x_n)}{\sqrt{\pi_n}} - \frac{\phi_k(x_m)}{\sqrt{\pi_m}} \right)^2 \tag{5.9}$$
in the spectral expansion (4.3) of the commute time $\kappa$. We claim that $\kappa(x_n, x_m)$ will be small if both vertices $x_n$ and $x_m$ are in the fast subgraph, and that $\kappa$ will be large if either vertex is in the slow subgraph.

We can first estimate the size of $\phi_k(x_n)/\sqrt{\pi_n} - \phi_k(x_m)/\sqrt{\pi_m}$. We observe that the eigenvectors $\phi_k$ for small values of $k$ have large amplitude oscillations on vertices belonging to the slow subgraph, but are relatively constant on the fast subgraph (see Figure 9-right). Therefore, for small values of $k$, each term (5.9) will be small when $x_n$ and $x_m$ both belong to the fast subgraph (we also have $\pi_n \approx \pi_m$ when two vertices belong to the same subgraph). Conversely, these terms will be large when either $x_n$ or $x_m$ belongs to the slow subgraph.

While this analysis of the size of the terms (5.9) only holds for small values of $k$, it turns out that these are the terms that have the largest influence in the expansion of the commute time (4.3).
Indeed, the spectrum of the fused graph decays slowly, and therefore the first few coefficients $(1 - \lambda_k)^{-1}$ in the commute time expansion (4.3) are much larger than the remaining ones, so that the terms (5.9) for small values of $k$ provide the largest contribution in the expansion of the commute time. We conclude that $\kappa(x_n, x_m)$ is small when $x_n$ and $x_m$ belong to the fast subgraph, and $\kappa(x_n, x_m)$ is large when either vertex is in the slow subgraph. Furthermore, we expect that this difference will be further magnified if we replace the exact expansion of $\kappa$ in (4.3) by an approximation that only includes the first few values of $k$.

The truncated spectral expansion of the commute time increases the contrast between the slow and fast subgraphs. We finally come to the heart of the section: the numerical computation of the average approximate commute time defined by $\kappa' = \frac{2}{N(N-1)} \sum_{n}$ […] $d' + 1$. Figure 17 shows $\sqrt{\kappa'}$ as a function of the frequency parameter (left), and smoothness parameter (right). We note that as the signal exhibits more rapid, local changes (increasing $\beta_F$, or decreasing $H_F$), the associated fast patches are increasingly concentrated (smaller $\|\Phi(x_n) - \Phi(x_m)\|$) through the parametrization. These experiments confirm that the theoretical analysis can be applied to the true patch-set constructed from realistic signals.

7. Discussion.
Using realistic graph models, probabilistic arguments, and the connection between the commute time of random walks on graphs and the embedding (4.8), we provided a theoretical explanation for the success of the methods that analyze and process images based on graphs of patches.
Our results establish that the embedding of the patch-graph of an image based on the commute time between vertices of the graph reveals the presence of patches containing rapid changes in the underlying signal or image by concentrating these patches close to one another while leaving the patches extracted from the slowly changing portions of the signal organized along low-dimensional structures.

7.1. Parameter selection.

7.1.1. Choosing the patch size.
In this work we are interested in the local behavior of the image, and therefore $d$ should remain of the order of what we consider to be the local scale. We also note that as $d$ becomes large, the number of available patches ($N/d$) becomes smaller, making the estimation of the geometry of the patch-set more difficult, since patches now live in a high-dimensional space. Another consequence of the "curse of dimensionality" is that the distance between patches becomes less informative for large values of $d$. If the original signal is oversampled with respect to the true physical processes at stake, then one can coarsen the sampling of the patch-set in the image domain. In practice, it would be more advisable to coarsen the underlying continuous patch-set, which is a nontrivial question.

7.1.2. Choosing edge weights.
In general, two principles guide the choice of edge weights in the patch-graph. On the one hand, patches that are very close should be connected with a large weight (short distance), while patches that are far away should have a very small weight along their mutual edge. This principle is equivalent to the idea of only trusting local distances in $\mathbb{R}^d$. Such a requirement is intuitively reasonable if we assume that the patch-set represents a discretization of a nonlinear manifold in $\mathbb{R}^d$. In this situation, we know that when the points on the manifold are very close to one another, the geodesic distance is well approximated by the Euclidean distance.
Conversely, because of the presence of curvature, the Euclidean distance is a poor approximation to the geodesic distance on the manifold when points are far apart. Because the only information available to us is the Euclidean distance between patches, we should not trust large Euclidean distances.

On the other hand, as observed in Section 3, the fast patches, which contain rapid changes, are all very far apart (large $\rho^2(x_n, x_m)$). Therefore the probability that the random walk escapes the fast patch $x_n$ and jumps to a different patch $x_m$, which is given by
$$\frac{w_{n,m}}{\sum_l w_{n,l}} = \frac{e^{-\rho^2(x_n, x_m)/\sigma^2}}{\sum_l w_{n,l}},$$
is always much smaller than the probability of staying at $x_n$, which is given by
$$\frac{1}{\sum_l w_{n,l}}.$$
To prevent the random walk from being trapped at each node $x_n$, we "saturate" the distance function by choosing $\sigma$ to be very large. In this case, for all the nearest neighbors $x_m$ of $x_n$, we have $w_{n,m} \approx 1$, and the transition probability is the same for all the neighbors, $P_{n,m} \approx 1/\nu$. This choice of $\sigma$ promotes a very fast diffusion of the random walk locally. We note that choosing a large $\sigma$ may be avoided if self-connections are not enforced (i.e. $w_{n,n} = 0$). However, self-connections are a necessary technical requirement to prove that the Markov process is aperiodic, which is required to prove the equality (4.4) [14].

We note that choosing $\sigma$ to be very large does not entirely obliterate the information provided by the mutual distance between patches, measured when patches are projected on the sphere with $\rho(x_n, x_m)$, (2.3). Indeed, $\rho(x_n, x_m)$ is used to select the nearest neighbors of each patch, and therefore allows us to define a notion of a local neighborhood around each patch. Choosing $\sigma$ to be very large forces a very fast diffusion within this neighborhood, irrespective of the actual distances $\rho(x_n, x_m)$.
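The effect of $\sigma$ on one row of the transition matrix can be illustrated with a few lines of code. This is a sketch under stated assumptions: the squared distances below are made-up values for a single fast patch and four hypothetical neighbors, and the row includes the self-connection (distance zero), so the near-uniform value reached for large $\sigma$ is one over the number of entries in the row.

```python
import math

def transition_row(sq_dists, sigma):
    """Transition probabilities from x_n, given rho^2 to itself and to its neighbors."""
    weights = [math.exp(-r2 / sigma ** 2) for r2 in sq_dists]   # w_{n,m}
    total = sum(weights)                                        # sum over l of w_{n,l}
    return [w / total for w in weights]

# Hypothetical squared distances: the first entry is the self-connection (rho^2 = 0),
# the others are four distant neighbors of a fast patch.
sq_dists = [0.0, 1.8, 2.0, 2.2, 2.5]

trapped = transition_row(sq_dists, sigma=0.5)     # small sigma: walk stays at x_n
saturated = transition_row(sq_dists, sigma=50.0)  # large sigma: near-uniform row

print(trapped)
print(saturated)
```

With the small $\sigma$ almost all of the probability mass sits on the self-connection, which is the trapping behavior described above; with the large $\sigma$ the distances stop mattering and only the neighborhood membership is retained.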
Alternatively, we could consider choosing $\sigma$ to vary adaptively from one neighborhood to another. The parameter $\sigma$ could be small when patches are extremely close to one another, while $\sigma$ could be large when the patches are at a large mutual distance from one another. This notion is the foundation of the self-tuning weight matrix, which adjusts its weights based on a point's local neighborhood [30].

7.2. Extensions and generalizations.
In general, the patch-set of an image consists of more than two homogeneous subsets. For example, one could partition an image patch-set into uniform patches, edge patches, and texture patches. Our experience [40] with a generalization of the time-frequency signal model (section 6) indicates that we can still separate the patches when the signal is composed of up to four different local behaviors (specified by four different values of the parameter in the autocorrelation function).

Another extension of this work involves the embedding of a patch-set constructed from a library of images. Recent studies [26] indicate that high-contrast patches extracted from optical images organize themselves around a 2-dimensional smooth sub-manifold [8]. This idea has also been exploited to construct dictionaries that lead to very sparse representations of images (e.g. [18], and references therein).

Finally, we note that our results about the embedding of the slow (5.1), fast (5.2), and the fused graph (5.3) are very general and can be applied to datasets [9] where the corresponding graph exhibits a similar structure. For instance, one could imagine using this idea to study social networks, where the concept of cliques would correspond to fast subgraphs.

7.3. Related work.
The concept of patches has proven extremely useful in many areas of image analysis: texture analysis/synthesis [28], image completion [31, 44], super-resolution [34], and denoising [5, 7, 22, 24, 33, 37, 38, 44]. While these references do not explicitly construct a patch-graph, these works all compute distances between patches, and use the nearest neighbors of each patch to analyze and process patches. Recent works on the analysis of time-series also use patches and construct networks of patches [4, 25, 43]. All these references provide experimental evidence for the success of working on image (or signal) patches.

In this work, we provide a theoretical justification for this experimental success. We study the effect of the embedding $\Phi$ (4.8) on the organization of the patch-set. Our analysis assumes that there exists a natural partition of the patch-set into two classes: patches extracted from the smooth baseline and patches that contain sudden local changes of the image intensity or signal value.

It is interesting to compare and contrast our work to the work of Singer, Shkolnisky, and Nadler [37] who provide a different theoretical explanation for the success of patch-based denoising algorithms. The authors in [37] treat the matrix $P$ as a filter, which acts on an $N$-dimensional column-vector representation of the signal or the image. Each multiplication of the probability distribution by $P$ is interpreted as the evolution of the diffusion process on the patch-graph over a time-step of duration $\sigma$. The results in [37] rely on the convergence of a properly normalized version of $P$ toward the backward Fokker-Planck operator. The authors can compute the eigenfunctions of the operator when the signal is either a one-dimensional constant function perturbed by Gaussian noise, or a one-dimensional step function also contaminated by Gaussian noise.
In contrast, our analysis is based on the analysis of the commute time on graphs that epitomize the patch-graph constructed from two classes of patches. In addition, we need not assume that the image is piecewise constant. In fact, our experiments demonstrate that our analysis can be applied to detect many different types of anomalies: changes in the local frequency content, changes in local regularity, etc. Furthermore, our theoretical analysis holds for finite values of the number of patches $N$.

It is interesting to note that Singer et al. study the mean first-passage time between patches extracted from the noisy step function. The mean first-passage time is derived from the hitting time, which is used to define the commute time. The authors in [37] use an energy argument to explain the existence of a large mean first-passage time between patches extracted from either side of the step function's discontinuity. They argue that a high density of patches is associated with a lower potential energy, and consequently it will take longer for a random process to exit the well with such a low potential. Finally, our results are not limited to patches of size $d = 1$, as are the results in [37].

The energy argument in [37] adds an interesting interpretation to our analysis. Following this perspective, the slow patches can be interpreted as points sampled from a probability density function $P$ defined on $\mathbb{R}^d$ with a support that is defined along a low-dimensional manifold. This localization leads to a potential $U = -\log P$ with a deep and narrow well, from which the random walk cannot escape. This argument agrees with our findings that the average commute time between slow patches is very large, and thus, the random walk spends considerably more time in the slow subgraph before being able to reach a patch that is temporally far away.
From a more general perspective, this work presents an investigation into the diffusion process on the graph models presented in Section 5.2. Our work is thus related to a large body of work on the analysis of complex and random networks using first-passage time (e.g. [15] and references therein). This area is usually motivated by physical problems such as transport in disordered media, neuron firing, or energy flow on power-grids instead of applications in signal processing.

7.4. Open questions.
While we obtained estimates for the average commute time on the fast and slow graph models considered separately, it would be desirable to obtain similar estimates on the fused graph. At the moment, our analysis of the fused graph relies on numerical simulations. We are also aware of a small discrepancy in the upper bound on $\kappa_F$: this bound is increasing with $L$. In fact we expect that the commute time on $F(N, p)$ should decrease as $p$, and therefore $L$, increases. The reason for this apparent inconsistency is that the proof of (5.7) relies on a loose upper bound for the effective resistance between two vertices, which is provided by the geodesic distance on the graph [10]. This is not a tight inequality on a graph such as $F(N, p)$. A more effective inequality, which could improve the upper bound (5.7), relies on the computation of the distribution of the number of paths $s$ of length at most $l$ between the two vertices. We could then use the fact that the commute time is bounded from above by a constant times the ratio $l/s$ [10], which would decrease the upper bound in (5.7).

Appendix A. The connectedness of the fast graph.
It is necessary that the fast graph $F(N, p)$ be connected to be able to apply the spectral decomposition of the commute time.
To ensure that the probability of $F(N, p)$ being disconnected vanishes as $N$ gets large, we must choose $Np > \log N$ [17]. Since $p$ is defined as a function of $L$ in (5.3), any requirement on $p$ ultimately constrains $L$. First, because the maximum degree of a vertex in $S(N, L)$ is $2L + 1$, according to (5.1), we require $2L + 1 \le N$. Manipulation of this inequality leads to
\[ \frac{L+1}{N} \le \frac{1}{2} + \frac{1}{2N}. \]
We assume that $N \ge 2$, so that $(L+1)/N \le 3/4$. It follows that
\[ 2 - \frac{L+1}{N} \ge \frac{5}{4} > 1. \]
Therefore, rewriting (5.3) and using the last inequality we have
\[ p = \frac{L}{N-1}\left(2 - \frac{L+1}{N}\right) > \frac{L}{N}\left(2 - \frac{L+1}{N}\right) > \frac{L}{N}. \]
Therefore, choosing $L = c \log N$ for some $c > 1$ ensures that $Np > \log N$, and consequently, the probability of $F(N, p)$ being disconnected approaches zero as $N$ approaches infinity.

Appendix B. Bounding the commute times in the graph models.

B.1. Proof of the lower bound on the average commute time in the slow graph. In order to compute a lower bound on the average commute time, we consider a fixed pair of vertices in the slow graph, $x_{n_0}$ and $x_{m_0}$, and compute a lower bound on the commute time $\kappa(x_{n_0}, x_{m_0})$. We can then compute the average of this lower bound over all the pairs of vertices. To obtain the lower bound on $\kappa(x_{n_0}, x_{m_0})$ we use a standard tool to obtain lower bounds on commute time: the Nash-Williams inequality [29]. The Nash-Williams inequality is usually formulated in terms of electrical networks. We prefer to present an equivalent formulation that is directly adapted to our problem. We first introduce the concept of edge-cutset.

Definition B.1. Let $V_1$ and $V_2$ be two disjoint sets of vertices. A set of edges $E$ is an edge-cutset separating $V_1$ and $V_2$ if every path that connects a vertex in $V_1$ with a vertex in $V_2$ includes an edge in $E$.
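Definition B.1 can be checked mechanically: remove the candidate edges and test whether any vertex of $V_2$ is still reachable from $V_1$. The following is a minimal sketch under that reading; the function name `is_edge_cutset`, the edge-list representation, and the four-cycle example are illustrative assumptions, not part of the paper.

```python
from collections import deque

def is_edge_cutset(edges, cutset, V1, V2):
    """Return True if every path from V1 to V2 uses at least one edge of `cutset`."""
    removed = {frozenset(e) for e in cutset}
    adjacency = {}
    for u, v in edges:
        if frozenset((u, v)) in removed:
            continue  # this edge belongs to the candidate cutset: delete it
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # Breadth-first search from V1 in the graph with the cutset removed.
    seen = set(V1)
    queue = deque(V1)
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen.isdisjoint(V2)

# Hypothetical example: the cycle 0 - 1 - 2 - 3 - 0.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
one_edge = is_edge_cutset(cycle_edges, [(0, 1)], V1=[0], V2=[2])
two_edges = is_edge_cutset(cycle_edges, [(0, 1), (3, 0)], V1=[0], V2=[2])
```

On the cycle, removing a single edge leaves a second path from vertex 0 to vertex 2, so one edge is not a cutset; removing both edges incident to vertex 0 is.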
Given a weighted graph, which may contain loops, we define a random walk with the probability transition matrix $P_{n,m} = W_{n,m}/D_{n,n}$. Let $x_{m_0}$ and $x_{n_0}$ be two vertices. The commute time between vertices $x_{m_0}$ and $x_{n_0}$, $\kappa(x_{m_0}, x_{n_0})$, satisfies the following lower bound.

Lemma B.2 (Nash-Williams). If $x_{m_0}$ and $x_{n_0}$ are distinct vertices in a graph that are separated by disjoint edge-cutsets $E_k$, $k = 1, \ldots$, then
\[ V \sum_k \Big( \sum_{\{x_n, x_m\} \in E_k} w_{n,m} \Big)^{-1} \le \kappa(x_{m_0}, x_{n_0}), \tag{B.1} \]
where $\{x_m, x_n\}$ is an edge in the cutset $E_k$, and where the volume of the graph is defined by $V = \sum_{i=1}^{N} \sum_{j=1}^{N} w_{i,j}$.

We now exhibit a sequence of edge-cutsets in the slow graph. We refer to Figure 18 for the construction of the cutsets. We define the first cutset $E_1$. If $m_0 < L$, then $E_1$ needs a little more attention and is defined as the set of $L$ edges $\{x_i, x_j\}$, where $i$ and $j$ are defined by
\[ \begin{cases} i = 1, \ldots, m_0, \\ j = m_0 + 1, \ldots, L + i. \end{cases} \tag{B.2} \]
The edge-cutset $E_1$ is shown in Figure 18 for $m_0 = 1$ (left) and $m_0 = 2$ (center), for $L = 3$. The removal of this set of edges prevents $x_{m_0}$ from being connected to $x_{n_0}$. Indeed, the self-loop on the diagonal (green entry) does not allow the random walk to move toward $x_{n_0}$. This can also be visualized in Figure 19, where $E_1$ is the leftmost set of edges that connect $x_{m_0}$ to that part of the graph that is connected to $x_{n_0}$. The sum of edge weights in $E_1$ is at most $L(L+1) w_S / 2$. If $m_0 \ge L$, then $E_1$ is defined in the same way as the generic edge-cutsets.

We now define the generic edge-cutsets $E_k$ as the set of $L(L+1)/2$ edges $\{x_i, x_j\}$ such that
\[ \begin{cases} i = m_0 + 1 + (k-2)L, \ldots, m_0 + (k-1)L, \\ j = m_0 + 1 + (k-1)L, \ldots, L + i. \end{cases} \tag{B.3} \]
As seen in Figure 18 (right) for $k = 3$, setting the entries of $E_3$ to zero disconnects the upper and lower part of the submatrix $W(m_0 : n_0, m_0 : n_0)$, thereby isolating $x_{m_0}$ and $x_{n_0}$.
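The cutsets (B.2) and (B.3) and the left-hand side of (B.1) can be evaluated numerically on a small instance. The sketch below rests on an assumption about the slow graph: that its weight matrix is banded, with $w_{n,m} = w_S$ whenever $|n - m| \le L$ (self-loops included), which is what the maximum degree $2L+1$ suggests. The function names and the values $N = 30$, $L = 3$ are hypothetical, and indices are 0-based in the code. Each cutset is taken as the set of all edges $\{x_i, x_j\}$ with $i \le a < j$ for $a = m_0, m_0 + L, m_0 + 2L, \ldots$, and the commute time is computed from the pseudoinverse of the graph Laplacian.

```python
import numpy as np

def slow_graph_weights(N, L, w=1.0):
    """Assumed banded weight matrix: weight w when |n - m| <= L, self-loops included."""
    idx = np.arange(N)
    return w * (np.abs(idx[:, None] - idx[None, :]) <= L)

def commute_time(W, a, b):
    """kappa(a, b) = volume * effective resistance, via the Laplacian pseudoinverse."""
    lap = np.diag(W.sum(axis=1)) - W  # self-loops cancel here but still count in the volume
    G = np.linalg.pinv(lap)
    return W.sum() * (G[a, a] + G[b, b] - 2.0 * G[a, b])

def nash_williams_bound(W, m0, n0, L):
    """Left-hand side of (B.1) for cuts placed at a = m0, m0 + L, ... (requires m0 < n0)."""
    total = 0.0
    for a in range(m0, n0, L):
        # All edges {x_i, x_j} with i <= a < j; consecutive cuts are disjoint
        # because no edge spans more than L indices.
        total += 1.0 / W[: a + 1, a + 1 :].sum()
    return W.sum() * total

N, L = 30, 3
W = slow_graph_weights(N, L)
m0, n0 = 4, 20
bound = nash_williams_bound(W, m0, n0, L)
kappa = commute_time(W, m0, n0)
interior_cut_weight = W[:11, 11:].sum()  # equals L (L + 1) / 2 for unit weights
```

In this instance `bound` does not exceed `kappa`, as Lemma B.2 requires. For $L = 1$ the graph is a path with self-loops and the two quantities coincide, since every cut then consists of a single edge.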
Alternatively, we also see in Figure 19 that any path from $x_{m_0}$ to $x_{n_0}$ needs to go through $E_3$. Each edge-cutset $E_k$, $k \ge 2$, is a triangle with a height of size $L$. Therefore, after creating $E_1$, we can fit $\left\lfloor \frac{n_0 - (m_0 + 1) + 1}{L} \right\rfloor$ such cutsets between $x_{m_0+1}$ and $x_{n_0}$. The sum of the weights along the edges of each cutset $E_k$, $k = 2, \ldots$, is given by $L(L+1) w_S / 2$. In addition, the sum of edge weights in the first cutset $E_1$ is at most $L(L+1) w_S / 2$. Putting everything together, the computation of the lower bound using the Nash-Williams Lemma yields
\begin{align*}
V \sum_k \Big( \sum_{\{n,m\} \in E_k} w_{n,m} \Big)^{-1}
&\ge [N(2L+1) - L(L+1)]\, w_S \left( \left\lfloor \frac{n_0 - m_0}{L} \right\rfloor \frac{2}{L(L+1) w_S} + \frac{2}{L(L+1) w_S} \right) \\
&\ge \frac{N(2L+1) - L(L+1)}{L(L+1)} \left( 2 \left( \frac{n_0 - m_0}{L} - 1 \right) + 2 \right) \\
&\ge \frac{N(2L+1) - L(L+1)}{L(L+1)} \, 2 \, \frac{n_0 - m_0}{L}.
\end{align*}
We can summarize this result in the following lemma.

Lemma B.3. The commute time between vertices $x_{n_0}$ and $x_{m_0}$ inside $S(N, L)$ satisfies
\[ \kappa(x_{m_0}, x_{n_0}) \ge \frac{2\,[N(2L+1) - L(L+1)]}{L(L+1)} \, \frac{n_0 - m_0}{L}. \tag{B.4} \]

Finally, we bound the average commute time in the slow graph. Observe that the slow graph model $S(N, L)$ has $N - j$ pairs of vertices such that $|m - n| = j$, for $j = 1, \ldots, N - 1$. Therefore, using the lower bound given in Lemma B.3 it follows that $\sum_{1 \le m}$ [...] $\beta$, where the second equality follows after expressing cosine with complex exponentials, and applying the binomial theorem. It is clear that $2\beta$ is the frequency of the fastest sinusoid making up the random signal $z(t)$, and that most of the energy is on average at frequency $\beta$. Let $A_j$ and $B_j$ be independent and identically distributed Normal random variables with zero mean and unit variance. Define
\[ \hat{z}_j = \sqrt{\frac{\hat{C}_j}{2}} \, (A_j + i B_j). \]
Finally, the signal $z(t)$ is defined as
\[ z(t) = \sum_{j \in \mathbb{Z}} \hat{z}_j \, e^{2\pi i j t}. \]
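The synthesis of $z(t)$ from the random coefficients $\hat{z}_j$ can be sketched numerically. The coefficients `C_hat` below are hypothetical placeholders (symmetric and nonnegative), not the values of (C.2), and the function name `synthesize`, the sample size, and the lag are assumptions made for illustration. The code draws many realizations of $z$ at two times $t$ and $t + \tau$ and estimates the autocorrelation as the sample mean of $z(t)$ times the complex conjugate of $z(t + \tau)$.

```python
import numpy as np

def synthesize(C_hat, t, n_real, rng):
    """Draw n_real realizations of z(t) = sum_j z_hat_j exp(2 pi i j t).

    C_hat holds the coefficients for j = -J, ..., J (assumed symmetric, nonnegative).
    """
    C_hat = np.asarray(C_hat, dtype=float)
    J = (C_hat.size - 1) // 2
    freqs = np.arange(-J, J + 1)
    A = rng.standard_normal((n_real, freqs.size))
    B = rng.standard_normal((n_real, freqs.size))
    z_hat = np.sqrt(C_hat / 2.0) * (A + 1j * B)
    phases = np.exp(2j * np.pi * np.outer(freqs, t))  # shape (2J + 1, len(t))
    return z_hat @ phases                             # shape (n_real, len(t))

rng = np.random.default_rng(0)
C_hat = np.array([0.1, 0.3, 0.2, 0.3, 0.1])  # hypothetical values, NOT those of (C.2)
tau = 0.15
t = np.array([0.2, 0.2 + tau])
z = synthesize(C_hat, t, 200_000, rng)

# Sample estimate of E[z(t) conj(z(t + tau))] and the corresponding Fourier sum.
empirical = np.mean(z[:, 0] * np.conj(z[:, 1]))
freqs = np.arange(-2, 3)
expected = np.sum(C_hat * np.exp(2j * np.pi * freqs * tau))
```

With these placeholder coefficients the empirical value agrees with the sum $\sum_j \hat{C}_j e^{2\pi i j \tau}$ up to Monte Carlo error, and the estimate does not depend on the choice of $t$.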
To check that the signal $z(t)$ defined above has the correct autocorrelation, observe that linearity of the expectation, independence and zero mean of the random variables, and the fact that $\hat{C}_j = \hat{C}_{-j}$ together imply that
\begin{align*}
E\big( z(t) \overline{z(t+\tau)} \big)
&= \sum_{|j| \le 2\beta} \sum_{|k| \le 2\beta} E\big( \hat{z}_j \overline{\hat{z}_k} \big) \, e^{-2\pi i k \tau} e^{2\pi i (j-k) t} \\
&= \sum_{|j| \le 2\beta} \sum_{|k| \le 2\beta} \frac{\sqrt{\hat{C}_j \hat{C}_k}}{2} \big[ E(A_j A_k) - i E(A_j B_k) + i E(A_k B_j) + E(B_j B_k) \big] \, e^{-2\pi i k \tau} e^{2\pi i (j-k) t} \\
&= \sum_{|j| \le 2\beta} \frac{\hat{C}_j}{2} \big[ E(A_j^2) + E(B_j^2) \big] \, e^{-2\pi i j \tau} \\
&= \sum_{|j| \le 2\beta} \hat{C}_j \, e^{2\pi i j \tau}.
\end{align*}
Therefore, referencing (C.2), it follows that $E\big( z(t) \overline{z(t+\tau)} \big) = C(\tau)$.

REFERENCES

[1] P. Abry and F. Sellan, The wavelet-based synthesis for fractional Brownian motion proposed by F. Sellan and Y. Meyer, Appl. Comput. Harmon. A., 3 (1996), pp. 377–383.
[2] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15 (2003), pp. 1373–1396.
[3] O. Bénichou, C. Chevalier, J. Klafter, B. Meyer, and R. Voituriez, Geometry-controlled kinetics, Nature Chemistry, 2 (2010), pp. 472–477.
[4] E.P. Borges, D.O. Cajueiro, and F.S. Andrade, Mapping dynamical systems onto complex networks, The European Physical Journal B - Condensed Matter and Complex Systems, 58 (2007), pp. 469–474.
[5] S. Bougleux, A. Elmoataz, and M. Melkemi, Local and nonlocal discrete regularization on weighted graphs for image and mesh processing, Intern. J. Comput. Vis., 84 (2009), pp. 220–236.
[6] P. Bremaud, Markov Chains, Springer Verlag, 1999.
[7] A. Buades, B. Coll, and J.M. Morel, A review of image denoising algorithms, with a new one, Multiscale Modeling and Simulation, 4 (2005), pp. 490–530.
[8] G. Carlsson, T. Ishkhanov, V. De Silva, and A. Zomorodian, On the local behavior of spaces of natural images, Intern. J. Comput. Vis., 76 (2008), pp. 1–12.
[9] F. Cazals, F. Chazal, and J. Giesen, Spectral techniques to explore point clouds in Euclidean space, in Nonlinear Computational Geometry, Springer, 2010, pp. 1–34.
[10] A.K. Chandra, P. Raghavan, W.L. Ruzzo, and R. Smolensky, The electrical resistance of a graph captures its commute and cover times, in Proc. 21st ACM Symposium on Theory of Computing, ACM, 1989, pp. 574–586.
[11] F. Chung, Spectral Graph Theory, American Mathematical Society, 1997.
[12] F. Chung, L. Lu, and V. Vu, Spectra of random graphs with given expected degrees, P. Natl. Acad. Sci. USA, 100 (2003), pp. 6313–6318.
[13] A. Cohen and J.P. D'Ales, Nonlinear approximation of random functions, SIAM J. Appl. Math., 57 (1997), pp. 518–540.
[14] R.R. Coifman and S. Lafon, Diffusion maps, Applied and Computational Harmonic Analysis, 21 (2006), pp. 5–30.
[15] S. Condamin, O. Bénichou, V. Tejedor, R. Voituriez, and J. Klafter, First-passage times in complex scale-invariant media, Nature, 450 (2007), pp. 77–80.
[16] P.G. Doyle and J.L. Snell, Random Walks and Electric Networks, ArXiv Mathematics e-prints, (2000). Available from .
[17] R. Durrett, Random Graph Dynamics, Cambridge, 2007.
[18] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer Verlag, 2010.
[19] P. Erdős and A. Rényi, On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci, 5 (1960), pp. 17–61.
[20] A. Fronczak, P. Fronczak, and J.A. Hołyst, Average path length in random networks, Phys. Rev. E, 70 (2004).
[21] A. Ghosh, S. Boyd, and A. Saberi, Minimizing effective resistance of a graph, SIAM Rev., 50 (2008), pp. 37–66.
[22] G. Gilboa and S. Osher, Nonlocal operators with applications to image processing, Multiscale Model. Simul., 7(3) (2008), pp. 1005–1028.
[23] P.W. Jones, M. Maggioni, and R. Schul, Manifold parametrizations by eigenfunctions of the Laplacian and heat kernels, P. Natl. Acad. Sci. USA, 105 (2008), pp. 1803–1808.
[24] V. Katkovnik, A. Foi, K. Egiazarian, and J. Astola, From local kernel to nonlocal multiple-model image denoising, Intern. J. Comput. Vis., 86 (2010), pp. 1–32.
[25] L. Lacasa, B. Luque, F. Ballesteros, J. Luque, and J. Nuño, From time series to complex networks: The visibility graph, P. Natl. Acad. Sci. USA, 105 (2008), pp. 4972–4975.
[26] A.B. Lee, K.S. Pedersen, and D. Mumford, The nonlinear statistics of high-contrast patches in natural images, Intern. J. Comput. Vis., 54 (2003), pp. 83–103.
[27] L. Lovász, Random walks on graphs: A survey, in Combinatorics: Paul Erdős is Eighty, vol. 2, János Bolyai Math. Soc., 1993, pp. 1–46.
[28] J. Lu, J. Dorsey, and H. Rushmeier, Dominant texture and diffusion distance manifolds, Computer Graphics Forum, 28 (2009), pp. 667–676.
[29] R. Lyons and Y. Peres, Probability on trees and networks. In preparation. Available at http://mypage.iu.edu/~rdlyons.
[30] L.Z. Manor and P. Perona, Self-tuning spectral clustering, in Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS'04), 2004.
[31] H. Mobahi, S. Rao, and Y. Ma, Data-driven image completion by image patch subspaces, in Proc. 27th Conf. on Picture Coding Symposium, IEEE Press, 2009, pp. 241–244.
[32] P. Bérard, G. Besson, and S. Gallot, Embedding Riemannian manifolds by their heat kernel, Geometric and Functional Analysis, 4(4) (1994), pp. 373–398.
[33] G. Peyré, Image processing with non-local spectral bases, Multiscale Model. Simul., 7 (2008), pp. 703–730.
[34] M. Protter, M. Elad, H. Takeda, and P. Milanfar, Generalizing the nonlocal-means to super-resolution reconstruction, Image Processing, IEEE Transactions on, 18 (2009), pp. 36–51.
[35] T. Sauer, J.A. Yorke, and M. Casdagli, Embedology, Journal of Statistical Physics, 65 (1991), pp. 579–616. 10.1007/BF01053745.
[36] X. Shen and F.G. Meyer, Low-dimensional embedding of fMRI datasets, NeuroImage, 41 (2008), pp. 886–902.
[37] A. Singer, Y. Shkolnisky, and B. Nadler, Diffusion interpretation of nonlocal neighborhood filters for signal denoising, SIAM Journal on Imaging Sciences, 2 (2009), pp. 118–139.
[38] A. Szlam, M. Maggioni, and R.R. Coifman, Regularization on graphs with function-adapted diffusion processes, Journal of Machine Learning Research, 9 (2008), pp. 1711–1739.
[39] F. Takens, Detecting strange attractors in turbulence, in Dynamical systems and turbulence, Warwick 1980 (Coventry, 1979/1980), Lecture Notes in Math., 898, Springer, 1981, pp. 366–381.
[40] K.M. Taylor, The geometry of signal and image patch-sets, PhD thesis, University of Colorado, Boulder, Dept. of Applied Mathematics, June 2011. (available from ecee.www.colorado.edu/~fmeyer).
[41] K.M. Taylor, M.J. Procopio, C.J. Young, and F.G. Meyer, Estimation of arrival times from seismic waves: a manifold-based approach, Geophys. J. Intern., 185 (2011), pp. 435–452.
[42] S. Vempala, Geometric random walks: A survey, MSRI volume on Combinatorial and Computational Geometry, (2005).
[43] J. Zhang and M. Small, Complex network from pseudoperiodic time series: Topology versus dynamics, Phys. Rev. Lett., 96 (2006), p. 238701.
[44] M. Zhou, H. Chen, J. Paisley, L. Ren, G. Sapiro, and L. Carin, Non-parametric Bayesian dictionary learning for sparse image representations, in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, eds., 2009, pp. 2295–2303.