On the Difficulty of Selecting Ising Models with Approximate Recovery


Authors: Jonathan Scarlett, Volkan Cevher

Abstract—In this paper, we consider the problem of estimating the underlying graph associated with an Ising model given a number of independent and identically distributed samples. We adopt an approximate recovery criterion that allows for a number of missed edges or incorrectly-included edges, in contrast with the widely-studied exact recovery problem. Our main results provide information-theoretic lower bounds on the sample complexity for graph classes imposing constraints on the number of edges, maximal degree, and other properties. We identify a broad range of scenarios where, either up to constant factors or logarithmic factors, our lower bounds match the best known lower bounds for the exact recovery criterion, several of which are known to be tight or near-tight. Hence, in these cases, approximate recovery has a similar difficulty to exact recovery in the minimax sense. Our bounds are obtained via a modification of Fano's inequality for handling the approximate recovery criterion, along with suitably-designed ensembles of graphs that can broadly be classed into two categories: (i) those containing graphs with several isolated edges or cliques, which are thus difficult to distinguish from the empty graph; (ii) those containing graphs for which certain groups of nodes are highly correlated, making it difficult to determine precisely which edges connect them. We support our theoretical results on these ensembles with numerical experiments.

Index Terms—Graphical model selection, Ising model, Gaussian graphical models, Markov random fields, information-theoretic limits, lower bounds, Fano's inequality.
I. INTRODUCTION

Graphical models are a widely-used tool for providing compact representations of the conditional independence relations between random variables, and arise in areas such as image processing [1], statistical physics [2], computational biology [3], natural language processing [4], and social network analysis [5]. The problem of graphical model selection consists of recovering the graph structure given a number of independent samples from the underlying distribution. While this fundamental problem is NP-hard in general [6], there exist a variety of methods guaranteeing exact recovery with high probability on restricted classes of graphs, such as those with bounded degree or a bounded number of edges.

Existing works have focused primarily on Ising models and Gaussian models, and our focus in this paper is on the former. In particular, we focus on the problem of approximate recovery, in which one can tolerate some number of missed edges or incorrectly-included edges. The motivation for such a study is that the exact recovery criterion is very restrictive, and not something that one would typically expect to achieve in practice. In particular, if the number of samples required for exact recovery is very large, it is of significant interest to know the potential savings afforded by allowing for approximate recovery. The answer is unclear a priori, since allowing approximate recovery can lead to vastly improved scaling laws in some inference and learning problems [7] and virtually no gain in others [8].

The authors are with the Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL) (e-mail: {jonathan.scarlett,volkan.cevher}@epfl.ch). This work was supported in part by the European Commission under Grant ERC Future Proof, SNF 200021-146750, SNF CRSII2-147633, and EPFL Fellows Horizon2020 grant 665667.
Our main focus is on algorithm-independent lower bounds for Ising models, revealing the number of measurements required for approximate recovery regardless of the computational complexity. We extend Fano's inequality [9], [10] to the case of approximate recovery, and apply it to restricted sets of graphs that establish the difficulty of approximate recovery. Our main results reveal a broad range of graph classes for which the approximate recovery lower bounds exhibit the same scalings as the best-known exact recovery lower bounds [9], [10], which are known to be tight or near-tight in many cases of interest. This indicates that, at least for the classes that we consider, the approximate recovery problem is not much easier than the exact recovery problem in the minimax sense.

A. Problem Statement

The ferromagnetic Ising model [11] is specified by a graph $G = (V, E)$ with vertex set $V = \{1, \dots, p\}$ and edge set $E$. Each vertex is associated with a binary random variable $X_i \in \{-1, 1\}$, and the corresponding joint distribution is

$$P_G(x) = \frac{1}{Z} \exp\Big( \sum_{i,j} \lambda_{ij} x_i x_j \Big), \qquad (1)$$

where

$$\lambda_{ij} = \begin{cases} \lambda & (i,j) \in E \\ 0 & \text{otherwise}, \end{cases} \qquad (2)$$

and $Z$ is a normalizing constant called the partition function. Here $\lambda > 0$ is a parameter of the distribution, sometimes called the inverse temperature. Let $\mathbf{X} \in \{-1,1\}^{n \times p}$ be a matrix of $n$ independent samples from this distribution, each row corresponding to one such sample of the $p$ variables. Given $\mathbf{X}$, an estimator or decoder constructs an estimate $\hat{G}$ of the graph $G$, or equivalently, an estimate $\hat{E}$ of the edge set $E$.

Recovery Criterion: Given some class $\mathcal{G}$ of graphs, the widely-studied exact recovery criterion seeks to characterize

$$P_e := \max_{G \in \mathcal{G}} \mathbb{P}\big[\hat{E} \ne E\big]. \qquad (3)$$
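As a concrete illustration of the distribution in (1)–(2), the following sketch evaluates $P_G$ exhaustively for a small graph by summing over all $2^p$ configurations. The function name and brute-force approach are our own illustration (only feasible for small $p$), not the paper's method.

```python
import itertools
import math

def ising_pmf(p, edges, lam):
    """Brute-force evaluation of (1): enumerate all 2^p configurations,
    weight each by exp(lam * sum of x_i x_j over edges), and normalize
    by the partition function Z."""
    configs = list(itertools.product([-1, 1], repeat=p))
    weights = [math.exp(lam * sum(x[i] * x[j] for (i, j) in edges)) for x in configs]
    Z = sum(weights)  # partition function
    return {x: w / Z for x, w in zip(configs, weights)}

# A single edge (0,1) on p = 3 nodes: the isolated node 2 is independent,
# and the correlation of a lone edge is E[X_0 X_1] = tanh(lambda).
pmf = ising_pmf(3, [(0, 1)], lam=0.5)
corr = sum(prob * x[0] * x[1] for x, prob in pmf.items())
```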
We instead consider the following approximate recovery criterion, for some maximum number of errors $q_{\max} \ge 0$:

$$P_e(q_{\max}) := \max_{G \in \mathcal{G}} \mathbb{P}\big[ |E \,\Delta\, \hat{E}| > q_{\max} \big], \qquad (4)$$

where $E \,\Delta\, \hat{E} = (E \setminus \hat{E}) \cup (\hat{E} \setminus E)$, so that $|E \,\Delta\, \hat{E}|$ denotes the edit distance, i.e., the number of edge insertions and deletions required to transform one graph into the other. In this definition, $q_{\max}$ does not depend on $G$, and hence the number of allowed edge errors does not depend on the graph itself. We consider graph classes with a maximum number of edges equal to some value $k$, and set $q_{\max} = \theta^* k$ for some constant $\theta^* \in (0,1)$ not scaling with the problem size. Note that $\theta^* = 1$ would trivially give $P_e(q_{\max}) = 0$.

Graph Classes: We consider the following three nested classes of graphs $\mathcal{G}_k \supseteq \mathcal{G}_{k,d} \supseteq \mathcal{G}_{k,d,\eta,\gamma}$:

• (Edge bounded class $\mathcal{G}_k$) This class contains all graphs with at most $k$ edges.
• (Edge and degree bounded class $\mathcal{G}_{k,d}$) This class contains the graphs in $\mathcal{G}_k$ such that each node has degree (i.e., the number of edges it is involved in) at most $d$.
• (Sparse separator class $\mathcal{G}_{k,d,\eta,\gamma}$) This class contains the graphs in $\mathcal{G}_{k,d}$ satisfying the $(\eta,\gamma)$-separation condition [12]: for any two non-connected vertices in the graph, one can simultaneously block all paths of length $\gamma$ or less between them by blocking at most $\eta$ nodes.

The restriction on the number of edges is motivated by the fact that real-world graphs are often sparse. The restriction on the degree is also relevant in applications, and is particularly commonly assumed in the statistical physics literature. The sparse separation condition is somewhat more technical, but it is of interest since it is known to permit polynomial-time exact recovery in many cases [12], [13]. Moreover, it is known to hold with high probability for several interesting random graphs; see [12] for some examples.
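For concreteness, the edit distance $|E \,\Delta\, \hat{E}|$ in (4) is simply the size of the symmetric difference of the two edge sets. A small illustrative helper of our own (treating edges as unordered pairs):

```python
def edge_errors(E, E_hat):
    """|E Δ Ê| from (4): number of missed plus incorrectly-included edges.
    Edges are undirected, so each pair is canonicalized before comparing."""
    canon = lambda edge_set: {tuple(sorted(e)) for e in edge_set}
    return len(canon(E) ^ canon(E_hat))  # ^ is symmetric set difference

# Two missed edges and one spurious edge give an edit distance of 3;
# note (2, 1) matches (1, 2) after canonicalization.
errors = edge_errors([(1, 2), (2, 3), (4, 5)], [(2, 1), (5, 6)])
```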
Generalized Edge Weights: A generalization of the above Ising model allows $\lambda_{ij}$ to take different non-zero values for each $(i,j) \in E$, some of which may be negative. Previous works considering model selection for this generalized model have sought minimax bounds with respect to the graph class and these parameters, subject to $\lambda_{\min} \le |\lambda_{ij}| \le \lambda_{\max}$ for some $\lambda_{\min}$ and $\lambda_{\max}$. The lower bounds derived in this paper immediately imply corresponding lower bounds for this generalized setting, provided that our parameter $\lambda$ in (2) lies in the range $[\lambda_{\min}, \lambda_{\max}]$.

Notation and Terminology: Throughout the paper, we let $\mathbb{P}_G$ and $\mathbb{E}_G$ denote probabilities and expectations with respect to $P_G$ (e.g., $\mathbb{P}_G[X_i = X_j]$, $\mathbb{E}_G[X_i X_j]$). We denote the floor function by $\lfloor\cdot\rfloor$, and the ceiling function by $\lceil\cdot\rceil$. We use the standard terminology that the degree of a node $v \in V$ is the number of edges in $E$ containing $v$, and that a clique is a subset $C \subseteq V$ of size at least two within which all pairs of nodes have an edge between them.

B. Related Work

A variety of algorithms with varying levels of computational efficiency have been proposed for selecting Ising models with rigorous guarantees, including conditional independence tests for candidate neighborhoods [14], correlation tests in the presence of sparse separators [12], [15], greedy techniques [16]–[19], convex optimization approaches [20], elementary estimators [21], and intractable information-theoretic techniques [9]. These works have made various assumptions on the underlying model, including incoherence assumptions [20], [21] and long-range correlation assumptions [12], [15]. A notable recent work avoiding these is [19], which provides recovery guarantees using an algorithm whose complexity is only quadratic in the number of nodes for a fixed maximum degree, thus resolving an open question posed in [22].
Early works providing algorithm-independent lower bounds used only graph-theoretic properties [12], [14], [23]; the resulting bounds are loose in general, since they do not capture the effects of the parameters of the joint distribution (e.g., $\lambda$). Several refined bounds were given in [9] for graphs with a bounded degree or a bounded number of edges. Additional classes were considered in [10], including the bounded girth class and a class related to the separation criterion of [12] (and hence related to $\mathcal{G}_{k,d,\eta,\gamma}$ defined above). While our techniques build on those of [9], [10], we must consider significantly different ensembles, since those in [9], [10] contain graphs that differ only by one or two edges, thus making approximate recovery trivial.

To our knowledge, the only other work giving an approximate recovery bound for the Ising model is [24], where the degree-bounded class is considered. The effect of edge weights is not considered therein, and the bound is proved by counting graphs rather than constructing restricted ensembles. Consequently, only an $\Omega(d \log p)$ necessary condition is shown, in contrast with our bounds containing a $d^2$ or $e^{\lambda d}$ term (cf. Table I). The necessary conditions for list decoding [25] bear some similarity to approximate recovery, but the problem and its analysis are in fact much more similar to exact recovery, allowing the ensembles from [9], [10] to be applied directly.

Beyond Ising models, several works have provided necessary and sufficient conditions for recovering Gaussian graphical models [13], [26]–[29]. In this context, a necessary condition for approximate recovery was given in [13, Cor. 7], but the corresponding assumptions and techniques used were vastly different from ours: the random Erdős–Rényi model was considered instead of a deterministic class, and an additional walk-summability condition specific to the Gaussian model was imposed.
C. Contributions

Our main results, and the corresponding existing results for exact recovery, are summarized in Table I, where we provide necessary scaling laws on the number of samples needed to obtain a vanishing probability of error $P_e(q_{\max})$. Note that some of the exact recovery conditions given in the final column were not explicitly given in [9], [10], but they can easily be inferred from the proofs therein; see Section II for further discussion. We also observe that our analysis requires handling more cases separately compared to [9], [10]; in those works, the final three rows corresponding to $\mathcal{G}_k$ in Table I are all a single case giving $\Omega(k \log p)$ scaling, and similarly for $\mathcal{G}_{k,d}$.

Table I: Summary of main results on partial recovery, and comparisons to the best known necessary conditions for exact recovery [9], [10]. Each entry shows the necessary scaling law for the number of samples required to achieve a vanishing error probability. In each row below, the first scaling is the necessary condition for approximate recovery derived in this paper, and the second is the best known necessary condition for exact recovery.

Bounded edge class $\mathcal{G}_k$ — distortion $q_{\max} < k/4$ (Theorems 1 and 2):
• $\lambda = \omega(1/\sqrt{k})$: exponential in $\lambda\sqrt{k}$; exponential in $\lambda\sqrt{k}$.
• $\lambda = O(1/\sqrt{k})$, $1 \le k \le p$: $\Omega(k \log p)$; $\Omega(k \log p)$.
• $\lambda = O(1/\sqrt{k})$, $p \le k \le p^{4/3}$: $\Omega(k)$; $\Omega(k \log p)$.
• $\lambda = O(1/\sqrt{k})$, $p^{4/3} \le k \le p^2$: $\Omega(p^2/\sqrt{k})$ (between $\Omega(p)$ and $\Omega(k)$); $\Omega(k \log p)$.

Bounded edge and degree class $\mathcal{G}_{k,d}$ — distortion $q_{\max} < \frac{k}{4}\cdot\frac{d-2}{d}$ (Theorems 3 and 4):
• $\lambda = \omega(1/d)$: exponential in $\lambda d$; exponential in $\lambda d$.
• $\lambda = O(1/d)$, $d^2 \le k \le p$: $\Omega(d^2 \log p)$; $\Omega(d^2 \log p)$.
• $\lambda = O(1/d)$, $p \le k \le p\sqrt{d}$: $\Omega(d^2)$; $\Omega(d^2 \log p)$.
• $\lambda = O(1/d)$, $p\sqrt{d} \le k \le pd/2$: $\Omega(d^3 p^2/k^2)$ (between $\Omega(d)$ and $\Omega(d^2)$); $\Omega(d^2 \log p)$.

Bounded edge and degree class with sparse separators $\mathcal{G}_{k,d,\eta,\gamma}$ — distortion $q_{\max} < \frac{(c\eta-1)^2 k}{2c\eta(2\eta + m(\gamma+1))}$ with $c \in (0,1)$, $m \in \{0, \dots, d/2 - \eta\}$ (Theorem 5):
• $\lambda = \omega\big(\min\{1/\sqrt{\eta},\, 1/m^{1/(1+\gamma)}\}\big)$, $\lambda = O(1)$, $k \le p/4$: exponential in $\max\{\lambda^2\eta,\, \lambda^{\gamma+1}m\}$; exponential in $\max\{\lambda^2\eta,\, \lambda^{\gamma+1}d\}$.
• $\lambda = O\big(\min\{1/\sqrt{\eta},\, 1/m^{1/(1+\gamma)}\}\big)$, $\lambda = O(1)$, $k \le p/4$: $\Omega\big(\max\{\eta,\, m^{2/(\gamma+1)}\}\log p\big)$; $\Omega\big(\max\{\eta,\, d^{2/(\gamma+1)}\}\log p\big)$.

Table I reveals the following facts:

1) In all of the known cases where exact recovery is known to be difficult, i.e., requires a number of samples exponential in a quantity that increases in the problem dimension, the same difficulty is observed for approximate recovery, at least for the values of $q_{\max}$ shown. For $\mathcal{G}_k$ and $\mathcal{G}_{k,d}$, this is true even when we allow for up to a quarter of the edges to be in error. Note that we did not seek to optimize this fraction in our analysis, and we expect similar difficulties to arise even when higher proportions of errors are allowed. In fact, by a simple variation of our analysis outlined in Remark 1 in Section IV-C, we can already increase this fraction from 1/4 to 1/2.

2) In many of the cases where the necessary conditions for exact recovery lack exponential terms, the corresponding necessary conditions for approximate recovery are identical or near-identical; in particular, see the second and third rows corresponding to $\mathcal{G}_k$, the second and third rows corresponding to $\mathcal{G}_{k,d}$, and the second row corresponding to $\mathcal{G}_{k,d,\eta,\gamma}$ with $m = d/2 - \eta$. While there are logarithmic terms missing in some cases (e.g., $k$ vs. $k \log p$), these are typically insignificant in the regimes considered (e.g., $k = \Omega(p)$).

3) In contrast, there are some cases where significant gaps remain between the best-known conditions for exact recovery and approximate recovery.
The two most extreme cases are as follows: (i) if $k = \Theta(p^{2-\epsilon})$ for some small $\epsilon > 0$, the necessary conditions for $\mathcal{G}_k$ are $\Omega(p^{2-\epsilon}\log p)$ and $\Omega(p^{1+\epsilon/2})$, respectively; (ii) if $k = \Theta(pd)$, then the necessary conditions for $\mathcal{G}_{k,d}$ are $\Omega(d^2\log p)$ and $\Omega(d\log p)$, respectively. It remains an open problem as to whether this behavior is fundamental, or due to a weakness in the analysis.

The starting point of our results is a modification of Fano's inequality for the purpose of handling approximate recovery. To obtain the above results, we apply this bound to ensembles of graphs that can be broadly classed into two categories. The first considers graphs with a large number of isolated edges or, more generally, isolated cliques. We characterize how difficult each graph is to distinguish from the empty graph, and use this to derive the results given in item 2) above. On the other hand, the results on the exponential terms discussed in item 1) arise from considering ensembles in which several groups of nodes are always highly correlated due to the presence of a large number of edges among them, thus making it difficult to determine precisely which edges these are.

Both of these categories help in providing bounds that match those for exact recovery. For example, the $\Omega(k\log p)$ behavior for $\lambda = O(1/\sqrt{k})$ in [9] is proved by considering graphs with a single isolated edge, and our analysis extends this to approximate recovery by considering graphs with $k$ isolated edges. Analogously, the exponential behavior (e.g., in $\lambda\sqrt{k}$) in [9] is proved by considering cliques with one edge removed, and our analysis reveals that the same exponential behavior arises even if a constant fraction of the edges is removed.

We provide numerical results on our ensembles in Section VI supporting our theoretical findings.
Specifically, we implement optimal or near-optimal decoding rules in a variety of cases, and find that while approximate recovery can be easier than exact recovery, the general behavior of the two is similar.

II. MAIN RESULTS

In this section, we present our main results, namely algorithm-independent necessary conditions for the criterion in (4) with all $\lambda_{ij} = \lambda$. Our conditions are written in terms of asymptotic $o(1)$ terms for clarity, but purely non-asymptotic variants can be inferred from the proofs. Throughout the section, we make use of the binary entropy function in nats, $H_2(\theta) := -\theta\log\theta - (1-\theta)\log(1-\theta)$. Here and subsequently, all logarithms have base $e$. All proofs are deferred to later sections; some preliminary results are presented in Section III, a number of ensembles are presented and analyzed in Section IV, and the resulting theorems are deduced in Section V.

A. Bounded Number of Edges Class $\mathcal{G}_k$

We first consider the class $\mathcal{G}_k$ of graphs with at most $k$ edges. It will prove convenient to treat two cases separately depending on how $k$ scales with $p$.

Theorem 1. (Class $\mathcal{G}_k$ with $k \le p/4$) For any number of edges such that $k \to \infty$ and $k \le p/4$, and any distortion level $q_{\max} = \lfloor\theta k\rfloor$ for some $\theta \in \big(0, \frac{1}{4}\big)$, it is necessary that

$$n \ge \max\bigg\{ \frac{e^{\lambda(\sqrt{k/2}-2)/2}\big(\log 2 - H_2(2\theta)\big)}{6\lambda k},\; \frac{2(1-\theta)\log p}{\lambda\tanh\lambda} \bigg\}\big(1-\delta-o(1)\big) \qquad (5)$$

in order to have $P_e(q_{\max}) \le \delta$ for all $G \in \mathcal{G}_k$.

We proceed by considering two cases, as in [9]. In the case that $\lambda\sqrt{k} \to \infty$ at any rate faster than logarithmic in $p$ (or even logarithmic with a constant that is not too small), the sample complexity is dominated by the first (exponential) term in (5), and any recovery procedure requires a huge number of samples. Thus, in this case, even the approximate recovery problem is very difficult.
On the other hand, if $\lambda = O(1/\sqrt{k})$, then the second condition in (5) gives a sample complexity of $\Omega(k\log p)$, since $\tanh\lambda = O(\lambda)$ as $\lambda \to 0$. These observations are the same as those made for exact recovery in [9], where the best known necessary conditions for $\mathcal{G}_k$ were given. Thus, we have reached similar conclusions even when allowing for nearly a quarter of the edges to be in error.

Theorem 2. (Class $\mathcal{G}_k$ with $k = \Omega(p)$) For any number of edges of the form $k = \lfloor cp^{1+\nu}\rfloor$ for constants $c > 0$ and $\nu \in [0,1)$, and any distortion level $q_{\max} = \lfloor\theta k\rfloor$ for some $\theta \in \big(0, \frac{1}{4}\big)$, it is necessary that

$$n \ge \max\bigg\{ \frac{e^{\lambda(\sqrt{k/2}-2)/2}\big(\log 2 - H_2(2\theta)\big)}{6\lambda k},\; \frac{\log 2 - H_2(\theta)}{\lambda\,\frac{e^{2\lambda}\cosh(4\lambda cp^{\nu})-1}{e^{2\lambda}\cosh(4\lambda cp^{\nu})+1}} \bigg\}\big(1-\delta-o(1)\big) \qquad (6)$$

in order to have $P_e(q_{\max}) \le \delta$ for all $G \in \mathcal{G}_k$.

As above, the sample complexity is exponential in $\lambda\sqrt{k}$ due to the first term in (6). On the other hand, we claim that when $\lambda = O(1/\sqrt{k})$, the second term in (6) leads to a sample complexity of $\Omega(\min\{k, p^2/\sqrt{k}\})$. To see this, we choose $k$ as in the theorem statement and note that $\lambda p^{\nu} = O(p^{-\frac{1}{2}(1+\nu)+\nu}) = O(p^{-\frac{1}{2}(1-\nu)})$; since $\cosh\zeta = 1 + O(\zeta^2)$ as $\zeta \to 0$, this implies that $\cosh(4c\lambda p^{\nu}) = 1 + O(p^{-(1-\nu)})$. We thus have

$$e^{2\lambda}\cosh(4c\lambda p^{\nu}) = \big(1 + O(p^{-\frac{1}{2}(1+\nu)})\big)\big(1 + O(p^{-(1-\nu)})\big),$$

which finally yields

$$\frac{e^{2\lambda}\cosh(4c\lambda p^{\nu})-1}{e^{2\lambda}\cosh(4c\lambda p^{\nu})+1} = O\big(\max\{p^{-\frac{1}{2}(1+\nu)},\, p^{-(1-\nu)}\}\big) = O\big(\max\{1/\sqrt{k},\, k/p^2\}\big).$$

When $k = \Omega(p)$ and $k = O(p^{4/3})$, we have $\min\{k, p^2/\sqrt{k}\} = k$, and hence these observations are again the same as those made for exact recovery in [9], except that our growth rates do not include a $\log p$ term; this logarithmic factor is insignificant compared to the leading term $k = \Omega(p)$.
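The asymptotic claim above can be sanity-checked numerically: in the regime $\lambda = O(1/\sqrt{k})$, the KL-divergence ratio in (6) stays within a constant factor of $\max\{1/\sqrt{k}, k/p^2\}$ as $p$ grows. The check below is a rough self-contained illustration of our own; the constant 10 is arbitrary.

```python
import math

def kl_ratio(lam, m):
    """The quantity (e^{2λ} cosh(2λm) − 1) / (e^{2λ} cosh(2λm) + 1),
    which appears in (6) with m = 2 c p^ν."""
    t = math.exp(2 * lam) * math.cosh(2 * lam * m)
    return (t - 1) / (t + 1)

c, nu = 1.0, 0.5
checks = []
for p in [10**3, 10**4, 10**5]:
    k = c * p ** (1 + nu)
    lam = 1 / math.sqrt(k)               # the regime λ = O(1/√k)
    r = kl_ratio(lam, 2 * c * p ** nu)
    predicted = max(1 / math.sqrt(k), k / p ** 2)
    checks.append(r <= 10 * predicted)   # within a constant factor
```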
In contrast, the gap is more significant when $k \gg p^{4/3}$; in the extreme case, when $k = \Theta(p^{2-\epsilon})$ for some small $\epsilon > 0$, we obtain a scaling of $\Omega(p^{1+\epsilon/2})$, as opposed to $\Omega(k\log p) = \Omega(p^{2-\epsilon}\log p)$.

B. Bounded Degree Class $\mathcal{G}_{k,d}$

Next, we consider the class $\mathcal{G}_{k,d}$ of graphs such that every node has degree at most $d$, and the total number of edges does not exceed $k$.

Theorem 3. (Class $\mathcal{G}_{k,d}$ with $k \le p/4$) For any maximal degree $d > 2$ and number of edges $k$ such that $k = \omega(d^2)$ and $k \le p/4$, and any distortion level $q_{\max} = \lfloor\theta k\rfloor$ for some $\theta \in \big(0, \frac{1}{4}\cdot\frac{d-2}{d}\big)$, it is necessary that

$$n \ge \max\bigg\{ \frac{e^{\lambda(d-2)/4}\big(\log 2 - H_2\big(\frac{d}{d-2}\cdot 2\theta\big)\big)}{3\lambda d^2},\; \frac{2(1-\theta)\log p}{\lambda\tanh\lambda} \bigg\}\big(1-\delta-o(1)\big) \qquad (7)$$

in order to have $P_e(q_{\max}) \le \delta$ for all $G \in \mathcal{G}_{k,d}$.

The first term in (7) reveals that the sample complexity is exponential in $\lambda d$. On the other hand, if $\lambda = O(1/d)$, then the second term gives a sample complexity of $\Omega(d^2\log p)$. We cannot directly compare Theorem 3 to [9], since there $k$ was assumed to be unrestricted for the degree-bounded ensemble. However, the analysis therein is easily extended to $\mathcal{G}_{k,d}$, and doing so recovers nearly identical observations to those above, as summarized in Table I. In this sense, Theorem 3 matches the best known necessary conditions for exact recovery even when nearly a quarter of the edges may be in error.

Theorem 4. (Class $\mathcal{G}_{k,d}$ with $k = \Omega(p)$) For any maximal degree $d > 2$ and number of edges $k$ such that $k = \omega(d^2)$ and $k \le \frac{1}{2}p(d_0 - 1)$ for some $d_0 \le d$, and any distortion level $q_{\max} = \lfloor\theta k\rfloor$ for some $\theta \in \big(0, \frac{1}{4}\cdot\frac{d-2}{d}\big)$, it is necessary that

$$n \ge \max\bigg\{ \frac{e^{\lambda(d-2)/4}\big(\log 2 - H_2\big(\frac{d}{d-2}\cdot 2\theta\big)\big)}{3\lambda d^2},\; \frac{\log 2 - H_2(\theta)}{\lambda\,\frac{e^{2\lambda}\cosh(2\lambda d_0)-1}{e^{2\lambda}\cosh(2\lambda d_0)+1}} \bigg\}\big(1-\delta-o(1)\big) \qquad (8)$$

in order to have $P_e(q_{\max}) \le \delta$ for all $G \in \mathcal{G}_{k,d}$.

The sample complexity remains exponential in $\lambda d$.
By standard asymptotic expansions similar to those following Theorem 2, we have $\frac{e^{2\lambda}\cosh(2\lambda d_0)-1}{e^{2\lambda}\cosh(2\lambda d_0)+1} = O\big(\max\big\{\frac{1}{d}, \big(\frac{d_0}{d}\big)^2\big\}\big)$ whenever $\lambda = O(1/d)$; hence, the second condition in (8) becomes $n = \Omega\big(d\min\big\{d, \big(\frac{d}{d_0}\big)^2\big\}\big)$. Thus, if $d_0 = O(\sqrt{d})$, then we again obtain the desired $\Omega(d^2)$ behavior; this means that we can allow for $k$ up to $O(p\sqrt{d})$. More generally, we instead obtain the possibly weaker scaling law $n = \Omega\big(\min\{d^2,\, d^3/d_0^2\}\big)$, which is equivalent to $n = \Omega\big(\min\big\{d^2, \frac{d^3p^2}{k^2}\big\}\big)$ when $k = \Theta(pd_0)$. In the extreme case, when $k = \Theta(pd)$ (the highest growth rate possible given the degree constraint alone), this only recovers $\Omega(d\log p)$ scaling.

C. Sparse Separator Class $\mathcal{G}_{k,d,\eta,\gamma}$

We now consider the class $\mathcal{G}_{k,d,\eta,\gamma}$ of graphs in $\mathcal{G}_{k,d}$ that satisfy the $(\eta,\gamma)$-separation condition [12]. We focus on the case $k \le p/4$, since the main graph ensemble that we consider for this class is not suited to the case that $k = \omega(p)$.

Theorem 5. (Class $\mathcal{G}_{k,d,\eta,\gamma}$ with $k \le p/4$) Fix any parameters $(d, k, \eta, \gamma)$ with $k \le p/4$ and $\eta \le \lfloor d/2\rfloor$, and let $m$ be an integer in $\{0, \dots, \lfloor d/2\rfloor - \eta\}$. For any distortion level $q_{\max} = \big\lfloor\theta\frac{(c\eta-1)^2 k}{2c\eta(2\eta+m(\gamma+1))}\big\rfloor$ for some $\theta \in \big(0, \frac{1}{2}\big)$ and $c \in \big(\frac{1}{\eta}, 1\big)$, it is necessary that

$$n \ge \max\Bigg\{ \frac{1 + (\cosh 2\lambda)^{(1-c)\eta-1}\Big(\frac{1+(\tanh\lambda)^{\gamma+1}}{1-(\tanh\lambda)^{\gamma+1}}\Big)^m}{2\lambda c\eta}\big(\log 2 - H_2(\theta)\big),\; \frac{2(k-q_{\max})\log p}{k\,\lambda\tanh\lambda} \Bigg\}\big(1-\delta-o(1)\big) \qquad (9)$$

in order to have $P_e(q_{\max}) \le \delta$ for all $G \in \mathcal{G}_{k,d,\eta,\gamma}$.

We proceed by considering only the case $\lambda = O(1)$, though simplifications of Theorem 5 for $\lambda \to \infty$ are also possible. With $\lambda = O(1)$, we have $(\cosh 2\lambda)^{(1-c)\eta} = e^{\zeta\lambda^2(1-c)\eta}$ for some $\zeta = \Theta(1)$, and similarly $\big(\frac{1+(\tanh\lambda)^{\gamma+1}}{1-(\tanh\lambda)^{\gamma+1}}\big)^m = e^{\zeta' m\lambda^{\gamma+1}}$ for some $\zeta' = \Theta(1)$ [10, Sec. 5].
These identities reveal that the sample complexity is exponential in both $\lambda^2\eta$ and $\lambda^{\gamma+1}m$. On the other hand, if $\lambda = O(1/\sqrt{\eta})$ and $\lambda = O\big(1/m^{1/(\gamma+1)}\big)$, then the second term in (9) gives $n = \Omega\big(\max\{\eta,\, m^{2/(\gamma+1)}\}\log p\big)$.

Due to the choice $q_{\max} = \big\lfloor\theta\frac{(c\eta-1)^2 k}{2c\eta(2\eta+m(\gamma+1))}\big\rfloor$, if we set $m = d/2 - \eta$, then we are only in the regime of a constant fraction of errors if $d\gamma = \Theta(\eta)$. This is true, for example, if $\eta = \Theta(d)$ so that the separator set size is a fixed fraction of the maximum degree, and $\gamma = \Theta(1)$ so that the separation is with respect to paths of a bounded length. More generally, to handle larger values of $q_{\max}$, one can choose a smaller value of $m$, thus leading to a larger value of $q_{\max}$ but with a less stringent condition on the number of measurements in (9). In the extreme case, $m = 0$, and then we are always in the regime of a constant proportion of errors; however, this yields a necessary condition $\Omega(\eta\log p)$ not depending on $d$ or $\gamma$.

The graph family studied in [10, Thm. 2] is somewhat different from $\mathcal{G}_{k,d,\eta,\gamma}$, in particular placing no constraints on the maximal degree or the number of edges. Nevertheless, by choosing the parameters in the proof therein to meet these constraints,¹ one again obtains conditions similar to those above, as summarized in Table I. In particular, for any choice of $m$ that grows as $\Theta(d)$, the scaling laws for exact recovery and approximate recovery coincide.

¹Specifically, in [10, Sec. 9.2], one can set $t_{\nu} = d - \eta$ to satisfy the degree constraint, and then choose $\alpha = \big\lfloor\frac{k}{t_{\nu}(\gamma+1)+2\eta-1}\big\rfloor$ to ensure there are at most $k$ edges in total.

III. AUXILIARY RESULTS

In this section, we provide a number of auxiliary results that will be used to prove the theorems in Section II.
We first present a general form of Fano's inequality depending on both the Kullback-Leibler (KL) divergence and the edit distance between graphs, and then provide a number of properties of Ising models that will be useful for characterizing the KL divergence and edit distance in specific scenarios.

A. Fano's Inequality for Approximate Recovery

As is common in studies of algorithm-independent lower bounds in learning problems, we make use of bounds based on Fano's inequality [30, Sec. 2.10]. We first briefly outline the most relevant results for the exact recovery problem. Recall the definitions of $P_e$ and $P_e(q_{\max})$ in (3)–(4) with respect to a given graph class $\mathcal{G}$. It is known that for any subset $\mathcal{T} \subseteq \mathcal{G}$, and any covering set $\mathcal{C}_{\mathcal{T}}(\epsilon)$ such that any graph $G \in \mathcal{T}$ has an "$\epsilon$-close" graph $G' \in \mathcal{C}_{\mathcal{T}}(\epsilon)$ satisfying $D(P_G \| P_{G'}) \le \epsilon$, we have [10]

$$P_e \ge 1 - \frac{\log|\mathcal{C}_{\mathcal{T}}(\epsilon)| + n\epsilon + \log 2}{\log|\mathcal{T}|}. \qquad (10)$$

In particular, if $\mathcal{C}_{\mathcal{T}}(\epsilon)$ is a singleton, solving for $n$ gives the necessary condition

$$n \ge \frac{\log|\mathcal{T}|}{\epsilon}\Big(1 - \delta - \frac{\log 2}{\log|\mathcal{T}|}\Big) \qquad (11)$$

in order to have $P_e \le \delta$.

For approximate recovery, we consider ensembles (i.e., choices of $\mathcal{T}$) for which the decoder's outputs may lie in some set $\mathcal{T}'$ without loss of optimality; in most cases we will have $\mathcal{T} = \mathcal{T}'$, but in general, $\mathcal{T}'$ need not even be a subset of the graph class $\mathcal{G}$. We use the following generalization of (11).

Lemma 1. Suppose that the decoder minimizing the average error probability with respect to a distortion level $q_{\max}$, averaged over a graph drawn uniformly from a set $\mathcal{T} \subseteq \mathcal{G}$, always outputs a graph in some set $\mathcal{T}'$. Moreover, suppose that there exists a graph $G_0$ such that $D(P_G \| P_{G_0}) \le \epsilon$ for all $G \in \mathcal{T}$, and that there are at most $A(q_{\max})$ graphs in $\mathcal{T}'$ within an edit distance $q_{\max}$ of any given graph $G \in \mathcal{T}$. Then it is necessary that

$$n \ge \frac{\log|\mathcal{T}| - \log A(q_{\max})}{\epsilon}\Big(1 - \delta - \frac{\log 2}{\log|\mathcal{T}|}\Big) \qquad (12)$$

in order to have $P_e(q_{\max}) \le \delta$.

Proof: See Appendix A.
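The necessary condition (12) is straightforward to evaluate once $\log|\mathcal{T}|$, $\log A(q_{\max})$ and $\epsilon$ are known for a given ensemble. The helper below is an illustration of ours (not from the paper); it also recovers (11) as the special case $A(q_{\max}) = 1$.

```python
import math

def fano_bound(log_T, log_A, eps, delta):
    """Right-hand side of (12):
    n ≥ ((log|T| − log A(q_max)) / ε) · (1 − δ − log2 / log|T|)."""
    return (log_T - log_A) / eps * (1 - delta - math.log(2) / log_T)

# Exact recovery (11) is the special case log_A = 0, i.e. A(q_max) = 1.
n_exact = fano_bound(math.log(1000), 0.0, eps=0.01, delta=0.1)
# A nonzero distortion shrinks the bound, since the decoder effectively
# needs to distinguish fewer graphs.
n_approx = fano_bound(math.log(1000), math.log(10), eps=0.01, delta=0.1)
```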
B. Properties of Ferromagnetic Ising Models

We will use a number of useful results on ferromagnetic Ising models, each of which is either self-evident or can be found in [9] or [10]. We start with some basic properties.

Lemma 2. For any graphs $G$ and $G'$ with edge sets $E$ and $E'$ respectively, we have the following:

(i) For any pair $(i,j)$, we have [9]
$$\mathbb{E}_G[X_iX_j] = 2\,\mathbb{P}_G[X_i = X_j] - 1. \qquad (13)$$

(ii) The divergence between the corresponding distributions satisfies [10, Eq. (4)]
$$D(P_G \| P_{G'}) \le \sum_{(i,j)\in E\setminus E'}\lambda\big(\mathbb{E}_G[X_iX_j] - \mathbb{E}_{G'}[X_iX_j]\big) + \sum_{(i,j)\in E'\setminus E}\lambda\big(\mathbb{E}_{G'}[X_iX_j] - \mathbb{E}_G[X_iX_j]\big) \qquad (14)$$
$$\le \sum_{(i,j)\in E\setminus E'}\lambda\big(1 - \mathbb{E}_{G'}[X_iX_j]\big) + \sum_{(i,j)\in E'\setminus E}\lambda\big(1 - \mathbb{E}_G[X_iX_j]\big). \qquad (15)$$

(iii) If $E' \subset E$, then we have for any pair $(i,j)$ that [10, Eq. (13)]
$$\mathbb{E}_G[X_iX_j] \ge \mathbb{E}_{G'}[X_iX_j]. \qquad (16)$$

(iv) Let $(V_1, \dots, V_K)$ be a partition of $V$ into $K$ disjoint non-empty subsets. If $G$ and $G'$ are such that there are no edges between nodes in $V_i$ and $V_j$ when $i \ne j$, then
$$D(P_G \| P_{G'}) = \sum_{i=1}^K D(P_{G_i} \| P_{G'_i}), \qquad (17)$$
where $G_i = (V, E_i)$, with $E_i$ containing the edges in $E$ between nodes in $V_i$ (and analogously for $G'_i$).

The remaining properties concern the probabilities, expectations and divergences associated with more specific graphs.

Lemma 3. (i) If $G'$ is obtained from $G$ by removing a single edge $(i,j)$, then [10, Eq. (19)]
$$\frac{\mathbb{P}_G[X_i = X_j]}{1 - \mathbb{P}_G[X_i = X_j]} = e^{2\lambda}\,\frac{\mathbb{P}_{G'}[X_i = X_j]}{1 - \mathbb{P}_{G'}[X_i = X_j]} \qquad (18)$$
and [10, Lemma 4]
$$D(P_G \| P_{G'}) \le \lambda\tanh\lambda. \qquad (19)$$

(ii) Let $G$ contain a clique on $m \ge 2$ nodes and no other edges, and let $G'$ be obtained from $G$ by removing a single edge $(i,j)$. Then, defining $\bar{m} := m - 1$, we have [9, Eq. (31)]
$$\frac{\mathbb{P}_{G'}[X_i = X_j]}{1 - \mathbb{P}_{G'}[X_i = X_j]} = \frac{\sum_{j'=0}^{\bar{m}}\binom{\bar{m}}{j'}\exp\big(\frac{\lambda}{2}(2j'-\bar{m})^2\big)\exp\big(2\lambda(2j'-\bar{m})\big)}{\sum_{j'=0}^{\bar{m}}\binom{\bar{m}}{j'}\exp\big(\frac{\lambda}{2}(2j'-\bar{m})^2\big)}. \qquad (20)$$
Moreover, we have [9, Lemma 1]
$$\mathbb{P}_{G'}[X_i = X_j] \ge 1 - \frac{\bar{m}}{\bar{m} + e^{\bar{m}\lambda/2}} \qquad (21)$$
and
$$\mathbb{E}_{G'}[X_iX_j] \ge 1 - \frac{2\bar{m}e^{\lambda}}{e^{\bar{m}\lambda} + \bar{m}e^{\lambda}}. \qquad (22)$$

(iii) Suppose that for some edge $(i,j) \in E\,\Delta\,E'$, there exist at least $m$ node-disjoint paths of length $\ell$ between $i$ and $j$ in $G$. Then [10, Lemma 3]
$$\mathbb{E}_G[X_iX_j] \ge 1 - \frac{2}{1 + \Big(\frac{1+(\tanh\lambda)^{\ell}}{1-(\tanh\lambda)^{\ell}}\Big)^m}. \qquad (23)$$
If the same is true in both $G$ and $G'$ for all $(i,j) \in E\,\Delta\,E'$, then [10, Cor. 3]
$$D(P_G \| P_{G'}) \le \frac{2\lambda|E\,\Delta\,E'|}{1 + \Big(\frac{1+(\tanh\lambda)^{\ell}}{1-(\tanh\lambda)^{\ell}}\Big)^m}. \qquad (24)$$

(iv) More generally, if there exist at least $m_l$ node-disjoint paths of length $\ell_l$ between $i$ and $j$ for $l = 1, \dots, L$, where the values of $\ell_l$ are all distinct, then
$$\mathbb{E}_G[X_iX_j] \ge 1 - \frac{2}{1 + \prod_{l=1}^{L}\Big(\frac{1+(\tanh\lambda)^{\ell_l}}{1-(\tanh\lambda)^{\ell_l}}\Big)^{m_l}}. \qquad (25)$$

IV. GRAPH ENSEMBLES AND LOWER BOUNDS ON THEIR SAMPLE COMPLEXITIES

In this section, we provide necessary conditions for the approximate recovery of a number of ensembles, making use of the tools from the previous section. In particular, we seek choices of $\mathcal{T}$, $\mathcal{T}'$ and $A(q_{\max})$ for substitution into Fano's inequality in Lemma 1. In Section V, we use these to establish our main theorems.

A. Ensemble 1: Many Isolated Edges

This ensemble contains numerous isolated edges, such that if $\lambda$ is small, it is difficult to determine precisely which ones are present. It is constructed as follows with some integer parameter $\alpha \le p/4$:

Ensemble1($\alpha$) [Isolated edges ensemble]:
• Each graph in $\mathcal{T}$ is obtained by forming exactly $\alpha$ node-disjoint edges, which may otherwise be arbitrary.

For this ensemble, we have the following properties:
• The number of graphs satisfies $|\mathcal{T}| = \prod_{i=0}^{\alpha-1}\binom{p-2i}{2} \ge \binom{\lfloor p/2\rfloor}{2}^{\alpha}$, since $p - 2\alpha \ge p/2$ by the assumption $\alpha \le p/4$.
• The maximum degree of each graph is one.
• For this ensemble, it suffices to trivially let $\mathcal{T}'$ contain all graphs.
• The number of graphs within an edit distance $q_{\max}$ of any single graph is upper bounded as $A(q_{\max}) \le \sum_{q=0}^{q_{\max}} \sum_{q'=0}^{q_{\max}-q} \binom{\alpha}{q}\binom{p}{2}^{q'} \le (1+q_{\max})^2 \binom{\alpha}{\lfloor \alpha/2\rfloor} \binom{p}{2}^{q_{\max}}$. Here the term $\binom{\alpha}{q}$ corresponds to choosing $q$ edges to remove, and the term $\binom{p}{2}^{q'}$ upper bounds the number of ways to add $q' \le q_{\max} - q$ new edges. We have also used the fact that $\binom{\alpha}{q}$ is maximized at $q = \lfloor \alpha/2\rfloor$.
• From (19), the KL divergence from a single-edge graph to the empty graph is upper bounded by $\lambda\tanh\lambda$. Using this fact along with (17), any graph in $\mathcal{T}$ has a KL divergence to the empty graph of at most $\epsilon = \alpha\lambda\tanh\lambda$.

Combining these with (12) gives the necessary condition
$$n \ge \frac{\alpha\log\binom{\lfloor p/2\rfloor}{2} - \log\Big((1+q_{\max})^2\binom{\alpha}{\lfloor\alpha/2\rfloor}\binom{p}{2}^{q_{\max}}\Big)}{\alpha\lambda\tanh\lambda}\times\Big(1 - \delta - \frac{\log 2}{\log|\mathcal{T}|}\Big) \quad (26)$$
in order to have $P_e(q_{\max}) \le \delta$.

Simplifying both $\log\binom{\lfloor p/2\rfloor}{2}$ and $\log\binom{p}{2}$ to $(2\log p)(1+o(1))$, and writing $\log\binom{\alpha}{\lfloor\alpha/2\rfloor} \le \alpha\log 2 = o(\alpha\log p)$ as well as $\log(1+q_{\max})^2 \le 2\log(1+\alpha) = o(\alpha\log p)$, we can simplify (26) to
$$n \ge \frac{2\alpha\log p - 2q_{\max}\log p}{\alpha\lambda\tanh\lambda}\big(1 - \delta - o(1)\big), \quad (27)$$
provided that $\alpha \to \infty$ and $q_{\max} \le (1 - \Omega(1))\alpha$. Letting $q_{\max} = \lfloor\theta_1\alpha\rfloor$ for some $\theta_1 \in (0,1)$, this becomes
$$n \ge \frac{2(1-\theta_1)\log p}{\lambda\tanh\lambda}\big(1 - \delta - o(1)\big). \quad (28)$$

B. Ensemble 2: Many Isolated Groups of Nodes

As an alternative to Ensemble 1, this ensemble allows for significantly more edges, in particular permitting $k = \omega(p)$. It is constructed as follows with integer parameters $m$ and $\alpha$:

Ensemble2($m,\alpha$) [Isolated cliques ensemble]:
• Form $\alpha$ fixed groups of nodes, each containing $m$ nodes.
• Each graph in $\mathcal{T}$ is obtained by placing arbitrarily many edges within each group, but no edges between the groups.

For this ensemble, we have the following:
• The number of nodes forming these groups is $m\alpha$.
• The total number of possible edges is $\alpha\binom{m}{2}$, and hence the total number of graphs is $|\mathcal{T}| = 2^{\alpha\binom{m}{2}}$.
• The maximal degree of each graph is at most $m-1$.
• The decoder can output an element of $\mathcal{T}$ without loss of optimality, since any inter-group edges declared to be present are guaranteed to be wrong. Thus, we may set $\mathcal{T}' = \mathcal{T}$.
• The number of graphs within an edit distance $q_{\max}$ of any single graph is $A(q_{\max}) = \sum_{q=0}^{q_{\max}} \binom{\alpha\binom{m}{2}}{q} \le (1+q_{\max})\binom{\alpha\binom{m}{2}}{q_{\max}}$, assuming $q_{\max} \le \frac{1}{2}\alpha\binom{m}{2}$.
• In Lemma 4 below, we show that the KL divergence of the graph associated with one group to the corresponding empty graph is upper bounded by $\binom{m}{2}\lambda\,\frac{e^{2\lambda}\cosh(2\lambda m)-1}{e^{2\lambda}\cosh(2\lambda m)+1}$. Hence, the KL divergence of any $G \in \mathcal{T}$ to the empty graph is upper bounded by $\epsilon = \alpha\binom{m}{2}\lambda\,\frac{e^{2\lambda}\cosh(2\lambda m)-1}{e^{2\lambda}\cosh(2\lambda m)+1}$ due to (17).

Substituting these into (12), setting $q_{\max} = \lfloor\theta_2\alpha\binom{m}{2}\rfloor$ for some $\theta_2 \in \big(0,\frac{1}{2}\big)$, and applying some simplifications, we obtain the following necessary condition for $P_e(q_{\max}) \le \delta$:
$$n \ge \frac{\log 2 - H_2(\theta_2)}{\lambda\,\frac{e^{2\lambda}\cosh(2\lambda m)-1}{e^{2\lambda}\cosh(2\lambda m)+1}}\big(1-\delta-o(1)\big), \quad (29)$$
whenever $\alpha\binom{m}{2} \to \infty$. Note that the binary entropy function arises from the identity $\binom{N}{\lfloor\theta N\rfloor} = e^{N H_2(\theta)(1+o(1))}$ as $N \to \infty$.

It remains to prove the claim on the KL divergence, formalized as follows.

Lemma 4. Let $G$ denote an arbitrary graph with edges connected to at most $m \ge 2$ nodes, and let $G'$ be the empty graph. Then, it holds that
$$D(P_G\|P_{G'}) \le \binom{m}{2}\lambda\,\frac{e^{2\lambda}\cosh(2\lambda m)-1}{e^{2\lambda}\cosh(2\lambda m)+1}. \quad (30)$$

Proof: We prove the claim for the case that $G$ contains a single $m$-clique; the general case then follows in a similar fashion using (16). Let $\bar G$ be obtained from $G$ by removing a single edge, say indexed by $(i,j)$. Defining $q(G) := \mathbb{P}_G[X_i = X_j]$ and $\bar m := m-1$, we have from (18) that
$$\frac{q(G)}{1-q(G)} = e^{2\lambda}\,\frac{q(\bar G)}{1-q(\bar G)}, \quad (31)$$
and from (20) that
$$\frac{q(\bar G)}{1-q(\bar G)} = \frac{\sum_{j=0}^{\bar m}\binom{\bar m}{j}\exp\big(\frac{\lambda}{2}(2j-\bar m)^2\big)\exp\big(2\lambda(2j-\bar m)\big)}{\sum_{j=0}^{\bar m}\binom{\bar m}{j}\exp\big(\frac{\lambda}{2}(2j-\bar m)^2\big)}. \quad (32)$$
Noting the symmetry of the summands with respect to $j$ and $\bar m - j$, we obtain the following when $\bar m$ is odd (the case that $\bar m$ is even is handled similarly, leading to the same conclusion):
$$\frac{q(\bar G)}{1-q(\bar G)} = \frac{\sum_{j=0}^{\lfloor\bar m/2\rfloor}\binom{\bar m}{j}\exp\big(\frac{\lambda}{2}(2j-\bar m)^2\big)\cdot 2\cosh\big(2\lambda(2j-\bar m)\big)}{2\sum_{j=0}^{\lfloor\bar m/2\rfloor}\binom{\bar m}{j}\exp\big(\frac{\lambda}{2}(2j-\bar m)^2\big)} \quad (33)$$
$$\le \max_{j=0,\ldots,\lfloor\bar m/2\rfloor}\cosh\big(2\lambda(2j-\bar m)\big) \quad (34)$$
$$= \cosh\big(2\lambda\bar m\big) \quad (35)$$
$$\le \cosh\big(2\lambda m\big). \quad (36)$$
Substituting (36) into (31), solving for $q(G)$, and converting from probability to expectation via (13), we obtain
$$\mathbb{E}_G[X_iX_j] \le \frac{e^{2\lambda}\cosh(2\lambda m)-1}{e^{2\lambda}\cosh(2\lambda m)+1}. \quad (37)$$
The proof is concluded by substituting into (14) and noting that $\mathbb{E}_{G'}[X_iX_j] = 0$, $|E\setminus E'| = \binom{m}{2}$, and $|E'\setminus E| = 0$.

C. Ensemble 3: Large Inter-Connected Cliques

This ensemble involves cliques with numerous edges between them, making it difficult to determine precisely which inter-clique connections are present, particularly for large cliques and large values of $\lambda$. It is constructed as follows with integer parameters $m$ and $\alpha$:

Ensemble3($m,\alpha$) [Inter-connected cliques ensemble]:
• Construct a fixed "building block" as follows: Take an arbitrary subset of the $p$ vertices of size $2m$, split the $2m$ vertices into two sets of size $m$ each, fully connect each of those sets, and then put $m$ extra edges between the two sets in a fixed but arbitrary one-to-one fashion.
• Form $\alpha$ disjoint copies of this building block to obtain a base graph $G_0$.
• Each graph in $\mathcal{T}$ is formed by taking $G_0$ and adding an arbitrary number of additional edges between each pair of partially-connected cliques. Thus, $G_0$ itself contains the fewest edges within $\mathcal{T}$, and the union of $\alpha$ cliques of size $2m$ contains the most edges.

Figure 1: Building block for Ensemble 3 with $m = 4$ (two $m$-cliques joined by $m$ inter-clique connections).

An illustration of one building block is given in Figure 1.
For this ensemble, we have the following:
• The number of nodes forming these groups is $2m\alpha$, and the number of edges in each graph is upper bounded by $\alpha\binom{2m}{2} \le 2\alpha m^2$.
• The number of potential edges between two $m$-cliques is $m^2$, and $m$ of them are always present in each building block. Hence, the number of ways of adding edges to one building block is $2^{m(m-1)}$, and the total number of graphs is $2^{\alpha m(m-1)}$.
• The maximal degree of each graph is at most $2m-1$.
• Similarly to Ensemble 2, the decoder can output an element of $\mathcal{T}$ without loss of optimality, so that $\mathcal{T}' = \mathcal{T}$.
• The number of graphs within an edit distance $q_{\max}$ of any single graph is $A(q_{\max}) = \sum_{q=0}^{q_{\max}}\binom{\alpha m(m-1)}{q} \le (1+q_{\max})\binom{\alpha m(m-1)}{q_{\max}}$, assuming $q_{\max} \le \frac{1}{2}\alpha m(m-1)$.
• In Lemma 5 below, we show that the KL divergence of the graph associated with one group to the $2m$-clique graph is upper bounded by $12\lambda m^4 e^{-\lambda(m-1)/2}$. Thus, the KL divergence from any $G \in \mathcal{T}$ to the union of $\alpha$ $2m$-cliques is upper bounded by $\epsilon = 12\lambda\alpha m^4 e^{-\lambda(m-1)/2}$ due to (17).

Substituting these into (12), setting $q_{\max} = \lfloor\theta_3\alpha m(m-1)\rfloor$ for some $\theta_3 \in \big(0,\frac12\big)$, and simplifying, we obtain
$$n \ge \frac{e^{\lambda(m-1)/2}\big(\log 2 - H_2(\theta_3)\big)}{12\lambda m^2}\big(1-\delta-o(1)\big), \quad (38)$$
whenever $\alpha m(m-1) \to \infty$.

It remains to prove the claim on the KL divergence, formalized as follows.

Lemma 5. Let $G$ denote the graph corresponding to a single group in Ensemble 3, and let $G'$ be the corresponding graph containing a $2m$-clique. Then
$$D(P_G\|P_{G'}) \le 12\lambda m^4 e^{-\lambda(m-1)/2}. \quad (39)$$

Proof: We focus on the case that $G$ is the building block obtained by forming two cliques of size $m$ and connecting $m$ edges between them; the case that further edges are present is handled similarly using (16). From (16) and (21), we have for any $(i,j)$ within either of the two $m$-cliques that
$$\mathbb{P}_G[X_i=X_j] \ge 1 - \frac{\bar m}{\bar m + e^{\bar m\lambda/2}} \quad (40)$$
$$\ge 1 - \frac{m}{m + e^{\bar m\lambda/2}}, \quad (41)$$
where $\bar m := m-1$.
By taking an arbitrary node from each clique and applying the union bound over the $2(m-1) \le 2m$ events corresponding to other nodes in the clique taking a different value from that node, we find that the probability that each of the cliques has nodes that all take the same value satisfies the following:
$$\mathbb{P}_G[\text{all nodes same within each clique}] \ge 1 - \frac{2m^2}{m + e^{\bar m\lambda/2}}. \quad (42)$$
Next, we consider the probabilities of the two cliques taking a common value vs. two different values. Letting $A_{\nu,\sigma}$ be the event that the $\nu$-th clique has values all equal to $\sigma \in \{+1,-1\}$, we have from (1) that
$$\mathbb{P}_G[A_{1,+}\cap A_{2,+}] = \frac1Z\exp\Big(\lambda\Big(2\binom m2 + m\Big)\Big) \quad (43)$$
$$\mathbb{P}_G[A_{1,+}\cap A_{2,-}] = \frac1Z\exp\Big(\lambda\Big(2\binom m2 - m\Big)\Big). \quad (44)$$
Taking the ratio between the two gives
$$\frac{\mathbb{P}_G[A_{1,+}\cap A_{2,+}]}{\mathbb{P}_G[A_{1,+}\cap A_{2,-}]} = e^{2m\lambda}. \quad (45)$$
By the same argument, this is also the ratio between any analogous events with the same signs in the numerator and differing signs in the denominator. The same argument also applies when we condition on each of the two cliques having common-valued nodes; in this case, the left-hand side of (45) simply amounts to $\frac{\psi}{1-\psi}$, where $\psi$ is the conditional probability that all of the $2m$ nodes making up the two cliques take the same value. Equating $\frac{\psi}{1-\psi} = e^{2m\lambda}$ in accordance with (45) and solving for $\psi$, we obtain the following:
$$\mathbb{P}_G[\text{all nodes same}\mid\text{all nodes same within each clique}] = 1 - \frac{1}{1+e^{2m\lambda}}, \quad (46)$$
where "all nodes" refers to the $2m$ nodes making up the two cliques. Multiplying this with (42) gives
$$\mathbb{P}_G[\text{all nodes same}] \ge 1 - \frac{2m^2}{m+e^{\bar m\lambda/2}} - \frac{1}{1+e^{2m\lambda}}. \quad (47)$$
Using this fact along with (13), we have for all $(i,j)$, even in different cliques, that
$$\mathbb{E}_G[X_iX_j] \ge 1 - \frac{4m^2}{m+e^{\bar m\lambda/2}} - \frac{2}{1+e^{2m\lambda}}. \quad (48)$$
Finally, the number of edges that are in the complete graph $G'$ but not in $G$ is trivially upper bounded by $\binom{2m}{2} \le 2m^2$, and thus substitution into (15) yields
$$D(P_G\|P_{G'}) \le 2\lambda m^2\Big(\frac{4m^2}{m+e^{\lambda(m-1)/2}} + \frac{2}{1+e^{2\lambda m}}\Big). \quad (49)$$
The proof is concluded by writing
$$\frac{4m^2}{m+e^{\lambda(m-1)/2}} + \frac{2}{1+e^{2\lambda m}} \le \frac{4m^2}{e^{\lambda(m-1)/2}} + \frac{2}{e^{2\lambda m}} \quad (50)$$
$$\le \frac{6m^2}{e^{\lambda(m-1)/2}}. \quad (51)$$

Remark 1. In this ensemble, there are $\alpha m^2$ edges known with certainty, and a possible further $\alpha m(m-1)$ that are unknown. Thus, slightly more than half of the potential edges are known. This limits the values of $q_{\max}$ that are meaningful when applying this ensemble, and is the reason for the constraints on $q_{\max}$ (e.g., $q_{\max} \le k/4$) in Theorems 1–4. However, one can generalize this ensemble by considering more than two groups of $m$-cliques such that each pair has $m$ inter-clique connections. With this extension, the fraction of potential edges that are known can be made arbitrarily close to zero, and similar results to those shown in Table I for $\mathcal{G}_k$ (respectively, $\mathcal{G}_{k,d}$) can be obtained even when $q_{\max} = \big\lfloor\theta\frac k2\big\rfloor$ (respectively, $q_{\max} = \big\lfloor\theta\frac k2\cdot\frac{d-2}{d}\big\rfloor$) for some $\theta \in (0,1)$.

Figure 2: Building block for Ensemble 4 with $\eta_1 = 5$, $\eta_2 = 2$, $m = 2$, and $\ell = 3$ ($\eta_1$ center nodes, with $\eta_2$ length-2 paths and $m$ length-$\ell$ paths between consecutive pairs).

D. Ensemble 4: Many Node-Disjoint Paths

This ensemble is based on forming a large number of node-disjoint paths between pairs of nodes, making it difficult to determine whether or not direct edges also exist between those nodes [10]. It is constructed as follows with integer parameters $\eta_1$, $\eta_2$, $m$, $\ell$, $\alpha$:

Ensemble4($\eta_1,\eta_2,m,\ell,\alpha$) [Disjoint paths ensemble]:
• Take an arbitrary subset of the $p$ vertices of size $\eta_1$ and label them $1, 2, \ldots, \eta_1$.
For each consecutive pair of these nodes, including the wrapped-around pair $(\eta_1, 1)$, form $\eta_2$ node-disjoint paths of length two between them, and also form $m$ node-disjoint paths of length $\ell$ between them.
• Form a base graph $G_0$ by taking $\alpha$ copies of this graph.
• Each graph in $\mathcal{T}$ is formed by taking $G_0$ and adding arbitrarily many edges among the $\eta_1$ "center" nodes of each building block. Thus, $G_0$ itself has the fewest edges, whereas the graph with $\alpha\binom{\eta_1}{2}$ additional center edges contains the most edges.

An illustration of one building block is shown in Figure 2.

For this ensemble, we have the following:
• The number of nodes within each building block is $\eta_1(1+\eta_2+m(\ell-1))$, and hence the total number of nodes is $\alpha\eta_1(1+\eta_2+m(\ell-1))$.
• Within each building block, there are up to $\binom{\eta_1}{2}$ edges in the center, as well as $2\eta_1\eta_2$ further edges forming paths of length two, and $m\eta_1\ell$ edges forming paths of length $\ell$. Hence, the total number of edges is between $\alpha\eta_1(2\eta_2+m\ell)$ and $\alpha\eta_1\big((\eta_1-1)/2+2\eta_2+m\ell\big)$.
• The total number of graphs is $|\mathcal{T}| = 2^{\alpha\binom{\eta_1}{2}}$.
• The maximal degree is less than $\eta_1+2\eta_2+2m$.
• Similarly to Ensembles 2 and 3, we may set $\mathcal{T}' = \mathcal{T}$.
• The number of graphs within an edit distance $q_{\max}$ of any given graph is $A(q_{\max}) = \sum_{q=0}^{q_{\max}}\binom{\alpha\binom{\eta_1}{2}}{q} \le (1+q_{\max})\binom{\alpha\binom{\eta_1}{2}}{q_{\max}}$, assuming $q_{\max} \le \frac12\alpha\binom{\eta_1}{2}$.
• Using Lemma 6 below, along with (17), the KL divergence from any graph in $\mathcal{T}$ to the corresponding graph with all centers connected is upper bounded by $\epsilon = \frac{2\lambda\alpha\eta_1\binom{\eta_1}{2}}{1+(\cosh(2\lambda))^{\eta_2}\big(\frac{1+(\tanh\lambda)^\ell}{1-(\tanh\lambda)^\ell}\big)^m}$.

Substituting these into (12) and setting $q_{\max} = \lfloor\theta_4\alpha\binom{\eta_1}{2}\rfloor$ for some $\theta_4 \in \big(0,\frac12\big)$ gives
$$n \ge \frac{1+(\cosh(2\lambda))^{\eta_2}\Big(\frac{1+(\tanh\lambda)^\ell}{1-(\tanh\lambda)^\ell}\Big)^m}{2\lambda\eta_1}\big(\log 2 - H_2(\theta_4)\big)\times\big(1-\delta-o(1)\big) \quad (52)$$
provided that $\alpha\binom{\eta_1}{2} \to \infty$.

It remains to prove the claim on the KL divergence, formalized as follows.
Lemma 6. Let $G$ denote the graph corresponding to a single group in the construction in Ensemble 4, and let $G'$ be the corresponding building block with all of the center nodes connected. Then
$$D(P_G\|P_{G'}) \le \frac{2\lambda\eta_1\binom{\eta_1}{2}}{1+(\cosh(2\lambda))^{\eta_2}\Big(\frac{1+(\tanh\lambda)^\ell}{1-(\tanh\lambda)^\ell}\Big)^m}. \quad (53)$$

Proof: We focus on the case that $G$ is the building block described above; the case that further edges are present is handled similarly using (16). We know from (25) that the joint distribution between any two consecutive nodes in the center satisfies
$$\mathbb{E}_G[X_iX_j] \ge 1 - \frac{2}{1+(\cosh(2\lambda))^{\eta_2}\Big(\frac{1+(\tanh\lambda)^\ell}{1-(\tanh\lambda)^\ell}\Big)^m}, \quad (54)$$
since $\frac{1+(\tanh\lambda)^2}{1-(\tanh\lambda)^2} = \cosh(2\lambda)$. Using (13), this implies
$$\mathbb{P}_G[X_i=X_j] \ge 1 - \frac{1}{1+(\cosh(2\lambda))^{\eta_2}\Big(\frac{1+(\tanh\lambda)^\ell}{1-(\tanh\lambda)^\ell}\Big)^m}. \quad (55)$$
Thus, by applying the union bound over $(i,j)$ pairs of the form $(1,2),(2,3),\ldots,(\eta_1-1,\eta_1),(\eta_1,1)$, the probability that all $\eta_1$ of the center nodes take the same value satisfies
$$\mathbb{P}_G[\text{all center nodes same}] \ge 1 - \frac{\eta_1}{1+(\cosh(2\lambda))^{\eta_2}\Big(\frac{1+(\tanh\lambda)^\ell}{1-(\tanh\lambda)^\ell}\Big)^m}. \quad (56)$$
Again using (13), this implies for any pair of center nodes $(i,j)$, including non-adjacent pairs, that
$$\mathbb{E}_G[X_iX_j] \ge 1 - \frac{2\eta_1}{1+(\cosh(2\lambda))^{\eta_2}\Big(\frac{1+(\tanh\lambda)^\ell}{1-(\tanh\lambda)^\ell}\Big)^m}. \quad (57)$$
Observing that the corresponding edge sets $E$ and $E'$ satisfy $|E'\setminus E| \le \binom{\eta_1}{2}$ and $|E\setminus E'| = 0$, (53) follows from (15).

V. APPLICATIONS TO GRAPH FAMILIES

Finally, we prove our main results by applying the ensembles from the previous section to the graph families introduced in Section I-A. All of the necessary conditions on $n$ stated in this section are those needed to obtain $P_e(q_{\max}) \le \delta$, where the graph class defining $P_e(\cdot)$ will be clear from the context.

A.
Proofs of Theorems 1–2: Bounded Edges Ensemble

For the class $\mathcal{G}_k$ of graphs with at most $k$ edges, we have the following:
• If $k \le p/4$, then using Ensemble 1 with $\alpha = k$, we obtain from (28) that
$$n \ge \frac{2(1-\theta_1)\log p}{\lambda\tanh\lambda}\big(1-\delta-o(1)\big) \quad (58)$$
provided that $q_{\max} \le \lfloor\theta_1 k\rfloor$ for some $\theta_1 \in (0,1)$.
• If $k = \lfloor cp^{1+\nu}\rfloor$ for some $c>0$ and $\nu\in[0,1)$, then we use Ensemble 2 with $m = \lfloor 2cp^{\nu}\rfloor$ and $\alpha = \lfloor p/m\rfloor = \frac{1}{2c}p^{1-\nu}(1+o(1))$, chosen so that $m\alpha \le p$ nodes are used in the construction. The number of possible edges is $\alpha\binom m2 \le \frac12\alpha m^2 \le \frac12 pm \le cp^{1+\nu}$, as desired. We obtain from (29) that
$$n \ge \frac{\log2 - H_2(\theta_2)}{\lambda\,\frac{e^{2\lambda}\cosh(2\lambda cp^{\nu})-1}{e^{2\lambda}\cosh(2\lambda cp^{\nu})+1}}\big(1-\delta-o(1)\big) \quad (59)$$
provided that $q_{\max} \le \lfloor\theta_2\alpha\binom m2\rfloor$ for some $\theta_2\in\big(0,\frac12\big)$. Substituting the choices of $m$ and $\alpha$ into the latter expression, we find that $q_{\max}$ can be as large as $\theta_2 k(1+o(1))$.
• We use Ensemble 3 with $\alpha = 1$ and $m = \lfloor\sqrt{k/2}\rfloor$, chosen so that the number of edges does not exceed $2\alpha m^2 \le k$. With these choices, we obtain from (38), along with the identity $\lfloor m\rfloor \ge m-1$, that
$$n \ge \frac{e^{\lambda(\sqrt{k/2}-2)/2}\big(\log2 - H_2(\theta_3)\big)}{6\lambda k}\big(1-\delta-o(1)\big), \quad (60)$$
provided that $q_{\max} \le \lfloor\theta_3\alpha m(m-1)\rfloor$ for some $\theta_3\in\big(0,\frac12\big)$. Substituting the choices of $m$ and $\alpha$ into the latter expression, we find that $q_{\max}$ can be as large as $\theta_3\frac k2(1+o(1))$, provided that $k\to\infty$. Note that this construction uses $2m\alpha \le \sqrt{2k}$ nodes, which is asymptotically less than $p$ since $k = o(p^2)$.

We obtain Theorem 1 from (58) and (60), and Theorem 2 from (59) and (60). Specifically, we set $q_{\max} = \lfloor\theta k\rfloor$ for some $\theta\in\big(0,\frac14\big)$, and by equating this with the above upper bounds on $q_{\max}$ we see that we may set $\theta_1 = \theta$, $\theta_2 = \theta(1+o(1))$ and $\theta_3 = 2\theta(1+o(1))$.

B.
Proofs of Theorems 3–4: Bounded Degree Ensemble

For the class $\mathcal{G}_{k,d}$ of graphs such that every node has degree at most $d$, and the total number of edges does not exceed $k$, we have the following:
• If $k \le p/4$, then using Ensemble 1 with $\alpha = k$, we obtain from (28) that
$$n \ge \frac{2(1-\theta_1)\log p}{\lambda\tanh\lambda}\big(1-\delta-o(1)\big), \quad (61)$$
provided that $q_{\max} \le \lfloor\theta_1 k\rfloor$ for some $\theta_1 \in (0,1)$.
• In the case that $k = \Omega(p)$, we use Ensemble 2 with the following parameters: 1) $m = d' \le d$, chosen so that the maximal degree $m-1$ does not exceed $d$; 2) $\alpha = \big\lfloor k/\binom{d'}{2}\big\rfloor$, chosen so that the number of edges $\alpha\binom m2$ does not exceed $k$. With these choices, we obtain from (29) that
$$n \ge \frac{\log2-H_2(\theta_2)}{\lambda\,\frac{e^{2\lambda}\cosh(2\lambda d')-1}{e^{2\lambda}\cosh(2\lambda d')+1}}\big(1-\delta-o(1)\big), \quad (62)$$
whenever $q_{\max} \le \lfloor\theta_2\alpha\binom{d'}{2}\rfloor$ for some $\theta_2\in\big(0,\frac12\big)$. Substituting the choice of $\alpha$, we find that $q_{\max}$ can be as large as $\theta_2 k(1+o(1))$. Note also that the number of nodes used is upper bounded as $\alpha m \le \frac{k}{\binom{d'}{2}}d' = \frac{2k}{d'-1}$, which is upper bounded by $p$ provided that $k \le \frac12 p(d'-1)$.
• We use Ensemble 3 with the following parameters: 1) $m = \lceil d/2\rceil$, chosen so that each block has nodes with degree not exceeding $2m-1 \le d$; 2) $\alpha = \big\lfloor k/\binom{2m}{2}\big\rfloor$, chosen to ensure that the number of edges does not exceed $\alpha\binom{2m}{2} \le k$. With these choices, we obtain from (38) that
$$n \ge \frac{e^{\lambda(d-2)/4}\big(\log2-H_2(\theta_3)\big)}{3\lambda d^2}\big(1-\delta-o(1)\big), \quad (63)$$
when $q_{\max} \le \lfloor\theta_3\alpha m(m-1)\rfloor$ for some $\theta_3\in\big(0,\frac12\big)$. Substituting the choice of $\alpha$ to obtain $\alpha m(m-1) = k\frac{m-1}{2m-1}(1+o(1))$, and then writing $d/2 \le m \le (d+1)/2$, we find that the latter condition holds provided that $q_{\max} \le \frac{d/2-1}{d}\theta_3 k(1+o(1))$. The number of nodes used is $2m\alpha \le \frac{2mk}{m(2m-1)} = \frac{2k}{2m-1} \le \frac{2k}{d-1}$, which is upper bounded by $p$ provided that $k \le \frac12 p(d-1)$.

We obtain Theorem 3 from (61) and (63), and Theorem 4 from (62) and (63).
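As a quick numerical sanity check (not part of the original analysis), lower bounds such as (62) are easy to evaluate for concrete parameter values. The sketch below uses our own function names, drops the $o(1)$ term, and works in nats; it illustrates that the bound grows as the edges weaken (smaller $\lambda$), consistent with the discussion of the numerical results later on.

```python
import math

def h2(t):
    """Binary entropy H_2(t) in nats."""
    return -t * math.log(t) - (1 - t) * math.log(1 - t)

def bound_62(lam, d_prime, theta2, delta):
    """Evaluate the right-hand side of (62), dropping the o(1) term:
    n >= (log 2 - H_2(theta2)) / (lam * r) * (1 - delta), where
    r = (e^{2 lam} cosh(2 lam d') - 1) / (e^{2 lam} cosh(2 lam d') + 1)."""
    c = math.exp(2 * lam) * math.cosh(2 * lam * d_prime)
    r = (c - 1) / (c + 1)
    return (math.log(2) - h2(theta2)) / (lam * r) * (1 - delta)

# Weaker edges force more samples: the bound at lambda = 0.1 exceeds
# the bound at lambda = 1 for the same d', theta2, delta.
assert bound_62(0.1, 5, 0.25, 0.1) > bound_62(1.0, 5, 0.25, 0.1) > 0
```

Note that for moderate $\lambda$ the ratio $r$ approaches $1$, so the bound of (62) reduces to an $\Omega(1/\lambda)$ scaling, whereas for small $\lambda$ it behaves like $\Omega(1/\lambda^2)$ (since $r = \Theta(\lambda)$ in that regime).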
Similarly to the previous subsection, we set $\theta_1 = \theta$, $\theta_2 = \theta(1+o(1))$, and $\theta_3 = \frac{d}{d-2}\cdot 2\theta(1+o(1))$.

C. Proof of Theorem 5: Sparse Separator Ensemble

For the class $\mathcal{G}_{k,d,\eta,\gamma}$ (cf. Section II-C), we have the following:
• If $k \le p/4$, then again using Ensemble 1 with $\alpha = k$, we obtain from (27) that
$$n \ge \frac{2(k-q_{\max})\log p}{k\lambda\tanh\lambda}\big(1-\delta-o(1)\big). \quad (64)$$
• We use Ensemble 4 with the following parameters: 1) $\eta_1 = \lfloor c\eta\rfloor$ and $\eta_2 = \lfloor(1-c)\eta\rfloor$ for some $c\in\big(\frac1\eta,1\big)$, thus ensuring that $\eta_1 \ge 1$; 2) $\ell = \gamma+1$, chosen to ensure that the $(\eta,\gamma)$-separation condition is satisfied; 3) $m \le d/2-\eta$, chosen so that the maximal degree is upper bounded by $\eta_1+2\eta_2+2m \le 2\eta+2m \le d$; 4) $\alpha = \big\lfloor\frac{k}{c\eta(c\eta/2+2(1-c)\eta+m(\gamma+1))}\big\rfloor$, chosen to ensure the total number of edges $\alpha\eta_1\big((\eta_1-1)/2+2\eta_2+m\ell\big)$ does not exceed $k$.

With these choices, we obtain from (52) that
$$n \ge \frac{1+(\cosh(2\lambda))^{(1-c)\eta-1}\Big(\frac{1+(\tanh\lambda)^{\gamma+1}}{1-(\tanh\lambda)^{\gamma+1}}\Big)^m}{2\lambda c\eta}\times\big(\log2-H_2(\theta_4)\big)\big(1-\delta-o(1)\big) \quad (65)$$
provided that $q_{\max} \le \lfloor\theta_4\alpha(c\eta-1)^2/2\rfloor$ for some $\theta_4\in\big(0,\frac12\big)$. Here we have used $\zeta-1 \le \lfloor\zeta\rfloor \le \zeta$ and $\binom{c\eta}{2} \ge (c\eta-1)^2/2$.

Note that the graph in this ensemble with the most edges has at least as many edges as nodes, since each node is connected to at least two edges. Thus, since we have assumed $k \le p/4$ and we have already chosen the parameters to ensure there are at most $k$ edges, we have also ensured that fewer than $p$ nodes are used.

Substituting the above choice of $\alpha$ into the upper bound on $q_{\max}$, we find that $q_{\max}$ can be as large as
$$\Big\lfloor\frac{\theta_4(c\eta-1)^2\,k}{2c\eta\big(c\eta/2+2(1-c)\eta+m(\gamma+1)\big)}\Big\rfloor \ge \Big\lfloor\frac{\theta_4(c\eta-1)^2\,k}{2c\eta\big(2\eta+m(\gamma+1)\big)}\Big\rfloor, \quad (66)$$
since $c\eta/2+2(1-c)\eta \le 2\eta$ for $c\in[0,1]$.

We obtain Theorem 5 by combining (64), (65) and (66), and renaming $\theta_4$ as $\theta$.

VI.
NUMERICAL RESULTS

In this section, we simulate the graph learning problem for some of the ensembles presented in Section IV, as well as the analogous ensembles used for exact recovery in [9], [10]. Before proceeding, we discuss the optimal decoding techniques for the two recovery criteria.

Suppose that the graph $G$ is uniformly drawn from some class $\mathcal{G}$. In the case of exact recovery, the optimal decoder is the maximum-likelihood (ML) rule
$$\hat G = \arg\max_{G\in\mathcal{G}} P_G[\mathbf X], \quad (67)$$
where $P_G[\mathbf X]$ is the probability of observing the samples $\mathbf X \in \{0,1\}^{n\times p}$ when the true graph is $G$. In contrast, the optimal rule for approximate recovery is given by
$$\hat G = \arg\max_{G\in\mathcal{G}} \sum_{G'\,:\,|E\,\Delta\,E'| \le q_{\max}} P_{G'}[\mathbf X], \quad (68)$$
where $E$ and $E'$ are the edge sets of $G$ and $G'$ respectively.

Both (67) and (68) are, in general, computationally intractable, requiring a search over the entire space $\mathcal{G}$. However, in the examples below, we are able to apply (67) by using various tricks such as symmetry arguments. While we need to consider relatively small graph sizes for Ensembles 3 and 4, these will still be adequate for generating results that support the theory. Unfortunately, we found the implementation of (68) much more difficult, and we therefore also use (67) for approximate recovery even though, in general, it is only optimal for exact recovery. Nevertheless, even with approximate recovery, we expect ML to provide a benchmark that is unlikely to be beaten by any practical methods.

In all of the experiments, the error probabilities are obtained by evaluating the empirical average over 5000 trials.

A. A Variant of Ensemble 1 and a Counterpart from [9]

It was shown in [9] that if one considers all graphs with a single edge, then it is difficult to distinguish each of these from the empty graph if $\lambda$ is small, thus making exact recovery difficult. In Figure 3, we simulate the performance of this ensemble with $p = 100$.
Since the partition function $Z$ (see (1)) is the same for all graphs in this ensemble, the ML rule (67) simply amounts to declaring the single edge to be the pair $(i,j)$, among the $\binom p2$ possibilities, such that $X_i = X_j$ in the highest number of samples.

Our Ensemble 1 is analogous to the single-edge ensemble from [9]; however, in order to facilitate the computation, we consider a slight variant defined as follows:

Ensemble1a($\alpha$) [Isolated edges ensemble]:
• Group the $p$ vertices into $p/2$ fixed pairs in an arbitrary manner.
• Each graph in $\mathcal{T}$ is obtained by connecting exactly $\alpha$ of those $p/2$ pairs.

Note that Ensemble 1a can be interpreted as a genie-aided version of Ensemble 1, where the decoder is given information narrowing the $\prod_{i=0}^{\alpha-1}\binom{p-2i}{2}$ possible graphs down to a smaller set of size $\binom{p/2}{\alpha}$. For this reason, the performance under Ensemble 1a is an optimistic estimate of the performance under Ensemble 1, and moving to the latter should only narrow the gaps seen in our comparisons to [9].

Figure 3 plots the approximate recovery error probability for Ensemble 1a with $p = 100$ and $\alpha = 12$, setting $q_{\max} = 3$ so that up to a quarter of the edges may be in error. The maximum-likelihood rule (67) is simple to implement: Since all graphs have the same partition function, the most likely graph corresponds to choosing the $\alpha$ edges among the $p/2$ potential edges such that the corresponding pairs of nodes agree in as many observations as possible. This can be implemented by simply counting the number of agreements for each of the $p/2$ pairs and then sorting.

In accordance with our theory, the general behavior of the error probability as a function of $n$ is similar for Ensemble 1a (approximate recovery) and the ensemble from [9] (exact recovery). Moving to approximate recovery does provide some benefit, but it appears to be only in the constant factors.
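The counting-and-sorting decoder for Ensemble 1a can be sketched as follows (a simplified illustration with our own function names and parameter values; we exploit the fact that, under this ensemble, a connected pair agrees with probability $e^{\lambda}/(2\cosh\lambda)$ while all other nodes are i.i.d. uniform on $\{-1,+1\}$, which makes sampling straightforward):

```python
import numpy as np

def ml_decode_ensemble1a(X, pairs, alpha):
    """ML decoding for Ensemble 1a: count, for each fixed pair (i, j),
    the number of samples with X_i == X_j, and declare the alpha pairs
    with the most agreements to be the connected ones."""
    agreements = np.array([(X[:, i] == X[:, j]).sum() for (i, j) in pairs])
    top = np.argsort(-agreements)[:alpha]  # indices of the top-alpha pairs
    return {pairs[t] for t in top}

rng = np.random.default_rng(0)
p, n, lam, alpha = 20, 500, 0.5, 3
pairs = [(2 * i, 2 * i + 1) for i in range(p // 2)]
true_edges = set(pairs[:alpha])

# Sample: start i.i.d. uniform, then resample the second node of each
# connected pair to agree with probability e^lam / (2 cosh(lam)).
X = rng.choice([-1, 1], size=(n, p))
q_agree = np.exp(lam) / (2 * np.cosh(lam))
for (i, j) in true_edges:
    same = rng.random(n) < q_agree
    X[:, j] = np.where(same, X[:, i], -X[:, i])

decoded = ml_decode_ensemble1a(X, pairs, alpha)
```

With these parameters the agreement counts of the connected pairs concentrate around $0.73n$ versus $0.5n$ for the unconnected ones, so recovery is essentially certain; the experiments in Figure 3 operate at much smaller separations, where errors occur.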
More specifically, across the range shown, the number of measurements required to achieve a given error probability in $[0.01, 0.5]$ differs for the two ensembles and recovery criteria only by a multiplicative factor in the range $[1, 2.2]$. In both cases, the learning problem becomes increasingly difficult as $\lambda$ becomes smaller, since the edges are weaker and therefore more difficult to detect.

Figure 3: Empirical performance for Ensemble 1a (approximate recovery; red bold) and its counterpart from [9] (exact recovery; blue non-bold). (Axes: number of measurements vs. error probability; curves for $\lambda = 0.1, 0.5, 1$.)

B. Ensemble 3 and a Counterpart from [9]

A counterpart to Ensemble 3 from [9] considers the $\binom{m_0}{2}$ possible graphs on $m_0$ nodes obtained by removing a single edge from the $m_0$-clique. Thus, every graph is difficult to distinguish from the $m_0$-clique, particularly as $m_0$ and $\lambda$ increase, and exact recovery is difficult. In Figure 4, we plot the performance of this ensemble with $m_0 = 8$. In this case, ML decoding amounts to choosing the pair $(i,j)$ such that $X_i \ne X_j$ in the highest number of samples.

For comparison, we consider Ensemble 3 with $m = 4$ and $\alpha = 1$, chosen so that the maximal number of edges and degree match those of the ensemble from [9] with $m_0 = 8$. We set $q_{\max} = 3$, so that up to a quarter of the 12 unknown edges may be in error. We perform ML decoding using a brute-force search over the $2^{12}$ possible graphs.

Compared to the previous example, the gaps between the curves for approximate recovery and exact recovery are more significant. This is because although both our results and those of [9] prove that the sample complexity is exponential in $\lambda m$, the exponent in [9] is double that of ours. Intuitively, this is because we work with cliques of half the size.
Despite this, the general behavior of our curves and those of [9] is similar, with the sample complexity rapidly growing large as $\lambda$ increases due to higher correlations among the 8 nodes.

Figure 4: Empirical performance for Ensemble 3 (approximate recovery; red bold) and its counterpart from [9] (exact recovery; blue non-bold). (Axes: number of measurements vs. error probability; curves for $\lambda = 0.1, 0.5, 0.75$.)

C. Ensemble 4 and a Counterpart from [10]

A counterpart to Ensemble 4 from [10] first constructs $\alpha$ disjoint building blocks, each of which connects two nodes $(i,j)$, and then forms $\eta$ node-disjoint paths of length 2 between them. Each graph in the ensemble is then obtained by removing the direct edge from one of the $\alpha$ building blocks, while leaving the length-2 paths unchanged. We consider this construction with $\alpha = 4$ and $\eta = 8$, thus leading to the use of $p = 40$ nodes and $k = 68$ edges, and a maximal degree $d = 9$. Figure 5 plots the performance of the ML decoder, which amounts to counting the number of agreements between the $\alpha$ pairs of "central" nodes (one per building block), and declaring the edge to be absent in the one with the most disagreements.

For comparison, we consider Ensemble 4 with $\eta_1 = 4$, $\eta_2 = 3$, $m = 0$ and $\alpha = 2$; this construction uses $p = 32$ nodes and $k = 60$ edges, and has a maximal degree $d = 9$, thus being comparable to the above construction from [10]. We set $q_{\max} = 3$, so that up to a quarter of the 12 unknown edges may be in error. We perform ML decoding using a brute-force search over the $2^{12}$ possible graphs, which simplifies to performing ML separately on the $2^6$ possible graphs corresponding to each of the two building blocks.

Figure 5: Empirical performance for Ensemble 4 (approximate recovery; red bold) and its counterpart from [10] (exact recovery; blue non-bold). (Axes: number of measurements vs. error probability; curves for $\lambda = 0.1, 0.75, 1$.)
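The brute-force ML searches used in these experiments can be sketched generically as follows (function names are our own; the likelihood is computed exactly by enumerating all $2^p$ configurations to obtain the partition function, which is feasible only for the small graphs considered here). The tiny demonstration at the end distinguishes the empty graph from a single-edge graph.

```python
import itertools
import math
import numpy as np

def log_likelihood(X, edges, lam):
    """Exact log-likelihood of samples X (rows in {-1,+1}^p) under the
    zero-field Ising model with the given edge set; the partition
    function is computed by brute force over all 2^p configurations."""
    n, p = X.shape
    E = list(edges)
    log_terms = [lam * sum(x[i] * x[j] for (i, j) in E)
                 for x in itertools.product((-1, 1), repeat=p)]
    logZ = np.logaddexp.reduce(log_terms)
    stat = sum((X[:, i] * X[:, j]).sum() for (i, j) in E)  # sufficient statistic
    return lam * stat - n * logZ

def ml_decode(X, candidate_edge_sets, lam):
    """Brute-force ML rule over a small class of candidate edge sets."""
    return max(candidate_edge_sets, key=lambda E: log_likelihood(X, E, lam))

# Demonstration: empty graph vs. the single-edge graph {(0, 1)} on p = 3 nodes.
rng = np.random.default_rng(1)
n, lam = 200, 1.0
X = rng.choice([-1, 1], size=(n, 3))
same = rng.random(n) < math.exp(lam) / (2 * math.cosh(lam))
X[:, 1] = np.where(same, X[:, 0], -X[:, 0])
best = ml_decode(X, [frozenset(), frozenset({(0, 1)})], lam)
```

In the actual experiments, the per-block decomposition noted above keeps the enumeration tractable: the likelihood factorizes across disjoint building blocks (cf. (17)), so ML can be run separately on each block's candidate edge sets.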
Once again, we observe the same general behavior between our ensemble and that of [10]. While it may appear unusual that the exact recovery curves have a smaller error probability at low values of $n$, this occurs because even a random guess achieves a probability of exact recovery of $\frac14$ for the ensemble in [10] with $\alpha = 4$. Despite this, we see that approximate recovery is easier for large $n$ as expected, and that in both cases the recovery problem rapidly becomes more difficult as $\lambda$ increases due to higher correlations among the nodes.

VII. CONCLUSION

We have provided information-theoretic lower bounds on Ising model selection with approximate recovery for a variety of graph classes. For a wide range of scaling regimes of the relevant parameters, we have obtained necessary conditions with the same scaling laws as the best known conditions for exact recovery, thus indicating that approximate recovery is not much easier in the minimax sense. To this end, we presented a generalized form of Fano's inequality for handling approximate recovery, and applied it to a variety of graph ensembles. These were broadly categorized into those where it is difficult to distinguish each graph from the empty graph, and those where it is difficult to determine which edges between highly-correlated groups of nodes are present. In both cases, we required a departure from the ensembles considered for exact recovery [9], [10], in which the graphs differ in only one or two edges.

It would be interesting to determine to what extent approximate recovery can help when we move beyond the minimax performance criterion and the edit distance. For example, significant gains may be possible in the setting of random Ising model edge weights $\{\lambda_{ij}\}$, since it may become safe to "ignore" the weakest edges.
As another example, since our analysis is based on constructing ensembles of graphs whose KL divergence to a single graph is small, one may expect that under a recovery criterion based on $D(P_G\|P_{\hat G})$ being small, there is more to be gained. Other directions for further work include models beyond the Ising model (e.g., non-binary, Gaussian), and studies of achieving approximate recovery with practical algorithms.

APPENDIX A
PROOF OF LEMMA 1

The proof follows standard steps in the derivation of Fano's inequality as in [10], but with suitable modifications to handle the approximate recovery criterion; see [31] for analogous modifications in the context of support recovery, and [25] for a related list decoding result. Due to the similarities to other variants, we focus primarily on the details that are specific to the approximate recovery criterion.

Let $G$ be uniformly distributed on $\mathcal{T}$, let $\hat G$ be the estimate of $G$, and let $E$ and $\hat E$ be the corresponding edge sets. Moreover, let $\bar P_e(q_{\max})$ be the error probability $\mathbb{P}[|E\,\Delta\,\hat E| > q_{\max}]$ averaged over the random graph $G$. By assumption, we may consider decoders such that $\hat G \in \mathcal{T}'$ without loss of optimality. Defining the error indicator $\mathcal{E} := \mathbb{1}\{|E\,\Delta\,\hat E| > q_{\max}\}$ and applying the chain rule for entropy in two different ways, we have
$$H(\mathcal{E}, G\mid\hat G) = H(G\mid\hat G) + H(\mathcal{E}\mid G,\hat G) \quad (69)$$
$$= H(\mathcal{E}\mid\hat G) + H(G\mid\mathcal{E},\hat G). \quad (70)$$
We have $H(\mathcal{E}\mid G,\hat G) = 0$ since $\mathcal{E}$ is a function of $(G,\hat G)$, and $H(\mathcal{E}\mid\hat G) \le \log 2$ since $\mathcal{E}$ is binary.
Moreover, we have

H(G | ℰ, Ĝ) = (1 − P_e(q_max)) H(G | ℰ = 0, Ĝ) + P_e(q_max) H(G | ℰ = 1, Ĝ)   (71)
            ≤ (1 − P_e(q_max)) log A(q_max) + P_e(q_max) log |T|,   (72)

where (72) follows from the definition of A(q_max) in the lemma statement and the fact that ℰ = 0 implies that G is within a distance q_max of Ĝ, and we have used the fact that the entropy is upper bounded by the logarithm of the number of elements of the support.

We have now handled three of the terms in (69)–(70), and for the final one we write H(G | Ĝ) = −I(G; Ĝ) + H(G) = −I(G; Ĝ) + log |T|, since G is uniform on T. Substituting the preceding observations into (69)–(70) and performing some simple rearrangements gives

P_e(q_max) ≥ 1 − (I(G; Ĝ) + log 2) / (log |T| − log A(q_max)).   (73)

Finally, we bound the mutual information using the steps of [10], which are stated here without the details in order to avoid repetition: We use the data processing inequality to write I(G; Ĝ) ≤ I(G; X), where X contains the n independent samples from P_G. Using a covering argument, as well as the assumption containing G₀ in the lemma statement, it follows that I(G; X) ≤ n. Substituting into (73), solving for n, and lower bounding the maximal error probability by its average P_e(q_max), we obtain the desired result.

REFERENCES

[1] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Patt. Analysis and Mach. Intel., no. 6, pp. 721–741, 1984.
[2] R. J. Glauber, "Time-dependent statistics of the Ising model," J. Math. Phys., vol. 4, no. 2, pp. 294–307, 1963.
[3] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge Univ. Press, 1998.
[4] C. D. Manning and H. Schütze, Foundations of statistical natural language processing. MIT Press, 1999.
[5] S. Wasserman and K.
Faust, Social network analysis: Methods and applications. Cambridge Univ. Press, 1994, vol. 8.
[6] D. M. Chickering, "Learning Bayesian networks is NP-complete," in Learning from Data. Springer, 1996, pp. 121–130.
[7] G. Reeves and M. Gastpar, "The sampling rate-distortion tradeoff for sparsity pattern recovery in compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 3065–3092, May 2012.
[8] J. Scarlett and V. Cevher, "Phase transitions in group testing," in Proc. ACM-SIAM Symp. Disc. Alg. (SODA), 2016.
[9] N. Santhanam and M. Wainwright, "Information-theoretic limits of selecting binary graphical models in high dimensions," IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4117–4134, July 2012.
[10] K. Shanmugam, R. Tandon, A. Dimakis, and P. Ravikumar, "On the information theoretic limits of learning Ising models," in Adv. Neur. Inf. Proc. Sys. (NIPS), 2014.
[11] E. Ising, "Beitrag zur Theorie des Ferromagnetismus," Zeitschrift für Physik A Hadrons and Nuclei, vol. 31, no. 1, pp. 253–258, 1925.
[12] A. Anandkumar, V. Y. F. Tan, F. Huang, and A. S. Willsky, "High-dimensional structure estimation in Ising models: Local separation criterion," Ann. Stats., vol. 40, no. 3, pp. 1346–1375, 2012.
[13] ——, "High-dimensional Gaussian graphical model selection: Walk summability and local separation criterion," J. Mach. Learn. Res., vol. 13, pp. 2293–2337, 2012.
[14] G. Bresler, E. Mossel, and A. Sly, "Reconstruction of Markov random fields from samples: Some observations and algorithms," in Appr., Rand. and Comb. Opt. Algorithms and Techniques. Springer Berlin Heidelberg, 2008, pp. 343–356.
[15] R. Wu, R. Srikant, and J. Ni, "Learning loosely connected Markov random fields," Stoch. Sys., vol. 3, no. 2, pp. 362–404, 2013.
[16] A. Jalali, C. C. Johnson, and P. K. Ravikumar, "On learning discrete graphical models using greedy methods," in Adv. Neur. Inf. Proc. Sys. (NIPS), 2011.
[17] A. Ray, S. Sanghavi, and S.
Shakkottai, "Greedy learning of graphical models with small girth," in Allerton Conf. Comm., Control, and Comp., 2012.
[18] G. Bresler, D. Gamarnik, and D. Shah, "Structure learning of antiferromagnetic Ising models," in Adv. Neur. Inf. Proc. Sys. (NIPS), 2014.
[19] G. Bresler, "Efficiently learning Ising models on arbitrary graphs," in ACM Symp. Theory Comp. (STOC), 2015.
[20] P. Ravikumar, M. J. Wainwright, J. D. Lafferty, and B. Yu, "High-dimensional Ising model selection using ℓ1-regularized logistic regression," Ann. Stats., vol. 38, no. 3, pp. 1287–1319, 2010.
[21] E. Yang, A. C. Lozano, and P. K. Ravikumar, "Elementary estimators for graphical models," in Adv. Neur. Inf. Proc. Sys. (NIPS), 2014, pp. 2159–2167.
[22] A. Montanari and J. A. Pereira, "Which graphical models are difficult to learn?" in Adv. Neur. Inf. Proc. Sys. (NIPS), 2009.
[23] R. Tandon and P. Ravikumar, "On the difficulty of learning power law graphical models," in IEEE Int. Symp. Inf. Theory, 2013.
[24] A. K. Das, P. Netrapalli, S. Sanghavi, and S. Vishwanath, "Learning Markov graphs up to edit distance," in IEEE Int. Symp. Inf. Theory, 2012, pp. 2731–2735.
[25] D. Vats and J. M. Moura, "Necessary conditions for consistent set-based graphical model selection," in IEEE Int. Symp. Inf. Theory, 2011, pp. 303–307.
[26] N. Meinshausen and P. Bühlmann, "High-dimensional graphs and variable selection with the Lasso," Ann. Stats., vol. 34, no. 3, pp. 1436–1462, June 2006.
[27] W. Wang, M. Wainwright, and K. Ramchandran, "Information-theoretic bounds on model selection for Gaussian Markov random fields," in IEEE Int. Symp. Inf. Theory, 2010.
[28] V. Jog and P.-L. Loh, "On model misspecification and KL separation for Gaussian graphical models," in IEEE Int. Symp. Inf. Theory, 2015.
[29] P. Ravikumar, M. J. Wainwright, G. Raskutti, and B.
Yu, "High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence," Elec. J. Stats., vol. 5, pp. 935–980, 2011.
[30] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, Inc., 2006.
[31] G. Reeves and M. Gastpar, "Approximate sparsity pattern recovery: Information-theoretic lower bounds," IEEE Trans. Inf. Theory, vol. 59, no. 6, pp. 3451–3465, June 2013.

Jonathan Scarlett (S'14 – M'15) received the B.Eng. degree in electrical engineering and the B.Sci. degree in computer science from the University of Melbourne, Australia. In 2011, he was a research assistant at the Department of Electrical & Electronic Engineering, University of Melbourne. From October 2011 to August 2014, he was a Ph.D. student in the Signal Processing and Communications Group at the University of Cambridge, United Kingdom. He is now a post-doctoral researcher with the Laboratory for Information and Inference Systems at the École Polytechnique Fédérale de Lausanne, Switzerland. His research interests are in the areas of information theory, signal processing, machine learning, and high-dimensional statistics. He received the Cambridge Australia Poynton International Scholarship, and the EPFL Fellows postdoctoral fellowship co-funded by Marie Curie.

Volkan Cevher (SM'10) received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA, in 2005. He was a Research Scientist with the University of Maryland, College Park from 2006–2007 and also with Rice University in Houston, TX, from 2008–2009. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University.
His research interests include signal processing theory, machine learning, convex optimization, and information theory. Dr. Cevher was the recipient of a Best Paper Award at SPARS in 2009, a Best Paper Award at CAMSAP in 2015, and an ERC StG in 2011.
