Geometric Representations of Random Hypergraphs
Authors: Simon Lunagomez, Sayan Mukherjee, Robert L. Wolpert
Abstract

We introduce a novel parametrization of distributions on hypergraphs based on the geometry of points in R^d. The idea is to induce distributions on hypergraphs by placing priors on point configurations via spatial processes. This prior specification is then used to infer conditional independence models or Markov structure for multivariate distributions. This approach supports inference of factorizations that cannot be retrieved by a graph alone, leads to new Metropolis-Hastings Markov chain Monte Carlo algorithms with both local and global moves in graph space, and generally offers greater control on the distribution of graph features than currently possible. We provide a comparative performance evaluation against the state of the art, and we illustrate the utility of this approach on simulated and real data.

Keywords: Abstract simplicial complex, Computational topology, Copulas, Factor models, Graphical models, Random geometric graphs.

Contents

1 Introduction
  1.1 Related work
  1.2 Contributions
2 Background and preliminaries
  2.1 Graphical models
  2.2 Geometric graphs
  2.3 Random geometric graphs
3 Geometric representations of random hypergraphs
  3.1 Prior specifications
  3.2 Sampling from prior and posterior distributions
    3.2.1 Prior sampling
    3.2.2 Posterior sampling
  3.3 Convergence of the Markov chain
4 Results
  4.1 Illustration of modeling advantages
    4.1.1 The nerve determines the junction tree factorization
    4.1.2 Subgraph counts in RGGs are a function of Q
  4.2 Simulation studies
    4.2.1 G is in the space generated by A
    4.2.2 Gaussian graphical model
    4.2.3 Factorization based on nerves
    4.2.4 G outside the space generated by A
  4.3 Comparative performance analysis with state-of-the-art
    4.3.1 Scalability
  4.4 Real data analysis
    4.4.1 Fisher's Iris data
    4.4.2 Daily exchange rates data
5 Discussion
A Filtrations and decomposability in random geometric graphs
  A.1 Example
  A.2 Algorithm deletes few edges

1 Introduction

Consider the problem of making inference on the dependence structure among real-valued random variables X_1, ..., X_p from m replicated observations. The dominant formalism for this problem is that of graphical models (Lauritzen, 1996). In this formalism,
the focus is on the first two moments of the observation vector X = (X_1, ..., X_p), and the dependence structure is specified in terms of pairwise relations, which define an undirected graph. If such a graph is decomposable, inference can typically be carried out efficiently. Here we detail a new approach to the construction of distributions on undirected graphs, motivated by the problem of Bayesian inference of the dependence structure among random variables.

1.1 Related work

It is common to model the joint probability distribution of a family of p random variables {X_1, ..., X_p} in two stages: first specify the conditional dependence structure of the distribution, then specify details of the conditional distributions of the variables within that structure (see p. 1274 of Dawid and Lauritzen 1993, or p. 180 of Besag 1975, for example). The structure may be summarized in a variety of ways in the form of a graph G = (V, E) whose vertices V = {1, ..., p} index the variables {X_i} and whose edges E ⊆ V × V in some way encode conditional dependence. We follow the Hammersley-Clifford approach (Besag, 1974; Hammersley and Clifford, 1971), in which (i, j) ∈ E if and only if the conditional distribution of X_i given all other variables {X_k : k ≠ i} depends on X_j, i.e., differs from the conditional distribution of X_i given {X_k : k ≠ i, j}. In this case the distribution is said to be Markov with respect to the graph. One can show that this graph is symmetric or undirected, i.e., all the elements of E are unordered pairs.

The simultaneous inference of a decomposable graph and marginal distributions in a fully Bayesian framework was approached in (Green, 1995) using local proposals to sample graph space. A promising extension of this approach, called Shotgun Stochastic Search (SSS), takes advantage of parallel computing to select from a batch of local moves (Jones et al., 2005).
A stochastic search method that incorporates both local moves and more aggressive global moves in graph space has been developed by Scott and Carvalho (2008). These stochastic search methods are intended to identify regions of high posterior probability, but their convergence properties are still not well understood. Bayesian models for non-decomposable graphs have been proposed by Roverato (2002) and by Wong, Carter, and Kohn (2003). These two approaches focus on Monte Carlo sampling of the posterior distribution from specified hyper Markov prior laws. Their emphasis is on the computational problem of Monte Carlo simulation, not on that of constructing interesting informative priors on graphs. We think there is a need for methodology that offers both efficient exploration of the model space and a simple, flexible family of distributions on graphs that can reflect meaningful prior information.

Erdős-Rényi random graphs (those in which each of the p(p − 1)/2 possible undirected edges (i, j) is included in E independently with some specified probability α ∈ [0, 1]), and variations where the edge inclusion probabilities α_ij are allowed to be edge-specific, have been used to place informative priors on decomposable graphs (Heckerman et al., 1995; Mansinghka et al., 2006). The number of parameters in this prior specification can be enormous if the inclusion probabilities are allowed to vary, and some interesting features of graphs (such as decomposability) cannot be expressed solely through edge probabilities. Mukherjee and Speed (2008) developed methods for placing informative distributions on directed graphs by using concordance functions (functions that increase as the graph agrees more with a specified feature) as potentials in a Markov model. This approach is tractable, but it is still not clear how to encode certain common assumptions within such a framework.
For the special case of jointly Gaussian variables {X_j}, or those with arbitrary marginal distributions F_j(·) whose dependence is adequately represented in Gaussian copula form X_j = F_j^{-1}(Φ(Z_j)) for jointly Gaussian {Z_j} with zero mean and unit-diagonal covariance matrix C, the problem of studying conditional independence reduces to a search for zeros in the precision matrix C^{-1}. This approach (see Hoff, 2007, for example) is faster and easier to implement than ours in cases where both are applicable, but is far more limited in the range of dependencies it allows. For example, a three-dimensional model in which each pair of variables is conditionally independent given the third cannot be distinguished from a model with complete joint dependence of the three variables (we return to this example in Section 4.2.3).

1.2 Contributions

In this article we establish a novel approach to parametrizing spaces of graphs. For any integers p, d ∈ N, we show in Section 2.2 how to use the geometrical configuration of a set {v_i} of p points in Euclidean space R^d to determine a graph G = (V, E) on V = {v_1, ..., v_p}. Any prior distribution on point sets {v_i} induces a prior distribution on graphs, and sampling from the posterior distribution of graphs is reduced to sampling from spatial configurations of point sets, a standard problem in spatial modeling. Relations between graphs and finite sets of points have arisen earlier in the fields of computational topology (Edelsbrunner and Harer, 2008) and random geometric graphs (Penrose, 2003). From the former we borrow the idea of nerves, i.e., simplicial complexes computed from intersection patterns of convex subsets of R^d; the 1-skeletons (collections of 1-dimensional simplices) of nerves are geometric graphs. As a side benefit, our approach also yields estimates of the conditional distributions given the graph.
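As an aside, the precision-matrix reduction mentioned in Section 1.1 can be seen in a small numerical sketch (ours, not from the paper; the chain covariance below is a hypothetical example):

```python
import numpy as np

# Hypothetical Gaussian "chain" X1 - X2 - X3, in which X1 and X3 are
# conditionally independent given X2.  In the Gaussian (or Gaussian
# copula) case this shows up as a zero in the precision matrix C^{-1}.
rho = 0.6
C = np.array([[1.0,    rho, rho**2],
              [rho,    1.0, rho   ],
              [rho**2, rho, 1.0   ]])   # Corr(X1, X3) = rho^2, not zero
K = np.linalg.inv(C)                    # precision matrix

# Marginal dependence everywhere, but K[0, 2] = 0: X1 is independent of
# X3 given X2, which is exactly the edge (1, 3) being absent.
print(abs(K[0, 2]) < 1e-9)   # True
```

The off-diagonal zeros of K, not of C, are what encode the missing edges of the graph.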
The model space of undirected graphs grows quickly with the dimension of {X_1, ..., X_p} (there are 2^{p(p-1)/2} undirected graphs on p vertices) and is difficult to parametrize. We propose a novel parametrization and a simple, flexible family of prior distributions on G and on probability distributions that are Markov with respect to G (Dawid and Lauritzen, 1993); this parametrization is based on computing the intersection pattern of a system of convex sets in R^d. The novelty and main contribution of this paper is structural inference for graphical models; specifically, the proposed representation of graph spaces allows for flexible prior distributions and new Markov chain Monte Carlo (MCMC) algorithms. From the random geometric graph approach we gain understanding about the induced distribution on graph features when making certain features of a geometric graph (or hypergraph) stochastic.

2 Background and preliminaries

2.1 Graphical models

The graphical models framework is concerned with the representation of conditional dependencies for a multivariate distribution in the form of a graph or hypergraph. We first review relevant graph-theoretical concepts and then relate these concepts to factorizing distributions.

A graph G is an ordered pair (V, E) of a set V of vertices and a set E ⊆ V × V of edges. If all edges are unordered (resp., ordered), the graph is said to be undirected (resp., directed). All graphs considered in this paper are undirected, unless stated otherwise. A hypergraph, denoted H, consists of a vertex set V and a collection K of unordered subsets of V (known as hyperedges); a graph is the special case in which all the subsets are vertex pairs. A graph is complete if E = V × V contains all possible edges; otherwise it is incomplete. A complete subgraph that is maximal with respect to inclusion is a clique. Denote by C(G) and Q(G), respectively, the collections of complete sets and cliques of G.
A path between two vertices v_i, v_j ∈ V is a sequence of edges connecting v_i to v_j. A graph such that any pair of vertices can be joined by a unique path is a tree. A decomposition of an incomplete graph G = (V, E) is a partition of V into disjoint nonempty sets (A, B, S) such that S is complete in G and separates A and B, i.e., any path from a vertex in A to a vertex in B must pass through S. Iterative decomposition of a graph G, such that at each step the separator S_i is minimal and the subsets A_i and B_i are nonempty, generates the prime components of G: the collection of subgraphs that cannot be further decomposed. If all prime components of a graph G are complete, then G is said to be decomposable. Any graph G can be represented as a tree T whose vertices are its prime components P(G); this is called its junction tree representation. A junction tree is a hypergraph.

Let P be a probability distribution on R^p and X = (X_1, ..., X_p) a random vector with distribution P. Graphical modeling is the representation of the Markov or conditional dependence structure among the components {X_i} in the form of a graph G = (V, E). Denote by f(x) the joint density function of {X_i} (or probability mass function for discrete distributions; more generally, density with respect to an arbitrary reference measure). The distribution P (and hence its density f(x)) may depend implicitly on a vector θ of parameters, taking values in some set Θ_G, which in some cases will depend on the graph G; write Θ = ⊔_G Θ_G for the disjoint union of the parameter spaces for all graphs on V. Each vertex v_i ∈ V is associated with a variable X_i, and the edges E determine how the distribution factors. The density f(x) for the distribution can be factored in a variety of ways associated with the graph G (Lauritzen, 1996, p. 35).
It may be factored in terms of complete sets a ∈ C(G):

f(x) = ∏_{a ∈ C(G)} φ_a(x_a | θ_a),   (2.1a)

or similarly in terms of cliques a ∈ Q(G) (assuming f is positive, according to the Hammersley-Clifford theorem); if G is decomposable, then f(x) may also be factored in junction-tree form as

f(x) = ∏_{a ∈ P(G)} ψ_a(x_a | θ_a) / ∏_{b ∈ S(G)} ψ_b(x_b | θ_b),   (2.1b)

where P(G) and S(G) denote the prime factors and separators of G, respectively, and where ψ_a(x_a | θ_a) denotes the marginal joint density of the components x_a for prime factors a ∈ P(G), and ψ_b(x_b | θ_b) that for separators b ∈ S(G) (Dawid and Lauritzen, 1993, Eqn. (6)). In the Gaussian case a factorization similar to (2.1b) holds even for non-decomposable graphs (Roverato, 2002, Prop. 2).

The prior distributions required for Bayesian inference about models of the form (2.1) may be specified by giving a marginal distribution on the set of all graphs G ∈ G_p on p vertices and conditional distributions on each Θ_G, the space of parameters for that graph:

p(G, θ) = p(G) p(θ | G),   G ∈ G_p, θ ∈ Θ_G,   (2.2)

where θ ∈ Θ_G determines the parameters {θ_a : a ∈ C(G)}, or {θ_a : a ∈ P(G)} and {θ_b : b ∈ S(G)}. Giudici and Green (1999) pursue this approach in the Gaussian case, while Dawid and Lauritzen (1993) offer a rigorous framework for specifying more general prior distributions on Θ_G. Such priors, called hyper Markov laws, inherit the conditional independence structure from the sampling distribution, now at the parameter level. The hyper Inverse Wishart, useful when the factors are multivariate normal, is by far the most studied hyper Markov law. Most previously studied models of the form (2.2) specify very little structure on p(G) (Giudici and Green, 1999; Heckerman et al.
, 1995; Roverato, 2002); typically p(G) is taken to be a uniform distribution on the space of decomposable (or unrestricted) graphs, or perhaps an Erdős-Rényi prior to encourage sparsity (Mansinghka et al., 2006), with no additional structure or constraints and hence no opportunity to express prior knowledge or belief.

Two inference problems arise for the model specified in (2.2): inference of the entire joint posterior distribution of the graph and factor parameters, (θ, G), or inference of only the conditional independence structure, which entails comparing different graphs via the marginal likelihood

Pr{G | x} ∝ ∫_{Θ_G} f(x | θ, G) p(G) p(θ | G) dθ.

Inference about G may now be viewed as a Bayesian model selection procedure (see Robert, 2001, p. 348).

2.2 Geometric graphs

Most methodology for structural inference in graphical models either assumes little prior structure on graph space, or else represents graphs using high-dimensional discrete spaces with no obvious geometry or metric. In either case, prior elicitation and posterior sampling can be challenging. In this section we propose parametrizations of graph space that will be used in Section 2.3 to specify flexible prior distributions and to construct new Metropolis-Hastings MCMC algorithms with local and global moves. The key idea of this parametrization is to construct graphs and hypergraphs from intersections of convex sets in R^d. We illustrate the approach with an example.

Fix a convex region A ⊂ R^d and let V ⊂ A be a finite set of p points. For each number r ≥ 0, the proximity graph Prox(V, r) (see Figure 1) is formed by joining every unordered pair of elements of V whose distance is 2r or less, i.e., whose closed balls of radius r intersect. As r ranges from 0 to half the diameter of A, the graph Prox(V, r) ranges from the totally disconnected graph to the complete graph.
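The proximity graph is simple to compute; a minimal sketch (the helper name `proximity_graph` is ours):

```python
import numpy as np
from itertools import combinations

def proximity_graph(V, r):
    # Prox(V, r): join every unordered pair of points whose distance is at
    # most 2r, i.e. whose closed balls of radius r intersect.
    V = np.asarray(V, dtype=float)
    return [(i, j) for i, j in combinations(range(len(V)), 2)
            if np.linalg.norm(V[i] - V[j]) <= 2 * r]

V = [(0.0, 0.0), (0.3, 0.0), (1.0, 1.0)]
print(proximity_graph(V, 0.2))   # [(0, 1)]: only the first two balls meet
```

Shrinking r toward 0 empties the edge list; growing it past half the diameter of the point set yields the complete graph.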
This example is a particular case of a more general construction, illustrated in Figure 2: hypergraphs can be computed from properties of intersections of classes of convex subsets of Euclidean space. The convex sets we consider are subsets of R^d that are simple to parametrize and compute. The key concept in our construction is the nerve.

Figure 1: Proximity graph for 100 vertices and radius r = 0.05.

Definition 2.1 (Nerve). Let F = {A_j : j ∈ I} be a finite collection of distinct nonempty convex sets. The nerve of F is given by

Nrv(F) = { σ ⊆ I : ∩_{j ∈ σ} A_j ≠ ∅ }.

The nerve of a family of sets uniquely determines a hypergraph. We use the following three nerves in this paper to construct hypergraphs (for more details, see Edelsbrunner and Harer, 2008).

Definition 2.2 (Čech Complex). Let V be a finite set of points in R^d and r > 0. Denote by B^d the closed unit ball in R^d. The Čech complex corresponding to V and r is the nerve of the sets B_{v,r} = v + r B^d, v ∈ V. This is denoted by Nrv(V, r, Čech).

Definition 2.3 (Delaunay Triangulation). Let V be a finite set of points in R^d. The Delaunay triangulation corresponding to V is the nerve of the sets C_v = {x ∈ R^d : ‖x − v‖ ≤ ‖x − u‖ for all u ∈ V}, v ∈ V. This is denoted by Nrv(V, Delaunay), and the sets C_v are called Voronoi cells.

Definition 2.4 (Alpha Complex). Let V be a finite set of points in R^d and r > 0. The Alpha complex corresponding to V and r is the nerve of the sets B_{v,r} ∩ C_v, v ∈ V. This is denoted by Nrv(V, r, Alpha).

Figure 2: (a) A set of vertices in R^2 is used to construct a family of disks of radius r = 0.5. (b) The nerve of this family of convex sets; this is an example of a Čech complex.
(c) For the same vertex set the Voronoi diagram is computed. (d) The nerve of the Voronoi cells is obtained; this is an example of the Delaunay triangulation.

Note that the maximum clique size of the Delaunay triangulation is bounded by the dimension of the space containing the vertex set plus one; no such restriction applies to the Čech complex.

Here we illustrate the idea of a nerve, and specifically of the Alpha complex. Consider the vertex set displayed in Table 1 and r = 0.5. The sets B_{v,r} ∩ C_v and the corresponding nerve (Alpha complex) are illustrated in Figure 3. Since the set indexed by V_4 does not intersect any other B_{v,r} ∩ C_v, it produces an isolated vertex in the nerve. The set indexed by V_1 intersects only the set indexed by V_2; therefore there is an edge joining V_1 and V_2 in the nerve. V_2, V_3 and V_5 intersect as pairs; therefore the edges of the triangle with vertices V_2, V_3 and V_5 are in the nerve. Since the sets indexed by V_2, V_3 and V_5 also intersect as a triad, the facet or face of the triangle is also included in the nerve.

Figure 3: (a) Intersection of balls and Voronoi cells computed using r = 0.5 and the vertex set listed in Table 1. (b) The corresponding Alpha complex.

Coordinate    V_1        V_2        V_3        V_4        V_5
x             0.2065     0.6383     0.9225    −0.8863     0.3043
y             0.3149    −0.1193    −0.2544     0.0816    −0.9310

Table 1: Vertex set used for generating a family of sets and the corresponding nerve.

The nerve of a family of sets is a particular class of hypergraph known as an (abstract) simplicial complex.

Definition 2.5 (Abstract simplicial complex). Let V be a finite set. A simplicial complex with base set V is a family K of subsets of V such that τ ∈ K and σ ⊆ τ implies σ ∈ K.
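The downward-closure condition of Definition 2.5 is mechanical to verify; a minimal sketch (the helper `is_simplicial_complex` is ours), using the Alpha-complex example above as data:

```python
from itertools import combinations

def is_simplicial_complex(K):
    # Downward closure: every nonempty proper subset of a simplex must
    # itself be a simplex of K.
    K = {frozenset(s) for s in K}
    return all(frozenset(f) in K
               for s in K for k in range(1, len(s))
               for f in combinations(s, k))

# The Alpha complex of Figure 3 (labels as in Table 1): the filled
# triangle {V2, V3, V5}, the edge {V1, V2}, the isolated vertex {V4}.
K = [{1}, {2}, {3}, {4}, {5},
     {1, 2}, {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}]
print(is_simplicial_complex(K))       # True
print(is_simplicial_complex([{1, 2}]))  # False: faces {1}, {2} missing
```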
The elements of K are called simplices, and the number of connected components of K is denoted ♯(K). The nerve of a collection of sets is always a hypergraph; in simple cases only vertex pairs arise, so the 1-skeleton determines a unique graph.

Definition 2.6 (p-skeleton). Let K be a simplicial complex, and denote by |τ| the cardinality of a simplex τ ∈ K. The p-skeleton of K is the collection of all τ ∈ K such that |τ| ≤ p + 1. The elements of the p-skeleton are called p-simplices, and the 1-skeleton is just a graph (more precisely, it is V ∪ E for a uniquely determined graph G = (V, E)).

The 1-skeleton of a nerve is the graph obtained by considering only nonempty pairwise intersections. The process of obtaining the nerve and the 1-skeleton from a family of sets is illustrated in Figure 4. Different families of convex sets in R^d induce different restrictions in graph space: for the Delaunay triangulation and the Alpha complex, for example, clique sizes cannot exceed d + 1. Although no such blanket restriction applies to the Čech complex, some graphs are still unattainable for this complex; for example, no Čech complex can include a star graph whose central node has degree higher than the "kissing number", i.e., the maximal number of disjoint unit hyperspheres touching a given hypersphere (6 for d = 2, 12 for d = 3, etc.).

The Čech and Alpha complexes are hypergraphs indexed by a finite set V = {V_1, ..., V_p} ⊂ R^d and a size parameter r ≥ 0. Each induces a parametrization of the space of hypergraphs, (V, r) ↦ H(V, r). The class A of convex sets used to compute the nerve determines the space of hypergraphs. To keep the notation simple, A will be implicit whenever it is obvious from the context. We will use A(V, r) to denote a generic element of A for either the Čech or the Alpha complex.
Similarly, 1-skeletons of nerves induce a parametrization of the spaces of graphs, (V, r) ↦ G(V, r). Two principal advantages of this approach are:

1. For each family A of convex sets, the number of parameters needed to specify the graph G or hypergraph H grows only linearly with the number of vertices;

2. The hypergraph parameter space will be a subset of R^d, a very convenient parameter space for MCMC sampling.

Figure 4: (a) Given a set of vertices and a radius (r = 0.5) one can compute A_i = C_i ∩ B_i, where C_i is the Voronoi cell for vertex i and B_i is the ball of radius r centered at vertex i. (b) The Alpha complex is the nerve of the A_i's. (c) Often the main interest will be the 1-skeleton of the complex, which is the subset of the nerve that corresponds to (nonempty) pairwise intersections.

2.3 Random geometric graphs

In Section 2.2 we demonstrated how the geometry of a set V of p points in R^d can be used to induce a graph G. In this section we explore the relation between prior distributions on random sets V of points in R^d and features of the induced distribution on graphs G, with the goal of learning how to tailor a point process model to obtain graph distributions with desired features.

Definition 2.7 (Random Geometric Graph). Fix integers p, d ∈ N and let V = (V_1, ..., V_p) be drawn from a probability distribution Q on (R^d)^p. For any class A of convex sets in R^d and radius r > 0, the graph G(V, r, A) is said to be a Random Geometric Graph (RGG).

While Definition 2.7 is more general than that of (Penrose, 2003, p. 2), it still cannot describe all the random graphs discussed in (Penrose and Yukich, 2001) (for example, those based on k-neighbors cannot in general be generated by nerves).
For A we will use closed balls in R^d or intersections of balls and Voronoi cells; most often Q will be a product measure under which the {V_i} are p independent, identically distributed (iid) draws from some marginal distribution Q_M on R^d, such as the uniform distribution on the unit cube [0, 1]^d or the unit ball B^d, but we will also explore the use of repulsive processes for V, under which the points {V_i} are more widely dispersed than under independence.

It is clear that different choices of A, Q and r will have an impact on the support of the induced RGG distribution. To make this notion precise we define feasible graphs.

Definition 2.8 (Isomorphic). Write G_1 ≅ G_2 for two graphs G_i = (V_i, E_i), and call the graphs isomorphic, if there is a 1:1 mapping χ : V_1 → V_2 such that (v_i, v_j) ∈ E_1 ⇔ (χ(v_i), χ(v_j)) ∈ E_2 for all v_i, v_j ∈ V_1.

Definition 2.9 (Feasible Graph). Fix numbers d, p ∈ N, a class A of convex sets in R^d, and a distribution Q on random vectors V in (R^d)^p. A graph Γ is said to be feasible if Pr{G(V, r, A) ≅ Γ} > 0 for some number r > 0.

In contrast to Erdős-Rényi models, where the inclusions of graph edges are independent events, RGG models exhibit edge dependence governed by the metric structure of R^d and by the class A of convex sets used to construct the nerves.

There is an extensive literature describing asymptotic distributions for a variety of graph features, such as subgraph counts, vertex degree, order of the largest clique, and maximum vertex degree (for an encyclopedic account of results for the important case of 1-skeletons of Čech complexes, see Penrose, 2003). Several results for the Delaunay triangulation, some of which generalize to the Alpha complex, are reported in (Penrose and Yukich, 2001).
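The feasibility notion of Definition 2.9 can be probed by simulation: draw V from Q repeatedly, build the 1-skeleton, and test isomorphism to a target graph. A minimal sketch (ours; brute-force isomorphism is fine at p = 3, and the target 3-path is a hypothetical example):

```python
import numpy as np
from itertools import combinations, permutations

rng = np.random.default_rng(0)

def uniform_ball(p, d, rng):
    # iid uniform draws from B^d: uniform direction, radius ~ U^(1/d)
    x = rng.standard_normal((p, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x * rng.uniform(size=(p, 1)) ** (1.0 / d)

def cech_skeleton(V, r):
    # 1-skeleton of the Cech complex: edge iff balls of radius r intersect
    return {frozenset(e) for e in combinations(range(len(V)), 2)
            if np.linalg.norm(V[e[0]] - V[e[1]]) <= 2 * r}

def isomorphic(E1, E2, p):
    # brute force over all vertex relabellings (only sensible for tiny p)
    return any({frozenset({perm[i], perm[j]}) for i, j in map(tuple, E1)}
               == set(E2)
               for perm in permutations(range(p)))

# Monte Carlo estimate of Pr{G(V, r, Cech) isomorphic to the 3-path} for
# p = 3 iid uniform points in the unit disk and r = 0.35: the event has
# positive probability, so the 3-path is feasible for this (A, Q, r).
path = [frozenset({0, 1}), frozenset({1, 2})]
hits = sum(isomorphic(cech_skeleton(uniform_ball(3, 2, rng), 0.35), path, 3)
           for _ in range(2000))
print(hits > 0)   # True
```

The relative frequency hits/2000 is a crude estimate of the probability in Definition 2.9; for larger p one would replace the permutation test with a proper isomorphism routine.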
The support of the distributions on hypergraphs induced by these complexes is, in general, an area of open research (personal communication with H. Edelsbrunner). Recent work by Kahle (2014) surveys some of this literature, focusing on random simplicial complexes. The monograph by Penrose (2003) discusses the relationship between Q and subgraph counts, the degree distribution, and the percolation threshold in Chapters 3, 4 and 10, respectively. Penrose (2003, Chap. 3) gives conditions which guarantee the asymptotic normality of the joint distribution of the numbers Q_j of j-simplices (edges, triads, etc.) for iid samples V = (V_1, ..., V_p) from some marginal distribution Q_M on R^d, as the number p = |V| of vertices grows and the radius r_p shrinks. Simulation studies suggest that the asymptotic results apply approximately for p ≥ 24–100: sometimes p = 24 suffices (the joint distribution of the subgraph counts is then approximately multivariate normal), and sometimes p = 100 may be required (the distribution is far from multivariate normal at smaller p).

3 Geometric representations of random hypergraphs

We develop a Bayesian approach to the problem of inferring factorizations of distributions of the forms of (2.1),

f(x) = ∏_{a ∈ C(G)} φ_a(x_a | θ_a)   or   f(x) = ∏_{a ∈ P(G)} ψ_a(x_a | θ_a) / ∏_{b ∈ S(G)} ψ_b(x_b | θ_b).

In each case we specify the prior density function as a product

p(θ, G) = p(θ | G) p(G)   (3.1)

of a conditional hyper Markov law for θ ∈ Θ and a marginal RGG law on G. We use conventional methods to select the specific hyper Markov distribution (hyper Inverse Wishart for multivariate normal sampling distributions, for example), since our principal focus is on the prior distribution for the graphs, p(G). Whenever we refer to hyper Markov laws, it will be in the strong sense of Dawid and Lauritzen (1993).
We also present MCMC algorithms for sampling from the posterior distribution on G given observed data.

3.1 Prior specifications

All the graphs in our statistical models are built from the nerves constructed in Section 2.2 from a random vertex set V = {V_i}_{i=1}^p ⊂ R^d and a radius r > 0. Since the nerve construction is invariant under rigid transformations of V (that is, transformations that preserve angles as well as distances) and under simultaneous scale changes in V and r, restricting the support of the prior distribution on V to the unit ball B^d does not reduce the model space:

Proposition 3.1. Every feasible graph in R^d may be represented in the form G(V, r, A) for a collection V of p points in the unit ball B^d and for r = 1/p.

Proof. Let G = (V, E) ≅ G(V, r, A) be a feasible graph with |V| = p vertices. Every edge (v_i, v_j) ∈ E has length dist(v_i, v_j) ≤ 2r so, by the triangle inequality, every connected component Γ_i of G with p_i vertices must have diameter no greater than the longest possible path length, 2r(p_i − 1), and so fits in a ball B_i of diameter 2r(p_i − 1). The union of these balls, centered on a line segment and separated by r(2 + 1/p), will have diameter less than r(2p − 1). By translation and linear rescaling we may take r = 1/p and bound the diameter by 2, completing the proof.

We fix r = 1/p and simplify the notation by writing G(V, A) instead of G(V, r, A) for A = Čech or A = Alpha, or, if A is understood, simply G(V). Thus we can induce prior distributions on the space of feasible graphs from distributions on configurations of p points in the unit ball in R^d. For iid uniform draws V = (V_1, ..., V_p) from B^d, the expected number of edges of the graph G(V, r, A) is bounded above by E[#E] ≤ (p(p − 1)/2)(2r)^d; for r = 1/p in dimension d = 2 this gives E[#E] < 2, leading to relatively sparse graphs.
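The edge-count bound follows because the probability that V_j lands in the ball of radius 2r around V_i is at most its volume fraction (2r)^d; a quick Monte Carlo sketch (ours) checks it for p = 10, d = 2, r = 1/p:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
p, d, r = 10, 2, 1 / 10

def uniform_ball(p, d, rng):
    # iid uniform draws from B^d: uniform direction, radius ~ U^(1/d)
    x = rng.standard_normal((p, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x * rng.uniform(size=(p, 1)) ** (1.0 / d)

def n_edges(V, r):
    # number of pairs within distance 2r (edges of the Cech 1-skeleton)
    D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
    return int(np.sum(np.triu(D <= 2 * r, k=1)))

mean_edges = np.mean([n_edges(uniform_ball(p, d, rng), r)
                      for _ in range(2000)])
bound = comb(p, 2) * (2 * r) ** d   # = 45 * 0.04 = 1.8 < 2
print(mean_edges <= bound)          # True
```

The empirical mean sits below the bound, with the slack coming from boundary effects: balls around points near the sphere are partly clipped, so the actual pair probability is strictly less than (2r)^d.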
We often take larger values of r (still small enough for empty graphs to be feasible) to generate richer classes of graphs. A limit on how large r may be is given by the following partial converse to Prop. 3.1.

Proposition 3.2. The empty graph on p vertices cannot be expressed as G(V, r, Čech) for any V ⊂ B^d with r ≥ (p^{1/d} − 1)^{−1}.

Proof. Let V = {V_1, ..., V_p} ⊂ B^d be a set of points and r > 0 a radius such that G(V, r, Čech) is the empty graph. Then the balls V_i + r B^d are disjoint, and their union, with d-dimensional volume p ω_d r^d, lies wholly within the ball (1 + r) B^d of volume ω_d (1 + r)^d (where ω_d = π^{d/2} / Γ(1 + d/2) is the volume of the unit ball), so p < (1 + 1/r)^d.

Slightly stronger, the empty graph cannot be attained as G(V, r, Čech) for any r ≥ 1/[(p/p_d)^{1/d} − 1], where p_d is the maximum spherical packing density in R^d. For d = 2 this gives the asymptotically sharp bound r < 1/(√(p√12/π) − 1).

3.2 Sampling from prior and posterior distributions

Let Q be a probability distribution on p-tuples in R^d, let p(G) be the induced prior distribution on graphs G(V, Čech) for V ∼ Q with r = 1/p, and let p(θ | G) be a conventional hyper Markov law (see below). We wish to draw samples from the prior distribution p(θ, G) of (3.1) and from the posterior distribution p(θ, G | x), given a vector x = (x_1, ..., x_m) of iid observations x_j ∼ f(x | θ), using the Metropolis-Hastings approach to MCMC (Hastings, 1970; Robert and Casella, 2004, Ch. 7).

We begin with a random walk proposal distribution in B^d, starting at an arbitrary point v ∈ B^d, that approximates the steps V^(0), V^(1), V^(2), ... of a diffusion V(t) on B^d with uniform stationary distribution and reflecting boundary conditions at the unit sphere ∂B^d.
The random walk is conveniently parametrized in spherical coordinates, with radius ρ(t) = ‖V(t)‖ and Euler angles (in d = 2 dimensions, a single angle φ(t)) at step t. Informally, we take independent radial random walk steps such that (ρ(t))^d is reflecting Brownian motion on the unit interval (this ensures that the stationary distribution will be Un(B^d)) and, conditional on the radius, angular steps from Brownian motion on the sphere of radius ρ(t). Fix some η > 0. In d = 2 dimensions the reflecting random walk proposal (ρ*, φ*) we used for step (t + 1), beginning at (ρ(t), φ(t)), is

ρ* = R([ρ(t)]^2 + ζρ(t) η)^(1/2),    φ* = φ(t) + ζφ(t) η / ρ(t)

for iid standard normal random variables {ζρ(t), ζφ(t)}, where R(x) = |x − 2⌊(x + 1)/2⌋| is x reflected (as many times as necessary) into the unit interval. Similar expressions work in any dimension d, with ρ* = R([ρ(t)]^d + ζρ(t) η)^(1/d) and appropriate step sizes for the (d − 1) Euler angles. For small η > 0 this diffusion-inspired random walk generates local moves under which the proposed new point (ρ*, φ*) is quite close to (ρ(t), φ(t)) with high probability. To help escape local modes, and to simplify the proof of ergodicity below, we add the option of more dramatic "global" moves by introducing at each time step a small probability of replacing (ρ(t), φ(t)) with a random draw (ρ*, φ*) from the uniform distribution on B^d (see Figure 5). Let q(V* | V) denote the Lebesgue density at V* ∈ (B^d)^p of one step of this hybrid random walk starting at V = (V1, ..., Vp) ∈ (B^d)^p.

Figure 5: This figure illustrates a global move.
(a) The current configuration of the points. (b) The graph implied by this configuration. (c) The proposal configuration, obtained by randomly moving one vertex. (d) The graph implied by the proposed move.

3.2.1 Prior sampling

To draw sample graphs from the prior distribution, begin with V(0) ∼ Q(dV) and, after each time step t ≥ 0, propose a move to V* ∼ q(V* | V(t)). The proposed move from V(t) (with induced graph G(t) = G(V(t))) to V* (and G*) is accepted (whereupon V(t+1) = V*) with probability 1 ∧ H(t), the minimum of one and the Metropolis-Hastings ratio

H(t) = [p(V*) q(V(t) | V*)] / [p(V(t)) q(V* | V(t))].

Otherwise V(t+1) = V(t); in either case, set t ← t + 1 and repeat. Note that the proposal distribution q(· | ·) leaves the uniform distribution invariant, so H(t) ≡ 1 for Q(dV) ∝ dV, and in that case every proposal is accepted.

3.2.2 Posterior sampling

After observing a random sample X = x = (x1, ..., xm) from the distribution xj ∼ f(x | θ, G), let

f(x | θ, G) = ∏_{i=1}^{m} f(xi | θ, G)

denote the likelihood function and

M(G) = ∫_{Θ_G} f(x | θ, G) p(θ | G) dθ   (3.2)

the marginal likelihood for G. For posterior sampling of graphs, a proposed move from V(t) to V* is accepted with probability 1 ∧ H(t) for

H(t) = [M(G*) p(V*) q(V(t) | V*)] / [M(G(t)) p(V(t)) q(V* | V(t))].   (3.3)

For multivariate normal data X and a hyper inverse Wishart hyper Markov law p(θ | G), M(G) from (3.2) can be expressed in closed form for decomposable graphs G(V); efficient algorithms for evaluating (3.2) are available even if this condition fails. The model will typically be of variable dimension, since the parameter space Θ_G for the factors may depend on the graph G = G(V).
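The moves of Sections 3.2.1-3.2.2 can be sketched in a few lines. The following is our own illustrative Python (all names are ours): the d = 2 polar proposal with the reflection R, plus a generic Metropolis-Hastings loop. For prior sampling log_target(v) = log p(v); for posterior sampling it would also include the log marginal likelihood log M(G(v)), as in the ratio (3.3).

```python
import math
import random

def reflect_to_unit(x):
    """R(x): reflect x into [0, 1] as many times as necessary
    (a triangle wave; equivalently |x - 2*floor((x + 1) / 2)|)."""
    x = x % 2.0
    return 2.0 - x if x > 1.0 else x

def polar_rw_step(rho, phi, eta, rng=random):
    """Diffusion-inspired local move in d = 2 (Section 3.2): rho^2 takes
    a reflecting Brownian step on [0, 1], which keeps the stationary law
    uniform on the disc; the angular step size scales as eta / rho."""
    rho_new = math.sqrt(reflect_to_unit(rho * rho + rng.gauss(0.0, 1.0) * eta))
    phi_new = (phi + rng.gauss(0.0, 1.0) * eta / max(rho, 1e-12)) % (2.0 * math.pi)
    return rho_new, phi_new

def mh_chain(v0, log_target, propose, n_steps, rng=random):
    """Metropolis-Hastings loop.  propose(v, rng) returns
    (v_new, log q(v_new | v) - log q(v | v_new)).  With a uniform
    target and a reversible proposal the log-ratio is 0, so every
    move is accepted, matching the remark after the prior ratio."""
    v, chain = v0, [v0]
    for _ in range(n_steps):
        v_new, log_q_diff = propose(v, rng)
        log_h = log_target(v_new) - log_target(v) - log_q_diff
        if log_h >= 0 or math.log(max(rng.random(), 1e-300)) < log_h:
            v = v_new  # accept; otherwise keep the current state
        chain.append(v)
    return chain
```

In the hybrid walk of the paper, `propose` would apply `polar_rw_step` to each vertex and, with small probability, replace the configuration with a uniform draw from the ball (the global move).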
Not all proposed moves of the point configuration V(t) → V* will lead to a change in G(V); for those that do, we implement reversible-jump MCMC (Green, 1995; Sisson, 2005) using the auxiliary variable approach of Brooks et al. (2003) to simplify the book-keeping needed for non-nested moves Θ_G → Θ_{G*}.

3.3 Convergence of the Markov chain

Denote by Ġ(p, d, A) the finite set of feasible graphs with p vertices in R^d, i.e., those generated from 1-skeletons of A-complexes. For each G ∈ Ġ(p, d, A) let V_G ⊂ (B^d)^p denote the set of all points V = {V1, ..., Vp} ∈ (B^d)^p for which G ≅ G(V, 1/p, A), and set µ(G) = Q(V_G). Then:

Proposition 3.3. The sequence G(t) = G(V(t), 1/p, A) induced by the prior MCMC procedure described in Section 3.2.1 samples each feasible graph G ∈ Ġ(p, d, A) with asymptotic frequency µ(G). The posterior procedure described in Section 3.2.2 samples each feasible graph with asymptotic frequency µ(G | x), the posterior distribution of G given the data x and the hyper Markov prior p(θ | G).

Proof. Both statements follow from the Harris recurrence of the Markov chain V(t) constructed in Section 3.2. For this it is enough to find a strictly positive lower bound for the probability of transitioning from an arbitrary point V ∈ (B^d)^p to any open neighborhood of another arbitrary point V* ∈ (B^d)^p (Robert and Casella, 2004, Theorem 6.38, p. 225). This follows immediately from our inclusion of the global move, in which all p points {Vi} are replaced with uniform draws from (B^d)^p.

It is interesting to note that while the sequence G(t) = G(V(t), 1/p, A) is a hidden Markov process, it is not itself Markovian on the finite state space Ġ(p, d, A); nevertheless it is ergodic, by Prop. 3.3.

4 Results

Here we illustrate the use of the proposed parametrization using simulations and real data.
These numerical examples provide an opportunity to test priors that encourage sparsity, and MCMC algorithms that by design allow for local as well as global moves.

4.1 Illustration of modeling advantages

4.1.1 The nerve determines the junction tree factorization

Here we use a junction tree factorization, with each univariate marginal Xi associated to a point Vi ∈ R^d (the standard graphical models approach). In this case, specifying the class of sets used to compute the nerve and the value of r determines a factorization for the joint density of {X1, ..., Xp}. We illustrate with p = 5 points in Euclidean space of dimension d = 2. Let (X1, X2, X3, X4, X5) be a random vector with density f(x), and consider the vertex set displayed in Table 1 (also shown as solid dots in Figures 3 and 6). For an Alpha complex with r = 0.5, the junction tree factorization (2.1b) corresponding to the graph in Figure 3 is

f(x) = ψ12(x1, x2) ψ235(x2, x3, x5) ψ4(x4) / ψ2(x2);

we will denote this factorization by [1, 2][2, 3, 5][4]. In the case where the factors are potential functions rather than marginals we will use {·} instead of [·]. Similarly, for the Čech complex with r = 0.7, the factorization corresponding to the graph in Figure 6 is

f(x) = ψ1235(x1, x2, x3, x5) ψ14(x1, x4) / ψ1(x1).

Figure 6: (a) Čech complex computed using r = 0.7 and the vertex set listed in Table 1. (b) The 1-skeleton of the Čech complex.

4.1.2 Subgraph counts in RGGs are a function of Q

In this subsection we illustrate how the distribution of particular graph features changes as a function of the sampling distribution Q of the random point set V, and contrast this with Erdős-Rényi models.
Specifically, we focus on the number of edges (2-cliques), Q2, and the number of 3-cliques, Q3. The two spatial processes we study for Q are iid uniform draws from the unit square [0, 1]^2 in the plane and dependent draws from the Matérn type III hard-core repulsive process (Huber and Wolpert, 2009), using Čech complexes with radius r = 1/√150 ≈ 0.082 in both cases to ensure asymptotic normality (Penrose, 2003, Thm. 3.13). In our simulations we vary both the number of variables (graph size) p and the Matérn III hard-core radius ρ. Comparisons are made with an Erdős-Rényi model with a common edge inclusion parameter. Table 2 displays the quartiles of Q2 and Q3 as a function of the graph size p, the hard-core radius ρ, and the Erdős-Rényi edge inclusion probability α. Figures 7, 8, and 9 show the joint distribution of (Q2, Q3) for Vi iid ∼ Un([0, 1]^2), for a Matérn III process with hard-core radius ρ = 0.035, and for draws from an Erdős-Rényi model with inclusion probability α = 0.065, respectively.

Figure 7: Edge counts and 3-clique counts from 2,500 simulated samples of G(V, 1/√(2·75), Čech), where |V| = 75 and Vi iid ∼ Un([0, 1]^2), 1 ≤ i ≤ 75. The multivariate normal appears to be a reasonable approximation to the joint distribution, as suggested by (Penrose, 2003, Thm. 3.13). The Čech radius is r_n = 1/√(2n).

Figure 8: Edge counts and 3-clique counts from 2,500 simulated samples of G(V, 1/√(2·75), Čech), where |V| = 75 and V is sampled from a Matérn III process with hard-core radius ρ = 0.035.
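The joint (Q2, Q3) statistics summarized in Figures 7-9 are straightforward to recompute; a sketch in our own Python (function names ours), with an edge whenever two radius-r balls intersect, i.e. dist ≤ 2r:

```python
import itertools
import math
import random

def rgg_clique_counts(points, r):
    """Q2 (edges) and Q3 (3-cliques) of the Cech 1-skeleton:
    an edge joins i and j whenever dist(v_i, v_j) <= 2r."""
    n = len(points)
    adj = [[False] * n for _ in range(n)]
    q2 = 0
    for i, j in itertools.combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= 2.0 * r:
            adj[i][j] = adj[j][i] = True
            q2 += 1
    q3 = sum(1 for i, j, k in itertools.combinations(range(n), 3)
             if adj[i][j] and adj[i][k] and adj[j][k])
    return q2, q3

def sample_counts(n_points, r, n_reps, rng=random):
    """Monte Carlo draws of (Q2, Q3) for V_i iid uniform on [0, 1]^2,
    as in the uniform rows of Table 2."""
    return [rgg_clique_counts(
        [(rng.random(), rng.random()) for _ in range(n_points)], r)
        for _ in range(n_reps)]
```

Replacing the uniform draws with a repulsive point process changes the joint law of (Q2, Q3), which is the modeling lever this subsection illustrates.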
Figure 9: Edge counts and 3-clique counts from 1,000 simulated samples of an Erdős-Rényi graph with edge inclusion probability α = 0.065.

Graph            |V|   Edges (25%, 50%, 75%)   3-cliques (25%, 50%, 75%)
Uniform          75    161, 171, 182           134, 160, 190
Matérn (0.035)   75    154, 161, 170           110, 124, 144
ER (0.050)       75    130, 138, 146             6,   8,  11
ER (0.065)       75    172, 181, 189            14,  18,  22
Uniform          50     69,  75,  81            34,  43,  57
Matérn (0.035)   50     66,  71,  76            27,  35,  43
Matérn (0.050)   50     62,  67,  71            22,  27,  33
ER (0.050)       50     56,  61,  67             1,   2,   4
ER (0.065)       50     74,  79,  85             3,   5,   7
Uniform          20      9,  12,  14             1,   2,   4
Matérn (0.035)   20      9,  11,  13             1,   1,   3
Matérn (0.050)   20      8,  10,  12             0,   1,   2
ER (0.050)       20      8,   9,  11             0,   0,   0
ER (0.065)       20     10,  12,  15             0,   0,   1

Table 2: Summaries of the empirical distribution of edge and 3-clique counts for Čech-complex random geometric graphs with radius r = 0.082, for vertex sets sampled from the unit square: iid draws from a uniform distribution, a hard-core process with radius ρ = 0.035 or ρ = 0.050, and Erdős-Rényi (ER) graphs with common edge inclusion probabilities α = 0.050 and α = 0.065.

These simulations illustrate that by varying the distribution Q we can control the joint distribution of graph features. The repulsive and iid uniform distributions have very similar edge distributions, for example (see Figures 7 and 8), while (as anticipated) the repulsive process penalizes large cliques. Joint control of these features is not possible with an Erdős-Rényi model with a common edge inclusion probability, and it is not obvious how to encode this type of information in the concordance function approach of Mukherjee and Speed (2008). In Section A we propose a procedure for generating decomposable graphs, and we note that the graphs induced by this algorithm are similar to those constructed without the decomposability restriction.
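The Matérn type III mechanism behind the repulsive rows of Table 2 can be sketched as sequential hard-core thinning. This is a deliberate simplification of Huber and Wolpert (2009), who treat the exact conditional process; the sketch below (our own code) conveys only the repulsion mechanism, keeping a candidate point only when no previously kept point is within the hard-core distance.

```python
import math
import random

def matern_iii_hardcore(n_candidates, hc_radius, rng=random):
    """Simplified Matern type III thinning on the unit square:
    candidates arrive in birth order and a point is kept only if no
    previously *kept* point lies within the hard-core radius.  Unlike
    Matern type II, deletions here depend only on surviving points."""
    kept = []
    for _ in range(n_candidates):
        cand = (rng.random(), rng.random())
        if all(math.dist(cand, p) >= hc_radius for p in kept):
            kept.append(cand)
    return kept
```

Feeding a configuration produced this way into the clique-count routine above reproduces the qualitative effect seen in Table 2: edge counts close to the uniform case, but a thinner right tail for 3-clique counts.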
In Figure 22 we display a simulation study of the distribution of edge counts for an RGG and for its restriction to decomposable graphs; the two distributions are very similar.

4.2 Simulation studies

We develop four examples. The first example illustrates that our method works when the graph encoding the Markov structure of the underlying density is contained in the space of graphs spanned by the nerve used to fit the model. In the second example we apply our method to Gaussian graphical models. The third example shows that the nerve hypergraph (not just the 1-skeleton) can be used to induce different groupings of the terms in a factorization, and therefore gives a way to encode dependence features that go beyond pairwise relationships. In the fourth example we compare results obtained by using different filtrations to induce priors over different spaces of graphs.

4.2.1 G is in the Space Generated by A

Let (X1, ..., X10) be a random vector whose distribution has the factorization

f_θ(x) = [ψ_θ(x1, x4, x10) ψ_θ(x1, x8, x10) ψ_θ(x4, x5) ψ_θ(x8, x9) ψ_θ(x2, x3, x9) ψ_θ(x6) ψ_θ(x7)] / [ψ_θ(x4) ψ_θ(x8) ψ_θ(x9) ψ_θ(x1, x10)].   (4.1a)

The Markov structure of (4.1a) can be encoded by the geometric graph displayed in Figure 10.

Figure 10: Geometric graph representing the model given in (4.1a). For this example graphs are constructed to be decomposable and the clique marginals are specified as Clayton copulas.

We transform variables if necessary to achieve standard Un(0, 1) marginal distributions for each Xi, and model clique joint marginals with a Clayton copula (Clayton, 1978, or Nelsen, 1999, §4.6), the exchangeable multivariate model with joint distribution function

Ψ_θ(x_I) = (1 − p_I + Σ_{i ∈ I} x_i^(−θ))^(−1/θ)
and density function

ψ_θ(x_I) = θ^(p_I) [Γ(p_I + 1/θ) / Γ(1/θ)] (1 − p_I + Σ_{i ∈ I} x_i^(−θ))^(−p_I − 1/θ) (∏_{i ∈ I} x_i)^(−1 − θ)   (4.1b)

on [0, 1]^(p_I), for some θ ∈ Θ = (0, ∞), for each clique [v_i : i ∈ I] of size p_I.

We drew 250 samples from model (4.1) with θ = 4. For inference about G we set A = Alpha and r = 0.30, with independent uniform prior distributions Vi iid ∼ Un(B^2) for the vertices on the unit ball in the plane. We used the random walk described in Section 3.2 to draw posterior samples, with Algorithm 1 applied to enforce decomposability. To estimate θ we take a unit exponential prior distribution θ ∼ Ex(1) and employ a Metropolis-Hastings approach with a symmetric random walk proposal reflecting at the boundary θ = 0: θ* = θ(t) + ε_t, with ε_t ∼ Un(−β, β) for fixed β > 0. We drew 1,000 samples after a burn-in period of 25,000 draws.

The three models with the highest posterior probabilities are displayed in Table 3. The geometric graphs computed from six posterior samples (one every 100 draws) are shown in Figure 11; note that the computed nerves appear to stabilize after a few hundred iterations, while the actual positions of the vertices continue to vary.

Graph topology                                 Posterior probability
[1,4,10][1,8,10][4,5][8,9][2,3,9][6][7]        0.963
[1,4,10][1,8,10][4,5][8,9][2,3,9][6][5,7]      0.021
[1,4,10][1,8][4,5][8,9][2,3,9][6][7]           0.010

Table 3: The three models with highest estimated posterior probability; the true model is the first row (see Figure 10). Here θ = 4.

4.2.2 Gaussian graphical model

We use our procedure to perform model selection for the Gaussian graphical model X ∼ No(0, Σ_G), where G encodes the zeros of Σ^(−1). We adopt a hyper inverse Wishart (HIW) prior distribution for Σ | G.
The marginal likelihood (in the parametrization of Atay-Kayis and Massam, 2005, Eqn. (12)) is given by

M(V) = (2π)^(−pN/2) I_{G(V)}(δ + N, D + X^T X) / I_{G(V)}(δ, D),   (4.2)

where

I_G(δ, D) = ∫_{M^+(G)} |Σ|^((δ−2)/2) e^(−⟨Σ, D⟩/2) dΣ

denotes the HIW normalizing constant. This quantity is available in closed form for decomposable graphs G(V), but for our unrestricted graphs (4.2) must be approximated via simulation. For our low-dimensional examples the method of Atay-Kayis and Massam (2005) suffices; for larger numbers of variables we recommend that of Carvalho et al. (2007). We set δ = 3 and D = 0.4 I_6 + 0.6 J_6 (I_6 and J_6 denote the identity matrix and the matrix of all ones, respectively).

Figure 11: Geometric graphs corresponding to snapshots of posterior samples (one every 100 iterations) from the model of (4.1a). For this example graphs are constructed to be decomposable and the clique marginals are specified as Clayton copulas.

We sampled 300 observations from a multivariate normal with mean zero and precision matrix

     18.18   −6.55    0       2.26   −6.27    0
     −6.55   14.21    0      −4.90    0       0
      0       0      10.47    0       0      −3.65
      2.26   −4.90    0      10.69    0       0
     −6.27    0       0       0      27.26    0
      0       0      −3.65    0       0       7.41        (4.3)

whose conditional independence structure is given by the graph in Figure 12.

Figure 12: Graph encoding the Markov structure of the model given by precision matrix (4.3).

We fit the model described in Section 3 using a uniform prior for each Vi ∈ B^2 and r = 0.25. We employed hybrid random walk proposals in which we move all six vertices {Vi} independently according to the diffusion-inspired random walk described in Section 3.2 with probability 0.85; replace one uniformly selected vertex Vi with a uniform draw from Un(B^2) with probability 0.05; and replace all six vertices with independent uniform draws from Un(B^2) with probability 0.10. We drew 1,000 posterior samples after a burn-in of 750,000 iterations. Results are summarized in Table 4.

Graph topology                 Posterior probability
[1,2,4][1,5][3,6]              0.152
[1,5][2,3,4][2,3,6]            0.072
[1,2,3,4,6][1,5]               0.069
[1,4][2,4][2,3,6]              0.055
[1,2,4][2,3,4][1,5][3,6]       0.052

Table 4: The five models with highest estimated posterior probability; the true model is the first row.

4.2.3 Factorization Based on Nerves

While Gaussian joint distributions are determined entirely by their bivariate marginals, so that only edges appear in their complete-set factorizations (see (2.1a)), more complex dependencies are possible for other distributions. The familiar example of the joint distribution of three Bernoulli variables X1, X2, X3, each with mean 1/2, with X1 and X2 independent but X3 = (X1 − X2)^2 (so that the {Xi} are only pairwise independent), has only the complete set {1, 2, 3} in its factorization. Consider now a model with the graphical structure illustrated in Figure 13, whose density function, if it is continuous and strictly positive (see Lauritzen, 1996, Prop. 3.1), admits the complete-set factorization

f(x | G, θ) = c_G φ(x1, x2) φ(x1, x6) φ(x2, x6) · φ(x3, x4, x5).   (4.4a)

For illustration we take each φ(·) to be a Clayton copula density (see (4.1b)). For simplicity we specify the same value θ = 4 for each association parameter, so f(x | G, θ) is given by (4.4a) with

φ(x, y) = 5 (x^(−4) + y^(−4) − 1)^(−9/4) (xy)^(−5),   (4.4b)
φ(x, y, z) = 45 (x^(−4) + y^(−4) + z^(−4) − 2)^(−13/4) (xyz)^(−5)   (4.4c)

(the constant in (4.4c) follows from (4.1b) with p_I = 3 and θ = 4). In earlier examples we associated graphical structures (i.e.
, edge sets) with 1-skeletons of nerves. We now associate hypergraphical structures (i.e., abstract simplicial complexes that may include higher-order simplexes) with entire nerves, with maximal simplices associated to complete-set factors. For example, the Alpha complex computed from the vertex set displayed in Table 5 with r = 0.40 has {3,4,5}{1,2}{1,6}{2,6} as its maximal simplices (Figure 14). By associating a Clayton copula with each of these hyperedges, we recover the model shown in (4.4).

Figure 13: Graph encoding the Markov structure of the model given in (4.4).

Figure 14: Alpha complex corresponding to the vertex set in Table 5 and r = √0.075.

Coordinate   V1        V2        V3       V4       V5        V6
x            −0.0936   −0.4817   0.0019   0.0930   0.2605    −0.5028
y            0.6340    0.7876    0.0055   0.0351   −0.0702   0.2839

Table 5: Vertex set used for generating a factorization based on nerves.

We use the same prior and proposal distributions constructed in Section 3.2 from point distributions in R^d; what has changed is the way the nerve is used: as a hypergraph whose maximal hyperedges represent factors. One complicating factor is the need to evaluate the normalizing constant c_G for each graph G encountered during the simulation; since it is unavailable in closed form, we use Monte Carlo importance sampling to evaluate c_G for each new graph, and we store the result for reuse when G recurs.

We anticipate that uniform draws Vi iid ∼ Un(B^2) will give high probability to clusters of three or more points within a ball of radius r, favoring higher-dimensional features (triangles and tetrahedra) in the induced hypergraph encoding the Markov structure of {Xi}.
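Listing the simplices of the nerve is concrete because of a standard characterization of the Čech complex: a set of vertices spans a simplex exactly when the smallest ball enclosing them has radius at most r. The sketch below (our own code; the Alpha complex used in the example above would need a Delaunay restriction on top of this) computes the edges and triangles of the 2-skeleton in the plane.

```python
import itertools
import math

def min_enclosing_radius_3(a, b, c):
    """Radius of the smallest circle enclosing three points in the
    plane: half the longest side if that circle already covers the
    third point, otherwise the circumradius."""
    pts = [a, b, c]
    for i, j in itertools.combinations(range(3), 2):
        mid = [(pts[i][0] + pts[j][0]) / 2, (pts[i][1] + pts[j][1]) / 2]
        rad = math.dist(pts[i], pts[j]) / 2
        if math.dist(mid, pts[3 - i - j]) <= rad + 1e-12:
            return rad
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    area = abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2
    return la * lb * lc / (4 * area)  # circumradius = abc / (4 * area)

def cech_two_skeleton(points, r):
    """Edges and triangles of the Cech complex: a simplex enters the
    nerve iff the smallest ball enclosing its vertices has radius <= r."""
    n = len(points)
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if math.dist(points[i], points[j]) / 2 <= r]
    tris = [(i, j, k) for i, j, k in itertools.combinations(range(n), 3)
            if min_enclosing_radius_3(points[i], points[j], points[k]) <= r]
    return edges, tris
```

The maximal simplices (the factors of the complete-set factorization) are then the triangles plus any edge not contained in a triangle.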
To explore this phenomenon, we compare results for uniform draws with those from a repulsive process under which clusters of three or more points are unlikely to lie within a ball of radius r, hence favoring hypergraphs with only edges. We began by sampling 650 observations from model (4.4) with A = Alpha and r = 0.40, with independent uniform prior distributions Vi iid ∼ Un(B^2) for the vertices on the unit ball in the plane. The Metropolis-Hastings proposals for the vertex set are given by a mixture scheme:

• A random walk for each Vi as described in Section 3.2, with step size η = 0.020. This proposal is picked with probability 0.94.

• An integer 1 ≤ k ≤ 6 is chosen uniformly and, given k, a subset of size k from {1, 2, 3, 4, 5, 6} is sampled uniformly; the vertices with those indices are replaced with independent random draws from Un(B^2). This proposal is picked with probability 0.06, i.e. 0.01 for each k.

For θ we used the same standard exponential prior distribution and reflecting uniform random walk proposals described in Example 4.2.1. Using 5,000 posterior samples after a burn-in period of 95,000 iterations, the models with highest posterior probability are summarized in Table 6.

To penalize higher-order simplexes we used a Strauss repulsive process (Strauss, 1975), conditioned to have p points in B^d, as the prior distribution for the vertex set, with Lebesgue density

g(v) ∝ γ^(#{(i, j) : dist(vi, vj) < 2R})

for some 0 < γ ≤ 1, penalizing each pair of points closer than 2R. Simulation results for this prior (with R = 0.7r and γ = 0.75) are summarized in Table 7. The posterior mode is far more distinct for this prior than for the uniform prior shown in Table 6.

Maximal simplices             Posterior probability
{3,4,5}{1,2}{2,6}{1,6}        0.609
{1,2,6}{3,4}{4,5}{3,5}        0.161
{3,5}{1,6}{3,4}{1,2}{2,6}     0.137

Table 6: Highest posterior factorizations with the uniform prior, for the model of (4.4) and Figure 13; the true model is the first row.

Maximal simplices             Posterior probability
{3,4,5}{1,2}{2,6}{1,6}        0.824
{1,2,6}{3,4,5}                0.111
{1,2,6}{3,4}{3,5}{4,5}        0.002

Table 7: Highest posterior factorizations with the Strauss prior; the true model is the first row.

In a further experiment with γ = 0.35, the posterior was concentrated on factorizations without any triads.

4.2.4 G Outside the Space Generated by A

In the simulation studies of Sections 4.2.1-4.2.3 the class of sets A used to compute the nerve was known. In this example we investigate the behavior of our methodology when the class of convex sets used to fit the model differs from the one used to generate the true graph. We consider three possibilities: A = Alpha in R^2, A = Alpha in R^3, and A = Čech in R^2. We performed two experiments: one in which the graph is feasible for each of the three classes, and another in which the graph can be generated by only two of the classes. First consider a model with junction tree factorization

f_θ(x) = ψ_θ(x1, x3) ψ_θ(x2, x3, x4) ψ_θ(x5) / ψ_θ(x3),   (4.5)

whose conditional independence structure is given by the graph of Figure 15. Again, the clique marginals are specified as Clayton copulas with θ = 4. We simulated 300 samples from this distribution. We fitted the model with each of the three classes of convex sets using the Metropolis-Hastings algorithm of Section 3.2, with random walk proposals on B^d (where d = 2 or 3, depending on A). Algorithm 1 was used to enforce decomposability, with r = 0.40 and η = 0.020. The same exponential prior and uniform reflecting random-walk proposals for θ were used as in Example 4.2.1. Results from 1,000 samples after a burn-in period of 50,000 draws are summarized in Table 8.
Not surprisingly, the posterior mode coincided with the true model in all three cases. The second model we considered has junction tree factorization

f_θ(x) = ψ_θ(x1, x2, x4) ψ_θ(x1, x3, x4) ψ_θ(x1, x4, x5) / (ψ_θ(x1, x4))^2.   (4.6)

Figure 15: Graph encoding the Markov structure of the model given in (4.5).

Nerve           Highest-probability models    Posterior
Alpha in R^2    [1,3][2,3,4][5]               0.964
                [1,3][2,3,4][1,5]             0.012
                [1,3,4][2,3,4][5]             0.012
Alpha in R^3    [1,3][2,3,4][5]               0.982
                [1,3][2,3][3,4][5]            0.011
                [1][2,3,4][5]                 0.003
Čech in R^2     [1,3][2,3,4][5]               0.595
                [1,2,3,4][5]                  0.179
                [1,2,3,4,5]                   0.168

Table 8: Models with highest posterior probability, divided according to the class of convex sets used when fitting the model; the true model is the first row of each block.

The graph corresponding to (4.6) cannot be obtained from an Alpha complex in R^2, but it is feasible for an Alpha complex in R^3 (Figure 16) or a Čech complex in R^2. Using the same Clayton clique marginals as before, we sampled 300 observations from this distribution and fitted the model using the three classes of convex sets. Results from 1,000 samples after a burn-in period of 75,000 are summarized in Table 9. Observe that for Alpha complexes in R^2 there is no clear posterior mode (unlike the previous example, or Sections 4.2.1 and 4.2.3). The posterior modes for the Čech complex in R^2 and the Alpha complex in R^3 both match the true model.

Figure 16: Graph encoding the Markov structure of the model given in (4.6).

Nerve           Highest-probability models    Posterior
Alpha in R^2    [1,2][1,3,4][1,4,5]           0.214
                [1,2,4][1,3,4][1,3,5]         0.115
                [1,2,4][1,3,4][3,4,5]         0.112
Alpha in R^3    [1,2,4][1,3,4][1,4,5]         0.976
                [1,2,3,4][1,4,5]              0.016
                [1,2,4][1,3][1,4,5]           0.009
Čech in R^2     [1,2,4][1,3,4][1,4,5]         0.758
                [1,2,4][1,3,4][1,3,5]         0.177
                [1,2,4][1,3,4][4,5]           0.148

Table 9: Models with highest posterior probability, for each class of convex sets. The true model, [1,2,4][1,3,4][1,4,5], is unattainable for Alpha complexes in R^2.

4.3 Comparative performance analysis with state-of-the-art

We compared the performance of our method to Feature-Inclusion Stochastic Search (FINCS), proposed by Scott and Carvalho (2008), and to the adaptive LASSO as described by Fan et al. (2009). The criterion for the comparison was the estimation of counts of specific subgraphs in a set of graphs of size 50. That is, we generated 100 graphs of size 50 and computed the absolute counts of the following subgraphs: triangles, 4-cycles, 5-cycles, 3-stars, 4-stars, and 5-stars, for the true graph and for the network estimated by our method and by its competitors; we then computed the absolute difference between the counts for the true and estimated graphs, divided by the count for the true graph. We denote by an asterisk (*) the counts of induced subgraphs; these measure the performance of the methods under decomposability. We generated the set of true networks from an Erdős-Rényi random graph model with α = 0.05, so we are in a regime that can be called sparse. To fit our method, we assumed a uniform distribution on the unit ball in R^3 as the prior for the vertex set, and an Ex(0.1) prior for the radius of each vertex. For the positions of the vertices we used the same proposals as in Examples 4.2.1-4.2.4. For each radius we used as proposal a mixture of random walks reflecting at 0. We used the nerve corresponding to the Čech complex. We set up FINCS with a probability of 0.1 for resampling moves and a probability of 0.1 for global moves. For the adaptive LASSO we set γ = 0.5, and Ω was initialized as the inverse of the sample covariance matrix.
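The error criterion just described is simple enough to state in code; this is our own sketch, and the handling of the case true = 0 < estimated is our assumption, since the source specifies only the convention 0/0 = 0.

```python
def normalized_count_error(true_count, est_count):
    """Normalized subgraph-count error of Section 4.3:
    |true - estimated| / true, with the convention 0/0 = 0.
    The case true = 0 < estimated is not specified in the source;
    we return inf so it cannot be mistaken for a good fit."""
    if true_count == 0:
        return 0.0 if est_count == 0 else float("inf")
    return abs(true_count - est_count) / true_count
```

Averaging this quantity over the 100 simulated graphs, separately for each subgraph type, yields entries comparable to those reported in Tables 10 and 11.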
Subgraphs were counted using the igraph command graph.subisomorphic.lad. When computing the normalized error we adopted the convention 0/0 = 0. For the induced subgraphs we compute the error only for the Bayesian procedure and the LASSO, over the set of non-decomposable graphs; for the simulation displayed in Table 10, all 100 graphs were non-decomposable. Results are summarized in Table 10. Our method incurred smaller errors on average than its competitors for almost all subgraphs; the exception was the 5-star. We also observed that FINCS outperformed the LASSO in almost all regimes, again with the exception of the 5-star.

We performed another simulation, now assuming an exponential random graph model (ERGM) for the true graphs. The simulation of the true graphs was implemented using the R package statnet; we used the formula edges+triangles+kstar(3) with coef=c(-3.2,0.95,0.005), a specification that encourages the presence of triangles and 3-stars.

Subgraph     Bayes             FINCS             Lasso
triangles    0.083 ± 0.07      0.125 ± 0.17      0.208 ± 0.12
4-cycles     0.062 ± 0.10      0.123 ± 0.16      0.166 ± 0.10
5-cycles     0.086 ± 0.07      0.112 ± 0.14      0.124 ± 0.08
3-stars      0.103 ± 0.08      0.139 ± 0.12      0.211 ± 0.08
4-stars      0.087 ± 0.08      0.096 ± 0.16      0.115 ± 0.11
5-stars      0.201 ± 0.10      0.183 ± 0.14      0.174 ± 0.06
4-cycles*    0.146 ± 0.12      0.930 ± 0.07      0.229 ± 0.12
5-cycles*    0.128 ± 0.13      1 ± 0             0.174 ± 0.08

Table 10: Estimated normalized errors of counts of specific subgraphs for our method, FINCS, and the adaptive LASSO.

Figure 17: Comparison of the true model, sampled from the ERGM used in Table 11, with the model fitted by our method (again, as in Table 11). Edges added by our method (with respect to the true graph) are highlighted in red; edges deleted by our method are highlighted in green.
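Decomposability, which separates the starred rows above from the rest, can be tested with the classical maximum cardinality search of Tarjan and Yannakakis: a graph is decomposable (chordal) exactly when the MCS order is a perfect elimination ordering. The sketch below is our own standalone check, not the paper's Algorithm 1 (which enforces decomposability rather than merely testing it).

```python
def is_decomposable(n, edges):
    """Chordality test via maximum cardinality search (MCS): order the
    vertices by MCS, then verify that each vertex's earlier neighbors,
    minus the latest of them, are all adjacent to that latest one."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    weight = [0] * n
    order, numbered = [], set()
    for _ in range(n):
        # pick an unnumbered vertex with the most numbered neighbors
        v = max((u for u in range(n) if u not in numbered),
                key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for w in adj[v]:
            if w not in numbered:
                weight[w] += 1
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [w for w in adj[v] if pos[w] < pos[v]]
        if earlier:
            u = max(earlier, key=lambda w: pos[w])
            if not set(earlier) - {u} <= adj[u]:
                return False  # chordless cycle detected
    return True
```

Applied to samples from a graph model, the fraction of True results estimates the proportion of decomposable graphs, the quantity reported as 0.302 for the ERGM below.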
These choices produce graphs with twice as many 3-stars and three times as many triangles as an Erdős-Rényi model with α = 0.05, while having approximately the same density. The objective of this experiment is to investigate the behavior of our method when the true graph has more structure than the typical realization of an Erdős-Rényi model. Results are summarized in Table 11; they are similar to those of the previous experiment. We observed an improvement in the triangle counts; this is not surprising, since geometric random graphs tend to have more triangles than realizations of an Erdős-Rényi model. In Figure 17 we compare the true model (as generated from the ERGM just described) and the fitted model for a single realization.

In this regime graphs tend to be non-decomposable. We estimated the proportion of decomposable graphs from a sample of 1,000 networks drawn from the ERGM used for Table 11 and obtained 0.302. For the simulation displayed in Table 11, 72 of the 100 graphs were non-decomposable.

Subgraph     Bayes             FINCS             Lasso
triangles    0.071 ± 0.04      0.134 ± 0.14      0.217 ± 0.09
4-cycles     0.067 ± 0.06      0.121 ± 0.12      0.154 ± 0.04
5-cycles     0.075 ± 0.09      0.118 ± 0.11      0.131 ± 0.13
3-stars      0.092 ± 0.07      0.144 ± 0.14      0.236 ± 0.12
4-stars      0.086 ± 0.09      0.115 ± 0.15      0.117 ± 0.10
5-stars      0.214 ± 0.09      0.122 ± 0.13      0.121 ± 0.06
4-cycles*    0.152 ± 0.12      0.720 ± 0.07      0.214 ± 0.10
5-cycles*    0.133 ± 0.10      0.720 ± 0.13      0.163 ± 0.08

Table 11: Estimated normalized errors of counts of specific subgraphs for our method, FINCS, and the adaptive LASSO. The true graphs were sampled from an ERGM that encourages the presence of triangles and 3-stars.

4.3.1 Scalability

Here we discuss the scalability of the proposed method and of the competing methods (Fan et al.
(2009), Scott and Carvalho (2008)) to better appreciate the cost incurred in producing the errors in Tables 10 and 11. Since the implementations of the proposed and competing methods available to us are in different programming languages, which greatly influence the actual runtime, we frame the discussion theoretically, in terms of the key quantities that influence scalability: the number of nodes n, the number of edges m, and the number of cliques k. We also distinguish two tasks: estimating the parameters of a model-based representation of a (hyper)graph, and generating b (hyper)graphs from an estimated model-based representation.

Regarding the task of estimating parameters from an observed (hyper)graph, the proposed method requires estimating the parameters {V}, r, and its estimation complexity scales as O(S(n) + k³), where S(n) denotes the complexity of matrix multiplication.¹ The method by Scott and Carvalho (2008) requires estimating the parameter G, and its estimation complexity scales as O(S(n) + k). The lasso requires estimating the parameter Σ⁻¹, and its estimation complexity scales as O(n³).

Regarding the task of generating b (hyper)graphs from an estimated model-based representation, the proposed method scales as O(bn²), because the complexity of computing the 2-skeleton of the Čech complex scales as O(n²). Alternative representations lead to different scaling: computing the Delaunay triangulation scales as O(n log n), and computing the Alpha complex scales as O(n²). The method by Scott and Carvalho (2008) scales as O(bnm), where typically m is much larger than n. For the lasso, this statement does not apply.

¹The parameter k in this case is actually the number of prime components, but this quantity is typically of the same order of magnitude as the number of cliques.
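The O(n²) cost of the 1-skeleton comes from a pairwise scan of the balls. A minimal sketch of this scan (the function name is ours, and this is Python rather than the paper's Matlab implementation): an edge (i, j) is present exactly when the balls B(V_i, r_i) and B(V_j, r_j) intersect.

```python
import itertools
import math

def cech_one_skeleton(points, radii):
    """1-skeleton of the Cech complex: edge (i, j) is present exactly
    when balls B(V_i, r_i) and B(V_j, r_j) intersect, i.e. when
    ||V_i - V_j|| <= r_i + r_j.  The pairwise scan over all vertex
    pairs is the O(n^2) step quoted in the text."""
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= radii[i] + radii[j]
    ]
```

The 2-simplices of the Čech complex additionally require a common point of three balls, but the pairwise scan already accounts for the quoted O(n²) scaling.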
To illustrate how our method scales with the number of variables, we ran the experiment summarized in Table 12. We obtained the graphs from the nerves of Čech complexes and employed the method proposed by Atay-Kayis and Massam (2005) to compute the normalizing constants of non-decomposable graphs. The MCMC was performed on a 2.5 GHz desktop computer with 4 GB of RAM. Our method was implemented in Matlab (MathWorks).

N Variables   Burn-in   N Iterations   Time
4             50,000    10,000         7 m
40            50,000    10,000         2 h 11 m
400           50,000    10,000         3 d 2 h 17 m

Table 12: This experiment illustrates how our method scales with the number of variables.

4.4 Real data analysis

4.4.1 Fisher's Iris data

We applied our method to Fisher's Iris data set. The variables are: sepal length (1), sepal width (2), petal length (3), and petal width (4). The objective is to find the conditional independence structure given a family of distributions for the likelihood (e.g., Gaussian, Clayton copula). A summary of the distribution of this data set is given in Table 13.

Variable        SL       SW       PL       PW
Sepal length    0.4043
Sepal width     0.0938   0.1040
Petal length    0.3033   0.0714   0.3046
Petal width     0.0491   0.0476   0.0488   0.0754

Table 13: Estimated variances and covariances for the Iris data.

We first describe the specification we used for the random geometric graph, then make our choices for the hyper-Markov law and the likelihood explicit. For the RGG we assumed a uniform distribution of the vertices on the unit ball in R³, and an Ex(0.1) distribution for the radius of each vertex. For the positions of the vertices we used the same proposals as in Examples 4.2.1–4.2.4. For each radius we used a mixture of random walks reflecting at 0 as the proposal. For the likelihood function and hyper-Markov law we used the following specification:
A multivariate normal distribution for the likelihood with an HIW as the hyper-Markov law. For this choice of likelihood and hyper-Markov law, we adopted the same hyperparameter values as Roverato (2002), that is, the prior for the precision matrix centered at I and δ = 3. We used the method proposed by Atay-Kayis and Massam (2005) to deal with the normalizing constants of non-decomposable graphs. We ran the MCMC with 300,000 iterations as burn-in and kept the last 10,000 for analysis.

Results for the first choice are summarized in Table 14, where we display the 9 models with highest posterior probability. All the posterior probability was concentrated on these models. Our posterior mode coincides with the one reported by Roverato (2002), but we obtained different results for the rest of the models. We attribute this difference to the fact that we used different priors on graph space, ours being non-uniform.

Maximal Simplices              Posterior Probability
{1,2}{1,3}{2,4}{3,4}           0.3465
{2}{1,3}{3,4}                  0.2835
{1,3}{2,3}{4}                  0.1999
{1,2}{1,3}{4}                  0.1540
{1,2}{1,4}{2}                  0.0116
{1,4}{2}{3,4}                  0.0026
{1,2}{1,3}{3,4}                0.0016
{1,3}{2,3}{3,4}                0.0002
{1,4}{1,3}{2,3}{3,4}           0.0001

Table 14: Highest posterior factorizations with uniform prior and Gaussian distribution for Fisher's Iris data set.

We conducted another simulation in which we assessed the robustness of the inference for the Gaussian graphical model via tools from missing data. We fit our model deleting one row of the data at a time (that is, we fit our model 50 times over incomplete data sets) and imputed the missing data using the conditional distribution given the observed data. Results are summarized in Table 15. For each of these simulations we computed the average distance between the predicted vector and the observed one. For FINCS, the average distance between the predicted and observed vectors (across the 50 simulations) was 3.69 ± 1.28, while for our method it was 3.22 ± 1.15. This is not surprising since, in contrast to FINCS, we consider non-decomposable models.

Maximal Simplices              Frequency as posterior mode
{1,2}{1,3}{2,4}{3,4}           0.68
{1,3}{2,3}{4}                  0.16
{1,2}{2,3}{2,4}                0.08
{1,4}{2,4}{3,4}                0.04
{1,4}{1,3}{2}                  0.04

Table 15: Results from the missing data simulation. We fit our model the same way as in Table 14, but leaving one row out each time and imputing it as missing data. In this table we report the frequency with which each factorization appeared as the posterior mode.

4.4.2 Daily exchange rates data

Following Hernandez-Lobato et al. (2013), we considered daily exchange rates of eight currencies (Swiss franc, Australian dollar, Canadian dollar, Norwegian krone, Swedish krona, Euro, New Zealand dollar and British pound) with respect to the U.S. dollar. The data set consists of 717 observations, from 1 Dec. 2011 to 29 Aug. 2014. Clearly these observations are not iid, but we do not take this into account in the modeling. What makes this application interesting is the presence of a non-trivial missing data pattern. The data were downloaded from yahoo.finance.com via the R library tseries.

We applied our method to these data. We assumed a uniform distribution for the vertices in the unit ball in R³, and that the nerve was computed from the intersection pattern of balls of different sizes. We assumed an HIW as the hyper-Markov law and a multivariate normal as the likelihood. We kept the simulated values from 5,000 iterations after 25,000 iterations of burn-in. Missing data were assumed missing at random and imputed from the model. The posterior mode is illustrated in Figure 18; it has 0.87 posterior probability. This model is non-decomposable.
5 Discussion

In this article we present a new parametrization of graphs by associating them with finite sets of points in R^d. This perspective supports the design of informative prior distributions on graphs using familiar probability distributions on point sets in R^d. It also induces new and useful Metropolis-Hastings proposal distributions in graph space that include both local and global moves. As suggested by Helly's Theorem (Edelsbrunner and Harer, 2009), which characterizes the sparsity of intersections of convex sets in R^d, this methodology is particularly well suited for sparse graphs. The simple strategies presented here generalize easily to more detailed and subtle models for both priors and M-H proposals.

Our construction leads to MCMC algorithms that naturally instantiate local and global moves in graph (and hypergraph) space. Proposals that produce small perturbations of the vertex set will produce local moves with high probability, while proposals that resample a subset of the vertex set will, with high probability, produce a global move (see Figure 5). An interesting feature of our approach is that the distribution on the space of graphs is modeled directly, before the application of any specific hyper-Markov law, in contrast to standard approaches in which the hyper-Markov law itself is used to encourage sparsity or other features of the graph.

Figure 18: Posterior mode of our method when applied to daily exchange rates (with respect to the US dollar) from 1 Dec. 2011 to 29 Aug. 2014. Here CHF denotes the Swiss franc; AUD, the Australian dollar; CAD, the Canadian dollar; NOK, the Norwegian krone; SEK, the Swedish krona; EUR, the Euro; NZD, the New Zealand dollar; and GBP, the British pound.
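The two move types can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names and the Gaussian step size are our assumptions, and the Metropolis-Hastings acceptance step under the chosen hyper-Markov law is omitted.

```python
import random

def unit_ball_draw(d=3):
    """Uniform draw from the unit ball in R^d by rejection sampling,
    matching the uniform vertex prior used in the paper."""
    while True:
        x = [random.uniform(-1.0, 1.0) for _ in range(d)]
        if sum(c * c for c in x) <= 1.0:
            return x

def local_proposal(vertices, scale=0.05):
    """Perturb one randomly chosen vertex by a small Gaussian step.
    Few ball intersections change, so the induced move in graph
    space is local with high probability."""
    i = random.randrange(len(vertices))
    out = [list(v) for v in vertices]
    out[i] = [c + random.gauss(0.0, scale) for c in out[i]]
    return out

def global_proposal(vertices, k=2):
    """Resample k vertices from the prior.  Many edges can be
    rewired at once, so the induced move in graph space is global
    with high probability."""
    out = [list(v) for v in vertices]
    for i in random.sample(range(len(out)), k):
        out[i] = unit_ball_draw(len(out[i]))
    return out
```

A proposed configuration would then be mapped to its nerve and accepted or rejected by the usual M-H ratio.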
We think that working with the space of graphs explicitly opens many possibilities for prior specification in graphical models, and it is therefore a perspective worth further study.

While the focus on the first two moments, coupled with a graph representation of pairwise dependencies among variables, is not a restrictive modeling choice within the Gaussian graphical model framework, it becomes restrictive when working with graphical models in general. However, likely because of convenience, these assumptions are seldom challenged in the graphical models literature. Here we develop a geometric construction in which dependence relations of higher order can be conveniently encoded within a hypergraph. For a state-of-the-art treatment of parametrizations of decomposable graphs, see Cron and West (2012).

Connected to the point above: while decomposability plays an important role in graphical models in general, it plays no role in our construction. This is because decomposability is relevant for computations at the hyper-Markov law level, while our construction and results are at the level of the prior on graph space.

We now consider the space of graphs on which our construction puts positive mass. At this point we have two results and a conjecture for characterizing the space of feasible graphs (hypergraphs) when we consider the nerve of a set of balls in R³ with different radii (the case that should have the largest support for the distribution of graphs).

Theorems: (i) Any graph can be embedded in R³. This is a well-known result; one method that computes such an embedding is the Book Algorithm, proposed by Kainen (1974) and Ollmann (1973). (ii) Any graph can be linearly embedded in R³, where by 'linearly embedded' it is meant that the segments joining the vertices are straight lines.
This is a particular case of a more general result: every k-dimensional simplicial complex can be geometrically realized in 2k + 1 dimensions; here, k = 1. See Section 3.1 of Edelsbrunner and Harer (2008).

Conjecture: (iii) Such a linear embedding can be achieved by a random geometric graph construction using balls with different radii.

Interesting questions and extensions of this idea include: (1) achieving a deeper and more detailed understanding of the subspace of graphs spanned by different specific filtrations; (2) designing priors to control the distributions of specific graph features, such as clique size or treewidth; (3) modeling directed acyclic graphs (DAGs); and (4) concrete implementation of novel Markov structures based on nerves.

This methodology generates only graphs that are feasible for the particular filtration chosen. Although we have some insight into which graphs can and cannot be generated by a specific filtration, a more complete and formal understanding of this aspect of the problem would be useful.

We used very simple prior distributions for the purpose of illustrating the core idea of the methodology. It is natural in this approach to incorporate tools from point processes into graphical models to define new classes of priors on graph space. Future developments in our research will involve a range of repulsive and cluster processes. The parametrization we propose can be used to represent Markov structures on DAGs, but the strategies for obtaining such graphs from nerves will be different and will establish stronger connections between graphical models and computational topology.

The present work is related to that of Pistone et al. (2009), in which a nerve of convex subsets of R^d is used to obtain Markov structures for a distribution, an extension of the abstract tube theory of Naiman and Wynn (1997).
This new perspective allows for constructions that generalize the idea of junction trees. By modifying our methodology according to this framework (personal communication from H. Wynn suggests that this is feasible), we hope to fit models that factorize according to those novel Markov structures.

Another possible extension of this work is to discretize the set from which the vertex set is sampled (e.g., use a grid). Such a discretization may improve the behavior of the MCMC; it would also allow the use of a nonparametric form for the prior on the vertex set, leading to more flexible priors on graph space.

While we illustrated the new construction for Bayesian inference in a situation where we observe high-dimensional vectors and want to infer the dependencies among their components, the proposed construction can easily be used to build a likelihood in situations where we have direct observations of the facets of a hypergraph. Such observations occur naturally in applications; consider how one would encode the structure among individuals revealed by pictures on Facebook. Each picture contains one or more people, and a picture with three people is naturally encoded as a 3-facet rather than as three individual edges, as is currently done, arguably for lack of likelihood models for hypergraphs.

References

Aliye Atay-Kayis and Hélène Massam. A Monte Carlo method to compute the marginal likelihood in non-decomposable graphical Gaussian models. 92(2):317–335, 2005.

Julian E. Besag. Spatial interaction and the statistical analysis of lattice systems (with discussion). 36(2):192–236, 1974.

Julian E. Besag. Statistical analysis of non-lattice data. 24(3):179–195, 1975.

Stephen P. Brooks, Paolo Giudici, and Gareth O. Roberts. Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. 65(1):3–55, 2003.

Carlos M. Carvalho, Hélène Massam, and Mike West.
Simulation of hyper-inverse Wishart distributions in graphical models. 94(3):719–733, 2007.

David G. Clayton. A model for association in bivariate life tables and its applications in epidemiological studies of familial tendency in chronic disease incidence. 65(1):141–151, 1978.

Andrew Cron and Mike West. Models of random sparse eigenmatrices with application to Bayesian factor analysis. Available on-line at ftp://stat.duke.edu/pub/WorkingPapers/12-01.pdf, 2012.

A. Phillip Dawid and Steffen L. Lauritzen. Hyper Markov laws in the statistical analysis of decomposable graphical models. 21(3):1272–1317, 1993.

Herbert Edelsbrunner and John Harer. Persistent homology: a survey. In Jacob E. Goodman, Janos Pach, and Richard Pollack, editors, Surveys on Discrete and Computational Geometry: Twenty Years Later, volume 453 of Contemporary Mathematics, pages 257–282. American Mathematical Society, 2008.

Herbert Edelsbrunner and John Harer. Lecture notes from the course 'Computational Topology'. Available on-line at http://www.cs.duke.edu/courses/fall06/cps296.1/, 2009.

Herbert Edelsbrunner and Ernst P. Mücke. Three-dimensional alpha shapes. ACM Transactions on Graphics, 13:43–72, 1994.

J. Fan, Y. Feng, and Y. Wu. Network exploration via the adaptive lasso and scad penalties. The Annals of Applied Statistics, 3(2):521–541, 2009.

Paolo Giudici and Peter J. Green. Decomposable graphical Gaussian model determination. 86(4):785–801, 1999.

Peter J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. 82(4):711–732, December 1995.

John M. Hammersley and Peter Clifford. Markov fields on finite graphs and lattices. Unpublished, 1971.

W. Keith Hastings. Monte Carlo sampling methods using Markov chains and their applications. 57(1):97–109, 1970.

David Heckerman, Dan Geiger, and David M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data.
20(3):197–243, 1995.

J. M. Hernandez-Lobato, J. R. Lloyd, and D. Hernandez-Lobato. Gaussian process conditional copulas with applications to financial time series. In NIPS, 2013.

Peter D. Hoff. Extending the rank likelihood for semiparametric copula estimation. 1(1):265–283, 2007.

Mark L. Huber and Robert L. Wolpert. Likelihood-based inference for Matérn type III repulsive point processes. 41(4):in press, 2009. Preprint available on-line at http://www.stat.duke.edu/ftp/pub/WorkingPapers/08-27.html.

Beatrix Jones, Carlos Carvalho, Adrian Dobra, Christopher Hans, Christopher K. Carter, and Mike West. Experiments in stochastic computation for high-dimensional graphical models. 20(4):388–400, 2005.

P. C. Kainen. Some recent results in topological graph theory. In Graphs and Combinatorics, volume 406 of Lecture Notes in Mathematics, pages 76–108. Springer, 1974.

Matthew Kahle. Topology of random simplicial complexes: a survey. AMS Contemp. Math., (620):201–222, 2014.

Steffen L. Lauritzen. Graphical Models, volume 17 of Oxford Statistical Science Series. 1996.

Vikash K. Mansinghka, Charles Kemp, Joshua B. Tenenbaum, and Thomas L. Griffiths. Structured priors for structure learning. In Leopoldo Bertossi, Anthony Hunter, and Torsten Schaub, editors, Uncertainty in Artificial Intelligence: Proceedings of the Twenty-second Annual Conference (UAI 2006), 2006.

Sach Mukherjee and Terence P. Speed. Network inference using informative priors. 105(38):14313–14318, 2008.

Daniel Q. Naiman and Henry P. Wynn. Abstract tubes, improved inclusion-exclusion identities and inequalities and importance sampling. 25(5):1954–1983, 1997.

Roger B. Nelsen. An Introduction to Copulas. 1999.

Taylor L. Ollmann. On the book thicknesses of various graphs. In Frederick Hoffman, Roy B. Levow, and Robert S. D. Thomas, editors, Proc.
4th Southeastern Conference on Combinatorics, Graph Theory and Computing, Congressus Numerantium VIII, page 459, 1973.

Mathew D. Penrose. Random Geometric Graphs. 2003.

Mathew D. Penrose and Joseph E. Yukich. Central limit theorems for some graphs in computational geometry. 11(4):1005–1041, 2001.

Giovanni Pistone, Henry Wynn, G. Sáenz de Cabezón, and Jim Q. Smith. Junction tubes and improved factorisations for Bayes nets. (unpublished), 2009.

Christian P. Robert. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Second edition, 2001.

Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Second edition, 2004.

Alberto Roverato. Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. 29(3):341–411, 2002.

James G. Scott and Carlos M. Carvalho. Feature-inclusion stochastic search for Gaussian graphical models. 17(4):790–808, 2008.

Scott A. Sisson. Transdimensional Markov chains: A decade of progress and future perspectives. 100(471):1077–1089, September 2005.

David J. Strauss. A model for clustering. 62(2):467–476, 1975.

Kevin K. F. Wong, Christopher K. Carter, and Robert Kohn. Efficient estimation of covariance selection models. 90(4):809–830, 2003.

A Filtrations and Decomposability in Random Geometric Graphs

A filtration is a sequence of simplicial complexes, each properly included in its successor:

Definition A.1 (Filtration). A filtration for a simplicial complex K is a sequence L = K_0, K_1, ..., K_k of simplicial complexes such that ∅ = K_0 ⊂ K_1 ⊂ ... ⊂ K_k = K, with all inclusions proper.

Filtrations are commonly used in computational geometry and topology to construct efficient algorithms for computing specific nerves, including the Alpha complex (Edelsbrunner and Mücke, 1994).
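Definition A.1 can be checked mechanically. A small sketch, under our own representation of a complex as a set of frozenset simplices (not notation from the paper):

```python
def is_simplicial_complex(K):
    """A simplicial complex must contain every face of each of its
    simplices; checking codimension-1 faces suffices by induction."""
    return all(s - {v} in K for s in K if len(s) > 1 for v in s)

def is_filtration(complexes):
    """Definition A.1: the sequence starts at the empty complex, each
    member is a genuine simplicial complex, and every inclusion into
    the successor is proper."""
    return (complexes[0] == set()
            and all(is_simplicial_complex(K) for K in complexes)
            and all(a < b for a, b in zip(complexes, complexes[1:])))
```

For instance, ∅ ⊂ {{0}, {1}} ⊂ {{0}, {1}, {0,1}} is a filtration, whereas a set containing the edge {0,1} without its endpoint vertices is not even a simplicial complex.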
The simplicial complexes constructed in Section 2.2 from families of convex sets lead to filtrations as the convex sets are enlarged by increasing the radius parameter r.

Although much of the graphical models literature focuses on Markov structures derived from decomposable graphs, those constructed in Section 2.2 from 1-skeletons of Čech and Alpha complexes need not be decomposable (see Figure 19). In Algorithm 1 we present an adaptation of this construction that generates decomposable graphs, for use in applications that require them. In Sections 3 and 4 we present methodology and examples for both decomposable and unrestricted model spaces.

Figure 19: Filtration of Alpha complexes, (a) r = √0.10, (b) r = √0.20 and (c) r = √0.75.

The central idea for generating decomposable graphs from filtrations is to note that the complex A(V, 0) for r = 0 is disconnected and hence decomposable; as the radius r increases, if one adds edges only when the resulting graph is decomposable, then decomposability will hold for all r ≥ 0. This procedure is formalized in Algorithm 1.

Proposition A.1. The graph G produced by Algorithm 1 is decomposable.

Proof. The algorithm is initialized with the decomposable empty graph, and decomposability is tested with each proposed addition of an edge (i.e., a 1-simplex in L). The decomposability test is taken from Giudici and Green (1999, Theorem 2). Since only finitely many edges may possibly be added, G is decomposable by construction.

A.1 Example

We first illustrate the algorithm on a simple example, based on the five points in R² shown in Figure 20 and given in Table 16. The graph induced by a Čech complex with r = 0.5 will not be decomposable.
Table 17 presents the evolution of cliques and separators with increasing r, as edges are proposed for inclusion in Algorithm 1. The first three proposed edge additions are accepted, but the proposal to add edge (1,2) at radius r = 0.474 is rejected, since the intersection of the prime components {1,3} and {2,4,5} is empty and therefore not a separator.

Algorithm 1: The algorithm takes as input a filtration of k abstract complexes and returns a decomposable graph that is a subset of the 1-skeleton of the k-th complex.

    input: a filtration L = K_0, K_1, ..., K_k
    return: a decomposable graph G

    m = 0; i = 0; G_0 = ∅;
    while i < k and K_{i+1} ≠ ∅ do
        τ_i = {κ ∈ K_{i+1} \ K_i such that |κ| = 2};   // the edges in the set difference; denote by τ_{i,s} the s-th edge
        if τ_i ≠ ∅ then
            P = |τ_i|;
            for s = 1 to P do
                G′ = G_m ∪ τ_{i,s};                     // propose adding the edge
                if #(G′) < #(G_m) then                  // fewer connected components?
                    G_{m+1} = G′;                       // accept the proposal
                    m = m + 1;
                else
                    [c_i] = C(G_m);                     // the cliques
                    [s_i] = S(G_m);                     // the separators
                    [v_1, v_2] = τ_{i,s};               // the proposed edge
                    if ∃ i ≠ j, k with v_1 ∈ c_i, v_2 ∈ c_j and c_i ∩ c_j = s_k then
                        G_{m+1} = G′;                   // accept the proposal
                        m = m + 1;
        i = i + 1;
    G = G_m

Coordinate   V1      V2      V3      V4      V5
x            0.686   0.214   0.846   0.411   0.089
y            0.151   0.194   0.420   0.567   0.553

Table 16: Vertex set used to illustrate Algorithm 1.

Figure 20: (a) Proximity graph computed from the vertex set given in Table 16. (b) The decomposable graph computed from the same vertex set using Algorithm 1. The edge (1,2) is not included in the decomposable graph.

Cliques               Separators   r       Update
[1][2][3][4][5]       −            0       −
[1,3][2][4][5]        −            0.313   (1,3)
[1,3][2][4,5]         −            0.321   (4,5)
[1,3][2,5][4,5]       [5]          0.379   (2,5)
[1,3][2,4,5]          −            0.421   (2,4)
[1,3][3,4][2,4,5]     [3][4]       0.459   (3,4)
[1,3][3,4][2,4,5]     [3][4]       0.474   ∼(1,2)
[1,3,4][2,4,5]        [4]          0.498   (1,4)

Table 17: Evolution of cliques and separators in the junction tree representation of G as edges are added according to Algorithm 1. The proposed addition of edge (1,2) is rejected.

A.2 Algorithm deletes few edges

It is interesting to note that the number of proposed edge additions rejected by the algorithm is typically quite small. To illustrate this, we applied Algorithm 1 to a filtration of Čech complexes corresponding to 100 points sampled uniformly from the unit square, with radius r = 0.05. In Figure 21 the graph G output by the algorithm is compared to the 1-skeleton of the Čech complex (with no decomposability restriction) with the same radius r = 0.05. Few edges appear in the Čech complex but not in G. This occurs because geometric graphs tend to be triangulated, in the sense that if edges (v_1, v_2) and (v_2, v_3) belong to a geometric graph, then very likely the edge (v_1, v_3) will also be in the graph, preserving decomposability.

Figure 21: (a) The 1-skeleton of the Čech complex given the displayed point set and r = 0.05. (b) The decomposable graph for the same complex, point set, and radius, output by Algorithm 1.

Figure 22: Distribution of edge counts for both unrestricted and decomposable graphs. Graphs were computed using Čech complex filtrations with p = 100 and V_i iid ∼ Un([0,1]²).
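A minimal executable sketch of the idea behind Algorithm 1, under two simplifying assumptions not made in the paper: all balls share a common radius, so edges are proposed in order of increasing pairwise distance, and decomposability is checked by a full chordality test (maximum cardinality search, as in Tarjan and Yannakakis) rather than the incremental Giudici–Green test. Function names are ours.

```python
import itertools
import math

def is_chordal(adj):
    """Decomposable == chordal.  Maximum cardinality search produces
    an ordering; the graph is chordal iff that ordering has zero
    fill-in.  adj is a list of neighbor sets."""
    n = len(adj)
    weight, numbered, order = [0] * n, [False] * n, []
    for _ in range(n):
        v = max((u for u in range(n) if not numbered[u]),
                key=lambda u: weight[u])
        numbered[v] = True
        order.append(v)
        for w in adj[v]:
            if not numbered[w]:
                weight[w] += 1
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [w for w in adj[v] if pos[w] < pos[v]]
        if earlier:
            u = max(earlier, key=lambda w: pos[w])
            if any(w != u and w not in adj[u] for w in earlier):
                return False
    return True

def decomposable_proximity_graph(points, r):
    """Sketch of Algorithm 1 for a proximity-graph filtration with a
    common radius: propose each edge at the radius where it first
    appears (increasing pairwise distance) and accept it only if the
    graph stays decomposable."""
    n = len(points)
    adj = [set() for _ in range(n)]
    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= r
    )
    accepted = []
    for _, i, j in candidates:
        adj[i].add(j); adj[j].add(i)              # tentatively add the edge
        if is_chordal(adj):
            accepted.append((i, j))               # accept
        else:
            adj[i].discard(j); adj[j].discard(i)  # reject: breaks decomposability
    return accepted
```

On the five points of Table 16 with r = 0.5 this reproduces Table 17 (with 0-indexed vertices): every proposed edge is accepted except (1,2), here (0, 1).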