Generating Similar Graphs From Spherical Features
We propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches that compute features of graphs in Euclidean space, our approach obtains features on a surface of a hypersphere. We then utilize a von Mises-F…
Authors: Dalton Lunga, Sergey Kirshner
GENERATING SIMILAR GRAPHS FROM SPHERICAL FEATURES

Dalton Lunga and Sergey Kirshner
{dlunga,skirshne}@purdue.edu

Purdue University
Department of Statistics
West Lafayette, IN 47907-2035 USA
www.purdue.edu/stat

May 12, 2011
Technical Report No 11-01

Summary

We propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches that compute features of graphs in Euclidean space, our approach obtains features on a surface of a hypersphere. We then utilize a von Mises-Fisher distribution, an exponential family distribution on the surface of a hypersphere, to define a model over possible feature values. While our approach bears similarity to a popular exponential random graph model (ERGM), unlike ERGMs, it does not suffer from degeneracy, a situation when a significant probability mass is placed on unrealistic graphs. We propose a parameter estimation approach for our model, and a procedure for drawing samples from the distribution. We evaluate the performance of our approach both on the small domain of all 8-node graphs as well as larger real-world social networks.

Contents

1 Introduction
2 ERGMs
2.1 ERGM Definition
2.2 Degeneracy in ERGMs
2.3 Why Are Unrealistic Graphs Likely under ERGMs?
3 Exponential Locally Spherical Random Graph Model
3.1 Algorithm
3.2 Sampling under ELSRGM
4 Experimental Evaluation
4.1 Small graphs
4.2 Larger Graphs
5 Conclusion and Future Work
6 Acknowledgements

List of Figures

1 Simple typical subgraph configurations for undirected graphs. From left to right: edge, triangle, two-star and three-star configurations.
2 Left: Distribution of 8-node graphs. Right: Distribution of cell diameter, where we consider each feature-pair as a cell and compute the graph edit distance between each pair of graphs with feature counts mapping to that cell (feature-pair).
3 A degenerate ERGM specified by edge-triangle pair for an 8-node graph (top figure). Color-coded is the pmf over the edge-triangle space for an estimated MLE $\theta_{MLE} = (-0.992, 0.617)$. "+" indicates the observed feature and its mean, while the " " shape indicates the ERGM mode.
4 Solid points show mode placement on the edge-triangle feature pairs that form the vertices of the extended hull as a result of Theorem 1. Overlaid on solid points are modes obtained from using the estimated $\hat{\theta}_{mle}$ of equation (3).
5 View of mode placement on facets of the extended 3D convex hull. Each mode forms a vertex of a given triangle (denoted by black lines) for a facet of the convex hull.
6 An illustration of degeneracy of exponential models on non-graph data. Color-coded is the pmf over 2D grid spaces for estimated MLEs (left plot) $\theta_{MLE} = (-0.086, 0.086)$ and (right plot) $\theta_{MLE} = (0.120, -0.160)$. The "+" sign indicates the observed feature and its mean, while the " " shape indicates the ERGM mode.
7 Two 8-node synthetic test graphs. Left: 2-component graph $G_{test_1}$ falling inside the relative interior of the convex hull $\bar{C}$ of extended features.
Right: $G_{test_2}$ falling on the relative boundary of the convex hull $\bar{C}$ for the extended feature space.
8 Sample KL-divergence $KL(\hat{f} \,\|\, f)$ for ELSRGM as more graphs get uncovered. The initial neighborhood $B \subset \mathcal{G}_n$ ($n = 8$) is built around $G_{test_1}$.
9 (a) Synthetic graph $G_{test_1}$; (b) Corresponding feature-pair: $f(G_{test_1}) \in \mathrm{rint}\,\bar{C}$; (c) Simulations from models given $G_{test_1}$. The observed statistics are indicated by the solid lines; the box plots include the median and interquartile range of simulated networks. ERGMs show low variance and all 100 samples seem to be placed on the same network, a sign of degeneracy.
10 (a) Synthetic graph $G_{test_2}$; (b) Corresponding feature-pair: $f(G_{test_2}) \in \mathrm{rbd}\,\bar{C}$; (c) Simulations from models given $G_{test_2}$. The observed statistics are indicated by the solid lines; the box plots include the median and interquartile range of simulated networks. Both models are non-degenerate.
11 Dolphin 62-node network summaries. The observed statistics are indicated by the solid lines; the box plots include the median and interquartile range of simulated networks. ERGM plots show signs of the degeneracy effect.
12 Faux-Mesa-High 205-node network summaries. The observed statistics are indicated by the solid lines; box plots summarize the statistics for the simulated networks. ERGM plots show signs of the degeneracy effect.
13 Kapferer's 39-node network summaries. The observed statistics are indicated by the solid lines; box plots summarize the statistics for the simulated networks using ELSRGM.
14 Left: Faux-Mesa-High 205-node original network. Right: ELSRGM-generated network.

List of Tables

1 Complexity of graph spaces for fixed number of nodes.

Degeneracy and Exponential Locally Spherical Random Graph Model

1 Introduction

Increasingly, many domains produce data sets containing relationships that are conveniently represented by networks, e.g., systems sciences (the Internet), bioinformatics (protein interactions), and social domains (social networks). As researchers in these areas develop models and tools to analyze the properties of networks, they are hampered by the few samples available to evaluate their approaches. This gives rise to the problem of generating more network samples that can be viewed as drawn from the same population as the given network.

While there are a number of possible approaches to this problem, perhaps the most well-studied is the class of exponential random graph models (ERGMs, or in the social network literature, $p^*$), an exponential family class of models that matches the statistics over the set of possible networks to the statistics of the network in question [e.g., 17]. This and similar approaches have a long history, as they generalize the $p_1$ [7] and Markov random graph [4] models first developed in the social network literature. While such approaches are intuitive and have nice properties, they also suffer from the issue of degeneracy [5, 16], which is manifested in the instability of parameter estimation and in placing most of the probability mass of the resulting distributions on unrealistic graphs (e.g., empty or complete graphs). As a result, these approaches are not suitable for generating graphs similar to the given one.

This paper contains two contributions.
First, we zero in on the issue of degeneracy and discover that its cause is related to the geometry of the set of feature vectors and the number of graphs mapped to each feature vector (thought of as feature vector weights). By augmenting feature vectors with logarithms of their corresponding weights, we show that only graphs with such augmented feature vectors on the relative boundary of the resulting extended convex hull can become modes of any exponential random graph model, explaining why unrealistic graphs (which are on the relative boundary) often get large probability masses. Second, using the insight of the observation above, we propose a novel random graph model which is based on embedding the features of graphs onto a surface of a hypersphere. Since a spherical surface is a relative boundary of the sphere's convex hull, all of the feature vectors would then belong to the relative boundary of the convex hull and could potentially serve as modes of the corresponding distributions. This in turn helps to avoid the degeneracy issues which plague ERGMs.

Our proposed approach makes use of spherical features obtained by embedding possible graphs onto a surface of a sphere [18], and then approximating the distribution of the resulting spherical feature space with a von Mises-Fisher distribution. Since the space of all possible graphs is too large to consider for embedding, we determine the embedding function based only on the neighborhood around the given graph, thus resulting in a locally spherical embedding of the set of graphs. The main benefit of our approach is that it fixes the issue of degeneracy, with the mode of the distribution over the spherical feature vector coinciding with the features of the given graph.
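For reference, the claim that the mode coincides with a chosen point on the sphere can be read off the standard form of the von Mises-Fisher density. The block below is a sketch in generic notation: the symbols $x$, $\mu$, $\kappa$ and $C_d$ are assumptions made here for illustration and are not necessarily the notation of this report.

```latex
% von Mises-Fisher density on the unit sphere S^{d-1} \subset \mathbb{R}^d
% x: unit feature vector, \mu: mean direction (\|\mu\| = 1), \kappa \ge 0: concentration
f(x; \mu, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2}\, I_{d/2 - 1}(\kappa)},
% I_\nu is the modified Bessel function of the first kind.
% For \kappa > 0 and \|x\| = 1 we have \mu^{\top} x \le 1, with equality iff x = \mu,
% so the density is maximized exactly at x = \mu.
```

Under this form, placing $\mu$ at the spherical feature vector of the observed graph puts the mode there for any positive concentration $\kappa$.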
An additional advantage of our approach is that its parameter estimation procedure does not require the cumbersome maximum entropy approaches used with ERGMs.

We start by revisiting the ERGM model and presenting insights on why this model often fails to generate realistic graphs (Section 2). We then propose our alternative approach, the exponential locally spherical random graph model (ELSRGM, Section 3), and evaluate it on both synthetic and realistic graphs while comparing it to ERGMs (Section 4). We conclude the paper (Section 5) with ideas for future work.

2 ERGMs

Overall, we are interested in probabilistic approaches for generating graphs similar to the given one. We consider the case of simple (unweighted, no self-loops) undirected graphs $G = (V, E) \in \mathcal{G}_n$, where $n = |V|$ is the number of vertices and $E \in \{0, 1\}^{n \times n}$ is a symmetric binary adjacency matrix with zeros on its diagonal, with $e_{ij} = 1$ iff there is an edge from $v_i$ to $v_j$, and $e_{ij} = 0$ otherwise, where $v_i \in V$ and $e_{ij} \in E$. There are $|\mathcal{G}_n| = 2^{\binom{n}{2}}$ possible labeled graphs with $n$ vertices, a finite but often prohibitively large number even for fairly small $n$. We first consider the well-studied exponential random graph model (ERGM, also known as $p^*$) as a starting point for our approach.

2.1 ERGM Definition

In the area of social network analysis, scientists are often interested in specific features of networks, and some of the state-of-the-art models explicitly use them to define functions of network sub-structures. We will denote the vector of these functions by $f : \mathcal{G}_n \to \mathbb{R}^d$. Among the examples of such features used by social scientists, we have the number of edges, the number of triangles, the number of $k$-stars, etc.:

$f_{edge}(G) = \sum \sum_{1 \le \ldots}$
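The feature counts named in Section 2.1 and the size of the graph space from Section 2 can be illustrated with a short Python sketch. It assumes the usual combinatorial definitions (in particular, that the number of $k$-stars is the sum over vertices of $\binom{\deg(v)}{k}$); the function names `num_labeled_graphs` and `graph_features` are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import comb


def num_labeled_graphs(n):
    """Size of the space of simple undirected labeled graphs: 2^C(n, 2)."""
    return 2 ** comb(n, 2)


def graph_features(adj):
    """Count edges, triangles, two-stars and three-stars.

    `adj` is a symmetric 0/1 adjacency matrix (list of lists) with a zero
    diagonal. The k-star count uses the assumed definition
    sum over vertices of C(degree, k).
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    edges = sum(degrees) // 2  # every edge contributes to two degrees
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if adj[i][j] and adj[j][k] and adj[i][k]
    )
    return {
        "edges": edges,
        "triangles": triangles,
        "two_stars": sum(comb(d, 2) for d in degrees),
        "three_stars": sum(comb(d, 3) for d in degrees),
    }


# Toy graph: a triangle on vertices 0, 1, 2 plus a pendant vertex 3 joined to 2.
adj = [[0] * 4 for _ in range(4)]
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    adj[i][j] = adj[j][i] = 1

print(graph_features(adj))
# {'edges': 4, 'triangles': 1, 'two_stars': 5, 'three_stars': 1}
print(num_labeled_graphs(8))  # 268435456
```

Even for the 8-node graphs used in the small-graph experiments, the space already holds $2^{28}$ labeled graphs, which is why exhaustive enumeration is only feasible for very small $n$. The brute-force triangle loop is cubic in $n$ and is meant for small illustrations only.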