Statistical estimation for optimization problems on graphs

Mikhail Langovoy and Suvrit Sra
Max Planck Institute for Intelligent Systems
72076 Tübingen, Germany
{langovoy, suvrit.sra}@tuebingen.mpg.de

Abstract

Large graphs abound in machine learning, data mining, and several related areas. A useful step towards analyzing such graphs is that of obtaining certain summary statistics, e.g., the expected length of a shortest path between two nodes, or the expected weight of a minimum spanning tree of the graph. These statistics provide insight into the structure of a graph, and they can help predict global properties of a graph. Motivated thus, we propose to study statistical properties of structured subgraphs (of a given graph), in particular, to estimate the expected objective function value of a combinatorial optimization problem over these subgraphs. The general task is very difficult, if not unsolvable; so for concreteness we describe a more specific statistical estimation problem based on spanning trees. We hope that our position paper encourages others to also study other types of graphical structures for which one can prove nontrivial statistical estimates.

1 Introduction

A cornucopia of applications in machine learning and related areas involve large-scale graphs. Towards analyzing such graphs, a basic step is that of obtaining certain summary statistics. For example, we might want to know the expected length of a shortest path between two nodes, or the expected weight of an associated minimum spanning tree. Such statistics provide insight into the global structure of a graph, and estimating them helps predict properties of the entire graph without having to actually look at the whole graph, a very practical scenario.

Our considerations stem from a classic paper of Frieze [1985], who studied the expected value of the weight of a minimum spanning tree (MST) of a complete graph on $n$ nodes, with edge weights distributed according to a common distribution function. For such graphs, Frieze obtained an explicit value for the expected weight of an MST as $n$ tends to infinity. In subsequent years, his analysis has been refined and extended to cover more general graphs, and under different assumptions; see [Steele, 2002] and the references therein. This precedent suggests that with increasing sizes, one can estimate statistical properties of various combinatorial structures on graphs. This statement brings us to the key challenge of this paper.

Problem 1 (Statistics on graphical structures). Let $G_n = (V_n, E_n)$ be a graph with $n$ vertices, and let $\mathcal{G}$ be a collection of certain "structured" subgraphs of $G_n$. Let $\varphi : \mathcal{G} \to \mathbb{R}_+$ be a function that measures the "cost" of a subgraph in $\mathcal{G}$. As $n$ tends to infinity, what can we say about the expected value $\mathbb{E}[\min_{g \in \mathcal{G}} \varphi(g)]$ of the minimum cost structure, and under what restrictions on the structures $\mathcal{G}$ and on the cost function $\varphi$?

Our current paper is a position paper that advances Problem 1 as a key research question worthy of careful investigation. Admittedly, in general this problem is very difficult; perhaps too broad to be useful. But our ultimate aim is less to tackle the general problem and more to identify special classes of structures and cost functions, for which we can make nontrivial statistical statements. We hope that this workshop paper stimulates discussion and also encourages others to study this problem.

2 Formulation

Let us now move on to a somewhat more formal treatment of Problem 1. Let $G_n = (V_n, E_n)$ be a graph with $n$ vertices and $|E_n|$ edges. For any vertex $v \in V_n$ we observe an integer-valued random variable $X_v : \Omega \to \{1, 2, \ldots, k\}$ (on an appropriate probability space $\Omega$). Here $k$ is assumed to be finite, but otherwise unknown. We call the collection $\{X_v \mid v \in V_n\}$ a random coloring of the graph $G_n$.$^1$ Assume that $\{X_v \mid v \in V_n\}$ form a collection of completely independent random variables and that they are identically distributed according to a distribution function $F_X$. Further, suppose that for any pair of vertices $v_i, v_j \in V_n$ we observe a real-valued random variable $Y_{v_i, v_j} : \Omega \to \mathbb{R}_+$, where $\{Y_{v_i, v_j} \mid v_i, v_j \in V_n\}$ form a collection of completely independent random variables that are identically distributed according to a distribution function $F$. It is also assumed for now that the $X$'s and $Y$'s are completely independent of each other. We will add extra assumptions on $F$ and $F_X$ below, when necessary.

Footnote 1: We can also consider infinite colorings based on $X_v : \Omega \to \mathbb{N}$; but for simplicity we study only finite ones.

Remark: The above assumption on edge weights corresponds to the case when $G_n$ is a complete graph on $n$ vertices. To study general graphs on $n$ vertices, one has to consider only the reduced collection of random variables $\{Y_{v_i, v_j} \mid (v_i, v_j) \in E_n\}$.

Denote by $\mathcal{G} = \{G \mid G \subseteq G_n\}$ a collection of (structured) subgraphs of $G_n$. Fix a cost function $\varphi : \mathcal{G} \to \mathbb{R}_+$. Ultimately, we will be interested in the case when $\varphi$ is a set function (over the sets of vertices or edges participating in the subgraphs characterized by $\mathcal{G}$). Often, a more convenient and specific form of $\varphi$ might be assumed, namely that for any $G = (V, E) \subseteq G_n$ there exists a decomposition
\[
\varphi(G) = \varphi(\{(X_v, Y_e) \mid v \in V, e \in E\}) = \varphi_1(\{Y_e \mid e \in E\}) + \varphi_2(\{X_v \mid v \in V\}), \tag{1}
\]
where $\varphi_1$ is a cost function that depends only on the edges, and $\varphi_2$ depends only on the vertices.

For concreteness, we now focus on the following statistical goal: estimate the average cost (per vertex) of a minimal cost spanning tree of $G_n$, henceforth $\varphi$-MST.$^2$ We want a computationally efficient procedure for this estimate, and the estimator itself must be consistent. Additionally, we also care about the corresponding rates of convergence.

Footnote 2: A similar discussion also applies to other problems such as shortest paths, cuts, etc.

Since the original graph $G_n$ may be very large and a direct computation of the (minimal cost) spanning tree may be infeasible, we suggest computing an estimator based on a suitably constructed auxiliary graph that is much smaller than $G_n$, but exhibits similar statistical properties. To this end, we propose the following generic method.

Method

1. On the basis of the coloring $\{X_v \mid v \in V_n\}$, construct a suitable (possibly problem-dependent) $\sqrt{n}$-consistent estimate $\hat{F}_X$ of the distribution function $F_X$.
2. Using the collection $\{Y_e \mid e \in E_n\}$, construct a suitable $\sqrt{|E_n|}$-consistent estimate $\hat{F}$ of the edge weight distribution function $F$. Note that when $G_n$ is a complete graph, already the standard empirical distribution function gives an $n$-consistent estimate.
3. Generate an auxiliary graph $G'_{d(n)} = (V'_{d(n)}, E'_{d(n)})$, having $d(n)$ vertices, where $d(n)$ is suitably chosen and satisfies the growth conditions
\[
\lim_{n \to \infty} d(n) = \infty, \qquad \lim_{n \to \infty} \frac{d(n)}{n} = 0. \tag{2}
\]
4. Simulate i.i.d. random variables $\{X'_v \mid v \in V'_{d(n)}\}$ and $\{Y'_e \mid e \in E'_{d(n)}\}$ from the distribution function estimates constructed at Steps 1 and 2, respectively. (Remark: for a complete graph $G_n$, we generate $G'_{d(n)}$ to be complete as well.)
5. Find the minimum $\varphi$-cost spanning tree $\mathrm{ST}(G'_{d(n)})$; compute $\varphi(\mathrm{ST}(G'_{d(n)}))$. (Remark: This step requires solution of a potentially hard discrete optimization problem.)
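
The steps of the method above can be sketched in code. The following is a minimal illustration, not the authors' implementation, under several simplifying assumptions: the graph is complete, vertex colors are ignored (the vertex cost is taken to be zero), the cost is a nondecreasing function `phi1` of the total edge weight (so that an ordinary minimum spanning tree also minimizes the cost), and plain resampling from the observed edge weights stands in for the estimate of $F$ in Step 2. The names `mst_weight`, `avcost_estimate`, and `phi1` are hypothetical, and the sketch is not claimed to satisfy any consistency requirement.

```python
import random


def mst_weight(weights):
    """Total edge weight of a minimum spanning tree of a complete graph.

    `weights` is a symmetric d x d matrix (list of lists); the diagonal is
    ignored. Uses Prim's algorithm in O(d^2) time.
    """
    d = len(weights)
    if d == 0:
        return 0.0
    in_tree = [False] * d
    best = [float("inf")] * d  # cheapest known edge linking each vertex to the tree
    best[0] = 0.0              # start the tree at vertex 0 at no cost
    total = 0.0
    for _ in range(d):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((v for v in range(d) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        # Relax the attachment costs of the remaining vertices.
        for v in range(d):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
    return total


def avcost_estimate(observed_weights, d, phi1=lambda s: s, seed=0):
    """Per-vertex cost of a simulated auxiliary complete graph on d vertices.

    Assumption (illustrative only): edge weights of the auxiliary graph are
    resampled with replacement from `observed_weights`, i.e. drawn from the
    empirical distribution of the observed edge weights.
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    rng = random.Random(seed)
    # Steps 3-4: simulate a complete auxiliary graph with i.i.d. edge weights.
    w = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            w[i][j] = w[j][i] = rng.choice(observed_weights)
    # Step 5: since phi1 is assumed nondecreasing in the total edge weight,
    # an ordinary MST is also a minimum phi1-cost spanning tree.
    tree_cost = phi1(mst_weight(w))
    return tree_cost / d
```

For instance, with `observed` holding the observed edge weights of $G_n$ and `d` playing the role of $d(n)$, the call `avcost_estimate(observed, d)` returns the cost of the simulated tree divided by `d`. Any other minimum-cost spanning tree routine could replace `mst_weight` when the cost function is not a monotone function of the sum.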

Based on the above generic method, we introduce the estimate:
\[
\widehat{\mathrm{Avcost}}(G_n) := \frac{\varphi(\mathrm{ST}(G'_{d(n)}))}{d(n)}. \tag{3}
\]
Processing the reduced graph $G'_{d(n)}$ is obviously much faster than processing $G_n$ itself. But we need to theoretically characterize to what extent it is acceptable to process $G'_{d(n)}$. To that end, we attempt to investigate the following main questions:

• When is the above method consistent;
• What can be the rate of convergence of the estimator (3); and
• What is the computational complexity of the new method.

We show that the above estimation procedure has a highly nontrivial behavior. Statistical analysis remains nonetheless possible, but requires delicate results from discrete probability as well as novel statistical methods. We provide below theoretical justification of our approach for some basic cases.

3 Applications to special cases

Consider the case when $\varphi_2 \equiv 0$, and the spanning tree weight depends only on edge weights.$^3$ As before, let $\mathcal{G} = \{G \mid G \subseteq G_n\}$ be the chosen collection of (structured) subgraphs of $G_n$. Next, assume that for an arbitrary member $G = (V, E) \in \mathcal{G}$ the edge-cost set function $\varphi_1$, defined in (1), satisfies additionally
\[
\varphi_1(\{Y_e \mid e \in E\}) = \varphi_1\Big(\sum_{e \in E} Y_e\Big), \tag{4}
\]
and that $\varphi_1$ is continuous and nondecreasing. This includes for example the important class of submodular functions that can be expressed as nondecreasing concave functions of sums (see e.g., [Stobbe and Krause, 2010, Goel et al., 2010]). Based on the assumptions (1) and (4), we can prove the following.

Proposition 2. Let $F$ be a distribution function that is continuously differentiable at 0, having $F(0) = 0$ and $F'(0) > 0$. Suppose that $F$ has finite mean and variance. Assume that the cost function $\varphi$ satisfies (1) and (4) with $\varphi_2 \equiv 0$. Then for a minimum spanning tree of the complete graph $G_n$ it holds that
\[
\lim_{n \to \infty} \mathbb{E}_F\, \varphi(\mathrm{ST}(G_n)) = \varphi_1(\zeta(3)/F'(0)), \tag{5}
\]
where $\zeta$ is the Riemann zeta function. Moreover, for any $\varepsilon > 0$,
\[
\lim_{n \to \infty} \Pr\big(|\varphi(\mathrm{ST}(G_n)) - \varphi_1(\zeta(3)/F'(0))| > \varepsilon\big) = 0. \tag{6}
\]

The proof uses results from [Frieze, 1985] and [Steele, 2002]. Using this proposition, we will prove a consistency theorem for our estimator (3) for the case of complete graphs$^3$ and a wide class of edge-dependent weight functions. First, we need to introduce a special class of estimators.

Footnote 3: We alert the reader to the fact that analysis of just the expected weight of an ordinary (linear) MST for general graphs is a difficult problem [Steele, 2002, Frieze et al., 2000].

Definition 1 (Boundary respecting estimators). As above, we assume that $F$ is a distribution function that is continuously differentiable at 0, having $F(0) = 0$ and $F'(0) > 0$. Let $\mathcal{F}$ be some class of real-valued distribution functions that contains $F$. Suppose that we have a sequence of functions $\{\Psi_F^{(n)}\}_{n \ge 1}$ such that for each $n$ it holds that $\Psi_F^{(n)} = (\hat{F}^{(n)}, \psi_0^{(n)})$, where $\hat{F}^{(n)}$ maps $\mathbb{R}^n \to \mathcal{F}$, and $\hat{F}^{(n)}$ is differentiable at 0 with the derivative $\psi_0^{(n)}$. Here $\psi_0^{(n)}(X_1, X_2, \ldots, X_n)$ is a real-valued random variable itself. Assume that there exists a real sequence $\{r_n\}$ such that for any i.i.d. sample $X_1, X_2, \ldots, X_n$ generated from a distribution $F \in \mathcal{F}$, and any $\varepsilon > 0$, there exists a constant $C(\varepsilon, F) > 0$ such that
\[
\Pr\big(|\psi_0^{(n)}(X_1, X_2, \ldots, X_n) - F'(0)| > \varepsilon\big) \le \frac{C(\varepsilon, F)}{r_n}. \tag{7}
\]
If the sequence $r_n$ satisfies
\[
\lim_{n \to \infty} r_n = \infty, \tag{8}
\]
we say that the estimator $\{\Psi_F^{(n)}\}_{n \ge 1}$ respects the boundary of distribution $F$ from the class $\mathcal{F}$.
In case the constant $C(\varepsilon, F)$ above can be chosen independently of $F \in \mathcal{F}$, we say that the estimator $\{\Psi_F^{(n)}\}_{n \ge 1}$ respects the boundary uniformly for distributions from the class $\mathcal{F}$.

Such estimators actually exist; see [Balabdaoui, 2007] or [Alberts and Karunamuni, 2003] for examples. It is necessary to remark here that the statistical question of constructing estimates that are consistent at boundary points can be tricky, and is certainly a nonstandard task. Blind use of standard methods can lead to incorrect results: many of the well-established estimation methods are consistent in integral norms such as $L_1$- or $L_2$-norms, or within the interior of the parameter spaces. Behavior of estimators at boundary points is substantially less studied, and the estimators that behave well at the boundary are usually not governed by conventional statistical results.

As an illustration, we note that in the setup of Definition 1, the usual kernel density estimator gives a biased estimate of $F'(0)$, even if one substantially restricts the space $\mathcal{F}$. Instead, Alberts and Karunamuni [2003] propose a modified kernel density estimator that has a correction for the bias on the boundary. On the other hand, it is important to observe that Definition 1 only requires that $F$ and $F'$ are consistently estimated at the single boundary point 0; at other points $\hat{F}^{(n)}$ may even be inconsistent! This leaves a lot of opportunities for nonstandard constructions of estimators. Surprisingly enough, even inconsistent estimators are useful in our problem, as long as they respect the boundary.

Theorem 1 (Consistency). Let $G_n$ be a complete graph on $n$ vertices with random edge weights and let the cost function $\varphi$ satisfy (1) and (4) with $\varphi_2 \equiv 0$. Consider the problem of estimating the expected per vertex cost of an MST (using cost function $\varphi$) of $G_n$. Generate a complete auxiliary graph $G'_{d(n)}$ on $d(n)$ vertices, via sampling the new edge weights $\{Y'_e \mid e \in E'_{d(n)}\}$ from the distribution function $\hat{F}^{(n(n-1)/2)}(\{Y_e \mid e \in E_n\})$. Suppose that $\{\Psi_F^{(n)}\}_{n \ge 1}$ respects the boundary for $F$.

1) Then, $\widehat{\mathrm{Avcost}}(G_n)$ is a consistent estimate, in the sense that for any $\varepsilon > 0$
\[
\lim_{n \to \infty} \Pr\Big(\Big|\widehat{\mathrm{Avcost}}(G_n) - \frac{1}{n}\varphi_1(\mathrm{ST}(G_n))\Big| > \varepsilon\Big) = 0. \tag{9}
\]
2) Much more than that, our auxiliary sample allows estimating the weight of the MST itself consistently in probability, i.e., for any $\varepsilon > 0$
\[
\lim_{n \to \infty} \Pr\Big(\big|d(n) \cdot \widehat{\mathrm{Avcost}}(G_n) - \varphi_1(\mathrm{ST}(G_n))\big| > \varepsilon\Big) = 0. \tag{10}
\]

The meaning of this theorem is that, for example, in the case of random complete graphs, one can consistently estimate some of their important characteristics by using just a small (but properly constructed) model of the initial large graph. In the particular case of spanning trees, one can have the number of vertices $d(n)$ grow to infinity arbitrarily slowly, but still obtain asymptotically consistent estimates. This observation could be of much help in problems that require optimization on huge networks that would be practically intractable to treat as a whole.

4 Related work and open problems

In this section we first summarize some related work, and then discuss a list of open problems and challenges arising from this paper.

4.1 Related work

Random graph theory is a mature subject (see [Bollobás, 2001]); but our interest is more specific. In particular, we draw upon work on estimating weights of (ordinary) MSTs dating back to [Frieze, 1985]. For a good summary and additional references we refer to the paper of Steele [2002].

Bertsimas [1990] studies a closely related but very different formulation, wherein he assumes that nodes may be present (or absent) with a certain probability.
Based on this model, he studies what the expected weight of an MST might be. In contrast, we assume that the edge weights are random (according to a specific law), and we study the expected value under a cost function strictly more general than the ordinary linear cost used for MSTs. Also note that in our framework one tends to build auxiliary graphs on $d(n) \ll n$ vertices, so our method is intended to work with graphs that are incomparably smaller than the original graph, while Bertsimas [1990] studies graphs on $O(n)$ nodes.

To make our method practical, we depend on the availability of an algorithm to solve the $\varphi$-MST problem on the auxiliary graph. For appropriate choices of the cost function $\varphi$, recent algorithms such as those of [Stobbe and Krause, 2010] or [Jegelka and Bilmes, 2011b,a] might offer practical methods for tackling the subproblem on the auxiliary graphs. Additionally, there is a well-developed body of work on submodular optimization that we could tap into; see for instance [Chudak and Nagano, 2007, Fujishige, 2005, Iwata and Nagano, 2009]. We note, however, that submodular set functions offer only one class of possible cost functions; if algorithms (or approximation algorithms) are available for other types of cost functions, we could benefit from those too, e.g., those in [Murota, 2003].

4.2 Open Problems

Since this is a position paper that also advances a new set of research problems, there are numerous aspects that remain to be studied. We highlight some of the important questions below.

An important open problem is to determine the types of deterministic or random graphs for which we can ensure consistency of the estimator from Theorem 1. There are fine probabilistic results on MSTs for several classes of random graphs, both asymptotic and finite sample (see [Steele, 2002, Frieze et al., 2000] and references therein).
Most notably, a lot is known about MSTs of cubes, and some other "regular" graphs. And, as some reflection shows, for such graphs, it is easy to check whether the estimator from Theorem 1 is consistent or not. But more generally, even if there is no hope to get closed form probabilistic results about the weight of the $\varphi$-MSTs, the proposed estimator may be expected to be consistent in many more interesting cases.

As shown in [Steele, 2002], the expected weight of a (linear) MST of an arbitrary connected graph $G$ can be represented as an integral of a function that depends on the Tutte polynomial of $G$. This observation leads us to conjecture that the expected weight of the $\varphi$-MST for submodular $\varphi$ might be representable as a Choquet integral involving Tutte polynomials. If this is the case, our estimators will also be randomized approximations of certain Choquet integrals, a curious byproduct.

As usual, it would be valuable to study rates of convergence of our estimators, as well as some basic properties such as asymptotic variance. The fact that these estimators can be consistent even when they are based on a "small" graph (with $d(n) \ll n$ vertices) is promising, since it provides theoretical grounds for replacing processing on giant networks by processing suitably constructed, smaller networks. Results on variance and rates of convergence of the estimators will contribute towards judging the actual accuracy of such replacements.

Since we expect our estimation to work on large graphs, it is crucial that we be able to minimize the cost function $\varphi$ efficiently, at least on the auxiliary graph $G'_{d(n)}$. This raises the cornerstone question: for which types of cost functions $\varphi$ (submodular, monotone, etc.) does there exist an efficient optimization method for finding (at least approximately) the desired minimum cost structure (spanning tree, path, etc.) that simultaneously also respects our statistical estimation procedure? The present short paper suggests that this class of cost functions is rich (at least infinite-dimensional).

Finally, we close by mentioning that even though we illustrated only spanning trees, the same argument extends to obtaining estimators for any other graphical structures such as paths, cuts, etc., as long as suitable estimators are available for corresponding linear cost functions. More challengingly, we wish to consider deriving conditions on $\varphi_1$ and $\varphi_2$ in the decomposition (1), under which one obtains consistent estimators.

References

T. Alberts and R. J. Karunamuni. A semiparametric method of boundary correction for kernel density estimation. Statistics and Probability Letters, 61(3):287–298, 2003.
F. Balabdaoui. Consistent estimation of a convex density at the origin. Mathematical Methods of Statistics, 16:77–95, 2007. ISSN 1066-5307.
D. J. Bertsimas. The probabilistic minimum spanning tree problem. Networks, 20(3):245–275, 1990.
B. Bollobás. Random Graphs. Cambridge University Press, 2001.
F. A. Chudak and K. Nagano. Efficient solutions to relaxations of combinatorial problems with submodular penalties via the Lovász extension and nonsmooth convex optimization. In SODA, 2007.
A. M. Frieze. On the value of a random minimum spanning tree problem. Discrete Applied Mathematics, 10:47–56, 1985.
A. M. Frieze, M. Ruszinkó, and L. Thoma. A note on random minimum length spanning trees. Electronic Journal of Combinatorics, 2000.
S. Fujishige. Submodular functions and optimization, volume 58 of Annals of Discrete Mathematics. Elsevier Science, 2005.
G. Goel, P. Tripathi, and L. Wang. Optimal Approximation Algorithms for Multi-agent Combinatorial Problems with Discounted Price Functions. In Foundations of Software Technology and Theoretical Computer Science, 2010.
S. Iwata and K. Nagano. Submodular function minimization under covering constraints. In FOCS, 2009.
S. Jegelka and J. A. Bilmes. Submodularity beyond submodular energies: coupling edges in graph cuts. In Computer Vision and Pattern Recognition (CVPR), June 2011a.
S. Jegelka and J. A. Bilmes. Approximation bounds for inference using cooperative cuts. In International Conference on Machine Learning (ICML), 2011b.
K. Murota. Discrete Convex Analysis. SIAM, 2003.
J. M. Steele. Minimum spanning trees for graphs with random edge lengths. In Mathematics and Computer Science II: Algorithms, Trees, Combinatorics and Probabilities, Birkhäuser, pages 223–245, 2002.
P. Stobbe and A. Krause. Efficient minimization of decomposable submodular functions. In NIPS, 2010.
