An Optimal Algorithm for Stochastic Vertex Cover

Jan van den Brand∗  Inge Li Gørtz†  Chirag Pabbaraju‡  Debmalya Panigrahi§  Clifford Stein¶  Miltiadis Stouras‖  Ola Svensson∗∗  Ali Vakilian††

Abstract

The goal in the stochastic vertex cover problem is to obtain an approximately minimum vertex cover for a graph $G^\star$ that is realized by sampling each edge independently with some probability $p \in (0, 1]$ in a base graph $G = (V, E)$. The algorithm is given the base graph $G$ and the probability $p$ as inputs, but its only access to the realized graph $G^\star$ is through queries on individual edges in $G$ that reveal the existence (or not) of the queried edge in $G^\star$. In this paper, we resolve the central open question for this problem: to find a $(1+\varepsilon)$-approximate vertex cover using only $O_\varepsilon(n/p)$ edge queries. Prior to our work, there were two incomparable state-of-the-art results for this problem: a $(3/2+\varepsilon)$-approximation using $O_\varepsilon(n/p)$ queries (Derakhshan, Durvasula, and Haghtalab, 2023) and a $(1+\varepsilon)$-approximation using $O_\varepsilon((n/p)\cdot \mathrm{RS}(n))$ queries (Derakhshan, Saneian, and Xun, 2025), where $\mathrm{RS}(n)$ is known to be at least $2^{\Omega(\log n/\log\log n)}$ and could be as large as $n/2^{\Theta(\log^* n)}$. Our improved upper bound of $O_\varepsilon(n/p)$ matches the known lower bound of $\Omega(n/p)$ for any constant-factor approximation algorithm for this problem (Behnezhad, Blum, and Derakhshan, 2022). A key tool in our result is a new concentration bound for the size of the minimum vertex cover of a random graph, which might be of independent interest.

∗Georgia Institute of Technology. Email: vdbrand@gatech.edu
†Technical University of Denmark. Email: inge@dtu.dk
‡Stanford University. Email: cpabbara@stanford.edu
§Duke University. Email: debmalya@cs.duke.edu
¶Columbia University. Email: cliff@ieor.columbia.edu
‖EPFL. Email: miltiadis.stouras@epfl.ch
∗∗EPFL. Email: ola.svensson@epfl.ch
††Virginia Tech. Email: vakilian@vt.edu

1 Introduction

In the stochastic vertex cover problem, we are given a base graph $G = (V, E)$ and a sampling probability $p \in (0, 1]$. The realized graph $G^\star$ is a random graph generated by sampling each edge in $G$ independently with probability $p$. The algorithm does not have access to $G^\star$ directly, but can query individual edges $e \in E$ to learn whether they appear in $G^\star$. The goal of the algorithm is to output a near-optimal vertex cover of $G^\star$ while querying as few edges in $G$ as possible.

The stochastic setting has been widely considered in graph algorithms. Perhaps the most extensive literature exists for the stochastic matching problem, where the goal is to query a sparse subgraph $H$ of the base graph $G$ such that the realized edges in $H$ contain an approximately maximum matching of the realized graph $G^\star$. For this problem, the first result was obtained by Blum et al. [BDH+15], who gave a $1/2$-approximation algorithm using $n \cdot \mathrm{poly}(1/p)$ queries. This result has subsequently been improved in an extensive line of work [AKL16, AKL17, BR18, BFHR19, AB19, BD20, BDH20, DS25], eventually culminating in an (almost tight) result that gives a $(1-\varepsilon)$-approximation using $n \cdot \mathrm{poly}(1/p)$ queries, for any fixed small $\varepsilon > 0$ [ABGR25]. Besides the maximum matching problem, stochastic optimization on graphs has also been considered for other classical problems such as minimum spanning tree and shortest paths [BT91, GV06, Von07, BGPS13], as well as for more general frameworks in combinatorial optimization such as covering and packing problems [DGV08, ANS08, GK11, BGK11, YM18].

In this paper, we consider the stochastic vertex cover problem. This problem was introduced by Behnezhad, Blum, and Derakhshan [BBD22], who gave, for any $\varepsilon > 0$, a $(2+\varepsilon)$-approximation (polynomial-time) algorithm using $O\!\left(\frac{n}{\varepsilon^3 p}\right)$ queries.
They also showed a simple lower bound: $\Omega(n/p)$ queries are necessary for any constant-factor approximation. The latter result is information-theoretic and rules out algorithms with fewer queries, even allowing an arbitrarily large running time. These results raised the question of whether there exist (exponential-time) algorithms that obtain a better-than-2 approximation, while still querying only $O(n/p)$ edges. This question was answered in the affirmative by Derakhshan, Durvasula, and Haghtalab [DDH23], who obtained an approximation factor of $\frac{3}{2}+\varepsilon$, while querying $O\!\left(\frac{n}{\varepsilon p}\right)$ edges. This result, which helped delineate the information complexity of the problem by breaching the polynomial-time solvability barrier, led to the natural question: can we obtain a near-optimal $(1+\varepsilon)$-approximation algorithm using $O_\varepsilon(n/p)$ queries? Interestingly, Behnezhad, Blum, and Derakhshan had previously addressed this question for bipartite graphs: they obtained a $(1+\varepsilon)$-approximation using $O_{\varepsilon,p}(n)$ queries, although the dependence on $1/p$ and $1/\varepsilon$ was triple-exponential [BBD22]. The first algorithm to obtain a $(1+\varepsilon)$-approximation for general graphs was obtained recently by Derakhshan, Saneian, and Xun [DSX25], but their algorithm uses a (super-linear) $O\!\left(\frac{n}{p}\cdot \mathrm{RS}(n)\right)$ queries. Here, RS refers to Ruzsa-Szemerédi graphs, and $\mathrm{RS}(n)$ is the largest $\beta$ such that there exists an $n$-vertex graph whose edges can be partitioned into $\beta$ induced, edge-disjoint matchings of size $\Theta(n)$. The value of $\mathrm{RS}(n)$ is known to be at least $2^{\Omega(\log n/\log\log n)}$ and could be as large as $n/2^{\Theta(\log^* n)}$. Although the number of queries in this last result is super-linear, it applies to a more general setting. Previously, [DDH23] had observed that under a regime permitting "mild" correlation between edge realizations, surpassing the $\frac{3}{2}$ factor requires $\Omega(n \cdot \mathrm{RS}(n))$ queries.
Since the result of [DSX25] applies to this regime as well, their bound is tight in this mildly correlated setting. This left open the question of whether one can obtain a tight bound of $O_\varepsilon(n/p)$ in the original setting of independent edge sampling, or whether this dependence on the parameter $\mathrm{RS}(n)$ was fundamental to the problem even with full independence.

1.1 Our Result

In this paper, we give a $(1+\varepsilon)$-approximation algorithm for the stochastic vertex cover problem using $O_\varepsilon(n/p)$ queries, for any small $\varepsilon > 0$. By the lower bound of [BBD22], our algorithm is optimal, up to the dependence on $\varepsilon$, which is $1/\varepsilon^5$. To the best of our knowledge, the only previously known $(1+\varepsilon)$-approximation with linearly many (in $n$) queries was for bipartite graphs, but a linear dependence on $1/p$ (or polynomial dependence on $1/\varepsilon$) was not known even in this special case. Our result also shows that the dependence on $\mathrm{RS}(n)$ in [DSX25] was an artifact of the correlations between edges, and in this sense, is not fundamental to the stochastic vertex cover problem itself.

Theorem 1.1. For any $\varepsilon \in (0, c)$, where $c > 0$ is a small enough constant, there is a deterministic algorithm for the stochastic vertex cover problem that achieves an approximation factor of $1+\varepsilon$ using $O\!\left(\frac{n}{\varepsilon^5 p}\right)$ edge queries.

Similar to all previous algorithms for stochastic vertex cover, our approximation factor is with respect to the expected size of the minimum vertex cover in the realized graph $G^\star$, and the set of edges queried by our algorithm is non-adaptive, i.e., independent of the realized graph $G^\star$.

1.2 Our Techniques

Description of the algorithm. Our algorithm is simple, and in some sense, canonical among non-adaptive algorithms. Since the set of edges queried by the algorithm is non-adaptive, the output must deterministically include a valid vertex cover on the remaining (non-queried) edges in $G$ to ensure feasibility.
Call this deterministic vertex set $P$. Since $P$ is always part of the output, it is wasteful to query any edge incident to $P$. So, we query the edges that are not incident to $P$, i.e., those in the induced graph $G[V \setminus P]$. These queries reveal the realized graph $G^\star[V \setminus P]$; we compute its minimum vertex cover and add these vertices to $P$ in the output. Finally, to choose $P$ optimally, we solve an optimization problem that minimizes the expected size of the vertex cover output by the above algorithm, under the constraint that the number of edges in $G[V \setminus P]$ is at most $O_\varepsilon(n/p)$, our desired query bound. This optimal choice of $P$ is denoted $\hat{P}$. We call this the Vertex-Cover algorithm and formally describe it in Section 2.3.

An adaptive algorithm as an analysis tool. While our algorithm is simple, its analysis is quite subtle. In the rest of this section, we give an outline of the main ideas in the analysis. Since our algorithm is defined via an optimization problem, it is difficult to directly compare its cost to opt, the expected size of an optimal vertex cover. Instead, we first define a surrogate algorithm that adaptively chooses a set $\mathrm{SEED}(G^\star)$ such that $G[V \setminus \mathrm{SEED}(G^\star)]$ has $O_\varepsilon(n/p)$ edges. Later, we will compare this adaptive strategy to our non-adaptive Vertex-Cover algorithm.

The set $\mathrm{SEED}(G^\star)$ has two parts: a non-adaptive part $\mathrm{SEED}_{\mathrm{NA}}$ and an adaptive part $\mathrm{SEED}_{\mathrm{A}}(G^\star)$. First, we describe the choice of $\mathrm{SEED}_{\mathrm{NA}}$. Let $\mathrm{MVC}(H)$ denote a minimum vertex cover of any graph $H$. We partition the vertices into three groups, $L$, $M$ and $S$, according to their probability of appearing in $\mathrm{MVC}(G^\star)$. Vertices in $L$ have "large" probability (at least $1-2\varepsilon$), those in $M$ have "moderate" probability (between $\varepsilon$ and $1-2\varepsilon$), and those in $S$ have "small" probability (at most $\varepsilon$). Observe that we can safely include all vertices in $L$ to $\mathrm{SEED}_{\mathrm{NA}}$.
That is because $(1-2\varepsilon)\cdot|L| \le \mathrm{opt}$, from which it follows that $\mathbb{E}[|L \setminus \mathrm{MVC}(G^\star)|] \le O(\varepsilon)\cdot\mathrm{opt}$. Next, we can discard all vertices in $S$; these vertices will not appear in $\mathrm{SEED}(G^\star)$ for any $G^\star$. Using the fact that the vertices in $S$ are infrequent in the optimal solution, we show that there are only $O_\varepsilon(n/p)$ edges that lie entirely within $S$ or between $S$ and $M$; therefore, these edges can be queried (notice that the edges between $S$ and $L$ will be covered by vertices in $L$). This allows us to focus on the subgraph $G[M]$ in the remaining discussion.

Deciding the adaptive part of $\mathrm{SEED}(G^\star)$. The remaining vertices, namely the set $M$, appear in $\mathrm{MVC}(G^\star)$ with probabilities ranging from $\varepsilon$ to $1-2\varepsilon$. We cannot afford to add all vertices in $M$ to $\mathrm{SEED}_{\mathrm{NA}}$, but we cannot discard all these vertices either. Instead, we use a greedy algorithm to select vertices in $M$ to add to $\mathrm{SEED}_{\mathrm{NA}}$. A natural strategy would be to choose vertices with high degree in $G$. Indeed, if the degree of a vertex is at most $O_\varepsilon(1/p)$, we can query all its incident edges, and hence, it is redundant to add such a vertex to $\mathrm{SEED}(G^\star)$. But, in general, high-degree vertices in $M$ may only appear in $\mathrm{MVC}(G^\star)$ with probability $\varepsilon$, and as such, they might be numerous compared to opt. Since we cannot afford to add all these vertices to $\mathrm{SEED}_{\mathrm{NA}}$, we ask: which high-degree vertices should we prefer?

The answer to the above question lies at the heart of our analysis. Observe that if a high-degree vertex $v$ is not in $\mathrm{MVC}(G^\star)$, then all the neighbors of $v$ in $G^\star$ must be in $\mathrm{MVC}(G^\star)$.
Using this observation, we define a deterministic procedure (called the Vertex-Seed algorithm) that outputs a sequence of vertices $Q$, where the $i$-th vertex $v_i$ has the following property: with probability at least some constant $\delta$ over the choice of $G^\star$, the vertex $v_i$ is not in $\mathrm{MVC}(G^\star)$ and has a large neighborhood in $G$ among vertices whose status (whether in $\mathrm{MVC}(G^\star)$ or not) has not been "decided" by previous vertices in $Q$. The intuition is that such a vertex reveals a large number of previously undecided vertices to be in $\mathrm{MVC}(G^\star)$, and therefore allows $|Q|$ to be bounded against opt. We add the vertices in $Q$ to $\mathrm{SEED}_{\mathrm{NA}}$, and show that the expected size of $Q \setminus \mathrm{MVC}(G^\star)$ is at most $O(\varepsilon^2)\cdot\mathrm{opt}$ (a bound of $O(\varepsilon)\cdot\mathrm{opt}$ would suffice for now, but later we will need the sharper bound of $O(\varepsilon^2)\cdot\mathrm{opt}$). Furthermore, we add the neighbors of vertices in $Q \setminus \mathrm{MVC}(G^\star)$ to the adaptive set $\mathrm{SEED}_{\mathrm{A}}(G^\star)$; since these vertices must be in $\mathrm{MVC}(G^\star)$, this does not affect the approximation bound. Finally, we consider the vertices $A$ that reveal a large number of previously undecided vertices to be in $\mathrm{MVC}(G^\star)$ for a specific realization $G^\star$, but are not in $Q$ because they do not meet the probability threshold $\delta$ over the different realizations of $G^\star$. We add the vertices in $A$ to the adaptive set $\mathrm{SEED}_{\mathrm{A}}(G^\star)$ as well, and show that the expected size of $A \setminus \mathrm{MVC}(G^\star)$ is also $O(\varepsilon)\cdot\mathrm{opt}$. This completes the description of the set $\mathrm{SEED}(G^\star)$. This last step ensures that the vertices outside $\mathrm{SEED}(G^\star)$ each have only $O_\varepsilon(1/p)$ edges in $G[M]$ that need to be queried.

Using $\mathrm{SEED}(G^\star)$ to analyze our algorithm.
So far, we have described the adaptive set $\mathrm{SEED}(G^\star)$, and outlined the intuition for two facts: (1) that $\mathrm{SEED}(G^\star)$ contains at most $(1+O(\varepsilon))\cdot\mathrm{opt}$ vertices in expectation, and (2) that there are at most $O_\varepsilon(n/p)$ edges in $G$ that are not covered by $\mathrm{SEED}(G^\star)$. We remark that although $\mathrm{SEED}(G^\star)$ satisfies the conditions of the optimization problem in the Vertex-Cover algorithm, it does not immediately give an adaptive algorithm with $O_\varepsilon(n/p)$ queries. This is because the computation of the set $\mathrm{SEED}(G^\star)$ can require more than $O_\varepsilon(n/p)$ queries. So, the reader should view the definition of $\mathrm{SEED}(G^\star)$ strictly as an analysis tool, and not as an alternative adaptive algorithm. Intuitively, we want to compare $\mathrm{SEED}(G^\star)$ to $\hat{P}$, the set of vertices chosen non-adaptively in the Vertex-Cover algorithm. If $\mathrm{SEED}(G^\star)$ were non-adaptive, this comparison would be immediate, since it would be a valid choice of $P$ in the optimization problem defining the Vertex-Cover algorithm. But, in general, $\mathrm{SEED}(G^\star)$ can vary based on the realization of $G^\star$, and therefore, the expected size of $\mathrm{SEED}(G^\star)$ might be smaller than $|\hat{P}|$.

Dealing with the adaptivity of $\mathrm{SEED}(G^\star)$. Note that $\mathrm{SEED}(G^\star)$ contains two parts: a non-adaptive set $\mathrm{SEED}_{\mathrm{NA}}$ and an adaptive set $\mathrm{SEED}_{\mathrm{A}}(G^\star)$. The set $\mathrm{SEED}_{\mathrm{A}}(G^\star)$ only depends on two random quantities: the identity of the set $Q \cap \mathrm{MVC}(G^\star)$ and the realizations of the edges incident to vertices in $Q \setminus \mathrm{MVC}(G^\star)$. Importantly, our choice to include $Q$ in $\mathrm{SEED}_{\mathrm{NA}}$ makes $\mathrm{SEED}(G^\star)$'s extension to a valid vertex cover (i.e., $\mathrm{MVC}(G^\star[V \setminus \mathrm{SEED}(G^\star)])$) independent of the realizations of the edges incident to $Q \cap \mathrm{MVC}(G^\star)$. This allows us to fix the realization of these edges, and analyze $\mathrm{SEED}(G^\star)$ over the remaining randomness in $G^\star$. This limits the adaptivity of $\mathrm{SEED}(G^\star)$, as it now depends only on the set $Q \cap \mathrm{MVC}(G^\star)$, which can take at most $2^{|Q|} = 2^{O(\varepsilon^2)\cdot\mathrm{opt}}$ values.
In addition, we show that the size of $\mathrm{SEED}(G^\star)$'s extension, namely $|\mathrm{MVC}(G^\star[V \setminus \mathrm{SEED}(G^\star)])|$, sharply concentrates. We do so by proving a general theorem on the concentration of $|\mathrm{MVC}(G^\star)|$ for a randomly generated graph $G^\star \sim G_p$:

Theorem 1.2. Let $Z = |\mathrm{MVC}(G^\star)|$ and $\mathrm{opt} = \mathbb{E}_{G^\star \sim G_p}[|\mathrm{MVC}(G^\star)|]$. Then for any $t \ge 0$,
$$\Pr[|Z - \mathrm{opt}| \ge t] \le 2\exp\!\left(-\frac{t^2}{4C \cdot \mathrm{opt} + 2t/3}\right), \tag{1}$$
where $C < 8$ is a constant.

Note that this theorem is not specific to our construction and establishes a general concentration result for minimum vertex cover. The tail bound proven is much sharper than what one can obtain from standard techniques (e.g., vertex exposure martingales), and we believe it might be of independent combinatorial interest.¹

Finally, we use a union bound over the $2^{O(\varepsilon^2)\cdot\mathrm{opt}}$ different realizations of $Q \cap \mathrm{MVC}(G^\star)$, for each of which the tail bound on the size of $\mathrm{MVC}(G^\star[V \setminus \mathrm{SEED}_{\mathrm{NA}}])$ applies. Using this, we can now claim that the advantage of adaptivity in defining $\mathrm{SEED}(G^\star)$ is negligible. Formally, we show that, for any fixed realization of the edges incident to $Q$, the set $\mathrm{SEED}(G^\star)$ can take at most $2^{O(\varepsilon)\cdot\mathrm{opt}}$ different forms over the graphs $G^\star$ that are consistent with the fixed realization of the edges incident to $Q$. Out of those forms, the one that minimizes the expected size of $\mathrm{SEED}(G^\star) \cup \mathrm{MVC}(G^\star[V \setminus \mathrm{SEED}(G^\star)])$ is at most a $(1+O(\varepsilon))$ factor worse than adaptively selecting the best of these forms for each graph $G^\star$. Since the latter holds for any realization of the edges incident to $Q$, averaging over their randomness gives us that there exists a deterministic set $P$ that produces a solution of expected size $(1+O(\varepsilon))\cdot\mathrm{opt}$. This, in turn, establishes that our algorithm, which optimizes over all non-adaptive choices of $P$, is a $(1+O(\varepsilon))$-approximation algorithm.
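The concentration claimed in Theorem 1.2 can be sanity-checked numerically. The sketch below is illustrative only (not part of the paper's argument): it samples many realizations $G^\star \sim G_p$ of a small base graph, computes each minimum vertex cover by brute force (feasible only for tiny $n$), and records how far $Z = |\mathrm{MVC}(G^\star)|$ strays from its empirical mean.

```python
import random
from itertools import combinations

def min_vertex_cover_size(n, edges):
    """Size of a minimum vertex cover (brute force; feasible only for tiny n)."""
    for k in range(n + 1):
        for cover in combinations(range(n), k):
            s = set(cover)
            if all(u in s or v in s for u, v in edges):
                return k
    return n

random.seed(0)
n, p, trials = 8, 0.5, 500
base_edges = [(u, v) for u in range(n) for v in range(u + 1, n)]  # base graph K_8

# Sample G* ~ G_p repeatedly and record Z = |MVC(G*)|.
sizes = [min_vertex_cover_size(n, [e for e in base_edges if random.random() < p])
         for _ in range(trials)]
opt = sum(sizes) / trials
max_dev = max(abs(z - opt) for z in sizes)
print(f"opt ~ {opt:.2f}, largest |Z - opt| over {trials} samples: {max_dev:.2f}")
```

On such an instance the observed deviations stay within a few units of the mean, consistent with the exponential tail in (1); the brute-force cover computation is what limits the demo to very small graphs.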
¹A strengthened version of Talagrand's inequality for $c$-Lipschitz, $s$-certifiable functions also yields a slightly weaker concentration bound, which still suffices to derive our results; see Remark 4.5.

Roadmap. We formally define the stochastic vertex cover problem and give our Vertex-Cover algorithm in Section 2. The analysis of this algorithm, assuming Theorem 1.2 (the concentration result for minimum vertex cover), is given in Section 3. Finally, the proof of Theorem 1.2 is given in Section 4.

2 The Vertex-Cover Algorithm

In this section, we first formally define the stochastic vertex cover problem, and establish notation that we will use throughout the paper. Then, we give a formal description of our Vertex-Cover algorithm.

2.1 Notation and Terminology

Throughout, let $G = (V, E)$ denote the base graph, with $n = |V|$. Fix an edge-realization parameter $p \in (0, 1]$; each edge $e \in E$ is realized independently with probability $p$. All results extend to the heterogeneous model with edge-wise probabilities $(p_e)_{e \in E}$ by replacing $p$ in the statements with $p := \min_{e \in E} p_e$. For clarity of exposition, throughout the paper we present the homogeneous case $p_e = p$ for all $e \in E$. Let $G^\star$ be the random realized subgraph obtained by including each $e \in E$ independently with probability $p$; we write $G^\star \sim G_p$. For a set of edges $F \subseteq E$, we use $G^\star \setminus F^\star$ to denote the distribution obtained by realizing all edges in $E \setminus F$ as above while leaving the edges in $F$ unresolved (to be realized later). An edge query reveals whether a particular $e \in E$ is present in $G^\star$. For a graph $H = (V, E)$ and $v \in V$, let $N_H(v)$ be the set of neighbors of $v$ in $H$; for $S \subseteq V$, let $N_H(S) = \{u \in V \setminus S : \exists s \in S \text{ with } (u, s) \in E\}$. Let $\mathrm{MVC}(\cdot)$ be the mapping that assigns to every realized graph $G^\star$ an arbitrary but fixed minimum vertex cover $\mathrm{MVC}(G^\star) \subseteq V$.
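In code, this sampling-and-query model can be sketched as follows (class and method names are illustrative, not from the paper): the realization is drawn once up front, and the algorithm may learn it only through edge queries.

```python
import random

class StochasticGraph:
    """Base graph G = (V, E); each edge appears in G* independently w.p. p."""
    def __init__(self, n, edges, p, seed=None):
        rng = random.Random(seed)
        self.n, self.edges, self.p = n, list(edges), p
        # The realization G* is fixed up front but hidden from the algorithm:
        self._realized = {e for e in self.edges if rng.random() < p}
        self.queries = 0

    def query(self, e):
        """Edge query: reveal whether edge e is present in G*."""
        self.queries += 1
        return e in self._realized

G = StochasticGraph(4, [(0, 1), (1, 2), (2, 3)], p=0.5, seed=1)
realized = [e for e in G.edges if G.query(e)]  # queries every edge of G
print(realized, G.queries)                     # G.queries == 3
```

A query-efficient algorithm, of course, must avoid querying all of $E$; the counter above is what the bounds in this paper control.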
Let $\mathrm{MVC}(G^\star)$ denote the resulting (random) minimum vertex cover of the realized graph, and define $\mathrm{opt} = \mathbb{E}_{G^\star \sim G_p}[|\mathrm{MVC}(G^\star)|]$. For each $v \in V$, set $c_v = \Pr[v \in \mathrm{MVC}(G^\star)]$, so that $\sum_{v \in V} c_v = \mathrm{opt}$. For an edge $e = (u, v) \in E$, define the probability that edge $e$ is covered as $c_e = \Pr[u \in \mathrm{MVC}(G^\star) \text{ or } v \in \mathrm{MVC}(G^\star)]$.

2.2 The Stochastic Vertex Cover Problem

Given a base graph $G = (V, E)$ and a probability parameter $p$, the goal in the stochastic vertex cover problem is to output a feasible vertex cover of the realized graph $G^\star$, in which each edge of $G$ is realized independently with probability $p$, while querying as few edges in $G$ as possible. The algorithm is allowed unlimited access to the base graph $G$ as well as unlimited computation time. For an $\alpha > 0$, we say a (randomized) solution $C$ is an $\alpha$-approximate stochastic vertex cover if any edge in $G^\star$ has at least one of its endpoints in $C$, and $\mathbb{E}[|C|] \le \alpha \cdot \mathbb{E}[|\mathrm{MVC}(G^\star)|]$. In this paper, we give a non-adaptive $(1+\varepsilon)$-approximation algorithm for the stochastic vertex cover problem, i.e., it queries a fixed set of edges chosen in advance, independent of all query outcomes.

2.3 Description of the Vertex-Cover algorithm

First, we choose a subset $P \subseteq V$ minimizing $|P| + \mathbb{E}[|\mathrm{MVC}(G^\star[V \setminus P])|]$ under the constraint that the induced subgraph $G[V \setminus P]$ contains at most $O(n/(\varepsilon^5 p))$ edges. Let that set be $\hat{P}$. We
then query all edges in the induced subgraph $G[V \setminus \hat{P}]$ and compute the minimum vertex cover $H$ of the now known, realized graph $G^\star[V \setminus \hat{P}]$. Finally, we return the set $\hat{P} \cup H$ as our vertex cover. Recall, as in previous papers, that we are not concerned with computational efficiency, but only with the query complexity. The pseudocode for the algorithm is as follows:

Algorithm: Vertex-Cover
1. Let $\hat{P}$ be the optimal solution to
$$\min_{P \subseteq V} \; |P| + \mathbb{E}\!\left[\left|\mathrm{MVC}(G^\star[V \setminus P])\right|\right] \quad \text{s.t.} \quad G[V \setminus P] \text{ has at most } \frac{2 \cdot 10^3\, n}{\varepsilon^5 p} \text{ edges.} \tag{2}$$
2. Query the edges in $G[V \setminus \hat{P}]$ to get its realization $G^\star[V \setminus \hat{P}]$.
3. Let $H = \mathrm{MVC}(G^\star[V \setminus \hat{P}])$.
4. Return $\hat{P} \cup H$.

It is immediate that this algorithm correctly produces a valid vertex cover:

Claim 2.1. The output of the Vertex-Cover algorithm is a vertex cover for $G^\star$.

Proof. Each edge of $G^\star$ either has an endpoint in $\hat{P}$, or is an edge in the induced subgraph $G^\star[V \setminus \hat{P}]$ and is therefore covered by $H$. □

3 Analysis of the Vertex-Cover Algorithm

In this section we analyze the Vertex-Cover algorithm using our surrogate algorithm. In the first two subsections we describe the Vertex-Seed algorithm and bound the size of the set $Q$ chosen by that algorithm. Next we describe how to choose the adaptive set $\mathrm{SEED}(G^\star)$ and prove that it contains at most $(1+O(\varepsilon))\cdot\mathrm{opt}$ vertices in expectation. Finally, we prove that there are at most $O_\varepsilon(n/p)$ edges in $G$ that are not covered by $\mathrm{SEED}(G^\star)$, and analyze the performance of the Vertex-Cover algorithm.

Throughout the analysis, we assume that $\mathrm{opt} \ge c \cdot \log(1/\varepsilon)/\varepsilon^2$ for some constant $c$. In Appendix A we show that this assumption is without loss of generality: if $\mathrm{opt} = O(\log(1/\varepsilon)/\varepsilon^2)$, then the base graph $G$ contains $O(n/(p\varepsilon^3))$ edges, and the Vertex-Cover algorithm can select $\hat{P} = \emptyset$, query all of $G$, and obtain an exact solution.

3.1 The Vertex-Seed algorithm

In this section, we describe the Vertex-Seed algorithm, a deterministic algorithm that returns a sequence of vertices $Q = (v_1, v_2, \ldots, v_k)$ in the base graph $G$, depending only on a vertex cover function $\mathrm{VC}(G^\star)$ that maps every realization $G^\star$ to a fixed feasible vertex cover of $G^\star$. The intuition is that each vertex $v_i$ in $Q$ corresponds to a query of the type "is $v_i \in \mathrm{VC}(G^\star)$?" If the answer is negative, i.e., $v_i \notin \mathrm{VC}(G^\star)$, then all neighbors of $v_i$ in $G^\star$ are necessarily in $\mathrm{VC}(G^\star)$.
Our goal in selecting $Q$ is to keep $|Q|$ small while revealing a large number of vertices to be in $\mathrm{VC}(G^\star)$ by virtue of being neighbors of vertices in $Q \setminus \mathrm{VC}(G^\star)$. The Vertex-Seed algorithm describes a greedy procedure for incrementally constructing $Q$ with this purpose. As input, in addition to the base graph $G$, the Vertex-Seed algorithm takes two parameters, $\delta$ and $\gamma$, which will be defined later. Before defining the algorithm, we introduce some notation:

• For a sequence of vertices $Q_i = (v_1, v_2, \ldots, v_i)$ and a fixed realization $G^\star$, define
$$\mathrm{decided}(Q_i, G^\star) = \{v \in V \setminus Q_i : N_{G^\star}(v) \cap (Q_i \setminus \mathrm{VC}(G^\star)) \neq \emptyset\}. \tag{3}$$
In words, a vertex $v$ is in $\mathrm{decided}(Q_i, G^\star)$ if it has a neighbor that is in $Q_i$ but not in $\mathrm{VC}(G^\star)$. Note that a vertex $v \in \mathrm{decided}(Q_i, G^\star)$ necessarily belongs to $\mathrm{VC}(G^\star)$, by virtue of the feasibility of $\mathrm{VC}(G^\star)$. We further let $\mathrm{undecided}(Q_i, G^\star) = (V \setminus Q_i) \setminus \mathrm{decided}(Q_i, G^\star)$. Thus, the sets $\mathrm{decided}(Q_i, G^\star)$ and $\mathrm{undecided}(Q_i, G^\star)$ induce a partition of the vertices in $V \setminus Q_i$.

• For a sequence of vertices $Q_i = (v_1, v_2, \ldots, v_i)$ and a fixed realization $G^\star$, define
$$\mathrm{revealing}(Q_i, G^\star) = \left\{v \in V \setminus Q_i : v \notin \mathrm{VC}(G^\star) \text{ and } |N_G(v) \cap \mathrm{undecided}(Q_i, G^\star)| \ge \tfrac{1}{p\gamma}\right\}. \tag{4}$$
In words, a vertex $v$ is in $\mathrm{revealing}(Q_i, G^\star)$ if it is not in $\mathrm{VC}(G^\star)$ and has at least $\frac{1}{p\gamma}$ neighbors in the base graph $G$ that are in $\mathrm{undecided}(Q_i, G^\star)$. Intuitively, a vertex $v \in \mathrm{revealing}(Q_i, G^\star)$ is a good candidate for extending $Q_i$ to $Q_{i+1}$, since it is likely to move a large number of vertices from $\mathrm{undecided}(Q_i, G^\star)$ to $\mathrm{decided}(Q_{i+1}, G^\star)$.

The Vertex-Seed algorithm starts with $Q$ empty and then constructs it iteratively by adding vertices that have a high probability of being in $\mathrm{revealing}(Q, G^\star)$. The pseudocode is as follows:

Algorithm: Vertex-Seed
Initialize $Q$ to be the empty sequence.
While there is a vertex $v$ such that $\Pr_{G^\star}[v \in \mathrm{revealing}(Q, G^\star)] \ge \delta$, append $v$ to $Q$.

When analyzing the algorithm, we use the notation $Q_i = (v_1, \ldots, v_i)$ to denote the sequence of vertices in $Q$ after $i$ iterations of the Vertex-Seed algorithm: $Q_0$ denotes the initial empty sequence, and $Q_k$ denotes the final sequence if the algorithm terminates after $k$ iterations. Note that the sequence of computed $Q_i$'s is deterministic, and when we measure a probability within the loop (namely $\Pr_{G^\star}[v \in \mathrm{revealing}(Q_i, G^\star)]$), it is only over the draw of $G^\star$. Furthermore, note that the algorithm necessarily terminates after at most $n$ iterations.

3.2 Bounding the number of vertices in $Q$

Observe that the Vertex-Seed algorithm is deterministic and hence, the number of vertices in $Q$ is also deterministic. The next lemma bounds this number.

Lemma 3.1. Let $Q = (v_1, \ldots, v_k)$ be the sequence of vertices constructed by the Vertex-Seed algorithm. For any $n \ge 4\log(2/\delta)$, we have that $k \le \frac{10\gamma}{\delta} \cdot n$.

First, we give the intuition behind the lemma, before we formally prove it. Suppose a vertex $v \in \mathrm{revealing}(Q, G^\star)$ is added to $Q$ by the Vertex-Seed algorithm. Then, it has at least $\frac{1}{p\gamma}$ neighbors in $G$ that are currently undecided. Informally, we should expect that about $p \cdot \frac{1}{p\gamma} = \frac{1}{\gamma}$ of these neighbors realize in $G^\star$ and, therefore, are moved from $\mathrm{undecided}(Q, G^\star)$ to $\mathrm{decided}(Q, G^\star)$ as a result of adding $v$ to $Q$. Since there are only $n$ vertices in $G$, the number of times this can happen is $\gamma \cdot n$. Finally, since the event $v \in \mathrm{revealing}(Q, G^\star)$ holds with probability at least $\delta$ for every vertex added to $Q$, we should expect that $Q$ can only contain $\frac{\gamma \cdot n}{\delta}$ vertices.
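As a small-scale rendering of the loop above (a sketch, not the paper's procedure): the probability in the while-condition is estimated by Monte Carlo over sampled realizations, $\mathrm{VC}(\cdot)$ is taken to be a brute-force minimum vertex cover, and the graph and parameter values ($\delta = 0.2$, $\gamma = 1$) are toy choices for illustration, not the settings used later in the analysis.

```python
import random
from itertools import combinations

def brute_force_vc(n, edges):
    """A fixed minimum vertex cover (brute force; tiny n only)."""
    for k in range(n + 1):
        for cand in combinations(range(n), k):
            s = set(cand)
            if all(u in s or v in s for u, v in edges):
                return s
    return set(range(n))

def nbrs(v, edges):
    return {b if a == v else a for (a, b) in edges if v in (a, b)}

def vertex_seed(n, base_edges, p, delta, gamma, samples=200, seed=0):
    rng = random.Random(seed)
    threshold = 1 / (p * gamma)  # degree threshold from definition (4)
    Q = []
    while True:
        appended = False
        for v in range(n):
            if v in Q:
                continue
            hits = 0
            for _ in range(samples):
                realized = [e for e in base_edges if rng.random() < p]
                vc = brute_force_vc(n, realized)   # here VC(G*) := MVC(G*)
                if v in vc:
                    continue                       # revealing needs v not in VC(G*)
                # undecided(Q, G*): outside Q, no realized neighbour in Q \ VC(G*)
                und = {u for u in range(n) if u not in Q and
                       not any(w not in vc for w in nbrs(u, realized) if w in Q)}
                if len(nbrs(v, base_edges) & und) >= threshold:
                    hits += 1                      # v was revealing for this G*
            if hits / samples >= delta:            # estimated Pr[v revealing] >= delta
                Q.append(v)
                appended = True
                break
        if not appended:
            return Q

# Star K_{1,4}: centre 4 has high base degree and misses the minimum cover
# whenever at most one of its edges is realized (probability 5/16 for p = 1/2).
Q = vertex_seed(n=5, base_edges=[(i, 4) for i in range(4)],
                p=0.5, delta=0.2, gamma=1.0)
print(Q)
```

On this star, only the centre can ever be revealing (each leaf has base degree $1 < \frac{1}{p\gamma} = 2$), so the returned sequence is either empty or the single centre vertex.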
The argument above is not formal because the expected number of vertices moved from $\mathrm{undecided}(Q, G^\star)$ to $\mathrm{decided}(Q, G^\star)$ by a vertex $v \in \mathrm{revealing}(Q, G^\star)$ is not necessarily $\frac{1}{\gamma}$, since the neighborhood of $v$ in $G^\star$ is not independent of the event $v \in \mathrm{revealing}(Q, G^\star)$. Nevertheless, we show that this intuition holds, and can be made formal, in the proof below.

Proof of Lemma 3.1. For $i = 1, \ldots, k$, let $X_i$ be the (random) indicator variable that is 1 if $v_i \in \mathrm{revealing}(Q_{i-1}, G^\star)$ and 0 otherwise. Note that $\mathbb{E}[X_i] \ge \delta$ for all $i$ by the definition of the Vertex-Seed algorithm. Let $X = \sum_{i=1}^{k} X_i$. Thus, $\mathbb{E}[X] = \sum_{i=1}^{k} \mathbb{E}[X_i] \ge k \cdot \delta$. Let $u = \frac{10\gamma}{\delta} \cdot n$. Recall that we want to show that $k \le u$. We consider two cases, $X \le u\delta/2$ and $X > u\delta/2$, and write
$$k \cdot \delta \le \mathbb{E}[X] \le u\delta/2 + k \cdot \Pr[X > u\delta/2]. \tag{5}$$
To complete the proof, we will show that $\Pr[X > u\delta/2] \le \delta/2$, which implies $k \le u$ as required. Now let $\mathcal{G}$ be the set of realizations of $G^\star$ for which $X \ge u\delta/2$. In other words, we want to show that $\Pr[G^\star \in \mathcal{G}] \le \delta/2$. We further partition $\mathcal{G}$ as follows: for each $S \subseteq Q$ such that $|S| \ge u\delta/2$, we let $\mathcal{G}_S \subseteq \mathcal{G}$ be the set of realizations of $G^\star$ for which $\{v_i \in Q \mid v_i \in \mathrm{revealing}(Q_{i-1}, G^\star)\} = S$. These chosen $\mathcal{G}_S$ partition $\mathcal{G}$ since, by the definition of $\mathcal{G}$, every realization $G^\star \in \mathcal{G}$ has $X = |\{v_i \in Q \mid v_i \in \mathrm{revealing}(Q_{i-1}, G^\star)\}| \ge u\delta/2$. We thus have that $\mathcal{G}$ is partitioned into at most $\sum_{\ell = u\delta/2}^{k} \binom{k}{\ell} \le 2^k$ sets $\mathcal{G}_S$. We proceed to bound $\Pr[G^\star \in \mathcal{G}_S]$ for a fixed set $S$. To do this, the following viewpoint for generating a realization $G^\star$ will be helpful. There is a random string $R = (r_1, r_2, \ldots)$ of bits, where each $r_i$ is a random bit that is 1 with probability $p$ and 0 otherwise.
The realization $G^\star$ is generated as follows:

• For each $v_i \in S$ (in the order it was added to $Q$), the existence of an edge $e = (v_i, v)$ in $G^\star$ is determined by the next random bit in $R$ if the following properties are satisfied by $v$:
  – $v$ is a neighbor of $v_i$ in the base graph $G$,
  – $v$ is not in $\{v_1, \ldots, v_{i-1}\}$, and
  – no edge $(v_j, v)$ with $v_j \in \{v_1, \ldots, v_{i-1}\} \cap S$ has been realized so far in $G^\star$.

• Any remaining edge not considered above is simply realized with probability $p$ independently.

The above procedure simply realizes each edge with probability $p$ independently, but distinguishes (as a function of $S$) certain random bits within $R$. Now, we claim that if any $G^\star$ realized this way ends up belonging to $\mathcal{G}_S$, then we must have used at least $5n/p$ bits from $R$. To see this, consider the realization of the edges adjacent to some fixed vertex $v_i \in S$ in the process above. Since the realized graph $G^\star \in \mathcal{G}_S$, it must be the case that $v_i \in \mathrm{revealing}(Q_{i-1}, G^\star)$, which means that $v_i$ has at least $\frac{1}{p\gamma}$ neighbors in $G$ that are in $\mathrm{undecided}(Q_{i-1}, G^\star)$. Since $\mathrm{undecided}(Q_{i-1}, G^\star) \subseteq V \setminus Q_{i-1}$, none of these neighbors is in $Q_{i-1}$. Moreover, since each vertex $v_i \in S$ is in $\mathrm{revealing}(Q_{i-1}, G^\star)$, we have that none of the vertices in $S$ is in $\mathrm{VC}(G^\star)$. Therefore, if a vertex $v$ is in $\mathrm{undecided}(Q_{i-1}, G^\star)$, its neighborhood in $G^\star$ is disjoint from $Q_{i-1} \cap S$. As a result, $v_i$ has at least $\frac{1}{p\gamma}$ neighbors $v$ in $G$ such that $v \notin Q_{i-1}$ and, furthermore, the neighborhood of $v$ in $G^\star$ is disjoint from $Q_{i-1} \cap S$. For any such neighbor $v$ of $v_i$, the process described above uses a new bit from $R$ to confirm the presence (or absence) of the edge $(v_i, v)$. In total, the number of potential edges determined using the random bits from $R$ is hence at least
$$\frac{|S|}{p\gamma} \ge \frac{u\delta/2}{p\gamma} = \frac{5n}{p}, \quad \text{since } u = \frac{10\gamma}{\delta} \cdot n.$$
Observe also that if an edge $(v_i, v)$ is confirmed to be in $G^\star$ using a bit from $R$, then no other edge $(v_j, v)$ for $j > i$ is determined using a bit from $R$. This means that the number of edges confirmed to be in $G^\star$ using bits from $R$ can be at most the number of distinct vertices in $G$, which is $n$. In summary, we conclude that if the realized graph $G^\star \in \mathcal{G}_S$, then it must be the case that we used at least $5n/p$ bits from $R$, and at most $n$ of these bits realized edges in $G^\star$. So, consider the first $5n/p$ bits in $R$. As the expected number of ones in the independent Bernoulli trials $r_1, \ldots, r_{5n/p}$ is $5n$, we have by a standard Chernoff bound that
$$\Pr[G^\star \in \mathcal{G}_S] \le \Pr\!\left[\sum_{j=1}^{5n/p} r_j \le \left(1 - \tfrac{4}{5}\right) \cdot 5n\right] \le \exp\!\left(-\frac{5n \cdot (4/5)^2}{3}\right) \le e^{-n}.$$
Finally, by a union bound over the partitioning of $\mathcal{G}$,
$$\Pr[G^\star \in \mathcal{G}] \le 2^k \cdot e^{-n} \le (2/e)^n \le \delta/2,$$
where the last inequality holds for $n \ge 4\log(2/\delta)$. So, by Eq. (5), $k \le \frac{10\gamma}{\delta} \cdot n$ as required. □

3.3 Selection of the vertex set $\mathrm{SEED}(G^\star)$

We can use the Vertex-Seed algorithm directly on the base graph $G$ and add the vertices in $Q$ to the set $\mathrm{SEED}(G^\star)$. But this results in a set $Q$ whose size is independent of the expected size of $\mathrm{MVC}(G^\star)$, which translates to an additive error in the approximation bound of the algorithm. Instead, we use the Vertex-Seed algorithm in a more nuanced fashion that avoids this additive loss. We partition the vertices $V$ of $G$ into three sets:

• the set $L := \{v \in V : \Pr_{G^\star}[v \in \mathrm{MVC}(G^\star)] \ge 1 - 2\varepsilon\}$ of vertices that have large probability of being in $\mathrm{MVC}(G^\star)$,
• the set $M := \{v \in V : \varepsilon < \Pr_{G^\star}[v \in \mathrm{MVC}(G^\star)] < 1 - 2\varepsilon\}$ of vertices that have moderate probability of being in $\mathrm{MVC}(G^\star)$, and
• the set $S := \{v \in V : \Pr_{G^\star}[v \in \mathrm{MVC}(G^\star)] \le \varepsilon\}$ of vertices that have small probability of being in $\mathrm{MVC}(G^\star)$.

Recall that opt denotes the expected size of the minimum vertex cover, i.e., $\mathrm{opt} := \mathbb{E}[|\mathrm{MVC}(G^\star)|]$.
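As a toy illustration, the probabilities $c_v = \Pr[v \in \mathrm{MVC}(G^\star)]$ and the resulting $L$/$M$/$S$ partition can be estimated by sampling; the instance and the value of $\varepsilon$ below are hypothetical choices, and the minimum cover is computed by brute force.

```python
import random
from itertools import combinations

def brute_force_vc(n, edges):
    """A fixed minimum vertex cover (brute force; tiny n only)."""
    for k in range(n + 1):
        for cand in combinations(range(n), k):
            s = set(cand)
            if all(u in s or v in s for u, v in edges):
                return s
    return set(range(n))

random.seed(0)
n, p, eps, trials = 5, 0.5, 0.25, 2000
base_edges = [(i, 4) for i in range(4)]  # star K_{1,4}, centre 4

counts = [0] * n
for _ in range(trials):
    realized = [e for e in base_edges if random.random() < p]
    for v in brute_force_vc(n, realized):
        counts[v] += 1
c = [counts[v] / trials for v in range(n)]  # estimate of Pr[v in MVC(G*)]

L = {v for v in range(n) if c[v] >= 1 - 2 * eps}  # large:    >= 1 - 2*eps
S = {v for v in range(n) if c[v] <= eps}          # small:    <= eps
M = set(range(n)) - L - S                         # moderate: everything else
print(c, L, M, S)
```

For this star and $\varepsilon = 1/4$, the centre lands in $L$ and the leaves in $S$; the moderate class $M$ happens to be empty here, but in general it is the part of the graph the rest of this section works on.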
In the next two claims, we bound the number of vertices in L and M in terms of opt. Note that L and M are deterministic sets, but MVC(G⋆) is random.

Claim 3.2. For any ε ≤ 1/4, E[|L \ MVC(G⋆)|] is at most 4ε · opt.

Proof. Note that for any vertex v ∈ L, we have Pr[v ∈ L \ MVC(G⋆)] ≤ 2ε. Therefore, E[|L \ MVC(G⋆)|] ≤ 2ε · |L|. Using this bound, we get

|L| ≤ E[|L \ MVC(G⋆)|] + E[|MVC(G⋆)|] ≤ 2ε · |L| + opt,

which implies that |L| ≤ opt/(1 − 2ε). Therefore,

E[|L \ MVC(G⋆)|] ≤ 2ε · |L| ≤ (2ε/(1 − 2ε)) · opt ≤ 4ε · opt, for ε ≤ 1/4.

Claim 3.3. The number of vertices in M is at most opt/ε.

Proof. Every vertex in M is in MVC(G⋆) with probability at least ε, and opt is at least the expected number of vertices in M that are in MVC(G⋆). The claim follows.

Now, we run the Vertex-Seed algorithm on the induced graph G[M] and designate the output set Q. We also use G⋆[M] to denote the induced subgraph on M of the realized graph G⋆. In the Vertex-Seed algorithm, we use the following parameters:

- The parameter δ, which is used as the probability threshold for a vertex to be added to Q, is set to δ := ε².

- The parameter γ, which decides the threshold on the number of undecided neighbors in the base graph G for a vertex to be deemed revealing, is set to γ := ε³ · δ/10³ = ε⁵/10³.

Furthermore, the vertex cover VC(G⋆[M]) used in the Vertex-Seed algorithm is set to MVC(G⋆) ∩ M. Clearly, this is a feasible vertex cover of G⋆[M]. Based on this choice of parameters, we can bound the number of vertices in Q. Recall that Vertex-Seed is a deterministic algorithm and it is run on the deterministic graph G[M]; hence, Q is deterministic.

Claim 3.4. The size of the set Q output by the Vertex-Seed algorithm when run on G[M] is at most (ε²/100) · opt.

Proof.
By Lemma 3.1 and our choice of parameters, we get |Q| ≤ (ε³/100) · |M|. Now, the claim follows from the bound on |M| in Claim 3.3.

We are now ready to define the vertex set SEED(G⋆) for any fixed realized graph G⋆. We define SEED(G⋆) as the union of the four sets described below. The first two sets, L and Q, do not depend on the realized graph G⋆, whereas the last two sets depend on G⋆:

- the set L of vertices that have large probability of being in MVC(G⋆),

- the set Q ⊆ M of vertices returned by the Vertex-Seed algorithm on G[M],

- the set of vertices decided by Q in G⋆[M], i.e., decided(Q, G⋆[M]), and

- the set of vertices A := {v ∈ M \ Q : |N_{G[M]}(v) ∩ undecided(Q, G⋆[M])| ≥ 1/(p · γ)}.

Next we prove that, in expectation, SEED(G⋆) contains a small number of vertices that are not in MVC(G⋆). To do this, we first prove the following bound on the size of A.

Claim 3.5. We have E[|A \ MVC(G⋆)|] ≤ ε · opt.

Proof. Consider any vertex v ∈ M \ Q. Upon sampling G⋆ ∼ G_p, observe that if v ∈ A \ MVC(G⋆), then v ∈ revealing(Q, G⋆[M]). By the definition of the termination condition of the Vertex-Seed algorithm, we therefore have that

Pr_{G⋆}[v ∈ A \ MVC(G⋆)] ≤ Pr_{G⋆}[v ∈ revealing(Q, G⋆[M])] < δ.

Finally, by the bound on |M| in Claim 3.3, this implies E[|A \ MVC(G⋆)|] < δ · |M| ≤ ε · opt.

We can now bound the expected size of the set SEED(G⋆).

Lemma 3.6. We have E[|SEED(G⋆) \ MVC(G⋆)|] ≤ O(ε) · E[|MVC(G⋆)|] for any ε < 1/4.

Proof. Note first that decided(Q, G⋆[M]) ⊆ MVC(G⋆): by the definition of decided, each vertex v ∈ decided(Q, G⋆[M]) has a neighbor v_i ∈ Q \ MVC(G⋆) adjacent to it in the realization G⋆, and since the realized edge (v, v_i) must be covered, v ∈ MVC(G⋆). Thus, since L, Q, and A are pairwise disjoint,

E[|SEED(G⋆) \ MVC(G⋆)|] = E[|L \ MVC(G⋆)|] + E[|Q \ MVC(G⋆)|] + E[|A \ MVC(G⋆)|].
By Claims 3.2, 3.4, and 3.5, we have

E[|SEED(G⋆) \ MVC(G⋆)|] ≤ 4ε · opt + (ε²/100) · opt + ε · opt ≤ O(ε) · E[|MVC(G⋆)|].

3.4 Using Vertex-Seed to analyze Vertex-Cover

In this section we prove our main result about the performance of the Vertex-Cover algorithm, which we formally state in the next theorem.

Theorem 3.7. The output of the Vertex-Cover algorithm has an expected size of at most (1 + O(ε)) · opt.

To analyze the Vertex-Cover algorithm, we first define an auxiliary problem (Problem (6)), which strengthens the constraints of Problem (2) by requiring that the solution S include the vertices in Q, where Q is the deterministic output of the Vertex-Seed algorithm on G[M]. For the remainder of the section, we will use P̂ to denote the optimal solution to Problem (6).

min_{P ⊆ V} |P| + E[|MVC(G⋆[V \ P])|]
s.t. Q ⊆ P, and G[V \ P] has at most (2 · 10³ · n)/(ε⁵ p) edges.    (6)

Our analysis compares the cost of P̂ to the expected cost of a strategy that adaptively picks S = SEED(G⋆) for every realization G⋆. We begin by proving that for any fixed realization G⋆, the set SEED(G⋆) is a feasible solution to (6): Q ⊆ SEED(G⋆) by definition, and the induced graph G[V \ SEED(G⋆)] is sparse (Lemma 3.8). Moreover, Lemma 3.6 implies that

E[|SEED(G⋆)| + |MVC(G⋆[V \ SEED(G⋆)])|] = (1 + O(ε)) · opt;

therefore, adaptively picking SEED(G⋆) as a solution for each G⋆ would achieve an expected objective value of (1 + O(ε)) · opt in Problem (6). Our goal is to prove that the optimal static solution P̂ performs essentially as well. For any realization G⋆, the set SEED(G⋆) depends only on two quantities: (a) the realization of the edges incident to Q, and (b) the intersection of Q with MVC(G⋆).
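The two-stage objective of Problem (6) can be made concrete with a small simulation. The following self-contained Python sketch (our own illustration; the names `mvc_size` and `objective` are hypothetical) estimates |P| + E[|MVC(G⋆[V \ P])|] for a candidate first-stage set P by sampling realizations. Brute-force MVC restricts it to tiny graphs, and the feasibility constraints of (6) (Q ⊆ P and sparsity) are not checked.

```python
import itertools
import random

def mvc_size(n, edges):
    # Size of a minimum vertex cover, by brute force (tiny graphs only).
    for k in range(n + 1):
        for cand in itertools.combinations(range(n), k):
            cover = set(cand)
            if all(u in cover or v in cover for u, v in edges):
                return k
    return n

def objective(n, edges, P, p, trials=1000, seed=0):
    """Monte Carlo estimate of |P| + E[|MVC(G*[V \\ P])|], the value
    that Problem (6) assigns to a candidate first-stage set P."""
    rng = random.Random(seed)
    # Edges with an endpoint in P never appear in the induced residual graph.
    rest = [(u, v) for u, v in edges if u not in P and v not in P]
    total = 0
    for _ in range(trials):
        realized = [e for e in rest if rng.random() < p]
        total += mvc_size(n, realized)
    return len(P) + total / trials
```

With p = 1 on a triangle, picking P = {0} leaves a single residual edge, so the objective is 1 + 1 = 2, matching the cost of an optimal cover of the full triangle.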
Let F be the set of edges of the base graph G that have at least one endpoint in Q, and let F⋆ ⊆ F denote a fixed realization of them. For convenience, we slightly abuse notation and index each possible SEED set by the tuple (Q_VC, F⋆) that determines it. For the sake of completeness, we restate the formal definitions of decided, undecided, and SEED in this new notation. For a fixed F⋆ ⊆ F and a fixed Q_VC ⊆ Q, we have

• decided(Q_VC, F⋆) = {u ∈ M \ Q : ∃ (u, v) ∈ F⋆ s.t. v ∈ Q \ Q_VC},

• undecided(Q_VC, F⋆) = M \ (Q ∪ decided(Q_VC, F⋆)),

• A(Q_VC, F⋆) = {u ∈ M \ Q : |N_{G[M]}(u) ∩ undecided(Q_VC, F⋆)| ≥ 1/(p · γ)}, and

• SEED(Q_VC, F⋆) = L ∪ Q ∪ decided(Q_VC, F⋆) ∪ A(Q_VC, F⋆).

Notice that for any realization G⋆ for which F⋆ is the realization of the edges in F, we have that decided(Q ∩ MVC(G⋆), F⋆) = decided(Q, G⋆[M]) and SEED(G⋆) = SEED(Q ∩ MVC(G⋆), F⋆).

A key observation is that the objective of Problem (6) is independent of how the edges in F realize, because every feasible solution S contains the set Q, and thus the edges in F are not part of the induced graph G[V \ S]. We can therefore fix a realization F⋆ of these edges and compare P̂ and SEED(G⋆) over the randomness outside of F. For a fixed realization F⋆, SEED only depends on Q_VC, which ranges over subsets of Q and can thus take at most 2^{|Q|} values. By employing the concentration of the minimum vertex cover (Theorem 1.2) and taking a union bound over the 2^{|Q|} possible values of SEED, we are able to show that the value of SEED that has the minimum expected cost (for the fixed F⋆ and over the randomness outside of F) is almost as good as adaptively selecting SEED(G⋆).
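The four bullet definitions above translate directly into set operations. The following Python sketch (our own illustration; the function and argument names are hypothetical) assembles SEED(Q_VC, F⋆) from an adjacency map of the base graph G[M] and a realization F⋆, with edges stored as frozensets so orientation is irrelevant.

```python
def seed_set(L, Q, Q_vc, M, adj_M, F_star, p, gamma):
    """Compute SEED(Q_VC, F*) = L ∪ Q ∪ decided ∪ A per the definitions above.

    adj_M[u] is the set of neighbors of u in the base graph G[M];
    F_star is the set of realized edges incident to Q, as frozensets.
    """
    # decided: vertices of M \ Q with a realized edge to a seed vertex
    # outside the cover (Q \ Q_VC); such vertices must join the cover.
    decided = {u for u in M - Q
               if any(frozenset((u, q)) in F_star for q in Q - Q_vc)}
    undecided = M - (Q | decided)
    # A: vertices of M \ Q with at least 1/(p*gamma) undecided neighbors
    # in the base graph G[M].
    A = {u for u in M - Q
         if len(adj_M[u] & undecided) >= 1 / (p * gamma)}
    return L | Q | decided | A
```

In the paper γ = ε⁵/10³ is tiny; a toy value of γ is needed to make the threshold 1/(pγ) small enough to trigger on a 4-vertex example.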
Since each value of SEED is feasible for Problem (6) and the objective only depends on the randomness outside F, we can conclude that P̂ cannot be worse than the best fixed SEED. Finally, since the above holds for any fixed F⋆, we conclude the proof by taking the expectation over the randomness in F and showing that the expected cost of P̂ is almost as good as the expected cost of SEED(G⋆).

We now formalize the proof outlined above. First, we prove that for any F⋆ ⊆ F and any Q_VC ⊆ Q, the induced graph G[V \ SEED(Q_VC, F⋆)] is sparse. This proves that SEED(Q_VC, F⋆) is a feasible solution to Problem (6).

Lemma 3.8. For any F⋆ ⊆ F and any Q_VC ⊆ Q, the graph G[V \ SEED(Q_VC, F⋆)] has at most (2 · 10³ · n)/(pε⁵) edges.

Proof of Lemma 3.8. Since SEED(Q_VC, F⋆) contains L, all vertices in V \ SEED(Q_VC, F⋆) lie in (M ∪ S) \ SEED(Q_VC, F⋆). Moreover, since SEED(Q_VC, F⋆) also contains decided(Q_VC, F⋆),

M \ SEED(Q_VC, F⋆) ⊆ undecided(Q_VC, F⋆).

Now, every vertex u ∈ M \ SEED(Q_VC, F⋆) has at most 1/(p · γ) neighbors inside undecided(Q_VC, F⋆) (since otherwise, such a vertex would be included in A(Q_VC, F⋆)). Hence the number of edges in G[V \ SEED(Q_VC, F⋆)] that have both endpoints in M is at most

|M|/(pγ) ≤ (10³ · n)/(pε⁵),

since γ = ε⁵/10³. The remaining edges in G[V \ SEED(Q_VC, F⋆)] are either between S and M or entirely within S. Fix any such edge e = (u, v) with u ∈ S and v ∈ (M ∪ S) \ SEED(Q_VC, F⋆), and let MVC(G⋆) be the minimum vertex cover of a realization G⋆. Let

c_e := Pr[e is covered by MVC(G⋆)] ≤ Pr[u ∈ MVC(G⋆)] + Pr[v ∈ MVC(G⋆)] ≤ ε + 1 − 2ε = 1 − ε.

As proven in [DDH23], there can be at most O(n/(ε · p)) such edges (u, v) in G. We also provide a concise proof here for completeness. Let W = {e ∈ E : c_e ≤ 1 − ε}.
Set X = |{e ∈ E : e is not covered by MVC(G⋆)}|. By linearity of expectation,

E[X] = Σ_{e ∈ E} (1 − c_e) ≥ Σ_{e ∈ W} (1 − c_e) ≥ ε|W|.    (7)

To upper bound E[X], observe that every uncovered edge under MVC(G⋆) must be absent from G⋆ (otherwise MVC(G⋆) would not be a vertex cover of G⋆). The probability that a vertex-induced subgraph of G has more than n/p unrealized edges is at most (1 − p)^{n/p} ≤ e^{−n}. Also, there are at most 2^n choices for the vertex-induced subgraph H = G[V \ MVC(G⋆)]. Therefore,

Pr[H has more than n/p edges] ≤ (2/e)^n.

Using the above, we can upper bound E[X] as

E[X] ≤ n/p + n² · (2/e)^n ≤ n/p + 6.    (8)

Combining (7) and (8) yields the deterministic bound

|W| ≤ E[X]/ε ≤ n/(pε) + 6/ε ≤ 2n/(pε) ≤ 2n/(pε⁵).

Therefore, the total number of remaining edges is at most (2 · 10³ · n)/(pε⁵), which concludes the proof of the lemma.

For convenience, we define g(A), for any set A ⊆ V, to be the random variable corresponding to the size of a vertex cover of G⋆ that includes the set A and covers the induced graph G⋆[V \ A] optimally, i.e.,

g(A) = |A| + |MVC(G⋆[V \ A])|.

Notice that the objective values of the optimization problems (2) and (6) are equal to E[g(S)]. Before we continue with the analysis, we prove the following tail bound for g(S), using the concentration of the minimum vertex cover (Theorem 1.2).

Lemma 3.9. For any set S ⊆ V and any ε ∈ [0, 1], the following holds:

Pr[|g(S) − E[g(S)]| > ε E[g(S)]] ≤ 2e^{−ε² opt/66}.

Proof of Lemma 3.9. Let X := |MVC(G⋆[V \ S])|. Since |S| is deterministic,

Pr[|g(S) − E[g(S)]| > ε E[g(S)]] = Pr[|X − E[X]| > ε E[X] + ε|S|].    (∗)

Case (a): E[X] ≥ opt/2.
From (∗),

Pr[|X − E[X]| > ε E[X] + ε|S|] ≤ Pr[|X − E[X]| > ε E[X]] ≤ 2e^{−ε² E[X]/33} ≤ 2e^{−ε² opt/66},

where we used the tail bound for the minimum vertex cover (Corollary 4.3) applied to X and the fact that E[X] ≥ opt/2.

Case (b): E[X] ≤ opt/2. Since g(S) = |S| + X and S ∪ MVC(G⋆[V \ S]) is a vertex cover of G⋆, we have g(S) ≥ |MVC(G⋆)| for every realization; hence E[g(S)] ≥ opt and thus |S| + E[X] = E[g(S)] ≥ opt. Therefore, from (∗),

Pr[|X − E[X]| > ε E[X] + ε|S|] ≤ Pr[|X − E[X]| > ε · opt] = Pr[|X − E[X]| > (ε · opt/E[X]) · E[X]] ≤ 2e^{−ε² opt²/(33 · E[X])} ≤ 2e^{−ε² opt/66},

where we again used concentration for X and the bound E[X] ≤ opt/2. Combining the two cases yields the stated inequality.

We are now ready to prove that, for any fixed realization F⋆, P̂ is almost as good as adaptively selecting the best set among {SEED(Q_VC, F⋆) : Q_VC ⊆ Q}. We state this formally in the next lemma.

Lemma 3.10. Let P̂ be an optimal solution to (6). Then, for any realization F⋆,

E_{G⋆ \ F⋆}[g(P̂)] ≤ (1 + O(ε)) · E_{G⋆ \ F⋆}[min_{Q_VC ⊆ Q} g(SEED(Q_VC, F⋆))].

Proof of Lemma 3.10. As we previously discussed, because Q ⊆ S, edges in F never appear in G[V \ S]; therefore, the objective of (6) depends only on randomness outside F and equals E_{G⋆ \ F⋆}[g(S)]. For any fixed F⋆ and any Q_VC ⊆ Q, the set S = SEED(Q_VC, F⋆) is feasible for (6): (a) it contains Q by definition, and (b) by Lemma 3.8, the graph G[V \ SEED(Q_VC, F⋆)] has O(n/(ε⁵p)) edges. Hence, by the optimality of P̂,

E_{G⋆ \ F⋆}[g(P̂)] ≤ min_{Q_VC ⊆ Q} E_{G⋆ \ F⋆}[g(SEED(Q_VC, F⋆))].

It remains to relate the expected cost of the best SEED to the expected cost of adaptively selecting the best SEED for every realization, i.e., to relate the minimum expectation and the expected minimum.
Define

µ := min_{Q_VC ⊆ Q} E_{G⋆ \ F⋆}[g(SEED(Q_VC, F⋆))].

By a union bound and Lemma 3.9 applied to each fixed Q_VC,

Pr[min_{Q_VC ⊆ Q} g(SEED(Q_VC, F⋆)) < (1 − ε)µ] ≤ 2^{|Q|+1} · e^{−ε² opt/66}.

Together with Claim 3.4 (which gives |Q| ≤ ε² opt/100), we obtain

Pr[min_{Q_VC ⊆ Q} g(SEED(Q_VC, F⋆)) < (1 − ε)µ] ≤ 2 · (2/e)^{ε² opt/100}.

Therefore,

E_{G⋆ \ F⋆}[min_{Q_VC ⊆ Q} g(SEED(Q_VC, F⋆))] ≥ Pr[min_{Q_VC ⊆ Q} g(SEED(Q_VC, F⋆)) ≥ (1 − ε)µ] · (1 − ε)µ ≥ (1 − 2 · (2/e)^{ε² opt/100}) · (1 − ε)µ,

which rearranges to

min_{Q_VC ⊆ Q} E_{G⋆ \ F⋆}[g(SEED(Q_VC, F⋆))] ≤ 1/((1 − 2 · (2/e)^{ε² opt/100})(1 − ε)) · E_{G⋆ \ F⋆}[min_{Q_VC ⊆ Q} g(SEED(Q_VC, F⋆))].

Using opt ≥ C log(1/ε)/ε² to absorb the 2 · (2/e)^{ε² opt/100} term into O(ε) yields the claimed (1 + O(ε)) factor. As we have discussed, in the case of opt = O(log(1/ε)/ε²), the problem becomes trivial, in the sense that it is feasible to query the whole base graph G (see Appendix A).

Finally, averaging the above result over the randomness of F⋆ yields our final bound.

Lemma 3.11. Let P̂ be an optimal solution to (6). Then E[g(P̂)] ≤ (1 + O(ε)) · opt.

Proof of Lemma 3.11. From the law of total expectation with respect to the edges F, we have that

E[g(P̂)] = E_{F⋆}[E_{G⋆ \ F⋆}[g(P̂)]].

By averaging the result of Lemma 3.10 over the randomness in F⋆, we get

E[g(P̂)] ≤ (1 + O(ε)) · E_{F⋆}[E_{G⋆ \ F⋆}[min_{Q_VC ⊆ Q} g(SEED(Q_VC, F⋆))]].

Observe that the expectation on the right-hand side above is simply taken over all the randomness in G⋆. We can upper bound this term by picking F⋆ to be the realization of the edges in F and Q_VC = Q ∩ MVC(G⋆), which yields

E[min_{Q_VC ⊆ Q} g(SEED(Q_VC, F⋆))] ≤ E[g(SEED(G⋆))] = E[|SEED(G⋆) ∪ MVC(G⋆[V \ SEED(G⋆)])|] ≤ E[|SEED(G⋆) ∪ MVC(G⋆)|].
The proof is concluded by employing Lemma 3.6:

E[|SEED(G⋆) ∪ MVC(G⋆)|] ≤ E[|MVC(G⋆)|] + E[|SEED(G⋆) \ MVC(G⋆)|] ≤ (1 + O(ε)) · E[|MVC(G⋆)|].

Finally, Theorem 3.7 follows from Lemma 3.11 and the fact that Problem (6) is a more constrained version of Problem (2).

4 Concentration Bound for Minimum Vertex Cover Size

In this section we prove that the size of the minimum vertex cover is concentrated around its expectation. Rather well-known edge-exposure and vertex-exposure Doob martingale arguments yield concentration bounds in terms of the number of edges and vertices in the graph, respectively. We strengthen this by giving a novel concentration bound in terms of opt, where opt = E[|MVC(G⋆)|], yielding a multiplicative (1 ± ε)-type concentration guarantee.

We begin by defining a martingale and stating a special case of Freedman's general inequality for martingales, which is an analogue of Bernstein's inequality for martingales. While Bernstein's inequality applies only to independent random variables having bounded variance, Freedman's inequality yields a similar bound for a sequence of random variables that forms a martingale and is not necessarily independent.

Definition 4.1 (Martingale). A sequence of random variables Y_0, ..., Y_N is a martingale with respect to a sequence of random variables X_1, ..., X_N if the following conditions hold:

1. Y_k is a function of X_1, ..., X_k for all k ≥ 1.

2. E[|Y_k|] < ∞ for all k ≥ 0.

3. E[Y_k | X_1, ..., X_{k−1}] = Y_{k−1} for all k ≥ 1.

The following is a special case of Freedman's inequality for martingales:

Theorem 4.2 (Freedman's Inequality). Let Y_0, ..., Y_N be a martingale with respect to the sequence X_1, ..., X_N. Suppose the differences D_k = Y_k − Y_{k−1} satisfy |D_k| ≤ R for all k ≥ 1 always, and suppose W = Σ_{k=1}^N Var(D_k | X_1, ..., X_{k−1}) ≤ σ² always.
Then, for any t ≥ 0,

Pr[|Y_N − Y_0| ≥ t] ≤ 2 exp(−t² / (2(σ² + Rt/3))).

We restate our concentration bound from the introduction:

Theorem 1.2. Let Z = |MVC(G⋆)| and opt = E_{G⋆ ∼ G_p}[|MVC(G⋆)|]. Then for any t ≥ 0,

Pr[|Z − opt| ≥ t] ≤ 2 exp(−t² / (4C · opt + 2t/3)),    (1)

where C < 8 is a constant. We note that we set the value of C to be the constant from Lemma 4.4.

Proof of Theorem 1.2. To prove this result, we will require the existence of a set S ⊆ V of vertices which satisfies |S| = O(opt) and, furthermore, such that the induced subgraph G[V \ S] has at most O(opt/p) edges. We defer the construction of such a set to Lemma 4.4 in the next section; here, we assume its existence and proceed. Let U = V \ S and E_U = E(G[U]). Let m = |E_U| and k = |S|. Fix an arbitrary ordering E_U = {e_1, ..., e_m} of the edges in E_U, as well as an ordering S = {v_1, ..., v_k} of the vertices in S. We will expose the randomness of the graph realization G⋆ in two phases via a sequence of random variables X_1, ..., X_m, followed by X_{m+1}, ..., X_{m+k}. In the first m steps, we reveal, in order, the existence of every edge in E_U in G⋆. In the next k steps, we reveal, in order, the edges adjacent to each vertex in S.

1. Phase 1 (edge exposure inside U). For each i = 1, ..., m, reveal the indicator X_i = 1{e_i ∈ E(G⋆)}. That is, X_i describes whether edge e_i ∈ E_U is realized in G⋆.

2. Phase 2 (vertex exposure on S). For j = 1, ..., k, let X_{m+j} = (1{(v_j, v) ∈ E(G⋆)})_{v ∈ V \ {v_j}}. That is, X_{m+j} describes the neighborhood of vertex v_j ∈ S in G⋆.

Let N = m + k. We define the Doob martingale Y_0, Y_1, ..., Y_N with respect to the sequence of random variables X_1, ..., X_N described above as Y_i = E[Z | X_1, ..., X_i] for each i ≥ 1, with Y_0 = E[Z].
Note that Y_0 = opt and Y_N = |MVC(G⋆)|. Observe that this martingale is a hybridization of the standard edge-exposure and vertex-exposure martingales, in which the sequence X_1, ..., X_N consists entirely of either Phase 1 or Phase 2 variables from above. Denote the martingale differences by D_i = Y_i − Y_{i−1}, for i = 1, ..., N. Our goal is to show that the conditions of Freedman's inequality (Theorem 4.2) are satisfied.

First, we show that the martingale differences are bounded by R = 1 always, i.e., |D_i| ≤ 1 for all i.

• Phase 1 (1 ≤ i ≤ m). At step i in Phase 1, we reveal the status of edge e_i ∈ E_U. Let x_1, ..., x_i denote any assignment of the random variables X_1, ..., X_i. We will denote E[Z | X_1 = x_1, ..., X_i = x_i] simply as E[Z | x_1, ..., x_i] for convenience. Then, observe that

|D_i| = |Y_i − Y_{i−1}| = |E[Z | x_1, ..., x_i] − E[Z | x_1, ..., x_{i−1}]|
= |Σ_{x̃_i} Pr[X_i = x̃_i | X_1 = x_1, ..., X_{i−1} = x_{i−1}] · (E[Z | x_1, ..., x_i] − E[Z | x_1, ..., x_{i−1}, x̃_i])|
≤ Σ_{x̃_i} Pr[X_i = x̃_i | X_1 = x_1, ..., X_{i−1} = x_{i−1}] · |E[Z | x_1, ..., x_i] − E[Z | x_1, ..., x_{i−1}, x̃_i]|.

Now, let R_rest denote all of the remaining randomness in G⋆ beyond X_1, ..., X_i; since R_rest is independent of X_1, ..., X_i, we have that

|E[Z | x_1, ..., x_i] − E[Z | x_1, ..., x_{i−1}, x̃_i]| = |Σ_{r_rest} Pr[R_rest = r_rest] · (E[Z | x_1, ..., x_i, r_rest] − E[Z | x_1, ..., x_{i−1}, x̃_i, r_rest])|.

But now, note that the graph G⋆ is completely determined by x_1, ..., x_i, r_rest (respectively x_1, ..., x_{i−1}, x̃_i, r_rest); furthermore, these graphs differ only in the realization of the edge e_i. In particular, the sizes of the minimum vertex covers of these two graphs can differ by at most 1. Substituting this above, we get that |D_i| ≤ 1.

• Phase 2 (m + 1 ≤ i ≤ N).
Let j = i − m. At step i in Phase 2, we reveal all edges incident to vertex v_j ∈ S, namely X_i = (1{(v_j, v) ∈ E(G⋆)})_{v ∈ V \ {v_j}}. Similar to the calculation above, where x_i now denotes an assignment to all the indicator random variables involved in X_i, we have that

|D_i| ≤ Σ_{x̃_i} Pr[X_i = x̃_i | X_1 = x_1, ..., X_{i−1} = x_{i−1}] · |E[Z | x_1, ..., x_i] − E[Z | x_1, ..., x_{i−1}, x̃_i]|.

Now, let R_rest denote all of the remaining randomness in G⋆ beyond X_1, ..., X_i; since R_rest is independent of X_1, ..., X_i, we have that

|E[Z | x_1, ..., x_i] − E[Z | x_1, ..., x_{i−1}, x̃_i]| = |Σ_{r_rest} Pr[R_rest = r_rest] · (E[Z | x_1, ..., x_i, r_rest] − E[Z | x_1, ..., x_{i−1}, x̃_i, r_rest])|.

But now, note that the graph G⋆ is completely determined by x_1, ..., x_i, r_rest (respectively x_1, ..., x_{i−1}, x̃_i, r_rest); furthermore, these graphs differ only in the realization of the neighborhood of the single vertex v_j = v_{i−m}. In particular, the sizes of the minimum vertex covers of these two graphs can differ by at most 1 (either v_j is included in the cover or not). Substituting this above, we get that |D_i| ≤ 1.

In summary, we conclude that |D_i| ≤ R = 1 for every i ≥ 1, always.

Next, we bound the variance W = Σ_{i=1}^N Var(D_i | X_1, ..., X_{i−1}). We split it into W_1 (Phase 1) and W_2 (Phase 2).

• Bounding W_1 = Σ_{i=1}^m Var(D_i | X_1, ..., X_{i−1}). Consider any fixing of the random variables X_1 = x_1, ..., X_{i−1} = x_{i−1}, and recall that

D_i = Y_i − Y_{i−1} = E[Z | x_1, ..., x_{i−1}, X_i] − E[Z | x_1, ..., x_{i−1}].

Note that X_i is independent of X_1, ..., X_{i−1}, and is in fact a Bernoulli random variable with parameter p. In particular, conditioned on x_1, ..., x_{i−1}, the distribution of D_i is as follows:
D_i = a := E[Z | x_1, ..., x_{i−1}, X_i = 1] − E[Z | x_1, ..., x_{i−1}]   with probability p,
D_i = b := E[Z | x_1, ..., x_{i−1}, X_i = 0] − E[Z | x_1, ..., x_{i−1}]   with probability 1 − p.

Therefore, we have that

Var(D_i | x_1, ..., x_{i−1}) = pa² + (1 − p)b² − (pa + (1 − p)b)² = p(1 − p)(a − b)²
= p(1 − p) · |E[Z | x_1, ..., x_{i−1}, X_i = 1] − E[Z | x_1, ..., x_{i−1}, X_i = 0]|².

By similar reasoning as earlier in the proof, where we realize the rest of the randomness, we have that

|E[Z | x_1, ..., x_{i−1}, X_i = 1] − E[Z | x_1, ..., x_{i−1}, X_i = 0]|² ≤ 1.

So, we finally get that W_1 = Σ_{i=1}^m Var(D_i | x_1, ..., x_{i−1}) ≤ mp(1 − p) = p(1 − p)|E_U|. By Lemma 4.4, |E_U| ≤ C · opt/p. So, W_1 ≤ (1 − p) · C · opt ≤ C · opt.

• Bounding W_2 = Σ_{i=m+1}^N Var(D_i | X_1, ..., X_{i−1}). In this case, note that by our previous calculations, we know that −1 ≤ D_i ≤ 1 always, which means that Var(D_i | X_1, ..., X_{i−1}) ≤ 1. This implies that Σ_{i=m+1}^N Var(D_i | X_1, ..., X_{i−1}) ≤ k ≤ C · opt, where we used the guarantee of Lemma 4.4 again.

In total, W = W_1 + W_2 ≤ 2C · opt always. Finally, we apply Freedman's inequality with R = 1 and σ² = 2C · opt to get

Pr[|Z − opt| ≥ t] ≤ 2 exp(−t² / (4C · opt + 2t/3)),

which completes the proof.

Corollary 4.3. Let Z = |MVC(G⋆)| and opt = E[|MVC(G⋆)|]. For any t ≤ opt,

Pr[|Z − opt| > t] ≤ 2e^{−(t²/33)/opt}.    (9)

Proof. Apply Theorem 1.2. If t ≤ opt, then the denominator is upper bounded by 4C · opt + 2 opt/3 = (4C + 2/3) · opt. This gives

Pr[|Z − opt| ≥ t] ≤ 2 exp(−t² / ((4C + 2/3) · opt)).

This is of the form C′e^{−ct²/opt} with C′ = 2 and c = 1/(4C + 2/3) ≥ 1/33.

4.1 A Structural Lemma based on the Expected Minimum Vertex Cover

Next, we establish a structural lemma on the minimum vertex cover, which is a crucial component in the concentration result of Theorem 1.2.

Lemma 4.4.
Let G = (V, E) be a graph, and p ∈ (0, 1]. Let opt = E[|MVC(G⋆)|]. There exists a subset of vertices S ⊆ V such that:

1. |S| ≤ c · opt.

2. The induced subgraph G[V \ S] has at most C · opt/p edges.

The constants can be chosen as c = C = 2(e/(e − 1) + 2) < 8.

Proof. We first construct a specific ordering π of the vertices of G by analyzing the following greedy process, which incrementally constructs π. Initialize P = (). While V \ P ≠ ∅:

1. Suppose P = (v_1, ..., v_{|P|}) so far. Draw a random realization G⋆ ∼ G_p.

2. For i = 1, 2, ..., |P|: if v_i is unmatched, match v_i uniformly at random to one of the yet-unmatched neighbors of v_i within the set V \ {v_1, ..., v_{i−1}} in the realization G⋆.

3. The above process induces, for every vertex in V \ P, a probability of being matched to a vertex in P, over the randomness in the draw of G⋆ and the matching. So, for every vertex v ∈ V \ P, denote by p(v | P) the probability that it gets matched to a vertex in P.

4. Select u = arg max_{v ∈ V \ P} p(v | P).

5. Append u to P.

Let π be the resulting permutation of the vertices of G at the end of the procedure above. For G⋆ ∼ G_p, running only the for-loop in Step 2 above with P = π constitutes a greedy procedure for constructing a matching M of G⋆. We will analyze the matching M that results from this process in what follows. Note that E[|M|] ≤ E[|MM(G⋆)|] ≤ opt, where MM(G⋆) is a maximum matching of G⋆, whose size is at most |MVC(G⋆)|.

For any vertex u, let P_u be its predecessors in π and let S_u be its successors. Define:

• δ⁺(u): the number of neighbors of u in G that are in S_u (the forward degree);

• p(u): the probability that u is matched by a vertex in P_u (matched backwards);

• P(u): the total probability that u is matched in M;

• R(u) := 1 − e^{−pδ⁺(u)}.
A crucial consequence of the greedy construction of π is the following property: for any successor v ∈ S_u, the probability that v is matched by P_u is at most p(u). This is because, when u was selected to be added to π, it had the maximum probability of being matched by P_u among all remaining vertices, including v.

We aim to prove the following key inequality:

P(u) ≥ max(p(u), (1 − 2p(u)) · R(u)).    (10)

Trivially, P(u) ≥ p(u), since p(u) accounts for only part of the total probability P(u). We now focus on the second term. If p(u) ≥ 1/2, the term is non-positive and the inequality holds trivially. So, assume p(u) < 1/2. Let A be the event that u is matched backwards by P_u, so that Pr[A] = p(u). For v ∈ S_u, let C_v be the event that v is matched backwards to a vertex in P_u. As argued above, by the construction of π, Pr[C_v] ≤ p(u).

We now analyze the probability of the event F that u matches forward. Again, we have that P(u) ≥ Pr[F]. Observe that u matches forward only if u is unmatched backwards (the event A^c) and there exists a successor v such that the edge (u, v) is realized in G⋆ and v is unmatched when u is processed in the greedy procedure (according to the permutation π) for constructing the matching M of G⋆ (i.e., the event C_v^c).

Now, to further analyze Pr[F], let us fix an arbitrary ordering of the successors S_u = {v_1, v_2, ...}. Let B_i be the event that (u, v_i) is the first realized edge from u to S_u in this ordering. The events B_i are disjoint. Let B = ∪_i B_i be the event that at least one edge from u to S_u is realized. We have that Pr[B] = 1 − (1 − p)^{δ⁺(u)}. Since 1 − p ≤ e^{−p}, we have Pr[B] ≥ 1 − e^{−pδ⁺(u)} = R(u). Let F_i be the event that A^c, B_i, and C_{v_i}^c all occur.
If any F_i occurs, then there exists a realized edge (u, v_i) where both u and v_i are available when u is processed; thus, the greedy matching procedure will match u forward. Since the B_i's are disjoint, the F_i's are disjoint, which implies that

Pr[F] ≥ Pr[∪_i F_i] = Σ_i Pr[A^c ∩ B_i ∩ C_{v_i}^c].

We can now use independence. The event B_i depends only on the realization of the edges between u and S_u. The events A and C_{v_i} depend only on the realization of the edges incident to P_u. Since these sets of edges are disjoint, B_i is independent of the joint event (A^c ∩ C_{v_i}^c), giving us that

Pr[F] ≥ Σ_i Pr[B_i] · Pr[A^c ∩ C_{v_i}^c].

Next, we use the union bound to lower bound the probability that both u and v_i are available:

Pr[A^c ∩ C_{v_i}^c] = 1 − Pr[A ∪ C_{v_i}] ≥ 1 − (Pr[A] + Pr[C_{v_i}]).

Using Pr[A] = p(u) and the crucial property that Pr[C_{v_i}] ≤ p(u), we get that

Pr[A^c ∩ C_{v_i}^c] ≥ 1 − 2p(u).

Substituting this back into the expression for Pr[F] finally gives

Pr[F] ≥ Σ_i Pr[B_i](1 − 2p(u)) = (1 − 2p(u)) Σ_i Pr[B_i] = (1 − 2p(u)) Pr[B].

Since Pr[B] ≥ R(u), we conclude P(u) ≥ Pr[F] ≥ (1 − 2p(u)) R(u). This establishes the desired inequality (10).

We now proceed towards constructing the set S from the lemma statement.

Construction of the set S. Define the set S = {u ∈ V : pδ⁺(u) ≥ 1}. Let K = 1 − e^{−1}. For u ∈ S, pδ⁺(u) ≥ 1, so R(u) ≥ 1 − e^{−1} = K. Applying this to Inequality (10):

P(u) ≥ max(p(u), (1 − 2p(u)) · K).

This expression is minimized when the two terms are equal: p(u) = (1 − 2p(u)) · K. This yields p(u) = K/(1 + 2K). Let C_0 = K/(1 + 2K). Thus, for all u ∈ S, P(u) ≥ C_0.

Next, we bound |S| using the expected size of the matching M:

E[|M|] = (1/2) Σ_{u ∈ V} P(u).

Since E[|M|] ≤ opt, we have Σ_{u ∈ V} P(u) ≤ 2 · opt. So,

2 · opt ≥ Σ_{u ∈ S} P(u) ≥ C_0 · |S|.
Therefore, |S| ≤ (2/C_0) · opt. This proves the first part of the lemma with c = 2/C_0. Note that since 1/C_0 = (1 + 2K)/K = e/(e − 1) + 2, we have c = 2(e/(e − 1) + 2).

Bounding the edges in G[V \ S]. Let U = V \ S. For u ∈ U, we have pδ⁺(u) < 1. We use the inequality that for x ∈ [0, 1], 1 − e^{−x} ≥ Kx (due to the concavity of 1 − e^{−x}). Thus, R(u) ≥ K · pδ⁺(u). Applying this to Inequality (10),

P(u) ≥ max(p(u), (1 − 2p(u)) · K · pδ⁺(u)).

Similar to the minimization of P(u) computed in the construction of S, this yields the lower bound

P(u) ≥ (K · pδ⁺(u)) / (1 + 2K · pδ⁺(u)).

We seek a linear lower bound C′ · pδ⁺(u). We minimize the ratio K/(1 + 2K · pδ⁺(u)). Since pδ⁺(u) < 1, the ratio is minimized as pδ⁺(u) → 1. The minimum ratio is K/(1 + 2K) = C_0. So, for all u ∈ U, P(u) ≥ C_0 · pδ⁺(u). Now we sum this probability over U:

2 · opt ≥ Σ_{u ∈ U} P(u) ≥ Σ_{u ∈ U} C_0 · pδ⁺(u) = C_0 · p Σ_{u ∈ U} δ⁺(u).

Let |E(G[U])| be the number of edges in the induced subgraph G[U]. Every edge (u, v) in G[U] has both endpoints in U. In the permutation π, one endpoint must precede the other, and the edge is counted exactly once in the forward degree of the earlier endpoint. Thus,

|E(G[U])| ≤ Σ_{u ∈ U} δ⁺(u) ≤ (2/C_0) · opt/p.

This proves the second part of the lemma with C = 2/C_0 = 2(e/(e − 1) + 2).

Remark 4.5 (A Talagrand-based alternative). One can also derive a weaker concentration bound for |MVC(G⋆)| around opt = E[|MVC(G⋆)|] via Talagrand's inequality on the vertex exposure. Fix an arbitrary ordering V = {v_1, ..., v_n} of the vertices of G. For each i ∈ [n], let X_i = (1{(v_i, v_j) ∈ E(G⋆)})_{j < i}. Applying Talagrand's inequality to |MVC(G⋆)| as a function of X_1, ..., X_n yields absolute constants C, c > 0 such that for all t ≥ 0,

Pr[ | |MVC(G⋆)| − opt | ≥ t + C√opt ] ≤ 4 exp(−ct²/opt).
By a standard two-case argument, it also implies that for some absolute constant $c' > 0$,

$$\Pr\Big[\big|\,|\mathrm{MVC}(G^\star)| - \mathrm{opt}\,\big| > t\Big] \le 2\exp\left(-\frac{c' t^2}{\mathrm{opt}}\right) \qquad \text{for all } t \in [0, \mathrm{opt}],$$

which matches the guarantee of Corollary 4.3 up to a constant factor.

5 Concluding Remarks

In this paper, we gave the first algorithm for the stochastic vertex cover problem that obtains a $(1 + \varepsilon)$-approximation using $O_\varepsilon(n/p)$ queries. This result is optimal, up to the dependence on $\varepsilon$ in $O_\varepsilon(\cdot)$, which for our algorithm is $1/\varepsilon^5$. As in prior work, the set of edge queries in our algorithm is non-adaptive. Our algorithm is simple, and in some sense, the canonical algorithm for non-adaptive queries. A key tool in our analysis is a new concentration bound on the size of minimum vertex cover in random graphs, which might be of independent interest. While this result concludes the search for better stochastic vertex cover algorithms (modulo sharpening the dependence of the query bound on $\varepsilon$), we believe that stochastic approximation for fundamental graph problems is an interesting domain for future research that is currently under-explored.

Acknowledgments

The authors thank the anonymous reviewer for pointing out the alternative approach in Remark 4.5 for proving the concentration bound for the size of a minimum vertex cover. Miltiadis Stouras and Ola Svensson are supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number MB22.00054. Jan van den Brand is supported by NSF Awards CCF-2338816 and CCF-2504994. Inge Li Gørtz is supported by Danish Research Council grant DFF-8021-002498. Chirag Pabbaraju is supported by Gregory Valiant's and Moses Charikar's Simons Investigator Awards, and a Google PhD Fellowship. Cliff Stein is supported in part by NSF grant CCF-2218677, ONR grant ONR-13533312, and by the Wai T. Chang Chair in Industrial Engineering and Operations Research.
Debmalya Panigrahi is supported in part by NSF grants CCF-2329230 and CCF-1955703.

References

[AB19] Sepehr Assadi and Aaron Bernstein. Towards a unified theory of sparsification for matching problems. In 2nd Symposium on Simplicity in Algorithms (SOSA 2019), pages 11–1. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2019.

[ABGR25] Amir Azarmehr, Soheil Behnezhad, Alma Ghafari, and Ronitt Rubinfeld. Stochastic matching via in-n-out local computation algorithms. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 1055–1066, 2025.

[AKL16] Sepehr Assadi, Sanjeev Khanna, and Yang Li. The stochastic matching problem with (very) few queries. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 43–60, 2016.

[AKL17] Sepehr Assadi, Sanjeev Khanna, and Yang Li. The stochastic matching problem: Beating half with a non-adaptive algorithm. In Proceedings of the 2017 ACM Conference on Economics and Computation, pages 99–116, 2017.

[ANS08] Arash Asadpour, Hamid Nazerzadeh, and Amin Saberi. Stochastic submodular maximization. In International Workshop on Internet and Network Economics, pages 477–489. Springer, 2008.

[BBD22] Soheil Behnezhad, Avrim Blum, and Mahsa Derakhshan. Stochastic vertex cover with few queries. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1808–1846. SIAM, 2022.

[BD20] Soheil Behnezhad and Mahsa Derakhshan. Stochastic weighted matching: (1−ϵ) approximation. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1392–1403. IEEE, 2020.

[BDH+15] Avrim Blum, John P. Dickerson, Nika Haghtalab, Ariel D. Procaccia, Tuomas Sandholm, and Ankit Sharma. Ignorance is almost bliss: Near-optimal stochastic matching with few queries. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, pages 325–342, 2015.
[BDH20] Soheil Behnezhad, Mahsa Derakhshan, and MohammadTaghi Hajiaghayi. Stochastic matching with few queries: (1−ε) approximation. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 1111–1124, 2020.

[BFHR19] Soheil Behnezhad, Alireza Farhadi, MohammadTaghi Hajiaghayi, and Nima Reyhani. Stochastic matching with few queries: New algorithms and tools. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2855–2874. SIAM, 2019.

[BGK11] Anand Bhalgat, Ashish Goel, and Sanjeev Khanna. Improved approximation results for stochastic knapsack problems. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1647–1665. SIAM, 2011.

[BGPS13] Avrim Blum, Anupam Gupta, Ariel Procaccia, and Ankit Sharma. Harnessing the power of two crossmatches. In Proceedings of the Fourteenth ACM Conference on Electronic Commerce, pages 123–140, 2013.

[BR18] Soheil Behnezhad and Nima Reyhani. Almost optimal stochastic weighted matching with few queries. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 235–249, 2018.

[BT91] Dimitri P. Bertsekas and John N. Tsitsiklis. An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3):580–595, 1991.

[DDH23] Mahsa Derakhshan, Naveen Durvasula, and Nika Haghtalab. Stochastic minimum vertex cover in general graphs: A 3/2-approximation. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 242–253, 2023.

[DGV08] Brian C. Dean, Michel X. Goemans, and Jan Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. Mathematics of Operations Research, 33(4):945–964, 2008.

[DS25] Mahsa Derakhshan and Mohammad Saneian. Query efficient weighted stochastic matching.
In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025), pages 67–1. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2025.

[DSX25] Mahsa Derakhshan, Mohammad Saneian, and Zhiyang Xun. Query complexity of stochastic minimum vertex cover. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025), pages 41–1. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2025.

[EG61] Pál Erdős and Tibor Gallai. On the minimal number of vertices representing the edges of a graph. A Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei, 6(1–2):181–203, 1961.

[GK11] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 42:427–486, 2011.

[GL21] András Gyárfás and Jenő Lehel. Order plus size of τ-critical graphs. Journal of Graph Theory, 96(1):85–86, 2021.

[GV06] Michel X. Goemans and Jan Vondrák. Covering minimum spanning trees of random subgraphs. Random Structures & Algorithms, 29(3):257–276, 2006.

[LP09] László Lovász and Michael D. Plummer. Matching Theory, volume 367. American Mathematical Soc., 2009.

[MR02] Michael Molloy and Bruce Reed. Graph Colouring and the Probabilistic Method, volume 23. Springer Science & Business Media, 2002.

[Von07] Jan Vondrák. Shortest-path metric approximation for random subgraphs. Random Structures & Algorithms, 30(1–2):95–104, 2007.

[YM18] Yutaro Yamaguchi and Takanori Maehara. Stochastic packing integer programs with few queries. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 293–310. SIAM, 2018.

A Handling the case of $\mathrm{opt} = O\big(\log(1/\varepsilon)/\varepsilon^2\big)$

Lemma A.1. For any universal constant $C' > 0$, if

$$\mathrm{opt} < \frac{C' \log(1/\varepsilon)}{\varepsilon^2},$$

then the total number of edges of $G$ is $O\big(n \log(1/\varepsilon)/\varepsilon^2\big)$.
In particular, in this regime the algorithm can query the entire graph and return an exact solution. Consequently, it suffices to analyze the case $\mathrm{opt} \ge \frac{C' \log(1/\varepsilon)}{\varepsilon^2}$.

Proof of Lemma A.1. Suppose that $\mathrm{opt} < \frac{C' \log(1/\varepsilon)}{\varepsilon^2}$. By Lemma 4.4, there exists a set $S \subseteq V$ with $|S| = \Theta(\mathrm{opt})$ such that the induced subgraph $G[V \setminus S]$ has $O(\mathrm{opt})$ edges. Every remaining edge of $G$ is incident to at least one vertex of $S$, hence the number of such edges is at most $n \cdot |S| = O(n \cdot \mathrm{opt})$. Therefore,

$$|E(G)| \le n|S| + O(\mathrm{opt}) = O(n \cdot \mathrm{opt}) = O\left(\frac{n \log(1/\varepsilon)}{\varepsilon^2}\right).$$
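To make the small-opt regime concrete: here the algorithm can afford to query all $O\big(n\log(1/\varepsilon)/\varepsilon^2\big)$ edges and then output an exact minimum vertex cover of the realized graph (the model charges only for queries, not computation; minimum vertex cover is NP-hard in general, so the brute-force solver below is merely our own toy illustration of the "query everything, solve exactly" step, not the paper's procedure, and the realized graph is a hypothetical example).

```python
from itertools import combinations

def exact_min_vertex_cover(vertices, edges):
    """Brute-force exact minimum vertex cover, by trying all vertex
    subsets in increasing order of size. Only viable for tiny graphs;
    used here to illustrate solving exactly on a fully queried graph."""
    for k in range(len(vertices) + 1):
        for cand in combinations(vertices, k):
            s = set(cand)
            if all(u in s or v in s for (u, v) in edges):
                return s
    return set(vertices)  # unreachable: V itself always covers E

# Hypothetical fully-queried realization G*: a 4-cycle plus the chord (0, 2).
realized_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
cover = exact_min_vertex_cover(range(4), realized_edges)

assert all(u in cover or v in cover for (u, v) in realized_edges)
assert len(cover) == 2  # the triangle {0, 1, 2} already forces size >= 2
```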