Counting with Combined Splitting and Capture-Recapture Methods

Paul Dupuis (Brown University, Providence, USA)^a
Bahar Kaynar^b, Ad Ridder^d (Vrije University, Amsterdam, Netherlands)
Reuven Rubinstein^c, Radislav Vaisman (Technion, Haifa, Israel)

November 6, 2018

Abstract. We apply the splitting method to three well-known counting problems, namely 3-SAT, random graphs with prescribed degrees, and binary contingency tables. We present an enhanced version of the splitting method based on the capture-recapture technique, and show by experiments the superiority of this technique for SAT problems in terms of the variance of the associated estimators and the speed of the algorithms.

Keywords. Counting, Gibbs Sampler, Capture-Recapture, Splitting.

^a Research supported by AFOSR grant FA9550-09-0378.
^b Research supported by NWO grant 400-06-044.
^c Research supported by BSF (Binational Science Foundation) grant 2008482, and by NWO grant 040-11-168.
^d Corresponding author. Department of Econometrics and Operations Research; Vrije University Amsterdam; Netherlands. Email address: aridder@feweb.vu.nl

1 Introduction

In this paper we apply the splitting method introduced in [5] to a variety of #P-complete counting problems. Formally, given any decision problem in the class NP, e.g. the satisfiability problem (SAT), one can formulate the corresponding counting problem, which asks for the total number of solutions of a given instance of the problem. In the case of the SAT problem, this corresponding counting problem has complexity #SAT. Generally, the complexity class #P consists of the counting problems associated with the decision problems in NP. Clearly, a #P problem is at least as hard as its corresponding NP problem. In this paper we consider #P-complete problems.
Completeness is defined similarly as for decision problems: a problem is #P-complete if it is in #P, and if every #P problem can be reduced to it by a polynomial counting reduction. This means that exact solutions to these problems cannot be obtained in polynomial time, and accordingly our study focuses on approximation algorithms. For more background on the complexity theory of these problems we refer to [13].

The proposed splitting algorithm for approximate counting is a randomized one. It is based on designing a sequential sampling plan, with a view to decomposing a "difficult" counting problem defined on some set $\mathcal{X}^*$ into a number of "easy" ones associated with a sequence of related sets $\mathcal{X}_0, \mathcal{X}_1, \ldots, \mathcal{X}_m$ such that $\mathcal{X}_m = \mathcal{X}^*$. Splitting algorithms exploit the connection between counting and sampling problems, in particular the reduction from approximate counting of a discrete set to approximate sampling of elements of this set, with the sampling performed, typically, by some Markov chain Monte Carlo method.

Recently, counting problems have attracted research interest, notably so-called model counting or #SAT, i.e. computing the number of models of a given propositional formula [10]. Although it has been shown that many solution techniques for SAT problems can be adapted to these problems, the exponential increase in memory usage and running times of these methods limits their applicability to counting. This drawback motivated the approximative approach mentioned earlier. There are two main heuristic algorithms for approximate counting in #SAT. The first one, called ApproxCount, was introduced by Wei and Selman in [17]. It is a local search method that uses Markov Chain Monte Carlo (MCMC) sampling to compute an approximation of the true model count of a given formula.
It is fast and has been shown to provide good estimates of feasible-solution counts, but, in contrast with our proposed splitting method, there are no guarantees as to the uniformity of the MCMC samples. Gogate and Dechter [9] recently proposed a second model counting technique called SampleMinisat, which is based on sampling from the so-called backtrack-free search space of a Boolean formula through SampleSearch. An approximation of the search tree thus found is used as the importance sampling density instead of the uniform distribution over all solutions. Experiments with SampleMinisat show that it is very fast and typically provides very good estimates.

The splitting method discussed in this work for counting in deterministic problems is based on its classic counterpart for efficient estimation of rare-event probabilities in stochastic problems. The relation between rare-event simulation methods and approximate counting methods has also been discussed, for instance, by Blanchet and Rudoy [2], Botev and Kroese [4], and Rubinstein [14]; see also [15, Chapter 9]. As said, we propose to apply the sequential sampling method presented in [5], which yields a product estimator for counting the number of solutions $|\mathcal{X}^*|$, where the product is taken over the estimators of the consecutive conditional probabilities, each of which represents an "easy" problem. In addition, we shall consider an alternative version, in which we use the samples generated after the last iteration of the splitting algorithm as a sample for the capture-recapture method. This yields an alternative estimate for the counting problem. Furthermore, we shall study an extended version of the capture-recapture method for when the problem size is too large for the splitting method to give reliable estimates.
The idea is to decrease the problem size artificially and then apply a backwards estimation. Whenever applicable, the estimators associated with our proposed enhancements outperform the splitting estimators in terms of variance.

The paper is organized as follows. We first describe the splitting method in detail in Section 2. Section 3 deals with the combination of the classic capture-recapture method with the splitting algorithm. Finally, numerical results and concluding remarks are presented in Sections 4 and 5, respectively.

2 Splitting Algorithms for Counting

The splitting method is one of the main techniques for the efficient estimation of rare-event probabilities in stochastic problems. The method is based on the idea of restarting the simulation in certain states of the system in order to obtain more occurrences of the rare event. Although the method originated as a rare-event simulation technique (see [1], [6], [7], [8], [11], [12]), it has been modified in [2], [4], and [14] for counting and combinatorial optimization problems.

Consider an NP decision problem with solution set $\mathcal{X}^*$, i.e., the set containing all solutions to the problem. We are interested in computing the size $|\mathcal{X}^*|$ of the solution set. Suppose that there is a larger set $\mathcal{X} \supset \mathcal{X}^*$ which can be represented by a simple description or formula; specifically, its size $|\mathcal{X}|$ is known and easy to compute. We call $\mathcal{X}$ the state space of the problem. Denote by $p = |\mathcal{X}^*| / |\mathcal{X}|$ the fraction (or "probability") of the solution set w.r.t. the state space. Since $|\mathcal{X}|$ is known, it suffices to compute $p$. In most cases $p$ is extremely small; in other words, we deal with a rare-event probability. However, assuming we can estimate $p$ by $\hat{p}$, we automatically obtain $\widehat{|\mathcal{X}^*|} = |\mathcal{X}| \, \hat{p}$ as an estimator of $|\mathcal{X}^*|$. Note that straightforward simulation based on generating i.i.d. uniform samples $X_i \in \mathcal{X}$ and delivering the Monte Carlo estimator

$$\hat{p}_{MC} = \frac{1}{N} \sum_{i=1}^N I_{\{X_i \in \mathcal{X}^*\}}$$

as an unbiased estimator of $|\mathcal{X}^*|/|\mathcal{X}|$ fails when $p$ is a rare-event probability.

To be more specific, assume a parametrization of the decision problem. The size of the state space $|\mathcal{X}|$ is parameterized by $n$, such that $|\mathcal{X}| \to \infty$ as $n \to \infty$. For instance, in SAT, $n$ represents the number of variables. Furthermore we assume that the fraction of the solution set satisfies $p \to 0$ as $n \to \infty$. The sample size $N$ required to obtain a relative accuracy $\varepsilon$ of the 95% confidence interval by the Monte Carlo estimation method is [1, Chapter 6]

$$N \approx \frac{1.96^2}{\varepsilon^2 p},$$

which increases like $p^{-1}$ as $n \to \infty$. The purpose of the splitting method is to estimate $p$ more efficiently via the following steps:

1. Find a sequence of sets $\mathcal{X} = \mathcal{X}_0, \mathcal{X}_1, \ldots, \mathcal{X}_m$ such that $\mathcal{X}_0 \supset \mathcal{X}_1 \supset \cdots \supset \mathcal{X}_m = \mathcal{X}^*$.

2. Write $|\mathcal{X}^*| = |\mathcal{X}_m|$ as the telescoping product

$$|\mathcal{X}^*| = |\mathcal{X}_0| \prod_{t=1}^m \frac{|\mathcal{X}_t|}{|\mathcal{X}_{t-1}|}, \qquad (1)$$

thus the target probability becomes a product $p = \prod_{t=1}^m c_t$, with ratio factors

$$c_t = \frac{|\mathcal{X}_t|}{|\mathcal{X}_{t-1}|}. \qquad (2)$$

3. Develop an efficient estimator $\hat{c}_t$ for each $c_t$ and estimate $|\mathcal{X}^*|$ by

$$\hat{\ell} = \widehat{|\mathcal{X}^*|} = |\mathcal{X}_0| \, \hat{p} = |\mathcal{X}_0| \prod_{t=1}^m \hat{c}_t. \qquad (3)$$

It is readily seen that in order to obtain a meaningful estimator of $|\mathcal{X}^*|$, we have to solve the following two major problems:

(i) Put the counting problem into the framework (1) by making sure that

$$\mathcal{X}_0 \supset \mathcal{X}_1 \supset \cdots \supset \mathcal{X}_m = \mathcal{X}^*, \qquad (4)$$

such that each $c_t$ is not a rare-event probability.

(ii) Obtain a low-variance estimator $\hat{c}_t$ of each ratio $c_t$.

To this end, we propose an adaptive version of the splitting method. As a demonstration, consider a specific family of decision problems, namely those whose solution set is finite and given by linear integer constraints. In other words, $\mathcal{X}^* \subset \mathbb{Z}_+^n$ is given by

$$\begin{cases} \sum_{j=1}^n a_{ij} x_j = b_i, & i = 1, \ldots, m_1; \\ \sum_{j=1}^n a_{ij} x_j \ge b_i, & i = m_1 + 1, \ldots, m_1 + m_2 = m; \\ x_j \in \{0, 1, \ldots, d\}, & \forall j = 1, \ldots, n. \end{cases} \qquad (5)$$

Our goal is to count the number of feasible solutions (or points) of the set (5). Note that we assume that we know, or can easily compute, the bounding finite set $\mathcal{X} = \{0, 1, \ldots, d\}^n$ with points $x = (x_1, \ldots, x_n)$ (in this case $|\mathcal{X}| = (d+1)^n$); similarly for the other counting problems. Below we follow [14]. Define the Boolean functions $C_i : \mathcal{X} \to \{0, 1\}$ ($i = 1, \ldots, m$) by

$$C_i(x) = \begin{cases} I_{\{\sum_{j=1}^n a_{ij} x_j = b_i\}}, & i = 1, \ldots, m_1; \\ I_{\{\sum_{j=1}^n a_{ij} x_j \ge b_i\}}, & i = m_1 + 1, \ldots, m_1 + m_2. \end{cases} \qquad (6)$$

Furthermore, define the function $S : \mathcal{X} \to \mathbb{Z}_+$ counting how many constraints are satisfied by a point $x \in \mathcal{X}$, i.e., $S(x) = \sum_{i=1}^m C_i(x)$. Now we can formulate the counting problem as the probabilistic problem of evaluating

$$p = \mathbb{E}_f \left[ I_{\{S(X) = m\}} \right], \qquad (7)$$

where $X$ is a random point in $\mathcal{X}$, uniformly distributed with probability density function (pdf) $f(x)$, denoted by $X \sim f = \mathcal{U}(\mathcal{X})$. Consider an increasing sequence of thresholds $0 = m_0 < m_1 < \cdots < m_{T-1} < m_T = m$, and define the sequence of decreasing sets (4) by

$$\mathcal{X}_t = \{x \in \mathcal{X} : S(x) \ge m_t\}.$$

Note that in this way

$$\mathcal{X}_t = \{x \in \mathcal{X}_{t-1} : S(x) \ge m_t\},$$

for $t = 1, 2, \ldots$. The latter representation is most useful since it shows that the ratio factor $c_t$ in (2) can be considered as a conditional expectation:

$$c_t = \frac{|\mathcal{X}_t|}{|\mathcal{X}_{t-1}|} = \mathbb{E}_{g_{t-1}} \left[ I_{\{S(X) \ge m_t\}} \right], \qquad (8)$$

where $X \sim g_{t-1} = \mathcal{U}(\mathcal{X}_{t-1})$. Note that $g_{t-1}(x)$ is also obtained as a conditional pdf by

$$g_{t-1}(x) = f(x \mid \mathcal{X}_{t-1}) = \begin{cases} \dfrac{f(x)}{f(\mathcal{X}_{t-1})}, & x \in \mathcal{X}_{t-1}; \\ 0, & x \notin \mathcal{X}_{t-1}. \end{cases} \qquad (9)$$

To draw samples from the uniform pdf $g_{t-1} = \mathcal{U}(\mathcal{X}_{t-1})$ on a complex set given only implicitly, one typically applies MCMC methods. For further details we refer to [14].
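The decomposition (1)-(2) and the rare-event representation (7) can be checked by brute force on a tiny instance of system (5). The sketch below is illustrative only: the data $A$, $b$ and the thresholds $m_t = t$ are made up, and everything is enumerated exhaustively rather than sampled.

```python
from itertools import product

# A made-up instance of system (5) with n = 4 binary variables (d = 1) and
# m = 3 inequality constraints; A and b are illustrative only.
A = [[1, 1, 0, 0],    # x1 + x2 >= 1
     [0, 1, 1, 0],    # x2 + x3 >= 1
     [0, 0, 1, 1]]    # x3 + x4 >= 1
b = [1, 1, 1]
n, m = 4, 3

def S(x):
    """S(x) = sum_i C_i(x): the number of satisfied constraints, as in (6)."""
    return sum(sum(a * xj for a, xj in zip(row, x)) >= bi
               for row, bi in zip(A, b))

space = list(product((0, 1), repeat=n))            # X, with |X| = 2^n = 16
p = sum(S(x) == m for x in space) / len(space)     # p of (7), by enumeration

# Nested sets X_t = {x : S(x) >= m_t} with thresholds m_t = t, and the ratios:
sizes = [sum(S(x) >= t for x in space) for t in range(m + 1)]   # |X_t|
ratios = [sizes[t] / sizes[t - 1] for t in range(1, m + 1)]     # c_t of (2)
prod = 1.0
for c in ratios:
    prod *= c
assert abs(prod - p) < 1e-12                       # telescoping product (1)
```

Each ratio $c_t$ here is far from rare, even though $p$ itself shrinks as the instance grows; this is exactly the point of requirement (i).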
2.1 The Basic Adaptive Splitting Algorithm

We describe here the adaptive splitting algorithm from [5]. The thresholds $(m_t)$ are not given in advance, but are determined adaptively via a simulation process. Hence, the number $T$ of thresholds becomes a random variable. In fact, the $(m_t)$-thresholds should satisfy the requirements $c_t = |\mathcal{X}_t| / |\mathcal{X}_{t-1}| \approx \rho_t$, where the parameters $\rho_t \in (0, 1)$ are not too small, say $\rho_t \ge 0.01$, and are set in advance. We call these the splitting control parameters. In most applications we chose them all equal, that is, $\rho_t \equiv \rho$.

Consider a sample set $[X]_{t-1} = \{X_1, \ldots, X_N\}$ of $N$ random points in $\mathcal{X}_{t-1}$; that is, all these points are uniformly distributed on $\mathcal{X}_{t-1}$. Let $m_t$ be the $(1 - \rho_{t-1})$-th quantile of the order statistics of the scores $S(X_1), \ldots, S(X_N)$. The elite set $[X]^{(e)}_{t-1} \subset [X]_{t-1}$ consists of those points of the sample set for which $S(X_i) \ge m_t$. Let $N_t$ be the size of the elite set. If all scores $S(X_i)$ were distinct, it would follow that the number of elites is $N_t = \lceil N \rho_{t-1} \rceil$, where $\lceil \cdot \rceil$ denotes rounding up to the nearest integer. However, dealing with a discrete space, we will typically find more samples with $S(X_i) \ge m_t$; all of these are added to the elite set. Finally we remark that from (9) it easily follows that the elite points are distributed uniformly on $\mathcal{X}_t$.

Having an elite set in $\mathcal{X}_t$, we do two things. First, we screen out (delete) duplicates, so that we end up with a set of $N^{(s)}_t$ distinct elites. Secondly, each screened elite is the starting point of a Markov chain simulation (MCMC method) on $\mathcal{X}_t$, using a transition probability matrix $P_t$ with $g_t = \mathcal{U}(\mathcal{X}_t)$ as its stationary distribution. Because the starting point is uniformly distributed, all consecutive points on the sample path are uniformly distributed on $\mathcal{X}_t$.
Therefore, we may use all these points in the next iteration.

Suppose that each sample path has length $b_t = \lfloor N / N^{(s)}_t \rfloor$; then we get a total of $N^{(s)}_t b_t \le N$ uniform points in $\mathcal{X}_t$. To continue the next iteration again with a sample set of size $N$, we choose $N - N^{(s)}_t b_t$ of these sample paths at random and extend them by one point. Denote the new sample set by $[X]_t$, and repeat the same procedure as above. The algorithm iterates until we find $m_t = m$, say at iteration $T$, at which stage we stop and deliver

$$\widehat{|\mathcal{X}^*|} = |\mathcal{X}| \prod_{t=1}^T \hat{c}_t \qquad (10)$$

as an estimator of $|\mathcal{X}^*|$, where $\hat{c}_t = N_t / N$ in iteration $t$. In our experiments we applied a Gibbs sampler to implement the MCMC simulation for obtaining uniformly distributed samples. To summarize, we give the algorithm.

Algorithm 2.1 (Basic splitting algorithm for counting).

1. Set a counter $t = 1$. Generate a sample set $[X]_0$ of $N$ points uniformly distributed in $\mathcal{X}_0$. Compute the threshold $m_1$, and determine the size $N_1$ of the elite set. Set $\hat{c}_1 = N_1 / N$ as an estimator of $c_1 = |\mathcal{X}_1| / |\mathcal{X}_0|$.

2. Screen out the elite set to obtain $N^{(s)}_t$ distinct points uniformly distributed in $\mathcal{X}_t$.

3. Let $b_t = \lfloor N / N^{(s)}_t \rfloor$. For all $i = 1, 2, \ldots, N^{(s)}_t$, starting at the $i$-th screened elite point, run a Markov chain of length $b_t$ on $\mathcal{X}_t$ with $g_t = \mathcal{U}(\mathcal{X}_t)$ as its stationary distribution. Extend $N - N^{(s)}_t b_t$ randomly chosen sample paths by one point. Denote the new sample set of size $N$ by $[X]_t$.

4. Increase the counter $t = t + 1$. Compute the threshold $m_t$, and determine the size $N_t$ of the elite set. Set $\hat{c}_t = N_t / N$ as an estimator of $c_t = |\mathcal{X}_t| / |\mathcal{X}_{t-1}|$.

5. If $m_t = m$, deliver the estimator (10); otherwise repeat from step 2.
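The steps of Algorithm 2.1 can be sketched on a toy problem whose answer is known exactly. The sketch below is illustrative only, not the authors' implementation: the score $S(x) = \sum_j x_j$ stands in for the constraint count, a simple bit-flip Metropolis move (accept iff the score stays above the current threshold) stands in for the Gibbs sampler, and the $b_t$-path bookkeeping of Step 3 is simplified to restarting chains from randomly chosen distinct elites.

```python
import random
random.seed(7)

# Toy target: count x in {0,1}^12 with at least 9 ones; the exact answer is
# sum_{k=9}^{12} C(12,k) = 299.
n, m_final = 12, 9
N, rho = 2000, 0.3              # sample size and splitting control parameter

def S(x):
    return sum(x)

def mcmc_step(x, level):
    """One uniform-target Metropolis move on {x : S(x) >= level}:
    flip a random bit, accept iff the constraint still holds."""
    j = random.randrange(n)
    y = list(x)
    y[j] ^= 1
    return y if S(y) >= level else x

sample = [[random.randint(0, 1) for _ in range(n)] for _ in range(N)]
est, level = 2.0 ** n, 0        # est starts at |X_0| = 2^n
while level < m_final:
    scores = sorted((S(x) for x in sample), reverse=True)
    level = min(m_final, scores[int(rho * N) - 1])    # adaptive threshold m_t
    elites = [x for x in sample if S(x) >= level]     # elite set, size N_t
    est *= len(elites) / N                            # factor c_t-hat = N_t/N
    # screen duplicates, then restart chains from the distinct elites
    distinct = [list(t) for t in {tuple(x) for x in elites}]
    sample = []
    while len(sample) < N:
        x = random.choice(distinct)
        for _ in range(n):                            # a few mixing moves
            x = mcmc_step(x, level)
        sample.append(x)

print(round(est))   # a product estimate in the neighborhood of 299
```

With $\rho = 0.3$ the algorithm reaches the final level in a handful of iterations; the product of the (non-rare) factors $\hat{c}_t$ replaces one rare-event estimation by several easy ones, as in (10).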
3 Combining Splitting and Capture–Recapture

In this section we discuss how to combine the well-known capture-recapture (CAP-RECAP) method with the basic splitting Algorithm 2.1. First we present the classical capture-recapture algorithm from the literature.

3.1 The Classic Capture–Recapture in the Literature

Originally the capture-recapture method was used to estimate the size, say $M$, of an unknown population on the basis of two independent samples from it. To see how the CAP-RECAP method works, consider an urn model with a total of $M$ identical balls. Denote by $N_1$ and $N_2$ the sample sizes taken at the first and second draws, respectively. Assume in addition that

- The second draw takes place after all $N_1$ balls have been returned to the urn.
- Before returning the $N_1$ balls, each is marked, say we painted them a different color.

Denote by $R$ the number of balls from the first draw that reappear in the second. Then a (biased) estimate $\widetilde{M}$ of $M$ is

$$\widetilde{M} = \frac{N_1 N_2}{R}.$$

This is based on the observation that $N_2 / M \approx R / N_1$. Note that the name capture-recapture was borrowed from the problem of estimating an animal population size in a particular area on the basis of two visits. In this case $R$ denotes the number of animals captured on the first visit and recaptured on the second. A slightly less biased estimator of $M$ is

$$\widehat{M} = \frac{(N_1 + 1)(N_2 + 1)}{R + 1} - 1. \qquad (11)$$

See [16] for an analysis of the bias and for the derivation of an approximately unbiased estimator of the variance of $\widehat{M}$:

$$\mathbb{E} \left[ \frac{(N_1 + 1)(N_2 + 1)(N_1 - R)(N_2 - R)}{(R + 1)^2 (R + 2)} \right] \approx \mathrm{Var}(\widehat{M}). \qquad (12)$$

3.2 Splitting Algorithm Combined with Capture–Recapture

Application of CAP-RECAP to counting problems is straightforward.
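The classical estimators above are easily wrapped in a small routine (an illustrative sketch; the function name is ours):

```python
def caprecap(n1, n2, r):
    """Classical two-draw capture-recapture estimates of a population size M.

    n1, n2: sizes of the two independent draws; r: number of recaptures.
    Returns the simple (biased) estimate N1*N2/R, the less biased
    estimate (11), and the variance approximation (12).
    """
    m_simple = n1 * n2 / r
    m_hat = (n1 + 1) * (n2 + 1) / (r + 1) - 1
    var_hat = ((n1 + 1) * (n2 + 1) * (n1 - r) * (n2 - r)
               / ((r + 1) ** 2 * (r + 2)))
    return m_simple, m_hat, var_hat

# The two-draw example used below: N1 = 5000, N2 = 5010, R = 10.
m_simple, m_hat, var_hat = caprecap(5000, 5010, 10)
print(m_simple)   # 2505000.0
```

For these inputs, estimator (11) gives roughly 2.28e+06, noticeably below the simple ratio estimate, which illustrates the bias correction.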
We set $|\mathcal{X}^*| = M$ and note that $N_1$ and $N_2$ correspond to the screened-out samples at the first and second draws, which are performed after Algorithm 2.1 reaches the desired level $m$. Note that we need to remove duplicate samples, because these do not occur in the capture-recapture method. As an example, assume that we run the splitting Algorithm 2.1 up to its last step $T$ with $N = 10{,}000$. After reaching the desired level $m$, we draw two independent sets of sizes $N_1 = 5000$ and $N_2 = 5010$, and assume that the number of solutions that appeared in both draws is 10, i.e. $R = 10$. The CAP-RECAP estimator of $|\mathcal{X}^*|$, denoted by $\widehat{|\mathcal{X}^*|}_{cap}$, is therefore

$$\widehat{|\mathcal{X}^*|}_{cap} = \frac{5000 \cdot 5010}{10} = 2{,}505{,}000.$$

Our numerical results in Section 4 clearly indicate that the CAP-RECAP estimator is typically more accurate than the product estimator (10), that is,

$$\mathrm{Var}[\widehat{|\mathcal{X}^*|}_{cap}] \le \mathrm{Var}[\widehat{|\mathcal{X}^*|}],$$

provided the sample size $N$ is limited, say by $10{,}000$, and $|\mathcal{X}^*|$ is large but also limited, say by $10^6$. We make a distinction for larger solution sets: if $10^6 < |\mathcal{X}^*| \le 10^9$, we apply an extended version of the capture-recapture method, which we describe in the next section. If $|\mathcal{X}^*|$ is even larger ($|\mathcal{X}^*| > 10^9$), we can estimate it with crude Monte Carlo.

3.3 Extended Capture–Recapture Method

Recall that the regular CAP-RECAP method

1. is implemented at the last iteration $T$ of the splitting algorithm, that is, when some configurations have already reached the desired set $\mathcal{X}^*$;

2. provides reliable estimators of $|\mathcal{X}^*|$ if the latter is not too large, say $|\mathcal{X}^*| \le 10^6$.

In typical rare-event counting problems, like SAT, $|\mathcal{X}^*|$ is indeed at most $10^6$; nevertheless, we present below an extended CAP-RECAP version, which extends the original CAP-RECAP by two to three orders of magnitude, that is, it provides reliable counting estimators for $10^6 < |\mathcal{X}^*| \le 10^9$.
If not stated otherwise, we shall have in mind a SAT problem. The enhanced CAP-RECAP algorithm involves additional constraints (clauses) and can be written as follows.

Algorithm 3.1 (Extended CAP-RECAP). As soon as all $m$ clauses $C_1, \ldots, C_m$ of $\mathcal{X}_m$ have been reached by the splitting algorithm and it occurs that the resulting product estimator $\widehat{|\mathcal{X}_m|}$ of $|\mathcal{X}_m|$ is larger than $10^6$, proceed as follows:

1. Generate a sample $X_1, \ldots, X_{N_{\mathcal{X}_m}}$ of uniformly distributed points in the desired problem set $\mathcal{X}_m$, adding arbitrary auxiliary clauses one by one until for some $\tau$ we have that

$$\hat{c}_{m+\tau} = \frac{N_{\mathcal{X}_{m+\tau}}}{N_{\mathcal{X}_m}} \le c_{m+\tau}. \qquad (13)$$

Here $c_{m+\tau}$ is a relatively small number, fixed in advance, say $10^{-3} \le c_{m+\tau} \le 10^{-2}$; furthermore, $N_{\mathcal{X}_m}$ and $N_{\mathcal{X}_{m+\tau}}$ represent the respective numbers of points generated at $\mathcal{X}_m$ and accepted at $\mathcal{X}_{m+\tau}$. Note that the estimate $\hat{c}_{m+\tau}$ is obtained as in Step 4 of the basic splitting Algorithm 2.1.

2. Estimate $|\mathcal{X}^*| = |\mathcal{X}_m|$ by

$$\widehat{|\mathcal{X}_m|}_{ecap} = \hat{c}_{m+\tau}^{-1} \cdot \widehat{|\mathcal{X}_{m+\tau}|}_{cap}. \qquad (14)$$

We call $\widehat{|\mathcal{X}_m|}_{ecap}$ the extended CAP-RECAP estimator. It is essential to bear in mind that

- $\widehat{|\mathcal{X}_{m+\tau}|}_{cap}$ is a CAP-RECAP estimator rather than a splitting (product) one.
- $\widehat{|\mathcal{X}_m|}_{ecap}$ does not contain the original estimators $\hat{c}_1, \ldots, \hat{c}_T$ generated by the splitting method.
- Since we only need the uniformity of the samples at $\mathcal{X}_m$ here, we can run the splitting method of Section 2.1 all the way with relatively small values of the sample size $N$ and splitting control parameter $\rho$ until it reaches the vicinity of $\mathcal{X}_m$, denoted by $\mathcal{X}_{m-r}$, where $r$ is a small integer, say $r = 1$ or $r = 2$, and then switch to larger $N$ and $\rho$.
- In contrast to the splitting estimator, which employs a product of $T$ terms, formula (14) employs only a single $\hat{c}$ factor.
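The arithmetic of formula (14) is a single division. As a sanity check, the sketch below plugs in the values reported for run 1 of Table 5 ($\hat{c}_{m+\tau} = 3.13\mathrm{E}{-}02$, $\widehat{|\mathcal{X}_{m+\tau}|}_{cap} = 5.41\mathrm{E}{+}04$); the function name is ours.

```python
def ecap(c_hat_m_tau, cap_estimate):
    """Extended CAP-RECAP estimator (14): divide the CAP-RECAP estimate of
    |X_{m+tau}| by the single ratio estimate c-hat_{m+tau}."""
    return cap_estimate / c_hat_m_tau

# Values from run 1 of Table 5 below.
est = ecap(3.13e-2, 5.41e4)
print(f"{est:.2e}")   # 1.73e+06, matching the |X*|_ecap column
```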
Recall that this additional factor $\hat{c}_{m+\tau}^{-1}$ allows us to enlarge the range of the CAP-RECAP estimators of $|\mathcal{X}_m|$ by about two to three orders of magnitude, namely from $|\mathcal{X}_m| \approx 10^6$ to $|\mathcal{X}_m| \approx 10^9$.

4 Numerical Results

Below we present numerical results with the splitting algorithm for counting. In particular we consider the following problems:

1. The 3-satisfiability problem (3-SAT)
2. Graphs with prescribed degrees
3. Contingency tables

For the 3-SAT problem we shall also use the CAP-RECAP method. We shall show that CAP-RECAP typically outperforms the splitting algorithm. We shall use the following notations.

Notation A. For iteration $t = 1, 2, \ldots$:

- $N_t$ and $N^{(s)}_t$ denote the actual number of elites and the number after screening, respectively;
- $m^*_t$ and $m_{*t}$ denote the upper and the lower elite levels reached, respectively (the $m_{*t}$ levels are the same as the $m_t$ levels in the description of the algorithm);
- $\rho_t$ is the splitting control parameter (we chose $\rho_t \equiv \rho$);
- $\hat{c}_t = N_t / N$ is the estimator of the $t$-th conditional probability;
- the product estimator is $\widehat{|\mathcal{X}^*_t|} = |\mathcal{X}| \prod_{i=1}^t \hat{c}_i$.

4.1 The 3-Satisfiability Problem (3-SAT)

There are $m$ clauses of length 3 taken from $n$ boolean (or binary) variables $x_1, \ldots, x_n$. A literal of the $j$-th variable is either TRUE ($x_j = 1$) or FALSE ($x_j = 0 \Leftrightarrow \bar{x}_j = 1$, where $\bar{x}_j = \mathrm{NOT}(x_j)$). A clause is a disjunction of literals. We assume that all clauses consist of 3 literals. The 3-SAT problem is defined as the problem of determining whether the variables $x = (x_1, \ldots, x_n)$ can be assigned in such a way as to make all clauses TRUE. More formally, let $\mathcal{X} = \{0, 1\}^n$ be the set of all configurations of the $n$ variables, and let $C_i : \mathcal{X} \to \{0, 1\}$ be the $m$ clauses. Then define $\phi : \mathcal{X} \to \{0, 1\}$ by

$$\phi(x) = \bigwedge_{i=1}^m C_i(x).$$
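The functions $C_i$, $\phi$, and the score $S$ are straightforward to evaluate. A minimal sketch on a made-up 3-clause instance (clauses encoded as triples of signed variable indices, which is our own convention, not the paper's):

```python
from itertools import product

# A made-up 3-clause instance on n = 3 variables; clauses are triples of
# signed indices: +j stands for the literal x_j, -j for its negation.
clauses = [(1, 2, 3), (-1, 2, -3), (1, -2, 3)]
n = 3

def C(clause, x):
    """Clause indicator C_i(x): 1 iff at least one literal is TRUE."""
    return int(any(x[abs(l) - 1] == (1 if l > 0 else 0) for l in clause))

def phi(x):
    """phi(x) = C_1(x) AND ... AND C_m(x)."""
    return int(all(C(c, x) for c in clauses))

def S(x):
    """Score used by the splitting method: number of satisfied clauses."""
    return sum(C(c, x) for c in clauses)

count = sum(phi(x) for x in product((0, 1), repeat=n))   # |X*| by enumeration
print(count)   # 5
```

For a real instance such an exhaustive enumeration is of course infeasible, which is precisely why the splitting method samples rather than enumerates.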
The original 3-SAT problem is to find a configuration of the $x_j$ variables for which $\phi(x) = 1$. In this work we are interested in the total number of such configurations (or feasible solutions). Then, as discussed in Section 2, $\mathcal{X}^*$ denotes the set of feasible solutions. Trivially, there are $|\mathcal{X}| = 2^n$ configurations.

The 3-SAT problems can also be converted into the family of decision problems (5) given in Section 2. Define the $m \times n$ matrix $A$ with entries $a_{ij} \in \{-1, 0, 1\}$ by

$$a_{ij} = \begin{cases} -1 & \text{if } \bar{x}_j \in C_i, \\ 0 & \text{if } x_j \notin C_i \text{ and } \bar{x}_j \notin C_i, \\ 1 & \text{if } x_j \in C_i. \end{cases}$$

Furthermore, let $b$ be the $m$-(column) vector with entries $b_i = 1 - |\{j : a_{ij} = -1\}|$. Then it is easy to see that for any configuration $x \in \{0, 1\}^n$:

$$x \in \mathcal{X}^* \iff \phi(x) = 1 \iff Ax \ge b.$$

Below we compare the efficiencies of the classic and the extended CAP-RECAP with their splitting counterparts, bearing in mind that the extended CAP-RECAP version is used for larger values of $|\mathcal{X}^*|$ than the classic one. As an example we consider the estimation of $|\mathcal{X}^*|$ for the 3-SAT problem with an instance matrix $A$ of dimension $122 \times 515$, meaning $n = 122$, $m = 515$. In particular, Table 1 presents the performance of the splitting Algorithm 2.1 based on 10 independent runs using $N = 25{,}000$ and $\rho = 0.1$, while Table 2 shows the dynamics of a single run of Algorithm 2.1 for the same data.

Table 1: Performance of the splitting algorithm for the 3-SAT (122 x 515) model with N = 25,000 and rho = 0.1.

Run       nr. of its.   |X*| estimate   CPU
1         33            1.41E+06        212.32
2         33            1.10E+06        213.21
3         33            1.68E+06        214.05
4         33            1.21E+06        215.50
5         33            1.21E+06        214.15
6         33            1.47E+06        216.05
7         33            1.50E+06        252.25
8         33            1.73E+06        243.26
9         33            1.21E+06        238.63
10        33            1.88E+06        224.36
Average   33            1.44E+06        224.38

The relative error, denoted by RE, is 1.815E-01.
Notice that the relative error of a random variable $Z$ is calculated by the standard formula

$$\mathrm{RE} = S / \hat{\ell}, \qquad \text{where } \hat{\ell} = \frac{1}{N} \sum_{i=1}^N Z_i, \qquad S^2 = \frac{1}{N-1} \sum_{i=1}^N (Z_i - \hat{\ell})^2.$$

Table 2: Dynamics of a run of the splitting algorithm for the 3-SAT (122 x 515) model using N = 25,000 and rho = 0.1.

t    |X*_t| est   N_t    N_t^(s)   m*_t   m_{*t}   c_t-hat
1    6.53E+35     3069   3069      480    460      1.23E-01
2    8.78E+34     3364   3364      483    467      1.35E-01
3    1.15E+34     3270   3270      484    472      1.31E-01
4    1.50E+33     3269   3269      489    476      1.31E-01
5    2.49E+32     4151   4151      490    479      1.66E-01
6    3.37E+31     3379   3379      492    482      1.35E-01
7    3.41E+30     2527   2527      494    485      1.01E-01
8    6.19E+29     4538   4538      495    487      1.82E-01
9    9.85E+28     3981   3981      497    489      1.59E-01
10   1.31E+28     3316   3316      498    491      1.33E-01
11   1.46E+27     2797   2797      501    493      1.12E-01
12   4.61E+26     7884   7884      501    494      3.15E-01
13   1.36E+26     7380   7380      501    495      2.95E-01
14   3.89E+25     7150   7150      502    496      2.86E-01
15   1.06E+25     6782   6782      505    497      2.71E-01
16   2.69E+24     6364   6364      503    498      2.55E-01
17   6.42E+23     5969   5969      504    499      2.39E-01
18   1.42E+23     5525   5525      506    500      2.21E-01
19   3.03E+22     5333   5333      505    501      2.13E-01
20   5.87E+21     4850   4850      506    502      1.94E-01
21   1.06E+21     4496   4496      507    503      1.80E-01
22   1.71E+20     4061   4061      507    504      1.62E-01
23   2.50E+19     3647   3647      509    505      1.46E-01
24   3.26E+18     3260   3260      510    506      1.30E-01
25   3.62E+17     2778   2778      510    507      1.11E-01
26   3.68E+16     2539   2539      510    508      1.02E-01
27   3.05E+15     2070   2070      511    509      8.28E-02
28   2.17E+14     1782   1782      512    510      7.13E-02
29   1.21E+13     1398   1398      513    511      5.59E-02
30   5.00E+11     1030   1030      513    512      4.12E-02
31   1.49E+10     743    743       514    513      2.97E-02
32   2.39E+08     402    402       515    514      1.61E-02
33   1.43E+06     150    150       515    515      6.00E-03

We increased the sample size at the last two iterations from $N = 25{,}000$ to $N = 100{,}000$ to get a more accurate estimator. As can be seen from Table 1, the estimator satisfies $\widehat{|\mathcal{X}^*|} > 10^6$; hence for this instance the extended CAP-RECAP Algorithm 3.1 can also be used.
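The relative-error formula above, applied to the ten run values of Table 1, reproduces the reported RE:

```python
from math import sqrt

# The ten product-estimator values reported in Table 1 (in units of 1e6):
Z = [1.41, 1.10, 1.68, 1.21, 1.21, 1.47, 1.50, 1.73, 1.21, 1.88]
N = len(Z)
ell = sum(Z) / N                                    # sample mean ell-hat
S2 = sum((z - ell) ** 2 for z in Z) / (N - 1)       # sample variance S^2
RE = sqrt(S2) / ell                                 # RE = S / ell-hat
print(round(ell, 2), round(RE, 4))   # 1.44 0.1815
```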
We shall show that the relative error (RE) of the extended CAP-RECAP estimator $\widehat{|\mathcal{X}_m|}_{ecap}$ is less than that of $\widehat{|\mathcal{X}^*|}$. Before doing so we need to find the extended 3-SAT instance matrix $(122 \times 515) + \tau$, where $\tau$ is the number of auxiliary clauses. Applying the extended CAP-RECAP Algorithm 3.1 we found that $\tau = 5$, and thus the extended instance matrix is $122 \times 520$. Recall that the cardinality $|\mathcal{X}_{m+\tau}|$ of the extended $(122 \times 520)$ model should be manageable by the regular CAP-RECAP; that is, we assumed that $|\mathcal{X}_{m+\tau}| < 10^6$. Indeed, Table 3 presents the performance of the regular CAP-RECAP for the extended $(122 \times 520)$ model. Here we used again $\rho = 0.1$. As for the sample size, we took $N = 1{,}000$ until iteration 28 and then switched to $N = 100{,}000$. The final CAP-RECAP estimator is obtained by taking two equal samples, each of size $N = 70{,}000$, at the final subset $\mathcal{X}_{m+\tau} = \mathcal{X}_{520}$. (The sample sizes actually used in the estimation are smaller due to the screening step.)

Table 3: Performance of the regular CAP-RECAP for the extended (122 x 520) model.

Run       nr. of its.   |X*|_cap   CPU
1         34            5.53E+04   159.05
2         35            5.49E+04   174.46
3         35            5.51E+04   178.08
4         34            5.51E+04   166.36
5         34            5.52E+04   159.36
6         33            5.52E+04   152.38
7         33            5.54E+04   137.96
8         34            5.50E+04   157.37
9         35            5.51E+04   179.08
10        34            5.51E+04   163.70
Average   34.1          5.51E+04   162.78

The relative error of $\widehat{|\mathcal{X}^*|}_{cap}$ over 10 runs is 2.600E-03.

Next we compare the efficiency of the regular CAP-RECAP (as per Table 3) with that of the splitting algorithm for the extended $(122 \times 520)$ model. Table 4 presents the performance of splitting for $\rho = 0.1$ and $N = 100{,}000$. It readily follows that the relative error of the regular CAP-RECAP is about 30 times less than that of splitting. Notice in addition that the CPU time of CAP-RECAP is about 6 times less than that of splitting.
This is so since the total sample size of the former is about 6 times less than that of the latter. Thus the overall speedup obtained by CAP-RECAP is about $30^2 \cdot 6 \approx 5{,}000$ times.

Table 4: Performance of the splitting algorithm for the 3-SAT (122 x 520) model.

Run       nr. of its.   |X*| estimate   CPU
1         34            6.03E+04        900.28
2         34            7.48E+04        904.23
3         34            4.50E+04        913.31
4         34            5.99E+04        912.27
5         34            6.03E+04        910.44
6         33            4.94E+04        898.91
7         34            5.22E+04        931.88
8         34            5.74E+04        916.80
9         34            5.85E+04        919.63
10        34            5.72E+04        927.70
Average   33.9          5.75E+04        913.54

The relative error of $\widehat{|\mathcal{X}^*|}$ over 10 runs is 1.315E-01.

With these results at hand we can proceed with the extended CAP-RECAP and compare its efficiency with splitting (see Table 1) for the instance matrix $122 \times 515$. Table 5 presents the performance of the extended CAP-RECAP estimator $\widehat{|\mathcal{X}^*|}_{ecap}$ for the $(122 \times 515)$ model, along with the performance of the regular CAP-RECAP estimator $\widehat{|\mathcal{X}^*|}_{cap}$ for the $(122 \times 520)$ model (see also the results of Table 3 for $\widehat{|\mathcal{X}^*|}_{cap}$). We set again $\rho = 0.1$. Regarding the sample size, we took $N = 1{,}000$ for the first 31 iterations and then switched to $N = 100{,}000$ until reaching the level $m = 515$. Recall that the level is $m + \tau = 520$, and the corresponding CAP-RECAP estimator $\widehat{|\mathcal{X}^*|}_{cap}$ was obtained from the set $\mathcal{X}_m = \mathcal{X}_{515}$ by adding $\tau = 5$ more auxiliary clauses. Note that in this case we used for $\widehat{|\mathcal{X}^*|}_{cap}$ two equal samples, each of length $N = 100{,}000$. Comparing the results of Table 1 with those of Table 5, it is readily seen that the extended CAP-RECAP estimator $\widehat{|\mathcal{X}^*|}_{ecap}$ outperforms the splitting one $\widehat{|\mathcal{X}^*|}$ in both RE and CPU time. In particular, we have that both the RE and the CPU time of the former are about 1.6 times less than those of the latter. This means that the overall speedup obtained by $\widehat{|\mathcal{X}^*|}_{ecap}$ versus $\widehat{|\mathcal{X}^*|}$ is about $1.6^2 \cdot 1.6 \approx 4$ times.
Note finally that the total number of samples used in the extended CAP-RECAP estimator $\widehat{|\mathcal{X}^*|}_{ecap}$ is about $N = 500{,}000$, while that of its counterpart, the splitting estimator $\widehat{|\mathcal{X}^*|}$, is about $N = 50{,}000 \cdot 36 = 1{,}800{,}000$.

Table 5: Performance of the extended CAP-RECAP estimator |X*|_ecap for the (122 x 515) model, along with the regular CAP-RECAP one |X*|_cap for the (122 x 520) model.

Run       nr. its.   c-hat_{m+tau}   |X*|_cap   |X*|_ecap   CPU
1         33         3.13E-02        5.41E+04   1.73E+06    138.99
2         34         3.47E-02        5.51E+04   1.59E+06    154.64
3         34         3.55E-02        5.52E+04   1.55E+06    161.78
4         33         4.51E-02        5.40E+04   1.20E+06    163.53
5         34         3.04E-02        5.13E+04   1.69E+06    143.84
6         34         2.99E-02        5.41E+04   1.81E+06    151.10
7         34         4.27E-02        5.51E+04   1.29E+06    174.08
8         34         3.87E-02        5.42E+04   1.40E+06    143.27
9         33         3.27E-02        5.42E+04   1.66E+06    171.07
10        34         4.22E-02        5.51E+04   1.30E+06    154.71
Average   33.7       3.63E-02        5.42E+04   1.52E+06    155.70

The relative error of $\widehat{|\mathcal{X}^*|}_{cap}$ over 10 runs is 2.010E-02. The relative error of $\widehat{|\mathcal{X}^*|}_{ecap}$ over 10 runs is 1.315E-01.

4.2 Random Graphs with Prescribed Degrees

Random graphs with given vertex degrees have attracted attention as a model for real-world complex networks, including the World Wide Web, social networks, and biological networks. The problem is basically finding a graph $G = (V, E)$ with $n$ vertices, given the degree sequence $d = (d_1, \ldots, d_n)$ formed of nonnegative integers. Following [3], a finite sequence $(d_1, \ldots, d_n)$ of nonnegative integers is called graphical if there is a labeled simple graph with vertex set $\{1, \ldots, n\}$ in which vertex $i$ has degree $d_i$. Such a graph is called a realization of the degree sequence $(d_1, \ldots, d_n)$. We are interested in the total number of realizations of a given degree sequence; hence $\mathcal{X}^*$ denotes the set of all graphs $G = (V, E)$ with the degree sequence $(d_1, \ldots, d_n)$.
Similar to (5) for SAT, we convert the problem into a counting problem. To proceed, consider the complete graph K_n on n vertices, in which each vertex is connected with all other vertices. Thus the total number of edges in K_n is m = n(n−1)/2, labeled e_1, ..., e_m. The random graph problem with prescribed degrees is translated to the problem of choosing those edges of K_n such that the resulting graph G matches the given sequence d. Set x_i = 1 when e_i is chosen, and x_i = 0 otherwise, i = 1, ..., m. In order that such an assignment x ∈ {0,1}^m matches the given degree sequence (d_1, ..., d_n), it necessarily holds that Σ_{j=1}^m x_j = (1/2) Σ_{i=1}^n d_i, since this is the total number of edges. In other words, the configuration space is

    X = { x ∈ {0,1}^m : Σ_{j=1}^m x_j = (1/2) Σ_{i=1}^n d_i }.

Let A be the incidence matrix of K_n, with entries

    a_ij = 1 if v_i ∈ e_j, and a_ij = 0 if v_i ∉ e_j.

It is easy to see that whenever a configuration x ∈ {0,1}^m satisfies Ax = d, the associated graph has degree sequence (d_1, ..., d_n). We conclude that the problem set is represented by

    X* = { x ∈ X : Ax = d }.

We first present a small example as illustration. Let d = (2, 2, 2, 1, 3), with n = 5 and m = 10. After ordering the edges of K_5 lexicographically, the corresponding incidence matrix is given as

    A = ( 1 1 1 1 0 0 0 0 0 0 )
        ( 1 0 0 0 1 1 1 0 0 0 )
        ( 0 1 0 0 1 0 0 1 1 0 )
        ( 0 0 1 0 0 1 0 1 0 1 )
        ( 0 0 0 1 0 0 1 0 1 1 )

It is readily seen that

    x = (0, 0, 1, 1, 1, 0, 1, 0, 1, 0)'  and  x = (1, 0, 0, 1, 1, 0, 0, 0, 1, 1)'

present two solutions of this example. For the random graph problem we define the score function S : X → Z⁻ by

    S(x) = − Σ_{i=1}^n |deg(v_i) − d_i|,

where deg(v_i) is the degree of vertex i under the configuration x.
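The score function S and the set X* can be sketched directly; the two solutions of the worked example both attain S(x) = 0. A brute-force enumeration of X*, feasible only for tiny instances, makes the definitions concrete (function names are ours, not the paper's):

```python
from itertools import combinations

# Edges of K_5 in lexicographic order (vertices 0..4), matching the
# incidence matrix of the worked example: e_1 = {1,2}, ..., e_10 = {4,5}.
EDGES = [(i, j) for i in range(5) for j in range(i + 1, 5)]

def score(x, d):
    """S(x) = -sum_i |deg(v_i) - d_i| for a 0/1 edge-selection vector x."""
    deg = [0] * len(d)
    for xe, (i, j) in zip(x, EDGES):
        if xe:
            deg[i] += 1
            deg[j] += 1
    return -sum(abs(deg[i] - d[i]) for i in range(len(d)))

def count_realizations(d):
    """|X*| by brute force: enumerate all edge subsets of the right size
    (the configuration space X) and keep those with score 0."""
    k = sum(d) // 2  # every realization has exactly this many edges
    return sum(
        1
        for chosen in combinations(range(len(EDGES)), k)
        if score([1 if e in chosen else 0 for e in range(len(EDGES))], d) == 0
    )
```

For d = (2, 2, 2, 1, 3) the enumeration runs over only C(10, 5) = 252 configurations and recovers, among others, the two solutions displayed above.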
Each configuration that satisfies the degree sequence has a performance function equal to 0. The implementation of the Gibbs sampler for this problem is slightly different than for the 3-SAT problem, since we keep the number of edges in each realization fixed at Σ d_i / 2. Our first algorithm takes care of this requirement and generates a random x ∈ X.

Algorithm 4.1. Let (d_1, ..., d_n) be the prescribed degree sequence.
• Generate a random permutation of 1, ..., m.
• Choose the first Σ d_i / 2 places in this permutation and deliver a vector x having ones in those places.

The adaptive thresholds in the basic splitting algorithm are negative, increasing to 0:

    m_1 ≤ m_2 ≤ ... ≤ m_{T−1} ≤ m_T = 0.

The resulting Gibbs sampler (in Step 3 of the basic splitting algorithm, starting with a configuration x ∈ X for which S(x) ≥ m_t) can be written as follows.

Algorithm 4.2 (Gibbs Algorithm for random graph sampling). For each edge x_i = 1, while keeping all other edges fixed, do:
1. Remove x_i from x, i.e. make x_i = 0.
2. Check all possible placements for the edge, each placement resulting in a new vector x̄, conditioning on the performance function S(x̄) ≥ m_t.
3. With uniform probability choose one of the valid realizations.

We will apply the splitting algorithm to two problems taken from [3].

4.2.1 A small problem

For this small problem we have the degree sequence

    d = (5, 6, 1, ..., 1)  (11 ones).

The solution can be obtained analytically and is already given in [3]: "To count the number of labeled graphs with this degree sequence, note that there are C(11,5) = 462 such graphs with vertex 1 not joined to vertex 2 by an edge (these graphs look like two separate stars), and there are C(11,4)·C(7,5) = 6930 such graphs with an edge between vertices 1 and 2 (these look like two joined stars with an isolated edge left over).
Thus, the total number of realizations of d is 7392." As we can see from Table 6, the algorithm easily handles the problem. Table 7 presents the typical dynamics.

Table 6: Performance of the splitting algorithm for the small problem using N = 50,000 and ρ = 0.5.

Run      nr. of its.  |X*|    CPU
1        10           7146.2  15.723
2        10           7169.2  15.251
3        10           7468.7  15.664
4        10           7145.9  15.453
5        10           7583.0  15.555
6        10           7206.4  15.454
7        10           7079.3  15.495
8        10           7545.1  15.347
9        10           7597.2  15.836
10       10           7181.2  15.612
Average  10           7312.2  15.539

The relative error of |X*| over 10 runs is 2.710E-02.

Table 7: Typical dynamics of the splitting algorithm for the small problem using N = 50,000 and ρ = 0.5 (recall Notation A at the beginning of Section 4).

t   |X*_t|    N_t    N_t^(s)  m*_t  m_t  ĉ_t
1   4.55E+12  29227  29227    -4    -30  0.5845
2   2.56E+12  28144  28144    -4    -18  0.5629
3   1.09E+12  21227  21227    -6    -16  0.4245
4   3.38E+11  15565  15565    -4    -14  0.3113
5   7.51E+10  11104  11104    -4    -12  0.2221
6   1.11E+10  7408   7408     -2    -10  0.1482
7   1.03E+09  4628   4628     -2    -8   0.0926
8   5.37E+07  2608   2608     -2    -6   0.0522
9   1.26E+06  1175   1175     0     -4   0.0235
10  7223.9    286    280      0     -2   0.0057

4.2.2 A large problem

A much harder instance (see [3]) is defined by

    d = (7, 8, 5, 1, 1, 2, 8, 10, 4, 2, 4, 5, 3, 6, 7, 3, 2, 7, 6, 1, 2, 9, 6, 1, 3, 4, 6, 3, 3, 3, 2, 4, 4).

In [3] the number of such graphs is estimated to be about 1.533 × 10^57. Table 8 presents 10 runs using the splitting algorithm.

Table 8: Performance of the splitting algorithm for the large problem using N = 100,000 and ρ = 0.5.

Run      nr. its.  |X*|      CPU
1        39        1.66E+57  4295
2        39        1.58E+57  4223
3        39        1.58E+57  4116
4        39        1.53E+57  4281
5        39        1.76E+57  4301
6        39        1.75E+57  4094
7        39        1.46E+57  4512
8        39        1.71E+57  4287
9        39        1.39E+57  4158
10       39        1.38E+57  4264
Average  39        1.58E+57  4253

The relative error of |X*| over 10 runs is 8.430E-02.
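The analytic count of 7392 for the small problem, quoted above from [3], is easy to verify with binomial coefficients:

```python
from math import comb

# Labeled graphs with degree sequence d = (5, 6, 1, ..., 1) (11 ones),
# following the two-star argument quoted from [3]:
two_separate_stars = comb(11, 5)              # vertex 1 not joined to vertex 2
two_joined_stars = comb(11, 4) * comb(7, 5)   # edge between vertices 1 and 2
total = two_separate_stars + two_joined_stars # total realizations of d
```

This gives 462 + 6930 = 7392, in agreement with the estimates of Table 6.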
4.3 Binary Contingency Tables

Given are two vectors of positive integers r = (r_1, ..., r_m) and c = (c_1, ..., c_n) such that r_i ≤ n for all i, c_j ≤ m for all j, and Σ_{i=1}^m r_i = Σ_{j=1}^n c_j. A binary contingency table with row sums r and column sums c is an m × n matrix X of zero-one entries x_ij satisfying Σ_{j=1}^n x_ij = r_i for every row i and Σ_{i=1}^m x_ij = c_j for every column j. The problem is to count all such contingency tables.

The extension of the proposed Gibbs sampler to counting contingency tables is straightforward. We define the configuration space X = X^(r) ∪ X^(c) as the space where all column or all row sums are satisfied:

    X^(c) = { X ∈ {0,1}^{m×n} : Σ_{i=1}^m x_ij = c_j for all j },
    X^(r) = { X ∈ {0,1}^{m×n} : Σ_{j=1}^n x_ij = r_i for all i }.

Clearly we can sample uniformly at random from X without any problem. The score function S : X → Z⁻ is defined by

    S(X) = − Σ_{i=1}^m | Σ_{j=1}^n x_ij − r_i |   for X ∈ X^(c),
    S(X) = − Σ_{j=1}^n | Σ_{i=1}^m x_ij − c_j |   for X ∈ X^(r),

that is, the deviation of the row sums Σ_j x_ij from the targets r_i if the column sums are right, and vice versa. The Gibbs sampler is very similar to the one in the previous section concerning random graphs with prescribed degrees.

Algorithm 4.3 (Gibbs algorithm for random contingency table sampling). Given a matrix realization X ∈ X^(c) with score S(X) ≥ m_t, for each column j and for each 1-entry in this column (x_ij = 1) do:
1. Remove this 1, i.e. set x'_ij = 0.
2. Check all possible placements for this 1 in the given column j, conditioning on the performance function S(X') ≥ m_t (X' is the matrix resulting by setting x'_ij = 0 and x'_{i'j} = 1 for some x_{i'j} = 0, with all other entries unchanged).
3. Suppose that the set of valid realizations is A = { X' : S(X') ≥ m_t }.
(Note that this set also contains the original realization X.) Then with probability 1/|A| pick any realization at random and continue with Step 1.

Note that in this way we keep the column sums correct. Similarly, when we start with a matrix configuration with all row sums correct, we execute these steps for each row, swapping a 1 and a 0 per row.

4.3.1 Model 1

The data are m = 12, n = 12 with row and column sums

    r = (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
    c = (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2).

The true count value is known to be 21,959,547,410,077,200. Table 9 presents 10 runs using the splitting algorithm. Table 10 presents a typical dynamics.

Table 9: Performance of the splitting algorithm for Model 1 using N = 50,000 and ρ = 0.5.

Run      nr. its.  |X*|      CPU
1        7         2.15E+16  4.54
2        7         2.32E+16  4.55
3        7         2.23E+16  4.54
4        7         2.11E+16  4.58
5        7         2.05E+16  4.57
6        7         2.23E+16  4.54
7        7         2.02E+16  4.55
8        7         2.38E+16  4.58
9        7         2.06E+16  4.57
10       7         2.14E+16  4.55
Average  7         2.17E+16  4.56

The relative error of |X*| over 10 runs is 5.210E-02.

Table 10: Typical dynamics of the splitting algorithm for Model 1 using N = 50,000 and ρ = 0.5.

t  |X*_t|    N_t    N_t^(s)  m*_t  m_t  ĉ_t
1  4.56E+21  13361  13361    -2    -24  0.6681
2  2.68E+21  11747  11747    -2    -12  0.5874
3  1.10E+21  8234   8234     -2    -10  0.4117
4  2.76E+20  5003   5003     -2    -8   0.2502
5  3.45E+19  2497   2497     0     -6   0.1249
6  1.92E+18  1112   1112     0     -4   0.0556
7  2.08E+16  217    217      0     -2   0.0109

4.3.2 Model 2

Darwin's Finch Data, from Yuguo Chen, Persi Diaconis, Susan P. Holmes, and Jun S. Liu: m = 12, n = 17 with row and column sums

    r = (14, 13, 14, 10, 12, 2, 10, 1, 10, 11, 6, 2),
    c = (3, 3, 10, 9, 9, 7, 8, 9, 7, 8, 2, 9, 3, 6, 8, 2, 2).

The true count value is known to be 67,149,106,137,567,600. Table 11 presents 10 runs using the splitting algorithm.
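The score function of Section 4.3 is straightforward to sketch. As an illustration (the function name and calling convention are ours, not the paper's code), with a small 3 × 3 check:

```python
def score(X, r, c, columns_ok):
    """Score S(X) of Section 4.3 for a 0/1 matrix X given as a list of rows.
    If columns_ok is True, X is taken to lie in X^(c) (all column sums equal
    c), so row-sum deviations from r are measured; otherwise X lies in X^(r)
    and column-sum deviations from c are measured."""
    m, n = len(X), len(X[0])
    if columns_ok:
        return -sum(abs(sum(X[i][j] for j in range(n)) - r[i]) for i in range(m))
    return -sum(abs(sum(X[i][j] for i in range(m)) - c[j]) for j in range(n))
```

For r = c = (2, 2, 2), the complement of the 3 × 3 identity matrix is a valid contingency table (S = 0), while a matrix with correct column sums but row sums (3, 2, 1) scores −2.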
Table 11: Performance of the splitting algorithm for Model 2 using N = 200,000 and ρ = 0.5.

Run      nr. its.  |X*|      CPU
1        24        6.16E+16  246.83
2        24        6.50E+16  244.42
3        24        7.07E+16  252.71
4        24        7.91E+16  247.36
5        24        6.61E+16  260.99
6        24        6.77E+16  264.07
7        24        6.59E+16  269.86
8        24        6.51E+16  273.51
9        24        7.10E+16  272.49
10       24        5.91E+16  267.23
Average  24        6.71E+16  259.95

The relative error of |X*| over 10 runs is 7.850E-02.

5 Concluding Remarks

In this paper we applied the splitting method to several well-known counting problems, namely 3-SAT, random graphs with prescribed degrees, and binary contingency tables. While implementing the splitting algorithm, we discussed several MCMC algorithms, in particular the Gibbs sampler. We showed how to incorporate the classic capture-recapture method in the splitting algorithm in order to obtain a low-variance estimator for the desired counting quantity. Furthermore, we presented an extended version of the capture-recapture algorithm, which is suitable for problems with a larger number of feasible solutions. We finally presented numerical results with the splitting and capture-recapture estimators, and showed the superiority of the latter.

References

[1] Asmussen, S. and Glynn, P.W. (2007). Stochastic Simulation: Algorithms and Analyses, Springer.

[2] Blanchet, J. and Rudoy, D. (2009). "Rare-event simulation and counting problems", in Rare Event Simulation, eds. G. Rubino and B. Tuffin, Wiley, pp. 171-192.

[3] Blitzstein, J. and Diaconis, P. (2006). "A sequential importance sampling algorithm for generating random graphs with prescribed degrees", Preprint.

[4] Botev, Z.I. and Kroese, D.P. (2008). "An efficient algorithm for rare-event probability estimation, combinatorial optimization, and counting", Methodology and Computing in Applied Probability 10, pp. 471-505.

[5] Botev, Z.I.
and Kroese, D.P. (2011). "Efficient Monte Carlo simulation via the generalized splitting method", to appear in Statistics and Computing.

[6] L'Ecuyer, P., Demers, V., and Tuffin, B. (2007). "Rare-events, splitting, and Quasi-Monte Carlo", ACM Transactions on Modeling and Computer Simulation 17, issue 2, article 9.

[7] Garvels, M.J.J. (2000). The splitting method in rare-event simulation, Ph.D. thesis, University of Twente.

[8] Garvels, M.J.J. and Rubinstein, R.Y. (2010). "A combined splitting - cross entropy method for rare event probability estimation of single queues and ATM networks", Submitted.

[9] Gogate, V. and Dechter, R. (2007). "Approximate counting by sampling the backtrack-free search space", in Proceedings of the 22nd Conference on Artificial Intelligence, pp. 198-203.

[10] Gomes, C.P., Sabharwal, A., and Selman, B. (2008). "Model Counting", Handbook of Satisfiability, eds. A. Biere, M. Heule, H. van Maaren and T. Walsh, IOS Press.

[11] Lagnoux-Renaudie, A. (2009). "A two-steps branching splitting model under cost constraint", Journal of Applied Probability 46, pp. 429-452.

[12] Melas, V.B. (1997). "On the efficiency of the splitting and roulette approach for sensitivity analysis", in Proceedings of the 1997 Winter Simulation Conference, eds. S. Andradottir, K.J. Healy, D.H. Withers, and B.L. Nelson, pp. 269-274.

[13] Papadimitriou, C.H. (1994). Computational Complexity, Addison-Wesley.

[14] Rubinstein, R.Y. (2010). "Randomized algorithms with splitting: Why the classic randomized algorithms do not work and how to make them work", Methodology and Computing in Applied Probability 12, pp. 1-50.

[15] Rubinstein, R.Y. and Kroese, D.P. (2008). Simulation and the Monte Carlo Method, second edition, Wiley.

[16] Seber, G.A.F. (1970). "The effect of trap response on tag recapture estimates", Biometrics 26, pp. 13-22.
[17] Wei, W. and Selman, B. (2005). "A new approach to model counting", in Proceedings of SAT-05: 8th International Conference on Theory and Applications of Satisfiability Testing, volume 3569 of Lecture Notes in Computer Science, St. Andrews, U.K., pp. 324-339.
