The Magic of Logical Inference in Probabilistic Programming


Authors: Bernd Gutmann, Ingo Thon, Angelika Kimmig, Maurice Bruynooghe, Luc De Raedt

Under consideration for publication in Theory and Practice of Logic Programming

Bernd Gutmann, Ingo Thon, Angelika Kimmig, Maurice Bruynooghe and Luc De Raedt
Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A - bus 2402, 3001 Heverlee, Belgium
{firstname.lastname}@cs.kuleuven.be

submitted 1 January 2003; revised 1 January 2003; accepted 1 January 2003

Abstract

Today, many different probabilistic programming languages exist and even more inference mechanisms for these languages. Still, most logic programming based languages use backward reasoning based on SLD resolution for inference. While these methods are typically computationally efficient, they often can neither handle infinite and/or continuous distributions, nor evidence. To overcome these limitations, we introduce distributional clauses, a variation and extension of Sato's distribution semantics. We also contribute a novel approximate inference method that integrates forward reasoning with importance sampling, a well-known technique for probabilistic inference. To achieve efficiency, we integrate two logic programming techniques to direct forward sampling. Magic sets are used to focus on relevant parts of the program, while the integration of backward reasoning allows one to identify and avoid regions of the sample space that are inconsistent with the evidence.

1 Introduction

The advent of statistical relational learning (Getoor and Taskar 2007; De Raedt et al. 2008) and probabilistic programming (De Raedt et al. 2008) has resulted in a vast number of different languages and systems such as PRISM (Sato and Kameya 2001), ICL (Poole 2008), ProbLog (De Raedt et al. 2007), Dyna (Eisner et al. 2005), BLPs (Kersting and De Raedt 2008), CLP(BN) (Santos Costa et al. 2008), BLOG (Milch et al. 2005), Church (Goodman et al.
2008), IBAL (Pfeffer 2001), and MLNs (Richardson and Domingos 2006). While inference in these languages generally involves evaluating the probability distribution defined by the model, often conditioned on evidence in the form of known truth values for some atoms, this diversity of systems has led to a variety of inference approaches. Languages such as IBAL, BLPs, MLNs and CLP(BN) combine knowledge-based model construction to generate a graphical model with standard inference techniques for such models. Some probabilistic programming languages, for instance BLOG and Church, use sampling for approximate inference in generative models, that is, they estimate probabilities from a large number of randomly generated program traces. Finally, probabilistic logic programming frameworks such as ICL, PRISM and ProbLog combine SLD-resolution with probability calculations. So far, the second approach based on sampling has received little attention in logic programming based systems. In this paper, we investigate the integration of sampling-based approaches into probabilistic logic programming frameworks to broaden their applicability. Particularly relevant in this regard are the ability of Church and BLOG to sample from continuous distributions and to answer conditional queries of the form p(q | e), where e is the evidence. To accommodate (continuous and discrete) distributions, we introduce distributional clauses, which define random variables together with their associated distributions, conditional upon logical predicates. Random variables can be passed around in the logic program, and the outcome of a random variable can be compared with other values by means of special built-ins.
To formally establish the semantics of this new construct, we show that these random variables define a basic distribution over facts (using the comparison built-ins) as required in Sato's distribution semantics (Sato 1995), and thus induce a distribution over least Herbrand models of the program. This contrasts with previous instances of the distribution semantics in that we no longer enumerate the probabilities of alternatives, but instead use arbitrary densities and distributions. From a logic programming perspective, BLOG (Milch et al. 2005) and related approaches perform forward reasoning, that is, the samples needed for probability estimation are generated starting from known facts and deriving additional facts, thus generating a possible world. PRISM and related approaches follow the opposite approach of backward reasoning, where inference starts from a query and follows a chain of rules backwards to the basic facts, thus generating proofs. This difference is one of the reasons for using sampling in the first approach: exact forward inference would require that all possible worlds be generated, which is infeasible in most cases. Based on this observation, we contribute a new inference method for probabilistic logic programming that combines sampling-based inference techniques with forward reasoning. On the probabilistic side, the approach uses rejection sampling (Koller and Friedman 2009), a well-known sampling technique that rejects samples that are inconsistent with the evidence. On the logic programming side, we adapt the magic set technique (Bancilhon et al. 1986) towards the probabilistic setting, thereby combining the advantages of forward and backward reasoning.
Furthermore, the inference algorithm is improved along the lines of the SampleSearch algorithm (Gogate and Dechter 2011), which avoids choices leading to a sample that cannot be used in the probability estimation due to inconsistency with the evidence. We realize this using a heuristic based on backward reasoning with limited proof length, the benefit of which is experimentally confirmed. This novel approach to inference creates a number of new possibilities for applications of probabilistic logic programming systems, including continuous distributions and Bayesian inference. This paper is organized as follows: we start by reviewing the basic concepts in Section 2. Section 3 introduces the new language and its semantics, Section 4 a novel forward sampling algorithm for probabilistic logic programs. Before concluding, we evaluate our approach in Section 5.

2 Preliminaries

2.1 Probabilistic Inference

A discrete probabilistic model defines a probability distribution p(·) over a set Ω of basic outcomes, that is, value assignments to the model's random variables. This distribution can then be used to evaluate a conditional probability distribution p(q | e) = p(q ∧ e) / p(e), also called the target distribution. Here, q is a query involving random variables, and e is the evidence, that is, a partial value assignment of the random variables.¹ Evaluating this target distribution is called probabilistic inference (Koller and Friedman 2009). In probabilistic logic programming, random variables often correspond to ground atoms, and p(·) thus defines a distribution over truth value assignments, as we will see in more detail in Sec. 2.3 (but see also De Raedt et al. 2008). Probabilistic inference then asks for the probability of a logical query being true given truth value assignments for a number of such ground atoms.
In general, the probability p(q) of a query q is, in the discrete case, the sum over those outcomes ω ∈ Ω that are consistent with the query. In the continuous case, the sum is replaced by a (multidimensional) integral, and the distribution p(·) by a (product of) densities F(·). That is,

  p(q) = Σ_{ω∈Ω} p(ω) · 1_q(ω)   and   p(q) = ∫···∫_Ω 1_q(ω) dF(ω)   (1)

where 1_q(ω) = 1 if ω |= q and 0 otherwise. As is common (e.g. (Wasserman 2003)), we will for convenience use ∫ x dF(x) as a unifying notation for both discrete and continuous distributions. As Ω is often very large or even infinite, exact inference based on the summation in (1) quickly becomes infeasible, and inference has to resort to approximation techniques based on samples, that is, randomly drawn outcomes ω ∈ Ω. Given a large set of such samples {s_1, ..., s_N} drawn from p(·), the probability p(q) can be estimated as the fraction of samples where q is true. If samples are instead drawn from the target distribution p(·|e), the latter can directly be estimated as

  p̂(q|e) := (1/N) Σ_{i=1}^{N} 1_q(s_i).

However, sampling from p(·|e) is often highly inefficient or infeasible in practice, as the evidence needs to be taken into account. For instance, if one would use the standard definition of conditional probability to generate samples from p(·), all samples that are not consistent with the evidence would not contribute to the estimate and would thus have to be discarded or, in sampling terminology, rejected. More advanced sampling methods therefore often resort to a so-called proposal distribution which allows for easier sampling. The error introduced by this simplification then needs to be accounted for when generating the estimate from the set of samples.

¹ If e contains assignments to continuous variables, p(e) is zero. Hence, evidence on continuous values has to be defined via a probability density function, also called a sensor model.
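To make the estimator concrete, here is a minimal Python sketch. The model is a toy of our own (two fair coins standing in for ground atoms, not an example from the paper): samples violating the evidence are rejected, and p(q|e) is estimated from the accepted ones.

```python
import random

random.seed(0)

def sample_world():
    # Prior p(.): two independent fair coins (toy stand-ins for ground atoms).
    return {"a": random.random() < 0.5, "b": random.random() < 0.5}

def rejection_estimate(query, evidence, n=100_000):
    accepted = matched = 0
    for _ in range(n):
        s = sample_world()
        if not evidence(s):
            continue                 # reject: inconsistent with the evidence
        accepted += 1
        matched += query(s)
    return matched / accepted        # fraction of accepted samples where q holds

p = rejection_estimate(lambda s: s["a"] and s["b"],   # query q = a AND b
                       lambda s: s["a"] or s["b"])    # evidence e = a OR b
# exact value: p(a AND b | a OR b) = 0.25 / 0.75 = 1/3
```

With low-probability evidence, most iterations hit the `continue` branch, which is exactly the rejection problem motivating the weighted schemes discussed next in the text.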
An example of such a method is importance sampling, where each sample s_i has an associated weight w_i. Samples are drawn from an importance distribution π(·|e), and the weights are defined as w_i = p(s_i|e) / π(s_i|e). The target distribution can then be estimated as

  p̂(q|e) = (1/W) Σ_{i=1}^{N} w_i · 1_q(s_i)

where W = Σ_i w_i is a normalization constant. The simplest instance of this algorithm is rejection sampling, as already sketched above, where the samples are drawn from the prior distribution p(·), and the weights are 1 for those samples consistent with the evidence and 0 for the others. Especially for evidence with low probability, rejection sampling suffers from a very high rejection rate, that is, many samples are generated but do not contribute to the final estimate. This is known as the rejection problem. One way to address this problem is likelihood weighted sampling, which dynamically adapts the proposal distribution during sampling to avoid choosing values for random variables that cause the sample to become inconsistent with the evidence. Again, this requires corresponding modifications of the associated weights in order to produce correct estimates.

2.2 Logical Inference

A (definite) clause is an expression of the form h :- b1, ..., bn, where h is called the head and b1, ..., bn the body. A program consists of a set of clauses, and its semantics is given by its least Herbrand model. There are at least two ways of using a definite clause in a logical derivation.
First, there is backward chaining, which states that to prove a goal h with the clause it suffices to prove b1, ..., bn; second, there is forward chaining, which starts from a set of known facts b1, ..., bn and the clause, and concludes that h also holds (cf. (Nilsson and Małuszyński 1996)). Prolog employs backward chaining (SLD-resolution) to answer queries. SLD-resolution is very efficient both in terms of time and space. However, similar subgoals may be derived multiple times if the query contains recursive calls. Moreover, SLD-resolution is not guaranteed to always terminate (when searching depth-first). Using forward reasoning, on the other hand, one starts with what is known and employs the immediate consequence operator T_P until a fixpoint is reached. This fixpoint is identical to the least Herbrand model.

Definition 1 (T_P operator). Let P be a logic program containing a set of definite clauses and ground(P) the set of all ground instances of these clauses. Starting from a set of ground facts S, the T_P operator returns

  T_P(S) = { h | h :- b1, ..., bn ∈ ground(P) and {b1, ..., bn} ⊆ S }

2.3 Distribution Semantics

Sato's distribution semantics (Sato 1995) extends logic programming to the probabilistic setting by choosing the truth values of basic facts randomly. The core of this semantics lies in splitting the logic program into a set F of facts and a set R of rules. Given a probability distribution P_F over the facts, the rules then allow one to extend P_F into a distribution over the least Herbrand models of the logic program. Such a Herbrand model is called a possible world. More precisely, it is assumed that DB = F ∪ R is ground and denumerable, and that no atom in F unifies with the head of a rule in R. Each truth value assignment to F gives rise to a unique least Herbrand model of DB.
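Definition 1 translates directly into a few lines of Python: iterating T_P from the empty set until nothing changes yields the least Herbrand model. This is our own propositional sketch (atoms are plain strings, a clause is a (head, body) pair), not code from the paper.

```python
def tp(program, s):
    # One application of the immediate consequence operator T_P:
    # derive every head whose body atoms are all already in s.
    return {head for head, body in program if set(body) <= s}

def least_fixpoint(program):
    s = set()
    while True:
        s_next = tp(program, s) | s
        if s_next == s:       # fixpoint reached: the least Herbrand model
            return s
        s = s_next

# Ground program: edge facts (empty bodies) plus path rules.
program = [
    ("edge(a,b)", []), ("edge(b,c)", []),
    ("path(a,b)", ["edge(a,b)"]), ("path(b,c)", ["edge(b,c)"]),
    ("path(a,c)", ["edge(a,b)", "path(b,c)"]),
]
model = least_fixpoint(program)
# model contains all five atoms, including the derived path(a,c)
```

Backward chaining would instead start from a goal such as path(a,c) and chain through the same clauses in the opposite direction.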
Thus, a probability distribution P_F over F can directly be extended into a distribution P_DB over these models. Furthermore, Sato shows that, given an enumeration f1, f2, ... of the facts in F, P_F can be constructed from a series of finite distributions P_F^(n)(f1 = x1, ..., fn = xn), provided that the series fulfills the so-called compatibility condition, that is,

  P_F^(n)(f1 = x1, ..., fn = xn) = Σ_{x_{n+1}} P_F^(n+1)(f1 = x1, ..., f_{n+1} = x_{n+1})   (2)

3 Syntax and Semantics

Sato's distribution semantics, as summarized in Sec. 2.3, provides the basis for most probabilistic logic programming languages, including PRISM (Sato and Kameya 2001), ICL (Poole 2008), CP-logic (Vennekens et al. 2009) and ProbLog (De Raedt et al. 2007). The precise way of defining the basic distribution P_F differs among languages, though the theoretical foundations are essentially the same. The most basic instance of the distribution semantics, employed by ProbLog, uses so-called probabilistic facts. Each ground instance of a probabilistic fact directly corresponds to an independent random variable that takes either the value "true" or "false". These probabilistic facts can also be seen as binary switches, cf. (Sato 1995), which again can be extended to multi-ary switches or choices as used by PRISM and ICL. For switches, at most one of the probabilistic facts belonging to the switch is "true" according to the specified distribution. Finally, in CP-logic, such choices are used in the head of rules, leading to so-called annotated disjunctions. Hybrid ProbLog (Gutmann et al. 2010) extends the distribution semantics with continuous distributions. To allow for exact inference, Hybrid ProbLog imposes severe restrictions on the distributions and their further use in the program. Two sampled values, for instance, cannot be compared against each other.
Only comparisons that involve one sampled value and one numerical constant are allowed. Sampled values may not be used in arithmetic expressions or as parameters for other distributions; for instance, it is not possible to sample a value and use it as the mean of a Gaussian distribution. It is also not possible to reason over an unknown number of objects as BLOG (Milch et al. 2005) does, though this is the case mainly for algorithmic reasons. Here, we alleviate these restrictions by defining the basic distribution P_F over probabilistic facts based on both discrete and continuous random variables. We use a three-step approach to define this distribution. First, we introduce explicit random variables and corresponding distributions over their domains, both denoted by terms. Second, we use a mapping from these terms to terms denoting (sampled) outcomes, which, then, are used to define the basic distribution P_F on the level of probabilistic facts. For instance, assume that an urn contains an unknown number of balls, where the number is drawn from a Poisson distribution, and we say that this urn contains many balls if it contains at least 10 balls. We introduce a random variable number, and we define many :- dist_gt(≃(number), 9). Here, ≃(number) is the Herbrand term denoting the sampled value of number, and dist_gt(≃(number), 9) is a probabilistic fact whose probability of being true is the expectation that this value is actually greater than 9. This probability then carries over to the derived atom many as well. We will elaborate on the details in the following.

3.1 Syntax

In a logic program, following Sato, we distinguish between probabilistic facts, which are used to define the basic distribution, and rules, which are used to derive additional atoms.² Probabilistic facts are not allowed to unify with any rule head.
The distribution over facts is based on random variables, whose distributions we define through so-called distributional clauses.

Definition 2 (Distributional clause). A distributional clause is a definite clause with an atom h ∼ D in the head, where ∼ is a binary predicate used in infix notation.

For each ground instance (h ∼ D :- b1, ..., bn)θ, with θ a substitution over the Herbrand universe of the logic program, the distributional clause defines a random variable hθ and an associated distribution Dθ. In fact, the distribution is only defined when (b1, ..., bn)θ is true in the semantics of the logic program. These random variables are terms of the Herbrand universe and can be used as any other term in the logic program. Furthermore, a term ≃(d), constructed from the reserved functor ≃/1, represents the outcome of the random variable d. These functors can be used inside calls to the special predicates in distrel = {dist_eq/2, dist_lt/2, dist_leq/2, dist_gt/2, dist_geq/2}. We assume that there is a fact for each of the ground instances of these predicate calls. These facts are the probabilistic facts of Sato's distribution semantics. Note that the set of probabilistic facts is enumerable, as the Herbrand universe of the program is enumerable. A term ≃(d) links the random variable d with its outcome. The probabilistic facts compare the outcome of a random variable with a constant or with the outcome of another random variable, and succeed or fail according to the probability distribution(s) of the random variable(s).

Example 1 (Distributional clauses).

  nballs ∼ poisson(6).                                            (3)
  color(B) ∼ [0.7 : b, 0.3 : g] :- between(1, ≃(nballs), B).      (4)
  diameter(B, MD) ∼ gamma(MD/20, 20) :-
      between(1, ≃(nballs), B),
      mean_diameter(≃(color(B)), MD).                             (5)

² A rule can have an empty body, in which case it represents a deterministic fact.
The defined distributions depend on the following logical clauses:

  mean_diameter(C, 5) :- dist_eq(C, b).
  mean_diameter(C, 10) :- dist_eq(C, g).
  between(I, J, I) :- dist_leq(I, J).
  between(I, J, K) :- dist_lt(I, J), I1 is I + 1, between(I1, J, K).

The distributional clause (3) models the number of balls as a Poisson distribution with mean 6. The distributional clause (4) models a discrete distribution for the random variable color(B): with probability 0.7 the ball is blue, and green otherwise. Note that the distribution is defined only for the values B for which between(1, ≃(nballs), B) succeeds. Execution of calls to the latter gives rise to calls to probabilistic facts that are instances of dist_leq(I, ≃(nballs)) and dist_lt(I, ≃(nballs)). Similarly, the distributional clause (5) defines a gamma distribution that is also conditionally defined. Note that the conditions in the distribution depend on calls of the form mean_diameter(≃(color(n)), MD), with n a value returned by between/3. Execution of this call finally leads to calls dist_eq(≃(color(n)), b) and dist_eq(≃(color(n)), g).

It looks feasible to allow ≃(d) terms everywhere and to have a simple program analysis insert the special predicates in the appropriate places by replacing </2, >/2, ≤/2, ≥/2 predicates by distrel/2 facts. Extending unification is a bit harder, though: as long as a ≃(h) term is unified with a free variable, standard unification can be performed; only when the other term is bound is an extension required.
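The generative process described by Example 1 (together with the earlier urn query many) can be mimicked by forward sampling. The following Python sketch is ours, not the paper's: the Poisson helper is hand-rolled since the standard library has no Poisson sampler, and `random.gammavariate(shape, scale)` matches gamma(MD/20, 20), whose mean is (MD/20)·20 = MD.

```python
import math
import random

random.seed(42)

def poisson(lam):
    # Knuth's inverse-transform sampler (stdlib has no Poisson generator).
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= l:
            return k
        k += 1

def sample_world():
    # nballs ~ poisson(6).
    nballs = poisson(6)
    balls = []
    for b in range(1, nballs + 1):
        # color(B) ~ [0.7:b, 0.3:g] :- between(1, outcome(nballs), B).
        color = "b" if random.random() < 0.7 else "g"
        # mean_diameter: 5 for blue, 10 for green (the two logical clauses).
        mean_d = 5 if color == "b" else 10
        # diameter(B,MD) ~ gamma(MD/20, 20); mean is shape*scale = MD.
        balls.append((b, color, random.gammavariate(mean_d / 20, 20)))
    return nballs, balls

worlds = [sample_world() for _ in range(5000)]
# p(many) = P(nballs > 9); the exact value for Poisson(6) is about 0.084.
p_many = sum(n > 9 for n, _ in worlds) / len(worlds)
# Blue balls should have mean diameter close to 5.
blue = [d for _, balls in worlds for (_, c, d) in balls if c == "b"]
mean_blue = sum(blue) / len(blue)
```

This is exactly the kind of forward (generative) computation that the stochastic T_P operator of Section 3.3 formalizes.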
In this paper, we assume that the special predicates dist_eq/2, dist_lt/2, dist_leq/2, dist_gt/2, and dist_geq/2 are used whenever the outcome of a random variable needs to be compared with another value, and that it is safe to use standard unification whenever a ≃(h) term is used in another predicate. For the basic distribution on facts to be well-defined, a program has to fulfill a set of validity criteria that have to be enforced by the programmer.

Definition 3 (Valid program). A program P is called valid if:

(V1) In the relation h ∼ D that holds in the least fixpoint of a program, there is a functional dependency from h to D, so there is a unique ground distribution D for each ground random variable h.
(V2) The program is distribution-stratified, that is, there exists a function rank(·) that maps ground atoms to N and that satisfies the following properties: (1) for each ground instance of a distributional clause h ∼ D :- b1, ..., bn, rank(h ∼ D) > rank(b_i) (for all i); (2) for each ground instance of another program clause h :- b1, ..., bn, rank(h) ≥ rank(b_i) (for all i); (3) for each ground atom b that contains (the name of) a random variable h, rank(b) ≥ rank(h ∼ D) (with h ∼ D the head of the distributional clause defining h).
(V3) All ground probabilistic facts or, to be more precise, the corresponding indicator functions are Lebesgue-measurable.
(V4) Each atom in the least fixpoint can be derived from a finite number of probabilistic facts (finite support condition (Sato 1995)).

Together, (V1) and (V2) ensure that a single basic distribution P_F over the probabilistic facts can be obtained from the distributions of the individual random variables defined in P. The requirement (V3) is crucial.
It ensures that the series of distributions P_F^(n) needed to construct this basic distribution is well-defined. Finally, the number of facts over which the basic distribution is defined needs to be countable. This is true, as we have a finite number of constants and functors: those appearing in the program.

3.2 Distribution Semantics

We now define the series of distributions P_F^(n), where we fix an enumeration f1, f2, ... of the probabilistic facts such that i < j implies rank(f_i) ≤ rank(f_j), where rank(·) is a ranking function showing that the program is distribution-stratified. For each predicate rel/2 ∈ distrel, we define an indicator function as follows:

  I¹_rel(X1, X2) = 1 if rel(X1, X2) is true, and 0 if rel(X1, X2) is false   (6)

Furthermore, we set I⁰_rel(X1, X2) = 1.0 − I¹_rel(X1, X2). We then use the expected value of the indicator function to define the probability distributions P_F^(n) over finite sets of ground facts f1, ..., fn. Let {rv1, ..., rvm} be the set of random variables these n facts depend on, ordered such that if rank(rv_i) < rank(rv_j), then i < j (cf. (V2) in Definition 3). Furthermore, let f_i = rel_i(t_i1, t_i2), x_j ∈ {1, 0}, and θ⁻¹ = {≃(rv1)/V1, ..., ≃(rvm)/Vm}. The latter replaces all evaluations of random variables on which the f_i depend by variables for integration.

  P_F^(n)(f1 = x1, ..., fn = xn) = E[ I^{x1}_rel1(t11, t12) · ... · I^{xn}_reln(tn1, tn2) ]   (7)
      = ∫···∫ ( I^{x1}_rel1(t11 θ⁻¹, t12 θ⁻¹) · ... · I^{xn}_reln(tn1 θ⁻¹, tn2 θ⁻¹) ) dD_rv1(V1) ··· dD_rvm(Vm)

Example 2 (Basic Distribution). Let f1, f2, ... = dist_lt(≃(b1), 3), dist_lt(≃(b2), ≃(b1)), ...
The second distribution in the series then is

  P_F^(2)(dist_lt(≃(b1), 3) = x1, dist_lt(≃(b2), ≃(b1)) = x2)
      = E[ I^{x1}_dist_lt(≃(b1), 3) · I^{x2}_dist_lt(≃(b2), ≃(b1)) ]
      = ∫∫ I^{x1}_dist_lt(V1, 3) · I^{x2}_dist_lt(V2, V1) dD_b1(V1) dD_b2(V2)

By now we are able to prove the following proposition.

Proposition 1. Let P be a valid program. P defines a probability measure P_P over the set of fixpoints of the T_P operator. Hence, P also defines, for an arbitrary formula q over atoms in its Herbrand base, the probability that q is true.

Proof sketch. It suffices to show that the series of distributions P_F^(n) over facts (cf. (7)) is of the form required in the distribution semantics, that is, these are well-defined probability distributions that satisfy the compatibility condition, cf. (2). This is a direct consequence of the definition in terms of indicator functions and the measurability of the underlying facts required for valid programs.

3.3 T_P Semantics

In the following, we give a procedural view onto the semantics by extending the T_P operator of Definition 1 to deal with probabilistic facts dist_rel(t1, t2). To do so, we introduce a function ReadTable(·) that keeps track of the sampled values of random variables in order to evaluate probabilistic facts. This is required because interpretations of a program only contain such probabilistic facts, but not the underlying outcomes of random variables. Given a probabilistic fact dist_rel(t1, t2), ReadTable returns the truth value of the fact based on the values of the random variables h occurring in its arguments, which are either retrieved from the table or sampled according to their definition h ∼ D as included in the interpretation, and stored in case they are not yet available.

Definition 4 (Stochastic T_P operator).
Let P be a valid program and ground(P) the set of all ground instances of clauses in P. Starting from a set of ground facts S, the ST_P operator returns

  ST_P(S) := { h | h :- b1, ..., bn ∈ ground(P) and for all b_i: either b_i ∈ S, or
               b_i = dist_rel(t1, t2) ∧ (t_j = ≃(d) → (d ∼ D) ∈ S) ∧ ReadTable(b_i) = true }

ReadTable ensures that the basic facts are sampled from their joint distribution as defined in Sec. 3.2 during the construction of the standard fixpoint of the logic program. Thus, each fixpoint of the ST_P operator corresponds to a possible world whose probability is given by the distribution semantics.

4 Forward Sampling using Magic Sets and Backward Reasoning

In this section we introduce our new method for probabilistic forward inference. To this aim, we first extend the magic set transformation to distributional clauses. We then develop a rejection sampling scheme using this transformation. This scheme also incorporates backward reasoning to check for consistency with the evidence during sampling and thus to reduce the rejection rate.

4.1 Probabilistic Magic Set Transformation

The disadvantage of forward reasoning in logic programming is that the search is not goal-driven, which might generate irrelevant atoms. The magic set transformation (Bancilhon et al. 1986; Nilsson and Małuszyński 1996) focuses forward reasoning in logic programs towards a goal, to avoid the generation of uninteresting facts. It thus combines the advantages of both reasoning directions.

Definition 5 (Magic Set Transformation). If P is a logic program, then we use Magic(P) to denote the smallest program such that if A0 :- A1, ..., An ∈ P then

  • A0 :- c(A0), A1, ..., An ∈ Magic(P) and
  • for each 1 ≤ i ≤ n: c(A_i) :- c(A0), A1, ..., A_{i-1} ∈ Magic(P)
The meaning of the additional c/1 atoms (c = call) is that they "switch on" clauses when they are needed to prove a particular goal. If the corresponding switch for the head atom is not true, the body is not true and the head thus cannot be proven. The magic transformation is both sound and complete. Furthermore, if the SLD-tree of a goal is finite, forward reasoning in the transformed program terminates. The same holds if forward reasoning on the original program terminates. We now extend this transformation to distributional clauses. The idea is that the distributional clause for a random variable h is activated when there is a call to a probabilistic fact dist_rel(t1, t2) depending on h.

Definition 6 (Probabilistic Magic Set Transformation). For a program P, let P_L be P without the distributional clauses. M(P) is the smallest program s.t. Magic(P_L) ⊆ M(P) and for each h ∼ D :- b1, ..., bn ∈ P and rel ∈ {eq, lt, leq, gt, geq}:

  • h ∼ D :- (c(dist_rel(≃(h), X)); c(dist_rel(X, ≃(h)))), b1, ..., bn. ∈ M(P)
  • c(b_i) :- (c(dist_rel(≃(h), X)); c(dist_rel(X, ≃(h)))), b1, ..., b_{i-1}. ∈ M(P)

Then PMagic(P) consists of:

  • a clause a_p(t1, ..., tn) :- c(p(t1, ..., tn)), p(t1, ..., tn) for each built-in predicate (including dist_rel/2 for rel ∈ {eq, lt, leq, gt, geq}) used in M(P).
  • a clause h :- b′1, ..., b′n for each clause h :- b1, ..., bn ∈ M(P), where b′_i = a_{b_i} if b_i uses a built-in predicate, and b′_i = b_i otherwise.

Note that every call to a built-in b is replaced by a call to a_b; the latter predicate is defined by a clause that is activated when there is a call to the built-in (c(b)) and that effectively calls the built-in. The transformed program computes the distributions only for random variables whose value is relevant to the query.
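Since Definition 5 is the plain, non-adorned variant of the magic set transformation, it is mechanical to implement. Here is a small Python sketch of our own (clauses are (head, body-list) pairs, and the c/1 wrapper is rendered as a string; real magic sets would use adorned predicates and proper unification):

```python
def magic(program):
    # Magic set transformation (cf. Definition 5): guard every clause with
    # a call atom c(head), and derive c(A_i) from c(head) plus A_1..A_{i-1}.
    def c(atom):
        return f"c({atom})"
    out = []
    for head, body in program:
        out.append((head, [c(head)] + body))
        for i, atom in enumerate(body):
            out.append((c(atom), [c(head)] + body[:i]))
    return out

program = [
    ("path(X,Y)", ["edge(X,Y)"]),
    ("path(X,Y)", ["edge(X,Z)", "path(Z,Y)"]),
]
magic_program = magic(program)
# The recursive clause yields, among others:
#   ("c(path(Z,Y))", ["c(path(X,Y))", "edge(X,Z)"])
```

Adding a seed fact such as c(path(a,Y)) then restricts forward reasoning in the transformed program to atoms relevant to that goal.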
These distributions are the same as those obtained in a forward computation of the original program. Hence we can show:

Lemma 1. Let P be a program and PMagic(P) its probabilistic magic set transformation extended with a seed c(q). The distribution over q defined by P and by PMagic(P) is the same.

Proof sketch. In both programs, the distribution over q is determined by the distributions of the atoms dist_eq(t1, t2), dist_leq(t1, t2), dist_lt(t1, t2), dist_geq(t1, t2), and dist_gt(t1, t2) on which q depends in a forward computation of the program P. The magic set transformation ensures that these atoms are called in the forward execution of PMagic(P). In PMagic(P), a call to such an atom activates the distributional clause for the involved random variable. As this distributional clause is a logic program clause, the soundness and completeness of the magic set transformation ensure that the distribution obtained for that random variable is the same as in P. Hence the distribution over q is also the same for both programs.

4.2 Rejection Sampling with Heuristic Lookahead

As discussed in Section 2.1, sampling-based approaches to probabilistic inference estimate the conditional probability p(q|e) of a query q given evidence e by randomly generating a large number of samples or possible worlds (cf. Algorithm 1).

Algorithm 1 Main loop for sampling-based inference to calculate the conditional probability p(q|e) for query q, evidence e and program L.

 1: function Evaluate(L, q, e, Depth)
 2:   L* := PMagic(L) ∪ {c(a) | a ∈ e ∪ q}
 3:   n+ := 0; n− := 0
 4:   while not converged do
 5:     (I, w) := STPMagic(L*, L, e, Depth)
 6:     if q ∈ I then n+ := n+ + w else n− := n− + w
 7:   return n+ / (n+ + n−)

Algorithm 2 Sampling one interpretation; used in Algorithm 1.
1: function STPMagic(L*, L, e, Depth)
2:   T_pf := ∅; T_dis := ∅; w := 1; I_old := ∅; I_new := ∅
3:   repeat
4:     I_old := I_new
5:     for all (h :- body) ∈ L* do
6:       split body into B_PF (probabilistic facts) and B_L (the rest)
7:       for all grounding substitutions θ such that B_L θ ⊆ I_old do
8:         s := true; w_d := 1
9:         while s ∧ B_PF ≠ ∅ do
10:          select and remove pf from B_PF
11:          (b_pf, w_pf) := ReadTable(pf θ, I_old, T_pf, T_dis, L, e, Depth)
12:          s := s ∧ b_pf; w_d := w_d · w_pf
13:        if s then
14:          if hθ ∈ e− then return (I_new, 0)   ▷ check negative evidence
15:          I_new := I_new ∪ {hθ}; w := w · w_d
16:  until I_new = I_old ∨ w = 0   ▷ fixpoint or impossible evidence
17:  if e+ ⊆ I_new then return (I_new, w)   ▷ check positive evidence
18:  else return (I_new, 0)

generating a large number of samples or possible worlds (cf. Algorithm 1). The algorithm starts by preparing the program L for sampling by applying the PMagic transformation. In the following, we discuss our choice of the subroutine STPMagic (cf. Algorithm 2), which realizes likelihood weighted sampling. It is used in Algorithm 1, line 5, to generate individual samples. It iterates the stochastic consequence operator of Definition 4 until either a fixpoint is reached or the current sample is inconsistent with the evidence. If the sample is inconsistent with the evidence, it receives weight 0. Algorithm 3 details the procedure used in line 11 of Algorithm 2 to sample from a given distributional clause. The function ReadTable returns the truth value of the probabilistic fact, together with its weight. If the outcome is not yet tabled, it is computed. Note that false is returned when the outcome is not consistent with the evidence. Involved distributions, if not yet tabled, are sampled in line 5.

Algorithm 3 Evaluating a probabilistic fact pf; used in Algorithm 2.
ComputePF(pf, T_dis) computes the truth value and the probability of pf according to the information in T_dis.
1: function ReadTable(pf, I, T_pf, T_dis, L, e, Depth)
2:   if pf ∉ T_pf then
3:     for all random variables h occurring in pf where h ∉ T_dis do
4:       if h ∼ D ∉ I then return (false, 0)
5:       if not Sample(h, D, T_dis, I, L, e, Depth) then return (false, 0)
6:     (b, w) := ComputePF(pf, T_dis)
7:     if (b ∧ (pf ∈ e−)) ∨ (¬b ∧ (pf ∈ e+)) then
8:       return (false, 0)   ▷ inconsistent with evidence
9:     extend T_pf with (pf, b, w)
10:  return (b, w) as stored in T_pf for pf

11: procedure Sample(h, D, T_dis, I, L, e, Depth)
12:   w_h := 1; D′ := D   ▷ initial weight, temporary distribution
13:   if D′ = [p_1 : a_1, ..., p_n : a_n] then   ▷ finite distribution
14:     for p_j : a_j ∈ D′ where dist_eq(h, a_j) ∈ e− do   ▷ remove negative evidence
15:       D′ := Norm(D′ \ {p_j : a_j}); w_h := w_h × (1 − p_j)
16:     if ∃v : dist_eq(≃(h), v) ∈ e+ and p : v ∈ D′ then
17:       D′ := [1 : v]; w_h := w_h × p
18:     for p_j : a_j ∈ D′ do   ▷ remove choices that make e+ impossible
19:       if ∃b ∈ e+ : not MaybeProof(b, Depth, I ∪ {dist_eq(h, a_j)}, L) or
20:          ∃b ∈ e− : not MaybeFail(b, Depth, I ∪ {dist_eq(h, a_j)}, L) then
21:        D′ := Norm(D′ \ {p_j : a_j}); w_h := w_h × (1 − p_j)
22:  if D′ = ∅ then return false
23:  sample x according to D′, extend T_dis with (h, x), and return true

In the infinite case, Sample simply returns the sampled value. In the finite case, it is directed towards generating samples that are consistent with the evidence. Firstly, all possible choices that are inconsistent with the negative evidence are removed. Secondly, when there is positive evidence for a particular value, only that value is left in the distribution. Thirdly, it is checked whether each remaining value is consistent with all other evidence.
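The first two adjustments that Sample performs on a finite distribution (pruning values excluded by negative evidence and pinning a value forced by positive evidence, each with the corresponding weight update) can be sketched as follows. The function name and the dictionary representation of the distribution are our own, not the paper's.

```python
def adjust_finite(dist, neg_values, pos_value=None):
    """Sketch of Sample's evidence handling on a finite distribution.
    dist: {value: probability}. Returns (adjusted_dist, weight_factor)."""
    w = 1.0
    d = dict(dist)
    # Step 1: remove values ruled out by negative evidence, renormalizing
    # and multiplying the weight by (1 - p_j) for each removed value.
    for v in neg_values:
        if v in d:
            p = d.pop(v)
            w *= (1.0 - p)
            if not d:
                return {}, 0.0          # evidence made this variable impossible
            total = sum(d.values())
            d = {k: q / total for k, q in d.items()}
    # Step 2: positive evidence pins the variable to a single value,
    # and the weight absorbs that value's (renormalized) probability.
    if pos_value is not None and pos_value in d:
        w *= d[pos_value]
        d = {pos_value: 1.0}
    return d, w

# Example: uniform over {red, green, blue}; negative evidence says "not blue".
d, w = adjust_finite({"red": 1/3, "green": 1/3, "blue": 1/3}, neg_values=["blue"])
print(sorted(d), round(w, 4))  # ['green', 'red'] 0.6667
```

Sampling then proceeds from the adjusted distribution, while the returned weight factor keeps the estimator unbiased, in the spirit of lines 14-17 of Algorithm 3.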
This consistency check is performed by a simple depth-bounded meta-interpreter. For positive evidence, it attempts a top-down proof of the evidence atom in the original program using the function MaybeProof. Subgoals for which the depth bound is reached, as well as probabilistic facts that are not yet tabled, are assumed to succeed. If this results in a proof, the value is consistent; otherwise it is removed. Similarly for negative evidence: in MaybeFail, subgoals for which the depth bound is reached, as well as probabilistic facts that are not yet tabled, are assumed to fail. If this results in failure, the value is consistent; otherwise it is removed. The Depth parameter allows one to trade the computational cost associated with this consistency check for a reduced rejection rate.

Note that the modified distribution is normalized and the weight is adjusted in each of these three cases. The weight adjustment takes into account that removed elements cannot be sampled. It is necessary because which elements are removed from the distribution sampled in Sample can depend on the distributions sampled so far (the clause bodies of the distributional clause instantiate the distribution).

5 Experiments

We implemented our algorithm in YAP Prolog and set up experiments to answer the following questions:
Q1 Does the lookahead-based sampling improve the performance?
Q2 How do rejection sampling and likelihood weighting compare?

To answer the first question, we used the distributional program in Figure 1, which models an urn containing a random number of balls. The number of balls is uniformly distributed between 1 and 10, and each ball is either red or green with equal probability. We draw a ball with replacement from the urn 8 times and observe its color.
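The optimistic, depth-bounded behaviour of MaybeProof can be mimicked in a few lines. The rule encoding and the pf_ naming convention for probabilistic facts below are invented for this sketch; the real meta-interpreter works on the original (non-ground) program.

```python
def maybe_proof(goal, depth, rules, facts, sampled):
    """Sketch of an optimistic depth-bounded proof check: returns False only
    if `goal` certainly fails within `depth` resolution steps.
    `sampled` maps already-tabled probabilistic facts to their truth value;
    untabled probabilistic facts and depth-exhausted subgoals are assumed true."""
    if depth == 0:
        return True                       # depth bound reached: assume success
    if goal in sampled:
        return sampled[goal]              # tabled probabilistic fact
    if goal in facts:
        return True
    bodies = [b for h, b in rules if h == goal]
    if not bodies:
        # no clause defines it: assume true iff it is an untabled probabilistic fact
        return goal.startswith("pf_")
    return any(all(maybe_proof(g, depth - 1, rules, facts, sampled)
                   for g in body)
               for body in bodies)

# Toy version of the urn check: nogreen2 :- pf_red1, pf_red2.
# With pf_red1 already sampled false, depth 2 suffices to detect the failure,
# while depth 1 is too shallow and optimistically reports success.
rules = [("nogreen2", ["pf_red1", "pf_red2"])]
print(maybe_proof("nogreen2", 2, rules, set(), {"pf_red1": False}))  # False
print(maybe_proof("nogreen2", 1, rules, set(), {"pf_red1": False}))  # True
```

A deeper bound catects more inconsistent choices before they are sampled, at the cost of more resolution steps per candidate value, which is exactly the trade-off the Depth parameter controls.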
We also define the atom nogreen(D) to be true if and only if we did not draw any green ball in draws 1 to D. We evaluated the query P(dist_eq(≃(color(≃(drawnball(1)))), red) | nogreen(D)) for D = 1, 2, ..., 8. Note that the evidence implies that the first drawn ball is red, hence that the probability of the query is 1; however, the number of steps required to prove that the evidence is inconsistent with drawing a green first ball increases with D. Thus, the larger D is, the larger the Depth required to reach a 100% acceptance rate for the samples, as illustrated in Figure 1. Clearly, increasing the depth limit makes each sample take longer to generate. The Depth parameter therefore allows one to trade off the convergence speed of the sampling against the time each sample needs to be generated. Depending on the program, the query, and the evidence, there is an optimal depth for the lookahead.

To answer Q2, we used the standard example for BLOG (Milch et al. 2005). An urn contains an unknown number of balls, where every ball is either green or blue with p = 0.5. When drawing a ball from the urn, we observe its color but do not know which ball it is. When we observe the color of a particular ball, there is a 20% chance of observing the wrong one, e.g. green instead of blue. We have some prior belief over the number of balls in the urn. If 10 balls are drawn with replacement from the urn and we saw the color green 10 times, what is the probability that there are n balls in the urn? We consider two different prior distributions: in the first case, the number of balls is uniformly distributed between 1 and 8; in the second case, it is Poisson-distributed with mean λ = 6.
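As a sanity check on the first experiment, one can re-create it with a plain rejection sampler, standing in for Algorithm 1 with Depth = 0, i.e. without lookahead. The model follows our reading of Figure 1, and the function name and sample counts are our own: the conditional estimate is exactly 1 whenever any sample is accepted, while the acceptance rate drops as D grows.

```python
import random

def run(D, n=4000, seed=1):
    """Rejection sampling for P(first draw red | no green in draws 1..D)
    in the urn model: numballs ~ uniform(1..10), each ball's colour
    ~ uniform({red, green}), 8 draws with replacement."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n):
        numballs = rng.randint(1, 10)
        colour = [rng.choice(["red", "green"]) for _ in range(numballs)]
        draws = [colour[rng.randrange(numballs)] for _ in range(8)]
        if all(c == "red" for c in draws[:D]):        # evidence: nogreen(D)
            accepted += 1
            hits += (draws[0] == "red")               # query: first draw red
    return accepted / n, (hits / accepted if accepted else None)

rate1, p1 = run(D=1)
rate8, p8 = run(D=8)
print(p1, rate1 > rate8)  # 1.0 True
```

Without lookahead, every rejected sample is wasted work; the MaybeProof check of Section 4.2 recovers exactly this lost acceptance rate by pruning green choices before they are sampled.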
One run of the experiment corresponds to sampling the number N of balls in the urn, the color of each of the N balls, and, for each of the ten draws, both the ball drawn and whether or not the color is observed correctly in this draw. Once these values are fixed, the sequence of colors observed is determined. This implies that for a fixed number N of balls, there are 2^N · N^10 possible proofs. In the case of the uniform distribution, exact PRISM inference can be used to calculate the probability for

numballs ∼ uniform([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).
ball(M) :- between(1, numballs, M).
color(B) ∼ uniform([red, green]) :- ball(B).
draw(N) :- between(1, 8, N).
nogreen(0).
nogreen(D) :- dist_eq(≃(color(≃(drawnball(D)))), red), D2 is D − 1, nogreen(D2).
drawnball(D) ∼ uniform(L) :- draw(D), findall(B, ball(B), L).

[Plot omitted: sample acceptance rate against the depth limit for MaybeProof, one curve per D = 1, ..., 8.]

Fig. 1. The program modeling the urn (left); rate of accepted samples (right) for evaluating the query P(dist_eq(≃(color(≃(drawnball(1)))), red) | nogreen(D)) for D = 1, 2, ..., 8 and for Depth = 1, 2, ..., 10 using Algorithm 1. The acceptance rate is calculated by generating 200 samples using our algorithm and counting the number of samples with weight larger than 0.

[Plots omitted: posterior over the number of balls in the urn (true posterior, likelihood weighting, rejection sampling) for the uniform and the Poisson prior.]

Fig. 2. Results of the urn experiment with forward reasoning. 10 balls were drawn with replacement and each time green was observed.
Left: uniform prior over the number of balls; right: Poisson prior (λ = 6).

each given number of balls, with a total runtime of 0.16 seconds for all eight cases. In the case of the Poisson distribution, this is only possible up to 13 balls; with more balls, PRISM runs out of memory. For inference using sampling, we generate 20,000 samples with the uniform prior and 100,000 with the Poisson prior. We report average results over five repetitions. For these priors, PRISM generates 8,015 and 7,507 samples per second respectively, ProbLog backward sampling 708 and 510, BLOG 3,008 and 2,900, and our new forward sampling (with rejection sampling) 760 and 731. The results using our algorithm for both rejection sampling and likelihood weighting with Depth = 0 are shown in Figure 2. As the graphs show, the standard deviation for rejection sampling is much larger than for likelihood weighting.

6 Conclusions and related work

We have contributed a novel construct for probabilistic logic programming, the distributional clause, and defined its semantics. Distributional clauses allow one to represent continuous variables and to reason about an unknown number of objects. In this regard, this construct is similar in spirit to languages such as BLOG and Church, but it is strongly embedded in a logic programming context. This embedding also allowed us to propose a novel inference method based on a combination of importance sampling and forward reasoning. This contrasts with the majority of probabilistic logic programming languages, which are based on backward reasoning (possibly enhanced with tabling (Sato and Kameya 2001; Mantadelis and Janssens 2010)). Furthermore, only few of these techniques employ sampling, but see (Kimmig et al. 2011) for a Monte Carlo approach using backward reasoning.
Another key difference with the existing probabilistic logic programming approaches is that the described inference method can handle evidence. This is due to the magic set transformation, which targets the generative process towards the query and evidence and instantiates only relevant random variables. P-log (Baral et al. 2009) is a probabilistic language based on Answer Set Prolog (ASP). It uses a standard ASP solver for inference and is thus based on forward reasoning, but without the use of sampling. Magic sets are also used in probabilistic Datalog (Fuhr 2000), as well as in Dyna, a probabilistic logic programming language (Eisner et al. 2005) based on rewrite rules that uses forward reasoning. However, neither of them uses sampling. Furthermore, Dyna and PRISM require the exclusive-explanation assumption. This assumption states that no two different proofs for the same goal can be true simultaneously, that is, they have to rely on at least one basic random variable with a different outcome. Distributional clauses (and the ProbLog language) do not impose such a restriction. Other related work includes MCMC-based sampling algorithms such as the approach for SLP (Angelopoulos and Cussens 2003). Church's inference algorithm is based on MCMC too, and BLOG is also able to employ MCMC. At least for BLOG, it seems necessary to define a domain-specific proposal distribution for fast convergence. With regard to future work, it would be interesting to consider evidence on continuous distributions, as evidence is currently restricted to finite distributions. Program analysis and transformation techniques could be used to further optimize the program w.r.t. the evidence and query, and thus to increase the sampling speed. Finally, the implementation could be optimized by memoizing some information from previous runs and then using it to prune and sample more rapidly.
Acknowledgements

Angelika Kimmig and Bernd Gutmann are supported by the Research Foundation-Flanders (FWO-Vlaanderen). This work is supported by the GOA project 2008/08 Probabilistic Logic Learning and by the European Community's Seventh Framework Programme under grant agreement First-MM-248258.

References

Angelopoulos, N. and Cussens, J. 2003. Prolog issues and experimental results of an MCMC algorithm. In Web Knowledge Management and Decision Support, O. Bartenstein, U. Geske, M. Hannebauer, and O. Yoshie, Eds. Lecture Notes in Computer Science, vol. 2543. Springer, Berlin/Heidelberg, 186–196.
Bancilhon, F., Maier, D., Sagiv, Y., and Ullman, J. D. 1986. Magic sets and other strange ways to implement logic programs (extended abstract). In Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS 1986). ACM, Cambridge, Massachusetts, United States, 1–15.
Baral, C., Gelfond, M., and Rushton, N. 2009. Probabilistic reasoning with answer sets. Theory and Practice of Logic Programming 9, 1, 57–144.
De Raedt, L., Demoen, B., Fierens, D., Gutmann, B., Janssens, G., Kimmig, A., Landwehr, N., Mantadelis, T., Meert, W., Rocha, R., Santos Costa, V., Thon, I., and Vennekens, J. 2008. Towards digesting the alphabet-soup of statistical relational learning. In Proceedings of the 1st Workshop on Probabilistic Programming: Universal Languages, Systems and Applications, D. Roy, J. Winn, D. McAllester, V. Mansinghka, and J. Tenenbaum, Eds. Whistler, Canada.
De Raedt, L., Frasconi, P., Kersting, K., and Muggleton, S. 2008. Probabilistic Inductive Logic Programming - Theory and Applications. LNCS, vol. 4911. Springer, Berlin/Heidelberg.
De Raedt, L., Kimmig, A., and Toivonen, H. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. In IJCAI.
2462–2467.
Eisner, J., Goldlust, E., and Smith, N. 2005. Compiling Comp Ling: Weighted dynamic programming and the Dyna language. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP-05).
Fuhr, N. 2000. Probabilistic Datalog: Implementing logical information retrieval for advanced applications. Journal of the American Society for Information Science (JASIS) 51, 2, 95–110.
Getoor, L. and Taskar, B. 2007. An Introduction to Statistical Relational Learning. MIT Press.
Gogate, V. and Dechter, R. 2011. SampleSearch: Importance sampling in presence of determinism. Artificial Intelligence 175, 694–729.
Goodman, N., Mansinghka, V. K., Roy, D. M., Bonawitz, K., and Tenenbaum, J. B. 2008. Church: A language for generative models. In UAI. 220–229.
Gutmann, B., Jaeger, M., and De Raedt, L. 2010. Extending ProbLog with continuous distributions. In Proceedings of the 20th International Conference on Inductive Logic Programming (ILP-10), P. Frasconi and F. A. Lisi, Eds. Firenze, Italy.
Kersting, K. and De Raedt, L. 2008. Basic principles of learning Bayesian logic programs. See De Raedt et al. (2008), 189–221.
Kimmig, A., Demoen, B., De Raedt, L., Santos Costa, V., and Rocha, R. 2011. On the implementation of the probabilistic logic programming language ProbLog. Theory and Practice of Logic Programming (TPLP) 11, 235–262.
Koller, D. and Friedman, N. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Mantadelis, T. and Janssens, G. 2010. Dedicated tabling for a probabilistic setting. In Technical Communications of the 26th International Conference on Logic Programming (ICLP-10), M. V. Hermenegildo and T. Schaub, Eds. LIPIcs, vol. 7. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 124–133.
Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D., and Kolobov, A. 2005. BLOG: Probabilistic models with unknown objects. In IJCAI. 1352–1359.
Milch, B., Marthi, B., Sontag, D., Russell, S., Ong, D. L., and Kolobov, A. 2005. Approximate inference for infinite contingent Bayesian networks. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, Jan 6-8, 2005, Savannah Hotel, Barbados, R. G. Cowell and Z. Ghahramani, Eds. Society for Artificial Intelligence and Statistics, 238–245. (Available electronically at http://www.gatsby.ucl.ac.uk/aistats/).
Nilsson, U. and Małuszyński, J. 1996. Logic, Programming and Prolog, 2nd ed. Wiley & Sons.
Pfeffer, A. 2001. IBAL: A probabilistic rational programming language. In IJCAI. 733–740.
Poole, D. 2008. The independent choice logic and beyond. In Probabilistic Inductive Logic Programming - Theory and Applications, L. De Raedt, P. Frasconi, K. Kersting, and S. Muggleton, Eds. LNCS, vol. 4911. Springer, Berlin/Heidelberg, 222–243.
Richardson, M. and Domingos, P. 2006. Markov logic networks. Machine Learning 62, 1-2, 107–136.
Santos Costa, V., Page, D., and Cussens, J. 2008. CLP(BN): Constraint logic programming for probabilistic knowledge. See De Raedt et al. (2008), 156–188.
Sato, T. 1995. A statistical learning method for logic programs with distribution semantics. In Proceedings of the Twelfth International Conference on Logic Programming (ICLP 1995). MIT Press, 715–729.
Sato, T. and Kameya, Y. 2001. Parameter learning of logic programs for symbolic-statistical modeling. Journal of Artificial Intelligence Research (JAIR) 15, 391–454.
Vennekens, J., Denecker, M., and Bruynooghe, M. 2009. CP-logic: A language of causal probabilistic events and its relation to logic programming.
Theory and Practice of Logic Programming 9, 3, 245–308.
Wasserman, L. 2003. All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics. Springer.
