Update of prior probabilities by minimal divergence


Jan Naudts
Universiteit Antwerpen, Antwerpen, Belgium

November 16, 2021

Abstract

The present paper investigates the update of an empirical probability distribution with the results of a new set of observations. The optimal update is obtained by minimizing either the Hellinger distance or the quadratic Bregman divergence. The results obtained by the two methods differ. Updates with information about conditional probabilities are considered as well.

1 Introduction

In the present work prior probabilities are assumed to be known. The approach is then to look for updated probabilities that reproduce the observed probabilities while taking the prior into account. No further model assumptions are imposed. Hence, the statistical model under consideration consists of all probability distributions that are consistent with the newly obtained empirical data. Internal consistency of the empirical data ensures that the model is not empty. The update is the model point that minimizes the chosen divergence function from the prior to the manifold of the model.

In the context of Maximum Likelihood Estimation (MLE) the model is usually well known, the dimension of the model can be kept low, and properties of the model can be used to ease the calculations. It can then happen that the model is misspecified [1] and that the update is only a good approximation of the empirical data. Here, in contrast, the model is dictated by the newly acquired empirical data and the update is forced to reproduce the measured data.

In Bayesian statistics the update q(B) of the probability p(B) of an event B equals

    q(B) = p^emp(A) p(B|A) + p^emp(A^c) p(B|A^c).

The quantities p^emp(A) and p^emp(A^c) are the empirical probabilities obtained after repeated measurement of the event A and its complement A^c. This result is also obtained when minimizing the Hellinger distance between the prior and the model manifold. A proof of the latter statement follows in Section 4.
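As a quick numerical illustration of the Bayesian update formula above, the following sketch (Python; all probabilities are invented for the purpose of illustration) computes q(B) from a prior and an observed frequency of the event A.

```python
# Minimal numerical sketch of the Bayesian update q(B) quoted above.
# All probabilities below are invented for illustration.

p_A = 0.5            # prior probability of the event A
p_B_given_A = 0.8    # prior conditional probability p(B|A)
p_B_given_Ac = 0.2   # prior conditional probability p(B|A^c)

p_emp_A = 0.7        # empirical probability of A after repeated measurement

# Prior probability of B and its update.
p_B = p_A * p_B_given_A + (1.0 - p_A) * p_B_given_Ac
q_B = p_emp_A * p_B_given_A + (1.0 - p_emp_A) * p_B_given_Ac

print(f"prior  p(B) = {p_B:.3f}")    # 0.500
print(f"update q(B) = {q_B:.3f}")    # 0.620
```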
One incentive for starting the present work is a paper of Banerjee, Guo, and Wang [2, 3]. They consider the problem of predicting a random variable Z_1 given observations of a random variable Z_2. It is well known that the conditional expectation, as defined by Kolmogorov, is the optimal predictor. They show that this statement remains true when the metric distance is replaced by a Bregman divergence. Their result is limited to the updated expectation of random variables Z_1 that are functions of the measured random variable Z_2. It is shown in Theorem 4.2 below that a proof in a more general context yields a deviating result.

The present work is inspired by current practices in Information Geometry [4, 5, 6], where minimization of divergences is an important tool. In Statistical Physics a divergence is called a relative entropy. Its importance was noted rather late in the twentieth century, after the work of Jaynes on the maximum entropy principle [7]. Estimation in the presence of hidden variables by minimizing a divergence function is briefly discussed in Chapter 8 of [5].

The next section fixes notations. Section 3 collects some results about the squared Hellinger distance and the quadratic Bregman divergence. Section 4 discusses the optimal choice and contains Theorems 4.1 and 4.2. The proofs of the theorems can be adapted to cover the situation in which a subsequent measurement also yields information on conditional probabilities. This is done in Section 4.3. A final section summarizes the results of the paper.

2 Empirical data

Consider a probability space (X, µ). A measurable subset A of X is called an event. Its probability is denoted p(A) and is given by

    p(A) = ∫_X I_A(x) dµ(x),

where I_A(x) equals 1 when x ∈ A and 0 otherwise. The conditional expectation of a random variable f given an event A with non-vanishing probability p(A) is given by

    E_µ[f | A] = (1/p(A)) E_µ[f I_A].

The probability space (X, µ) reflects the prior knowledge of the system at hand. When new data become available an update procedure is used to select the posterior probability space. The latter is denoted (X, ν) in what follows. The corresponding probability of an event A is denoted q(A).

The outcome of repeated experiments is the empirical probability distribution of the events, denoted p^emp(A). The question at hand is then to establish a criterion for finding the update ν of the probability distribution µ that is as close as possible to µ while reproducing the empirical results.

The event A defines a partition A, A^c of the probability space (X, µ). As before, A^c denotes the complement of A in X. In what follows a slightly more general situation is considered, in which the event A is replaced by a partition (O_i)_{i=1}^n of the measure space (X, µ) into subsets with non-vanishing probability. The notations p_i and µ_i are used, with

    p_i = p(O_i)   and   dµ_i(x) = (1/p_i) I_{O_i}(x) dµ(x).    (1)

Introduce the random variable g defined by g(x) = i when x ∈ O_i. Repeated measurement of the random variable g yields the empirical probabilities

    p_i^emp = Prob^emp{x : g(x) = i}.

They may deviate from the prior probabilities p_i. In some cases one also measures the conditional probabilities p^emp(B|O_i) of some other event B, given that g(x) = i.

3 A geometric approach

In this section two divergences are reviewed: the squared Hellinger distance and the quadratic Bregman divergence.

3.1 Squared Hellinger distance

For simplicity the present section is restricted to the case that the sample space X is the real line. Given two probability measures µ and σ, both absolutely continuous w.r.t. the Lebesgue measure, the squared Hellinger distance is the divergence D_2(σ||µ) defined by

    D_2(σ||µ) = (1/2) ∫_R ( √(dσ/dx) − √(dµ/dx) )² dx.

It satisfies

    D_2(σ||µ) = 1 − ∫_R √( (dσ/dx)(dµ/dx) ) dx.
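For distributions on a finite sample space the integrals reduce to sums, and both expressions are straightforward to evaluate. The following sketch, with two invented distributions, computes the squared Hellinger distance and checks the identity above numerically; the helper name hellinger_sq is ours, not the paper's.

```python
import numpy as np

def hellinger_sq(sigma, mu):
    """Squared Hellinger distance D_2(sigma||mu) of two discrete distributions."""
    return 0.5 * np.sum((np.sqrt(sigma) - np.sqrt(mu)) ** 2)

# Two invented distributions on a five-point sample space.
mu    = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
sigma = np.full(5, 0.2)

# Check the identity D_2(sigma||mu) = 1 - sum_x sqrt(sigma(x) mu(x)).
d2  = hellinger_sq(sigma, mu)
alt = 1.0 - np.sum(np.sqrt(sigma * mu))
assert np.isclose(d2, alt)
print(d2)
```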
Let (O_i)_i be a partition of (X, µ) and let g(x) = i when x belongs to O_i, as before. Let p_i and µ_i be defined by (1). Consider the following functions of i, with i in {1, ..., n}:

    τ^(1)(i) = µ, independent of i,
    τ^(2)(i) = µ_i,
    τ^(3)(i) = σ_i,

where each of the σ_i is a probability distribution with support in O_i. The empirical expectation of a function f(i) is given by

    E_emp f = ∑_i p_i^emp f(i).

Proposition 3.1 If p_i^emp > 0 for all i and ∑_i p_i^emp = 1 then one has

    E_emp D_2(τ^(1)||τ^(3)) ≥ E_emp D_2(τ^(1)||τ^(2)),

with equality if and only if σ_i = µ_i for all i.

First prove the following two lemmas.

Lemma 3.2 Assume that the probability measure ν_i is absolutely continuous w.r.t. the measure µ_i, with Radon–Nikodym derivative given by dν_i(x) = f_i(x) dµ_i(x). Then one has

    D_2(µ||σ_i) − D_2(µ||ν_i) = √(p_i) [ D_2(µ_i||σ_i) − D_2(µ_i||ν_i) ]

and

    D_2(µ_i||ν_i) = 1 − ∫_{O_i} √(f_i(x)) dµ_i(x).

Proof One calculates

    D_2(µ||σ_i) − D_2(µ||ν_i)
      = ∫_R √(dµ/dx) [ √(dν_i/dx) − √(dσ_i/dx) ] dx
      = √(p_i) ∫_{O_i} √(dµ_i/dx) [ √(dν_i/dx) − √(dσ_i/dx) ] dx
      = √(p_i) [ ∫_{O_i} √(f_i(x)) dµ_i(x) − ∫_{O_i} ( (dµ_i/dx)(dσ_i/dx) )^{1/2} dx ]
      = √(p_i) [ ∫_{O_i} √(f_i(x)) dµ_i(x) − 1 + D_2(µ_i||σ_i) ].

Now take σ_i = ν_i to obtain the desired results. □

Lemma 3.3 (Pythagorean relation) For any i one has

    D_2(µ||σ_i) = D_2(µ||µ_i) + √(p_i) D_2(µ_i||σ_i).

Proof The proof follows by taking ν_i = µ_i in the previous lemma. □

Proof of Proposition 3.1 From the previous lemma it follows that D_2(τ^(1)||τ^(3)) ≥ D_2(τ^(1)||τ^(2)). Note that σ_i = µ_i implies that τ^(3) = τ^(2) and hence D_2(τ^(1)||τ^(3)) = D_2(τ^(1)||τ^(2)).

Conversely, if E_emp D_2(τ^(1)||τ^(3)) = E_emp D_2(τ^(1)||τ^(2)) then it follows from the previous lemma that E_emp D_2(τ^(2)||τ^(3)) = 0. If in addition p_i^emp > 0 for all i then it follows that for all i

    0 = D_2(τ^(2)(i)||τ^(3)(i)).

Because the squared Hellinger distance is a divergence this implies that τ^(2)(i) = τ^(3)(i), which is equivalent with µ_i = σ_i. □

3.2 Bregman divergence

In the present section the squared Hellinger distance, which is an f-divergence, is replaced by a divergence of the Bregman type. In addition, let X be a finite set. Then there exists for each of the elements O_i of the partition of X a counting measure ρ_i such that

    ρ_i(x) = 1/|O_i| if x ∈ O_i,
           = 0 otherwise.    (2)

Fix a strictly convex function φ : R → R. The Bregman divergence of the probability measures σ and µ is defined by

    D_φ(σ||µ) = ∑_x [ φ(σ(x)) − φ(µ(x)) − (σ(x) − µ(x)) φ′(µ(x)) ].

In the case that φ(x) = x²/2, which is used below, it becomes

    D_φ(σ||µ) = (1/2) ∑_x [ σ(x) − µ(x) ]².    (3)

For convenience, this case is referred to as the quadratic Bregman divergence.

The following result, obtained with the quadratic Bregman divergence, is more elegant than the result of Lemma 3.3.

Proposition 3.4 Consider the quadratic Bregman divergence D_φ as given by (3). Let ν_i = p_i µ_i + (1 − p_i) ρ_i. Let σ_i be any probability measure with support in O_i. Then the following Pythagorean relation holds:

    D_φ(µ||σ_i) = D_φ(µ||ν_i) + D_φ(ν_i||σ_i).

Proof One calculates

    D_φ(µ||σ_i) − D_φ(µ||ν_i)
      = D_φ(ν_i||σ_i) + ∑_x [ µ(x) − ν_i(x) ] [ φ′(ν_i(x)) − φ′(σ_i(x)) ]
      = D_φ(ν_i||σ_i) + ∑_{x∈O_i} [ p_i µ_i(x) − ν_i(x) ] [ φ′(ν_i(x)) − φ′(σ_i(x)) ]
      = D_φ(ν_i||σ_i) − (1 − p_i) (1/|O_i|) ∑_{x∈O_i} [ φ′(ν_i(x)) − φ′(σ_i(x)) ].

Now use that φ′(u) = u and the normalization of the probability measures ν_i and σ_i to find the desired result. □
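The Pythagorean relation of Proposition 3.4 is easy to verify numerically. The sketch below uses an invented four-point sample space with a single cell O_i = {0, 1}; the variable names are ours.

```python
import numpy as np

def bregman_quad(a, b):
    """Quadratic Bregman divergence D_phi(a||b), equation (3)."""
    return 0.5 * np.sum((a - b) ** 2)

# Invented example: X has four points, the cell O_i is {0, 1} with p_i = 0.6.
mu      = np.array([0.4, 0.2, 0.3, 0.1])        # prior on X
p_i     = 0.6
mu_i    = np.array([0.4, 0.2, 0.0, 0.0]) / p_i  # prior conditioned on O_i, eq. (1)
rho_i   = np.array([0.5, 0.5, 0.0, 0.0])        # counting measure on O_i, eq. (2)
nu_i    = p_i * mu_i + (1.0 - p_i) * rho_i      # mixture from Proposition 3.4
sigma_i = np.array([0.7, 0.3, 0.0, 0.0])        # arbitrary measure supported in O_i

# Pythagorean relation of Proposition 3.4.
lhs = bregman_quad(mu, sigma_i)
rhs = bregman_quad(mu, nu_i) + bregman_quad(nu_i, sigma_i)
assert np.isclose(lhs, rhs)
print(lhs, rhs)  # both equal 0.10
```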
4 The optimal choice

4.1 Updated probabilities

The following result proves that the standard Kolmogorovian definition of the conditional probability minimizes the Hellinger distance between the prior probability measure µ and the updated probability measure ν. The optimal choice of the updated probability measure ν is given by corresponding probabilities q(B), which satisfy

    q(B) = ∑_{i=1}^n p_i^emp p(B|O_i)

for any event B.

Theorem 4.1 Let a partition (O_i)_{i=1}^n of the probability space (X, µ), with X = R, be given. Let µ_i be given by (1). Let p_i = p(O_i) > 0 denote the probability of the event O_i and let strictly positive empirical probabilities p_i^emp, i = 1, ..., n, be given. The squared Hellinger distance D_2(σ||µ), as a function of σ, is minimal if and only if σ_i = µ_i for all i. Here, σ is any probability measure on X satisfying

    σ = ∑_{i=1}^n p_i^emp σ_i,

and each of the σ_i is a probability measure with support in O_i and absolutely continuous w.r.t. µ_i.

Note that the probability measure ν given by

    ν(x) = ∑_{i=1}^n p_i^emp µ_i(x)

uses the Kolmogorovian conditional probability as the predictor, because the probabilities determined by the µ_i are obtained from the prior probability distribution µ by p_i(x) = p(x|O_i). By the above theorem this predictor is the optimal one w.r.t. the squared Hellinger distance.

Proof With the notations of the previous section one has

    D_2(σ||µ) = E_emp D_2(τ^(1)||τ^(3)).

Proposition 3.1 shows that it is minimal if and only if σ_i = µ_i for all i. □

Next, consider the use of the quadratic Bregman divergence in the context of a finite probability space.

Theorem 4.2 Let a partition (O_i)_{i=1}^n of the finite probability space (X, µ) be given. Let ρ_i be the counting measure on O_i defined by (2). Let µ_i be given by (1). Let p_i = p(O_i) > 0 denote the probability of the event O_i and let strictly positive empirical probabilities p_i^emp, i = 1, ..., n, summing up to 1, be given. Assume that

    p_i^emp ≥ p_i [ 1 − |O_i| µ_i(x) ]   for all x ∈ O_i and for i = 1, ..., n.    (4)

Then the following hold.

1) A probability distribution ν is defined by ν = ∑_i p_i^emp ν_i with

    ν_i = ( 1 − p_i/p_i^emp ) ρ_i + ( p_i/p_i^emp ) µ_i.

2) Let σ be any probability measure on X satisfying σ = ∑_{i=1}^n p_i^emp σ_i, where each of the σ_i is a probability distribution with support in O_i. Then the quadratic Bregman divergence satisfies the Pythagorean relation

    D_φ(σ||µ) = D_φ(ν||µ) + ∑_{i=1}^n (p_i^emp)² D_φ(σ_i||ν_i).    (5)

3) The quadratic Bregman divergence D_φ(σ||µ) is minimal if and only if σ = ν.

Proof 1) The assumption (4) guarantees that the ν_i(x) are probabilities.

2) One calculates

    D_φ(σ||µ) − D_φ(ν||µ)
      = (1/2) ∑_x [ σ(x) − ν(x) ] [ σ(x) + ν(x) − 2µ(x) ]
      = ∑_{i=1}^n p_i^emp (1/2) ∑_{x∈O_i} [ σ_i(x) − ν_i(x) ] [ p_i^emp σ_i(x) + p_i^emp ν_i(x) − 2 p_i µ_i(x) ]
      = ∑_{i=1}^n (p_i^emp)² (1/2) ∑_{x∈O_i} [ σ_i(x) − ν_i(x) ]²
        + ∑_{i=1}^n p_i^emp ∑_{x∈O_i} [ σ_i(x) − ν_i(x) ] (p_i^emp − p_i) ρ_i(x)
      = ∑_{i=1}^n (p_i^emp)² D_φ(σ_i||ν_i).

In the above calculation the third line is obtained by eliminating p_i µ_i with the help of the definition of ν_i. This gives

    p_i^emp σ_i(x) + p_i^emp ν_i(x) − 2 p_i µ_i(x)
      = p_i^emp σ_i(x) + p_i^emp ν_i(x) − 2 p_i^emp [ ν_i(x) − ( 1 − p_i/p_i^emp ) ρ_i(x) ]
      = p_i^emp [ σ_i(x) − ν_i(x) ] + 2 (p_i^emp − p_i) ρ_i(x).

The term

    ∑_{i=1}^n p_i^emp ∑_{x∈O_i} [ σ_i(x) − ν_i(x) ] (p_i^emp − p_i) ρ_i(x)

vanishes because ρ_i(x) is constant on the set O_i and the probability measures ν_i and σ_i have support in O_i.

3) From 2) it follows that D_φ(σ||µ) ≥ D_φ(ν||µ), with equality when σ = ν. Conversely, when D_φ(σ||µ) = D_φ(ν||µ) then (5) implies that

    ∑_{i=1}^n (p_i^emp)² D_φ(σ_i||ν_i) = 0.

The empirical probabilities are strictly positive by assumption. Hence, it follows that D_φ(σ_i||ν_i) = 0 for all i and hence that σ_i = ν_i for all i. The latter implies σ = ν. □
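The following sketch constructs both updates for an invented two-cell example, checks that they differ, and verifies the Pythagorean relation (5) for an arbitrary competitor σ. The numbers are chosen so that condition (4) is satisfied; all names are ours.

```python
import numpy as np

def bregman_quad(a, b):
    """Quadratic Bregman divergence D_phi(a||b), equation (3)."""
    return 0.5 * np.sum((a - b) ** 2)

# Invented example: X = {0,1,2,3}, cells O_1 = {0,1} and O_2 = {2,3}.
mu    = np.array([0.4, 0.2, 0.3, 0.1])
cells = [np.array([1, 1, 0, 0.]), np.array([0, 0, 1, 1.])]
p     = np.array([0.6, 0.4])     # prior cell probabilities p_i
p_emp = np.array([0.3, 0.7])     # observed cell frequencies; condition (4) holds

mu_i  = [ind * mu / pi for ind, pi in zip(cells, p)]   # conditional priors, eq. (1)
rho_i = [ind / ind.sum() for ind in cells]             # counting measures, eq. (2)

# Hellinger-optimal update (Theorem 4.1) and Bregman-optimal update (Theorem 4.2).
nu_hel = sum(pe * m for pe, m in zip(p_emp, mu_i))
nu_i   = [(1 - pi / pe) * r + (pi / pe) * m
          for pi, pe, r, m in zip(p, p_emp, rho_i, mu_i)]
nu_bre = sum(pe * n for pe, n in zip(p_emp, nu_i))
print(nu_hel)   # [0.2  0.1  0.525 0.175]
print(nu_bre)   # [0.25 0.05 0.45  0.25] -- the two updates differ

# Pythagorean relation (5) for an arbitrary competitor sigma.
sigma_i = [np.array([0.9, 0.1, 0, 0.]), np.array([0, 0, 0.5, 0.5])]
sigma = sum(pe * s for pe, s in zip(p_emp, sigma_i))
lhs = bregman_quad(sigma, mu)
rhs = bregman_quad(nu_bre, mu) + sum(pe ** 2 * bregman_quad(s, n)
                                     for pe, s, n in zip(p_emp, sigma_i, nu_i))
assert np.isclose(lhs, rhs)     # both equal 0.0554
```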
The optimal update ν can be written as

    ν = ∑_i [ (p_i^emp − p_i) ρ_i + p_i µ_i ] = µ + ∑_i (p_i^emp − p_i) ρ_i.

This result is in general quite different from the update proposed by Theorem 4.1, which is ν = ∑_i p_i^emp µ_i. The updates proposed by the two theorems coincide only in the special cases that either p_i^emp = p_i for all i or µ_i = ρ_i for all i. In the latter case the prior distribution µ = ∑_i p_i ρ_i is replaced by the update ν = ∑_i p_i^emp ρ_i.

The entropy of the update when the event O_i is observed equals, according to Theorem 4.1,

    S(ν_i) = S(µ_i).

According to Theorem 4.2 it equals

    S(ν_i) = S( [ 1 − p_i/p_i^emp ] ρ_i + ( p_i/p_i^emp ) µ_i ).

If p_i ≤ p_i^emp then it follows that

    S(ν_i) ≥ [ 1 − p_i/p_i^emp ] S(ρ_i) + ( p_i/p_i^emp ) S(µ_i) ≥ S(µ_i).

The former inequality follows because the entropy is a concave function. The latter follows because the entropy is maximal for the uniform distribution ρ_i. On the other hand, if p_i > p_i^emp then one has

    S(µ_i) = S( [ 1 − p_i^emp/p_i ] ρ_i + ( p_i^emp/p_i ) ν_i )
           ≥ [ 1 − p_i^emp/p_i ] S(ρ_i) + ( p_i^emp/p_i ) S(ν_i)
           ≥ S(ν_i).

In the latter case the decrease of the entropy is stronger than in the case of the update based on the squared Hellinger distance. In conclusion, the update relying on the quadratic Bregman divergence loses details of the prior distribution by making a convex combination with a uniform distribution, weighted with the probabilities of the observation. It does so more strongly for the events with observed probability larger than predicted, i.e. when p_i^emp > p_i.

Note that Theorem 4.2 cannot always be applied because it imposes restrictions on the empirical probabilities. In particular, if the prior probability µ(x) of some point x in X vanishes, then condition (4) requires that the empirical probability p_i^emp of the partition cell O_i to which the point x belongs is larger than or equal to the prior probability p_i.
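The entropy comparison can be illustrated with a small computation. In the sketch below (invented numbers; both choices of p_i^emp satisfy condition (4)) the Bregman update has larger entropy than µ_i when p_i^emp > p_i and smaller entropy when p_i^emp < p_i.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Invented cell with three points and a non-uniform conditional prior mu_i.
mu_i  = np.array([0.7, 0.2, 0.1])
rho_i = np.full(3, 1.0 / 3.0)     # uniform distribution on the cell
p_i   = 0.2                       # prior probability of the cell

# Observed frequencies above and below p_i; both satisfy condition (4).
for p_emp in (0.4, 0.15):
    nu_i = (1.0 - p_i / p_emp) * rho_i + (p_i / p_emp) * mu_i
    print(f"p_emp={p_emp}: S(nu_i)={entropy(nu_i):.3f}, S(mu_i)={entropy(mu_i):.3f}")
# p_emp = 0.4  > p_i: nu_i is smoothed toward uniform, so S(nu_i) > S(mu_i);
# p_emp = 0.15 < p_i: the coefficient of rho_i is negative, nu_i is sharper
# than mu_i, and S(nu_i) < S(mu_i).
```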
4.2 Update of conditional probabilities

The two previous theorems assume that no empirical information is available about conditional probabilities. If such information is present then an optimal choice should make use of it. In one case the solution of the problem is straightforward: if the probabilities p_i^emp are available together with all conditional probabilities p^emp(B|O_i), and there exists an update ν which reproduces these results, then it is unique. Two cases remain: 1) the information about the conditional probabilities is incomplete; 2) the information is internally inconsistent, so that no update exists which reproduces the data.

Let us tackle the problem by considering the case that the only information available besides the probabilities p_i^emp is the vector of conditional probabilities p^emp(B|O_i) of a fixed event B, given the outcome of the measurement of the random variable g as introduced in Section 2. The following result is independent of the choice of divergence function.

Proposition 4.3 Fix an event B in X. Assume that the conditional probabilities p(B|O_i), i = 1, ..., n, are strictly positive and strictly less than 1. Assume in addition that p_i^emp p^emp(B|O_i) ≤ 1 for all i. Then there exists an update ν with corresponding probabilities q(·) such that

    q(O_i) = p_i^emp and q(B|O_i) = p^emp(B|O_i), i = 1, ..., n.

Proof An obvious choice is to take ν of the form ν = ∑_i p_i^emp ν_i with ν_i of the form

    dν_i(x) = [ a_i I_{B∩O_i}(x) + b_i I_{B^c∩O_i}(x) ] dµ(x),

with a_i ≥ 0 and b_i ≥ 0. Normalization of the ν_i gives the conditions

    1 = a_i p(B∩O_i) + b_i p(B^c∩O_i).    (6)

Reproduction of the conditional probabilities gives the conditions

    p^emp(B|O_i) = q(B∩O_i)/q(O_i) = a_i p(B∩O_i)/p_i^emp.

The latter gives

    a_i = (p_i^emp/p_i) · p^emp(B|O_i)/p(B|O_i).

The normalization condition (6) becomes

    1 = p_i^emp p^emp(B|O_i) + b_i p(B^c∩O_i).

It has a positive solution for b_i because p_i^emp p^emp(B|O_i) ≤ 1 and p(B^c∩O_i) > 0. □

4.3 The Hellinger case

The optimal updates can be derived easily from Theorem 4.1. Double the partition by introducing the sets

    O_i^+ = B ∩ O_i and O_i^- = B^c ∩ O_i.

They have prior probabilities p_i^± = p(O_i^±). Corresponding prior measures µ_i^± are defined by

    dµ_i^±(x) = (1/p_i^±) I_{O_i^±}(x) dµ(x).

The empirical probability of the set O_i^+ is taken equal to p_i^emp p^emp(B|O_i); that of O_i^- equals p_i^emp [ 1 − p^emp(B|O_i) ]. The optimal update ν follows from Theorem 4.1 and is given by

    dν(x) = ∑_i p_i^emp p^emp(B|O_i) dµ_i^+(x) + ∑_i p_i^emp [ 1 − p^emp(B|O_i) ] dµ_i^-(x).    (7)

By construction one has q(O_i^+) = p_i^emp p^emp(B|O_i) and q(O_i^-) = p_i^emp [ 1 − p^emp(B|O_i) ]. One now verifies that q(O_i) = p_i^emp and q(B|O_i) = p^emp(B|O_i), which is the intended result.

4.4 The Bregman case

Next consider the optimization with the quadratic Bregman divergence. Probability distributions ρ_i^± are defined by

    ρ_i^±(x) = (1/|O_i^±|) I_{O_i^±}(x).

Introduce the notations

    r_i^+ = p_i^+ / ( p_i^emp p^emp(B|O_i) ),
    r_i^- = p_i^- / ( p_i^emp [ 1 − p^emp(B|O_i) ] ),
    ν_i^±(x) = ( 1 − r_i^± ) ρ_i^±(x) + r_i^± µ_i^±(x).

The condition for Theorem 4.2 to hold is then that ν_i^±(x) ≥ 0 for all x and i. The optimal probability distribution ν is given by

    ν(x) = ∑_i p_i^emp p^emp(B|O_i) ν_i^+(x) + ∑_i p_i^emp [ 1 − p^emp(B|O_i) ] ν_i^-(x)
         = ∑_i [ p_i^emp p^emp(B|O_i) − p_i^+ ] ρ_i^+(x) + ∑_i p_i^+ µ_i^+(x)
           + ∑_i [ p_i^emp (1 − p^emp(B|O_i)) − p_i^- ] ρ_i^-(x) + ∑_i p_i^- µ_i^-(x)
         = ∑_i p_i^emp p^emp(B|O_i) [ ρ_i^+(x) − ρ_i^-(x) ] − ∑_i p_i^+ ρ_i^+(x) + ∑_i [ p_i^emp − p_i^- ] ρ_i^-(x) + µ(x).
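On a finite sample space the Hellinger-case update (7) can be realized in a few lines. The following sketch doubles an invented two-cell partition with an event B and checks that the resulting ν reproduces both the observed cell frequencies and the observed conditional frequencies; all data and names are invented.

```python
import numpy as np

# Invented finite example for the update (7): X = {0,...,5}.
mu     = np.array([0.1, 0.2, 0.2, 0.1, 0.2, 0.2])
O      = [np.array([1, 1, 1, 0, 0, 0], bool), np.array([0, 0, 0, 1, 1, 1], bool)]
B      = np.array([1, 1, 0, 1, 0, 0], bool)
p_emp  = np.array([0.5, 0.5])    # observed cell frequencies
pB_emp = np.array([0.4, 0.7])    # observed conditional frequencies p_emp(B|O_i)

nu = np.zeros_like(mu)
for Oi, pe, pBe in zip(O, p_emp, pB_emp):
    # Doubled partition: O_i^+ = B n O_i and O_i^- = B^c n O_i.
    for mask, weight in ((Oi & B, pe * pBe), (Oi & ~B, pe * (1 - pBe))):
        mu_pm = np.where(mask, mu, 0.0)
        nu += weight * mu_pm / mu_pm.sum()   # weight times conditional prior

# The update reproduces the empirical data, as stated below (7).
for Oi, pe, pBe in zip(O, p_emp, pB_emp):
    q_Oi = nu[Oi].sum()
    q_B_given_Oi = nu[Oi & B].sum() / q_Oi
    assert np.isclose(q_Oi, pe) and np.isclose(q_B_given_Oi, pBe)
print(nu)
```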
5 Summary

It is well known that the use of unmodified prior conditional probabilities is the optimal way to update a probability distribution after new data become available. The update procedure minimizes the Hellinger distance between the prior and posterior probability distributions. For the sake of completeness a proof is given in Theorem 4.1.

In the context of the present research the work of Banerjee, Guo and Wang [2] was considered as well. They prove that minimization of the Hellinger distance can be replaced by minimization of a Bregman divergence without modifying the outcome. However, their proof is restricted to random variables that are functions of the measured random variable. It is shown in Theorem 4.2 that a more general context yields results quite distinct from those obtained by the usual procedure.

References

[1] H. White, Maximum Likelihood Estimation of Misspecified Models, Econometrica 50, 1–25 (1982).

[2] A. Banerjee, X. Guo, H. Wang, On the Optimality of Conditional Expectation as a Bregman Predictor, IEEE Trans. Inf. Th. 51, 2664–2669 (2005).

[3] B.A. Frigyik, S. Srivastava, M.R. Gupta, Functional Bregman Divergences and Bayesian Estimation of Distributions, IEEE Trans. Inf. Th. 54, 5130–5139 (2008).

[4] S. Amari, H. Nagaoka, Methods of Information Geometry (Oxford University Press, 2000). Originally published in Japanese by Iwanami Shoten, Tokyo, 1993.

[5] S. Amari, Information Geometry and its Applications (Springer, 2016).

[6] N. Ay, J. Jost, H. Vân Lê, L. Schwachhöfer, Information Geometry (Springer, 2017).

[7] E. Jaynes, Information theory and statistical mechanics, Phys. Rev. 106, 620–630 (1957).
