Information field theory
Authors: Torsten Enßlin
Information field theory

Torsten Enßlin
Max Planck Institute for Astrophysics, Karl-Schwarzschildstr. 1, 85741 Garching bei München, Germany
http://www.mpa-garching.mpg.de/ift

Abstract. Non-linear image reconstruction and signal analysis deal with complex inverse problems. To tackle such problems in a systematic way, I present information field theory (IFT) as a means of Bayesian, data-based inference on spatially distributed signal fields. IFT is a statistical field theory, which permits the construction of optimal signal recovery algorithms even for non-linear and non-Gaussian signal inference problems. IFT algorithms exploit spatial correlations of the signal fields and benefit from techniques developed to investigate quantum and statistical field theories, such as Feynman diagrams, re-normalisation calculations, and thermodynamic potentials. The theory can be used in many areas, and applications in cosmology and numerics are presented.

Keywords: INFORMATION THEORY, FIELD THEORY, IMAGE RECONSTRUCTION
PACS: 87.19.lo, 11.10.-z, 42.30.Wb

INFORMATION FIELD THEORY

Field inference

A physical field is a function over some continuous space. The air temperature over Europe, the magnetic field within the Milky Way, or the dark matter density in the Universe are all fields we might want to know as accurately as possible. Fortunately, we have measurement devices delivering us data on these fields. But the data is always finite in size, whereas any field has an infinite number of degrees of freedom, the field values at all locations of the continuous space the field is living in. Since it is impossible to determine an infinite number of unknowns from a finite number of constraints, an exact field reconstruction from the data alone is impossible. Additional information¹ is needed.

Additional information might be available in the form of physical laws, statistical symmetries, or smoothness properties known to be obeyed by the field.
A unique field reconstruction might still be impossible, but the configuration space of possible field realizations might be sufficiently constrained to single out a good guess for the field.

The combination of data and additional information is preferentially done in an information-theoretically correct way by using probabilistic logic. Information field theory (IFT) is therefore information theory applied to fields, Bayesian reasoning with an infinite number of unknowns [2, 3]. For a physicist, it is just a statistical field theory, as we will see, and can borrow many concepts and techniques developed for such. Mathematically, it deals with stochastic functions and processes and benefits from the theory of Gauss, Markov, Lévy, and other random processes.

¹ Information is understood here in its original and colloquial meaning to give form to the mind, or "Information is whatever forces a change of rational beliefs" [1]. Mathematically, information theory is just probability theory. In some contexts, but not here, negative entropy is called information as well, although it is rather a measure of the amount of information than information itself.

The main difference of IFT to the usual Bayesian inference is that the continuity of the physical space plays a special role. The fact that many physical fields do not exhibit arbitrary roughness, due to their causal origins, implies that field values at nearby locations are similar, and typically more so the closer the locations are. The consequent exploitation of any knowledge of the field correlation structure permits us to overcome the ill-posedness of the field reconstruction problem.

Path integrals

Probabilistic reasoning requires that probability density functions (PDFs) can properly be defined over the space of all possibilities [4]. The configuration space of a field is of infinite dimensionality, since every location in space carries a field degree of freedom.
A little bit of thought is therefore needed on how to deal with PDFs over functional spaces before we can use probabilistic logic for field inference.

Let s = (s_x)_x be our unknown signal field living on some physical space Ω = {x}_x; e.g., s might be a real- or complex-valued function s: Ω → ℝ or ℂ. The configuration space of s could be constructed if the set of physical locations in space were finite, say of size N with Ω = {x_1, ..., x_N}. Then the field values at these locations would form a finite-dimensional vector s = (s_{x_1}, ..., s_{x_N}) ≡ (s_i)_{i=1}^N, and the configuration space would be just the space of such vectors. We could then define any PDF on this vector space, like a signal prior P(s). This would also permit us to calculate configuration space integrals, like the signal prior expectation value of any function f(s) of the discretized signal,

\langle f(s)\rangle_{(s)} \equiv \int \mathcal{D}s\, f(s)\, P(s) \equiv \left(\prod_{i=1}^{N} \int ds_i\right) f(s)\, P(s). \qquad (1)

Now, we just have to require that the continuous limit of this discretization is possible, yielding a path integral. This requires on the one hand that our space discretization gets finer everywhere with N → ∞, and on the other hand that all the involved quantities (s, f(s), P(s)) behave well under this limit. The latter just implies that any reasonable expectation value ⟨f(s)⟩_(s) should not depend on the discretization resolution if the resolution is chosen sufficiently high. Thus, the definitions of the quantities s, f(s), and P(s) cannot depend on any grid-specific properties and must be possible in the continuum limit.

We turn the last requirement into a design property: An information field theory is defined over continuous space. Space discretization can be done in a second step, if needed in order to do inference on a computer². However, the theory shall not contain any discretization-specific element.
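To make Eq. (1) concrete, here is a minimal sketch that discretizes a 1-d signal domain and estimates a prior expectation value ⟨f(s)⟩ by Monte Carlo sampling rather than explicit N-dimensional quadrature. The squared-exponential covariance and the functional f are illustrative toy choices, not from the paper.

```python
import numpy as np

# Discretize a 1-d signal domain into N pixels: the path integral of Eq. (1)
# becomes an ordinary N-dimensional integral, estimated here by Monte Carlo
# sampling from the prior instead of explicit quadrature.
rng = np.random.default_rng(0)
N = 64
x = np.linspace(0.0, 1.0, N)

# Illustrative prior: zero-mean Gaussian with a smooth squared-exponential
# covariance of correlation length 0.1 (a toy choice).
S = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2) + 1e-6 * np.eye(N)
L = np.linalg.cholesky(S)

def f(s):
    """Example functional of the field: its mean square value."""
    return np.mean(s**2)

# <f(s)>_(s) ~ (1/M) sum_k f(s_k) with samples s_k ~ G(s, S)
M = 20000
samples = rng.standard_normal((M, N)) @ L.T
estimate = np.mean([f(s) for s in samples])

# Analytic reference: <s_x^2> = S_xx ~ 1, hence <f(s)> ~ 1.
print(estimate)
```

Doubling N while keeping the covariance fixed leaves the estimate essentially unchanged, which is exactly the resolution independence demanded in the text.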
This distinguishes IFT from many other proposed methods for field inference, Bayesian or not, since these often have definitions tightly linked to specific space discretizations, e.g. by using concepts like pixel statistics and nearest-pixel field differences. The inference results of such methods might depend on the chosen space discretization and might not be resolution independent. For IFT, we require that given a sufficiently high spatial resolution, the solution shall not change significantly with further resolution increase or with a rotation of the computational grid.

Dealing with an infinite number of degrees of freedom, we should not be surprised about mathematical objects in IFT that are infinite (e.g. configuration space volumes, entropies) or zero (e.g. properly normalized field PDFs) in the continuous limit. As long as the quantity we are interested in is well defined in the continuous limit (e.g. the posterior mean field), we should not worry too much, since divergences of auxiliary quantities are well known in field theory and usually harmless. Frequently, only the well-behaved differences or ratios of such unbounded objects are of actual interest (relative entropies, energy differences).

It is most instructive to see how IFT works in a concrete example. We therefore turn now to the simplest possible case.

Free theory

Information Hamiltonian

Suppose we are interested in a zero-mean random field s, our signal, over continuous u-dimensional Euclidean space Ω = ℝ^u. The a priori field knowledge might be that the field follows homogeneous and isotropic Gaussian statistics,

P(s) = \mathcal{G}(s, S) = \frac{1}{\sqrt{|2\pi S|}} \exp\!\left(-\frac{1}{2}\, s^\dagger S^{-1} s\right), \qquad (2)

with the field covariance matrix S = \langle s\, s^\dagger \rangle_{(s)} being known if the field power spectrum is known from some physical considerations. E.g., the field might be the cosmic density field for which, given a cosmological model, the power spectrum can be calculated theoretically.
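Sampling from a prior like Eq. (2) is straightforward when the covariance is diagonalized by Fourier modes. The sketch below assumes a periodic 1-d grid and a hypothetical smooth power spectrum P(k) ∝ 1/(1 + (k/8)²); both are illustrative choices made here, not taken from the paper.

```python
import numpy as np

# Draw samples from a homogeneous, isotropic Gaussian prior as in Eq. (2)
# on a periodic 1-d grid. For such statistics the covariance S is diagonal
# in Fourier space, with eigenvalues given by the power spectrum P(k).
rng = np.random.default_rng(1)
N = 256
k = np.fft.fftfreq(N) * N                 # integer wave numbers
P = 1.0 / (1.0 + (k / 8.0) ** 2)          # hypothetical smooth power spectrum

def sample_field():
    w = rng.standard_normal(N)            # white noise, covariance = identity
    s_hat = np.sqrt(P) * np.fft.fft(w)    # coloring: covariance -> F^-1 diag(P) F
    return np.fft.ifft(s_hat).real

# Consistency check: the prior pixel variance equals the mean of P(k).
samples = np.array([sample_field() for _ in range(4000)])
print(samples.var(), P.mean())
```

The same fast-Fourier construction carries over to higher dimensions, which is what makes homogeneous priors computationally convenient.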
The field s is here regarded as a vector from a function vector space (the configuration space of s) with the scalar product

s^\dagger j = \int_\Omega dx\, s_x\, j_x. \qquad (3)

The determinant |S| is of course poorly defined in the continuum limit, but it is a perfectly sensible quantity in any finite space discretization. Since we only use |S| to ensure proper normalization of P(s), whereas our interest is in inferring s, there is nothing to worry about.

² A code to handle this discretization properly is NIFTY, Numerical Information Field Theory.

Our measured data set d = (d_i)_i = (d_1, d_2, ...) enters the game via a data model. In the simplest case of a linear measurement, the data is

d = R\,s + n \qquad (4)

with (Rs)_i = \int dx\, R_{ix}\, s_x being the signal response and n = (n_i)_i = (n_1, ...) being the noise. The response operator R encodes the point spread function of our instrument, the scanning strategy of the used telescope, and any (linear) operation done on the data, like a Fourier transformation in case we measure with an interferometer. The noise shall here also obey Gaussian zero-mean statistics with known covariance N = \langle n\, n^\dagger \rangle_{(n)} (now with the data space scalar product n^\dagger d = \sum_i n_i d_i), so that the data likelihood given the signal is

P(d|s) = \mathcal{G}(d - Rs,\, N). \qquad (5)

Now the signal field posterior can be constructed via Bayes' theorem,

P(s|d) = \frac{P(d|s)\, P(s)}{P(d)} \equiv \frac{e^{-H(d,s)}}{Z_d}, \qquad (6)

where we just defined the information Hamiltonian and its partition function,

H(d,s) \equiv -\ln P(d,s) = -\ln P(d|s) - \ln P(s) \quad \text{and} \qquad (7)

Z_d \equiv \int \mathcal{D}s\, e^{-H(d,s)} = \int \mathcal{D}s\, P(d,s) = P(d), \qquad (8)

in order to translate Bayesian language into that of statistical field theory. Thus, we can use any technique developed for such in order to do our signal inference.
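The information Hamiltonian of Eqs. (6)-(8) is defined only up to s-independent constants, which cancel between numerator and partition function. A minimal sketch (toy covariances and a hypothetical every-second-pixel response, all assumed for illustration) checks this for the linear Gaussian model:

```python
import numpy as np

# Toy discretization of the linear Gaussian model d = R s + n and its
# information Hamiltonian H(d,s) = -ln P(d|s) - ln P(s), cf. Eq. (7).
rng = np.random.default_rng(2)
N_pix = 32
x = np.linspace(0.0, 1.0, N_pix)
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)   # toy signal covariance
R = np.eye(N_pix)[::2]                               # response: every 2nd pixel observed
N_cov = 0.01 * np.eye(R.shape[0])                    # noise covariance N

s_true = np.linalg.cholesky(S + 1e-10 * np.eye(N_pix)) @ rng.standard_normal(N_pix)
d = R @ s_true + 0.1 * rng.standard_normal(R.shape[0])

def neg_log_gauss(r, C):
    """-ln G(r, C), including the normalization constant."""
    _, logdet = np.linalg.slogdet(2 * np.pi * C)
    return 0.5 * (logdet + r @ np.linalg.solve(C, r))

def H_full(s):   # Eq. (7), with all constants kept
    return neg_log_gauss(d - R @ s, N_cov) + neg_log_gauss(s, S)

def H_quad(s):   # the s-dependent quadratic part alone, constants dropped
    return 0.5 * (d - R @ s) @ np.linalg.solve(N_cov, d - R @ s) \
         + 0.5 * s @ np.linalg.solve(S, s)

# Both versions differ only by an s-independent constant:
s1, s2 = rng.standard_normal(N_pix), rng.standard_normal(N_pix)
print(H_full(s1) - H_full(s2), H_quad(s1) - H_quad(s2))
```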
Wiener filter

For our specific linear and Gaussian measurement problem, the Hamiltonian

H(d,s) \,\hat{=}\, \frac{1}{2}(d - Rs)^\dagger N^{-1}(d - Rs) + \frac{1}{2}\, s^\dagger S^{-1} s \qquad (9)

is quadratic in s. We have dropped here irrelevant s-independent terms, as indicated by "\hat{=}". This Hamiltonian can be brought into the canonical form

H(d,s) \,\hat{=}\, \frac{1}{2}(s - m)^\dagger D^{-1}(s - m) \qquad (10)

via quadratic completion, where m = Dj, D = (S^{-1} + R^\dagger N^{-1} R)^{-1}, and j = R^\dagger N^{-1} d. This implies that the signal posterior is Gaussian with mean m = \langle s \rangle_{(s|d)} and covariance D = \langle (s-m)(s-m)^\dagger \rangle_{(s|d)},

P(s|d) = \mathcal{G}(s - m,\, D), \qquad (11)

a result well known in Wiener filter theory of signal reconstruction [5].

In field-theoretical language, the data-dependent j is an information source field, which excites our knowledge of s to be non-zero (as the preferred prior value was). The Wiener variance D plays two distinct roles. On the one hand it is the susceptibility of our mean field m to the force of the information source j, since m = Dj; on the other hand it describes the remaining a posteriori uncertainty D = \langle (s-m)(s-m)^\dagger \rangle_{(s|d)}. Furthermore, D is the information propagator, since D_{xy} transports the information source at location y to the location x of interest in m_x = (Dj)_x = \int dy\, D_{xy}\, j_y. In practice, one will use an iterative linear algebra method like the conjugate gradient method to solve the equation D^{-1} m = j numerically for m on a computer [6].

Interacting theory

Interaction Hamiltonian

If any of the assumptions of our Wiener filter theory scenario is violated, in that the signal response is non-linear, the field or the noise is non-Gaussian, the noise variance depends on the signal, or the noise or signal covariances are unknown and have to be determined from the data itself, the resulting information Hamiltonian will contain anharmonic terms.
These terms couple the different eigenmodes of the information propagator and lead to an interacting field theory. In many cases the Hamiltonian can be Taylor-Fréchet expanded as

H(d,s) = -\ln P(d,s) = \underbrace{H_0 - j^\dagger s + \frac{1}{2}\, s^\dagger D^{-1} s}_{H_{\mathrm{free}}} + \underbrace{\sum_{i=3}^{\infty} \frac{1}{i!} \int dx_1 \cdots dx_i\, \Lambda^{(i)}_{x_1 \ldots x_i}\, s_{x_1} \cdots s_{x_i}}_{H_{\mathrm{int}}}, \qquad (12)

and thereby split into a free (H_free) and an interaction (H_int) part. Let us assume that the interaction terms are small. This can often be achieved, e.g., by shifting the field values to s' = s - s_cl, where s_cl, the minimum of the Hamiltonian, is the classical field, or in inference language, the maximum a posteriori estimator. Expanding H(d, s') = H(d, s = s_cl + s') around s' = 0 then often ensures small interaction terms around the origin. In this case, it is possible to expand the mean field value, or any other quantity of interest, around its free-theory value. Since the terms of such an expansion can become numerous and complex, this is best done diagrammatically.

Feynman diagrams

Feynman diagrams provide a diagrammatic expansion to calculate field expectation values perturbatively. We are not explaining here how they work in detail, which for IFT is detailed in [3]. We rather stress the important point that the main elements of the diagrams, the lines connecting source points and interaction vertices, are just applications of the propagator D. Since this can be done numerically for the free theory/Wiener filter case, we are already equipped with the necessary computational tools to calculate more complex diagrams. For example, the mean field of an interacting theory might be

m = \langle s \rangle_{(s|d)} = D j - \frac{1}{2}\, D\, \Lambda^{(3)}[\,\cdot\,, Dj, Dj] - \frac{1}{2}\, D\, \Lambda^{(3)}[\,\cdot\,, D] + \ldots \qquad (13)

(the corresponding Feynman diagrams are not reproduced here), where we introduced \Lambda^{(n)}[a, b, \ldots] = \int dx_1 \cdots dx_n\, \Lambda^{(n)}_{x_1 \ldots x_n}\, a_{x_1} b_{x_2} \cdots as a compact tensor notation.
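The first-order terms of Eq. (13) can be checked numerically in the simplest possible setting: a single scalar degree of freedom with a small cubic interaction H_int = (λ/3!) s³, so that Λ⁽³⁾ reduces to the number λ. All parameter values below are illustrative; the reference mean comes from brute-force one-dimensional quadrature.

```python
import numpy as np

# Scalar check of the first-order correction in Eq. (13):
# H(s) = -j s + s^2/(2 D) + (lam/3!) s^3 gives, to first order in lam,
# m ~ D j - (lam/2) D (D j)^2 - (lam/2) D^2.
D, j, lam = 0.5, 1.0, 0.05

# Brute-force posterior mean on a truncated grid; a cubic Hamiltonian is
# only meaningful perturbatively, but for small lam the truncation at the
# grid boundaries is numerically harmless here.
s = np.linspace(-10.0, 10.0, 200001)
H = -j * s + s**2 / (2.0 * D) + (lam / 6.0) * s**3
w = np.exp(-(H - H.min()))                  # unnormalized posterior weight
m_exact = (s * w).sum() / w.sum()

m_free = D * j                              # Wiener filter term D j
m_pert = m_free - 0.5 * lam * D * (D * j) ** 2 - 0.5 * lam * D**2

print(m_exact, m_pert, m_free)
```

The perturbative value lands much closer to the quadrature result than the free-theory term alone, with a residual of order λ².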
The first diagram gives the Wiener filter signal reconstruction. In the second diagram, two Wiener filter maps are combined by the Λ^(3) interaction, and then propagated to form the first non-linear correction to the Wiener filter. In the third diagram, the Wiener covariance replaces the two Wiener maps of the previous diagram, providing a correction due to the non-linearity effects on the uncertainty structure. More complex diagrams might also provide significant corrections, and then have to be calculated too. However, their computation can always be based on the linear Wiener filter case of the free theory, and is therefore possible.

Thermodynamical inference

A diagrammatic perturbation calculation leads to well-performing algorithms in case the interaction terms are small. If they are large, resummation and renormalization techniques can be used and have proven to lead to well-performing algorithms even for very non-linear measurement situations [3] or in cases where the signal covariance has to be inferred as well from the data used for the signal reconstruction [7]. These techniques can be complex, and the meaning of the results is not necessarily intuitively understood.

For the treatment of highly interacting quantum field theories, the effective action approach has proven helpful. The effective action is the Gibbs free energy G known from thermodynamics (here with temperature T = 1), and this energy has the property that the map m which minimizes it is the desired mean field m = \langle s \rangle_{(s|d)} given all constraints by the data. The Gibbs free energy is the Legendre transform of the Helmholtz free energy, which itself is (basically) the logarithm of the partition function Z_d. If we could calculate the partition function, we would be able to calculate the mean field reconstruction directly from it via differentiation with respect to the information source:

\langle s \rangle_{(s|d)} = \frac{\delta \ln Z_d}{\delta j}. \qquad (14)
Thus, at first sight, we did not win anything by reformulating the inference problem in terms of a Gibbs free energy, since this can only be calculated exactly in case we have already solved the problem. However, the Gibbs free energy can also be expressed in terms of the internal energy U = \langle H(d,s) \rangle_{(s|d)} = \int \mathcal{D}s\, P(s|d)\, H(d,s) and the Boltzmann entropy S_B = -\int \mathcal{D}s\, P(s|d) \ln P(s|d) as

G = U - T S_B. \qquad (15)

This allows for a convenient approximative scheme: we replace P(s|d) in the above definitions with an approximative Gaussian surrogate \mathcal{G}(s - m, D) (except for the Hamiltonian in U), with mean m and dispersion D still to be determined. This replacement turns the definitions of U and S_B into Gaussian integrals, which can often be calculated analytically, e.g. S_B \approx \frac{1}{2} \mathrm{tr}(1 + \ln(2\pi D)).

Minimizing the resulting Gibbs free energy with respect to the unknown m and D then gives equations determining these quantities approximatively. This method of thermodynamical inference has proven to reproduce previously found results from renormalization and resummation calculations with much less effort [8]. It was also very useful in developing novel algorithms, e.g. to deal with the problem of reconstructing a Gaussian signal field where the signal covariance is unknown but spectral smoothness can be assumed [9], or where both the signal and the noise covariance were not known [10]. The resulting algorithm, named extended critical filter, was successfully used for a reconstruction of the Galactic Faraday rotation sky signal [11].

It is interesting to note that this minimal Gibbs free energy is equivalent to a minimal Kullback-Leibler distance of \mathcal{G}(s - m, D) to P(s|d), or to Maximum Entropy for \mathcal{G}(s - m, D) with P(s|d) as the prior distribution [8]. Thus information theory has basically reformulated methods developed earlier in thermodynamics, e.g. see [1].
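The Gaussian entropy used in this scheme, S_B ≈ ½ tr(1 + ln(2πD)), can be verified by Monte Carlo for a toy surrogate G(s − m, D); the mean and covariance below are random illustrative choices.

```python
import numpy as np

# Monte Carlo check of the Gaussian Boltzmann entropy
# S_B = -<ln G(s - m, D)> = 1/2 tr(1 + ln(2 pi D)).
rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((n, n))
D = A @ A.T + n * np.eye(n)                 # a random SPD uncertainty matrix
m = rng.standard_normal(n)

_, logdet = np.linalg.slogdet(2 * np.pi * D)
entropy_formula = 0.5 * (n + logdet)        # = 1/2 tr(1 + ln(2 pi D))

# Estimate -<ln G> from samples of G(s - m, D).
L = np.linalg.cholesky(D)
samples = m + rng.standard_normal((100000, n)) @ L.T
r = samples - m
quad = np.sum(r * np.linalg.solve(D, r.T).T, axis=1)   # (s-m)^T D^-1 (s-m)
entropy_mc = 0.5 * logdet + 0.5 * quad.mean()

print(entropy_mc, entropy_formula)
```

The agreement is exact up to Monte Carlo noise, since the quadratic form has expectation n under the surrogate; this is what makes the Gaussian replacement of P(s|d) analytically convenient.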
APPLICATIONS

As the general theory of signal field inference, IFT has vast applications, of which I want to mention a few listed at www.mpa-garching.mpg.de/ift.

Cosmic magnetism studies have already been mentioned. IFT was here used to construct Galactic Faraday rotation maps from noisy data with unreliable noise information [11]. The resulting maps can be analysed in order to test for helicity in Galactic magnetic fields [12, 13].

Cosmography is the 3-d cartography of the Cosmos. The main landmarks are the abundant galaxies tracing the filamentary and knotty distribution of dark matter in space. Initial studies used Wiener filtering [6, 14], later ones the log-normal-Poisson model [3, 15, 16, 17, 18], whereas the latest use the evolution of the Gaussian initial conditions into the observed density field [19, 20].

Cosmic Microwave Background (CMB) studies are particularly well suited for IFT, since the CMB temperature statistics is very Gaussian. The weak non-Gaussianity is scientifically extremely interesting, since it is one of the few characteristic signatures of the inflationary epoch. An IFT data filter to search for such non-Gaussianity reproduces already known non-Gaussianity detection methods, while transferring them into a Bayesian setting [3]. Cross-correlation studies of CMB and cosmic structure are also conveniently formulated in an IFT language [21, 22, 23].

Stochastic estimation methods are widespread in numerics. For example, the diagonals and traces of complex numerical operators on high-dimensional function spaces (like the propagator D of IFT) are often calculated approximatively via stochastic probing. However, the real-space structure of many such operator diagonals often exhibits sufficient smoothness that IFT methods can speed up their calculation [24].
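The stochastic probing mentioned here can be sketched with Hutchinson-type random probes: the diagonal of an operator accessible only through matrix-vector products is estimated as diag(A) ≈ ⟨ξ ∘ Aξ⟩ over random ±1 vectors ξ. The operator below is an arbitrary stand-in, not a specific IFT propagator.

```python
import numpy as np

# Hutchinson-type stochastic probing of an operator diagonal: only
# matrix-vector products with A are used, as for an implicit operator.
rng = np.random.default_rng(5)
n = 100
B = rng.standard_normal((n, n))
A = B @ B.T / n                     # symmetric stand-in for an implicit operator

K = 20000                           # number of random +-1 probe vectors
Xi = rng.choice([-1.0, 1.0], size=(K, n))
diag_est = np.mean(Xi * (Xi @ A), axis=0)   # <xi * (A xi)>, using A = A^T

print(np.max(np.abs(diag_est - np.diag(A))))   # error shrinks like 1/sqrt(K)
```

The plain estimator above is the baseline; exploiting the spatial smoothness of the diagonal, as [24] does, reduces the number of probes required.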
Numerical simulations of partial differential equations face the problem that their differential operators require continuous fields to act on, but the data in computer memory is discrete. Thus a specific sub-grid field structure is usually assumed by conventional simulation schemes. IFT permits the construction of the ensemble of plausible continuous fields consistent with the data and other knowledge, on which the operators can act in order to produce the time-evolved field ensemble. Recasting this into an ensemble described by computer data using entropic matching leads to a new, and eventually better, simulation methodology called information field dynamics [25].

ACKNOWLEDGMENTS

I want to thank all my students and coworkers, who accompanied me on the journey into the realm of IFT and gave me valuable guidance through feedback, discussions, and their own research. These are Michael Bell, Mona Frommert, Maksim Greiner, Jens Jasche, Henrik Junklewitz, Francisco Shu Kitaura, Niels Oppermann, Marco Selig, Maximilian Ullherr, Cornelius Weig, Helin Weingartner, and Lars Winderling. Further, I want to thank John Skilling and an anonymous referee for helpful comments on the manuscript.

REFERENCES

1. A. Caticha, "Information and Entropy," in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by K. H. Knuth, A. Caticha, A. Giffin, and C. C. Rodríguez, 2007, vol. 954 of American Institute of Physics Conference Series, pp. 11–22, 0710.1068.
2. J. C. Lemm, Bayesian Field Theory, Johns Hopkins University Press, 2003.
3. T. A. Enßlin, M. Frommert, and F. S. Kitaura, Phys. Rev. D 80, 105005 (2009), 0806.3474.
4. E. T. Jaynes, Probability Theory: The Logic of Science, 2003.
5. N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, NY: Wiley, 1949.
6. F. S. Kitaura, and T. A. Enßlin, MNRAS 389, 497–544 (2008), 0705.0429.
7. T. A. Enßlin, and M. Frommert, Phys. Rev. D 83, 105014 (2011), 1002.2928.
8. T. A. Enßlin, and C. Weig, Phys. Rev. E 82, 051112 (2010), 1004.2868.
9. N. Oppermann, M. Selig, M. R. Bell, and T. A. Enßlin, ArXiv e-prints (2012), 1210.6866.
10. N. Oppermann, G. Robbers, and T. A. Enßlin, Phys. Rev. E 84, 041118 (2011), 1107.2384.
11. N. Oppermann et al., A&A 542, A93 (2012), 1111.6186.
12. H. Junklewitz, and T. A. Enßlin, A&A 530, A88 (2011), 1008.1243.
13. N. Oppermann, H. Junklewitz, G. Robbers, and T. A. Enßlin, A&A 530, A89+ (2011), 1008.1246.
14. F. S. Kitaura et al., MNRAS 400, 183–203 (2009), 0906.3978.
15. J. Jasche, and F. S. Kitaura, ArXiv e-prints (2009), 0911.2496.
16. J. Jasche, F. S. Kitaura, C. Li, and T. A. Enßlin, ArXiv e-prints (2009), 0911.2498.
17. F. Kitaura, J. Jasche, and R. B. Metcalf, MNRAS 403, 589–604 (2010), 0911.1407.
18. C. Weig, and T. A. Enßlin, MNRAS 409, 1393–1411 (2010), 1003.1311.
19. J. Jasche, and B. D. Wandelt, ArXiv e-prints (2012), 1203.3639.
20. F.-S. Kitaura, MNRAS in press, ArXiv e-prints (2012), 1203.4184.
21. M. Frommert, T. A. Enßlin, and F. S. Kitaura, MNRAS 391, 1315–1326 (2008), 0807.0464.
22. M. Frommert, and T. A. Enßlin, MNRAS 395, 1837–1844 (2009).
23. M. Frommert, and T. A. Enßlin, MNRAS 403, 1739–1748 (2010), 0908.0453.
24. M. Selig, N. Oppermann, and T. A. Enßlin, ArXiv e-prints (2011), 1108.0600.
25. T. A. Enßlin, Phys. Rev. E in press, ArXiv e-prints (2012), 1206.4229.