Reconstruction of Dark Matter and Baryon Density From Galaxies: A Comparison of Linear, Halo Model and Machine Learning-Based Methods


For many analyses in cosmology it is necessary to reconstruct the likely distribution of unobserved fields, such as dark matter or non-luminous baryons, from observed luminous tracers. The dominant approach in cosmology has been to use the so-called halo model, which assumes radially symmetric profiles centered around luminous tracers such as galaxies. More recently, field-level machine learning methods have been proposed that can learn to estimate the unobserved field after being trained on simulations. However, it is unclear whether machine learning methods indeed significantly improve over linear methods or the halo model. In this paper we make a systematic comparison of different approaches to reconstruct dark matter and non-luminous baryons from galaxy data using the CAMELS simulations. These simulations are in a $25\ \mathrm{Mpc}/h$ box, allowing us to compare performance from the mildly non-linear scales ($k\sim 0.4\ h/\mathrm{Mpc}$) down to the size of individual halos. We find the best results using a combined GNN-CNN approach. We also provide a general analysis and visualization of the relationship of matter, non-luminous baryons, halos, and galaxies in these simulations to interpret our results.

💡 Research Summary

This paper presents a systematic comparison of three classes of methods for reconstructing the three‑dimensional dark‑matter density field (δ_m) and the non‑luminous baryon (gas) density field (δ_e) from galaxy catalogs. Using the CAMELS suite of IllustrisTNG‑CV hydrodynamic simulations, the authors evaluate performance in a modest 25 Mpc h⁻¹ box, which allows them to probe scales from the mildly non‑linear regime (k ≈ 0.4 h Mpc⁻¹) down to individual halo sizes.

The first baseline is a linear transfer function in Fourier space, T(k) = P_gm(k)/P_gg(k), which rescales each Fourier mode of the observed galaxy overdensity δ_g to obtain an estimate of the target field. This filter reproduces the cross‑power spectrum with the galaxy field by construction, but a scale‑dependent rescaling cannot change the cross‑correlation coefficient r(k) between the estimate and the true field. The approach therefore provides limited fidelity on intermediate and small scales, wherever r(k) falls below unity.
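The linear baseline can be illustrated with a minimal NumPy sketch. It is not the paper's code: the function names (`mode_bins`, `binned_power`, `cross_correlation`, `linear_reconstruction`), the linear |k| binning, and the choice to zero modes outside the binned range are all assumptions made for illustration. It measures T(k) = P_gm(k)/P_gg(k) in bins from a pair of gridded fields and applies it mode by mode to δ_g.

```python
import numpy as np

def mode_bins(n_grid, box_size, n_bins):
    """Assign every Fourier mode of an n_grid^3 periodic box to a |k| bin."""
    k1d = 2.0 * np.pi * np.fft.fftfreq(n_grid, d=box_size / n_grid)
    kmag = np.sqrt(k1d[:, None, None] ** 2 + k1d[None, :, None] ** 2
                   + k1d[None, None, :] ** 2)
    k_fund = 2.0 * np.pi / box_size      # fundamental mode of the box
    k_nyq = np.pi * n_grid / box_size    # Nyquist wavenumber of the grid
    # Assumed binning: linear in |k|, starting below the fundamental mode
    # so that the k = 0 mode is the only low-k mode left out.
    edges = np.linspace(0.5 * k_fund, k_nyq, n_bins + 1)
    idx = np.digitize(kmag, edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    return idx, valid

def binned_power(a_k, b_k, idx, valid, n_bins, volume):
    """Bin-averaged cross-power P_ab(k) from unnormalized FFTs a_k, b_k."""
    per_mode = (a_k * np.conj(b_k)).real * volume / a_k.size ** 2
    sums = np.bincount(idx[valid], weights=per_mode[valid], minlength=n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    return sums / np.maximum(counts, 1)

def cross_correlation(field_a, field_b, box_size, n_bins=6):
    """r(k) = P_ab / sqrt(P_aa * P_bb) for two gridded fields."""
    idx, valid = mode_bins(field_a.shape[0], box_size, n_bins)
    a_k, b_k = np.fft.fftn(field_a), np.fft.fftn(field_b)
    vol = box_size ** 3
    p_ab = binned_power(a_k, b_k, idx, valid, n_bins, vol)
    p_aa = binned_power(a_k, a_k, idx, valid, n_bins, vol)
    p_bb = binned_power(b_k, b_k, idx, valid, n_bins, vol)
    return p_ab / np.sqrt(p_aa * p_bb)

def linear_reconstruction(delta_g, delta_m, box_size, n_bins=6):
    """Estimate the target field as T(k) * delta_g, with T = P_gm / P_gg."""
    idx, valid = mode_bins(delta_g.shape[0], box_size, n_bins)
    g_k, m_k = np.fft.fftn(delta_g), np.fft.fftn(delta_m)
    vol = box_size ** 3
    p_gm = binned_power(g_k, m_k, idx, valid, n_bins, vol)
    p_gg = binned_power(g_k, g_k, idx, valid, n_bins, vol)
    t_of_k = p_gm / p_gg
    # Modes outside the binned range (k = 0 and the grid corners) are zeroed.
    t_modes = np.zeros(idx.shape)
    t_modes[valid] = t_of_k[idx[valid]]
    estimate = np.fft.ifftn(t_modes * g_k).real
    return estimate, t_of_k

if __name__ == "__main__":
    # Synthetic stand-in data (not CAMELS): galaxies = biased matter + noise.
    rng = np.random.default_rng(0)
    delta_m = rng.normal(size=(16, 16, 16))
    delta_g = 2.0 * delta_m + rng.normal(size=(16, 16, 16))
    est, t_of_k = linear_reconstruction(delta_g, delta_m, box_size=25.0)
    print("r(k), galaxies vs. truth:", cross_correlation(delta_g, delta_m, 25.0))
    print("r(k), estimate vs. truth:", cross_correlation(est, delta_m, 25.0))
```

Because every mode in a bin is multiplied by the same real factor, the two printed r(k) arrays agree wherever T(k) is positive, which is the limitation of the linear baseline in concrete form. In practice T(k) would be measured on training simulations and applied to held-out ones; the sketch measures and applies it on the same pair of fields only to stay short.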

The second class follows the traditional halo‑model paradigm. “Halo‑painting” uses the true halo catalog from the simulation to place spherically symmetric NFW profiles (for dark matter) or empirically motivated gas profiles at halo centers, reproducing the field with high accuracy when the exact halo masses and concentrations are known. To mimic realistic observations, the authors also train a graph neural network (GNN) to infer halo mass and concentration from the galaxy point cloud (the “GNN‑NFW” method). This learned halo‑painting recovers most large‑scale features but suffers a modest 5–10 % degradation relative to the exact‑halo case because of inference errors in the halo parameters.
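As a rough illustration of halo painting for dark matter, the sketch below deposits truncated NFW profiles onto a periodic grid. It uses the standard NFW form ρ(r) = ρ_s / [(r/r_s)(1 + r/r_s)²] with r_s = R_vir/c and ρ_s fixed by the mass enclosed within R_vir. The function names (`nfw_density`, `paint_halos`), the truncation at R_vir, the per-halo renormalization on the grid, and the single-cell fallback for unresolved halos are assumptions of this sketch, not details taken from the paper; the gas profiles are not reproduced here.

```python
import numpy as np

def nfw_density(r, mass, r_vir, concentration):
    """NFW density at radius r, truncated at r_vir and normalized so that
    the mass enclosed within r_vir equals `mass`."""
    r = np.asarray(r, dtype=float)
    r_s = r_vir / concentration
    # Enclosed-mass factor: M(<r_vir) = 4 pi rho_s r_s^3 [ln(1+c) - c/(1+c)]
    m_c = np.log(1.0 + concentration) - concentration / (1.0 + concentration)
    rho_s = mass / (4.0 * np.pi * r_s ** 3 * m_c)
    x = np.maximum(r, 1e-3 * r_s) / r_s   # soften the central cusp
    rho = rho_s / (x * (1.0 + x) ** 2)
    return np.where(r <= r_vir, rho, 0.0)

def paint_halos(positions, masses, r_virs, concentrations, box_size, n_grid):
    """Sum truncated NFW profiles of all halos on an n_grid^3 periodic grid."""
    positions = np.asarray(positions, dtype=float)
    cell = box_size / n_grid
    centers = (np.arange(n_grid) + 0.5) * cell
    gx, gy, gz = np.meshgrid(centers, centers, centers, indexing="ij")
    grid = np.zeros((n_grid, n_grid, n_grid))
    for pos, m, r_vir, c in zip(positions, masses, r_virs, concentrations):
        d = np.stack([gx - pos[0], gy - pos[1], gz - pos[2]])
        d -= box_size * np.round(d / box_size)   # periodic minimum image
        r = np.sqrt((d ** 2).sum(axis=0))
        rho = nfw_density(r, m, r_vir, c)
        painted = rho.sum() * cell ** 3
        if painted > 0:
            # Assumed choice: rescale so the gridded halo carries mass m.
            rho = rho * (m / painted)
        else:
            # Halo smaller than a cell: put all its mass in the host cell.
            i, j, k = np.floor(pos / cell).astype(int) % n_grid
            rho = np.zeros_like(grid)
            rho[i, j, k] = m / cell ** 3
        grid += rho
    return grid

if __name__ == "__main__":
    # Two made-up halos in a 25 Mpc/h box (illustrative values only).
    pos = np.array([[12.5, 12.5, 12.5], [1.0, 24.0, 5.0]])
    grid = paint_halos(pos, [1e14, 1e13], [1.0, 0.1], [5.0, 8.0], 25.0, 32)
    print("total painted mass:", grid.sum() * (25.0 / 32) ** 3)
```

In a GNN‑NFW style pipeline, the `masses` and `concentrations` arguments would come from a learned model evaluated on the galaxy catalog rather than from the true halo catalog, so errors in those inferred parameters propagate directly into the painted field.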

The third and most innovative approach is a hybrid GNN‑CNN architecture introduced in an earlier work (Ref.
