Stabilizing simulation-based cosmological Fisher forecasts: a case study using the Voronoi volume function
Forecasting cosmological constraints from halo-based statistics often suffers from instability in derivative estimates, especially when the number of simulations is limited. This instability reduces the reliability of Fisher forecasts and of machine-learning-based approaches that rely on derivatives. We introduce a general framework that addresses this challenge by stabilizing the input statistic and then systematically identifying the subset of summary statistics that maximizes cosmological information while simultaneously minimizing the instability of the predicted constraints. We demonstrate this framework using the halo mass function as well as the Voronoi volume function (VVF), a summary statistic that captures beyond-two-point clustering information. Applying our two-step procedure – random sub-sampling followed by optimization – improves the constraining power by up to a factor of 4, while also enhancing the stability of the forecasts across realizations. As surveys like Euclid, DESI, and LSST push toward tighter constraints, the ability to produce stable and accurate theoretical predictions is essential. Our results suggest that new summary statistics such as the VVF, combined with careful data curation and stabilization strategies, can play a key role in next-generation precision cosmology.
💡 Research Summary
This paper tackles a persistent problem in simulation‑based Fisher forecasting: the instability of numerical derivatives when only a limited number of N‑body realizations are available. Such instability propagates into noisy covariance estimates and unreliable parameter constraints, especially for non‑Gaussian summary statistics that are increasingly popular in large‑scale structure (LSS) analyses. The authors propose a two‑step, statistics‑agnostic framework designed both to stabilize the input statistic and to select an optimal subset of data points that maximizes cosmological information while minimizing forecast variance across realizations.
In the first step, the authors introduce a random sub‑sampling scheme applied to the Voronoi volume function (VVF), a higher‑order statistic that captures the distribution of Voronoi cell volumes of halo or galaxy tracers. For each tracer sample, 70% of the points are drawn at random without replacement, the VVF is computed, and this process is repeated ten times; the final VVF is the average of these ten realizations. This averaging dramatically reduces the scatter of the VVF under small cosmological parameter variations (e.g., in Ωₘ) and yields smoother derivative estimates, particularly for low‑density tracer samples where the raw VVF is noisy. The authors show that the sub‑sampling effect is less pronounced at higher tracer densities, but still beneficial for derivative stability.
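The sub-sampling scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `voronoi_volumes` and `stabilized_vvf` are hypothetical, cell volumes are normalized by their mean here (the paper may normalize by the mean tracer volume instead), unbounded boundary cells are simply dropped rather than handled with periodic wrapping, and the summary is taken at a set of percentiles.

```python
import numpy as np
from scipy.spatial import Voronoi, ConvexHull

def voronoi_volumes(points):
    """Volumes of the bounded Voronoi cells of a 3-D point set."""
    vor = Voronoi(points)
    vols = []
    for region_idx in vor.point_region:
        region = vor.regions[region_idx]
        if len(region) == 0 or -1 in region:
            continue  # skip unbounded (boundary) cells; a real analysis
                      # would use periodic boundary conditions instead
        vols.append(ConvexHull(vor.vertices[region]).volume)
    vols = np.array(vols)
    return vols / vols.mean()  # normalize by mean cell volume (assumption)

def stabilized_vvf(points, percentiles, frac=0.7, n_draws=10, seed=None):
    """Average VVF over repeated random sub-samples, as in the paper's
    first step: draw 70% of tracers without replacement, 10 times."""
    rng = np.random.default_rng(seed)
    n_sub = int(frac * len(points))
    curves = []
    for _ in range(n_draws):
        idx = rng.choice(len(points), size=n_sub, replace=False)
        curves.append(np.percentile(voronoi_volumes(points[idx]), percentiles))
    return np.mean(curves, axis=0)
```

Averaging the percentile curves over draws is what damps the sampling noise that otherwise dominates the finite-difference derivatives at low tracer density.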
The second step addresses the curse of dimensionality in Fisher analyses. While the full VVF can be evaluated at hundreds of percentiles, using all points inflates the covariance matrix and amplifies noise. The authors therefore define three complementary objective functions: (i) the mean‑squared error of the numerical derivatives, (ii) the condition number of the derivative covariance matrix, and (iii) an information metric derived from the Fisher matrix (e.g., determinant or trace of the inverse). By minimizing a weighted sum of these quantities, they identify a subset of percentiles (Nₙ) that provides the best trade‑off between information content and stability. The optimization is performed using greedy selection and genetic algorithms, with explicit penalization of highly correlated points to avoid multicollinearity.
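A greedy version of the selection step might look like the sketch below. This is an illustrative assumption, not the paper's implementation: it scores each candidate subset by the log-determinant of the Fisher matrix (information content) minus a penalty on the condition number of the data covariance (stability/multicollinearity), with hypothetical weights `cond_penalty` and regularizer `eps`.

```python
import numpy as np

def greedy_select(derivs, cov, k, cond_penalty=0.1, eps=1e-10):
    """Greedily pick k data points (e.g., VVF percentiles) that trade off
    Fisher information against covariance conditioning.

    derivs : (n_params, n_bins) array of parameter derivatives
    cov    : (n_bins, n_bins) data covariance matrix
    """
    n_params, n_bins = derivs.shape
    selected, remaining = [], list(range(n_bins))
    for _ in range(k):
        best_score, best_j = -np.inf, None
        for j in remaining:
            idx = selected + [j]
            C = cov[np.ix_(idx, idx)]
            D = derivs[:, idx]
            F = D @ np.linalg.solve(C, D.T)  # Fisher matrix on this subset
            # eps*I keeps log-det finite while the subset is still smaller
            # than the number of parameters (F is then rank-deficient)
            score = (np.linalg.slogdet(F + eps * np.eye(n_params))[1]
                     - cond_penalty * np.log(np.linalg.cond(C)))
            if score > best_score:
                best_score, best_j = score, j
        if best_j is None:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return sorted(selected)
```

Penalizing the condition number discourages picking highly correlated percentiles, which is the multicollinearity failure mode the authors guard against; a genetic algorithm, as they also use, explores the same objective without the greedy restriction.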
The methodology is tested on two suites of N‑body simulations, “Sinhagad” and “Sahyadri”, both with a 200 h⁻¹ Mpc box but differing in particle resolution (256³ vs. 2048³). Six cosmological parameters (Ωₘ, nₛ, h, Aₛ, Ω_b, Ω_k) are varied individually around Planck‑2018 fiducial values using finite‑difference steps of ±Δ and ±2Δ. Derivatives are estimated from seed‑matched simulations, while covariance matrices are built from 100 default realizations (Sinhagad) or from sub‑box divisions (27 sub‑boxes for Sahyadri). Halo catalogs are generated with a 6‑D FoF algorithm, selecting halos by V_peak and imposing a relaxation criterion (0.5 ≤ η ≤ 1.5). Two tracer densities (2 × 10⁻² and 2 × 10⁻³ (h⁻¹ Mpc)⁻³) are examined to probe the impact of sampling noise.
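Evaluations at θ ± Δ and θ ± 2Δ are exactly the sampling needed for the standard fourth-order central-difference stencil; the snippet below shows that stencil as one plausible way to combine the four simulation outputs (the paper may combine them differently, so treat this as an assumption).

```python
import numpy as np

def five_point_derivative(f_m2, f_m1, f_p1, f_p2, delta):
    """Fourth-order central difference of f with respect to a parameter theta,
    from evaluations at theta - 2*delta, theta - delta, theta + delta,
    theta + 2*delta. Truncation error scales as delta**4."""
    return (f_m2 - 8.0 * f_m1 + 8.0 * f_p1 - f_p2) / (12.0 * delta)
```

Using seed-matched realizations at all four step values, as the authors do, cancels much of the cosmic variance in this difference, which is why the stencil can be applied bin by bin to a noisy statistic like the VVF.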
Results show that random sub‑sampling reduces the standard deviation of VVF derivatives by ~30–40 % for low‑density samples and by ~10 % for high‑density samples. After the optimization step, the Fisher constraints on Ωₘ improve by a factor of 2 on average, with the most dramatic cases reaching a four‑fold tightening. Similar gains are observed for the halo mass function (HMF), confirming that the framework is not specific to VVF. The authors also explore sensitivity to the magnitude of parameter variations and to the polynomial order used in fitting the derivative curves, finding that the optimized subset remains robust across these choices.
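The quoted factors of 2–4 refer to marginalized 1σ errors, which follow from the Fisher matrix in the standard way: σ(θₐ) = √[(F⁻¹)ₐₐ]. A minimal helper (the toy numbers in the test are illustrative, not the paper's values):

```python
import numpy as np

def marginalized_errors(fisher):
    """1-sigma marginalized parameter errors from a Fisher matrix:
    the square roots of the diagonal of its inverse."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))
```

The "factor of improvement" reported above is then simply the ratio of these errors computed before and after the sub-sampling and optimization steps.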
The paper concludes by emphasizing the broader applicability of the approach. It can be combined with other non‑Gaussian statistics (e.g., marked correlation functions, Minkowski functionals) or with machine‑learning‑derived summaries, and it provides a practical solution when computational resources limit the number of high‑resolution simulations. By delivering stable and accurate Fisher forecasts, the method positions new summary statistics like the VVF as valuable tools for upcoming surveys such as Euclid, DESI, and LSST, where extracting maximal information from the nonlinear regime will be essential for precision cosmology.