Machine Learning vs. Spectral Energy Distribution Fitting: A Comparative Analysis of Accuracy in Stellar Mass Estimation
Traditional spectral energy distribution (SED)-fitting methods for stellar mass estimation face persistent challenges including systematic biases and computational constraints. We present a controlled comparison of machine learning (ML) and SED-fitting methods, assessing their accuracy, robustness, and computational efficiency. Using a sample of COSMOS-like galaxies from the Horizon-AGN simulation as a benchmark with known true masses, we evaluate the Parametric t-SNE (Pt-SNE) algorithm – trained on noise-injected BC03 models – against the established SED-fitting code LePhare. Our results demonstrate that Pt-SNE achieves superior accuracy, with a root-mean-square error (σ_F) of 0.169 dex compared to LePhare’s 0.306 dex. Crucially, Pt-SNE exhibits significantly lower bias (0.029 dex) compared to LePhare (0.286 dex). Pt-SNE also shows greater robustness across all stellar mass ranges, particularly for low-mass galaxies (10⁹ to 10¹⁰ M⊙), where it reduces errors by 47–53 %. Even when restricted to only six optical bands, Pt-SNE outperforms LePhare using all 26 available photometric bands, underscoring its superior informational efficiency. Computationally, Pt-SNE processes large datasets approximately 3.2 × 10³ times faster than LePhare. These findings highlight the fundamental advantages of ML methods for stellar mass estimation, demonstrating their potential to deliver more accurate, stable, and scalable measurements for large-scale galaxy surveys.
💡 Research Summary
This paper presents a controlled, head‑to‑head comparison between a traditional spectral‑energy‑distribution (SED) fitting code (LePhare) and a modern machine‑learning (ML) approach (Parametric t‑SNE, abbreviated Pt‑SNE) for estimating galaxy stellar masses. The authors use a mock COSMOS‑like galaxy catalog derived from the Horizon‑AGN hydrodynamical simulation, which provides “true” stellar masses for validation. After applying redshift (0.8 < z < 1.2), signal‑to‑noise, and K‑band depth cuts, the final sample contains 91,261 galaxies with photometry in 26 bands (10 broad, 14 medium, and 2 Spitzer/IRAC channels).
Data and Pre‑processing
The ML side is trained on synthetic spectral energy distributions generated with the Bruzual & Charlot (2003) (BC03) stellar population synthesis code. The synthetic library spans a wide range of ages (log age/yr = 7.7–10.0), exponentially declining star‑formation histories (τ = 0.1–10 Gyr), sub‑solar metallicity (0.4 Z⊙), and Calzetti dust attenuation (E(B‑V) = 0–1). Approximately 14 000 model galaxies are produced, redshifted to z ≈ 1, and convolved with the same 12 filters used for the ML analysis (u, B, V, r, i+, z++, Y, J, H, K_s, ch1, ch2). To mimic the observational uncertainties of the mock catalog, the authors first train a RandomForestRegressor to predict per‑color errors, then add Gaussian noise scaled by those predictions to the synthetic colors. Sixty‑six color indices (differences between pairs of the 12 filters) are standardized (zero mean, unit variance) and fed to Pt‑SNE.
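The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the magnitudes are toy values, the constant 0.05 mag color error is a placeholder for the random-forest predictions, and the helper names (`color_indices`, `add_noise`, `standardize`) are hypothetical.

```python
import itertools
import random
import statistics

# Filter names as listed in the summary (12 bands, ordered by wavelength).
FILTERS = ["u", "B", "V", "r", "i+", "z++", "Y", "J", "H", "Ks", "ch1", "ch2"]


def color_indices(mags):
    """Return all pairwise magnitude differences (colors) for one galaxy.

    `mags` maps filter name -> magnitude. The output order follows
    itertools.combinations over FILTERS (bluer band minus redder band).
    """
    return [mags[a] - mags[b] for a, b in itertools.combinations(FILTERS, 2)]


def add_noise(colors, sigmas, rng):
    """Perturb each color with zero-mean Gaussian noise of the given sigma."""
    return [c + rng.gauss(0.0, s) for c, s in zip(colors, sigmas)]


def standardize(rows):
    """Scale each column to zero mean and unit variance across `rows`."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population std; fall back to 1.0 for constant columns.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


rng = random.Random(0)
# Toy magnitudes for three model galaxies (assumption: not real BC03 output).
galaxies = [{f: 22.0 + rng.uniform(-1, 1) for f in FILTERS} for _ in range(3)]
colors = [color_indices(g) for g in galaxies]
# Placeholder per-color errors; the paper predicts these with a random forest.
noisy = [add_noise(c, [0.05] * len(c), rng) for c in colors]
features = standardize(noisy)
print(len(features[0]))  # number of color features per galaxy
```

Twelve filters give 12 × 11 / 2 = 66 unique pairs, which is where the 66 color indices come from.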
Pt‑SNE Methodology
Pt‑SNE learns a parametric mapping from the high‑dimensional color space to a two‑dimensional manifold while preserving local neighborhoods. The default perplexity of 30 is used; a systematic test varying perplexity from 10 to 100 shows RMS changes < 0.01 dex, indicating robustness to this hyper‑parameter. After training, each mock galaxy is projected onto the learned manifold. Stellar mass‑to‑light ratios (M/L_Ks) are estimated by a distance‑weighted average of the ten nearest training points in the 2‑D space, and the final stellar mass is obtained by multiplying this ratio by the galaxy’s K_s‑band luminosity.
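The final mass-assignment step can be sketched as below. The summary specifies a distance-weighted average over the ten nearest training points but not the weighting function, so inverse-distance weights are an assumption here, as are the function name `estimate_log_mass` and the use of solar units.

```python
import math


def estimate_log_mass(query_xy, train_xy, train_ml, lum_ks, k=10, eps=1e-12):
    """Estimate log10(M*/Msun) from a position on the 2-D manifold.

    query_xy : (x, y) of the galaxy in the learned embedding
    train_xy : list of (x, y) positions of the training models
    train_ml : M/L_Ks (solar units) of each training model
    lum_ks   : K_s-band luminosity of the galaxy (solar units)
    """
    dists = [math.dist(query_xy, p) for p in train_xy]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Inverse-distance weights (assumed scheme); eps avoids division by zero.
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    ml = sum(w * train_ml[i] for w, i in zip(weights, nearest)) / sum(weights)
    return math.log10(ml * lum_ks)


# Toy check: every neighbor has M/L = 0.5, so M = 0.5 * 2e10 = 1e10 Msun.
train_xy = [(float(i), 0.0) for i in range(20)]
train_ml = [0.5] * 20
log_mass = estimate_log_mass((3.2, 0.1), train_xy, train_ml, lum_ks=2e10)
```

Averaging M/L rather than mass itself keeps the estimate tied to the galaxy's own measured K_s luminosity, so only the ratio has to be inferred from colors.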
LePhare SED‑Fitting
LePhare employs the same BC03 template library but explores two metallicities (Z⊙, 0.4 Z⊙), two extinction laws (Arnouts et al. 2013, Calzetti 2000), and two star‑formation histories (exponentially declining and delayed). The τ grid includes 0.1, 0.3, 1, 3, 4, 30 Gyr for the exponential case and 1, 3 Gyr for the delayed case. All 26 photometric bands are simultaneously fitted, following the procedure used for the real COSMOS2015 catalog.
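To get a feel for the size of this template configuration, the combinations listed above can be enumerated. This is only a counting sketch: it assumes a full Cartesian product of the listed options (which the summary does not confirm), it omits the age and E(B‑V) grids because they are not given, and LePhare itself is configured through its own parameter files rather than Python.

```python
import itertools

METALLICITIES = ["Zsun", "0.4 Zsun"]
EXTINCTION_LAWS = ["Arnouts+2013", "Calzetti2000"]
# (SFH family, tau in Gyr) pairs as listed in the summary.
SFHS = [("exponential", tau) for tau in (0.1, 0.3, 1, 3, 4, 30)]
SFHS += [("delayed", tau) for tau in (1, 3)]

# Assumed full cross product of metallicity x extinction law x SFH.
grid = list(itertools.product(METALLICITIES, EXTINCTION_LAWS, SFHS))
print(len(grid))
```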
Performance Metrics
Four statistical quantities are computed against the simulation truth: root‑mean‑square error (σ_F), normalized median absolute deviation (σ_NMAD), standard deviation (σ_STD), and mean offset (Bias). Outliers are defined as |Δlog M| > 0.5 dex; the outlier fraction (OLF) and the RMS after outlier removal (σ_0) are also reported.
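These metrics can be computed as in the sketch below. The summary does not define σ_NMAD or σ_STD explicitly, so the common convention 1.4826 × median(|Δ − median(Δ)|) and a population standard deviation are assumed; the function name `mass_metrics` is hypothetical.

```python
import math
import statistics


def _rms(values):
    """Root-mean-square of a non-empty list of numbers."""
    return math.sqrt(statistics.fmean([v * v for v in values]))


def mass_metrics(log_m_pred, log_m_true, outlier_cut=0.5):
    """Summary statistics of delta = log M_pred - log M_true (in dex)."""
    delta = [p - t for p, t in zip(log_m_pred, log_m_true)]
    med = statistics.median(delta)
    inliers = [d for d in delta if abs(d) <= outlier_cut]
    return {
        "sigma_F": _rms(delta),
        # Assumed NMAD convention: 1.4826 * median(|delta - median(delta)|).
        "sigma_NMAD": 1.4826 * statistics.median([abs(d - med) for d in delta]),
        "sigma_STD": statistics.pstdev(delta),
        "bias": statistics.fmean(delta),
        "OLF": 1.0 - len(inliers) / len(delta),
        "sigma_0": _rms(inliers) if inliers else float("nan"),
    }


# Toy example: one of four galaxies is off by 0.8 dex, so it counts as an outlier.
m = mass_metrics([10.1, 9.9, 10.0, 10.8], [10.0, 10.0, 10.0, 10.0])
```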
Results
Pt‑SNE achieves σ_F = 0.169 dex, σ_NMAD = 0.171 dex, σ_STD = 0.166 dex, OLF = 0.4 %, σ_0 = 0.165 dex, and a negligible bias of +0.029 dex. In contrast, LePhare yields σ_F = 0.306 dex, σ_NMAD = 0.415 dex, σ_STD = 0.110 dex, OLF = 3.1 %, σ_0 = 0.291 dex, and a substantial negative bias of –0.286 dex. LePhare's bias indicates a systematic under‑estimation of stellar masses, while its smaller σ_STD reflects tight clustering around that biased mean. Pt‑SNE's nearly unbiased estimates are therefore more accurate overall (lower σ_F), even though its scatter about the mean (σ_STD) is modestly larger.
The advantage of Pt‑SNE is especially pronounced for low‑mass galaxies (10⁹–10¹⁰ M⊙), where it reduces the RMS error by 47–53 % relative to LePhare. Remarkably, Pt‑SNE uses only 12 filters (12 bands → 66 colors) yet outperforms LePhare, which exploits all 26 bands, demonstrating superior informational efficiency.
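The relation between the three headline numbers can be checked by hand. Assuming σ_STD is the standard deviation about the mean offset and Bias is that mean offset (the summary does not spell out the definitions), the RMS error decomposes into scatter and bias, and the quoted values are consistent with that identity:

```latex
\begin{align*}
\sigma_F^2 &= \sigma_{\mathrm{STD}}^2 + \mathrm{Bias}^2 \\
\text{Pt-SNE:}\quad \sigma_F &= \sqrt{0.166^2 + 0.029^2} \approx 0.169~\text{dex} \\
\text{LePhare:}\quad \sigma_F &= \sqrt{0.110^2 + 0.286^2} \approx 0.306~\text{dex}
\end{align*}
```

For LePhare the bias term dominates the error budget, whereas for Pt‑SNE almost all of σ_F comes from scatter.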
Computational Efficiency
Processing time per galaxy is ~0.3 ms for Pt‑SNE versus ~1 s for LePhare, corresponding to a speed‑up factor of ≈3.2 × 10³. This dramatic gain makes Pt‑SNE highly attractive for upcoming massive photometric surveys (e.g., LSST, Euclid, Roman) where billions of objects must be analyzed.
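A back-of-the-envelope calculation shows what these per-galaxy timings mean at survey scale. The sketch below uses the rounded timings quoted above and a hypothetical catalog of one billion objects, and it assumes serial processing with no I/O overhead.

```python
# Per-galaxy timings quoted above (rounded values from the summary).
T_PTSNE_S = 0.3e-3   # ~0.3 ms
T_LEPHARE_S = 1.0    # ~1 s


def wall_time_days(n_objects, seconds_per_object):
    """Serial wall-clock time in days, ignoring I/O and parallelism."""
    return n_objects * seconds_per_object / 86_400


n = 1_000_000_000  # hypothetical survey size: one billion objects
speedup = T_LEPHARE_S / T_PTSNE_S
print(f"speed-up from rounded timings: {speedup:.0f}x")
print(f"Pt-SNE:  {wall_time_days(n, T_PTSNE_S):.1f} days")
print(f"LePhare: {wall_time_days(n, T_LEPHARE_S) / 365.25:.1f} years")
```

The rounded timings give a ratio of about 3.3 × 10³, close to the quoted ≈3.2 × 10³; the small difference is presumably due to rounding of the per-galaxy figures.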
Discussion and Limitations
The authors acknowledge that Pt‑SNE inherits any systematic biases present in the BC03 training set (fixed IMF, a single sub‑solar metallicity, and exponentially declining SFHs only). Real observations may contain nebular emission lines, more complex dust geometries, or IMF variations that are not captured in the synthetic library, potentially leading to residual biases when the model is applied to actual data. Using only colors discards absolute flux information, which could be valuable for distinguishing extreme SED shapes. The authors suggest that future work incorporate multiple stellar‑population synthesis models (e.g., FSPS, BPASS), a variable IMF, and more diverse SFHs, and explore ensemble or transfer‑learning techniques to improve generalization to real surveys.
Conclusion
The study demonstrates that a well‑designed ML pipeline (Pt‑SNE) can surpass a state‑of‑the‑art SED‑fitting code in both accuracy (lower RMS and negligible bias) and speed (thousands of times faster), while also being more robust across the full stellar‑mass range, especially at the low‑mass end. These results provide compelling evidence that ML‑based stellar‑mass estimators should be seriously considered for the next generation of large‑scale galaxy surveys, offering a path toward more reliable, unbiased, and computationally tractable measurements of a key galaxy property.