Broken neural scaling laws in materials science
In materials science, data are scarce and expensive to generate, whether computationally or experimentally. Therefore, it is crucial to identify how model performance scales with dataset size and model capacity to distinguish between data- and model-limited regimes. Neural scaling laws provide a framework for quantifying this behavior and guide the design of materials datasets and machine learning architectures. Here, we investigate neural scaling laws for a paradigmatic materials science task: predicting the dielectric function of metals, a high-dimensional response that governs how solids interact with light. Using over 200,000 dielectric functions from high-throughput ab initio calculations, we study two multi-objective graph neural networks trained to predict the frequency-dependent complex interband dielectric function and the Drude frequency. We observe broken neural scaling laws with respect to dataset size, whereas scaling with the number of model parameters follows a simple power law that rapidly saturates.
💡 Research Summary
In this work the authors address a fundamental challenge in materials informatics: the scarcity and high cost of data, which hampers the systematic development of machine‑learning (ML) models for predicting complex physical properties. They focus on a paradigmatic, high‑dimensional target: the frequency‑dependent complex dielectric function ε(ω) of metals and the Drude plasma frequency ω_D, which together fully describe a metal's optical response.
To enable a rigorous study of neural scaling laws (NSLs) in this domain, the authors generated a massive dataset of 201 361 dielectric functions using high‑throughput density‑functional theory (DFT) calculations within the independent‑particle approximation (IPA) plus a Drude term. The structures were drawn from the Alexandria database, filtered for quality, and the workflow was fully automated to ensure reproducibility. Validation against experimental spectra for 27 elemental metals demonstrated that the calculated ε(ω) and ω_D are of sufficient accuracy for ML training.
Two graph neural network (GNN) architectures were then built: OptiMetal2B, which employs only two‑body message passing, and OptiMetal3B, which adds explicit three‑body interactions. Both models use rotationally invariant atomic features (element type, bond length, bond angle) and are deliberately constrained in architectural degrees of freedom so that the number of trainable parameters N can be varied systematically by changing the hidden dimension d_h (from 2⁴ to 2¹⁰, corresponding to roughly 10⁵–10⁸ parameters). The authors also examined two message‑passing schemes—Crystal Graph Convolution (CGC) and Transformer Convolution (TC)—and found negligible performance differences after architecture optimization.
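The reason a single hidden dimension suffices to control N is that message-passing weight matrices scale as d_h², so doubling d_h roughly quadruples the parameter count. A back-of-the-envelope sketch (the layer count, input width, and spectrum head size below are hypothetical illustrations, not the paper's architecture):

```python
def param_count(d_h, n_layers=4, d_in=32, n_freq=2001):
    """Rough parameter count for a width-d_h message-passing GNN:
    an atom-feature embedding, n_layers message-passing blocks
    (the dominant d_h^2 term), and a readout head onto an
    n_freq-point spectrum plus the Drude frequency.
    All sizes here are illustrative assumptions."""
    embedding = d_in * d_h
    message_passing = n_layers * 2 * d_h * d_h  # message + update weights
    readout = d_h * (n_freq + 1)
    return embedding + message_passing + readout
```

Sweeping d_h over 2⁴–2¹⁰ therefore spans several orders of magnitude in N while leaving every other architectural choice untouched, which is what makes the parameter-scaling fits clean.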
The core of the study is a detailed scaling analysis. For data scaling, the training set was randomly subsampled into seven sizes (2.5 k, 5 k, 10 k, 20 k, 40 k, 80 k, 160 k) while keeping N fixed at ≈10 M (d_h = 256). Validation loss L_val, defined as the sum of mean absolute errors for ε_inter(ω) and ω_D, was measured for each subset. The authors fitted four candidate functional forms (simple power law, power law with saturation, smoothly broken power law with and without adjustable amplitude) and used the corrected Akaike Information Criterion (AICc) to select the best description. The data‑scaling curve is best captured by a smoothly broken power law without an adjustable amplitude: a low‑data exponent α_D,1≈0.15–0.18, a crossover dataset size D_c≈10⁴·⁴–10⁴·⁷ (≈25 k–50 k samples), and a high‑data exponent α_D,2≈0.38–0.42. This “broken” scaling indicates that, for small datasets, the model behaves like a best‑guess estimator, extracting only coarse trends; once enough diverse materials are present, each additional sample yields disproportionately larger performance gains.
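The broken-power-law form and the AICc selection above can be sketched as follows (a minimal illustration, not the paper's exact parametrisation; the break-sharpness parameter f and the fixed-amplitude choice are assumptions):

```python
import numpy as np

def broken_power_law(D, alpha1, alpha2, D_c, f=1.0):
    """Smoothly broken power law for the validation loss L(D):
    local log-log slope -alpha1 for D << D_c, crossing over to
    -alpha2 for D >> D_c; f controls the sharpness of the break."""
    return D**(-alpha1) * (1.0 + (D / D_c)**(1.0 / f))**(-(alpha2 - alpha1) * f)

def aicc(rss, n, k):
    """Corrected Akaike Information Criterion for a least-squares fit:
    n data points, k free parameters, residual sum of squares rss.
    Lower AICc means a better fit-versus-complexity trade-off."""
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)
```

With only seven dataset sizes, the small-sample correction term in AICc is large and penalises the extra parameters of the broken forms heavily, so selecting a broken form anyway is strong evidence that the break in the data-scaling curve is real.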
For parameter scaling, the authors fixed D = 20 k and varied N from ≈10⁵ to ≈10⁸. All architectures follow a simple power law L(N) = L_∞ + (N₀/N)^{α_N} with exponents α_N ranging from 0.41 (OptiMetal3B) to 0.58 (OptiMetal2B). The loss quickly saturates around N₀≈10⁴·⁴–10⁴·⁹ (≈25 k–80 k parameters), indicating diminishing returns from further increasing model capacity. Notably, the three‑body model reaches a lower asymptotic loss (L_∞≈0.89) than the two‑body variants (L_∞≈1.01–1.03), showing that richer interaction modeling improves absolute accuracy even though it does not alter the scaling exponent.
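The saturating power law for parameter scaling can be reproduced on synthetic data (a sketch; the optimiser, initial guesses, and noiseless data are illustrative choices, not the authors' fitting pipeline):

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(N, L_inf, log10_N0, alpha):
    """L(N) = L_inf + (N0 / N)^alpha, with N0 fitted in log10
    so that all three parameters are O(1) for the optimiser."""
    return L_inf + (10.0**log10_N0 / N)**alpha

# Synthetic loss curve using the OptiMetal3B-like values from the text:
# floor L_inf = 0.89, scale N0 = 10^4.5, exponent alpha_N = 0.41.
N = np.logspace(5, 8, 10)
y = 0.89 + (10**4.5 / N)**0.41

popt, _ = curve_fit(saturating_power_law, N, y, p0=(1.0, 4.0, 0.5))
L_inf_fit, log10_N0_fit, alpha_fit = popt
```

Because the curve flattens onto L_∞ so quickly (a scale N₀ of only a few ×10⁴ parameters), most of the 10⁵–10⁸ parameter range contributes little, which is exactly the saturation the authors report.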
To test whether the observed broken scaling is merely an artifact of over‑parameterized models on limited data, the authors constructed two‑dimensional scaling maps L(D,N). These maps reveal that the transition in scaling behavior persists even when model capacity is matched to dataset size, confirming that the broken scaling is intrinsic to the learning problem rather than a methodological artifact.
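The idea of a 2D scaling map can be illustrated by combining the two one-dimensional laws into an additive surface (an assumption made here for illustration only; the constants are the fitted values quoted above, and the additive form is not the paper's fitted model):

```python
import numpy as np

def loss_surface(D, N):
    """Illustrative L(D, N): a data term (broken power law in D),
    a capacity term (saturating power law in N), and an irreducible
    floor, assumed additive for the sake of the sketch."""
    data_term = D**(-0.15) * (1.0 + D / 10**4.5)**(-(0.40 - 0.15))
    capacity_term = (10**4.5 / N)**0.41
    return 0.89 + data_term + capacity_term

# Log-spaced grid over the ranges studied in the paper.
D_grid = np.logspace(3.4, 5.2, 7)   # ~2.5 k to ~160 k samples
N_grid = np.logspace(5, 8, 7)       # 10^5 to 10^8 parameters
L = loss_surface(D_grid[:, None], N_grid[None, :])
```

Scanning such a surface row by row shows whether the break in the D-direction survives when N is matched to D, which is the check the authors use to rule out an over-parameterisation artifact.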
The implications for materials ML are significant. First, the crossover dataset size implies that returns on data generation are modest below D_c (the shallow exponent α_D,1) but accelerate beyond it (α_D,2), so pushing past the crossover is worthwhile even though it demands substantially larger computational resources. Second, the rapid saturation of performance with respect to model parameters indicates that, beyond a modest number of trainable weights (≈5 M), additional capacity offers little benefit for this task. Third, the negligible impact of three‑body interactions on the scaling exponents, combined with their clear effect on absolute error, highlights a design trade‑off: more expressive architectures can lower the error floor without changing how performance scales with data.
Overall, the paper makes three key contributions: (1) the creation of the largest publicly available dielectric‑function dataset for metals, (2) a systematic, statistically rigorous investigation of neural scaling laws in a high‑dimensional materials‑science task, and (3) the introduction of “broken neural scaling laws” (BNSLs) to the materials community, providing a quantitative framework for balancing data acquisition, model complexity, and computational budget. These insights will guide future efforts in dataset curation, model architecture selection, and resource allocation for accelerated materials discovery.