Approximation of Functions: Optimal Sampling and Complexity
We consider approximation or recovery of functions based on a finite number of function evaluations. This is a well-studied problem in optimal recovery, machine learning, and numerical analysis in general, but many fundamental insights were obtained only recently. We discuss different aspects of the information-theoretic limit that appears because of the limited amount of data available, as well as algorithms and sampling strategies that come as close to it as possible. We also discuss (optimal) sampling in a broader sense, allowing other types of measurements that may be nonlinear, adaptive, and random, and present several relations between the different settings in the spirit of information-based complexity. We hope that this article provides both a basic introduction to the subject and a contemporary summary of the current state of research.
💡 Research Summary
The paper provides a comprehensive survey of the theory and practice of function approximation when only a finite number of function evaluations (or more general measurements) are available. Framed within the information-based complexity (IBC) paradigm, the authors introduce the notion of sampling numbers \(g_n(F, Y)\), which quantify the minimal worst-case error achievable by any algorithm that uses \(n\) measurements of a function from a model class \(F\) and outputs an approximation in a target space \(Y\) (typically an \(L^p\) or sup-norm space).
The exposition begins with a pedagogical “easy example”: univariate Lipschitz functions on the torus. Here, uniform sampling and simple piecewise-linear interpolation achieve the optimal rate \(g_n \asymp n^{-1}\), illustrating how sampling numbers relate to classical best-approximation errors in linear subspaces.
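The rate \(n^{-1}\) is easy to check numerically. The minimal sketch below (the function names and the particular 1-Lipschitz test function are our own illustrative choices, not code from the paper) interpolates equispaced samples piecewise-linearly on the torus; for a 1-Lipschitz function the sup-norm error is at most \(1/(2n)\), i.e., half the grid spacing times the Lipschitz constant.

```python
import numpy as np

def interp_error(f, n, n_test=10_000):
    """Sup-norm error of piecewise-linear interpolation of a 1-periodic f
    from n equispaced samples on the torus [0, 1)."""
    x = np.arange(n) / n                       # uniform sampling points
    xp = np.append(x, 1.0)                     # close the period at x = 1
    fp = np.append(f(x), f(x[:1]))
    t = np.linspace(0.0, 1.0, n_test, endpoint=False)
    return np.max(np.abs(f(t) - np.interp(t, xp, fp)))

# A 1-Lipschitz function on the torus: distance to the (irrational) point c,
# chosen so that its kinks never coincide with the sampling nodes.
c = 1 / np.sqrt(2)
f = lambda x: np.minimum(np.abs(x - c), 1.0 - np.abs(x - c))

for n in (8, 16, 32, 64):
    print(n, interp_error(f, n))               # stays below 1/(2n)
```

Doubling \(n\) roughly halves the error, matching the rate \(g_n \asymp n^{-1}\) for the Lipschitz class.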
The core of the paper is organized around three families of measurement models: (1) pure function-value sampling (non-adaptive, deterministic), (2) linear measurements (e.g., inner products with known test functions), and (3) nonlinear, adaptive, or randomized measurements. For each model the authors connect the achievable error to well-studied \(s\)-numbers of the embedding operator \(F \hookrightarrow Y\): Kolmogorov, Gelfand, and Bernstein widths, as well as to entropy numbers (covering numbers). These connections yield both lower bounds (via widths and entropy) and upper bounds (via constructive algorithms).
A substantial portion is devoted to weighted least-squares (WLS) methods. By drawing sampling points i.i.d. from a probability density proportional to the inverse of the Christoffel function of the reconstruction space, the resulting random design matrix enjoys favorable spectral properties. Using modern random matrix theory, the authors show that with \(n \gtrsim m \log m\) samples, where \(m = \dim V\) is the dimension of the trial space \(V\), one obtains, with high probability, an error comparable to the error of best approximation of \(f\) from \(V\). Moreover, for Sobolev spaces on general domains the logarithmic oversampling can be eliminated, leading to the optimal rate \(n^{-s/d}\).
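The WLS scheme can be sketched in a few lines. The example below (our own illustrative choices of basis, target function, and parameters; not code from the paper) uses the Legendre basis on \([-1,1]\), draws points by rejection sampling from the density proportional to \(\sum_{j<m} \varphi_j(x)^2\), and solves the weighted least-squares problem with weights \(w(x) = m / \sum_{j<m} \varphi_j(x)^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10                                   # dim of polynomial trial space V
n = 200                                  # number of samples, n >> m log m

def phi(x):
    """Orthonormal Legendre basis on [-1,1] w.r.t. the uniform measure dx/2."""
    P = np.polynomial.legendre.legvander(x, m - 1)
    return P * np.sqrt(2 * np.arange(m) + 1)

def density(x):
    """Optimal sampling density (w.r.t. Lebesgue): inverse Christoffel fn / m."""
    return 0.5 * np.mean(phi(x) ** 2, axis=1)

# Rejection sampling from the Christoffel density
grid = np.linspace(-1, 1, 4001)
M = density(grid).max() * 1.01
xs = []
while len(xs) < n:
    cand = rng.uniform(-1, 1, 1000)
    u = rng.uniform(0, M, 1000)
    xs.extend(cand[u < density(cand)])
x = np.array(xs[:n])

f = lambda t: np.exp(t)                  # smooth target function
w = 1.0 / (2 * density(x))               # WLS weights (global scaling is irrelevant)
A = np.sqrt(w)[:, None] * phi(x)
b = np.sqrt(w) * f(x)
coef, *_ = np.linalg.lstsq(A, b, rcond=None)

t = np.linspace(-1, 1, 2001)
err = np.max(np.abs(f(t) - phi(t) @ coef))
print("sup-norm error:", err)
```

With this design the weighted Gram matrix concentrates near the identity, so the WLS solution is close to the best approximation of \(e^t\) from degree-9 polynomials.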
The paper then bridges to nonlinear approximation via compressed sensing. When a dictionary \(D\) is fixed, the best \(k\)-sparse approximation error \(\sigma_k(f)_D\) serves as a benchmark. Random linear measurements (Gaussian or subsampled Fourier) together with \(\ell_1\)-minimization achieve an error bound of order \(\sigma_k(f)_D\) provided \(k \approx n/\log n\). This demonstrates that, under sparsity assumptions, one can substantially reduce the number of required samples compared to linear approximation.
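The sparse-recovery step can be illustrated with a toy experiment. For brevity the sketch below uses orthogonal matching pursuit, a greedy stand-in for the \(\ell_1\)-minimization discussed in the text; the Gaussian matrix, dimensions, and seed are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, k = 256, 100, 5                    # ambient dim, measurements, sparsity

# k-sparse ground truth and a Gaussian measurement matrix
x_true = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x_true[support] = rng.standard_normal(k)
A = rng.standard_normal((n, N)) / np.sqrt(n)
y = A @ x_true

def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse recovery from y = A x."""
    r, S = y.copy(), []
    for _ in range(k):
        S.append(int(np.argmax(np.abs(A.T @ r))))      # most correlated column
        coef, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        r = y - A[:, S] @ coef                         # update residual
    x_hat = np.zeros(A.shape[1])
    x_hat[S] = coef
    return x_hat

x_hat = omp(A, y, k)
print("recovery error:", np.linalg.norm(x_true - x_hat))
```

With \(n = 100\) random measurements a \(5\)-sparse vector in dimension \(256\) is recovered essentially exactly, far below the \(N\) samples a linear method would need.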
Entropy numbers are employed to derive information-theoretic lower bounds: for a Sobolev class \(W^{s,p}\) on a \(d\)-dimensional domain, the covering numbers satisfy \(\log N(\varepsilon, F, \|\cdot\|_{L^p}) \asymp \varepsilon^{-d/s}\), which translates into the fundamental rate \(g_n \gtrsim n^{-s/d}\). Consequently, no algorithm (linear, nonlinear, adaptive, or randomized) can beat this rate without additional structural assumptions.
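For point evaluations, the lower bound also follows from the standard "fooling" argument, sketched here for the sup-norm case (a textbook-style reconstruction under stated assumptions, not quoted from the paper):

```latex
% Assumptions: F = unit ball of W^{s,\infty}([0,1]^d), error measured in L^\infty.
Partition $[0,1]^d$ into $2n$ cells of side length $h \asymp n^{-1/d}$.
Any algorithm using $n$ point evaluations leaves at least $n$ cells free of
sample points. A smooth bump $\varphi$ supported on a free cell, rescaled so
that $\|\varphi\|_{W^{s,\infty}} \le 1$, has height
$\|\varphi\|_{L^\infty} \asymp h^{s} \asymp n^{-s/d}$.
Since $+\varphi$ and $-\varphi$ produce identical data (all sampled values
vanish), the output $A(0)$ must err by at least $\|\varphi\|_{L^\infty}$
for one of them:
\[
  g_n(F, L^\infty)
  \;\ge\; \max_{\pm} \,\|{\pm}\varphi - A(0)\|_{L^\infty}
  \;\ge\; \|\varphi\|_{L^\infty}
  \;\gtrsim\; n^{-s/d}.
\]
```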
High-dimensional settings are examined through the lens of the curse of dimensionality. The authors show that for isotropic Sobolev spaces the sampling numbers decay as \(n^{-s/d}\), which becomes prohibitively slow as \(d\) grows. They then discuss tractability criteria: weighted Sobolev spaces with rapidly decaying anisotropic weights, function classes with low-dimensional manifold structure, and ANOVA-type decompositions all lead to polynomial (or even strong) tractability, i.e., error bounds whose dependence on \(d\) is at most polynomial.
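A quick computation makes the curse concrete: inverting the rate \(n^{-s/d}\), reaching accuracy \(\varepsilon\) requires roughly \(\varepsilon^{-d/s}\) samples (the numbers below are our own illustration with smoothness \(s = 2\)):

```python
# Rate n^{-s/d}: to reach error eps one needs n ~ eps^{-d/s} samples.
s, eps = 2.0, 0.1                        # smoothness s = 2, target error 10%
for d in (1, 2, 5, 10, 20):
    print(f"d = {d:2d}: n ~ {eps ** (-d / s):.3g}")
```

At \(d = 20\) the count reaches \(10^{10}\), which is why the weighted and structured model classes above are needed for tractability.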
Section 9 expands the notion of sampling beyond point evaluations. By introducing a general measurement operator \(\Lambda\) and allowing adaptive or randomized selection of measurement functionals, the authors compare the power of various information classes. They prove that adaptive linear measurements can strictly improve over non-adaptive ones, and that random linear measurements (e.g., Gaussian) can achieve the same optimal rates as the best deterministic linear measurements but often with simpler implementation. Continuous measurements (such as integrals) are shown to be even more powerful when combined with adaptation.
The final sections compile the main results into six concise theorems (Section 11) and provide a practical guide to constructing sampling points: Christoffel‑weighted designs, Leja sequences, quasi‑Monte Carlo points, and multilevel Monte Carlo schemes. The paper concludes with an outlook on open problems, notably (i) tighter lower bounds for nonlinear measurements, (ii) data‑driven adaptive designs informed by machine‑learning models, and (iii) integration of optimal sampling with uncertainty quantification for high‑dimensional stochastic PDEs.
Overall, the survey unifies a broad spectrum of recent advances—random matrix theory, compressed sensing, entropy methods, and IBC—into a coherent framework that clarifies when and how optimal sampling is possible, what the intrinsic limits are, and how algorithmic choices (linear vs. nonlinear, deterministic vs. random, adaptive vs. non‑adaptive) affect the achievable approximation quality.