A physics-inspired nonlinear momentum method for gradient descent with applications to inverse photonic design
In this work, a nonlinear momentum method is introduced to enhance the convergence performance of momentum-based gradient optimization algorithms. Classical momentum methods, such as the Heavy Ball method, can be viewed as a dynamical system with quadratic kinetic energy and linear damping. By extending this analogy to non-Newtonian dynamical systems, we construct a Hamiltonian framework for optimization problems. In this framework, nonlinear kinetic energy and nonlinear damping effects naturally emerge, providing a more flexible and physically interpretable mechanism for optimization algorithms. Specifically, we employ an anharmonic kinetic energy function to capture the inertial effects of accumulated gradient information during the optimization process, while the nonlinear damping mechanism regulates the contribution of momentum during convergence. Numerical experiments show that the proposed method converges faster than classical momentum algorithms, making it particularly suitable for inverse design tasks. Moreover, the Hamiltonian-based algorithmic framework may offer physical insights for the development of efficient physics-inspired optimization algorithms.
💡 Research Summary
The paper introduces a nonlinear momentum optimization method that draws inspiration from non-Newtonian mechanics and Hamiltonian dynamics. Classical momentum algorithms such as the Heavy-Ball (HB) method can be interpreted as damped second-order differential equations with quadratic kinetic energy (K = ½‖ẋ‖²) and linear viscous damping (γẋ). The authors extend this analogy by replacing the quadratic kinetic term with an anharmonic power-law form K(ẋ) = (m/s)‖ẋ‖ˢ (s>1) and by introducing a velocity-dependent nonlinear damping force D(ẋ) = γ‖ẋ‖^{η-1}sgn(ẋ), the gradient of the dissipation potential ϕ(ẋ) = (γ/η)‖ẋ‖^η, where η>0 controls the degree of nonlinearity. When η=2 the damping reduces to the standard linear form γẋ. For η<2 the damping force grows sublinearly with speed, which is said to preserve momentum and help escape saddle points, while for η>2 it grows superlinearly, damping high speeds more strongly and suppressing oscillations.
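The effect of η on the damping force can be checked numerically. The following is a minimal scalar sketch, not the authors' code: the function name `damping_force`, the default γ=1, and the two sample speeds are illustrative assumptions.

```python
def damping_force(v, gamma=1.0, eta=2.0):
    """Scalar damping force gamma * |v|**(eta - 1) * sign(v) (hypothetical helper)."""
    if v == 0.0:
        return 0.0  # avoid 0**negative when eta < 1
    sign = 1.0 if v > 0 else -1.0
    return gamma * abs(v) ** (eta - 1.0) * sign


# Compare a slow speed (0.25) and a fast speed (4.0) for three exponents.
for eta in (1.5, 2.0, 3.0):
    slow = damping_force(0.25, eta=eta)
    fast = damping_force(4.0, eta=eta)
    print(f"eta={eta}: D(0.25)={slow:.4f}, D(4.0)={fast:.4f}")
```

Relative to the linear case η=2, the exponent η=1.5 gives a larger force below unit speed and a smaller one above it, and η=3 does the opposite. The crossover sits at unit speed only because γ is the same for every η in this sketch.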
Using the Legendre transform, the kinetic energy's convex conjugate K*(p) = ((s-1)/s) m^{1-r}‖p‖ʳ (with r = s/(s-1)) is derived, allowing the dynamics to be written in Hamiltonian form: ẋ = ∇ₚH = ∇K*(p), ṗ = −∇ₓV(x) − D(v), where v = ẋ is the velocity, D(v) = ∇ϕ(v), and H(x,p) = K*(p) + V(x). The energy dissipation identity dH/dt = −v·D(v) ≤ 0 guarantees that the Hamiltonian never increases along trajectories, ensuring stability of the continuous-time system for any η, s>1, and γ>0.
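As a worked check of the conjugate and the dissipation identity, the steps can be written out as follows. This is a sketch assuming the power-law kinetic energy K(v) = (m/s)‖v‖ˢ with v the velocity, and a damping force satisfying ⟨v, D(v)⟩ ≥ 0; the prefactor m^{1-r} reduces to 1 for unit mass.

```latex
\begin{aligned}
K(v) &= \frac{m}{s}\,\|v\|^{s}, \qquad
p = \nabla K(v) = m\,\|v\|^{s-2}\,v, \qquad
\|v\| = \left(\frac{\|p\|}{m}\right)^{1/(s-1)},\\[4pt]
K^{*}(p) &= \langle p, v\rangle - K(v)
          = m\,\|v\|^{s} - \frac{m}{s}\,\|v\|^{s}
          = \frac{s-1}{s}\, m^{1-r}\,\|p\|^{r},
          \qquad r = \frac{s}{s-1},\\[4pt]
\frac{dH}{dt} &= \langle \nabla K^{*}(p), \dot p\rangle + \langle \nabla V(x), \dot x\rangle
          = \langle v, -\nabla V(x) - D(v)\rangle + \langle \nabla V(x), v\rangle
          = -\langle v, D(v)\rangle \le 0 .
\end{aligned}
```

For the power-law damping described in the summary, ⟨v, D(v)⟩ = γ‖v‖^η, which is non-negative whenever γ>0.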
To obtain a practical algorithm, the authors discretize the Hamiltonian equations with a forward‑Euler scheme. The resulting update rules are: pₖ = pₖ₋₁ – h∇V(xₖ) – h∇ϕ(pₖ₋₁), xₖ₊₁ = xₖ + h∇K*(pₖ), where h is the step size, γ and η are damping parameters, and r (or equivalently s) determines the kinetic energy’s non‑quadratic exponent. Setting η=2 and r=2 recovers the classical HB method; varying η and r yields a family of algorithms that can interpolate between aggressive acceleration and conservative damping. The paper also shows how to embed the nonlinear momentum scheme into Nesterov‑type acceleration (Algorithm 2) and into mirror‑descent frameworks (Algorithm 3), thereby combining temporal dynamics adaptation with spatial geometry adaptation.
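The update rules above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `power_map` and `nonlinear_momentum`, the default parameter values, and the zero initial momentum are hypothetical, and the factor m^{1-r} in ∇K*(p) is derived here from K(ẋ) = (m/s)‖ẋ‖ˢ rather than taken from the paper. Following the stated update, the damping is evaluated at the previous momentum.

```python
import math


def power_map(p, exponent, scale=1.0):
    """Return scale * ||p||**(exponent - 2) * p, i.e. the gradient of
    (scale / exponent) * ||p||**exponent (hypothetical helper)."""
    norm = math.sqrt(sum(c * c for c in p))
    if norm == 0.0:
        return [0.0] * len(p)  # the gradient vanishes at the origin for exponent > 1
    factor = scale * norm ** (exponent - 2.0)
    return [factor * c for c in p]


def nonlinear_momentum(grad_V, x0, h=0.1, gamma=1.0, eta=2.0, r=2.0, m=1.0, steps=200):
    """Iterate p_k = p_{k-1} - h*grad V(x_k) - h*grad phi(p_{k-1}),
    x_{k+1} = x_k + h*grad K*(p_k), starting from zero momentum (an assumption)."""
    x = list(x0)
    p = [0.0] * len(x)
    for _ in range(steps):
        g = grad_V(x)
        d = power_map(p, eta, gamma)            # nonlinear damping at the old momentum
        p = [pi - h * gi - h * di for pi, gi, di in zip(p, g, d)]
        v = power_map(p, r, m ** (1.0 - r))     # velocity from the conjugate kinetic energy
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


# Heavy-ball special case (eta = r = 2) on an ill-conditioned quadratic.
def grad_quadratic(x):
    return [x[0], 10.0 * x[1]]


x_final = nonlinear_momentum(grad_quadratic, [1.0, 1.0], h=0.1, gamma=3.0, steps=500)
```

With η=2 and r=2 the loop reduces to x_{k+1} = x_k + (1−γh)(x_k − x_{k−1}) − (h²/m)∇V(x_k), which is the heavy-ball recursion with momentum coefficient 1−γh and step size h²/m. For exponents below 2, the power-law terms are not Lipschitz at p=0, so an explicit step like this one may chatter near the minimizer rather than settle exactly; smaller h or a different discretization may be needed there.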
A convergence analysis is provided under the standard assumptions that the objective V is m‑strongly convex and L‑smooth. Defining an energy functional Eₖ = (m/2)‖vₖ‖² + V(xₖ) with vₖ = xₖ – xₖ₋₁, the authors derive a descent inequality that leads to ΔE ≤ 0 provided the step size satisfies (L+m)h ≤ 1 and the damping coefficient obeys (1–γh)² ≤ mh. Under these conditions the discrete dynamics inherit the monotone energy decay of the continuous system, guaranteeing convergence to the unique minimizer. The analysis further highlights that for 1<η≤2 the term (1–γh)² acts as a tunable bound on the effective damping, offering a principled way to balance speed and stability.
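The two conditions quoted above can be turned into a quick parameter check. The sketch below is only an illustration of the inequalities as the summary states them; the function name `admissible_gamma_range` is hypothetical, and `m` here denotes the strong-convexity constant, as in the conditions.

```python
import math


def admissible_gamma_range(m, L, h):
    """Return (lo, hi) such that (1 - gamma*h)**2 <= m*h for gamma in [lo, hi],
    or None when the step-size condition (L + m)*h <= 1 fails."""
    if (L + m) * h > 1.0:
        return None
    root = math.sqrt(m * h)
    return ((1.0 - root) / h, (1.0 + root) / h)


# Example: m = 1, L = 9, h = 0.05 satisfies (L + m)*h = 0.5 <= 1.
gamma_range = admissible_gamma_range(1.0, 9.0, 0.05)
```

Solving (1−γh)² ≤ mh for γ gives the interval [(1−√(mh))/h, (1+√(mh))/h], so for a fixed step size the damping coefficient is bounded on both sides rather than only from above.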
Extensive numerical experiments validate the theoretical claims. First, synthetic non‑convex benchmark functions demonstrate that the nonlinear momentum method converges faster than HB, Nesterov’s accelerated gradient, and Adam, often reducing the number of iterations by 20–40 %. Second, the method is applied to inverse photonic design problems, specifically the optimization of near‑field thermal radiation patterns. Using adjoint‑based gradient computation, the authors show that the nonlinear momentum scheme reaches lower objective values in fewer iterations, even when the design space contains thousands of variables. Parameter sweeps reveal that intermediate values (e.g., η≈1.5, r≈1.8) generally yield the best trade‑off between rapid descent and oscillation suppression, confirming the practical relevance of the nonlinearity.
The paper concludes that embedding physical concepts—non‑quadratic kinetic energy and velocity‑dependent damping—into optimization algorithms opens a new design space beyond traditional linear momentum methods. While the current convergence proof relies on strong convexity, the empirical success on highly non‑convex photonic design problems suggests broader applicability. Future work is suggested on extending the theoretical analysis to non‑convex settings, developing adaptive schemes for η, r, and γ, and exploring hybridizations with other geometry‑aware methods such as Riemannian or stochastic optimization. Overall, the study provides a compelling bridge between mechanics and modern large‑scale optimization, offering both deeper physical insight and tangible performance gains.