On the Effectiveness of Classical Regression Methods for Optimal Switching Problems
Simple regression methods provide robust, near-optimal solutions for optimal switching problems, including high-dimensional ones (up to 50 dimensions). While the theory requires solving intractable PDE systems, the Longstaff-Schwartz algorithm with classical regression methods achieves excellent switching decisions without extensive hyperparameter tuning. Testing linear models (OLS, Ridge, LASSO), tree-based methods (random forests, gradient boosting), $k$-nearest neighbors, and feedforward neural networks on four benchmark problems, we find that several simple methods maintain stable performance across diverse problem characteristics and outperform the neural networks in our comparison. In particular, $k$-NN regression performs consistently well with minimal hyperparameter tuning. We establish concentration bounds for this regressor and show that PCA enables $k$-NN to scale to high dimensions.
💡 Research Summary
The paper investigates the practical performance of a range of regression techniques within the Longstaff‑Schwartz Monte‑Carlo framework for solving optimal switching (OS) problems, which involve choosing among multiple operational modes under stochastic, high‑dimensional market conditions. Classical PDE approaches become intractable as the state dimension grows (the authors consider up to 50 dimensions), so the authors turn to simulation‑based dynamic programming: they generate forward paths of a jump‑diffusion process, then work backwards in time, estimating continuation values by regression at each time step and for each mode.
Four families of regressors are examined: (i) linear models (ordinary least squares, Ridge, LASSO), (ii) tree-based ensembles (random forests, gradient boosting), (iii) non-parametric k-nearest neighbors (k-NN), and (iv) feed-forward neural networks (multilayer perceptrons). The authors implement each method with minimal hyper-parameter tuning, aiming to assess robustness rather than the best achievable performance of any single method.
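The backward-induction pipeline described above (regress continuation values at each time step and mode, then pick the best mode) can be sketched with a pluggable scikit-learn regressor. The function, payoff, and parameters below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def lsm_switching(paths, payoff, switch_cost, n_modes, make_regressor):
    """Longstaff-Schwartz backward induction for an optimal switching problem.

    paths          : array (n_paths, n_steps + 1, dim) of simulated states
    payoff(x, m)   : running reward for a state batch x in mode m
    switch_cost    : (n_modes, n_modes) matrix of switching costs
    make_regressor : factory returning a fresh regressor per (t, mode)
    Returns the estimated value of each starting mode, averaged over paths.
    """
    n_paths = paths.shape[0]
    # terminal condition: collect the final payoff in each mode
    value = np.column_stack([payoff(paths[:, -1, :], m) for m in range(n_modes)])
    for t in range(paths.shape[1] - 2, -1, -1):
        X = paths[:, t, :]
        # one regression per mode: continuation value E[V_{t+1}(X_{t+1}, m) | X_t]
        cont = np.column_stack([
            make_regressor().fit(X, value[:, m]).predict(X)
            for m in range(n_modes)
        ])
        new_value = np.empty_like(value)
        for m in range(n_modes):
            # decide via the regressed values: stay in m, or pay to switch
            best = np.argmax(cont - switch_cost[m], axis=1)
            # but propagate the pathwise realized value of the chosen mode
            realized = value[np.arange(n_paths), best] - switch_cost[m, best]
            new_value[:, m] = payoff(X, m) + realized
        value = new_value
    return value.mean(axis=0)

# toy 2-mode example: mode 1 earns the (positive) state each step, mode 0 earns nothing
rng = np.random.default_rng(0)
paths = np.exp(np.cumsum(0.1 * rng.normal(size=(200, 6, 1)), axis=1))
costs = np.array([[0.0, 0.1], [0.1, 0.0]])
values = lsm_switching(paths, lambda x, m: m * x[:, 0], costs, 2,
                       lambda: KNeighborsRegressor(n_neighbors=10))
```

Note the standard low-bias trick: the regression is used only to *choose* the mode, while the value carried backward is the realized pathwise value of that choice.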
A key theoretical contribution is the derivation of concentration bounds for k-NN regression when the underlying state process follows either a sub-Gaussian diffusion or a sub-exponential jump-diffusion. Under standard regularity assumptions, they prove that the mean-squared error of the k-NN estimator decays at the rate $O(n^{-1/2})$ as the number of simulated paths n grows, even in the presence of jumps. Moreover, they show that applying Principal Component Analysis (PCA) before k-NN mitigates the curse of dimensionality: by retaining enough components to explain 95% of the variance, the distance calculations remain informative while the computational cost is dramatically reduced.
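The PCA-then-k-NN idea is straightforward to express as a pipeline. A minimal sketch on synthetic data whose 50-dimensional states vary in a low-dimensional subspace (the data, `k`, and the 95% threshold here are illustrative assumptions, not the paper's benchmarks):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline

# Keep the principal components explaining 95% of the variance, then run
# k-NN in the reduced space (k and the threshold are illustrative choices).
def make_pca_knn(k=10, var_kept=0.95):
    return make_pipeline(PCA(n_components=var_kept, svd_solver="full"),
                         KNeighborsRegressor(n_neighbors=k))

# synthetic 50-dimensional states whose variation lies in a 3-dim subspace
rng = np.random.default_rng(1)
Z = rng.normal(size=(2000, 3))                              # latent factors
X = Z @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(2000, 50))
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1]                         # target to regress

model = make_pca_knn().fit(X[:1500], y[:1500])
mse = np.mean((model.predict(X[1500:]) - y[1500:]) ** 2)
n_kept = model.named_steps["pca"].n_components_             # ~3 components survive
```

Passing a float in (0, 1) as `n_components` tells scikit-learn's `PCA` to keep the smallest number of components whose cumulative explained variance exceeds that fraction, which is exactly the 95% rule described above.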
The empirical study comprises four benchmark OS problems that vary in dimensionality (from 3 up to 50), number of modes, and the structure of the payoff and switching‑cost functions. For each benchmark the authors generate 10 000 Monte‑Carlo paths with 50 time steps, and they evaluate the regressors on three criteria: (1) relative error of the estimated expected profit compared to a high‑accuracy reference solution, (2) accuracy of the implied switching times, and (3) computational time for training and prediction.
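Generating the forward paths is the first stage of the pipeline. A sketch of a simple exponential jump-diffusion simulator matching the scale of the experiments (10,000 paths, 50 steps); the dynamics and all parameters are illustrative assumptions, not the paper's benchmark models:

```python
import numpy as np

def simulate_paths(n_paths, n_steps, dim, T=1.0, mu=0.05, sigma=0.2,
                   jump_rate=0.5, jump_scale=0.1, seed=0):
    """Euler scheme for an exponential jump-diffusion (illustrative dynamics):
    log X follows a drift-diffusion plus a Poisson-driven jump term."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    log_x = np.zeros((n_paths, n_steps + 1, dim))   # start at X_0 = 1
    for t in range(n_steps):
        drift_diff = ((mu - 0.5 * sigma**2) * dt
                      + sigma * np.sqrt(dt) * rng.normal(size=(n_paths, dim)))
        # jump_rate * dt is small, so at most one jump per step is typical;
        # we approximate the compound Poisson sum by count * one normal draw
        counts = rng.poisson(jump_rate * dt, size=(n_paths, dim))
        jumps = counts * jump_scale * rng.normal(size=(n_paths, dim))
        log_x[:, t + 1] = log_x[:, t] + drift_diff + jumps
    return np.exp(log_x)

paths = simulate_paths(n_paths=10_000, n_steps=50, dim=3)
```

The resulting `(10000, 51, 3)` array is exactly the shape the backward regression stage consumes: one cross-section of 10,000 states per time step.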
Results show that k-NN consistently delivers the best accuracy-to-cost trade-off. In low-dimensional settings (≤10), linear models perform reasonably well, but their error grows quickly as the dimension increases. Tree-based methods achieve competitive accuracy in medium dimensions (10–20) but require substantially more memory and CPU time, especially because a separate regression must be solved for each time step and each mode (N × D regressions in total). Neural networks can match k-NN in some medium-dimensional cases, yet they demand extensive hyper-parameter searches (layer depth, width, learning rate, regularisation) and exhibit instability in the highest-dimensional experiments. When PCA is combined with k-NN, the method maintains sub-3% relative error even at 50 dimensions, with prediction times comparable to those of the simplest linear regressors.
The discussion emphasizes that, contrary to the prevailing belief that deep learning is necessary for high‑dimensional control problems, a simple non‑parametric method with modest preprocessing can be both theoretically sound and practically superior. The authors argue that k‑NN’s locality captures the structure of the optimal switching rule (which is essentially a threshold on continuation values), and that its minimal tuning burden makes it attractive for real‑time operational settings where models must be retrained frequently.
Suggested future work includes adaptive selection of the neighbor count k, vectorised regression that predicts all mode values simultaneously to reduce the N × D regression burden, and custom distance metrics that respect the jump-diffusion dynamics. Overall, the paper provides strong evidence that classical regression techniques, especially k-nearest neighbors with PCA, remain powerful tools for solving high-dimensional optimal switching problems and offer a compelling alternative to more complex deep-learning approaches.