Autocorrelated Optimize-via-Estimate: Predict-then-Optimize versus Finite-sample Optimal
Models that directly optimize for out-of-sample performance in the finite-sample regime have emerged as a promising alternative to traditional estimate-then-optimize approaches in data-driven optimization. In this work, we compare their performance in the context of autocorrelated uncertainties, specifically under a vector autoregressive moving average (VARMA(p,q)) process. We propose an autocorrelated Optimize-via-Estimate (A-OVE) model that obtains an out-of-sample optimal solution as a function of sufficient statistics, and derive a recursive form for computing those statistics. We evaluate these models on a portfolio optimization problem with trading costs. A-OVE achieves low regret relative to a perfect information oracle, outperforming predict-then-optimize machine learning benchmarks. Notably, machine learning models with higher predictive accuracy can have poorer decision quality, echoing the growing literature in data-driven optimization. Performance is retained under small model mis-specification.
💡 Research Summary
This paper investigates the relative performance of two major paradigms in data‑driven optimization—Predict‑then‑Optimize (PTO) and Optimize‑via‑Estimate (O‑VE)—under autocorrelated uncertainty modeled by a vector autoregressive moving‑average (VARMA(p,q)) process. Traditional PTO first learns a point predictor for the uncertain variable and plugs it into a deterministic optimization problem, while O‑VE directly optimizes the out‑of‑sample expected cost by treating the decision rule as a function of the observed data set. The authors extend the O‑VE framework, originally developed for i.i.d. settings, to the time‑series context, introducing the Autocorrelated‑O‑VE (A‑OVE) model.
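The distinction between the two paradigms can be made concrete with a toy problem. The sketch below is not the paper's portfolio setting: it uses a hypothetical newsvendor-style cost (asymmetric penalties `c_o`, `c_u` and an exponential demand history are our own illustrative choices) to show how PTO plugs a point forecast into a deterministic problem while O-VE picks the decision that minimizes the empirical expected cost directly.

```python
import numpy as np

# Hypothetical newsvendor-style toy (NOT the paper's portfolio problem):
# cost(x, y) = c_o * max(x - y, 0) + c_u * max(y - x, 0)
c_o, c_u = 1.0, 4.0
rng = np.random.default_rng(0)
data = rng.exponential(scale=10.0, size=5000)  # skewed demand history

# PTO: learn a point forecast (here the sample mean) and plug it into the
# deterministic problem, whose minimizer is simply x = y_hat.
x_pto = data.mean()

# O-VE: choose the decision that minimizes the empirical expected cost
# directly; for this loss that is the c_u / (c_o + c_u) sample quantile.
x_ove = np.quantile(data, c_u / (c_o + c_u))

def avg_cost(x, ys):
    """Empirical average of the asymmetric cost over the data set."""
    return np.mean(c_o * np.maximum(x - ys, 0) + c_u * np.maximum(ys - x, 0))
```

Because the demand distribution is skewed and the cost asymmetric, the accurate point forecast (the mean) yields a worse decision than the quantile rule, previewing the accuracy-versus-decision-quality gap discussed later in the summary.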
A central technical contribution is the derivation of sufficient statistics for a general VARMA(p,q) model via a Fisher‑Neyman factorization of the exact likelihood. By constructing a forward‑transformed process {Wₜ} and its one‑step predictor {Ŵₜ}, the likelihood can be written as L(Y;ξ)=g₀(Y)·g₁(Y,ξ), where g₁ depends on the data only through the sum of quadratic prediction errors weighted by time‑varying covariances Σₜ. This factorization shows that the sufficient statistics need not be computed explicitly; the O‑VE objective can be optimized directly using the likelihood term g₁.
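The structure of the factorization can be illustrated in the VAR(1) special case, where the one-step predictor and innovation covariance are constant. The sketch below is our own simplification (a fixed coefficient matrix `A` and constant `Sigma` instead of the paper's time-varying Σₜ for general VARMA(p,q)); it shows how the Gaussian log-likelihood splits into a parameter-free constant plus a term depending on the data only through the weighted sum of quadratic one-step prediction errors.

```python
import numpy as np

# Illustrative VAR(1) special case of the likelihood factorization; the
# paper's construction covers general VARMA(p,q) with time-varying Sigma_t.
rng = np.random.default_rng(1)
k, T = 2, 300
A = np.array([[0.5, 0.1], [0.0, 0.4]])        # assumed VAR(1) coefficients
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])    # innovation covariance
L_chol = np.linalg.cholesky(Sigma)

Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + L_chol @ rng.standard_normal(k)

# One-step predictors and the weighted quadratic prediction errors that
# enter the parameter-dependent likelihood factor g1(Y, xi).
Sigma_inv = np.linalg.inv(Sigma)
W_hat = Y[:-1] @ A.T                           # predictions for t = 1..T-1
resid = Y[1:] - W_hat
quad_sum = np.einsum("ti,ij,tj->", resid, Sigma_inv, resid)

# Conditional Gaussian log-likelihood: a constant g0-type term plus the
# g1-type term driven entirely by quad_sum and the parameters.
n = T - 1
log_g1 = -0.5 * (n * np.log(np.linalg.det(Sigma)) + quad_sum)
log_lik = -0.5 * n * k * np.log(2 * np.pi) + log_g1
```

Since each weighted residual contributes roughly k in expectation, `quad_sum` concentrates near `n * k` when the model is correctly specified, which is one quick sanity check on such an implementation.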
The authors embed A‑OVE in a realistic portfolio‑optimization problem that balances risk‑adjusted returns against quadratic trading costs that depend on the next‑period log trading volume Y_{T+1}. Assuming a simple risk model (Σ=δ²I), the problem reduces to minimizing a quadratic function of the portfolio vector x, with two data‑dependent diagonal matrices D₁ (risk term) and D₂(Y_{T+1}) (trading‑cost term). Closed‑form solutions are derived for PTO (using a point forecast Ŷ_{T+1}) and for estimate‑then‑optimize (ETO, using the conditional expectation of D₂ given estimated VARMA parameters). For A‑OVE, the decision rule x_{OVE}(·) is obtained by solving a convex optimization problem that incorporates the Fisher‑Neyman factor g₁, effectively minimizing the posterior expected out‑of‑sample loss under a prior over the VARMA parameters.
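The PTO closed form for this kind of quadratic objective is easy to sketch. The objective constants below are assumptions for illustration (the paper's exact scaling of the return, risk, and trading-cost terms may differ): we take f(x) = −μᵀx + xᵀD₁x + (x − x_prev)ᵀD₂(x − x_prev) with diagonal D₁ and D₂, so the first-order condition gives the solution in one line.

```python
import numpy as np

# Sketch of a closed-form PTO solution under an ASSUMED quadratic objective
#   f(x) = -mu @ x + x @ D1 @ x + (x - x_prev) @ D2 @ (x - x_prev),
# where D1 is the risk term (Sigma = delta^2 I) and D2 depends on the point
# forecast of the next-period log trading volume. Diagonals stored as vectors.

def pto_portfolio(mu, d1, d2, x_prev):
    """Solve -mu + 2*D1 x + 2*D2 (x - x_prev) = 0 for diagonal D1, D2."""
    return (0.5 * mu + d2 * x_prev) / (d1 + d2)

mu = np.array([0.02, 0.01, 0.015])     # expected returns (hypothetical)
d1 = np.full(3, 0.5)                   # delta^2 risk penalty per asset
y_hat = np.array([9.0, 8.5, 10.0])     # forecast log volumes (hypothetical)
d2 = 1e3 / np.exp(y_hat)               # illustrative cost ~ inverse volume
x_prev = np.array([0.3, 0.3, 0.4])     # current holdings

x_star = pto_portfolio(mu, d1, d2, x_prev)
```

The inverse-volume scaling of `d2` reflects the usual intuition that trading a thinly traded asset is more expensive, so the solution moves less from `x_prev` on low-volume names; the precise functional form used in the paper is not assumed here.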
Empirical evaluation proceeds in two stages. First, synthetic VARMA data are generated with controlled misspecification (incorrect order, covariance errors). Across a range of misspecification levels, A‑OVE consistently achieves regret close to that of a perfect‑information oracle and outperforms all PTO/ETO baselines built on neural networks, gradient‑boosted trees, and random forests. Second, real‑world stock‑market data are used, focusing on trading‑volume series as the source of uncertainty (following Goyenko et al., 2024). A recurrent‑neural‑network PTO model attains the highest prediction R², yet its resulting portfolio incurs higher transaction costs and lower net returns than A‑OVE. This confirms the growing evidence that higher predictive accuracy does not guarantee superior decision quality.
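The regret metric reported in these experiments can be stated compactly: the realized cost of a decision minus the cost a perfect-information oracle would have incurred had it observed Y_{T+1} before deciding. The sketch below uses a toy quadratic cost of our own choosing, not the paper's portfolio objective, just to pin down the definition.

```python
# Hedged sketch of regret against a perfect-information oracle; the cost
# function here is an illustrative toy, not the paper's portfolio objective.

def cost(x, y):
    """Toy quadratic cost in decision x given realized outcome y."""
    return (x - y) ** 2 + 0.1 * x ** 2

def oracle_decision(y):
    """Minimizer of cost(., y) with y known: 2(x - y) + 0.2 x = 0."""
    return y / 1.1

def regret(x, y):
    """Realized cost minus the perfect-information oracle's cost."""
    return cost(x, y) - cost(oracle_decision(y), y)
```

By construction regret is nonnegative for every decision and zero exactly at the oracle's choice, which is why "regret close to the oracle" is the natural yardstick for comparing A-OVE against the PTO/ETO baselines.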
A robustness analysis shows that A‑OVE’s performance degrades gracefully under modest parameter estimation errors, thanks to its Bayesian‑style prior and reliance on sufficient‑statistic‑based likelihood rather than point estimates. The paper thus delivers three key contributions: (1) a novel Fisher‑Neyman decomposition for general VARMA(p,q) processes; (2) the A‑OVE algorithm that extends finite‑sample optimality to autocorrelated settings; and (3) a comprehensive empirical demonstration that A‑OVE outperforms state‑of‑the‑art PTO and ETO methods in a financially relevant portfolio‑allocation task. By bridging time‑series modeling with finite‑sample optimal decision rules, the work opens a new research direction for integrated learning‑and‑optimization under temporal dependence.