Learning to Solve Orienteering Problem with Time Windows and Variable Profits


The orienteering problem with time windows and variable profits (OPTWVP) arises in many real-world applications and involves continuous time variables. Existing approaches do not provide an efficient solver for this variant, which mixes discrete and continuous decision variables. In this paper, we propose a learning-based two-stage framework, DEcoupled discrete-Continuous optimization with Service-time-guided Trajectory (DeCoST), which decouples the discrete and continuous decision variables in OPTWVP while enabling efficient and learnable coordination between them. In the first stage, a parallel decoding structure predicts the path and an initial service time allocation. The second stage optimizes the service times through a linear programming (LP) formulation and provides long-horizon learning of structure estimation. We rigorously prove the global optimality of the second-stage solution. Experiments on OPTWVP instances demonstrate that DeCoST outperforms both state-of-the-art constructive solvers and the latest meta-heuristic algorithms in solution quality and computational efficiency, achieving up to 6.6x inference speedup on instances with fewer than 500 nodes. Moreover, the proposed framework is compatible with various constructive solvers and consistently improves their solution quality on OPTWVP.


💡 Research Summary

The paper tackles the Orienteering Problem with Time Windows and Variable Profits (OPTWVP), a combinatorial optimization task where a vehicle must select a subset of locations to visit, allocate a service time at each visited node, and respect both a global time budget and individual time‑window constraints. Unlike classic orienteering, the profit earned at a node is a linear function of the service time allocated, making routing decisions and continuous service‑time decisions tightly coupled. Existing heuristic, meta‑heuristic, and neural combinatorial optimization (NCO) approaches either rely on hand‑crafted local search, suffer from shortsighted routing predictions, or cannot jointly handle the discrete‑continuous interdependence efficiently.
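To make the coupling between routing and service-time decisions concrete, the following is a minimal Python sketch that scores one candidate solution. It is not the authors' code, and the exact constraint semantics are assumptions made for illustration: the vehicle starts and ends at depot 0, it waits if it arrives before a window opens, service must finish before the window closes, and profit equals the profit rate times the allocated service time. The names `Node`, `evaluate`, `NODES`, and `TRAVEL` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    tw_open: float   # earliest time service may start (assumed semantics)
    tw_close: float  # latest time service may end (assumed semantics)
    rate: float      # profit earned per unit of service time


def evaluate(
    route: Sequence[int],
    service: Sequence[float],
    nodes: Sequence[Node],
    travel: Sequence[Sequence[float]],
    budget: float,
) -> Optional[float]:
    """Return the total profit of a route, or None if it is infeasible."""
    if len(route) != len(service):
        raise ValueError("route and service must have the same length")
    clock, profit, prev = 0.0, 0.0, 0  # start at the depot (index 0)
    for node_id, duration in zip(route, service):
        if duration < 0:
            return None
        node = nodes[node_id]
        arrival = clock + travel[prev][node_id]
        start = max(arrival, node.tw_open)  # wait if the window is not open yet
        end = start + duration
        if end > node.tw_close:             # time-window violation
            return None
        profit += node.rate * duration      # profit is linear in service time
        clock, prev = end, node_id
    clock += travel[prev][0]                # return to the depot
    if clock > budget:                      # global time-budget violation
        return None
    return profit


# Toy instance: depot 0 plus two customers.
NODES = [Node(0.0, 100.0, 0.0), Node(2.0, 10.0, 3.0), Node(5.0, 20.0, 1.0)]
TRAVEL = [[0.0, 2.0, 4.0], [2.0, 0.0, 3.0], [4.0, 3.0, 0.0]]

print(evaluate([1, 2], [4.0, 5.0], NODES, TRAVEL, budget=20.0))  # 17.0
print(evaluate([1, 2], [8.0, 3.0], NODES, TRAVEL, budget=20.0))  # 27.0
print(evaluate([1, 2], [8.0, 5.0], NODES, TRAVEL, budget=20.0))  # None
```

On this toy instance the same route yields a profit of 17, a profit of 27, or an infeasible solution depending only on how service time is split between the two nodes. Under these assumed semantics, once the route is fixed the objective and all remaining constraints are linear in the service times, which is consistent with the summary's statement that the second stage can be posed as a linear program.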

To address this, the authors propose DeCoST (DEcoupled discrete‑Continuous optimization with Service‑time‑guided Trajectory), a two‑stage learning framework. In Stage 1, a parallel decoder architecture simultaneously predicts a feasible route and an initial normalized service‑time ratio for each visited node. The architecture consists of a transformer‑based routing decoder and a Service‑Time Decoder (STD). Both share the same encoder representation of the problem instance (node coordinates, travel times, profit rates, time‑window bounds, etc.). The routing decoder outputs a permutation of nodes, while the STD outputs δ_i ∈

