Beware of the Classical Benchmark Instances for the Traveling Salesman Problem with Time Windows
We propose a simple and exact method for the Traveling Salesman Problem with Time Windows and Makespan objective (TSPTW-M) that solves all instances of the classical benchmark with $50$ or more customers in less than ten seconds each. Applying this algorithm as an off-the-shelf method, we also solve all but one of these instances for the Duration objective. Our main conclusion is that these instances alone are no longer representative for evaluating the TSPTW-M and its Duration variant: their structure can be exploited to yield results that seem outstanding at first glance. Additionally, caution is advised when designing hard training sets for machine learning algorithms.
💡 Research Summary
The paper investigates the Traveling Salesman Problem with Time Windows (TSPTW) under two objectives: minimizing the makespan (TSPTW‑M) and minimizing the total duration (TSPTW‑D). The authors introduce a remarkably simple yet exact algorithm that solves every classical benchmark instance containing 50 or more customers in under ten seconds for the makespan objective, and solves all but one of those instances for the duration objective when used as an off‑the‑shelf preprocessing step.
The core of the method is a backward best‑first search. Starting from a partial route that contains only the end depot, the algorithm expands the search tree by prepending feasible customers to the current partial route. For each partial route R the algorithm maintains two values: δ_m(R), the earliest possible arrival time when departing at the later of the given start time and the earliest time window of the first vertex, and δ⁻¹_m(R), the latest departure time that still guarantees arrival at the end depot before a given upper bound ub. By traversing the tree in reverse (from the end towards the start) and always expanding the node with the smallest δ_m, the search naturally prioritizes routes that can finish early, effectively minimizing makespan. The upper bound ub is updated whenever a feasible route with a smaller makespan is found, allowing aggressive pruning.
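The backward best-first search described above can be sketched in a few dozen lines. The following is a simplified illustration under stated assumptions, not the authors' implementation: vertex 0 serves as both start and end depot, `t` is a travel-time matrix, and `[e[i], l[i]]` is vertex i's time window. It orders the queue by δ_m but omits the δ⁻¹_m latest-departure pruning the paper also uses:

```python
import heapq

def tsptw_makespan(t, e, l):
    """Backward best-first search for TSPTW with makespan objective.

    t: travel-time matrix t[i][j]; vertex 0 is both start and end depot;
    e[i], l[i]: time window of vertex i. Returns (makespan, route) or None.
    A simplified sketch of the search idea, not the authors' implementation.
    """
    n = len(t)
    customers = frozenset(range(1, n))

    def earliest_finish(route):
        # Earliest arrival at the end depot when departing the first vertex
        # no earlier than its window opens (delta_m in the paper's notation).
        time = max(0, e[route[0]])
        for a, b in zip(route, route[1:]):
            time = max(time + t[a][b], e[b])
            if time > l[b]:
                return None          # time window violated: infeasible
        return time

    # States are (delta_m, partial route, customers still to prepend);
    # the heap always expands the partial route with smallest delta_m.
    heap = [(max(0, e[0]), (0,), customers)]   # route containing only end depot
    best, best_route = float("inf"), None
    while heap:
        d, route, remaining = heapq.heappop(heap)
        if d >= best:
            continue                  # pruned by the incumbent upper bound ub
        if not remaining:
            # All customers placed: prepend the start depot to close the route.
            full = (0,) + route
            f = earliest_finish(full)
            if f is not None and f < best:
                best, best_route = f, full
            continue
        for v in remaining:
            new_route = (v,) + route
            f = earliest_finish(new_route)
            if f is not None and f < best:
                heapq.heappush(heap, (f, new_route, remaining - {v}))
    return (best, best_route) if best_route else None
```

With nonnegative travel times, prepending a vertex can never decrease δ_m, so δ_m is a valid lower bound on any completion of a partial route; this is what makes the `d >= best` pruning exhaustive rather than heuristic.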
The authors argue that the classical benchmark sets—compiled by López‑Ibáñez and Blum (2023) and later extended by da Silva & Urrutia (2010) and Rifki et al. (2020) (as modified by Fontaine 2024)—have structural properties that make them vulnerable to this approach. In most of these instances the time windows are relatively wide, and a second‑nearest‑neighbor tour is already feasible and close to optimal. Consequently, the backward search explores only a tiny fraction of the exponential solution space, often finding the optimal route on the first few expansions.
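The "second-nearest-neighbor tour" claim is easy to test on any instance. One plausible reading (the paper's exact construction rule may differ) is a greedy tour that always moves to the k-th nearest unvisited customer, followed by a time-window feasibility check:

```python
def kth_nearest_neighbor_tour(t, e, l, k=2):
    """Greedy tour: from the current vertex, move to the k-th nearest unvisited
    customer (falling back to the farthest remaining one if fewer than k are
    left), then verify time-window feasibility along the way.

    A hedged sketch of one reading of 'second-nearest-neighbor' (k=2); the
    paper's exact rule may differ. Vertex 0 is both start and end depot.
    Returns (route, makespan) if the tour is feasible, else None.
    """
    n = len(t)
    unvisited = set(range(1, n))
    route, cur, time = [0], 0, max(0, e[0])
    while unvisited:
        ranked = sorted(unvisited, key=lambda v: t[cur][v])
        nxt = ranked[min(k, len(ranked)) - 1]   # k-th nearest (1-indexed)
        time = max(time + t[cur][nxt], e[nxt])
        if time > l[nxt]:
            return None                          # window violated: infeasible
        unvisited.remove(nxt)
        route.append(nxt)
        cur = nxt
    time += t[cur][0]                            # return to the depot
    if time > l[0]:
        return None
    route.append(0)
    return route, time
```

On instances with wide windows, such a tour tends to be feasible and provides a strong initial upper bound ub, which is exactly what lets the backward search prune most of the tree immediately.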
Empirical evaluation covers 1,337 instances across ten benchmark groups: the seven classic groups (Asc, Dum, Gen, Lan, Ohl, Pes, Pot), the DaS set (125 instances), and the Rif set (720 instances). For TSPTW‑M, the algorithm solves all 467 instances with ≥50 customers in an average of 3 seconds, never exceeding 9.8 seconds. For TSPTW‑D, the same algorithm is used to obtain a makespan‑optimal solution first; then the duration of that route is computed, yielding optimal solutions for all but one instance, each in under 30 minutes (average ≈12 minutes). This outperforms prior state‑of‑the‑art exact methods, which required up to several hours on the same data.
Limitations are acknowledged. The method fails on several smaller instances (fewer than 50 customers) and on the hardest Rif instances where the time‑window tightness parameter β is set to 0, producing extremely loose windows that dramatically increase the search space. Moreover, for the Ohl‑D set (duration‑focused instances with travel times rounded to one decimal) a few cases exceed the 30‑minute threshold, indicating that the duration objective is intrinsically more challenging for this simple approach.
Beyond algorithmic performance, the paper raises a broader methodological concern: many recent machine‑learning‑based TSPTW solvers generate training data by replicating the classical benchmark generation procedures. Because those benchmarks are now trivially solvable by a basic exact method, any reported superiority of new heuristics or learning models may be inflated. The authors therefore recommend using the time‑window tightness parameter β (as proposed by Arigliano et al., 2019) to systematically vary difficulty, even for small‑scale instances, and to design new benchmark suites that better reflect real‑world constraints.
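To make the role of a tightness parameter concrete, one illustrative generator is sketched below. Everything here is an assumption for illustration: windows are centered on arrival times along a random reference tour, with width shrinking as β grows. This is NOT the procedure of Arigliano et al. (2019) that the paper recommends; it only shows how a single knob can sweep instances from loose to tight:

```python
import random

def generate_windows(t, beta, width_max=100, seed=0):
    """Illustrative time-window generator: beta in [0, 1] tightens windows.

    beta = 0 yields windows up to width_max wide; beta = 1 collapses each
    window onto the arrival time of a random reference tour, so that tour
    remains feasible but little else does. Hypothetical scheme, not the
    Arigliano et al. (2019) procedure. Vertex 0 is the depot.
    """
    rng = random.Random(seed)
    n = len(t)
    tour = list(range(1, n))
    rng.shuffle(tour)                  # random reference tour through customers
    arrival, time, cur = {0: 0}, 0, 0
    for v in tour:
        time += t[cur][v]
        arrival[v] = time
        cur = v
    horizon = time + t[cur][0]         # completion time of the reference tour
    half = 0.5 * width_max * (1.0 - beta)
    e = [max(0, arrival[v] - half) for v in range(n)]
    l = [arrival[v] + half for v in range(n)]
    l[0] = horizon + half              # depot stays open until the tour ends
    return e, l
```

The point of such a knob is that difficulty can be varied systematically even at small instance sizes, instead of relying on the fixed (and, per this paper, exploitable) window structure of the classical benchmarks.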
In conclusion, the study demonstrates that a straightforward backward best‑first search can dominate the performance landscape on widely used TSPTW benchmarks, calling into question the continued reliance on those datasets for algorithmic evaluation. Researchers are urged to reassess benchmark selection, incorporate harder, more realistic instances, and consider simple exact methods as baseline filters before applying sophisticated metaheuristics or learning‑based solvers. This work thus contributes both a practical tool for quickly solving many existing TSPTW instances and a critical perspective on future benchmark design and experimental methodology.