Approximation Algorithms for Optimal Decision Trees and Adaptive TSP Problems
We consider the problem of constructing optimal decision trees: given a collection of tests which can disambiguate between a set of $m$ possible diseases, each test having a cost, and the a-priori likelihood of the patient having any particular disea…
Authors: Anupam Gupta, Viswanath Nagarajan, R. Ravi
Consider the following two adaptive covering optimization problems:
• Adaptive TSP under stochastic demands (AdapTSP). A traveling salesperson is given a metric space (V, d)
and distinct subsets S 1 , S 2 , . . . , S m ⊆ V such that S i appears with probability p i (and ∑ i p i = 1). She needs to serve requests at a random subset S of locations drawn from this distribution. However, she does not know the identity of the random subset: she can only visit locations, at which time she finds out whether or not that location is part of the subset S. What adaptive strategy should she use to minimize the expected time to serve all requests in the random set S?
• Optimal Decision Trees. Given a set of m diseases, there are n binary tests that can be used to disambiguate between these diseases. If the cost of performing test t ∈ [n] is c t , and we are given the likelihoods {p j } j∈[m] that a typical patient has the disease j, what (adaptive) strategy should the doctor use for the tests to minimize the expected cost to identify the disease?
It can be shown that the optimal decision tree problem is a special case of the adaptive TSP problem: a formal reduction is given in Section 4. In both these problems we want to devise adaptive strategies, which take into account the information revealed by the queries so far (e.g., locations already visited, or tests already done) to determine the future course of action. Such an adaptive solution corresponds naturally to a decision tree, where nodes encode the current "state" of the solution and branches represent observed random outcomes: see Definition 2 for a formal definition. A simpler class of solutions, which has been useful in some other adaptive optimization problems, e.g. [DGV08, GM09, BGL + 12], is that of non-adaptive solutions, which are specified by just an ordered list of actions. However, there are instances of both the above problems where the optimal adaptive solution costs much less than the optimal non-adaptive solution. Hence it is essential that we find good adaptive solutions.
The optimal decision tree problem has long been studied: its NP-hardness was shown by Hyafil and Rivest in 1976 [HR77], and many references and applications can be found in [Now11]. There have been a large number of papers providing algorithms for this problem [GG74, Lov85, KPB99, Das04, AH12, CPR + 11, Now11, GB09]. The best results yield approximation ratios of $O(\log \frac{1}{p_{\min}})$ and $O(\log(m \, \frac{c_{\max}}{c_{\min}}))$, where $p_{\min}$ is the minimum non-zero probability and $c_{\max}$ (resp. $c_{\min}$) is the maximum (resp. minimum) cost. In the special cases when the likelihoods $\{p_j\}$ or the costs $\{c_t\}$ are all polynomially bounded in m, these imply an O(log m)-approximation algorithm. However, there are instances (with exponential probabilities and costs) on which all previous algorithms have an approximation ratio of Ω(m). On the hardness side, an Ω(log m) hardness of approximation (assuming P ≠ NP) is known for the optimal decision tree problem [CPR + 11]. The existence of an O(log m)-approximation algorithm for the general optimal decision tree problem has been posed as an open question, which had not been answered prior to this work.
Optimal decision tree is also a basic problem in average-case active learning [Das04, Now11, GB09]. In this application, there is a set of n data points, each of which is associated with a + or − label. The labels are initially unknown. A classifier is a partition of the data points into + and − labels. The true classifier h* is the partition corresponding to the actual data labels. The learner knows beforehand a "hypothesis class" H consisting of m classifiers; it is assumed that the true classifier h* ∈ H. Furthermore, in the average-case model, there is a known distribution π of h* over H. The learner wants to determine h* by querying labels at various points. There is a cost $c_t$ associated with querying the label of each data point t. An active learning strategy involves adaptively querying labels of data points until h* ∈ H is identified. The goal is to compute a strategy that minimizes the expectation (over π) of the total cost of all queried points. This is precisely the optimal decision tree problem, with points being tests and classifiers corresponding to diseases.
Apart from being a natural adaptive routing problem, AdapTSP has many applications in the setting of message ferrying in ad-hoc networks [ZA03, SRJB03, ZAZ04, ZAZ05, HLS10]. We cite two examples below:
• Data collection in sparse sensor networks (see eg. [SRJB03]). A collection of sensors is spread over a large geographic area, and one needs to periodically gather sensor data at a base station. Due to the power and cost overheads of setting up a communication network between the sensors, the data collection is instead performed by a mobile device (the message ferry) that travels in this space from/to the base station. On any given day, there is a known distribution D of the subset S of sensors that contain new information: this might be derived from historical data or domain experts. The routing problem for the ferry then involves computing a tour (originating from the base station) that visits all sensors in S, at the minimum expected cost.
• Disaster management (see eg. [ZAZ04]). Consider a post-disaster situation, in which usual communication networks have broken down. In this case, vehicles can be used in order to visit locations and assess the damage. Given a distribution of the set of affected locations, the goal here is to route a vehicle that visits all affected locations as quickly as possible in expectation.
In both these applications, due to the absence of a direct communication network, the information at any location is obtained only when it is visited: this is precisely the AdapTSP problem.
In this paper, we settle the approximability of the optimal decision tree problem:
Theorem 1 There is an O(log m)-approximation algorithm for the optimal decision tree problem with arbitrary test costs and arbitrary probabilities, where m is the number of diseases. The problem admits the same approximation ratio even when the tests have non-binary outcomes.
In fact, this result arises as a special case of the following theorem:
Theorem 2 There is an $O(\log^2 n \log m)$-approximation algorithm for the adaptive Traveling Salesman Problem, where n is the number of vertices and m the number of scenarios in the demand distribution.
To solve the AdapTSP problem, we first solve the "isolation problem", which seeks to identify which of the m scenarios has materialized. Once we know the scenario, we can visit its vertices using any constant-factor approximation algorithm for TSP. The high-level idea behind our algorithm for the isolation problem is this: suppose each vertex lies in at most half the scenarios; then if we visit one vertex in each of the m scenarios using a short tour, which is an instance of the group Steiner tree problem, we would observe demand at a vertex of at least one scenario; this would reduce the number of possible scenarios by at least 50%, and we could recursively run the algorithm on the remaining scenarios. This is an over-simplified view, and there are many details to handle: we need not visit all scenarios, since visiting all but one allows us to infer the last one by exclusion; the expectation in the objective function means we need to solve a minimum-sum version of group Steiner tree; and not all vertices need lie in at most half the scenarios. Another major issue is that we do not want our performance to depend on the magnitude of the probabilities, as some of them may be exponentially small. Finally, we need to charge our cost directly against the optimal decision tree. All these issues can indeed be resolved to obtain Theorem 2.
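The scenario-elimination idea above can be illustrated with a toy sketch (this is not the paper's algorithm; the choice of vertex and the metric are ignored here): each observation at a vertex keeps only the candidate scenarios consistent with that observation.

```python
# Toy illustration of scenario elimination: if the visited vertex splits the
# candidates roughly in half, each observation removes a constant fraction of
# scenarios, so O(log m) rounds suffice to isolate the realized one.
def eliminate(scenarios, vertex, has_demand):
    """Keep only the scenarios consistent with the observation at `vertex`."""
    return [S for S in scenarios if (vertex in S) == has_demand]

# Hypothetical example: 4 scenarios over vertices {a, b, c}.
scenarios = [{"a"}, {"a", "b"}, {"b", "c"}, {"c"}]
# Observing demand at "a" leaves only the scenarios containing "a".
assert eliminate(scenarios, "a", True) == [{"a"}, {"a", "b"}]
# Observing no demand at "a" leaves the rest.
assert eliminate(scenarios, "a", False) == [{"b", "c"}, {"c"}]
```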
The algorithm for the isolation problem involves an interesting combination of ideas from the group Steiner tree [GKR00, CCGG98] and minimum-latency TSP [BCC + 94, CGRT03, FHR07] problems: it uses an approach that is greedy with respect to two different criteria, namely the probability measure and the number of scenarios. This idea is formalized in our algorithm for the partial latency group Steiner (LPGST) problem, which is a key subroutine for Isolation. While this LPGST problem is harder to approximate than the standard group Steiner tree problem (see Section 2), for which $O(\log^2 n \log m)$ is the best known approximation ratio, we show that it admits a better $(O(\log^2 n), 4)$-bicriteria approximation algorithm. Moreover, even this bicriteria approximation guarantee for LPGST suffices to obtain an $O(\log^2 n \cdot \log m)$-approximation algorithm for Isolation.
We also show that both AdapTSP and the isolation problem are $\Omega(\log^{2-\varepsilon} n)$-hard to approximate even on tree metrics; our results are essentially best possible on such metrics, and we lose an extra logarithmic factor in going to general metrics, as in the group Steiner tree problem. Moreover, any improvement to the result in Theorem 2 would lead to a similar improvement for the group Steiner tree problem [GKR00, HK03, CP05], which is a long-standing open question.
For the optimal decision tree problem, we show that we can use a variant of minimum-sum set cover [FLT04] which is the special case of LPGST on star-metrics. This avoids an O(log 2 n) loss in the approximation guarantee, and hence gives us an O(log m)-approximation algorithm which is best possible [CPR + 11]. Although this variant of min-sum set cover is Ω(log m)-hard to approximate (it generalizes set cover as shown in Section 2), we again give a constant factor bicriteria approximation algorithm, which leads to the O(log m)-approximation for optimal decision tree. Our result further reinforces the close connection between the min-sum set cover problem and the optimal decision tree problem that was first noticed by [CPR + 11].
Finally, we consider the related adaptive traveling repairman problem (AdapTRP), which has the same input as AdapTSP, but where the objective is to minimize the expected sum of arrival times at vertices in the materialized demand set. In this setting, we cannot first isolate the scenario and then visit all its nodes, since a long isolation tour may negatively impact the arrival times. So AdapTRP (unlike AdapTSP) cannot be reduced to the isolation problem. However, we show that our techniques for AdapTSP are robust, and can be used to obtain:

Theorem 3 There is an $O(\log^2 n \log m)$-approximation algorithm for the adaptive Traveling Repairman Problem, where n is the number of vertices and m the number of scenarios in the demand distribution.
Paper Outline: The results on the isolation problem appear in Section 3. We obtain the improved approximation algorithm for optimal decision tree in Section 4. The algorithm for the adaptive traveling salesman problem is in Section 5; Appendix A contains a nearly matching hardness of approximation result. Finally, Section 6 is on the adaptive traveling repairman problem.
The optimal decision tree problem has been studied earlier by many authors, with algorithms and hardness results being shown by [GG74, HR77, Lov85, KPB99, AH12, Das04, CPR + 11, CPRS09, GB09]. As mentioned above, the algorithms in these papers give O(log m)-approximation ratios only when the probabilities or costs (or both) are polynomially bounded. The early papers on optimal decision tree considered tests with only binary outcomes. More recently, [CPR + 11] studied the generalization with K ≥ 2 outcomes per test, and gave an O(log K · log m)-approximation under uniform costs. Subsequently, [CPRS09] improved this bound to O(log m), again under uniform costs. Later, [GB09] gave an algorithm for arbitrary costs and probabilities, achieving an approximation ratio of $O(\log \frac{1}{p_{\min}})$ or $O(\log(m \, \frac{c_{\max}}{c_{\min}}))$. This is the previous best approximation guarantee; see also Table 1 in [GB09] for a summary of these results. We note that in terms of the number m of diseases alone, the previous best approximation guarantee is only Ω(m). On the other hand, there is an Ω(log m) hardness of approximation for the optimal decision tree problem [CPR + 11]. Our O(log m)-approximation algorithm for arbitrary costs and probabilities solves an open problem from these papers. A crucial aspect of this algorithm is that it is non-greedy; all previous results were based on variants of a greedy algorithm.
There are many results on adaptive optimization dealing with covering problems. E.g., [GV06] considered the adaptive set-cover problem; they gave an O(log n)-approximation when sets may be chosen multiple times, and an O(n)-approximation when each set may be chosen at most once. The latter approximation ratio was improved in [MSW07] to O(log 2 n log m), and subsequently to the best-possible O(log n)-approximation ratio by [LPRY08], also using a greedy algorithm. In recent work [GK11] generalized adaptive set-cover to a setting termed 'adaptive submodularity', and gave many applications. In all these problems, the adaptivity-gap (ratio between optimal adaptive and non-adaptive solutions) is large, as is the case for the problems considered in this paper, and so the solutions need to be inherently adaptive.
The AdapTSP problem is related to universal TSP [JLN + 05, GHR06] and a priori TSP [Jai88, SS08, ST08] only in spirit-in both the universal and a priori TSP problems, we seek a master tour which is shortcut once the demand set is known, and the goal is to minimize the worst-case or expected length of the shortcut tour. The crucial difference is that the demand subset is revealed in toto in these two problems, leaving no possibility of adaptivity-this is in contrast to the slow revelation of the demand subset that occurs in AdapTSP.
We work with a finite metric (V, d) given by a set V of n vertices and a distance function $d : V \times V \to \mathbb{R}_+$. As usual, we assume that the distance function is symmetric and satisfies the triangle inequality. For any integer t ≥ 1, we let [t] := {1, 2, . . . , t}.
Definition 1 (r-tour) Given a metric (V, d) and a vertex r ∈ V, an r-tour is any sequence $(r = u_0, u_1, \ldots, u_k = r)$ of vertices that begins and ends at r. The length of such an r-tour is $\sum_{i=1}^{k} d(u_{i-1}, u_i)$, the total length of all edges in the tour.
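As a quick illustration of Definition 1, the length of an r-tour can be computed directly from the distance function (the three-point metric below is a hypothetical example):

```python
def tour_length(tour, d):
    """Length of an r-tour (u_0, ..., u_k): sum of consecutive distances."""
    return sum(d[tour[i - 1]][tour[i]] for i in range(1, len(tour)))

# Hypothetical symmetric metric given as a nested dict.
d = {"r": {"r": 0, "x": 2, "y": 3},
     "x": {"r": 2, "x": 0, "y": 4},
     "y": {"r": 3, "x": 4, "y": 0}}
assert tour_length(["r", "x", "y", "r"], d) == 2 + 4 + 3  # = 9
```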
Throughout this paper, we deal with demand distributions over vertex-subsets that are specified explicitly. A demand distribution D is specified by m distinct subsets $\{S_i \subseteq V\}_{i=1}^m$ having associated probabilities $\{p_i\}_{i=1}^m$ such that $\sum_{i=1}^m p_i = 1$. This means that the realized subset D ⊆ V of demand-vertices will always be one of $\{S_i\}_{i=1}^m$, where $D = S_i$ with probability $p_i$ (for all i ∈ [m]). We also refer to the subsets $\{S_i\}_{i=1}^m$ as scenarios. The following definition captures adaptive strategies.
Definition 2 (Decision Tree) A decision tree T in metric (V, d) is a rooted binary tree where each non-leaf node of T is labeled with a vertex u ∈ V , and its two children u yes and u no correspond to the subtrees taken if there is demand at u or if there is no demand at u. Thus given any realized demand D ⊆ V , a unique path T D is followed in T from the root down to a leaf.
Depending on the problem under consideration, there are additional constraints on decision tree T and the expected cost of T is also suitably defined. There is a (problem-specific) cost C i associated with each scenario i ∈ [m] that depends on path T S i , and the expected cost of T (under distribution D) is then ∑ m i=1 p i • C i . For example in AdapTSP, cost C i corresponds to the length of path T S i .
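A minimal sketch of Definition 2 and the expected-cost computation above (the class and function names are ours, and `tour_cost` stands in for the problem-specific cost of a followed path):

```python
# Sketch of a decision tree (Definition 2): each node holds a vertex label and
# 'yes'/'no' subtrees; a realized demand set D induces a unique root-leaf path.
class Node:
    def __init__(self, vertex, yes=None, no=None):
        self.vertex, self.yes, self.no = vertex, yes, no

def followed_path(root, D):
    """The unique path T_D followed in the tree under demand set D."""
    path, node = [], root
    while node is not None:
        path.append(node.vertex)
        node = node.yes if node.vertex in D else node.no
    return path

def expected_cost(root, scenarios, probs, tour_cost):
    """Expected cost: sum over scenarios of p_i times the cost of its path.
    `tour_cost(path)` is problem-specific (e.g. r-tour length for AdapTSP)."""
    return sum(p * tour_cost(followed_path(root, S))
               for S, p in zip(scenarios, probs))

# Hypothetical two-level tree: visit "a" first, then "b" or "c".
tree = Node("a", yes=Node("b"), no=Node("c"))
assert followed_path(tree, {"a", "b"}) == ["a", "b"]
assert followed_path(tree, {"c"}) == ["a", "c"]
```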
Since we deal with explicitly specified demand distributions D, all decision trees we consider will have size polynomial in m (support size of D) and n (number of vertices).
Adaptive Traveling Salesman This problem consists of a metric (V, d) with root r ∈ V and a demand distribution D over subsets of vertices. The information on whether or not there is demand at a vertex v is obtained only when that vertex v is visited. The objective is to find an adaptive strategy that minimizes the expected time to visit all vertices of the realized scenario drawn from D.
We assume that the distribution D is specified explicitly with a support size of m. This allows us to model demand distributions that are arbitrarily correlated across vertices. We note, however, that the running time and performance of our algorithm will depend on the support size. The most general setting would be to consider black-box access to the distribution D; however, as shown in [Nag09], in this setting there is no o(n)-approximation algorithm for AdapTSP that uses a polynomial number of samples from the distribution. One could also consider AdapTSP under independent demand distributions. In this case there is a trivial constant-factor approximation algorithm that visits all vertices having non-zero probability along an approximately minimum TSP tour; note that any feasible solution must visit all vertices with non-zero probability, as otherwise (due to the independence assumption) there would be a positive probability of not satisfying a demand.
Definition 3 (Adaptive TSP) The input is a metric (V, d), root r ∈ V and demand distribution D given by m distinct subsets {S i ⊆ V } m i=1 with probabilities {p i } m i=1 (where ∑ m i=1 p i = 1). The goal in AdapTSP is to compute a decision tree T in metric (V, d) such that:
• the root of T is labeled with the root vertex r, and
• for each scenario i ∈ [m], the path T S i followed on input S i contains all vertices in S i .
The objective function is to minimize the expected tour length $\sum_{i=1}^m p_i \cdot d(T_{S_i})$, where $d(T_{S_i})$ is the length of the tour that starts at r, visits the vertices on path $T_{S_i}$ in that order, and returns to r.
Isolation Problem This is closely related to AdapTSP. The input is the same as AdapTSP, but the goal is just to identify the unique scenario that has materialized, and not to visit all the vertices in the realized scenario.
Definition 4 (Isolation Problem) Given metric (V, d), root r and demand distribution D, the goal in Isolation is to compute a decision tree T in metric (V, d) such that:
• the root of T is labeled with the root vertex r, and
• for each scenario i ∈ [m], the path T S i followed on input S i ends at a distinct leaf-node of T .
The objective is to minimize the expected tour length $\mathsf{IsoTime}(T) := \sum_{i=1}^m p_i \cdot d(T_{S_i})$, where $d(T_{S_i})$ is the length of the r-tour that visits the vertices on path $T_{S_i}$ in that order and returns to r.
The only difference between Isolation and AdapTSP is that the tree path T S i in Isolation need not contain all vertices of S i , and the paths for different scenarios must end at distinct leaf-nodes. In Section 5 we show that any approximation algorithm for Isolation leads to an approximation algorithm for AdapTSP. So we focus on designing algorithms for Isolation.
Optimal Decision Tree This problem involves identifying a random disease from a set of possible diseases using binary tests.
Definition 5 (Optimal Decision Tree) The input is a set of m diseases with probabilities $\{p_i\}_{i=1}^m$ that sum to one, and a collection $\{T_j \subseteq [m]\}_{j=1}^n$ of n binary tests with costs $\{c_j\}_{j=1}^n$. There is exactly one realized disease: each disease i ∈ [m] occurs with probability $p_i$. Each test j ∈ [n] returns a positive outcome for the subset $T_j$ of diseases and a negative outcome for the rest $[m] \setminus T_j$. The goal in ODT is to compute a decision tree Q where each internal node is labeled by a test and has two children corresponding to positive/negative test outcomes, such that for each i ∈ [m] the path $Q_i$ followed under disease i ends at a distinct leaf node of Q. The objective is to minimize the expected cost $\sum_{i=1}^m p_i \cdot c(Q_i)$, where $c(Q_i)$ denotes the total cost of the tests on path $Q_i$.
Notice that the optimal decision tree problem is exactly Isolation on a weighted star metric. Indeed, given an instance of ODT, consider the metric (V, d) induced by a weighted star with center r and n leaves corresponding to the tests, where for each j ∈ [n] we set $d(r, j) = c_j/2$. The demand scenarios are as follows: for each disease i ∈ [m], scenario $S_i := \{j \in [n] : i \in T_j\}$, the set of tests that are positive on disease i, occurs with probability $p_i$.
It is easy to see that this Isolation instance corresponds exactly to the optimal decision tree instance. See Section 4 for an example. So any algorithm for Isolation on star-metrics can be used to solve ODT as well.
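The reduction just described can be sketched as follows (function and variable names are ours): each test j becomes a leaf at distance $c_j/2$ from the center r, and each disease i becomes the scenario consisting of its positive tests.

```python
# Sketch of the ODT -> Isolation reduction on a weighted star metric.
def odt_to_isolation(m, tests, costs):
    """tests[j] is the subset of diseases on which test j is positive."""
    d_to_leaf = {j: costs[j] / 2 for j in range(len(tests))}   # d(r, j) = c_j/2
    # Scenario S_i = tests that come out positive on disease i.
    scenarios = [{j for j, T in enumerate(tests) if i in T} for i in range(m)]
    return d_to_leaf, scenarios

# Hypothetical example: 3 diseases, 2 tests.
dist, scen = odt_to_isolation(3, tests=[{0, 1}, {1, 2}], costs=[4, 6])
assert dist == {0: 2.0, 1: 3.0}
assert scen == [{0}, {0, 1}, {1}]
```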
Useful Deterministic Problems Recall that the group Steiner tree problem [GKR00, HK03] consists of a metric (V, d), root r ∈ V and g groups of vertices {X i ⊆ V } g i=1 , and the goal is to find an r-tour of minimum length that contains at least one vertex from each group {X i } g i=1 . Our algorithms for the above stochastic problems rely on solving some variants of group Steiner tree.
Definition 6 (Group Steiner Orienteering) The input is a metric (V, d), root r ∈ V , g groups of vertices {X i ⊆ V } g i=1 with associated profits {φ i } g i=1 and a length bound B. The goal in GSO is to compute an r-tour of length at most B that maximizes the total profit of covered groups. A group i ∈ [g] is covered if any vertex from X i is visited by the tour.
An algorithm for GSO is said to be a (β , γ)-bicriteria approximation algorithm if on any instance of the problem, it finds an r-tour of length at most γ • B that has profit at least 1 β times the optimal (which has length at most B).
Definition 7 (Partial Latency Group Steiner) The input is a metric (V, d), g groups of vertices $\{X_i \subseteq V\}_{i=1}^g$ with associated weights $\{w_i\}_{i=1}^g$, root r ∈ V and a target h ≤ g. The goal in LPGST is to compute an r-tour τ that covers at least h groups and minimizes the weighted sum of arrival times over all groups. The arrival time of group i ∈ [g] is the length of the shortest prefix of tour τ that contains an $X_i$-vertex; if the group is not covered, its arrival time is set to be the entire tour-length. The LPGST objective is termed latency, i.e.

$$\mathsf{latency}(\tau) := \sum_{i=1}^{g} w_i \cdot \mathsf{arrival\ time}_\tau(X_i). \qquad (2.1)$$
An algorithm for LPGST is said to be a (ρ, σ )-bicriteria approximation algorithm if on any instance of the problem, it finds an r-tour that covers at least h/σ groups and has latency at most ρ times the optimal (which covers at least h groups). The reason we focus on a bicriteria approximation for LPGST is that it is harder to approximate than the group Steiner tree problem (see below) and we can obtain a better bicriteria guarantee for LPGST.
To see that LPGST is at least as hard to approximate as the group Steiner tree problem, consider an arbitrary instance of group Steiner tree with metric (V, d), root r ∈ V and g groups $\{X_i \subseteq V\}_{i=1}^g$. Construct an instance of LPGST as follows. The vertices are $V' = V \cup \{u\}$, where u is a new vertex whose distance from the root r is very large (all other distances are unchanged).
There are $g' = g + 1$ groups with $X'_i = X_i$ for i ∈ [g] and $X'_{g+1} = \{u\}$. The target is h = g. The weights are $w_i = 0$ for i ∈ [g] and $w_{g+1} = 1$. Since the distance from r to u is very large, no approximately optimal LPGST solution will visit u. So any such LPGST solution covers all the groups $\{X_i\}_{i=1}^g$ and has latency equal to the length of the solution (as group $X'_{g+1}$ has weight one and all others have weight zero). This reduction also shows that LPGST on weighted star-metrics (which is used in the ODT algorithm) is at least as hard to approximate as set cover: when the metric (V, d) is a star-metric with center r, so is the new metric $(V', d')$.
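The construction above can be sketched as follows (names are ours; the very large distance d(r, u) is only indicated by a comment, since any sufficiently large value works):

```python
# Sketch of the GST -> LPGST hardness reduction: add a distant vertex u with a
# weight-1 singleton group, zero out the original weights, and set h = g, so a
# good tour skips u, covers every original group, and its latency equals its
# length (the uncovered group {u} pays the entire tour length).
def gst_to_lpgst(vertices, groups, far_vertex="u"):
    g = len(groups)
    new_vertices = vertices + [far_vertex]   # d(r, u) is chosen very large
    new_groups = groups + [{far_vertex}]
    weights = [0] * g + [1]                  # only the u-group carries weight
    target = g                               # one group may be left uncovered
    return new_vertices, new_groups, weights, target

verts, grps, w, h = gst_to_lpgst(["r", "a", "b"], [{"a"}, {"b"}])
assert grps[-1] == {"u"} and w == [0, 0, 1] and h == 2
```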
Recall that an instance of Isolation is specified by a metric (V, d), a root vertex r ∈ V , and m scenarios {S i } m i=1 with associated probability values {p i } m i=1 . The main result of this section is:
Theorem 4 If there is a (4, γ)-bicriteria approximation algorithm for group Steiner orienteering then there is an O(γ • log m)-approximation algorithm for the isolation problem.
We prove this in two steps. First, in Subsection 3.1 we show that a (ρ, 4)-bicriteria approximation algorithm for LPGST can be used to obtain an O(ρ • log m)-approximation algorithm for Isolation. Then, in Subsection 3.2 we show that any (4, γ)-bicriteria approximation algorithm for GSO leads to an (O(γ), 4)-bicriteria approximation algorithm for LPGST.
Note on reading this section: While the results of this section apply to the isolation problem on general metrics, readers interested in just the optimal decision tree problem need to only consider weighted star metrics (as discussed after Definition 5). In the ODT case, we have the following simplifications (1) a tour is simply a sequence of tests, (2) the tour length is the sum of test costs in the sequence, and (3) concatenating tours corresponds to concatenating test sequences.
Recall the definition of Isolation and LPGST from Section 2. Here we will prove:
Theorem 5 If there is a (ρ, 4)-bicriteria approximation algorithm for LPGST then there is an O(ρ · log m)-approximation algorithm for Isolation.
We first give a high-level description of our algorithm. The algorithm uses an iterative approach and maintains a candidate set of scenarios that contains the realized scenario. In each iteration, the algorithm eliminates a constant fraction of scenarios from the candidate set. So the number of iterations will be bounded by O(log m).
In each iteration we solve a suitable instance of LPGST in order to refine the candidate set of scenarios.
Single iteration of Isolation algorithm As mentioned above, we use LPGST in each iteration of the Isolation algorithm; we now describe how this is done. At the start of each iteration, our algorithm maintains a candidate set M ⊆ [m] of scenarios that contains the realized scenario. The probabilities associated with the scenarios i ∈ M are not the original $p_i$'s but their conditional probabilities $q_i := p_i / \sum_{j \in M} p_j$. The algorithm Partition (given as Algorithm 1) uses LPGST to compute an r-tour τ such that after observing the demands on τ, the number of scenarios consistent with these observations is guaranteed to be a constant factor smaller than |M|.
To get some intuition for this algorithm, consider the simplistic case when there is a vertex u ∈ V located near the root r such that ≈ 50% of the scenarios in M contain it. Then just visiting vertex u would reduce the number of candidate scenarios by ≈ 50%, irrespective of the observation at u, giving us the desired notion of progress. However, each vertex may induce a very unbalanced partition of M: so we may have to visit multiple vertices before ensuring that the number of candidate scenarios reduces by a constant factor. Moreover, some vertices may be too expensive to visit from r: so we need to carefully take the metric into account in choosing the set of vertices to visit. Addressing these issues is precisely where the LPGST problem comes in.
Algorithm 1 Partition⟨M, {q_i}_{i∈M}⟩
1: for each vertex v ∈ V, let $F_v := \{i \in M : v \in S_i\}$ and let $D_v$ be the smaller of $F_v$ and $M \setminus F_v$.
2: for each scenario i ∈ M, define the group $X_i := \{v \in V : i \in D_v\}$; set g := |M|.
3: run the (ρ, 4)-bicriteria approximation algorithm for LPGST on the instance with metric (V, d), root r, groups $\{X_i\}_{i∈M}$ with weights $\{q_i\}_{i∈M}$, and target h := g − 1; let $\tau = (r, v_1, \ldots, v_{t-1}, r)$ be the r-tour returned.
4: let $\{P_k\}_{k=1}^t$ be the partition of M where $P_k := D_{v_k} \setminus \bigcup_{j<k} D_{v_j}$ for k ∈ [t − 1], and $P_t := M \setminus \bigcup_{j=1}^{t-1} D_{v_j}$.
5: return tour τ and the partition $\{P_k\}_{k=1}^t$.
Note that the information at any vertex v corresponds to a bipartition $(F_v, M \setminus F_v)$ of the scenario set M, with the scenarios in $F_v$ having demand at v and the scenarios in $M \setminus F_v$ having no demand at v. So at least one of the two outcomes at v (presence or absence of demand) reduces the number of candidate scenarios to at most |M|/2. To handle this asymmetry, Step 1 associates vertex v with the subset $D_v$, which is the smaller of $F_v$ and $M \setminus F_v$; this is the set of scenarios under which just the observation at v suffices to reduce the number of candidate scenarios to at most |M|/2 (and represents progress). In Steps 2 and 3, we view vertex v as covering the scenarios in $D_v$.
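Step 1 of Partition, as described above, can be sketched as follows (names are ours):

```python
# Sketch of Step 1: F_v is the set of candidate scenarios with demand at v, and
# D_v is the smaller side of the bipartition (F_v, M \ F_v), so |D_v| <= |M|/2.
def smaller_side(v, M, scenarios):
    F_v = {i for i in M if v in scenarios[i]}
    complement = set(M) - F_v
    return F_v if len(F_v) <= len(complement) else complement

# Hypothetical example: 4 candidate scenarios; vertex "a" appears in 3 of them.
scenarios = {1: {"a"}, 2: {"a", "b"}, 3: {"a", "c"}, 4: {"c"}}
assert smaller_side("a", {1, 2, 3, 4}, scenarios) == {4}   # M \ F_a is smaller
```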
The overall algorithm for Isolation Here we describe how the different iterations are combined to solve Isolation. The final algorithm IsoAlg (given as Algorithm 2) is described recursively, where each "iteration" is a new call to IsoAlg. As mentioned earlier, at the start of each iteration the algorithm maintains a candidate set M ⊆ [m] of scenarios (with conditional probabilities $\{q_i\}_{i∈M}$) that contains the realized scenario; the recursion ends when |M| = 1.
Algorithm 2 IsoAlg⟨M, {q_i}_{i∈M}⟩
1: run Partition⟨M, {q_i}_{i∈M}⟩.
2: let $\tau = (r, v_1, \ldots, v_{t-1}, r)$ be the r-tour and $\{P_k\}_{k=1}^t$ be the partition of M returned.
3: let $\bar{q}_k := \sum_{i \in P_k} q_i$ for all k = 1 . . . t.
4: traverse tour τ and return directly to r after visiting the first (if any) vertex $v_{k^*}$ (for $k^* \in [t-1]$) that determines that the realized scenario is in $P_{k^*} \subseteq M$. If there is no such vertex until the end of the tour τ, then set $k^* \leftarrow t$.
5: run IsoAlg⟨$P_{k^*}$, $\{q_i/\bar{q}_{k^*}\}_{i \in P_{k^*}}$⟩ to isolate the realized scenario within the subset $P_{k^*}$.
Note that the adaptive algorithm IsoAlg implicitly defines a decision tree: indeed, we create a path $(r, v_1, \ldots, v_{t-1})$, and hang the subtrees created in the recursive call on each instance ⟨$P_k$, $\{q_i/\bar{q}_k\}$⟩ from the respective node $v_k$. See also Figure 3.1.
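A much-simplified sketch of the IsoAlg loop (all names are ours; `choose_tour` abstracts the LPGST-based Partition step and must eventually distinguish the candidate scenarios, and the actual algorithm's cost accounting is omitted):

```python
def iso_alg(M, scenarios, observe, choose_tour):
    """Simplified sketch of scenario isolation: repeatedly visit vertices
    suggested by a Partition-like subroutine and keep only the scenarios
    consistent with each observation, until one scenario remains.
    observe(v) reveals whether the realized scenario has demand at v."""
    M = set(M)
    while len(M) > 1:
        for v in choose_tour(M):
            F_v = {i for i in M if v in scenarios[i]}
            M = F_v if observe(v) else M - F_v
            if len(M) <= 1:
                break
    return next(iter(M))

# Hypothetical example: realized scenario is 3 = {"a", "b"}.
scenarios = {1: {"a"}, 2: {"b"}, 3: {"a", "b"}}
observe = lambda v: v in scenarios[3]
choose_tour = lambda M: ["a", "b"]
assert iso_alg({1, 2, 3}, scenarios, observe, choose_tour) == 3
```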
Analysis The rest of this subsection analyzes IsoAlg and proves Theorem 5. We first provide an outline of the proof. It is easy to show that IsoAlg correctly identifies the realized scenario after O(log m) iterations: this is shown formally in Claim 10. We relate the objective values of the LPGST and Isolation instances in two steps: Claim 6 shows that LPGST has a smaller optimal value than Isolation, and Claim 8 shows that any approximate LPGST solution can be used to construct a partial Isolation solution incurring the same cost (in expectation). Since different iterations of IsoAlg deal with different sub-instances of Isolation, we need to relate the optimal cost of these sub-instances to that of the original instance: this is done in Claim 9.
Recall that the original instance of Isolation is defined on metric (V, d), root r and set {S i } m i=1 of scenarios with probabilities {p i } m i=1 . IsoAlg works with many sub-instances of the isolation problem. Such an instance J is specified by a subset M ⊆ [m] which implicitly defines (conditional) probabilities q i = p i ∑ j∈M p j for all i ∈ M. In other words, J involves identifying the realized scenario conditioned on it being in set M (the metric and root remain the same as the original instance). Let IsoTime * (J ) denote the optimal value of any instance J .
Claim 6 For any instance J = ⟨M, {q_i}_{i∈M}⟩, the optimal value of the LPGST instance considered in Step 3 of algorithm Partition(J) is at most IsoTime*(J).
Proof: Let T be an optimal decision tree corresponding to Isolation instance J , and hence IsoTime * (J ) = IsoTime(T ). Note that by definition of the sets {F v } v∈V , any internal node in T labeled vertex v has its two children v yes and v no corresponding to the realized scenario being in F v and M \F v (respectively); and by definition of {D v } v∈V , nodes v yes and v no correspond to the realized scenario being in D v and M \ D v (now not necessarily in that order).
We now define an r-tour σ based on a specific root-leaf path in T. Consider the root-leaf path that, at any node labeled v, moves to the child ($v_{yes}$ or $v_{no}$) that corresponds to $M \setminus D_v$, until it reaches a leaf-node ℓ. Let $r, u_1, u_2, \ldots, u_j$ denote the sequence of vertices on this root-leaf path, and define the r-tour $\sigma = (r, u_1, u_2, \ldots, u_j, r)$. Since T is a feasible decision tree for the isolation instance, there is at most one scenario a ∈ M such that the path $T_{S_a}$ traced in T under demands $S_a$ ends at leaf-node ℓ. In other words, every scenario b ∈ M \ {a} gives rise to a root-leaf path $T_{S_b}$ that diverges from this root-leaf path. By our definition of the path, the scenarios that diverge from it are precisely $\bigcup_{k=1}^j D_{u_k}$, and so $\bigcup_{k=1}^j D_{u_k} = M \setminus \{a\}$. Next, we show that σ is a feasible solution to the LPGST instance in Step 3. By the definition of the groups $\{X_i\}_{i∈M}$ (Step 2 of Algorithm 1), it follows that tour σ covers the groups indexed by $\bigcup_{k=1}^j D_{u_k}$. So the number of groups covered is at least |M| − 1 = h, and σ is a feasible LPGST solution.
Finally, we bound the LPGST objective value of σ in terms of the isolation cost IsoTime(T). To reduce notation, let u_0 = r below. The arrival times in tour σ satisfy: for each k ∈ [j] and each scenario i ∈ D_{u_k}, the path T_{S_i} traced in T under scenario i agrees with σ until u_k, so arrival time_σ(X_i) ≤ d(T_{S_i}). Moreover, for scenario a, which is the only scenario not in ∪_{k=1}^j D_{u_k}, we have d(T_{S_a}) = length(σ) = arrival time_σ(X_a). Now by (2.1), latency(σ) = ∑_{i∈M} q_i · arrival time_σ(X_i) ≤ ∑_{i∈M} q_i · d(T_{S_i}) = IsoTime(T) = IsoTime*(J).
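The weighted latency objective appearing here can be computed mechanically from a tour; the sketch below (data layout is our own, for illustration) returns ∑_i w_i · (arrival time of group X_i), charging a group that is never visited the full tour length, as in LPGST:

```python
def weighted_latency(tour, dist, groups, weights):
    """Latency of a tour w.r.t. vertex groups: a group's arrival time is the
    distance travelled until its first vertex is visited; a group never
    visited is charged the full tour length."""
    arrival, t = {}, 0.0
    prev = tour[0]
    for v in tour:
        t += dist(prev, v)
        prev = v
        for g, members in groups.items():
            if g not in arrival and v in members:
                arrival[g] = t
    return sum(w * arrival.get(g, t) for g, w in weights.items())

# On a unit star: tour r -> a -> b -> r, groups X={a}, Y={b}, Z={z} (unvisited).
```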
If we use a (ρ, 4)-bicriteria approximation algorithm for LPGST, we get the following claim:
Claim 7 For any instance J = ⟨M, {q_i}_{i∈M}⟩, the latency of the tour τ returned by Algorithm Partition is at most ρ · IsoTime*(J). Furthermore, the resulting partition {P_k} of M is such that every part misses at least |M|/8 scenarios, i.e., |P_k| ≤ (7/8) · |M| for each k (when |M| ≥ 2).
Proof: By Claim 6, the optimal value of the LPGST instance in Step 3 of algorithm Partition is at most IsoTime * (J ); now the (ρ, 4)-bicriteria approximation guarantee implies that the latency of the solution tour τ is at most ρ times that. This proves the first part of the claim.
Of course, we do not really care about the latency of the tour per se; we care about the expected cost incurred in isolating the realized scenario. But the two are related (by their very construction), as the following claim formalizes:
Claim 8 At the end of Step 4 of IsoAlg⟨M, {q_i}_{i∈M}⟩, the realized scenario lies in P_{k*}. The expected distance traversed in this step is at most 2ρ · IsoTime*(⟨M, {q_i}_{i∈M}⟩).
Proof: Consider the tour τ = ⟨r, v_1, …, v_{t−1}, r⟩ returned by the Partition algorithm. Recall that visiting any vertex v reveals whether the scenario lies in D_v or in M \ D_v. In Step 4 of algorithm IsoAlg, we traverse τ and one of the following happens:
• The tour returns directly to r from the first vertex v_k (for 1 ≤ k ≤ t − 1) such that the realized scenario lies in D_{v_k}; here k* = k. Since the scenario did not lie in any earlier D_{v_j} for j < k, the definition of P_k = D_{v_k} \ (∪_{j<k} D_{v_j}) gives us that the realized scenario is indeed in P_k.
• k* = t. Tour τ is completely traversed and we return to r. In this case, the realized scenario does not lie in any of D_{v_1}, …, D_{v_{t−1}}, and it is inferred to be in the complement set M \ (∪_{j=1}^{t−1} D_{v_j}) = P_t.

if d(σ) > αB then:
(i) partition tour σ into at most 2 · d(σ)/(αB) paths, each of length at most αB;
(ii) let σ′ denote the path containing maximum profit;
(iii) let ⟨r, σ′, r⟩ be the r-tour obtained by connecting both end-vertices of path σ′ to r;
(iv) set τ ← τ ∪ ⟨r, σ′, r⟩;
else set τ ← τ ∪ σ.
Mark all groups visited in τ as covered.
8: end while
9: output the r-tour τ.
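Steps (i)-(iii) of the truncation above can be sketched as follows; this is an illustrative implementation only, and the greedy split plus the names `dist` and `profit` are our assumptions, not the paper's:

```python
def best_short_segment(tour, dist, profit, alpha_B):
    """Greedily split a tour (vertex list) into consecutive paths, each of
    length at most alpha_B, then return the path of maximum total profit."""
    paths, current, length = [], [tour[0]], 0.0
    for u, v in zip(tour, tour[1:]):
        if length + dist(u, v) > alpha_B:
            paths.append(current)          # close the current path
            current, length = [v], 0.0     # start a new path at v
        else:
            current.append(v)
            length += dist(u, v)
    paths.append(current)
    return max(paths, key=lambda p: sum(profit.get(x, 0.0) for x in p))
```

The bound of at most 2·d(σ)/(αB) paths holds when every single edge of σ has length at most αB, since each closed path together with the edge that overflowed it accounts for at least αB of the tour's length.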
Analysis Let Opt denote the optimal profit of the given GSO instance. In the following, let α := O(log^2 n), which comes from Theorem 19. We prove that Algorithm 5 achieves a (4, 2α + 1)-bicriteria approximation guarantee, i.e., the solution τ has profit at least Opt/4 and length at most (2α + 1) · B.
By the description of the algorithm, we iterate as long as the total length of edges in τ is at most αB. Note that the increase in length of τ in any iteration is at most (α + 1) • B since every vertex is at distance at most B/2 from r. So the final length d(τ) ≤ (2α + 1) • B. This proves the bound on the length.
It now suffices to show that the final subgraph τ gets profit at least Opt/4. At any iteration, let φ(τ) denote the profit of the current solution τ, and d(τ) its length. Since d(τ) > αB upon termination, it suffices to show the following invariant over the iterations of the algorithm:
Example 2 Consider another instance of AdapTRP on a star metric with center r and leaves {v_i}_{i=1}^n ∪ {u_i}_{i=1}^n. For each i ∈ [n], edge (r, v_i) has unit length and edge (r, u_i) has length n. There are n scenarios: for each i ∈ [n], scenario S_i = {v_i, u_i} occurs with probability 1/n. The optimal values of both AdapTRP and Isolation are Θ(n). Moreover, any reasonable AdapTRP solution will involve first isolating the realized scenario (by visiting the vertices v_i).
Hence, the algorithm needs to interleave the two goals of isolating scenarios and visiting high-probability vertices. This will become clear in the construction of the latency group Steiner instances used by our algorithm (Step 3 in Algorithm 6).
Algorithm Outline Although we cannot reduce AdapTRP to Isolation, we are still able to use ideas from the Isolation algorithm. The AdapTRP algorithm also follows an iterative approach and maintains a candidate set M ⊆ [m] containing the realized scenario. We again associate conditional probabilities q_i := p_i / (∑_{j∈M} p_j) with each scenario i ∈ M. In each iteration, the algorithm eliminates a constant fraction of the scenarios in M, so the number of iterations is O(log m). Each iteration involves solving an instance of the latency group Steiner (LGST) problem: recall Definition 8 and the O(log^2 n)-approximation algorithm for LGST (Corollary 15). The construction of this LGST instance is the main point of difference from the Isolation algorithm. Moreover, we will show that the expected latency incurred in each iteration is O(log^2 n) · Opt. Adding up the latency over all iterations yields an O(log^2 n · log m)-approximation algorithm for AdapTRP.
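The O(log m) iteration bound uses nothing beyond the halving of |M|; a quick sanity check of that arithmetic (illustrative only):

```python
import math

def worst_case_iterations(m):
    """If every iteration leaves a candidate set of size at most |M|/2,
    then at most ceil(log2 m) iterations remain until |M| = 1."""
    count, size = 0, m
    while size > 1:
        size //= 2        # the largest size allowed by |P_k| <= |M|/2
        count += 1
    return count

assert all(worst_case_iterations(m) <= math.ceil(math.log2(m))
           for m in range(2, 2000))
```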
LGST to partition scenarios M In each iteration, the algorithm formulates an LGST instance and computes an r-tour τ using Corollary 15. The details are in Algorithm 6 below. An important property of this tour τ is that the number of candidate scenarios after observing demands on τ will be at most |M|/2 (see Claim 22).
Given a candidate set M of scenarios, it will be convenient to partition the vertices into two parts: H consists of vertices which occur in more than half the scenarios, and L := V \ H consists of vertices occurring in at most half the scenarios. In the LGST instance (Step 3 below), we introduce |S i ∩ H| + 1 groups (with suitable weights) corresponding to each scenario i ∈ M.
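The H/L split is immediate to compute; a minimal sketch (the set-based representation is assumed for illustration):

```python
def heavy_light_split(V, scenarios):
    """H = vertices occurring in more than half of the scenarios,
    L = all remaining vertices. `scenarios` is a list of vertex sets S_i."""
    half = len(scenarios) / 2.0
    H = {v for v in V if sum(1 for S in scenarios if v in S) > half}
    return H, set(V) - H
```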
3: formulate the following instance of LGST (Definition 8) on metric (V, d), root r, and the following groups:
for each scenario i ∈ M,
– the main group X_i of scenario i has weight |S_i ∩ L| · p_i and vertices (L ∩ S_i) ∪ (H \ S_i);
– for each v ∈ S_i ∩ H, group Y_i^v has weight p_i and vertices {v} ∪ (L ∩ S_i) ∪ (H \ S_i).
4: run the LGST algorithm (from Corollary 15) on instance G; let τ = ⟨r, v_1, …, v_{t−1}, r⟩ be the r-tour returned.
5: let {P_k}_{k=1}^t be the partition of M where P_k := {i ∈ M : v_k ∈ X_i} \ (∪_{j<k} P_j) for k ≤ t − 1, and P_t := M \ (∪_{j<t} P_j).
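The group construction of Step 3 can be sketched directly (illustrative data structures; `p` maps each scenario in M to its probability):

```python
def build_lgst_groups(scenarios, p, H, L):
    """For each scenario i: main group X_i with weight |S_i ∩ L| * p_i on
    vertices (L ∩ S_i) ∪ (H \ S_i), plus one group Y_i^v of weight p_i on
    vertices {v} ∪ (L ∩ S_i) ∪ (H \ S_i) per heavy vertex v in S_i ∩ H."""
    groups = {}
    for i, S in scenarios.items():
        base = (L & S) | (H - S)           # vertices that distinguish scenario i
        groups[('X', i)] = (len(L & S) * p[i], base)
        for v in S & H:
            groups[('Y', i, v)] = (p[i], {v} | base)
    return groups
```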
) must be covered by τ, since the groups Y_i^v have weight p_i > 0. Also, since X_i is not covered by V(τ), we must have v ∈ V(τ) for all v ∈ S_i. Thus we have S_i ⊆ H ∩ V(τ), and combined with the earlier observation, H ∩ V(τ) = S_i. This determines i ∈ M uniquely, and so |P_t| ≤ 1 ≤ |M|/2.
Final AdapTRP algorithm and analysis Given the above partitioning scheme, Algorithm 7 describes the overall AdapTRP algorithm in a recursive manner.
Algorithm 7 AdapTRP⟨M, {q_i}_{i∈M}, {S_i}_{i∈M}⟩
1: if |M| = 1, visit the vertices in this scenario using the O(1)-approximation algorithm [FHR07] for the deterministic traveling repairman, and quit.
2: run PartnLat⟨M, {q_i}_{i∈M}, {S_i}_{i∈M}⟩; let τ = ⟨r, v_1, …, v_{t−1}, r⟩ be the r-tour and {P_k}_{k=1}^t be the partition of M returned.
3: let q̄_k := ∑_{i∈P_k} q_i for all k = 1 … t.
4: traverse tour τ and return directly to r after visiting the first vertex v_{k*} (for k* ∈ [t]) that determines that the realized scenario is in P_{k*} ⊆ M.
5: update the scenarios in P_{k*} by removing the vertices visited in τ until v_{k*}, i.e., S_i ← S_i \ {v_1, …, v_{k*}} for all i ∈ P_{k*}.
6: run AdapTRP⟨P_{k*}, {q_i / q̄_{k*}}_{i∈P_{k*}}, {S_i}_{i∈P_{k*}}⟩ to recursively cover the realized scenario within P_{k*}.
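The control flow of Algorithm 7 can be summarized as a recursion; this skeleton is illustrative only, with the hypothetical callables `partn_lat`, `observe`, and `repairman_tour` standing in for the paper's subroutines:

```python
def adap_trp(M, q, S, partn_lat, observe, repairman_tour):
    """Recursive skeleton: partition the candidate scenarios, walk the tour
    until the realized part P_{k*} is identified, renormalize, and recurse."""
    if len(M) == 1:
        (i,) = tuple(M)
        return repairman_tour(S[i])           # base case: one scenario left
    tour, parts = partn_lat(M, q, S)          # LGST-based partitioning step
    k_star, visited = observe(tour, parts)    # traverse tour, detect the part
    P = parts[k_star]
    q_bar = sum(q[i] for i in P)
    q_new = {i: q[i] / q_bar for i in P}      # conditional probabilities
    S_new = {i: S[i] - set(visited) for i in P}   # drop visited vertices
    return adap_trp(P, q_new, S_new, partn_lat, observe, repairman_tour)
```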
The analysis for this algorithm is similar to that for the isolation problem (Section 3.1) and we follow the same outline. For any sub-instance J of AdapTRP, let Opt(J ) denote its optimal value. Just as in the isolation case (Claim 9), it can be easily seen that the latency objective function is also sub-additive.
Claim 23 For any sub-instance ⟨M, {q_i}_{i∈M}, {S_i}_{i∈M}⟩ and any partition {P_k}_{k=1}^t of M,
Opt(⟨M, {q_i}_{i∈M}, {S_i}_{i∈M}⟩) ≥ ∑_{k=1}^t q̄_k · Opt(⟨P_k, {q_i / q̄_k}_{i∈P_k}, {S_i}_{i∈P_k}⟩),
where q̄_k = ∑_{i∈P_k} q_i for all 1 ≤ k ≤ t.
The next property we show is that the optimal cost of the LGST instance G considered in Steps (3)-(4) of Algorithm 6 is not too high.
Lemma 24 For any instance J = ⟨M, {q_i}_{i∈M}, {S_i}_{i∈M}⟩ of AdapTRP, the optimal value of the latency group Steiner instance G in Step 4 of Algorithm PartnLat(J) is at most Opt(J).
Proof: Let T be an optimal decision tree for the given AdapTRP instance J. Note that any internal node of T, labeled v, has two children corresponding to the realized scenario being in F_v (yes child) or M \ F_v (no child). Now consider the root-leaf path in T (and the corresponding tour σ in the metric) which starts at r and, at any internal node v, moves on to the no child if v ∈ L, and to the yes child if v ∈ H. We claim that this tour is a feasible solution to the latency group Steiner instance G.
To see why, first consider any scenario i ∈ M that branched off from path σ in the decision tree T; let v be the vertex where the tree path of scenario i branched off from σ. If v ∈ L then, by the way we defined σ, it follows the "no" child of v, and so v ∈ S_i ∩ L. On the other hand, if v ∈ H, then it must be that v ∈ H \ S_i (again from the way σ was defined). In either case, v ∈ (S_i ∩ L) ∪ (H \ S_i), and hence visiting v covers all groups associated with scenario i, i.e., X_i and {Y_i^u | u ∈ S_i ∩ H}. Thus σ covers all groups of all the scenarios that branched off it in T. Note that there is exactly one scenario (say a ∈ M) that does not branch off σ; scenario a traverses σ in T. Since T is a feasible solution for AdapTRP, σ must visit every vertex in S_a. Therefore σ covers all the groups associated with scenario a: clearly {Y_a^u | u ∈ S_a ∩ H} are covered; X_a is also covered unless S_a ∩ L = ∅ (however, in that case group X_a has zero weight and does not need to be covered; see Definition 8). Thus σ is a feasible solution to G.
We now bound the latency cost of tour σ for instance G. On path σ, let α_i (for each i ∈ M) denote the coverage time of group X_i, and β_i^v (for i ∈ M and v ∈ S_i ∩ H) the coverage time of group Y_i^v. The next claim shows that the latency of σ for instance G is at most Opt(J).
Claim 25 The expected cost of T, Opt(J), is at least ∑_{i∈M} p_i · (|S_i ∩ L| · α_i + ∑_{v∈S_i∩H} β_i^v), which is exactly the latency of tour σ for the latency group Steiner instance G.
Proof: Fix any i ∈ M; let σ i denote the shortest prefix of σ containing a vertex from X i . Note that by definition, σ i has length α i . We will lower bound separately the contributions of S i ∩ L and S i ∩ H to the cost of T .
As all but the last vertex of σ_i are from (L \ S_i) ∪ (H ∩ S_i), by the definition of σ, the path T_{S_i} traced in the decision tree T when scenario i is realized agrees with this prefix σ_i. Moreover, no vertex of S_i ∩ L is visited before the end of σ_i. So under scenario S_i, the total arrival time of the vertices L ∩ S_i is at least |L ∩ S_i| · α_i. Hence S_i ∩ L contributes at least p_i · |L ∩ S_i| · α_i towards Opt(J). Now consider some vertex v ∈ S_i ∩ H; let σ_i^v denote the shortest prefix of σ containing a Y_i^v-vertex. Note that σ_i^v has length β_i^v, and it is a prefix of σ_i since Y_i^v ⊇ X_i. As observed earlier, the path traced in decision tree T under scenario i contains σ_i: so vertex v is visited (under scenario i) only after tracing path σ_i^v. So the contribution of v (under scenario i) to Opt(J) is at least p_i · β_i^v, i.e., the contribution of S_i ∩ H is at least p_i · ∑_{v∈S_i∩H} β_i^v.
Thus we have demonstrated a feasible solution to G of latency at most Opt(J ).
It remains to bound the expected additional latency incurred in Step 4 of Algorithm 7 when a random scenario is realized. Below we assume a ρ = O(log 2 n) approximation algorithm for latency group Steiner tree (from Corollary 15).
Lemma 26 At the end of Step 4 of AdapTRP⟨M, {q_i}_{i∈M}, {S_i}_{i∈M}⟩, the realized scenario lies in P_{k*}. The expected increase in latency due to this step is at most 2ρ · Opt(⟨M, {q_i}_{i∈M}, {S_i}_{i∈M}⟩).
Proof: The proof that the realized scenario always lies in the part P_{k*} determined in Step 4 is identical to that in Claim 8 of the Isolation algorithm, and is omitted. We now bound the expected latency incurred. In the solution τ to the latency group Steiner instance G, define α_i as the coverage time of group X_i, for all i ∈ M; and β_i^v as the coverage time of group Y_i^v, for all i ∈ M and v ∈ S_i ∩ H. Let i denote the realized scenario. Suppose that k* = ℓ ≤ t − 1 in Step 4. Then by the definition of the parts P_k, we have v_ℓ ∈ X_i = (S_i ∩ L) ∪ (H \ S_i) and X_i ∩ {v_1, …, v_{ℓ−1}} = ∅. So the length along τ until v_ℓ equals α_i. Moreover, the total length traversed in this step is at most 2 · α_i: travel till v_ℓ and then return to r (this uses the symmetry and triangle-inequality properties of the metric). So the latency of any S_i-vertex increases by at most this amount. Furthermore, we claim that the latency of any v ∈ S_i ∩ H increases by at most 2 · β_i^v: this is clearly true if β_i^v = α_i; on the other hand, if β_i^v < α_i then v is visited before v_ℓ and so it only incurs latency β_i^v. So the increase in latency of S_i is at most 2 · (|S_i ∩ L| · α_i + ∑_{v∈S_i∩H} β_i^v).
If k* = t, then by the proof of Claim 22 the realized scenario i satisfies: S_i ⊆ H, group X_i is not visited by τ (so α_i is undefined), and all of S_i is visited by τ. In this case the total latency of S_i is ∑_{v∈S_i∩H} β_i^v, which is clearly at most 2 · ∑_{v∈S_i∩H} β_i^v + 2 · |S_i ∩ L| · α_i; note that |S_i ∩ L| = 0 here. Thus the expected latency incurred in Step 4 is at most 2 · ∑_{i∈M} p_i · (|S_i ∩ L| · α_i + ∑_{v∈S_i∩H} β_i^v), which is twice the latency of τ for the latency group Steiner instance G. Finally, since τ is a ρ-approximate solution to G, using Lemma 24 we obtain the claim.
Finally, combining Claim 22, Lemma 26 and Claim 23, by a proof identical to that of Theorem 4, it follows that the final AdapTRP solution has cost O(log 2 n log m) • Opt. This completes the proof of Theorem 3.
We note that for the AdapTRP problem on metrics induced by a tree, our algorithm achieves an O(log n log m) approximation ratio (the guarantees in Theorem 18 and Corollary 15 improve by a logarithmic factor on tree metrics). There is also an Ω(log^{1−ε} n)-hardness of approximation for the AdapTRP problem on tree metrics [Nag09]. So there is still a logarithmic gap between the best upper and lower bounds for the AdapTRP problem on tree metrics. In going from tree metrics to general metrics, we lose another logarithmic factor in the approximation ratio.
In this paper, we studied the problem of constructing optimal decision trees; this widely studied problem was previously known to admit logarithmic approximation algorithms in the case of uniform costs or uniform probabilities. The greedy algorithms used in those cases do not extend to non-uniform costs and probabilities, and we gave a new algorithm that seeks to be greedy with respect to two different criteria; our O(log m)-approximation is asymptotically optimal. We then considered a generalization to the adaptive traveling salesman problem, and obtained an O(log^2 n log m)-approximation algorithm for this adaptive TSP problem. We also showed that any asymptotic improvement on this result would imply an improved approximation algorithm for the group Steiner tree problem, which is a long-standing open problem. Finally, we gave an O(log^2 n log m)-approximation algorithm for the adaptive traveling repairman problem; closing the gap between the known upper and lower bounds in this case remains an interesting open problem.
violates the requirement that the tour (namely σ) under scenario S_i must visit all vertices S_i ⊇ X_i. Finally, we have Opt ≥ (1 − 1/L) · d(σ) ≥ (1 − 1/L) · Opt, as required.
(B) Opt ≤ Opt + 1. Let τ denote an optimal r-tour for the given GST instance, so d(τ) = Opt. Consider the following solution for AdapTSP:
1. Traverse r-tour τ to determine whether or not X g+1 is the realized scenario.
2. If no demands observed on τ (i.e. scenario S g+1 is realized), visit vertex s and stop.
3. If some demand observed on τ (i.e. one of scenarios {S i } g i=1 is realized), then visit all vertices in V along an arbitrary r-tour and stop.
It is clear that this decision tree is feasible for the AdapTSP instance. For any i ∈ [g + 1], let π i denote the r-tour traversed under scenario S i in the above AdapTSP decision tree. We have d(π g+1 ) = d(τ) ≤ Opt, and d(π i ) ≤ 2n • max u,v d(u, v) ≤ L for all i ∈ [g]. Thus the resulting AdapTSP objective is at most:
Thus we have the desired reduction.
|∪_k D_{u_k}| ≥ h/4 = (|M| − 1)/4 ≥ |M|/8 (when |M| ≥ 2). By definition of the sets D_v, it holds that
Claim 22 When |M| ≥ 2, the partition {P_k}_{k=1}^t returned by PartnLat satisfies |P_k| ≤ |M|/2, ∀k ∈ [t].
Proof:
In the group Steiner tree problem [GKR00], the input is a metric (V, d) with root r ∈ V and groups {X_i ⊆ V} of vertices; the goal is to compute a minimum-length tour originating from r that visits at least one vertex of each group.
Recall that group Steiner tree on star-metrics is equivalent to the set cover problem.
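This equivalence is worth spelling out: on a star, an r-tour from the center is just a choice of leaves (each visited at cost twice its edge length), and covering a group means choosing some leaf whose set contains the corresponding element. A small illustrative encoding (the names and representation are our own):

```python
def set_cover_as_star_gst(universe, sets, leaf_len):
    """Encode set cover as group Steiner tree on a star: one leaf per set;
    the group of element e is the set of leaves whose set contains e.
    A tour from the center visiting leaves `chosen` covers every group
    iff the chosen sets cover the universe; its length is twice the sum
    of the chosen leaf edge-lengths."""
    groups = {e: {j for j, S in enumerate(sets) if e in S} for e in universe}

    def tour_length(chosen):
        if any(not (groups[e] & set(chosen)) for e in universe):
            raise ValueError("chosen leaves do not cover all groups")
        return 2 * sum(leaf_len[j] for j in chosen)

    return groups, tour_length
```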