Relay Selection with Channel Probing in Sleep-Wake Cycling Wireless Sensor Networks


Authors: K. P. Naveen, Student Member, IEEE; Anurag Kumar, Fellow, IEEE

Abstract—In geographical forwarding of packets in a large wireless sensor network (WSN) with sleep-wake cycling nodes, we are interested in the local decision problem faced by a node that has "custody" of a packet and has to choose one among a set of next-hop relay nodes to forward the packet towards the sink. Each relay is associated with a "reward" that summarizes the benefit of forwarding the packet through that relay. We seek a solution to this local problem, the idea being that such a solution, if adopted by every node, could provide a reasonable heuristic for the end-to-end forwarding problem. Towards this end, we propose a relay selection problem comprising a forwarding node and a collection of relay nodes, with the relays waking up sequentially at random times. At each relay wake-up instant the forwarder can choose to probe a relay to learn its reward value, based on which the forwarder can then decide whether to stop (and forward its packet to the chosen relay) or to continue to wait for further relays to wake up. The forwarder's objective is to select a relay so as to minimize a combination of waiting delay, reward and probing cost. Our problem can be considered as a variant of the asset selling problem studied in the operations research literature. We formulate our relay selection problem as a Markov decision process (MDP) and obtain some interesting structural results on the optimal policy (namely, the threshold and the stage-independence properties). We also conduct simulation experiments and gain valuable insights into the performance of our local forwarding solution.

Index Terms—Wireless sensor networks, sleep-wake cycling, channel probing, geographical forwarding, asset selling problem.
I. INTRODUCTION

Consider a wireless sensor network deployed for the detection of rare events, e.g., forest fires, intrusion in border areas, etc. To conserve energy, the nodes in the network sleep-wake cycle, whereby they alternate between an ON state and a low power OFF state. We are further interested in asynchronous sleep-wake cycling, where the point processes of wake-up instants of the nodes are not synchronized [1], [2]. In such networks, whenever an event is detected, an alarm packet (containing the event location and a time stamp) is generated and has to be forwarded, through multiple hops (as illustrated in Fig. 1), to a control center (sink) where appropriate action could be taken. Since the network is sleep-wake cycling, a forwarding node (i.e., a node holding an alarm packet) has to wait for its neighbors to wake up before it can choose one for the next hop. Thus, due to the sleep-wake process, there is a delay incurred at each hop en route to the sink, and our interest is in minimizing the total average end-to-end delay subject to a constraint on some global metric of interest, such as the average hop count, or the average total transmission power (the sum of the transmission power used at each hop).

[Fig. 1. Illustration of a packet being forwarded to the sink node (green hexagon) through a sleep-wake cycling network. The square node (labeled as forwarder) is the current custodian of the packet.]

Both the authors are with the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore 560012, India. Email: {naveenkp, anurag}@ece.iisc.ernet.in. This work was supported in part by the Indo-French Centre for the Promotion of Advanced Research (IFCPAR Project 4000-IT-1), and in part by the Department of Science and Technology (DST) via a J.C. Bose Fellowship.
Such a global problem can be considered as a stochastic shortest path problem [3], for which the distributed Bellman-Ford algorithm (e.g., the LOCAL-OPT algorithm proposed by Kim et al. in [1]) can be used to obtain the optimal solution. However, a major drawback with such an approach is that a pre-configuration phase is required to run such algorithms, which would involve the exchange of several control messages. Furthermore, such global configuration would need to be performed each time there is a change in the network topology, such as due to node failures, variations in the propagation characteristics, etc. The focus of our research is instead towards designing simple forwarding rules that use only the local information available at a forwarding node. In our own earlier work in this direction [2], [4], we formulated the local forwarding problem as one of minimizing the one-hop forwarding delay subject to a constraint on the reward offered by the chosen relay. The reward associated with a relay is a function of the transmission power and the progress towards the sink made by the packet when forwarded via that relay. We considered two variations of the problem, one in which the number of potential relays is known [4], and the other in which only a probability mass function of the number of potential relays is known [2]. In each case, we derived the structure of the optimal policy. Further, through simulation experiments we found that, in some region of operation, the end-to-end performance (i.e., total delay and total transmission power) obtained by applying the solution to the local problem at each hop is comparable with that obtained by the global solution (i.e., the LOCAL-OPT proposed by Kim et al. [1]), thus providing additional support for the approach of utilizing local forwarding rules, albeit suboptimal.
In our earlier work, however, we assumed that the gain of the wireless communication channel between the forwarding node and a relay is a deterministic function of the distance between the two, whereas, in practice, due to the phenomenon called shadowing, the channel gain at a given distance from the forwarding node is not a constant, but varies spatially over points at the same distance (the variation being typically modeled as log-normally distributed [5]). In addition to not being just a function of distance, the path-loss between a pair of locations varies with time; in a forest, for example, this would be due to seasonal variations in the foliage. Therefore, in each instance that a node gets custody of a packet, the node has to send probe packets to determine the channel gain to relay nodes that wake up, and thereby "offer" to forward the packet. Such probing incurs additional cost (for instance, see [6] where probing allows the transmitter to obtain a finer estimate of the channel gain). Hence, "to probe" or "not to probe" can itself become a part of the decision process. In the current work we incorporate these features (namely, channel probing and the associated power cost) while choosing a relay for the next hop, leading to an interesting variant of the asset selling problem [7, Section 4.4], [8] studied in the operations research literature.

Outline and Our Contributions: In Section II we will formally describe our system model, following which we will discuss the related work. Sections III and IV are devoted towards characterizing the structure of the policy RST-OPT (ReSTricted-OPTimal), which is optimal within a restricted class of relay selection policies. In Section V we will discuss the globally optimal GLB-OPT policy. Numerical and simulation results are presented in Section VI. Our main technical contributions are the following:
• We characterize the optimal policy, RST-OPT, in terms of stopping sets.
We prove that the stopping sets have a threshold structure (Theorem 1).
• We further prove that the stopping sets are identical across the decision stages (Theorems 2 and 3). This result can be considered as a generalization of the one-step-look-ahead rule (see the remark following Theorem 2).
• Through one-hop numerical work we find that the performance of RST-OPT is close to that of GLB-OPT. This result is useful because the sub-optimal RST-OPT is computationally simpler than GLB-OPT. We have also conducted simulations to study the end-to-end performance of RST-OPT.
We will finally conclude in Section VII. For ease of readability we have moved most of the proofs to the Appendix.

II. SYSTEM MODEL: THE RELAY SELECTION PROBLEM

We will describe the system model in the context of geographical forwarding, also known as location-aware routing [4], [9], [10]. In geographical forwarding it is assumed that each node in the network knows its location (with respect to some reference) as well as the location of the sink.

[Fig. 2. The hatched area is the forwarding region $\mathcal{L}$. For $\ell \in \mathcal{L}$, the progress $Z_\ell$ is the difference between the forwarder-to-sink and $\ell$-to-sink distances.]

Consider a forwarding node F located at $v$ (see Fig. 2). The sink node is situated at $v_0$. Thus, the distance between F and the sink is $V = \|v - v_0\|$ (we use $\|\cdot\|$ to denote the Euclidean norm). The communication region is the set of all locations where reliable exchange of control messages (transmitted using a low rate robust modulation technique on a separate control channel) can take place between F and a receiver, if any, at these locations. In Fig. 2 we have shown the communication region to be circular, but in practice this region can be arbitrary. The set of nodes within the communication region are referred to as the neighbors.
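The geometry above is easy to make concrete in code. The sketch below is illustrative only: the function names are our own, and it assumes a circular communication region of radius `comm_radius` (the paper allows an arbitrary region) together with the minimum-progress cutoff `z_min` used to define the forwarding region:

```python
import math

def progress(loc, fwd, sink):
    # Z_l = ||fwd - sink|| - ||loc - sink||: reduction in distance to
    # the sink if the packet is forwarded to a node at `loc`.
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    return dist(fwd, sink) - dist(loc, sink)

def in_forwarding_region(loc, fwd, sink, comm_radius, z_min):
    # Membership test for the forwarding region, assuming a circular
    # communication region around the forwarder.
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    return dist(loc, fwd) <= comm_radius and progress(loc, fwd, sink) >= z_min

fwd, sink = (0.0, 0.0), (10.0, 0.0)
print(progress((2.0, 0.0), fwd, sink))   # 2.0: two units of progress
print(in_forwarding_region((2.0, 0.0), fwd, sink,
                           comm_radius=3.0, z_min=1.0))   # True
```

A node making good progress but outside the communication radius (e.g., at (5, 0) with radius 3) is correctly excluded: it is a potential helper the forwarder simply cannot reach reliably.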
Let $V_\ell = \|\ell - v_0\|$ represent the distance of a location $\ell$ (a point in $\Re^2$) from the sink. Now define the progress of location $\ell$ as $Z_\ell = V - V_\ell$, which is simply the difference between the F-to-sink and $\ell$-to-sink distances. F is interested in forwarding the packet only to a neighbor within the forwarding region $\mathcal{L}$, defined as

$\mathcal{L} = \{\ell \in \text{communication region} : Z_\ell \geq z_{min}\}$   (1)

where $z_{min} > 0$ is the minimum progress constraint (see Fig. 2, where the hatched area is the forwarding region). The reasons for using $z_{min} > 0$ in the definition of $\mathcal{L}$ are: (1) practically, this will ensure that a progress of at least $z_{min}$ is made by the packet at each hop, and (2) mathematically, this condition will allow us to bound the reward functions (to be defined shortly) to take values within an interval $[0, r]$. Next, it is natural to assume that $\mathcal{L}$ is bounded; we will further assume that $\mathcal{L}$ is closed. The reason for imposing this condition will become clear in Section IV. Finally, we will refer to the nodes in the forwarding region as relays.

Sleep-Wake Process: Without loss of generality, we will assume that F receives an alarm packet (from an upstream node) at time 0, which has to be forwarded to one of the relays. There are $N$ relays that wake up sequentially at the points of a Poisson process of rate $\frac{1}{\tau}$.¹ The wake-up times are denoted $0 \leq W_1 \leq \cdots \leq W_N$. The relay waking up at the instant $W_k$ is referred to as the $k$-th relay. Let $U_1 = W_1$ and $U_k = W_k - W_{k-1}$ ($k = 2, \cdots, N$) denote the inter-wake-up time between the $k$-th and the $(k-1)$-th relay.

¹A practical approach for sleep-wake cycling is the asynchronous periodic process, where each relay $i$ wakes up at the periodic instants $T_i + kT$, with $\{T_i\}$ being i.i.d. (independent and identically distributed) uniform on $[0, T]$ [1], [2].
Now, for large $N$, if $T$ scales with $N$ such that $N/T \to \frac{1}{\tau}$, then the aggregate point process of relay wake-up instants converges to a Poisson process of rate $\frac{1}{\tau}$ [11], thus justifying our Poisson process assumption.

Then, $\{U_k\}$ are i.i.d. exponential random variables with mean $\tau$.

Channel Model: We will consider the following standard model for the transmission power required by F to achieve an SNR (signal to noise ratio) constraint of $\Gamma$ at some location $\ell$ whose distance from F is more than $d_{ref}$ (the far-field reference distance beyond which the following expression holds [12]):

$P_\ell = \frac{\Gamma N_0}{G_\ell} \left(\frac{D_\ell}{d_{ref}}\right)^{\xi}$   (2)

where $D_\ell = \|\ell - v\|$ is the distance between F and $\ell$, $G_\ell$ is the random component of the channel gain between F and $\ell$, $N_0$ is the receiver noise variance, and $\xi$ is the path-loss attenuation factor. We will assume that $d_{ref} \leq z_{min}$, so that $P_\ell$ in (2) is the power required for any $\ell \in \mathcal{L}$. Also, for simplicity, from here on we will use $\Gamma_0$ to denote $\Gamma N_0 / d_{ref}^{\xi}$. Although $G_\ell$, along with the path-loss $(D_\ell / d_{ref})^{\xi}$, constitutes the gain of the channel, for simplicity we will throughout refer to $G_\ell$ itself as the channel gain between F and the location $\ell$. We will assume that the set of channel gains, $\{G_\ell : \ell \in \mathcal{L}\}$, are i.i.d. We will further assume that the channel coherence time is large, so that the channel gains remain unchanged over the entire duration of the decision process, i.e., in physical layer wireless terminology, we have a slowly varying channel.

Remark: There are two remarks we would like to make here. The first is regarding the channel gains being i.i.d. Since the randomness in the channel is spatially correlated [13], if two locations $\ell$ and $u$ are very close then the corresponding gains, $G_\ell$ and $G_u$, will not be independent; a minimum separation between the receivers is required for the gains to be statistically independent.
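The claim in footnote 1, that superposed asynchronous periodic wake-ups look Poisson for large $N$, is easy to check numerically. The sketch below is illustrative (the parameter values are arbitrary, not from the paper): it draws $N$ uniform phases on $[0, T]$, one wake-up per relay per period, and compares the mean inter-wake-up gap with $\tau = T/N$.

```python
import random

random.seed(1)
N, T = 1000, 300.0            # relays and common period T (arbitrary units)
tau = T / N                   # target mean inter-wake-up time

# One period of the aggregate process: each relay contributes one wake-up
# instant T_i, i.i.d. uniform on [0, T], as in footnote 1's model.
wakeups = sorted(random.uniform(0, T) for _ in range(N))
gaps = [b - a for a, b in zip(wakeups, wakeups[1:])]
mean_gap = sum(gaps) / len(gaps)

# For large N the gaps are approximately i.i.d. exponential with mean tau.
print(abs(mean_gap - tau) / tau < 0.05)   # True
```

A fuller check would also compare the empirical gap distribution against the exponential c.d.f., but matching the mean already illustrates why the Poisson approximation is reasonable at this density.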
Thus, our assumption of independence between the channel gains to the relays requires that the relays not be close to each other or, equivalently, that the relay density not be large. We will assume that this physical property holds, and thus proceed with the technical assumption that the channel gains are i.i.d. Next, about the slowly varying channel: it suffices for the channel coherence time to be longer than the sleep-wake cycling period (recall footnote 1). Under our light traffic assumption, where the events are rare, with probability close to 1 a node wakes up and finds no forwarding node in its communication range. Thus, with high probability, when a node wakes up, it stays awake for a few milliseconds, e.g., 3 milliseconds (for sending a control packet, turning the radio from send to listen, and then waiting for a possible response). Thus, for example, with a 1% duty cycle, the inter-wake-up time would need to be 300 milliseconds, imposing a reasonable requirement on the channel coherence time.

Reward Structure: Finally, combining progress, $Z_\ell$, and power, $P_\ell$, we define the reward associated with a location $\ell \in \mathcal{L}$ as

$R_\ell = \frac{Z_\ell^{a}}{P_\ell^{(1-a)}} = \frac{Z_\ell^{a} \, G_\ell^{(1-a)}}{(\Gamma_0 D_\ell^{\xi})^{(1-a)}}$,   (3)

where $a \in [0, 1]$ is used to trade off between $Z_\ell$ and $P_\ell$. The reward being inversely proportional to $P_\ell$ is clear, because it is advantageous to use low power to get the packet across; $R_\ell$ is proportional to $Z_\ell$ to promote progress towards the sink while choosing a relay for the next hop. The channel gains, $\{G_\ell\}$, are non-negative; we will further assume that they are bounded above by $g_{max}$. These conditions, along with $Z_\ell \geq z_{min}$ (which implies that $D_\ell \geq z_{min}$) and $\mathcal{L}$ being bounded (so that $Z_\ell \leq z_{max}$ for all $\ell \in \mathcal{L}$), provide the following upper bound for the reward functions $\{R_\ell : \ell \in \mathcal{L}\}$:

$r = \frac{z_{max}^{a} \, g_{max}^{(1-a)}}{(\Gamma_0 z_{min}^{\xi})^{(1-a)}}$.

Thus, the reward values lie within the interval $[0, r]$.
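Equations (2) and (3) and the bound $r$ are simple to evaluate numerically. The sketch below uses arbitrary illustrative values for $a$, $\Gamma_0$, $\xi$, $z_{min}$, $z_{max}$ and $g_{max}$; it is not calibrated to any real radio:

```python
def required_power(d, g, gamma0=1.0, xi=3.0):
    # Eq. (2) with Gamma_0 = Gamma * N_0 / d_ref^xi folded in:
    # power needed to meet the SNR target at distance d with gain g.
    return gamma0 * d ** xi / g

def reward(z, d, g, a=0.5, gamma0=1.0, xi=3.0):
    # Eq. (3): R = Z^a / P^(1-a); rises with progress z and gain g,
    # falls with the power needed to reach the relay.
    return z ** a / required_power(d, g, gamma0, xi) ** (1 - a)

z_min, z_max, g_max = 1.0, 10.0, 4.0
# Upper bound r: best progress, shortest allowed distance, best gain.
r_max = z_max ** 0.5 * (g_max / (1.0 * z_min ** 3.0)) ** 0.5

# Any feasible relay (z_min <= z <= z_max, d >= z_min, g <= g_max)
# earns a reward in [0, r_max].
print(0.0 <= reward(5.0, 2.0, 2.0) <= r_max)   # True
```

The bound is tight: it is attained exactly at $z = z_{max}$, $d = z_{min}$, $g = g_{max}$, which is why $z_{min} > 0$ and the boundedness of $\mathcal{L}$ are needed to keep rewards in a finite interval.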
Let $F_\ell$ represent the c.d.f. (cumulative distribution function) of $R_\ell$, and let

$\mathcal{F} = \{F_\ell : \ell \in \mathcal{L}\}$   (4)

denote the collection of all possible reward distributions. From (3), note that, given a location $\ell$, it is only possible to know the reward distribution $F_\ell$. To know the exact reward $R_\ell$, F has to transmit probe packets to learn the channel gain $G_\ell$ (we will formalize probing very soon).

Relay Locations: We will assume that each of the $N$ relays is randomly and mutually independently located in the forwarding region $\mathcal{L}$. Formally, if $L_1, L_2, \cdots, L_N$ denote the relay locations, then these are i.i.d. uniform over the forwarding set $\mathcal{L}$ (this assumption holds if the nodes are deployed according to a spatial Poisson process). Let $\mathbb{L}$ denote the uniform distribution over $\mathcal{L}$, so that, for $k = 1, 2, \cdots, N$, the distribution of $L_k$ is $\mathbb{L}$.

Remark: Although, for the sake of motivating the model, we have restricted attention to a very specific $\mathcal{F}$ (set of reward distributions) and $\mathbb{L}$ (relay location distribution), it is important to note that all our analysis in the subsequent sections will follow through for more general $\mathcal{F}$ and $\mathbb{L}$ as well. At time 0, F only knows that there are $N$ relays in its forwarding set $\mathcal{L}$, but knows neither their locations, $L_k$, nor their channel gains, $G_{L_k}$.

Sequential Decision Problem: When the $k$-th relay wakes up, we assume that its location $L_k$, and hence its reward distribution $F_{L_k}$, is revealed to F. This can be accomplished by including the location information $L_k$ within a control packet (sent using a low rate robust modulation technique, and hence assumed to be error free) transmitted by the $k$-th relay upon waking up. However, if F wishes to learn the channel gain $G_{L_k}$ (and hence the exact reward value $R_{L_k}$), it has to transmit additional probe packets (indeed several packets) in order to obtain a reliable estimate of the channel gain, incurring a power cost of $\delta \geq 0$ units.
Thus, when the $k$-th relay wakes up (referred to as stage $k$), given the set of previously probed and unprobed relays (i.e., the history), the actions available to F are:
• s: stop and forward the packet to a relay with the maximum reward (the best relay) among the probed relays; with this action the decision process ends.
• c: continue to wait for the next relay to wake up (the average waiting time is $\tau$); with this action the decision process enters stage $k + 1$.
• p: probe a relay from the set of all unprobed relays (provided there is at least one unprobed relay). The probed relay's reward value is then revealed, allowing F to update the best relay. After probing, the decision process is still at stage $k$ and F has to again decide upon an action.

In the model, for the sake of analysis, we neglect the time taken for the exchange of control packets and the time taken to probe a relay to learn its channel gain. We argue that this is reasonable for very low duty cycling networks, where the average inter-wake-up time is much larger than the time taken for probing and for the exchange of control packets.

At stage $k$, let $b_k$ denote the reward of the best relay, and let $\mathcal{F}_k$ be the vector of reward distributions of the unprobed relays, i.e., formally,

$b_k = \max\{R_{L_i} : i \leq k,\ \text{relay } i \text{ has been probed}\}$, and
$\mathcal{F}_k = (F_{L_i} : i \leq k,\ \text{relay } i \text{ is unprobed})$.

We will regard $(b_k, \mathcal{F}_k)$ as the state of the system at stage $k$. Note that it is possible that until stage $k$ no relay has been probed, in which case $b_k = -\infty$, or that all the relays have been probed, so that $\mathcal{F}_k$ is empty. Whenever $\mathcal{F}_k$ is empty we will represent the state as simply $b_k$. Now we can define a forwarding policy $\pi$ as follows:

Definition 1: A policy $\pi$ is a sequence of mappings $(\mu_1, \mu_2, \cdots, \mu_N)$ where,
• for $k = 1, 2, \cdots, N - 1$, $\mu_k(b_k, \mathcal{F}_k) \in \{s, c, p\}$ and $\mu_k(b_k) \in \{s, c\}$, and
• $\mu_N(b_N, \mathcal{F}_N) \in \{s, p\}$ and $\mu_N(b_N) = s$.
Note that the action to continue is not available at the last stage $N$. Let $\Pi$ denote the set of all policies.

For a policy $\pi \in \Pi$, the delay incurred, denoted $D$, is the time until a relay is chosen. Let $R$ denote the reward offered by the chosen relay. Further, let $M$ denote the total number of relays that were probed during the decision process. Then, recalling that $\delta$ is the probing cost, $\delta M$ represents the total cost of probing. We would like to think of $(R - \delta M)$ as the effective reward achieved using policy $\pi$. Then, denoting by $E_\pi[\cdot]$ the expectation operator conditioned on using policy $\pi$, the problem we are interested in is the following:

$\min_{\pi \in \Pi} \; \Big( E_\pi[D] - \eta \big( E_\pi[R] - \delta E_\pi[M] \big) \Big)$,   (5)

where $\eta > 0$ is the multiplier used to trade off between delay and effective reward.

Restricted Class $\overline{\Pi}$: Recall that the state at stage $k$ is of the form $(b_k, \mathcal{F}_k)$, where $\mathcal{F}_k$ is the set of all unprobed relays. The size of $\mathcal{F}_k$ can vary from 0 (if all the $k$ relays that have woken up thus far have been probed) to $k$ (if none have been probed). Further, if the size of $\mathcal{F}_k$ is $m$ ($0 < m \leq k$), then $\mathcal{F}_k \in \mathcal{F}^m$ (the $m$-fold Cartesian product of $\mathcal{F}$), since the reward distribution of each unprobed relay can be any distribution from $\mathcal{F}$. Thus, the set of all possible states at stage $k$ is large. Hence, for analytical tractability, we first consider (in Sections III and IV) solving the problem in (5) over a restricted class of policies, $\overline{\Pi} \subseteq \Pi$, where a policy is restricted to take decisions keeping only up to two relays awake: one the best among all probed relays, and the other the best among the unprobed ones. Thus, the decision at stage $k$ is based on $(b_k, H_k)$, where $H_k$ is the "best distribution in $\mathcal{F}_k$" (our notion of best distribution is based on stochastic ordering; we will formally discuss this in Section IV). Later, in Section V, we will discuss the optimal policy within the unrestricted class of policies $\Pi$.
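To make the objective in (5) concrete, the following Monte Carlo sketch estimates $E[D] - \eta(E[R] - \delta E[M])$ for one simple, suboptimal policy: probe every relay on wake-up and stop once the best reward crosses a fixed threshold. The uniform reward distribution and all parameter values are illustrative assumptions, not the paper's model:

```python
import random

def mc_cost(threshold, N=5, tau=1.0, eta=1.0, delta=0.2, trials=20000, seed=0):
    # Estimate E[D] - eta*(E[R] - delta*E[M]) under a naive
    # "probe-always, stop at threshold" policy.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        D, best, M = 0.0, 0.0, 0
        for k in range(N):
            D += rng.expovariate(1.0 / tau)   # wait U_{k+1} ~ Exp(mean tau)
            M += 1                            # probe the relay that woke up
            best = max(best, rng.random())    # its reward ~ Uniform(0, 1)
            if best >= threshold or k == N - 1:
                break                         # stop: forward to the best relay
        total += D - eta * (best - delta * M)
    return total / trials

# Threshold 0 stops at the first relay, so the cost should be close to
# tau - eta*(1/2 - delta*1) = 0.7 for these parameters.
print(mc_cost(0.0))
```

Sweeping the threshold exposes the trade-off in (5): a low threshold saves delay and probes but settles for a mediocre reward, while a high threshold waits (and probes) much longer for a marginally better reward.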
Related Work: Suppose the probing cost $\delta = 0$; then the objective in (5) reduces to minimizing $(E_\pi[D] - \eta E_\pi[R])$. Further, when $\delta = 0$, since there is no advantage in not probing, an optimal policy is to always probe relays as they wake up, so that their reward values are immediately revealed to F. Alternatively, if F is not allowed to exercise the option to not probe a relay, then again the model reduces to the case where the relay rewards are immediately revealed as and when the relays wake up. We have studied this particular case of our relay selection problem (which we will refer to as the basic relay selection model) in our earlier work [2, Section 6], [4], and this basic model can be shown to be equivalent to a basic version of the asset selling problem [7, Section 4.4], [8] studied in the operations research literature. The asset selling problem comprises a seller (with some asset to sell) and a collection of buyers who arrive sequentially in time. The offers made by the buyers are i.i.d. If the seller chooses an early offer, then he can invest the funds received for a longer time period. On the other hand, waiting could yield a better offer, but with the loss of time during which the sale proceeds could have been invested. The seller's objective is to choose an offer so as to maximize his final revenue (received at the end of the investment period). Thinking of the offer of a buyer as analogous to the reward of a relay, the seller's objective of maximizing revenue is equivalent to the forwarder's objective of minimizing a combination of delay and reward. However, in the present work we generalize this basic version by allowing the probing cost to be positive (i.e., $\delta > 0$), so that a relay's reward value (equivalently, a buyer's offer value) is not revealed to the forwarder (equivalently, the seller) for free.
Instead, the forwarder can choose to probe a relay to learn its reward value after incurring an additional cost of $\delta$. There is work reported in the asset selling literature centered around the idea of the offer (or reward) distribution being unknown, or of not knowing a parameter of the offer distribution [14], [15], but these works do not incorporate an additional probe action as in our model here. To the best of our knowledge, the particular class of models we study here is not available in the asset selling problem literature.

The problem of choosing a next-hop relay arises in the context of geographical forwarding (as mentioned earlier, geographical forwarding [9], [16] is a forwarding technique whose prerequisite is that the nodes know their respective locations as well as the sink's location). For instance, Zorzi and Rao in [10] propose an algorithm called GeRaF (Geographical Random Forwarding) which, at each forwarding stage, chooses the relay making the largest progress. For a sleep-wake cycling network, Liu et al. in [17] propose a relay selection approach as a part of CMAC, a protocol for geographical packet forwarding. Under CMAC, a node $i$ chooses a relay $r_0$ that minimizes the expected normalized latency (which is the average ratio of one-hop delay and progress). Links to more literature on similar work in the context of geographical forwarding can be found in [2]. However, these works do not incorporate the action of "probing a relay" as in our relay selection model here. In the context of wireless communication, the action to probe generally occurs in the problem of channel selection [18], [19]. For instance, the authors in [18] study the following problem: a transmitter, aiming to maximize its throughput, has to choose a channel for its transmissions among several available ones.
The transmitter, knowing only the channel gain distributions, has to send probe packets to learn the exact channel state information (CSI). Probing many channels yields a channel with a good gain, but reduces the effective time for transmission within the channel coherence period. The problem is to obtain optimal strategies that decide when to stop probing and transmit. An important difference from our work is that, in [18], [19] all the channel gain distributions are known a priori, while here the reward distributions are revealed as and when the relays wake up. We will discuss more about the work in [18] in Section V. Another work that is close to ours is that of Stadje [20], where only some initial information about an offer (e.g., the average size of the offer) is revealed to the decision maker upon its arrival. In addition to the actions stop and continue, the decision maker can also choose to obtain more information about the offer by incurring a cost. Recalling previous offers is not allowed. A similar problem is studied by Thejaswi et al. in [6], where initially a coarse estimate of the channel gain is made available to the transmitter. The transmitter can choose to probe the channel a second time to get a finer estimate. In both [6] and [20], the optimal policy is characterized by a threshold rule. However, the horizon length of these problems is infinite, because of which the thresholds are stage independent. In general, for a finite horizon problem the optimal policy would be stage dependent. For our problem, despite being a finite horizon one, we are able to show that certain stopping sets are identical across stages. This is due to the fact that we allow the best probed relay to stay awake.

III. RESTRICTED CLASS $\overline{\Pi}$: AN MDP FORMULATION

Confining ourselves to the restricted class $\overline{\Pi}$, in this section we will formulate the problem in (5) as a Markov decision process.
This will require us to first discuss the one-step cost functions and state transitions before proceeding to write the Bellman optimality equations.

A. One-Step Costs and State Transitions

The decision instants, or decision stages, are the times at which the relays wake up. Thus, there are $N$ decision stages, indexed by $k = 1, 2, \cdots, N$. Recall that for any policy in the restricted class $\overline{\Pi}$, the decision at stage $k$ is based on $(b_k, H_k)$, where $b_k$ is the best reward so far and $H_k \in \mathcal{F}_k$ is the best reward distribution, with $\mathcal{F}_k$ being the set of reward distributions of all the unprobed relays so far. As mentioned earlier, if no relay has been probed until stage $k$ then $b_k = -\infty$. On the other hand, if all the relays have been probed, in which case $\mathcal{F}_k$ is empty, then we will denote the state as simply $b_k$. Hence, the state space can be written as

$\mathcal{X} = [0, r] \cup \{(b, F_\ell) : b \in \{-\infty\} \cup [0, r],\ \ell \in \mathcal{L}\} \cup \{t\}$

where $t$ is the cost-free termination state. We will use $(b, F_\ell)$ to denote a generic state at stage $k$. Now, at stage $k = 1, 2, \cdots, N - 1$, given that the state is $(b, F_\ell)$, if F's decision is to stop, then the decision process enters $t$, with F incurring a termination cost of $-\eta b$ (recall from (5) that $\eta > 0$ is the trade-off parameter). On the other hand, if the action is to continue, then F will first incur a waiting cost of $U_{k+1}$ (the time until the next relay wakes up) and then, when the $(k+1)$-th relay wakes up (whose reward distribution is $F_{L_{k+1}}$), F chooses between the two unprobed relays, one the previous relay with reward distribution $F_\ell$, and the other the new one with distribution $F_{L_{k+1}}$, so that the state at stage $k + 1$ will be either $(b, F_\ell)$ or $(b, F_{L_{k+1}})$. The best reward value continues to be $b$, since no new relay has been probed during the state transition.
Alternatively, F could choose the action to probe the available unprobed relay (whose reward distribution is $F_\ell$), incurring a cost of $\eta\delta$ (recall that $\delta$ is the probing cost). After probing, the decision process is still considered to be at stage $k$, with the new state being $b' = \max\{b, R_\ell\}$, where $R_\ell$ is the reward value of the just-probed relay (thus the distribution of $R_\ell$ is $F_\ell$). F now has to further decide whether to stop (incurring a one-step cost of $-\eta b'$ and entering $t$), or to continue (in which case the one-step cost is $U_{k+1}$ and the next state is $(b', F_{L_{k+1}})$). Summarizing the above, we can write the one-step cost, when the state at stage $k$ is $(b, F_\ell)$, as

$g_k((b, F_\ell), a_k) = \begin{cases} -\eta b & \text{if } a_k = s \\ U_{k+1} & \text{if } a_k = c \\ \eta\delta & \text{if } a_k = p. \end{cases}$

The next state, $X'$, is given by

$X' = \begin{cases} t & \text{if } a_k = s \\ (b, F_\ell) \text{ or } (b, F_{L_{k+1}}) & \text{if } a_k = c \\ \max\{b, R_\ell\} & \text{if } a_k = p. \end{cases}$

We have used $X'$ to denote the next state instead of $X_{k+1}$ because, if $a_k = p$, then the system is still at stage $k$. Only when the action is $s$ or $c$ does the system transit to stage $k + 1$. Next, if the state at stage $k$ is $b$ (states of this form occur after probing the available unprobed relay; recall the above expressions when $a_k = p$), then

$g_k(b, a_k) = \begin{cases} -\eta b & \text{if } a_k = s \\ U_{k+1} & \text{if } a_k = c, \end{cases}$

and the next state is

$X_{k+1} = \begin{cases} t & \text{if } a_k = s \\ (b, F_{L_{k+1}}) & \text{if } a_k = c. \end{cases}$

The action to probe is not available whenever the state is $b$. At the last stage $N$, action $c$ is not available, so that

$g_N((b, F_\ell), a_N) = \begin{cases} -\eta b & \text{if } a_N = s \\ \eta\delta & \text{if } a_N = p, \end{cases}$

with the system entering $t$ if $a_N = s$; otherwise (i.e., if $a_N = p$) the state transits to $\max\{b, R_\ell\}$. Finally, $g_N(b) = -\eta b$. Note that, for a policy $\pi$, the expected sum of all the one-step costs starting from stage 1, plus the average waiting time for the first relay, $E[U_1] = \tau$,² will equal the total cost in (5).
B. Cost-to-go Functions and the Bellman Equation

Let $J_k$, $k = 1, 2, \cdots, N$, represent the optimal cost-to-go function at stage $k$. Thus, $J_k(b)$ and $J_k(b, F_\ell)$ denote the cost-to-go, depending on whether there is, or is not, an unprobed relay. For the last stage, $N$, we have $J_N(b) = -\eta b$, using which we obtain

$J_N(b, F_\ell) = \min\big\{-\eta b,\ \eta\delta + E_\ell[J_N(\max\{b, R_\ell\})]\big\} = \min\big\{-\eta b,\ \eta\delta - \eta E_\ell[\max\{b, R_\ell\}]\big\}$,   (6)

where $E_\ell[\cdot]$ denotes the expectation with respect to (w.r.t.) $R_\ell$, whose distribution is $F_\ell$. The first term in the min-expression above is the cost of stopping, and the second term is the expected cost of probing and then stopping (recall that action $c$ is not available at the last stage $N$). Next, for stages $k = 1, 2, \cdots, N - 1$, denoting by $E_{\mathbb{L}}[\cdot]$ the expectation w.r.t. the distribution, $\mathbb{L}$, of the location, $L_{k+1}$, of the next relay, we have

$J_k(b) = \min\big\{-\eta b,\ \tau + E_{\mathbb{L}}[J_{k+1}(b, F_{L_{k+1}})]\big\}$,   (7)

and

$J_k(b, F_\ell) = \min\big\{-\eta b,\ \eta\delta + E_\ell[J_k(\max\{b, R_\ell\})],\ \tau + E_{\mathbb{L}}[\min\{J_{k+1}(b, F_\ell), J_{k+1}(b, F_{L_{k+1}})\}]\big\}$.   (8)

The first term in both min-expressions above is the cost of stopping. The middle term in (8) is the expected cost of probing, with $\eta\delta$ being the one-step cost and the remaining term being the future cost. The last term in both expressions is the expected cost of continuing, with $\tau$ representing the mean waiting time until the next relay wakes up. The future cost-to-go in the last term of (8) can be understood as follows. When the state at stage $k = 1, 2, \cdots, N - 1$ is $(b, F_\ell)$ and F decides to continue, then the reward distribution of the next relay is $F_{L_{k+1}}$.
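The recursion (6)-(8) can be exercised numerically once the distributions are discretized. The sketch below is a toy instance, not the paper's continuous model: two hypothetical location "types" with two-point reward distributions, the next relay's type drawn uniformly, and $b = 0$ standing in for the initial no-probed-relay state (in place of $-\infty$). All parameter values are arbitrary.

```python
import functools

eta, delta, tau, N = 1.0, 0.2, 1.0, 5   # trade-off, probe cost, mean wait, relays
DISTS = {                               # hypothetical two-point reward pmfs
    "near": ((0.2, 1.0), (0.5, 0.5)),   # (reward values, probabilities)
    "far":  ((0.1, 0.6), (0.5, 0.5)),
}
LOCS = tuple(DISTS)                     # next relay's type ~ uniform over LOCS

@functools.lru_cache(maxsize=None)
def J(k, b, dist=None):
    # dist=None encodes the state "b" (no unprobed relay available).
    if dist is None:
        if k == N:
            return -eta * b                               # J_N(b) = -eta*b
        cont = tau + sum(J(k + 1, b, d) for d in LOCS) / len(LOCS)
        return min(-eta * b, cont)                        # eq. (7)
    vals, probs = DISTS[dist]
    probe = eta * delta + sum(p * J(k, max(b, r))         # probe: state max{b,R}
                              for r, p in zip(vals, probs))
    if k == N:
        return min(-eta * b, probe)                       # eq. (6)
    cont = tau + sum(min(J(k + 1, b, dist), J(k + 1, b, d))
                     for d in LOCS) / len(LOCS)
    return min(-eta * b, probe, cont)                     # eq. (8)

print(J(1, 0.0, "near"))   # optimal cost from stage 1 with an unprobed "near" relay
```

In this toy instance the optimal cost turns out to be stage independent (here $J_k(0, F_{near})$ is the same at every stage), a small preview of the stage-independence property proved later for the stopping sets.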
Now, given the distributions $F_\ell$ and $F_{L_{k+1}}$, if F is asked to retain one of them, then it is optimal to go with the distribution that fetches a lower cost-to-go from stage $k+1$ onwards, i.e., it is optimal to retain $F_\ell$ if $J_{k+1}(b, F_\ell) \le J_{k+1}(b, F_{L_{k+1}})$, and to retain $F_{L_{k+1}}$ otherwise (see Footnote 3). Later in this section we will show that, given two distributions $F_\ell$ and $F_u$, if $F_\ell$ is stochastically greater than $F_u$ [21], then $J_{k+1}(b, F_\ell) \le J_{k+1}(b, F_u)$ (see Lemma 2-(i)), so that it is optimal to retain the stochastically greater distribution.

Footnote 2: Since invariably a relay has to be chosen, every policy has to wait at least for the first relay to wake up, at which instant the decision process begins. Thus, $U_1$ need not be accounted for in the total cost incurred by any policy.

Footnote 3: Formally, one has to introduce an intermediate state of the form $(b, F_\ell, F_{L_{k+1}})$ at stage $k+1$, where the only available actions are to choose $F_\ell$ or $F_{L_{k+1}}$. Then $J_{k+1}(b, F_\ell, F_{L_{k+1}}) = \min\{ J_{k+1}(b, F_\ell),\, J_{k+1}(b, F_{L_{k+1}}) \}$, which, for simplicity, we use directly in (8).

First, for simplicity, let us introduce the following notation. For $k = 1, 2, \ldots, N-1$, let $C_k$ represent the cost of continuing:
$$C_k(b) = \tau + E_L\big[ J_{k+1}(b, F_{L_{k+1}}) \big] \quad (9)$$
$$C_k(b, F_\ell) = \tau + E_L\big[ \min\{ J_{k+1}(b, F_\ell),\, J_{k+1}(b, F_{L_{k+1}}) \} \big]. \quad (10)$$
For $k = 1, 2, \ldots, N$, the cost of probing, $P_k$, is given by
$$P_k(b, F_\ell) = \eta\delta + E_\ell\big[ J_k(\max\{b, R_\ell\}) \big]. \quad (11)$$
From (9) and (10) it is immediately clear that $C_k(b, F_\ell) \le C_k(b)$ for any $F_\ell$ ($\ell \in \mathcal{L}$). This inequality should be intuitive as well, since F can expect to accrue a better cost if, in addition to a probed relay, it also possesses an unprobed relay. It will be useful to record this inequality as a lemma.

Lemma 1: For $k = 1, 2, \ldots, N-1$ and any $(b, F_\ell)$, we have $C_k(b, F_\ell) \le C_k(b)$.
Proof: As discussed just before the lemma statement, the inequality follows easily from the expressions for these costs; recall (9) and (10).

Finally, using the above cost notation, the cost-to-go functions in (7) and (8) can be written, for $k = 1, 2, \ldots, N-1$, as
$$J_k(b) = \min\big\{ -\eta b,\; C_k(b) \big\} \quad (12)$$
$$J_k(b, F_\ell) = \min\big\{ -\eta b,\; P_k(b, F_\ell),\; C_k(b, F_\ell) \big\}. \quad (13)$$

C. Ordering Results for the Cost-to-go Functions

We will examine how the cost-to-go functions $J_k(b)$ and $J_k(b, F_\ell)$ behave as functions of $F_\ell$ and the stage index $k$. We will first require the definition of stochastic ordering.

Definition 2 (Stochastic Ordering): Given two distributions $F_\ell$ and $F_u$, $F_\ell$ is stochastically greater than $F_u$, denoted $F_\ell \ge_{st} F_u$, if $1 - F_\ell(r) \ge 1 - F_u(r)$ for all $r$. Equivalently [21], $F_\ell \ge_{st} F_u$ if and only if for every non-decreasing function $f : \mathbb{R} \to \mathbb{R}$, $E_\ell[f(R_\ell)] \ge E_u[f(R_u)]$, where the distributions of $R_\ell$ and $R_u$ are $F_\ell$ and $F_u$, respectively.

Now, consider two relays at locations $\ell$ and $u$. If the corresponding reward distributions, $F_\ell$ and $F_u$, are such that $F_\ell \ge_{st} F_u$, then F can expect that probing the relay at $\ell$ would yield a better reward value than probing the relay at $u$. Thus, F would prefer the stochastically greater reward distribution, $F_\ell$, over $F_u$. Extending this observation, it is reasonable to expect that F can accrue lower expected costs (total, continuing, and probing costs) if the unprobed reward distribution available at stage $k$ is $F_\ell$ rather than $F_u$. We will formally prove this result next. Also, we will show that the expected cost at stage $k$ is less than that at stage $k+1$, i.e., $J_k(x) \le J_{k+1}(x)$ for any state $x$. This again should be intuitive because, starting from stage $k$, F has the option to observe one additional relay compared with starting from stage $k+1$. With more resources available, and with these being i.i.d., F should achieve a better cost.
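To make the recursion concrete, the following is a minimal numerical sketch of the backward induction in (6) and (9)-(13), assuming a small hand-picked set of two-point reward distributions and illustrative values for $\eta$, $\delta$ and $\tau$; none of these numbers come from the paper. Since the rewards are discrete, the best reward $b$ only ever takes values in the finite support.

```python
# Minimal sketch of the backward recursion (6), (9)-(13) for the restricted
# model. All numbers (eta, delta, tau, the two-point reward distributions)
# are illustrative assumptions, not values from the paper.

eta, delta, tau = 10.0, 1.0, 20.0   # trade-off multiplier, probe cost, mean wait
N = 5                                # number of stages (relays)

# A small set F of discrete reward distributions, each {reward: prob};
# the next relay's distribution is drawn uniformly from this set.
dists = [
    {0.2: 0.5, 0.6: 0.5},   # "good" location (stochastically greater)
    {0.1: 0.5, 0.4: 0.5},   # "poor" location
]
b_grid = sorted({r for d in dists for r in d})  # reachable best-reward values

def expect(d, f):
    """E[f(R)] for a discrete distribution d."""
    return sum(p * f(r) for r, p in d.items())

# Stage N (eq. (6)): J_N(b) = -eta*b;  J_N(b,F) = min(-eta*b, eta*delta - eta*E[max(b,R)]).
J_scalar = {b: -eta * b for b in b_grid}
J_pair = {(b, i): min(-eta * b,
                      eta * delta - eta * expect(d, lambda r: max(b, r)))
          for b in b_grid for i, d in enumerate(dists)}

for k in range(N - 1, 0, -1):        # stages N-1 down to 1
    # C_k(b) (eq. (9)) and J_k(b) (eq. (12))
    C_scalar = {b: tau + sum(J_pair[(b, j)] for j in range(len(dists))) / len(dists)
                for b in b_grid}
    new_scalar = {b: min(-eta * b, C_scalar[b]) for b in b_grid}
    new_pair = {}
    for b in b_grid:
        for i, d in enumerate(dists):
            # P_k(b,F) (eq. (11)): probe, then sit at state max{b,R} of the SAME stage
            P = eta * delta + expect(d, lambda r: new_scalar[max(b, r)])
            # C_k(b,F) (eq. (10)): keep the better of the held and the new unprobed relay
            C = tau + sum(min(J_pair[(b, i)], J_pair[(b, j)])
                          for j in range(len(dists))) / len(dists)
            new_pair[(b, i)] = min(-eta * b, P, C)   # eq. (13)
    J_scalar, J_pair = new_scalar, new_pair

print({b: round(v, 2) for b, v in J_scalar.items()})  # J_1(b) on the grid
```

Because `dists[0]` is stochastically greater than `dists[1]`, the computed costs also exhibit the ordering asserted in Lemma 2-(i), which is a useful sanity check on the recursion.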
We state these two results in the following lemma.

Lemma 2: (i) For $k = 1, 2, \ldots, N-1$, if $F_\ell \ge_{st} F_u$ then $C_k(b, F_\ell) \le C_k(b, F_u)$, and (including $k = N$) $P_k(b, F_\ell) \le P_k(b, F_u)$ and $J_k(b, F_\ell) \le J_k(b, F_u)$. (ii) For $k = 1, 2, \ldots, N-2$, $C_k(b) \le C_{k+1}(b)$ and $C_k(b, F_\ell) \le C_{k+1}(b, F_\ell)$, and (including $k = N-1$) $P_k(b, F_\ell) \le P_{k+1}(b, F_\ell)$ and $J_k(b, F_\ell) \le J_{k+1}(b, F_\ell)$.

Proof: To prove (i) we first show that the various costs are non-increasing functions of $b$. We then complete the proof using the definition of stochastic ordering (Definition 2). Part (ii) follows by induction. Detailed proofs are available in Appendix A.

IV. RESTRICTED CLASS $\Pi$: STRUCTURAL RESULTS

We begin by defining, at stage $k = 1, 2, \ldots, N-1$, the stopping set $S_k$ as
$$S_k = \big\{ b : -\eta b \le C_k(b) \big\}. \quad (14)$$
From (12) it follows that the stopping set $S_k$ is the set of all states $b$ (states of this form are obtained after probing at stage $k$) where it is better to stop than to continue. Similarly, for a given distribution $F_\ell$, we define the stopping set $S_k^\ell$ as, for $k = 1, 2, \ldots, N-1$,
$$S_k^\ell = \big\{ b : -\eta b \le \min\{ P_k(b, F_\ell),\, C_k(b, F_\ell) \} \big\}. \quad (15)$$
Using (13), the set $S_k^\ell$ is to be interpreted as, for a given distribution $F_\ell$, the set of $b$ such that whenever the state at stage $k$ is $(b, F_\ell)$, it is better to stop than to either probe or continue. Note that when $b = -\infty$ it is never optimal to stop; hence, both these stopping sets are subsets of $[0, r]$. Finally, stopping sets can also be defined for $k = N$ as $S_N = [0, r]$ (since, at the last stage $N$, for any $b$ the only available action is to stop), and
$$S_N^\ell = \big\{ b : -\eta b \le P_N(b, F_\ell) \big\}. \quad (16)$$
The following set inclusion properties follow easily from the definition of these sets and the properties of the cost functions in Lemma 1 and Lemma 2.
Lemma 3: (i) For $k = 1, 2, \ldots, N$ and any $F_\ell$, we have $S_k^\ell \subseteq S_k$. (ii) For $k = 1, 2, \ldots, N$, if $F_\ell \ge_{st} F_u$ then $S_k^\ell \subseteq S_k^u$. (iii) For $k = 1, 2, \ldots, N-1$, we have $S_k \subseteq S_{k+1}$ and, for any $F_\ell$, $S_k^\ell \subseteq S_{k+1}^\ell$.

Proof: Recall the definition of the stopping sets from (14) and (15). Part (i) follows from Lemma 1. Parts (ii) and (iii) are due to Parts (i) and (ii) of Lemma 2, respectively.

Discussion: The above results can be understood as follows. Whenever an unprobed relay (say with reward distribution $F_\ell$) is available, F can be more stringent about the best reward values, $b$, for which it chooses to stop. This is because F can now additionally choose to probe $F_\ell$, possibly yielding a better reward than $b$. Thus, unless the best reward $b$ is already good (so that there is no gain in probing $F_\ell$), F will not choose to stop. Hence, we have $S_k^\ell \subseteq S_k$. Next, if $F_\ell \ge_{st} F_u$, then since probing $F_\ell$ has a higher chance of yielding a better reward, the stopping condition is more stringent when the reward distribution of the available unprobed relay is $F_\ell$ than when it is $F_u$. Hence, the corresponding stopping sets are ordered as in Part (ii) of the above lemma, i.e., $S_k^\ell \subseteq S_k^u$. Finally, whenever there are more stages to go, F can be more cautious about stopping, since it has the option to observe more relays. This suggests that $S_k \subseteq S_{k+1}$ and $S_k^\ell \subseteq S_{k+1}^\ell$.

From the above discussion, the phrase "F being more stringent about stopping" suggests that it may be better to stop for larger values of $b$. Equivalently, this would mean that the stopping sets are characterized by thresholds, beyond which it is optimal to stop. This is exactly our first main result (Theorem 1). Later we will prove a more interesting result (Theorems 2 and 3), where we show that the stopping sets are stage independent, i.e., $S_k = S_{k+1}$ and $S_k^\ell = S_{k+1}^\ell$. In the following subsections we work out the details of these two results.

A.
Stopping Sets: Threshold Property

To prove the threshold structure of the stopping sets, the following key lemma is required, where we show that the increments in the various costs are bounded by the increments in the cost of stopping.

Lemma 4: For $k = 1, 2, \ldots, N-1$ (for Part (ii), $k = 1, 2, \ldots, N$), for any $F_\ell$, and for $b_2 > b_1$, we have (i) $C_k(b_1) - C_k(b_2) \le \eta(b_2 - b_1)$, (ii) $P_k(b_1, F_\ell) - P_k(b_2, F_\ell) \le \eta(b_2 - b_1)$, and (iii) $C_k(b_1, F_\ell) - C_k(b_2, F_\ell) \le \eta(b_2 - b_1)$.

Proof: Available in Appendix B.

Theorem 1: For $k = 1, 2, \ldots, N$ and for $b_2 > b_1$: (i) if $b_1 \in S_k$ then $b_2 \in S_k$; (ii) for any $F_\ell$, if $b_1 \in S_k^\ell$ then $b_2 \in S_k^\ell$.

Proof: Since $S_N = [0, r]$, Part (i) trivially holds for $k = N$. Next, for $k = 1, 2, \ldots, N-1$, using Lemma 4-(i) we can write $-\eta b_2 \le -\eta b_1 - C_k(b_1) + C_k(b_2)$. Since $b_1 \in S_k$, from (14) we know that $-\eta b_1 \le C_k(b_1)$; using this in the above expression, we obtain $-\eta b_2 \le C_k(b_2)$, implying that $b_2 \in S_k$. Part (ii) can be completed similarly using Parts (ii) and (iii) of Lemma 4.

Fig. 3. Illustration of the threshold property: the vertical lines are the reward axes, with each line corresponding to a different stage. The stopping sets are represented by marking their thresholds on the respective vertical lines.

Discussion: Thus, the stopping sets $S_k$ and $S_k^\ell$ can be characterized in terms of lower bounds $\alpha_k$ and $\alpha_k^\ell$, respectively, as illustrated in Fig. 3 (see the vertical line corresponding to the stage index $k$). Also shown in Fig. 3 is the threshold, $\alpha_k^u$, corresponding to a distribution $F_u \le_{st} F_\ell$. From Lemma 3-(i) and 3-(ii) it follows that these thresholds are ordered: $\alpha_k \le \alpha_k^u \le \alpha_k^\ell$. Further, in Fig.
3 we have depicted these thresholds as decreasing with the stage index $k$ (vertical lines from left to right); this is due to Lemma 3-(iii), from which we know that the stopping sets are increasing with $k$. Our main result in the next section (Theorems 2 and 3) is to show that these thresholds are, in fact, equal (i.e., $\alpha_k = \alpha_{k+1}$ and $\alpha_k^\ell = \alpha_{k+1}^\ell$). Finally, note that in Fig. 3 we have not shown the threshold $\alpha_N$ corresponding to the stopping set $S_N$; this is simply because $\alpha_N = 0$ (since $S_N = [0, r]$).

B. Stopping Sets: Stage Independence Property

From Lemma 3-(iii) we already know that $S_k \subseteq S_{k+1}$ and $S_k^\ell \subseteq S_{k+1}^\ell$. In this section we will prove the inclusion in the other direction, thus leading to the result that the stopping sets are identical across the stages. We begin by defining the stobing (stopping-or-probing) set $Q_k^\ell$ as, for $k = 1, 2, \ldots, N-1$,
$$Q_k^\ell = \big\{ b : \min\{ -\eta b,\, P_k(b, F_\ell) \} \le C_k(b, F_\ell) \big\}. \quad (17)$$
From (13) it follows that $Q_k^\ell$ is, for a given distribution $F_\ell$, the set of all $b$ such that whenever the state at stage $k$ is $(b, F_\ell)$, it is better to either stop or probe than to continue. From the definitions of the sets $S_k^\ell$ and $Q_k^\ell$ (in (15) and (17), respectively) it immediately follows that $S_k^\ell \subseteq Q_k^\ell$. Also, from Lemma 3-(i) we already know that $S_k^\ell \subseteq S_k$. However, it is not immediately clear how the sets $Q_k^\ell$ and $S_k$ are ordered. We will show that if $\mathcal{F} = \{ F_\ell : \ell \in \mathcal{L} \}$ is totally stochastically ordered (to be defined next), then $S_k \subseteq Q_k^\ell$ (Lemma 7). This result is essential for proving our main theorems.

Definition 3 (Total Stochastic Ordering): $\mathcal{F}$ is said to be totally stochastically ordered if any two distributions from $\mathcal{F}$ are stochastically ordered. Formally, for any $F_\ell, F_u \in \mathcal{F}$, either $F_\ell \ge_{st} F_u$ or $F_u \ge_{st} F_\ell$.
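For discrete distributions, the conditions in Definitions 2 and 3 can be checked mechanically by comparing CDFs on the union of the supports. The following sketch does this for two hypothetical two-point distributions; the helper names and example values are ours, not the paper's.

```python
# Sketch: checking the stochastic ordering of Definitions 2 and 3 for
# discrete distributions given as {value: prob} maps. The example
# distributions are hypothetical, not taken from the paper's reward model.

def cdf(d, x):
    """F(x) = P(R <= x) for a discrete distribution d."""
    return sum(p for v, p in d.items() if v <= x)

def st_geq(d1, d2):
    """True if d1 >=_st d2, i.e., 1 - F1(x) >= 1 - F2(x) for all x."""
    support = sorted(set(d1) | set(d2))
    return all(cdf(d1, x) <= cdf(d2, x) for x in support)

def totally_ordered(dists):
    """Definition 3: every pair comparable; also return a minimum if one exists."""
    if not all(st_geq(a, b) or st_geq(b, a) for a in dists for b in dists):
        return False, None
    for m in dists:
        if all(st_geq(d, m) for d in dists):
            return True, m
    return True, None

F_good = {0.2: 0.5, 0.6: 0.5}
F_poor = {0.1: 0.5, 0.4: 0.5}
print(st_geq(F_good, F_poor))            # -> True: F_good is stochastically greater
print(totally_ordered([F_good, F_poor]))  # totally ordered, with F_poor as minimum
```

It suffices to compare the CDFs at the support points only, since both CDFs are right-continuous step functions that change value nowhere else.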
Further, if there exists a distribution $F_m \in \mathcal{F}$ such that $F_\ell \ge_{st} F_m$ for every $F_\ell \in \mathcal{F}$, then we say that $\mathcal{F}$ is totally stochastically ordered with a minimum distribution.

Lemma 5: The set of reward distributions $\mathcal{F}$ in (4) is totally stochastically ordered with a minimum distribution.

Proof: The channel gains, $\{ G_\ell : \ell \in \mathcal{L} \}$, being identically distributed is essential to show that $\mathcal{F}$ is totally stochastically ordered. The existence of a minimum distribution requires the assumption made earlier (in Section II) that $\mathcal{L}$ is compact (closed and bounded). The complete proof is available in Appendix C.

Remark: Our subsequent results are not limited to the $\mathcal{F}$ in (4), which is the distribution set arising from the particular reward structure, $R_\ell$, assumed in (3). One can consider any collection of bounded reward random variables $\{ R_\ell \}$ such that the corresponding $\mathcal{F}$ is totally stochastically ordered with a minimum distribution; all the subsequent results will still hold.

Before proceeding to our main theorems, we need the following results.

Lemma 6: Suppose $S_k \subseteq Q_k^u$ for some $F_u$ and some $k = 1, 2, \ldots, N-1$. Then for every $b \in S_k$ we have $J_k(b, F_u) = J_N(b, F_u)$.

Proof: Available in Appendix D.

Next we show that the hypothesis in the above lemma indeed holds for every $F_\ell \in \mathcal{F}$.

Lemma 7: For $k = 1, 2, \ldots, N-1$ and for any $F_\ell \in \mathcal{F}$, we have $S_k \subseteq Q_k^\ell$.

Proof: The proof involves two steps: 1) First, we show that if there exists an $F_u$ such that, for $k = 1, 2, \ldots, N-1$, $S_k \subseteq Q_k^u$ (thus satisfying the hypothesis of Lemma 6), then for every $F_\ell \ge_{st} F_u$ we have $S_k \subseteq Q_k^\ell$. Lemma 6 and the total stochastic ordering of $\mathcal{F}$ are required for this part. 2) Next, we show that a minimum distribution $F_m$ satisfies the hypothesis of Lemma 6, i.e., for every $k = 1, 2, \ldots, N-1$, $S_k \subseteq Q_k^m$.
The proof is completed by recalling that $F_\ell \ge_{st} F_m$ for every $F_\ell \in \mathcal{F}$ and then using $F_m$ in place of $F_u$ in Step 1. The existence of a minimum distribution $F_m$ (recall Lemma 5) is essential here. Formal proofs of both steps are available in Appendix E.

The following are the main theorems of this section.

Theorem 2: For $k = 1, 2, \ldots, N-2$, $S_k = S_{k+1}$.

Proof: From Lemma 3-(iii) we already know that $S_k \subseteq S_{k+1}$. Here, we will show that $S_k \supseteq S_{k+1}$. Fix a $b \in S_{k+1} \subseteq S_{k+2}$. From Lemma 7 we know that $S_{k+1} \subseteq Q_{k+1}^\ell$ and $S_{k+2} \subseteq Q_{k+2}^\ell$ for every $F_\ell$. Now, applying Lemma 6, we can write $J_{k+1}(b, F_\ell) = J_{k+2}(b, F_\ell) = J_N(b, F_\ell)$. Thus,
$$C_{k+1}(b) = \tau + E_L\big[ J_{k+2}(b, F_{L_{k+2}}) \big] = \tau + E_L\big[ J_{k+1}(b, F_{L_{k+1}}) \big] = C_k(b).$$
Finally, since $b \in S_{k+1}$, we have $-\eta b \le C_{k+1}(b) = C_k(b)$, which implies that $b \in S_k$.

Discussion: It is interesting to compare the above result with the solution obtained for the basic model (i.e., the $\delta = 0$ case; recall the discussion on related work in Section II), or equivalently the basic asset selling problem [7, Section 4.4]. In [7, Section 4.4], as in our Theorem 2 here, it is shown that similar stopping sets are identical across the stages; this policy is referred to as the one-step-look-ahead rule, since the rule (stop if and only if the "cost of stopping" is less than the "cost of continuing for one more step and then stopping"), being optimal for stage $N-1$, is optimal for all stages. The key idea there (i.e., in [7, Section 4.4]), as in our Lemma 6, is also to show that the cost-to-go functions, at every stage $k$, are identical for every state within the stopping set. However, here, to apply Lemma 6, it was further essential for us to prove Lemma 7, showing that $S_k \subseteq Q_k^\ell$ for every $F_\ell$. Now, note that the result $S_k \subseteq Q_k^\ell$ trivially holds for $\delta = 0$, since if $\delta = 0$ then for any $(b, F_\ell)$ it is always optimal to probe, so that $Q_k^\ell = [0, r]$.
Thus, Theorem 2, incorporating the additional case $\delta > 0$, can be considered a generalization of the one-step-look-ahead rule that is optimal for the basic asset selling model.

Theorem 3: For $k = 1, 2, \ldots, N-1$ and any $F_\ell$, $S_k^\ell = S_{k+1}^\ell$.

Proof: Similar to the proof of Theorem 2; here we need to show that the probing and continuing costs satisfy analogous equalities, i.e., for $b \in S_k^\ell$ we need to show that $P_{k+1}(b, F_\ell) = P_k(b, F_\ell)$ and $C_{k+1}(b, F_\ell) = C_k(b, F_\ell)$. The formal proof is available in Appendix F.

Fig. 4. Illustration of the stage independence property: only the thresholds corresponding to the last stage (and stage $N-1$ for $S_k$) are shown, since these alone are sufficient to characterize the stopping sets for any $k$.

Discussion: Owing to Theorems 2 and 3, we can now modify the illustration in Fig. 3 to Fig. 4, where we show only a single threshold corresponding to each stopping set. Thus, to characterize the stopping set $S_k^\ell$ for any $k$, it is sufficient to compute only the threshold $\alpha_N^\ell$ corresponding to the last stage. Similarly, the stopping set $S_k$ is characterized by the threshold $\alpha_{N-1}$ computed for stage $N-1$ (recall that $\alpha_N = 0$).

C. Probing Sets

Similar to the stopping sets $S_k^\ell$, one can also define the probing sets $P_k^\ell$ as the set of all $b$ such that whenever the state at stage $k$ is $(b, F_\ell)$, it is better to probe than to either stop or continue, i.e.,
$$P_k^\ell = \big\{ b : P_k(b, F_\ell) \le \min\{ -\eta b,\, C_k(b, F_\ell) \} \big\}. \quad (18)$$
Note that $P_k^\ell$ is simply the difference of the sets $Q_k^\ell$ and $S_k^\ell$, i.e., $P_k^\ell = Q_k^\ell \setminus S_k^\ell$. From our numerical work we have observed that, similar to the stopping sets, the probing sets $P_k^\ell$ are characterized by upper bounds $\zeta_k^\ell$ (see Fig. 5). The intuition for this is as follows. Let $(b, F_\ell)$ be the state at stage $N-1$.
If the value of $b$ is very small, then it is better to probe than to continue, because probing gives an opportunity to probe an additional relay at stage $N$ in case the process continues after probing at stage $N-1$, while continuing without probing would deprive F of this opportunity. This argument can be extended to any stage $k$ to conclude that it may be better to probe for small values of $b$. However, as $b$ increases, probing may not yield a better reward than the existing $b$; hence probing might not be worth the cost, so that it may be better to simply continue. To formally show the threshold property of the probing set $P_k^\ell$, the following is sufficient: for any $b_2 > b_1$,
$$P_k(b_1, F_\ell) - P_k(b_2, F_\ell) \le C_k(b_1, F_\ell) - C_k(b_2, F_\ell).$$
This is because, if $b_2 \notin S_k^\ell$ (so that stopping is not optimal) is such that $b_2 \in P_k^\ell$ (i.e., $P_k(b_2, F_\ell) \le C_k(b_2, F_\ell)$), then from the above inequality we obtain $P_k(b_1, F_\ell) \le C_k(b_1, F_\ell)$, implying that it is optimal to probe at $b_1$ as well, so that the probing sets are characterized by upper bounds. However, we have not yet been able to prove or disprove such a result; we strongly believe it to be true and make the following conjecture.

Conjecture 1: For $k = 1, 2, \ldots, N-1$ and any $F_\ell$, if $b_2 \in P_k^\ell$ then for any $b_1 < b_2$ we have $b_1 \in P_k^\ell$.

Discussion: If the above conjecture is true, then some additional structural results can be deduced. For instance, suppose for some $F_\ell$, $\alpha_k^\ell > \alpha_k$, or equivalently, $\alpha_N^\ell > \alpha_{N-1}$ (refer to Fig. 5(a)). Then, since $S_k \subseteq Q_k^\ell$ (from Lemma 7), for

Fig. 5. Structure of the probing sets if Conjecture 1 is true.
(a) Probing sets corresponding to a distribution $F_\ell$ such that $\alpha_N^\ell > \alpha_{N-1}$; (b) probing sets corresponding to an $F_u$ such that $\alpha_N^u = \alpha_{N-1}$.

any $(b, F_\ell)$ such that $\alpha_{N-1} < b < \alpha_N^\ell$, it should be optimal to probe. Now, invoking Conjecture 1, we can conclude that it is optimal to probe for any $b < \alpha_N^\ell$, so that $\zeta_k^\ell = \alpha_N^\ell$ for all $k$. Thus, for such "good" distributions $F_\ell$ (i.e., $F_\ell$ such that $\alpha_N^\ell > \alpha_{N-1}$), the corresponding policy is completely characterized by the single threshold $\alpha_N^\ell$. Next, for distributions $F_u$ such that $\alpha_k^u = \alpha_k$ (equivalently, $\alpha_N^u = \alpha_{N-1}$; see Fig. 5(b)), there is a window between $\zeta_k^u$ and $\alpha_N^u$ where, for any $(b, F_u)$ such that $\zeta_k^u \le b < \alpha_N^u$, it is optimal to continue. Unlike $\alpha_k^u$, the thresholds $\zeta_k^u$ are stage dependent. In fact, from our numerical work, we observe that the $\zeta_k^u$ are increasing with $k$. Finally, as depicted in Fig. 5, for any distribution $F_\ell$, at the last stage we invariably have $\alpha_N^\ell = \zeta_N^\ell$, since the action to continue is not available at stage $N$.

D. Policy Implementation

To summarize: from Theorem 1, the stopping sets $S_k$ and $S_k^\ell$ are characterized by lower bounds $\alpha_k$ and $\alpha_k^\ell$. In Theorems 2 and 3 we proved that these thresholds are stage independent. Hence it is sufficient to compute only $\alpha_{N-1}$ and $\alpha_N^\ell$, thus simplifying the overall computation of the optimal policy. Further, if Conjecture 1 is true, then the upper bounds $\zeta_k^\ell$ are sufficient to characterize the probing sets $P_k^\ell$. Now F, after computing these thresholds, operates as follows. At stage $k = 1, 2, \ldots, N-1$, whenever the state is $(b, F_\ell)$: (1) if $b \ge \alpha_N^\ell$, then stop and forward the packet to the best probed relay; (2) if $b \le \zeta_k^\ell$, then probe the unprobed relay and update the best reward to $b' = \max\{b, R_\ell\}$.
Now, if $b' \ge \alpha_{N-1}$, stop; otherwise continue to wait for the next relay; (3) otherwise (i.e., if $\zeta_k^\ell < b < \alpha_N^\ell$), continue to wait for the next relay to wake up, at which instant choose, between $F_\ell$ and $F_{L_{k+1}}$, whichever is stochastically greater, while putting the other unprobed relay to sleep. If the decision process enters the last stage $N$ and the state is $(b, F_\ell)$, then stop if $b \ge \alpha_N^\ell$; otherwise probe (continuing is not available). Finally, if the state at stage $N$ is $b$, then stop irrespective of its value.

V. UNRESTRICTED CLASS $\Pi$: AN INFORMAL DISCUSSION

In this section, based on the insights obtained from the analysis in the previous sections, we informally discuss the possible structure of the optimal policy within the unrestricted class of policies, $\Pi$. Recall that a policy within $\Pi$, at stage $k$, is in general allowed to base its decision on $(b_k, \mathcal{F}_k)$, where $b_k$ is the reward of the best probed relay ($b_k = -\infty$ if no relay has been probed yet) and $\mathcal{F}_k$ is the set of unprobed relays ($\mathcal{F}_k = \{\}$ if all the relays have been probed). Thus, the state space at stage $k$ can be written as
$$\mathcal{X}_k = \big\{ (b, H) : b \in \{-\infty\} \cup [0, r],\; H \in \mathcal{F}^j,\; 0 \le j \le k \big\}. \quad (21)$$
Again, the available actions are stop, probe, and continue. If the action is to probe, then F has to further decide which relay to probe among the several available at stage $k$. When there are no unprobed relays (i.e., $H = \{\}$), we represent the state simply as $b$. We now proceed to write the recursive Bellman optimality equations for this more general unrestricted problem. Although these equations are more involved than the ones in Section III (recall (6) through (8)), they can be understood similarly, and hence we do not provide an explanation.
The sole purpose of writing these equations here is that we will require them (in Section VI) to perform value iteration and numerically compute an optimal policy for the unrestricted problem. Hence these equations can be omitted without affecting the readability of the remainder of this section. Let $J_k$, $k = 1, 2, \ldots, N$, represent the optimal cost-to-go at stage $k$ (for simplicity we again use $J_k$). Then $J_N(b) = -\eta b$, and $J_N(b, H)$ is as in (19). For stages $k = 1, 2, \ldots, N-1$ we have
$$J_k(b) = \min\Big\{ -\eta b,\; \tau + E_L\big[ J_{k+1}(b, \{F_{L_{k+1}}\}) \big] \Big\}, \quad (22)$$
and $J_k(b, H)$ is as in (20).

In view of the complexity of the problem, we do not pursue a formal analysis characterizing the structure of the optimal policy within the unrestricted class. However, based on our results from the previous sections and related work by Chaporkar and Proutiere [18], we discuss the possible structure of the unrestricted-optimal policy.

A. Discussion on the Last Stage N

Suppose the decision process enters the last stage $N$. Now, given the best reward value among the probed relays, $b$, and the set $H$ of reward distributions of the unprobed relays, F has to decide whether to stop or to probe a relay (note that the continue action is not available at the last stage). Suppose the action is to probe; then, after probing and updating the best reward value, if there are still some unprobed relays left, F has to again decide whether to stop or probe. This decision problem is similar to the one studied by Chaporkar and Proutiere in [18], but in the context of channel selection. In the following, we briefly describe the problem in [18]. Given a set of channels with different channel gain distributions, a transmitter has to choose a channel for its transmissions. The transmitter can probe a channel to learn its channel gain.
Probing all the channels would enable the transmitter to select the best channel, but at the cost of reducing the effective transmission time within the channel coherence period. On the other hand, probing only a few channels may deprive the transmitter of the opportunity to transmit on a better channel. The transmitter is interested in maximizing its throughput within the coherence period. The authors in [18], for their channel probing problem, prove that the one-step-look-ahead (OSLA) rule is optimal: given the channel gain of the best channel (among the channels probed so far) and a collection of channel gain distributions of the unprobed channels, it is optimal to stop and transmit on the best channel if and only if the throughput obtained by doing so is greater than the expected throughput obtained by probing any unprobed channel and then stopping (transmitting on the new best channel). Further, they prove that if the set of channel gain distributions is totally stochastically ordered (recall Definition 3), then it is optimal to probe the channel whose distribution is stochastically largest among all the unprobed channels. However, in their problem, maximizing throughput involves optimizing a product of the channel gain and the remaining transmission time, unlike in our problem where (at the last stage) we optimize a linear combination of the reward and the probing cost. But from our numerical work we have seen that a similar OSLA rule is optimal once our decision process enters the last stage $N$: given a state $(b, H)$ at stage $N$, it is optimal to stop if the cost of stopping is less than the cost of probing any distribution from $H$ and then stopping; otherwise, it is optimal to probe the stochastically largest distribution from $H$.

B.
Discussion on Stages $k = 1, 2, \ldots, N-1$

For the other stages $k = 1, 2, \ldots, N-1$, one can begin by defining the stopping sets $S_k$ and $S_k^H$, and the stobing sets $Q_k^H$, analogous to the ones in (14), (15) and (17). Note that here we need to define $S_k^H$ and $Q_k^H$ for a set of distributions $H$, unlike in the earlier case where we had defined these sets only for a given distribution $F_\ell$. We expect that results analogous to the ones in Section IV, namely Theorems 2 and 3, where we proved that the stopping sets are stage independent, hold for this more general setting as well. Further, similar to the situation at stage $N$, for any stage $k$ we expect that if it is optimal to probe at some state $(b, H)$, then it is better to probe the stochastically largest distribution from $H$. Again, we have seen that these observations hold in our numerical work.
$$J_N(b, H) = \min\Big\{ -\eta b,\; \eta\delta + \min_{F_\ell \in H} E_\ell\big[ J_N(\max\{b, R_\ell\},\, H \setminus \{F_\ell\}) \big] \Big\}. \quad (19)$$
$$J_k(b, H) = \min\Big\{ -\eta b,\; \eta\delta + \min_{F_\ell \in H} E_\ell\big[ J_k(\max\{b, R_\ell\},\, H \setminus \{F_\ell\}) \big],\; \tau + E_L\big[ J_{k+1}(b,\, H \cup \{F_{L_{k+1}}\}) \big] \Big\}. \quad (20)$$

VI. NUMERICAL AND SIMULATION RESULTS

A. One-Hop Study

We begin by listing the various parameter values used in our numerical work. The forwarder and the sink are separated by a distance of $V = 1000$ meters (m); recall Fig. 2. The radius of the communication region is 50 m. We set $z_{min} = 5$ m. There are $N = 5$ relays within the forwarding region $\mathcal{L}$, uniformly located within $\mathcal{L}$. To enable us to perform value iteration (i.e., recursively solve the Bellman equation to obtain the optimal value and the optimal policy), we discretize the forwarding region $\mathcal{L}$ into a grid of 20 uniformly spaced points and map the location of each relay to the grid point closest to it.
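The grid-snapping step just described can be sketched as follows; the 5-by-4 rectangular layout and the relay coordinates below are hypothetical stand-ins, since the text specifies only that 20 uniformly spaced points are used.

```python
# Sketch of the discretization step: map each relay location in the
# forwarding region to the nearest point of a fixed grid. The 5x4 grid
# layout and the relay coordinates are hypothetical stand-ins; the paper
# only states that 20 uniformly spaced points are used.

import math

def make_grid(x_range, y_range, nx, ny):
    """Return nx * ny uniformly spaced points covering the given rectangle."""
    xs = [x_range[0] + i * (x_range[1] - x_range[0]) / (nx - 1) for i in range(nx)]
    ys = [y_range[0] + j * (y_range[1] - y_range[0]) / (ny - 1) for j in range(ny)]
    return [(x, y) for x in xs for y in ys]

def snap(point, grid):
    """Return the grid point closest to `point` (Euclidean distance)."""
    return min(grid, key=lambda g: math.dist(g, point))

grid = make_grid((5.0, 50.0), (-20.0, 20.0), 5, 4)   # 20 points
relays = [(12.3, 4.1), (40.0, -11.5), (27.8, 0.2)]    # hypothetical locations
print([snap(p, grid) for p in relays])
```

After snapping, each relay inherits the reward distribution associated with its grid point, which is what keeps the distribution set $\mathcal{F}$ finite for the value iteration.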
Since the grid is symmetric about the line joining F and the sink (with 4 points lying on the line, so that these have no symmetric pairs), we have in total $\big(\frac{20-4}{2} + 4 =\big)\ 12$ different possible $D_\ell$ values, giving rise to 12 different reward distributions constituting the set $\mathcal{F}$. Next, recall the reward expression from (3); we have fixed $d_{ref} = 5$ m, $\xi = 2.5$, and $a = 0.5$. For $\Gamma N_0$, which is referred to as the receiver sensitivity, we use a value of $10^{-9}$ mW (equivalently $-90$ dBm), as specified for the Crossbow TelosB wireless mote [22]. To ensure that the transmit power of a relay from any grid location is within the range of 1 mW to 0.003 mW (equivalently 0 dBm to $-24$ dBm; again from the TelosB datasheet [22]; see Footnote 4), we allow for four different channel gain values: $0.4 \times 10^{-3}$, $0.6 \times 10^{-3}$, $0.8 \times 10^{-3}$, and $1 \times 10^{-3}$, each occurring with equal probability. Since channel probing is usually performed using the maximum allowable transmit power, we set the probing cost $\delta$ to 1 mW. Finally, the inter-wake-up times $\{U_k\}$ are exponentially distributed random variables with mean $\tau = 20$ milliseconds (ms).

One-Hop Policies: The following is a description of the policies we will study:
• RST-OPT (ReSTricted OPTimal): The optimal policy within the restricted class (Sections III and IV), where F is allowed to keep at most two relays awake, namely the best probed and the best unprobed relay; recall the implementation summary of this policy from Section IV-D.
• GLB-OPT (GLoBal OPTimal): The optimal policy within the unrestricted class of policies, where F operates by keeping all the unprobed relays awake. We obtain GLB-OPT by numerically solving the optimality equations in (19), (20) and (22).
• BAS-OPT (BASic OPTimal): The optimal policy for the basic relay selection model, where F is not allowed to exercise the option of not probing a relay (recall the discussion of the basic model from the related work).
Thus, each time a relay wakes up, it is immediately probed (incurring a cost of $\eta\delta$), and its reward value is revealed to F. By incorporating $\eta\delta$ into the term $\tau$ (so that the inter-wake-up time is modified to $\tau + \eta\delta$), the solution to this model can be characterized (see our prior work [2, Section 6]) in terms of a single threshold $\alpha$ as follows: at any stage $k = 1, 2, \ldots, N-1$, stop if and only if the best reward value $b_k \ge \alpha$; at stage $N$, stop for any $b_N$. Note that the threshold $\alpha$ depends on $\eta$.

Footnote 4: Although in practice only a finite set of transmit power levels would be allowed, for our numerical work we assume that the relays can transmit using any power within the specified range.

Fig. 6. Expected total cost as a function of the trade-off multiplier $\eta$; see (5). Recall that a large $\eta$ implies less emphasis on expected delay.

Discussion: In Fig. 6 we have plotted the total cost (i.e., the objective in (5)) incurred by each of the above policies as a function of the multiplier $\eta$. GLB-OPT, being the globally optimal policy, achieves the minimum cost. However, interestingly, we observe that the total cost obtained by RST-OPT is very close to that of GLB-OPT. While the performance of BAS-OPT is good for small values of $\eta$, its performance degrades as $\eta$ increases, illustrating that it is not wise to naively probe every relay as and when it wakes up. In Fig. 7 we show the individual components of the total cost (namely delay, reward, and probing cost) as functions of $\eta$. As $\eta$ decreases to 0, we see (from Fig. 7(a)) that the expected delay incurred by all the policies converges to 20 ms, which is the mean time, $\tau$, until the first relay wakes up. Similarly, the expected rewards (in Fig. 7(b)) converge to the reward of the first relay, and the probing costs (in Fig. 7(c)) converge to the cost of probing a single relay, i.e., $\delta = 1$ mW.
This is because, for small values of $\eta$, delay is valued more (recall the total cost expression from (5)), so all the policies essentially end up probing the first relay and then forwarding the packet to it. This also explains why a similar total cost (recall Fig. 6) is incurred by all the policies in the low-$\eta$ regime (e.g., $\eta \le 20$). Next, as $\eta$ increases, we see that the delay incurred and the reward achieved by all the policies increase (see Fig. 7(a) and 7(b), respectively). While the probing cost of BAS-OPT naively increases (see Fig. 7(c)), the probing costs incurred by RST-OPT and GLB-OPT saturate beyond $\eta = 20$. This is because, whenever $\eta$ is large, RST-OPT and GLB-OPT are aware that the gain in reward value obtained by probing more relays is negated by the cost term, $\eta\delta$, which is added to the total cost each time a new relay is probed; BAS-OPT, not allowed to not-probe, ends up probing all the relays until the best reward exceeds the threshold $\alpha$. Thus, although BAS-OPT incurs a smaller delay than the other two policies, it suffers both in terms of reward and probing cost, leading to a higher total cost.

Fig. 7. Individual components of the total cost in Fig. 6 as functions of $\eta$: (a) Delay, (b) Reward, and (c) Probing Cost.

On the other hand, RST-OPT and GLB-OPT wait for more relays and then probe only the relays with good reward distributions, accruing a better total cost. Finally, the marginal improvement in performance obtained by GLB-OPT over RST-OPT can be understood as follows.
Although the delay incurred by these two policies is almost identical, for large $\eta$ values GLB-OPT achieves a better reward than RST-OPT by incurring a slightly higher probing cost. Thus, whenever the reward offered by the relay with the best distribution is not good enough, GLB-OPT probes an additional relay to improve the reward; such an improvement is not possible for RST-OPT, since it is restricted to keep only one unprobed relay awake.

Computational Complexity: Finally, we comment on the computational complexity of these policies. To obtain GLB-OPT we had to recursively solve the Bellman equations (referred to as value iteration) in (19), (20) and (22), for every stage $k$ and every possible state at stage $k$. The total number of possible states at stage $k$, i.e., the cardinality of the state space $\mathcal{X}_k$ in (21), grows exponentially with the cardinality of $\mathcal{F}$ (assuming that $\mathcal{F}$ is discrete, as in our numerical example). It also grows exponentially with the stage index $k$. In contrast, for computing RST-OPT, since within the restricted class only one unprobed relay is kept awake at any time, the state space size grows only linearly with the cardinality of $\mathcal{F}$. Also, the size of the state space does not grow with $k$. Furthermore, from our analysis in Section IV we know that the stopping sets are threshold based, and moreover the thresholds, $\alpha_k$ and $\{\alpha^\ell_k : F_\ell \in \mathcal{F}\}$, are stage independent. Hence, these thresholds have to be computed only once (for stages $N-1$ and $N$, respectively), further reducing the complexity of RST-OPT. BAS-OPT, being a single-threshold based policy, is much simpler to implement, but is not a good choice whenever $\eta$ is large.

B. End-to-End Study

The good one-hop performance of RST-OPT and its computational simplicity motivate us to apply RST-OPT to route packets in an asynchronously sleep-wake cycling WSN and study its end-to-end performance.
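The single-threshold BAS-OPT rule discussed above is simple enough to sketch in a few lines. The following is a minimal simulation sketch, not the paper's implementation: the threshold `alpha` is an input rather than the solution of the optimality equations, and `sample_reward` with its constant `kappa` is a hypothetical stand-in for a draw from one of the reward distributions in $\mathcal{F}$; `TAU`, `DELTA`, and the gain values are taken from the numerical setup above.

```python
import random

# Numerical parameters from Section VI (tau in ms, probing cost delta in mW).
TAU = 20.0
DELTA = 1.0
GAINS = [0.4e-3, 0.6e-3, 0.8e-3, 1.0e-3]  # equiprobable channel gains

def sample_reward(rng, kappa, a=0.5):
    """Hypothetical reward draw: R = G^(1-a) / kappa, with G uniform over
    the four gain values (kappa lumps progress, distance and Gamma_0)."""
    g = rng.choice(GAINS)
    return g ** (1.0 - a) / kappa

def bas_opt(rng, alpha, n_relays, kappa=1e-2):
    """Simulate the single-threshold BAS-OPT rule: probe every relay as it
    wakes up; stop as soon as the best reward b_k >= alpha (and always stop
    at the last stage).  Returns (delay_ms, best_reward, probing_cost_mW)."""
    delay, probe_cost, best = 0.0, 0.0, 0.0
    for k in range(1, n_relays + 1):
        delay += rng.expovariate(1.0 / TAU)   # inter-wake-up time, mean tau
        probe_cost += DELTA                   # every wake-up is probed
        best = max(best, sample_reward(rng, kappa))
        if best >= alpha or k == n_relays:
            return delay, best, probe_cost
```

With `alpha = 0` the rule stops at the first wake-up after a single probe, matching the small-$\eta$ behavior observed in Fig. 7; with a very large `alpha` it probes all the relays, which is the behavior that makes BAS-OPT costly for large $\eta$.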
We will also obtain the end-to-end performance of the naive BAS-OPT policy. First we describe the setting that we have considered for our end-to-end simulation study. We construct a network by randomly placing 500 nodes in a square region of side 500 m. The sink node is placed at the location $(500, 0)$. The network nodes are asynchronously and periodically sleep-wake cycling, i.e., a node $i$ wakes up at the periodic instants $\{T_i + kT : k \ge 0\}$, where the $\{T_i\}$ are i.i.d. uniform on $[0, T]$, with $T$ being the sleep-wake cycling period (recall our justification for periodic sleep-wake cycling from footnote 1 on page 2). We fix $T = 100$ ms. A source node is randomly chosen, which generates an alarm packet at time $0$. This alarm packet has to be routed to the sink node. Here, in addition to varying $\eta$, we will also vary the multiplier $a$ and study the end-to-end performance. Recall from (3) that $a$ is the multiplier used to trade off between progress and power in the reward expression; a larger value of $a$ implies more emphasis on progress. The values of all the other parameters, e.g., $r_c$, $\delta$, $\Gamma N_0$, channel gains, etc., remain as in our one-hop study. Now, for a given $\eta$ and $a$, each node computes the corresponding RST-OPT and BAS-OPT policies assuming a mean inter-wake-up time of $T/N_i$ ms, where $N_i$ is the number of nodes in the forwarding region of node $i$. In Fig. 8, for three different values of $a$ (namely $0.5$, $0.7$, and $0.9$) we have plotted, as functions of $\eta$, the total delay and the total power (which is the sum of the probing and the transmission powers incurred at each hop) incurred by applying the RST-OPT and BAS-OPT policies at each hop en route to the sink node. Each data point in Fig. 8 is obtained by averaging the respective quantities over 1000 alarm packets.

Discussion: First, note that both the total delay and the total power incurred by BAS-OPT are increasing with $\eta$ for each $a$.
Hence, no favorable trade-off between delay and power can be obtained using BAS-OPT; it is better to operate BAS-OPT at a low value of $\eta$, where the total delay incurred is (approximately) $250$ ms while the total power expended is about $20$ mW. In fact, as $\eta$ decreases to $0$, we see that the performance of all the policies (i.e., RST-OPT and BAS-OPT for different values of $a$) converges to these values. This is simply because, whenever $\eta$ is small, (one-hop) delay is valued more, so all the policies, at each hop, essentially forward the packet to the first relay that wakes up.

Fig. 8. End-to-end performance of RST-OPT and BAS-OPT as functions of $\eta$ for different values of $a$: (a) Total delay, and (b) Total power.

For RST-OPT, only a marginal trade-off between delay and power can be achieved for $a = 0.5$ (see from Fig. 8(b) that the corresponding total power decreases only marginally as $\eta$ increases from $1$ to $4$), but as we increase the value of $a$ to $0.7$ and then to $0.9$, we see that the total power decreases sharply with $\eta$. For instance, for $a = 0.9$, from Fig. 8(b) we see that the total power decreases from $20$ mW to $13$ mW as $\eta$ goes from $0$ to $7$. However, over this range of $\eta$, the total delay increases from $250$ ms to $360$ ms (see the plot corresponding to RST-OPT, $a = 0.9$, in Fig. 8(a)). Thus, for these higher values of $a$, a trade-off between delay and power can be achieved using RST-OPT. Next, for any fixed $\eta$, from Fig. 8(b) observe that the total power incurred by RST-OPT is improving (i.e., decreasing) with $a$.
This can be understood as follows: since a larger $a$ places less emphasis on power and more emphasis on progress in the reward expression (recall (3)), the one-hop transmissions may be of higher power, but there are fewer hops and hence fewer transmissions, resulting in a lower total power. This observation would suggest that it is advantageous to use RST-OPT by setting $a = 0.9$ rather than $a = 0.5$ or $0.7$. However, from Fig. 8(a) we see that the total delay is not decreasing with $a$. In fact, the delay incurred by RST-OPT first decreases as $a$ increases from $0.5$ to $0.7$, and then increases as $a$ is further increased to $0.9$. The same holds for the plots corresponding to BAS-OPT in Fig. 8(a). This observation can be understood as follows. When $a = 0.5$, since (one-hop) power is valued more, the respective forwarding nodes at each hop end up spending more time waiting for a relay that requires a strictly lower transmission power. Similarly, when $a = 0.9$, a larger delay is incurred at each hop, since the forwarding nodes now have to wait for relays whose progress value is larger (however, since $a = 0.9$ results in fewer hops, the delay incurred in this case is considerably less than in the $a = 0.5$ case). On the other hand, when $a = 0.7$, since a relatively fair trade-off between progress and power exists, the waiting time at each hop is reduced, because now any relay with a moderate progress and a moderate transmission power will suffice. The above argument is precisely the reason why the total power incurred by BAS-OPT behaves as in Fig. 8(b): when $a = 0.5$ or $0.9$, each forwarder, in the process of waiting for a relay whose transmission power requirement is low or whose progress is large, respectively, ends up probing more relays. RST-OPT benefits over BAS-OPT here by probing only good relays at each hop, thus yielding a lower total power.
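The asynchronous periodic sleep-wake cycling assumed in this end-to-end setting is easy to make concrete. The following is a minimal sketch (the function names are ours, not from the paper) of the wake-up process $\{T_i + kT : k \ge 0\}$ with uniform phases, and of the mean inter-wake-up time $T/N_i$ that each node uses when computing its policy:

```python
import math
import random

T = 100.0  # sleep-wake cycling period in ms (Section VI-B)

def wake_phases(rng, n_nodes):
    """Phases T_i, i.i.d. uniform on [0, T]; node i wakes at {T_i + k*T}."""
    return [rng.uniform(0.0, T) for _ in range(n_nodes)]

def first_wake_after(phase, t):
    """First wake-up instant T_i + k*T (k >= 0) at or after time t."""
    k = max(0, math.ceil((t - phase) / T))
    return phase + k * T

def mean_inter_wake(n_neighbors):
    """Mean inter-wake-up time T/N_i seen by a forwarder whose forwarding
    region contains n_neighbors nodes (used to compute its policy)."""
    return T / n_neighbors
```

A forwarder that receives a packet at time `t` would apply `first_wake_after` to each neighbor's phase to obtain the sequence of wake-up instants at which probing decisions are made.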
Finally, summarizing our end-to-end results, we see that no trade-off between delay and power can be achieved by the naive BAS-OPT policy, while RST-OPT achieves such a trade-off (by varying $\eta$) for $a = 0.7$ or $0.9$. Further, for a fixed $\eta$, a favorable trade-off between delay and power can be obtained by varying $a$. For instance, from Fig. 8 we see that when $\eta = 7$, moving from $a = 0.7$ to $0.9$ results in a power saving of about $3$ mW while increasing the end-to-end delay by $130$ ms. Thus, depending on the application requirements (i.e., delay- or power-sensitive applications), one has to appropriately choose the values of $\eta$ and $a$.

VII. CONCLUSION

Motivated by the problem of end-to-end geographical forwarding in a sleep-wake cycling wireless sensor network, we formulated a decision problem of choosing a next-hop relay node when a set of potential relay neighbors are sequentially waking up in time. A power cost is incurred for probing a relay to learn its channel gain. We first studied a restricted class of policies where a policy's decision is based only on the best probed relay and the best unprobed relay (instead of all the unprobed relays). We characterized the optimal policy in terms of stopping sets. Our first main result (Theorem 1) was to show that the stopping sets are threshold based. Then we proved that the stopping sets are stage independent (Theorems 2 and 3). A discussion of the more general unrestricted class of policies was provided. We conducted numerical work to compare the performance of the restricted optimal (RST-OPT) and the globally optimal (GLB-OPT) policies. We observed that the performance of RST-OPT is close to that of GLB-OPT. We also conducted simulation experiments to study the end-to-end performance of RST-OPT.
Finally, it is worth noting that our work, being a variant of the asset selling problem, can in general find application wherever the problem of resource selection occurs with a collection of resources arriving sequentially.

REFERENCES

[1] J. Kim, X. Lin, and N. Shroff, "Optimal Anycast Technique for Delay-Sensitive Energy-Constrained Asynchronous Sensor Networks," IEEE/ACM Transactions on Networking, April 2011.
[2] K. P. Naveen and A. Kumar, "Relay Selection for Geographical Forwarding in Sleep-Wake Cycling Wireless Sensor Networks," IEEE Transactions on Mobile Computing, vol. 12, no. 3, pp. 475–488, 2013.
[3] D. P. Bertsekas and J. N. Tsitsiklis, "An Analysis of Stochastic Shortest Path Problems," Mathematics of Operations Research, vol. 16, 1991.
[4] K. P. Naveen and A. Kumar, "Tunable Locally-Optimal Geographical Forwarding in Wireless Sensor Networks with Sleep-Wake Cycling Nodes," in INFOCOM 2010, 29th IEEE Conference on Computer Communications, March 2010.
[5] T. Rappaport, Wireless Communications: Principles and Practice, 2nd ed. Upper Saddle River, NJ, USA: Prentice Hall PTR, 2001.
[6] P. S. C. Thejaswi, J. Zhang, M. O. Pun, H. V. Poor, and D. Zheng, "Distributed Opportunistic Scheduling with Two-Level Probing," IEEE/ACM Transactions on Networking, vol. 18, no. 5, October 2010.
[7] D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. I. Athena Scientific, 2005.
[8] S. Karlin, Stochastic Models and Optimal Policy for Selling an Asset. Stanford University Press, Stanford, 1962.
[9] K. Akkaya and M. Younis, "A Survey on Routing Protocols for Wireless Sensor Networks," Ad Hoc Networks, vol. 3, pp. 325–349, 2005.
[10] M. Zorzi and R. R. Rao, "Geographic Random Forwarding (GeRaF) for Ad Hoc and Sensor Networks: Multihop Performance," IEEE Transactions on Mobile Computing, vol. 2, pp. 337–348, 2003.
[11] E. Cinlar, Introduction to Stochastic Processes. Prentice-Hall, 1975.
[12] A. Kumar, D. Manjunath, and J. Kuri, Wireless Networking. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008.
[13] P. Agrawal and N. Patwari, "Correlated Link Shadow Fading in Multi-Hop Wireless Networks," IEEE Transactions on Wireless Communications, vol. 8, no. 8, pp. 4024–4036, 2009.
[14] S. C. Albright, "A Bayesian Approach to a Generalized House Selling Problem," Management Science, vol. 24, no. 4, pp. 432–440, 1977.
[15] D. B. Rosenfield, R. D. Shapiro, and D. A. Butler, "Optimal Strategies for Selling an Asset," Management Science, vol. 29, no. 9, pp. 1051–1061, 1983.
[16] M. Mauve, J. Widmer, and H. Hartenstein, "A Survey on Position-Based Routing in Mobile Ad-Hoc Networks," IEEE Network, vol. 15, pp. 30–39, 2001.
[17] S. Liu, K. W. Fan, and P. Sinha, "CMAC: An Energy Efficient MAC Layer Protocol using Convergent Packet Forwarding for Wireless Sensor Networks," in SECON '07: 4th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, June 2007, pp. 11–20.
[18] P. Chaporkar and A. Proutiere, "Optimal Joint Probing and Transmission Strategy for Maximizing Throughput in Wireless Systems," IEEE Journal on Selected Areas in Communications, vol. 26, no. 8, pp. 1546–1555, October 2008.
[19] N. B. Chang and M. Liu, "Optimal Channel Probing and Transmission Scheduling for Opportunistic Spectrum Access," in MobiCom '07: Proceedings of the 13th Annual ACM International Conference on Mobile Computing and Networking, 2007, pp. 27–38.
[20] W. Stadje, "An Optimal Stopping Problem with Two Levels of Incomplete Information," Mathematical Methods of Operations Research, vol. 45, pp. 119–131, 1997.
[21] D. Stoyan, Comparison Methods for Queues and Other Stochastic Models. John Wiley & Sons, New York, 1983.
[22] Crossbow, "TelosB Mote Platform." [Online]. Available: www.willow.
co.uk/TelosB Datasheet.pdf

APPENDIX A
PROOF OF LEMMA 2

For convenience, here in the appendix we recall the respective lemma/theorem statements before providing their proofs. Before proceeding to the proof of Lemma 2, we first require the following result.

Lemma 8: For $k = 1, 2, \cdots, N$, $J_k(b)$ and $J_k(b, F_\ell)$ are decreasing in $b$.

Proof: The proof is by induction. For stage $N$ we know that $J_N(b) = -\eta b$, which is decreasing in $b$. Also, recalling $J_N(b, F_\ell)$ from (6),
$$J_N(b, F_\ell) = \min\Big\{ -\eta b,\ \eta\delta - \eta E_\ell\big[\max\{b, R_\ell\}\big] \Big\},$$
it is easy to see that $J_N(b, F_\ell)$ is also decreasing in $b$. Thus, the monotonicity properties hold for stage $N$. Now, suppose $J_{k+1}(b)$ and $J_{k+1}(b, F_\ell)$ (for all $F_\ell$) are decreasing in $b$ for some $k+1 = 2, 3, \cdots, N$; we will show that the result holds for stage $k$ as well. First, recall the expressions for $J_k(b)$ and $J_k(b, F_\ell)$ (from (12) and (13), respectively):
$$J_k(b) = \min\big\{ -\eta b,\ C_k(b) \big\} \quad \text{and} \quad J_k(b, F_\ell) = \min\big\{ -\eta b,\ P_k(b, F_\ell),\ C_k(b, F_\ell) \big\}.$$
Thus, to complete the proof it is sufficient to show that $C_k(b)$, $P_k(b, F_\ell)$ and $C_k(b, F_\ell)$ are decreasing in $b$. From the induction hypothesis, it is easy to see that $C_k(b)$ (in (9)) is decreasing in $b$, so that $J_k(b)$ is decreasing in $b$. Now that we have established that $J_k(b)$ is decreasing in $b$, it immediately follows that the probing cost $P_k(b, F_\ell)$ (in (11)) is decreasing in $b$. Finally, again using the induction hypothesis, observe that $\min\big\{ J_{k+1}(b, F_\ell),\ J_{k+1}(b, F_{L_{k+1}}) \big\}$ is decreasing in $b$, so that the continuing cost $C_k(b, F_\ell)$ (in (10)) is also decreasing.

We are now ready to prove Lemma 2.

Lemma 2: (i) For $k = 1, 2, \cdots, N-1$, if $F_\ell \ge_{st} F_u$ then $C_k(b, F_\ell) \le C_k(b, F_u)$, and (including $k = N$) $P_k(b, F_\ell) \le P_k(b, F_u)$ and $J_k(b, F_\ell) \le J_k(b, F_u)$.
(ii) For $k = 1, 2, \cdots, N-2$, $C_k(b) \le C_{k+1}(b)$ and $C_k(b, F_\ell) \le C_{k+1}(b, F_\ell)$, and (including $k = N-1$) $P_k(b, F_\ell) \le P_{k+1}(b, F_\ell)$ and $J_k(b, F_\ell) \le J_{k+1}(b, F_\ell)$.

Proof of Part (i): Consider stage $N$ and recall the expression for the optimal cost-to-go function $J_N(b, F_\ell)$ from (6):
$$J_N(b, F_\ell) = \min\big\{ -\eta b,\ P_N(b, F_\ell) \big\} = \min\Big\{ -\eta b,\ \eta\delta - \eta E_\ell\big[\max\{b, R_\ell\}\big] \Big\}.$$
Since the function $f(r) = \max\{b, r\}$ is increasing in $r$, using the definition of stochastic ordering (Definition 2) we can write $E_\ell\big[\max\{b, R_\ell\}\big] \ge E_u\big[\max\{b, R_u\}\big]$, so that we have $P_N(b, F_\ell) \le P_N(b, F_u)$ and $J_N(b, F_\ell) \le J_N(b, F_u)$. Thus, the result holds for stage $N$. Now suppose the result is true for some $k+1 = 2, 3, \cdots, N$. From Lemma 8 we know that $J_k(b)$ is decreasing in $b$, which implies that, for any $b$, the function $f(r) = J_k(\max\{b, r\})$ is decreasing in $r$. Again, using the definition of stochastic ordering (Definition 2), we can conclude that $E_\ell\big[J_k(\max\{b, R_\ell\})\big] \le E_u\big[J_k(\max\{b, R_u\})\big]$, so that $P_k(b, F_\ell) \le P_k(b, F_u)$ (see (11)). Next, from the induction hypothesis we know that $J_{k+1}(b, F_\ell) \le J_{k+1}(b, F_u)$, so that
$$\min\big\{ J_{k+1}(b, F_\ell),\ J_{k+1}(b, F_{L_{k+1}}) \big\} \le \min\big\{ J_{k+1}(b, F_u),\ J_{k+1}(b, F_{L_{k+1}}) \big\}.$$
Therefore, we also have $C_k(b, F_\ell) \le C_k(b, F_u)$ (see (10)). The proof can now be easily completed by recalling (from (13)) that $J_k(b, F_\ell) = \min\big\{ -\eta b,\ P_k(b, F_\ell),\ C_k(b, F_\ell) \big\}$.

Proof of Part (ii): This result is very intuitive, since with more stages to go one expects to accrue a lower cost. However, we prove it here for completeness. Again the proof is by induction. For stage $N-1$ we easily have
$$J_{N-1}(b) = \min\big\{ -\eta b,\ C_{N-1}(b) \big\} \le -\eta b = J_N(b).$$
Next, consider a state of the form $(b, F_\ell)$.
The cost of probing, $P_{N-1}(b, F_\ell)$, can be bounded as follows:
$$P_{N-1}(b, F_\ell) = \eta\delta + E_\ell\big[J_{N-1}(\max\{b, R_\ell\})\big] \stackrel{*}{\le} \eta\delta + E_\ell\big[J_N(\max\{b, R_\ell\})\big] \stackrel{\circ}{=} \eta\delta - \eta E_\ell\big[\max\{b, R_\ell\}\big] \stackrel{\dagger}{=} P_N(b, F_\ell),$$
where $*$ uses $J_{N-1}(b) \le J_N(b)$ (which we have just proved), $\circ$ holds because $J_N(b) = -\eta b$ for all $b$, and $\dagger$ is obtained simply by recalling the expression for $P_N(b, F_\ell)$. Using the above inequality, we obtain
$$J_{N-1}(b, F_\ell) = \min\big\{ -\eta b,\ P_{N-1}(b, F_\ell),\ C_{N-1}(b, F_\ell) \big\} \le \min\big\{ -\eta b,\ P_{N-1}(b, F_\ell) \big\} \le \min\Big\{ -\eta b,\ \eta\delta - \eta E_\ell\big[\max\{b, R_\ell\}\big] \Big\} = J_N(b, F_\ell).$$
Thus we have shown the result for stage $N-1$. Suppose the result is true for some stage $k+1 = 2, 3, \cdots, N-1$, i.e., $J_{k+1}(b) \le J_{k+2}(b)$ and $J_{k+1}(b, F_\ell) \le J_{k+2}(b, F_\ell)$ (for all $F_\ell$). Then, using the induction hypothesis, the cost of continuing, $C_k(b)$, can be bounded as
$$C_k(b) = \tau + E_L\big[J_{k+1}(b, F_{L_{k+1}})\big] \le \tau + E_L\big[J_{k+2}(b, F_{L_{k+2}})\big] = C_{k+1}(b).$$
Thus, we have $J_k(b) \le J_{k+1}(b)$ (see (12)). Next, consider the probing cost:
$$P_k(b, F_\ell) = \eta\delta + E_\ell\big[J_k(\max\{b, R_\ell\})\big] \stackrel{*}{\le} \eta\delta + E_\ell\big[J_{k+1}(\max\{b, R_\ell\})\big] = P_{k+1}(b, F_\ell),$$
where $*$ uses $J_k(b) \le J_{k+1}(b)$, which we have already shown. The cost of continuing can be similarly bounded:
$$C_k(b, F_\ell) = \tau + E_L\Big[\min\big\{ J_{k+1}(b, F_\ell),\ J_{k+1}(b, F_{L_{k+1}}) \big\}\Big] \stackrel{*}{\le} \tau + E_L\Big[\min\big\{ J_{k+2}(b, F_\ell),\ J_{k+2}(b, F_{L_{k+2}}) \big\}\Big] = C_{k+1}(b, F_\ell),$$
where $*$ is due to the induction hypothesis and the fact that the location random variables $L_{k+1}$ and $L_{k+2}$ are identically distributed. Finally, using the above inequalities in the expression for $J_k(b, F_\ell)$ (recall (13): $J_k(b, F_\ell) = \min\big\{ -\eta b,\ P_k(b, F_\ell),\ C_k(b, F_\ell) \big\}$), we obtain $J_k(b, F_\ell) \le J_{k+1}(b, F_\ell)$, thus completing the proof.
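The monotonicity of Lemma 8 and the stage-wise comparison of Lemma 2-(ii) can be checked numerically by running the backward recursion on a toy instance. The following is a minimal sketch, assuming a single discrete reward distribution $F$ so that the continue terms in (9) and (10) both collapse to $\tau + J_{k+1}(b, F)$; all parameter values are hypothetical:

```python
# Toy backward recursion for the restricted model with one reward
# distribution F; checks Lemma 8 (values non-increasing in b) and
# Lemma 2-(ii) (costs improve with more stages to go).
ETA, DELTA, TAU, N = 1.0, 0.5, 0.2, 5
SUPPORT = [1.0, 2.0, 3.0, 4.0]           # reward support, equiprobable
GRID = [0.0] + SUPPORT                   # best-reward values b

def expect(f):
    return sum(f(r) for r in SUPPORT) / len(SUPPORT)

# Stage N, from (6): J_N(b) = -eta*b and
# J_N(b,F) = min{-eta*b, eta*delta - eta*E[max(b,R)]}.
J = {b: -ETA * b for b in GRID}
JF = {b: min(-ETA * b, ETA * DELTA - ETA * expect(lambda r, b=b: max(b, r)))
      for b in GRID}

for k in range(N - 1, 0, -1):
    C = {b: TAU + JF[b] for b in GRID}               # (9)/(10), collapsed
    Jn = {b: min(-ETA * b, C[b]) for b in GRID}      # (12)
    P = {b: ETA * DELTA + expect(lambda r, b=b: Jn[max(b, r)])
         for b in GRID}                              # (11)
    JFn = {b: min(-ETA * b, P[b], C[b]) for b in GRID}   # (13)
    # Lemma 8: J_k(b) and J_k(b,F) are non-increasing in b.
    assert all(Jn[b1] >= Jn[b2] and JFn[b1] >= JFn[b2]
               for b1, b2 in zip(GRID, GRID[1:]))
    # Lemma 2-(ii): J_k <= J_{k+1} pointwise.
    assert all(Jn[b] <= J[b] and JFn[b] <= JF[b] for b in GRID)
    J, JF = Jn, JFn
```

The grid of $b$ values is closed under $\max\{b, R\}$ because it contains the reward support, which is what makes the tabular recursion exact for this toy instance.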
APPENDIX B
PROOF OF LEMMA 4

The following simple property of the $\min$-operator will be useful in proving Lemma 4.

Lemma 9: If $x_1, x_2, \cdots, x_j$ and $y_1, y_2, \cdots, y_j$ in $\Re$ are such that $x_i - y_i \le x_1 - y_1$ for all $i = 1, 2, \cdots, j$, then
$$\min\{x_1, x_2, \cdots, x_j\} - \min\{y_1, y_2, \cdots, y_j\} \le x_1 - y_1. \quad (23)$$

Proof: Suppose $\min\{y_1, y_2, \cdots, y_j\} = y_i$ for some $1 \le i \le j$; then the LHS of (23) can be written as
$$LHS = \min\{x_1, x_2, \cdots, x_j\} - y_i \le x_i - y_i.$$
The proof is complete by recalling that we are given $x_i - y_i \le x_1 - y_1$.

Lemma 4: For $k = 1, 2, \cdots, N-1$ (for part (ii), $k = 1, 2, \cdots, N$), for any $F_\ell$, and for $b_2 > b_1$, we have
(i) $C_k(b_1) - C_k(b_2) \le \eta(b_2 - b_1)$,
(ii) $P_k(b_1, F_\ell) - P_k(b_2, F_\ell) \le \eta(b_2 - b_1)$,
(iii) $C_k(b_1, F_\ell) - C_k(b_2, F_\ell) \le \eta(b_2 - b_1)$.

Proof: Since $J_N(b) = -\eta b$, we already have, for stage $N$, $J_N(b_1) - J_N(b_2) = \eta(b_2 - b_1)$. Also, for a given distribution $F_\ell$ and for $b_2 > b_1$,
$$P_N(b_1, F_\ell) - P_N(b_2, F_\ell) = \eta E_\ell\big[\max\{b_2, R_\ell\} - \max\{b_1, R_\ell\}\big] \stackrel{*}{\le} \eta(b_2 - b_1),$$
where to obtain $*$, first consider the three possible cases: (1) $R_\ell \le b_1 < b_2$, (2) $b_1 < R_\ell < b_2$, and (3) $b_1 < b_2 \le R_\ell$, and then note that in all these cases $\big(\max\{b_2, R_\ell\} - \max\{b_1, R_\ell\}\big)$ is bounded above by $b_2 - b_1$. Now, since $J_N(b, F_\ell) = \min\big\{ -\eta b,\ P_N(b, F_\ell) \big\}$, the above inequality along with Lemma 9 yields $J_N(b_1, F_\ell) - J_N(b_2, F_\ell) \le \eta(b_2 - b_1)$. Suppose for some stage $k+1 = 2, 3, \cdots, N$ we have $J_{k+1}(b_1) - J_{k+1}(b_2) \le \eta(b_2 - b_1)$ and $J_{k+1}(b_1, F_\ell) - J_{k+1}(b_2, F_\ell) \le \eta(b_2 - b_1)$ for all $b_2 > b_1$ and for all $F_\ell$. Then we will show that all the inequalities listed in the lemma hold for stage $k$ as well.
First, a simple application of the induction hypothesis yields
$$C_k(b_1) - C_k(b_2) = E_L\big[J_{k+1}(b_1, F_{L_{k+1}}) - J_{k+1}(b_2, F_{L_{k+1}})\big] \le \eta(b_2 - b_1).$$
Since $J_k(b) = \min\big\{ -\eta b,\ C_k(b) \big\}$, the above inequality along with Lemma 9 gives $J_k(b_1) - J_k(b_2) \le \eta(b_2 - b_1)$ for any $b_2 > b_1$. Using this we can write
$$P_k(b_1, F_\ell) - P_k(b_2, F_\ell) = E_\ell\big[J_k(\max\{b_1, R_\ell\}) - J_k(\max\{b_2, R_\ell\})\big] \le E_\ell\Big[\eta\big(\max\{b_2, R_\ell\} - \max\{b_1, R_\ell\}\big)\Big] \le \eta(b_2 - b_1), \quad (24)$$
where the last inequality is again obtained by considering the three regions in which $R_\ell$ can lie. To show part (iii), define $\mathcal{L}_\ell$ as the set of all distributions that are stochastically greater than $F_\ell$, i.e., $\mathcal{L}_\ell = \big\{ F_t \in \mathcal{F} : F_t \ge_{st} F_\ell \big\}$, and let $\mathcal{L}^c_\ell$ denote the set of all the remaining distributions, i.e., $\mathcal{L}^c_\ell = \mathcal{F} \setminus \mathcal{L}_\ell$. From Lemma 5, where we have shown that $\mathcal{F}$ is totally stochastically ordered (see Definition 3), it follows that $\mathcal{L}^c_\ell$ contains all distributions in $\mathcal{F}$ which are stochastically smaller than $F_\ell$. Recalling the expression for $C_k(b, F_\ell)$ from (10), the difference in the cost of continuing can now be bounded as follows:
$$C_k(b_1, F_\ell) - C_k(b_2, F_\ell) = \int_{\mathcal{F}} \Big( \min\{J_{k+1}(b_1, F_\ell), J_{k+1}(b_1, F_t)\} - \min\{J_{k+1}(b_2, F_\ell), J_{k+1}(b_2, F_t)\} \Big)\, dL(t)$$
$$\stackrel{*}{=} \int_{\mathcal{L}_\ell} \big( J_{k+1}(b_1, F_t) - J_{k+1}(b_2, F_t) \big)\, dL(t) + \int_{\mathcal{L}^c_\ell} \big( J_{k+1}(b_1, F_\ell) - J_{k+1}(b_2, F_\ell) \big)\, dL(t) \stackrel{\circ}{\le} \eta(b_2 - b_1). \quad (25)$$
In the above derivation, $*$ is obtained by using Lemma 2-(i), and $\circ$ follows simply by applying the induction hypothesis. Since $J_k(b, F_\ell) = \min\big\{ -\eta b,\ P_k(b, F_\ell),\ C_k(b, F_\ell) \big\}$, using (24) and (25) along with Lemma 9, we obtain $J_k(b_1, F_\ell) - J_k(b_2, F_\ell) \le \eta(b_2 - b_1)$, thus completing the induction argument.

APPENDIX C
PROOF OF LEMMA 5

Lemma 5: The set of reward distributions $\mathcal{F}$ in (4) is totally stochastically ordered with a minimum distribution.
Proof: Recall the reward expression from (3):
$$R_\ell = \frac{Z_\ell^a}{P_\ell^{(1-a)}} = \frac{Z_\ell^a}{(\Gamma_0 D_\ell^\xi)^{(1-a)}}\, G_\ell^{(1-a)}.$$
The distribution $F_\ell$ of $R_\ell$ can be written as
$$F_\ell(r) = P(R_\ell \le r) = P\left( \frac{Z_\ell^a}{(\Gamma_0 D_\ell^\xi)^{(1-a)}}\, G_\ell^{(1-a)} \le r \right) = P\big( G_\ell^{(1-a)} \le \kappa_\ell\, r \big), \quad (26)$$
where $\kappa_\ell = \frac{(\Gamma_0 D_\ell^\xi)^{(1-a)}}{Z_\ell^a}$. Let $\ell, u$ be any two locations in $\mathcal{L}$. Since the rewards are non-negative, we have $F_\ell(r) = F_u(r) = 0$ for $r < 0$; hence, we only need to consider the case $r \ge 0$. Now, given $\ell, u \in \mathcal{L}$, either $\kappa_\ell \le \kappa_u$ or $\kappa_\ell > \kappa_u$. Thus we have either $\kappa_\ell r \le \kappa_u r$ or $\kappa_\ell r \ge \kappa_u r$ for every $r \ge 0$. Finally, since $G_\ell$ and $G_u$ are identically distributed, we have either $F_\ell(r) \le F_u(r)$ for all $r$, or $F_\ell(r) \ge F_u(r)$ for all $r$, so that $F_\ell$ and $F_u$ are stochastically ordered (recall Definition 2). To show that there exists a minimum distribution, first note that $\kappa_\ell$, as a function of $\ell \in \mathcal{L}$, is continuous. Then, since we have assumed that $\mathcal{L}$ is compact (closed and bounded), there exists an $m \in \mathcal{L}$ where the maximum is achieved, i.e., $\kappa_\ell \le \kappa_m$ for all $\ell \in \mathcal{L}$. Again, since the gains $G_\ell$ and $G_m$ are identically distributed, from (26) it follows that $F_\ell \ge_{st} F_m$ for all $\ell \in \mathcal{L}$, so that $F_m$ is the minimum distribution.

APPENDIX D
PROOF OF LEMMA 6

Lemma 6: Suppose $S_k \subseteq Q^u_k$ for some $F_u$ and some $k = 1, 2, \cdots, N-1$. Then for every $b \in S_k$ we have $J_k(b, F_u) = J_N(b, F_u)$.

Proof: Fix a $b \in S_k \subseteq Q^u_k$. Then
$$J_k(b, F_u) = \min\big\{ -\eta b,\ P_k(b, F_u),\ C_k(b, F_u) \big\} \stackrel{*}{=} \min\big\{ -\eta b,\ P_k(b, F_u) \big\} \stackrel{\circ}{=} \min\Big\{ -\eta b,\ \eta\delta + E_u\big[J_k(\max\{b, R_u\})\big] \Big\} \stackrel{\dagger}{=} \min\Big\{ -\eta b,\ \eta\delta - \eta E_u\big[\max\{b, R_u\}\big] \Big\} = J_N(b, F_u).$$
In the above derivation, $*$ holds because, $b$ being in $Q^u_k$, at $(b, F_u)$ it is optimal to either stop or probe (recall (17)), and $\circ$ is obtained simply by substituting for $P_k(b, F_u)$ from (11).
Further, after probing, the new state $\max\{b, R_u\} \ge b$ is also in $S_k$ (from Theorem 1), so that it is optimal to stop after probing; this observation yields $\dagger$. Finally, the last equality is obtained by recalling the expression for $J_N(b, F_u)$ from (6).

APPENDIX E
PROOF OF LEMMA 7

As discussed in the outline of the proof of Lemma 7, the result follows immediately once we show Step 1 and Step 2. First we formally state and prove Step 1.

Lemma 10: Suppose $F_u$ is a distribution such that $S_k \subseteq Q^u_k$ for all $k = 1, 2, \cdots, N-1$. Then for any distribution $F_\ell \ge_{st} F_u$ we have $S_k \subseteq Q^\ell_k$.

Proof: We will first show that $S_{N-1} \subseteq Q^\ell_{N-1}$. Fix a $b \in S_{N-1}$. Then $b \in Q^u_{N-1}$ (because it is given that $S_{N-1} \subseteq Q^u_{N-1}$), so that using the definition of the set $Q^u_{N-1}$ (from (17)) we can write
$$\min\big\{ -\eta b,\ P_{N-1}(b, F_u) \big\} \le C_{N-1}(b, F_u). \quad (27)$$
For any generic distribution $F_s$, whenever $b \in S_{N-1}$, the minimum of the cost of stopping and the cost of probing can be simplified as follows:
$$\min\big\{ -\eta b,\ P_{N-1}(b, F_s) \big\} \stackrel{*}{=} \min\Big\{ -\eta b,\ \eta\delta + E_s\big[J_{N-1}(\max\{b, R_s\})\big] \Big\} \stackrel{\circ}{=} \min\Big\{ -\eta b,\ \eta\delta - \eta E_s\big[\max\{b, R_s\}\big] \Big\} \stackrel{\dagger}{=} J_N(b, F_s). \quad (28)$$
In the above, $*$ is obtained by recalling the expression for the probing cost from (11). $\circ$ holds because, after probing, we are still at stage $N-1$ with the new state $\max\{b, R_s\}$ also in $S_{N-1}$ (Lemma 1); in $S_{N-1}$ we know that it is optimal to stop, so that $J_{N-1}(\max\{b, R_s\}) = -\eta \max\{b, R_s\}$. Finally, to obtain $\dagger$, recall the expression for $J_N(b, F_s)$ from (6). Now, using (28) in (27), we see that the hypothesis $b \in S_{N-1}$ implies $J_N(b, F_u) \le C_{N-1}(b, F_u)$. Also, from Lemma 2-(i) we have $J_N(b, F_\ell) \le J_N(b, F_u)$ for any $F_\ell \ge_{st} F_u$. Combining these, we can write
$$J_N(b, F_\ell) \le J_N(b, F_u) \le C_{N-1}(b, F_u). \quad (29)$$
To conclude that $b \in Q^\ell_{N-1}$, we need to show $\min\big\{ -\eta b,\ P_{N-1}(b, F_\ell) \big\} \le C_{N-1}(b, F_\ell)$, or, alternatively, recalling (28), it is sufficient to show
$$J_N(b, F_\ell) \le C_{N-1}(b, F_\ell). \quad (30)$$
Now, for any generic distribution $F_s \in \mathcal{F}$, define $\mathcal{L}_s = \big\{ t \in \mathcal{L} : F_t \ge_{st} F_s \big\}$, i.e., $\mathcal{L}_s$ is the set of all distributions in $\mathcal{F}$ that are stochastically greater than $F_s$, and let $\mathcal{L}^c_s$ denote the set of all the remaining distributions, i.e., $\mathcal{L}^c_s = \mathcal{F} \setminus \mathcal{L}_s$. Since $\mathcal{F}$ is totally stochastically ordered (Lemma 5), $\mathcal{L}^c_s$ contains all distributions in $\mathcal{F}$ that are stochastically smaller than $F_s$. Further, for $F_\ell \ge_{st} F_u$ we have $\mathcal{L}_\ell \subseteq \mathcal{L}_u$. Then, recalling the expression for $C_{N-1}(b, F_u)$ from (10), we can write
$$C_{N-1}(b, F_u) = \tau + E_L\Big[\min\big\{ J_N(b, F_u),\ J_N(b, F_{L_N}) \big\}\Big] \stackrel{*}{=} \tau + \int_{\mathcal{L}_u} J_N(b, F_t)\, dL(t) + \int_{\mathcal{L}^c_u} J_N(b, F_u)\, dL(t)$$
$$\stackrel{\circ}{=} \tau + \int_{\mathcal{L}_\ell} J_N(b, F_t)\, dL(t) + \int_{\mathcal{L}_u \setminus \mathcal{L}_\ell} J_N(b, F_t)\, dL(t) + \int_{\mathcal{L}^c_u} J_N(b, F_u)\, dL(t),$$
where $*$ is obtained by using Lemma 2-(i) and the definition of $\mathcal{L}_u$, and to obtain $\circ$ we have split the integral over $\mathcal{L}_u$ (the first integral in $*$) into two integrals, one over $\mathcal{L}_\ell$ and the other over $\mathcal{L}_u \setminus \mathcal{L}_\ell$. Now, for any $F_t \in \mathcal{L}_u \setminus \mathcal{L}_\ell$ we know that $F_t \ge_{st} F_u$, so that $J_N(b, F_t) \le J_N(b, F_u)$ (again from Lemma 2-(i)). Thus, in the above expression, replacing $J_N(b, F_t)$ by $J_N(b, F_u)$ in the middle integral and then combining it with the last integral, we obtain
$$C_{N-1}(b, F_u) \le \tau + \int_{\mathcal{L}_\ell} J_N(b, F_t)\, dL(t) + \left( \int_{\mathcal{L}^c_\ell} dL(t) \right) J_N(b, F_u). \quad (31)$$
From (29) and (31) we see that we have an inequality of the following form:
$$J_N(b, F_\ell) \le J_N(b, F_u) \le c + p\, J_N(b, F_u), \quad (32)$$
where $c = \tau + \int_{\mathcal{L}_\ell} J_N(b, F_t)\, dL(t)$ and $p = \int_{\mathcal{L}^c_\ell} dL(t)$.
Since $p \in [0, 1]$, we can write $J_N(b, F_\ell)(1-p) \le J_N(b, F_u)(1-p)$; rearranging, we obtain
$$J_N(b, F_\ell) \le p\, J_N(b, F_\ell) + J_N(b, F_u) - p\, J_N(b, F_u) \stackrel{*}{\le} p\, J_N(b, F_\ell) + c + p\, J_N(b, F_u) - p\, J_N(b, F_u) = c + p\, J_N(b, F_\ell),$$
where to obtain $*$ we have used (32). Finally, note that
$$c + p\, J_N(b, F_\ell) = \tau + \int_{\mathcal{L}_\ell} J_N(b, F_t)\, dL(t) + \left( \int_{\mathcal{L}^c_\ell} dL(t) \right) J_N(b, F_\ell) = \tau + E_L\Big[\min\big\{ J_N(b, F_\ell),\ J_N(b, F_{L_N}) \big\}\Big] = C_{N-1}(b, F_\ell).$$
Thus, as desired, we have shown $J_N(b, F_\ell) \le C_{N-1}(b, F_\ell)$ (recall the discussion leading to (30)). Suppose that for some $k+1 = 2, 3, \cdots, N-1$ we have $S_{k+1} \subseteq Q^\ell_{k+1}$. We have to show that the same holds for stage $k$. Fix any $b \in S_k$; then for any generic distribution $F_s$, exactly as in (28), we have
$$\min\big\{ -\eta b,\ P_k(b, F_s) \big\} = \min\Big\{ -\eta b,\ \eta\delta + E_s\big[J_k(\max\{b, R_s\})\big] \Big\} = \min\Big\{ -\eta b,\ \eta\delta - \eta E_s\big[\max\{b, R_s\}\big] \Big\} = J_N(b, F_s). \quad (33)$$
Thus the hypothesis $S_k \subseteq Q^u_k$ implies $J_N(b, F_u) \le C_k(b, F_u)$, and to show $S_k \subseteq Q^\ell_k$ it is sufficient to obtain $J_N(b, F_\ell) \le C_k(b, F_\ell)$. Proceeding as before (recall how (31) was obtained), we can write
$$C_k(b, F_u) \le \tau + \int_{\mathcal{L}_\ell} J_{k+1}(b, F_t)\, dL(t) + \left( \int_{\mathcal{L}^c_\ell} dL(t) \right) J_{k+1}(b, F_u).$$
Now, using Lemma 6, we conclude
$$C_k(b, F_u) \le \tau + \int_{\mathcal{L}_\ell} J_{k+1}(b, F_t)\, dL(t) + \left( \int_{\mathcal{L}^c_\ell} dL(t) \right) J_N(b, F_u).$$
Note that the conditions required to apply Lemma 6 hold, i.e., $b \in S_{k+1}$ (since $S_k \subseteq S_{k+1}$ from Lemma 3-(iii)) and $S_{k+1} \subseteq Q^u_{k+1}$ (this is given). Thus, again we have an inequality of the form $J_N(b, F_\ell) \le J_N(b, F_u) \le c' + p\, J_N(b, F_u)$, where $c' = \tau + \int_{\mathcal{L}_\ell} J_{k+1}(b, F_t)\, dL(t)$. As before, we can show that $J_N(b, F_\ell) \le c' + p\, J_N(b, F_\ell)$.
Finally, the proof is complete by showing that $c' + p\, J_N(b, F_\ell) = C_k(b, F_\ell)$, as follows:
\[
C_k(b, F_\ell) = \tau + \int_{\mathcal{L}_\ell} J_{k+1}(b, F_t)\, dL(t) + \int_{\mathcal{L}^c_\ell} J_{k+1}(b, F_\ell)\, dL(t) = c' + p\, J_N(b, F_\ell), \tag{34}
\]
where, to replace $J_{k+1}(b, F_\ell)$ by $J_N(b, F_\ell)$, we again apply Lemma 6; however, this time $S_{k+1} \subseteq Q^{\ell}_{k+1}$ holds by the induction hypothesis.

We still require a distribution $F_u$ satisfying $S_k \subseteq Q^u_k$ for every $k$. The minimum distribution $F_m$ turns out to be useful in this context. The following lemma thus constitutes Step 2 of the proof of Lemma 7.

Lemma 11: For every $k = 1, 2, \cdots, N-1$, the probing set $Q^m_k$ corresponding to the minimum distribution $F_m$ satisfies $S_k \subseteq Q^m_k$.

Proof: First note that the existence of a minimum distribution $F_m$ follows from Lemma 5. Now, $F_m$ being minimum, we have $F_\ell \ge_{st} F_m$ for all $F_\ell$. Then, using Lemma 2-(i), we can write $J_{k+1}(b, F_{L_{k+1}}) \le J_{k+1}(b, F_m)$. Using the above expression in (10), and then recalling (9), we obtain $C_k(b, F_m) = C_k(b)$. Finally, the result follows from the definitions of the sets $Q^m_k$ and $S_k$.

APPENDIX F
PROOF OF THEOREM 3

Theorem 3: For $k = 1, 2, \cdots, N-1$ and for any $F_\ell$, $S^{\ell}_k = S^{\ell}_{k+1}$.

Proof: Recalling the definition of the set $S^{\ell}_k$ (from (15)), for any $b \in S^{\ell}_{k+1}$ we have (if $k + 1 = N$, note that the following expression will not contain the continuing cost)
\[
-\eta b \le \min\big\{P_{k+1}(b, F_\ell),\, C_{k+1}(b, F_\ell)\big\}.
\]
Suppose, as in Theorem 2, we can show that for any $b \in S^{\ell}_{k+1}$ the various costs at stages $k$ and $k+1$ are the same, i.e., $P_k(b, F_\ell) = P_{k+1}(b, F_\ell)$ and $C_k(b, F_\ell) = C_{k+1}(b, F_\ell)$; then the above inequality would imply $S^{\ell}_k \supseteq S^{\ell}_{k+1}$. The proof is then complete by recalling that we already have $S^{\ell}_k \subseteq S^{\ell}_{k+1}$ (from Lemma 3-(iii)). Fix a $b \in S^{\ell}_{k+1}$.
To show that $P_k(b, F_\ell) = P_{k+1}(b, F_\ell)$, first, using Lemma 3-(i) and Theorem 2, note that $S^{\ell}_{k+1} \subseteq S_{k+1} = S_k$. Since $b \in S_{k+1}$, the cost of probing is
\[
P_{k+1}(b, F_\ell) = \eta\delta + E_\ell\big[J_{k+1}(\max\{b, R_\ell\})\big] = \eta\delta - \eta\, E_\ell\big[\max\{b, R_\ell\}\big],
\]
where, to obtain the second equality, note that $\max\{b, R_\ell\} \in S_k$ (from Theorem 1), and hence at $\max\{b, R_\ell\}$ it is optimal to stop, so that $J_{k+1}(\max\{b, R_\ell\}) = -\eta \max\{b, R_\ell\}$. Similarly, since $b$ is also in $S_k$, the cost of probing at stage $k$, $P_k(b, F_\ell)$, is again $\eta\delta - \eta\, E_\ell[\max\{b, R_\ell\}]$. Finally, following the same procedure used to show $C_k(b) = C_{k+1}(b)$ in Theorem 2, we can obtain $C_k(b, F_\ell) = C_{k+1}(b, F_\ell)$, thus completing the proof.
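The threshold and nestedness structure invoked throughout these proofs (the stopping sets $S_k$ being up-sets in $b$, with $S_k \subseteq S_{k+1}$ from Lemma 3-(iii)) can be illustrated numerically in a stripped-down version of the model with a single known reward distribution and no probing decision, where $J_N(b) = -\eta b$ and $J_k(b) = \min\{-\eta b,\ \tau + E[J_{k+1}(\max\{b, R\})]\}$. This is a toy sketch only; all parameter values below ($\eta$, $\tau$, $N$, the uniform reward grid) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy backward induction in a simplified model (single reward distribution,
# no probing decision): J_N(b) = -eta*b and
#   J_k(b) = min{ -eta*b,  tau + E[ J_{k+1}(max{b, R}) ] }.
# Parameter values are arbitrary illustrative choices.
eta, tau, N = 1.0, 0.05, 6
rewards = np.linspace(0.0, 1.0, 11)    # support of R (uniform)
b_grid = np.linspace(0.0, 1.0, 101)    # grid for the best-reward value b

J = -eta * b_grid                       # terminal stage: must stop
alpha = []                              # left end-points of the stopping sets
for k in range(N - 1, 0, -1):
    # continuation cost tau + E[J_{k+1}(max{b, R})] on the b-grid
    cont = tau + np.mean(
        [np.interp(np.maximum(b_grid, r), b_grid, J) for r in rewards], axis=0)
    stop = -eta * b_grid
    stop_set = stop <= cont + 1e-12     # S_k = {b : stopping is optimal}
    alpha.append(b_grid[stop_set][0])   # threshold (left edge) of S_k
    J = np.minimum(stop, cont)

alpha = alpha[::-1]                     # alpha[0] is the stage-1 threshold
# Nestedness S_k ⊆ S_{k+1} appears as non-increasing thresholds over k
assert all(alpha[i] >= alpha[i + 1] - 1e-9 for i in range(len(alpha) - 1))
print("thresholds:", [round(a, 2) for a in alpha])
```

In this toy model the thresholds decrease toward the horizon: with fewer wake-up instants remaining, continuing is less attractive, so the forwarder stops for smaller best-reward values.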
