Cross-Layer Designs in Coded Wireless Fading Networks with Multicast
A cross-layer design along with an optimal resource allocation framework is formulated for wireless fading networks, where the nodes are allowed to perform network coding. The aim is to jointly optimize end-to-end transport layer rates, network code …
Authors: Ketan Rajawat, Nikolaos Gatsis, Georgios B. Giannakis
Traditional networks have always assumed nodes capable of only forwarding or replicating packets. For many types of networks, however, this constraint is not inherent, since the nodes can in principle also perform encoding operations. Interestingly, even simple linear mixing operations can be powerful enough to enhance the network throughput, minimize delay, and decrease the overall power consumption [1], [2]. For the special case of single-source multicast, which does not even admit a polynomial-time solution within the routing framework [3], linear network coding achieves the full network capacity [4]. In fact, the network flow description of multicast with random network coding adheres to only linear inequality constraints, reminiscent of the corresponding description in unicast routing [5].
This encourages the use of network coding to extend several popular results in the unicast routing framework to multicast without appreciable increase in complexity. Of particular interest is the resource allocation and cross-layer optimization task in wireless networks [6], [7]. The objective here is to maximize a network utility function subject to flow, rate, capacity, and power constraints. This popular approach not only offers the flexibility of capturing diverse performance objectives, but also admits a layering interpretation, arising from different decompositions of the optimization problem [8].
This paper deals with cross-layer optimization of wireless multicast networks that use network coding and operate over fading links. The aim is to maximize a total network utility objective, and entails finding end-to-end rates, network code design variables, broadcast link flows, link capacities, average power consumption, and instantaneous power allocations.
Network utility maximization was first brought into coded networks in [5], where the aim was to minimize a generic cost function subject only to flow and rate constraints. The optimal flow and rate variables may then be converted to a practical random network coding implementation using methods from [9] and [10]. Subsequent works extended this framework to include power, capacity, and scheduling constraints [11]- [14]. The interaction of network coding with the network and transport layers has also been explored in [15]- [19]; in these works, networks with fixed link capacities are studied, and different decomposition techniques result in different types of layered architectures.
There are however caveats associated with the utility maximization problem in wireless networks. First, the power control and scheduling subproblems are usually non-convex. This implies that the dual decomposition of the overall problem, though insightful, is not necessarily optimal and does not directly result in a feasible primal solution. Second, for continuous fading channels, determining the power control policy is an infinite dimensional problem. Existing approaches in network coding consider either deterministic channels [11], [14], or, links with a finite number of fading states [12], [20], [21].
On the other hand, a recent result in unicast routing shows that, despite the non-convexity, the overall utility optimization problem has no duality gap for wireless networks with continuous fading channels [22]. As this is indeed the case in all real-life fading environments, the result promises the optimality of layer separation. In particular, it renders a dual subgradient descent algorithm for network design optimal [23].
The present paper begins with a formulation that jointly optimizes end-to-end rates, virtual flows, broadcast link flows, link capacities, average power consumption, and instantaneous power allocations in wireless fading multicast networks that use intra-session network coding (Section II). The first contribution of this paper is to introduce a realistic physical layer model formulation accounting for the capacity of broadcast links. The cross-layer problem is generally non-convex, yet it is shown to have zero duality gap (Section III-A). This result considerably broadens [22] to coded multicast networks with broadcast links. The zero duality gap is then leveraged in order to develop a subgradient descent algorithm that minimizes the dual function (Sections III-B, III-C). The algorithm admits a natural layering interpretation, allowing optimal integration of network coding into the protocol stack.
In Section IV, the subgradient algorithm is modified so that the component of the subgradient that results from the physical layer power allocation may be delayed with respect to operations in other layers. This provably convergent asynchronous subgradient method and its online implementation constitute the second major contribution. Unlike the algorithm in [23], which is used for offline network optimization, the algorithm developed here is suitable for online network control. Convergence of asynchronous subgradient methods for dual minimization is known under diminishing stepsize [24]; the present paper proves results for constant stepsize. Near-optimal primal variables are also recovered by forming running averages of the primal iterates. This technique has also been used in synchronous subgradient methods for convex optimization; see e.g., [25] and references therein. Here, ergodic convergence results are established for the asynchronous scheme and the non-convex problem at hand. Finally, numerical results are presented in Section V, and Section VI concludes the paper.
Consider a wireless network consisting of a set of terminals (nodes) denoted by $N$. The broadcast property of the wireless interface is modeled by using the concept of hyperarcs. A hyperarc is a pair (i, J) that represents a broadcast link from a node i to a chosen set of nodes $J \subset N$. The entire network can therefore be represented as a hypergraph $H = (N, A)$, where $A$ is the set of hyperarcs. The complexity of the model is determined by the choice of the set $A$. Let the neighbor-set $N(i)$ denote the set of nodes that node i reaches. An exhaustive model might include all possible $2^{|N(i)|} - 1$ hyperarcs from node i. On the other hand, a simpler model might include only a smaller number of hyperarcs per node. The point-to-point model is the special case in which node i has $|N(i)|$ hyperarcs, each containing just one receiver.
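As a sketch of these modeling choices, the exhaustive hyperarc model can be enumerated directly. The following Python helpers are illustrative only (the function names are ours, not from the paper); they list all $2^{|N(i)|} - 1$ hyperarcs from a node, with the point-to-point model recovered by keeping only singleton receiver sets.

```python
from itertools import combinations

def hyperarcs_from(node, neighbors):
    """Exhaustive model: all 2^|N(i)| - 1 hyperarcs (i, J), with J a
    nonempty subset of the neighbor-set N(i)."""
    return [(node, frozenset(J))
            for k in range(1, len(neighbors) + 1)
            for J in combinations(sorted(neighbors), k)]

def point_to_point_from(node, neighbors):
    """Special case: |N(i)| hyperarcs, each containing a single receiver."""
    return [(node, frozenset({j})) for j in sorted(neighbors)]
```

For instance, a node with two neighbors has three hyperarcs in the exhaustive model but only two arcs in the point-to-point model.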
The present work considers a physical layer whereby the channels undergo random multipath fading. This model allows for opportunistically best schedules per channel realization. This is different from the link-level network models in [5], [12], [13], [21], where the hyperarcs are modeled as erasure channels. The next subsection discusses the physical layer model in detail.
In the current setting, terminals are assumed to have a set of tones $F$ available for transmission. Let $h^f_{ij}$ denote the power gain between nodes i and j over a tone $f \in F$, assumed random, capturing fading effects. Let h represent the vector formed by stacking all the channel gains. The network operates in a time-slotted fashion; the channel h remains constant for the duration of a slot, but is allowed to change from slot to slot. A slowly fading channel is assumed, so that a large number of packets may be transmitted per time slot. The fading process is modeled to be stationary and ergodic.
Since the channel changes randomly per time slot, the optimization variables at the physical layer are the channel realization-specific power allocations $p^f_{iJ}(h)$ for all hyperarcs $(i, J) \in A$ and tones $f \in F$. For convenience, these power allocations are stacked in a vector p(h). Instantaneous power allocations may adhere to several scheduling and mask constraints, which will be generically denoted by a bounded set $\Pi$ such that $p(h) \in \Pi$. The long-term average power consumption by a node i is given by
where E[.] denotes expectation over the stationary channel distribution.
For slow fading channels, the information-theoretic capacity of a hyperarc (i, J) is defined as the maximum rate at which all nodes in J receive data from i with vanishing probability of error in a given time slot. This capacity depends on the instantaneous power allocations p(h) and channels h.
However, only conflict-free hyperarcs are allowed to be scheduled for a given h. Specifically, power may be allocated to hyperarcs $(i_1, J_1)$ and $(i_2, J_2)$ simultaneously if and only if [13]: i) $i_1 \neq i_2$; ii) $i_1 \notin J_2$ and $i_2 \notin J_1$ (half-duplex operation); and iii-a) $J_1 \cap J_2 = \emptyset$ (primary interference), or additionally, iii-b) $J_1 \cap N(i_2) = J_2 \cap N(i_1) = \emptyset$ (secondary interference). The set $\Pi$ therefore consists of all power allocations that satisfy the previous properties.
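The scheduling conditions i)-iii) can be sketched as a pairwise conflict test. The following Python function is our own illustration (names and signature are not from the paper); it checks whether two hyperarcs may receive power simultaneously, under either the primary or the secondary interference model.

```python
def can_coexist(arc1, arc2, neighbor_sets, secondary=False):
    """Return True if hyperarcs (i1, J1) and (i2, J2) are conflict-free.
    neighbor_sets maps a node to its neighbor-set N(i)."""
    (i1, J1), (i2, J2) = arc1, arc2
    if i1 == i2:                       # i) distinct transmitters required
        return False
    if i1 in J2 or i2 in J1:           # ii) half-duplex operation
        return False
    if J1 & J2:                        # iii-a) primary interference
        return False
    if secondary and (J1 & neighbor_sets[i2] or J2 & neighbor_sets[i1]):
        return False                   # iii-b) secondary interference
    return True
```

A full schedule in $\Pi$ would then activate only sets of hyperarcs that pass this test pairwise.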
Due to hyperarc scheduling, all transmissions in the network are interference free. The signal-to-noise ratio (SNR) at a node $j \in J$ is given by
$$\Gamma^f_{iJj}(p(h), h) = \frac{p^f_{iJ}(h)\, h^f_{ij}}{N_j}$$
where $N_j$ is the noise power at j. In a broadcast setting, the maximum rate of information transfer from i to each node in J is
$$C^f_{iJ}(p(h), h) = \min_{j \in J} \log\left(1 + \Gamma^f_{iJj}(p(h), h)\right).$$
A similar expression can be written for the special case of point-to-point links by substituting hyperarcs (i, J) with arcs (i, j) in the expression for $\Gamma^f_{iJj}(p(h), h)$. For slow-fading channels, Gaussian codebooks with sufficiently large block lengths achieve this capacity in every time slot. More realistically, an SNR penalty term $\rho$ can be included to account for finite-length practical codes and adaptive modulation schemes, so that
$$C^f_{iJ}(p(h), h) = \min_{j \in J} \log\left(1 + \Gamma^f_{iJj}(p(h), h)/\rho\right).$$
The penalty term is in general a function of the target bit error rate.
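Since the broadcast capacity is limited by the weakest receiver in J, it can be computed with a one-line minimum over receivers. The sketch below is our own helper (not from the paper); it measures rate in bits per channel use and includes the SNR penalty $\rho \geq 1$ described above.

```python
import math

def hyperarc_capacity(p, gains, noise, rho=1.0):
    """Max common rate from i to every j in J on one tone:
    min over receivers j of log2(1 + p * h_ij / (rho * N_j))."""
    return min(math.log2(1.0 + p * gains[j] / (rho * noise[j]))
               for j in gains)
```

For example, with two receivers whose SNRs are 1 and 3, the hyperarc rate is set by the weaker one, log2(2) = 1 bit per channel use.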
Example 2. Signal-to-interference-plus-noise-ratio (SINR) model: Here, the constraint set $\Pi$ is simply a box set $B_p$,
The set $B_p$ could also include (instantaneous) sum-power constraints per node. The capacity is expressed as in (4) or (5), but now the SNR is replaced by the SINR, given by
The denominator consists of the following terms:
• Interference from other nodes' transmissions to node j.
• "Self-interference" due to transmissions of node j itself. This term is introduced to encourage half-duplex operation, by setting $h_{jj}$ to a large value.
• "Broadcast interference" from transmissions of node i over other hyperarcs. This term is introduced to force node i to transmit over at most a single hyperarc, by setting $\beta$ to a large value.
The previous definitions ignore interference from non-neighboring nodes; however, they can readily be extended to include more general interference models.
The link layer capacity is defined as the long-term average of the total instantaneous capacity, namely,
This is also called ergodic capacity and represents the maximum average data rate available to the link layer.
The network supports multiple multicast sessions indexed by m, namely $S_m := (s_m, T_m, a_m)$, each associated with a source node $s_m$, sink nodes $T_m \subset N$, and an average flow rate $a_m$ from $s_m$ to each $t \in T_m$. The value $a_m$ is the average rate at which the network layer of source terminal $s_m$ admits packets from the transport layer. Traffic is considered elastic, so that the packets do not have any short-term delay constraints.
Network coding is a generalization of routing since the nodes are allowed to code packets together rather than simply forward them. This paper considers intra-session network coding, where only the traffic belonging to the same multicast session is allowed to mix. Although better than routing in general, this approach is still suboptimal in terms of achieving the network capacity. However, general (inter-session) network coding is difficult to characterize or implement since neither the capacity region nor efficient network code designs are known [1,Part II]. On the other hand, a simple linear coding strategy achieves the full capacity region of intra-session network coding [4].
The network layer consists of endogenous flows of coded packets over hyperarcs. Recall that the maximum average rate of transmission over a single hyperarc cannot exceed $c_{iJ}$. Let the coded packet-rate of a multicast session m over hyperarc (i, J) be $z^m_{iJ}$ (also referred to as the subgraph or broadcast link flow). The link capacity constraints thus translate to
To describe the intra-session network coding capacity region, it is commonplace to use the concept of a virtual flow between terminals i and j, corresponding to each session m and sink $t \in T_m$, with average rate $x^{mt}_{ij}$. These virtual flows are defined only for neighboring pairs of nodes, i.e., $(i, j) \in G := \{(i, j) \mid (i, J) \in A,\ j \in J\}$. The virtual flows satisfy the flow-conservation constraints, namely,
$$\sum_{j:(i,j)\in G} x^{mt}_{ij} - \sum_{j:(j,i)\in G} x^{mt}_{ji} = a_m \left(\mathbb{1}_{\{i = s_m\}} - \mathbb{1}_{\{i = t\}}\right)$$
for all m, t ∈ T m , and i ∈ N . Hereafter, the set of equations for i = t will be omitted because they are implied by the remaining equations.
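The flow-conservation constraints can be checked numerically. In the sketch below (our own notation, not the paper's), the virtual flows of one (session, sink) pair are stored as a dictionary keyed by directed edges; the net outflow should equal $a_m$ at the source, $-a_m$ at the sink, and zero at intermediate nodes.

```python
def net_outflow(x, i):
    """Sum of outgoing virtual flows minus incoming ones at node i,
    for a single (session m, sink t) pair; x maps edges (u, v) to rates."""
    return (sum(r for (u, _), r in x.items() if u == i)
            - sum(r for (_, v), r in x.items() if v == i))
```

For a simple path 1 -> 2 -> 3 carrying one unit of flow, the net outflow is +1 at node 1, 0 at node 2, and -1 at node 3.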
The broadcast flows $z^m_{iJ}$ and the virtual flows $x^{mt}_{ij}$ can be related using results from the lossy-hyperarc model of [5], [13]. Specifically, [13, eq. (9)] relates the virtual flows and subgraphs using the fraction $b_{iJK} \in [0, 1]$ of packets injected into the hyperarc (i, J) that reach the set of nodes $K \subset N(i)$. Recall from Section II-A that here the instantaneous capacity function $C^f_{iJ}(\cdot)$ is defined such that all packets injected into the hyperarc (i, J) are received by every node in J. Thus, in our case, $b_{iJK} = 1$ whenever $K \cap J \neq \emptyset$, and consequently,
$$\sum_{j \in K} x^{mt}_{ij} \le \sum_{J : J \cap K \neq \emptyset} z^m_{iJ} \quad (12)$$
Note the difference with [13], where at every time slot, packets are injected into a fixed set of hyperarcs at the same rate. The problem in [13] is therefore to find a schedule of hyperarcs that do not interfere (the non-conflicting hyperarcs). The same schedule is used at every time slot; however, only a random subset of nodes receives the injected packets in a given slot. Here, instead, the hyperarc selection is part of the power allocation problem at the physical layer, and is performed at every time slot. The transmission rate (or equivalently, the channel coding redundancy) is appropriately adjusted so that all the nodes in the selected hyperarc receive the data.
In general, for any feasible solution to the set of equations (10)-(12), a network code exists that supports the corresponding exogenous rates $a_m$ [5]. This is because for each multicast session m, the maximum flow between $s_m$ and $t \in T_m$ is $a_m$, and is therefore achievable [4, Th. 1]. Given a feasible solution, various network coding schemes can be used to achieve the exogenous rates. Random network coding based implementations such as those proposed in [9] and [10] are particularly attractive, since they are fully distributed and require little overhead. These schemes also handle any residual errors or erasures that remain due to the physical layer.
The system model also allows for a set of "box constraints" that limit the long-term powers, transport layer rates, broadcast link flow rates, virtual flow rates as well as the maximum link capacities. Combined with the set Π, these constraints can be compactly expressed as
Here y is a super-vector formed by stacking all the average rate and power variables, that is, $a_m$, $z^m_{iJ}$, $x^{mt}_{ij}$, $c_{iJ}$, and $p_i$. Parameters with min/max subscripts or superscripts denote prescribed lower/upper bounds on the corresponding variables.
A common objective of the network optimization problem is maximization of the exogenous rates $a_m$ and minimization of the power consumption $p_i$. To this end, consider increasing, concave utility functions $U_m(a_m)$ and convex cost functions $V_i(p_i)$, so that the overall objective function
For example, the utility function can be the logarithm of session rates and the cost function can be the squared average power consumption. The network utility maximization problem can be written as
where $i \in N$. Note that constraints (1), (9), and (11) have been relaxed without increasing the objective function. For instance, the relaxation of (11) is equivalent to allowing each node to send at a higher rate than it receives, which amounts to adding virtual sources at all nodes $i \neq t$. However, adding virtual sources does not increase the objective function, because the utilities $U_m$ depend only on the multicast rate $a_m$. The solution of the optimization problem (14) gives the throughput $a_m$ that is achievable using the optimal virtual flow rates $x^{mt}_{ij}$ and power allocation policies p(h). These virtual flow rates are used for network code design. When coded networks are implemented in practice, the traffic is generated in packets and stored at nodes in queues (and virtual queues for the virtual flows) [10]. The constraints in (14) guarantee that all queues are stable.
Optimization problem (14) is non-convex in general, and thus difficult to solve. For example, in the conflict graph model, the constraint set $\Pi$ is discrete and non-convex, while in the SINR model, the capacity function $C^f_{iJ}(p(h), h)$ is a non-concave function of p(h); see, e.g., [26], [6]. The next section analyzes the Lagrangian dual of (14).
This section shows that (14) has zero duality gap, and solves the dual problem via subgradient descent iterations. The purpose here is two-fold: i) to describe a layered architecture in which linear network coding is optimally integrated; and ii) to set the basis for a network implementation of the subgradient method, which will be developed in Section IV.
Associate Lagrange multipliers $\nu^{mt}_i$, $\eta^{mt}_{iK}$, $\xi_{iJ}$, $\lambda_{iJ}$, and $\mu_i$ with the flow constraints (14b), the union of flow constraints (14c), the link rate constraints (14d), the capacity constraints (14e), and the power constraints (14f), respectively. Also, let $\zeta$ be the vector formed by stacking these Lagrange multipliers in the aforementioned order. Similarly, if inequalities (14b)-(14f) are rewritten with zeros on the right-hand side, the vector q(y, p(h)) collects all the terms on the left-hand side of the constraints. The Lagrangian can therefore be written as
The dual function and the dual problem are, respectively,
Since (14e) may be a non-convex constraint, the duality gap is in general non-zero; i.e., D ≥ P. Thus, solving (17) yields an upper bound on the optimal value P of (14). In the present formulation, however, we have the following interesting result.
A generalized version of Proposition 1, including a formal definition of continuous fading, is provided in Appendix A and connections to relevant results are made. The essential reason behind this strong duality is that the set of ergodic capacities resulting from all feasible power allocations is convex.
The requirement of continuous fading channels is not limiting since it holds for all practical fading models, such as Rayleigh, Rice, or Nakagami-m. Recall though that the dual problem is always convex. The subgradient method has traditionally been used to approximately solve (17), and also provide an intuitive layering interpretation of the network optimization problem [8]. The zero duality gap result is remarkable in the sense that it renders this layering optimal.
A corresponding result for unicast routing in uncoded networks has been proved in [22]. The fact that it holds for coded networks with broadcast links, allows optimal integration of the network coding operations in the wireless protocol stack. The next subsection deals with this subject.
The dual problem (17) can in general be solved using the subgradient iterations [27, Sec. 8.2], indexed by ℓ:
$$(y(\ell), p(h; \ell)) \in \arg\max_{y \in B,\ p(h) \in \Pi} L\big(y, p(h); \zeta(\ell)\big) \quad (18a)$$
$$\zeta(\ell + 1) = \big[\zeta(\ell) + \epsilon\, q(y(\ell), p(h; \ell))\big]^+ \quad (18b)$$
where $\epsilon$ is a positive constant stepsize, and $[\cdot]^+$ denotes projection onto the nonnegative orthant. The inclusion symbol (∈) allows for potentially multiple maxima. In (18b), $q(y(\ell), p(h; \ell))$ is a subgradient of the dual function $\varrho(\zeta)$ in (16) at $\zeta(\ell)$. Next, we discuss the operations in (18) in detail.
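The multiplier update (18b) is a projected subgradient step; a minimal sketch (our own helper, representing $\zeta$ and q as flat lists):

```python
def dual_update(zeta, q, eps):
    """zeta(l+1) = [zeta(l) + eps * q(l)]^+ : step along the subgradient,
    then project each multiplier onto the nonnegative orthant."""
    return [max(0.0, z + eps * qi) for z, qi in zip(zeta, q)]
```

A multiplier whose constraint is slack (negative subgradient entry) is driven toward zero, while a violated constraint (positive entry) drives its multiplier up.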
For the Lagrangian obtained from (15), the maximization in (18a) can be separated into the following subproblems
where
and $\mathbb{1}_X$ is the indicator function, which equals one if the expression X is true and zero otherwise.
The physical layer subproblem (19f) implies per-fading state separability. Specifically, instead of optimizing over the class of power control policies, (19f) allows solving for the optimal power allocation for each fading state; that is,
Note that problems (19a)-(19e) are convex and admit efficient solutions. The per-fading state power allocation subproblem (19f) however, may not necessarily be convex. For example, under the conflict graph model (cf. Example 1), the number of feasible power allocations may be exponential in the number of nodes. Finding an allocation that maximizes the objective function in (20) is equivalent to the NP-hard maximum weighted hyperarc matching problem [13]. Similarly, the capacity function and hence the objective function for the SINR model (cf. Example 2) is non-convex in general, and may be difficult to optimize.
This separable structure allows a useful layered interpretation of the problem. In particular, the transport layer subproblem (19a) gives the optimal exogenous rates allowed into the network; the network flow sub-problem (19b) yields the endogenous flow rates of coded packets on the hyperarcs; and the virtual flow sub-problem (19c) is responsible for determining the virtual flow rates between nodes and therefore the network code design. Likewise, the capacity sub-problem (19d) yields the link capacities and the power sub-problem (19e) provides the power control at the data link layer.
The layered architecture described so far also allows for optimal integration of network coding into the protocol stack. Specifically, the broadcast and virtual flows, optimized respectively in (19b) and (19c), allow performing the combined routing-plus-network coding task at the network layer. An implementation such as the one in [10] typically requires that queues be maintained here for both the broadcast and the virtual flows.
Next, the subgradient updates of (18b) become
where q(ℓ) are the subgradients at index ℓ given by
The physical layer updates (21d) and (21e) are again complicated, since they involve the expectation operations of (22d) and (22e). These expectations can be acquired via Monte Carlo simulation by solving (19f) for realizations of h and averaging over them. These realizations can be drawn independently from the distribution of h, or they can be actual channel measurements.
In fact, the latter is implemented in Section IV on the fly during network operation.
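The Monte Carlo approximation of the expectations can be sketched as follows. The helper below is our own illustration; it assumes `solve_power(h)` returns the quantity inside the expectation (e.g., a capacity or power summand evaluated at the per-state optimal allocation) for one channel draw.

```python
import random

def mc_expectation(sample_channel, solve_power, num_slots=1000, seed=0):
    """Approximate E[g(p*(h), h)] by averaging the per-realization value
    returned by the power allocation subproblem over channel draws."""
    rng = random.Random(seed)
    return sum(solve_power(sample_channel(rng))
               for _ in range(num_slots)) / num_slots
```

With independent draws and the stationarity/ergodicity assumed in Section II, the sample average converges to the desired expectation as the number of slots grows.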
This subsection provides convergence results for the subgradient iterations (18). Since the primal variables (y, p(h)) and the capacity function $C^f_{iJ}(\cdot)$ are bounded, it is possible to define an upper bound G on the subgradient norm; i.e., $\|q(y(\ell), p(h; \ell))\| \le G$ for all ℓ ≥ 1.
Proposition 2. For the subgradient iterations in (19) and (21), the best dual value converges to D up to a constant; i.e.,
$$\lim_{\ell \to \infty} \min_{1 \le l \le \ell} \varrho(\zeta(l)) \le D + \frac{\epsilon G^2}{2}.$$
This result is well known for dual (hence, convex) problems [27, Prop. 8.2.3]. However, the presence of an infinite-dimensional variable p(h) is a subtlety here. A similar case is dealt with in [22], and Proposition 2 follows from the results there.
Note that in the subgradient method (18), the sequence of primal iterates {y(ℓ)} does not necessarily converge. However, a primal running-average scheme can be used to find the optimal primal variables $y^*$, as summarized next. Recall that f(y) denotes the objective function $\sum_m U_m(a_m) - \sum_i V_i(p_i)$.
Proposition 3. If the running averages of the primal iterates are formed as $\bar y(s) := \frac{1}{s} \sum_{\ell=1}^{s} y(\ell)$, the following results hold: a) There exists a sequence {p(h; s)} such that $(\bar y(s), p(h; s)) \in B$, and also
$$\lim_{s \to \infty} \big[q(\bar y(s), p(h; s))\big]^+ = 0. \quad (25)$$
b) The sequence $f(\bar y(s))$ converges in the sense that
$$\liminf_{s \to \infty} f(\bar y(s)) \ge P - \frac{\epsilon G^2}{2} \quad (26)$$
and $\limsup_{s \to \infty} f(\bar y(s)) \le P$.
Equation (25) asserts that the sequence {ȳ(s)}, together with an associated {p(h; s)}, becomes asymptotically feasible. Moreover, (26) explicates the asymptotic suboptimality as a function of the stepsize and the bound on the subgradient norm. Proposition 3, however, does not provide a way to actually find {p(h; s)}.
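The running averages $\bar y(s)$ need not be recomputed from scratch at every iteration; they can be maintained incrementally. A minimal sketch (our own helper):

```python
class RunningAverage:
    """Ergodic average of primal iterates: ybar(s) = (1/s) * sum_{l<=s} y(l),
    updated incrementally as new iterates arrive."""
    def __init__(self):
        self.count = 0
        self.avg = None
    def update(self, y):
        self.count += 1
        if self.avg is None:
            self.avg = list(y)
        else:
            # ybar(s) = ybar(s-1) + (y(s) - ybar(s-1)) / s
            self.avg = [a + (yi - a) / self.count
                        for a, yi in zip(self.avg, y)]
        return self.avg
```

The incremental form avoids storing the whole iterate history, which matters for an online implementation.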
Averaging of the primal iterates is a well-appreciated method to obtain optimal primal solutions from dual subgradient methods in convex optimization [25]. Note, though, that the primal problem at hand is non-convex in general. Results related to Proposition 3 are shown in [23]. Proposition 3 follows in this paper as a special case of a result for a more general algorithm, which allows for asynchronous subgradients and is suitable for online network control, elaborated next.
The algorithm in Section III-B finds the optimal operating point of (14) in an offline fashion. In the present section, the subgradient method is adapted so that it can be used for resource allocation during network operation.
The algorithm is motivated by Proposition 3 as follows. The exogenous arrival rates a m (ℓ) generated by the subgradient method [cf. (19a)] can be used as the instantaneous rate of the traffic admitted at the transport layer at time ℓ. Then, Proposition 3 guarantees that the long-term average transport layer rates will be optimal. Similar observations can be made for other rates in the network.
More generally, an online algorithm with the following characteristics is desirable.
• Time is divided into slots, and each subgradient iteration takes one time slot. The channel is assumed to remain invariant per slot, but is allowed to vary across slots.
• Each layer maintains its set of dual variables, which are updated according to (21) with a constant stepsize $\epsilon$.
• The instantaneous transmission and reception rates at the various layers are set equal to the primal iterates at that time slot, found using (19).
• Proposition 3 ensures that the long-term average rates are optimal.
For network resource allocation problems such as those described in [5], the subgradient method naturally lends itself to an online algorithm with the aforementioned properties. This approach, however, cannot be directly extended to the present case, because the dual updates (21d)-(21e) require an expectation operation, which needs prior knowledge of the exact channel distribution function for generation of independent realizations of h per time slot. Furthermore, although Proposition 3 guarantees the existence of a sequence of feasible power variables p(h; s), it is not clear how one could find them, since the corresponding running averages do not necessarily converge.
Towards adapting the subgradient method for network control, recall that the subgradients $q^\lambda_{iJ}$ and $q^\mu_i$ involve the following summands that require expectation operations [cf. (22d) and (22e)]:
$$C_{iJ} := E\Big[\sum_{f \in F} C^f_{iJ}(p(h), h)\Big], \qquad P_i := E\Big[\sum_{J:(i,J)\in A}\ \sum_{f \in F} p^f_{iJ}(h)\Big].$$
These expectations can, however, be approximated by averaging over actual channel realizations. To do so, the power allocation subproblem (19f) must be solved repeatedly for a prescribed number of time slots, say S, while using the same Lagrange multipliers. This allows the expectations in (27) and (28) to be approximated by averaging operations performed over the channel realizations at these time slots. It is evident, however, that the averaging operation not only consumes S time slots, but also that the resulting subgradient is always outdated. Specifically, if the current time slot is of the form ℓ = KS + 1 with K = 0, 1, 2, …, the most recent approximations available are $\hat C_{iJ}(\ell - S)$ and $\hat P_i(\ell - S)$. The corresponding steps of Algorithm 1 (summarized below) are as follows. Update the dual iterates $\lambda_{iJ}(\ell + 1)$ and $\mu_i(\ell + 1)$. Network control: use the current iterates $a_m(\ell)$ for flow control; $x^{mt}_{ij}(\ell)$ and $z^m_{iJ}(\ell)$ for routing and network coding; $c_{iJ}(\ell)$ for link rate control; and $p(h_\ell; \tau(\ell))$ for instantaneous power allocation.
Here, the power allocations are calculated using (19f) with the old multipliers $\lambda_{iJ}(\ell - S)$ and $\mu_i(\ell - S)$. The presence of outdated subgradient summands motivates the use of an asynchronous subgradient method such as the one in [24].
Specifically, the dual updates still occur at every time slot, but are allowed to use subgradients with outdated summands. Thus, $\hat C_{iJ}(\ell - S)$ and $\hat P_i(\ell - S)$ are used instead of the corresponding expectation terms in (22d) and (22e) at the current time ℓ. Further, since the averaging operation consumes another S time slots, the same summands are also used for times ℓ + 1, ℓ + 2, …, ℓ + S − 1. At time ℓ + S, power allocations from the time slots ℓ, ℓ + 1, …, ℓ + S − 1 become available, and are used for calculating $\hat C_{iJ}(\ell)$ and $\hat P_i(\ell)$, which then serve as the more recent subgradient summands. Note that a subgradient summand such as $\hat C_{iJ}$ is at least S and at most 2S − 1 slots old.
The asynchronous subgradient method is summarized as Algorithm 1. The algorithm uses the function τ(ℓ), which returns the time of the most recent averaging operation.
Note that $S \le \ell - \tau(\ell) \le 2S - 1$. Recall also that the subgradient components $\hat C_{iJ}$ and $\hat P_i$ are evaluated only at times τ(ℓ).
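The bookkeeping behind the delayed summands can be sketched as follows. The class below is our own illustration, not the paper's pseudocode: per-slot physical-layer quantities are buffered, and a block average is released every S slots, so that the value consumers see at slot ℓ is between S and 2S − 1 slots old.

```python
class DelayedAverager:
    """Release the average of each completed block of S slots; consumers
    at slot l therefore use a summand computed from slots at least S old."""
    def __init__(self, S):
        self.S = S
        self.buffer = []
        self.latest = None          # most recent completed block average
        self.tau = None             # last slot of the averaged block
    def push(self, slot, value):
        self.buffer.append(value)
        if len(self.buffer) == self.S:
            self.latest = sum(self.buffer) / self.S
            self.tau = slot
            self.buffer = []
        return self.latest
```

While a new block is filling, `push` keeps returning the previous block's average, which is exactly the outdated-summand behavior used in the asynchronous updates.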
The following proposition gives the dual convergence result for this algorithm. Define $\bar G$ as the bound $\|[\hat C^T\ \hat P^T]^T\| \le \bar G$, where $\hat C$ and $\hat P$ are formed by stacking the terms $\hat C_{iJ}$ and $\hat P_i$.
Thus, the suboptimality of the asynchronous subgradient relative to the synchronous version is bounded by a constant proportional to the delay D = 2S − 1. Consequently, the asynchronous subgradient may need a smaller stepsize (and hence more iterations) to reach a given distance from the optimum.
The convergence of asynchronous subgradient methods for convex problems such as (17) has been studied in [24,Sec. 6] for a diminishing stepsize. Proposition 4 provides a complementary result for constant stepsizes.
Again, as with the synchronous version, the primal running averages also converge to within a constant from the optimal value of (14). This is stated formally in the next proposition.
a) There exists a sequence p(h; s) such that (ȳ(s), p(h; s)) ∈ B and
b) The sequence f (ȳ(s)) converges in the following sense:
Note that as with the synchronous subgradient, the primal running averages are still asymptotically feasible, but the bound on their suboptimality increases by a term proportional to the delay D in the physical layer updates. Of course, all the results in Propositions 4 and 5 reduce to the corresponding results in Propositions 2 and 3 on setting D = 0. Interestingly, there is no similar result for primal convergence in asynchronous subgradient methods even for convex problems.
Finally, the following remarks on the online nature of the algorithm and the implementation of the Lagrangian maximizations in (19) are in order.
Remark 1. Algorithm 1 has several characteristics of an online adaptive algorithm. In particular, prior knowledge of the channel distribution is not needed in order to run the algorithm, since the expectation operations are replaced by averaging over channel realizations on the fly. Likewise, running averages need not be evaluated; Proposition 5 ensures that the corresponding long-term averages will be near-optimal. Further, if the network topology changes at some time and the algorithm keeps running, this is equivalent to restarting the entire algorithm with the current state as initialization. The algorithm is adaptive in this sense.
Remark 2. Each of the maximization operations (19a)-(19e) is easy, because it involves a single variable, a concave objective, box constraints, and locally available Lagrange multipliers. The power control subproblem (19f), however, may be hard and may require centralized computation in order to obtain a (near-) optimal solution. For the conflict graph model, see [13], [28] and references therein for a list of approximate
algorithms. For the SINR model, solutions of (19f) could be based on approximation techniques from power control for digital subscriber lines (DSL); see, e.g., [23] and references therein, and on efficient message passing protocols as in [11].
The asynchronous algorithm developed in Section IV is simulated on the wireless network shown in Fig. 1. The network has 8 nodes placed on a 300 m × 300 m area. Hyperarcs originating from node i are denoted by $(i, J) \in A$, where $J \in 2^{N(i)} \setminus \emptyset$, i.e., J belongs to the power set of the neighbors of i excluding the empty set. For instance, the hyperarcs originating from node 1 are (1, {2}), (1, {8}), and (1, {2, 8}). The network supports the two multicast sessions $S_1 = (1, \{4, 6\})$ and $S_2 = (4, \{1, 7\})$. The physical layer determines sets of conflict-free hyperarcs (cf. Example 1); these sets are called matchings. At each time slot, the aim is to find the matching that maximizes the objective function $\sum_{f,(i,J)} \gamma^f_{iJ}$. Note that since $\gamma^f_{iJ}$ is a positive quantity, only maximal matchings, i.e., matchings with maximum possible cardinality, need to be considered. At each time slot, the following two steps are carried out.
S1) Find the optimal power allocation for each maximal matching. Note that the capacity of an active hyperarc is a function of the power allocation over that hyperarc alone [cf. (3) and (4)]. Thus, the maximization in (19f) can be solved separately for each hyperarc and tone. The resulting objective [cf. (19g)] is a concave function of a single variable, admitting an easy waterfilling-type solution.
S2) Evaluate the objective function (19f) for each maximal matching with the powers found in Step S1, and choose the matching with the highest resulting objective value.
It is well known that the enumeration of hyperarc matchings requires exponential complexity [13]. Since the problem at hand is small, full enumeration is used.
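Steps S1)-S2) can be sketched as follows. The helper below is our own illustration; it assumes a per-hyperarc solver `power_value` that returns the optimized objective contribution of one active hyperarc (the separability noted in S1), and then selects the best maximal matching by full enumeration.

```python
def best_matching(maximal_matchings, power_value):
    """S1-S2 sketch: the objective is separable across active hyperarcs,
    so optimize each hyperarc independently and keep the matching with
    the largest total objective value."""
    best, best_val = None, float("-inf")
    for matching in maximal_matchings:
        total = sum(power_value(arc) for arc in matching)  # S1 per hyperarc
        if total > best_val:                               # S2 selection
            best, best_val = matching, total
    return best, best_val
```

Full enumeration is exponential in general, as noted above, which is acceptable only for small networks such as the one simulated here.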
Fig. 2 shows the evolution of the utility function f(ȳ(s)) and the best dual value up to the current iteration. The utility function is evaluated using the running average of the primal iterates [cf. (24)]. It can be seen that after a certain number of iterations, the primal and dual values remain very close, corroborating the vanishing duality gap. Fig. 3 shows the evolution of the utility function for different values of S. Again, the utility function converges to a near-optimal value after a sufficient number of iterations. Note however that the gap from the optimal dual value increases for large values of S, such as S = 60 (cf. Proposition 5).
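As a side note, the running average of the primal iterates in (24) need not store the whole history; it can be maintained incrementally. A minimal sketch (the scalar iterates below are placeholders for the primal vectors):

```python
def running_average(iterates):
    """Return the sequence of running averages ybar(s) = (1/s) * sum of y(1), ..., y(s)."""
    ybar, out = 0.0, []
    for s, y in enumerate(iterates, start=1):
        ybar += (y - ybar) / s   # ybar(s) = ybar(s-1) + (y(s) - ybar(s-1)) / s
        out.append(ybar)
    return out
```

For example, `running_average([2.0, 4.0, 6.0])` returns `[2.0, 3.0, 4.0]`.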
Finally, Fig. 4 shows the optimal values of certain optimization variables. Specifically, the two subplots show all the virtual flows to a given sink for each of the multicast sessions, namely $\{s_1 = 1, t = 6\}$ and $\{s_2 = 4, t = 7\}$, respectively. The thickness and the gray level of the edges are proportional to the magnitude of the virtual flows. It can be observed that most virtual flows are concentrated along the shorter paths between the source and the sink. Also, the radius of the circles representing the nodes is proportional to the optimal average power consumption. It can be seen that the inner nodes 2, 4, 6, and 8 consume more power than the outer ones, 1, 3, 5, and 7. This is because the inner nodes have more neighbors, and thus more opportunities to transmit. Moreover, the outer nodes are all close to their neighbors.
This paper formulates a cross-layer optimization problem for multicast networks where nodes perform intra-session network coding, and operate over fading broadcast links. Zero duality gap is established, rendering layered architectures optimal.
Leveraging this result, an adaptation of the subgradient method suitable for network control is also developed. The method is asynchronous, because the physical layer returns its contribution to the subgradient vector with delay. Using the subgradient vector, primal iterates in turn dictate routing, network coding, and resource allocation. It is established that network variables, such as the long-term average rates admitted into the network layer, converge to near-optimal values, and the suboptimality bound is provided explicitly as a function of the delay in the subgradient evaluation.

APPENDIX A
STRONG DUALITY FOR THE NETWORKING PROBLEM (14)

This appendix formulates a general version of problem (14), and gives results about its duality gap. Let h be the random channel vector in $\Omega := \mathbb{R}_+^{d_h}$, where $\mathbb{R}_+$ denotes the nonnegative reals and $d_h$ the dimensionality of h. Let D be the σ-field of Borel sets in Ω, and $P_h$ the distribution of h, which is a probability measure on D.
As in (14), consider two optimization variables: the vector y, constrained to a subset $B_y$ of the Euclidean space $\mathbb{R}^{d_y}$; and the function $p : \Omega \to \mathbb{R}^{d_p}$, belonging to an appropriate set of functions P. In the networking problem, the aforementioned function is the power allocation p(h), and the set P consists of the power allocation functions satisfying instantaneous constraints, such as spectral mask or hyperarc scheduling constraints (cf. also Examples 1 and 2). Henceforth, the function variable will be denoted by p instead of p(h), for brevity. Let Π be a subset of $\mathbb{R}^{d_p}$. Then P is defined as the set of functions taking values in Π.
The network optimization problem (14) can be written in the general form
$$\max_{y,\,p} \ f(y) \tag{36a}$$
$$\text{subject to} \quad g(y) + E[v(p(h), h)] \le 0 \tag{36b}$$
$$y \in B_y, \quad p \in P \tag{36c}$$
where g and v are $\mathbb{R}^d$-valued functions describing the d constraints. The formulation also subsumes similar problems in the unicast routing framework, such as those in [22], [23].
Evidently, problem (14) is a special case of (36). If the inequalities (14b)-(14f) are rearranged to have zeros on the right-hand side, the function v(p(h), h) simply has zeros in the entries that correspond to constraints (14b)-(14d). The function q(y, p(h)) defined before (15) then corresponds to g(y) + v(p(h), h).
The following assumptions regarding (36) are made:
AS1. Constraint set $B_y$ is convex, closed, bounded, and in the interior of the domains of the functions f(y) and g(y). Set Π is closed, bounded, and in the interior of the domain of the function v(·, h) for all h.
AS2. Function f(y) is concave, function g(y) is convex, and v(p(h), h) is integrable whenever p is measurable. Furthermore, there is a Ḡ > 0 such that $\|E[v(p(h), h)]\| \le \bar G$ whenever p ∈ P.
AS3. Random vector h is continuous;¹ and
AS4. There exist y′ ∈ $B_y$ and p′ ∈ P such that (36b) holds as strict inequality (Slater constraint qualification).
Note that these assumptions are natural for the network optimization problem (14). Specifically, $B_y$ captures the box constraints for the variables $a^m$, $x_{ij}^{mt}$, $z_{iJ}^m$, $c_{iJ}$, and $p_i$; and Π gives the instantaneous power allocation constraints. The function f(y) is selected concave, and g(y) is linear. Moreover, the entries of v(p(h), h) corresponding to (14f) are bounded, because the set Π is bounded. For the same reason, the ergodic capacities $E[C_{iJ}^f(p(h), h)]$ are bounded. While (36) is not convex in general, it is separable [29, Sec. 5.1.6]. The Lagrangian (keeping the constraints (36c) implicit) and the dual function are, respectively [cf. also (15) and (16)]
$$L(y, p, \zeta) = f(y) - \zeta^T\big(g(y) + E[v(p(h), h)]\big)$$
$$\varrho(\zeta) = \max_{y \in B_y,\, p \in P} L(y, p, \zeta) = \psi(\zeta) + \phi(\zeta) \tag{38}$$
where ζ denotes the vector of Lagrange multipliers and
$$\psi(\zeta) := \max_{y \in B_y} \big[ f(y) - \zeta^T g(y) \big], \qquad \phi(\zeta) := \max_{p \in P} \big( -\zeta^T E[v(p(h), h)] \big).$$
The additive form of the dual function is a consequence of the separable structure of the Lagrangian. Further, AS1 and AS2 ensure that the domain of ̺(ζ) is $\mathbb{R}^d$. Finally, the dual problem becomes [cf. also (17)]
$$D := \min_{\zeta \ge 0} \varrho(\zeta). \tag{40}$$
As p varies in P, define the range of E[v(p(h), h)] as
$$R := \big\{ E[v(p(h), h)] \,:\, p \in P \big\}.$$
The following lemma demonstrating the convexity of R plays a central role in establishing the zero duality gap of (36), and in the recovery of primal variables from the subgradient method.
Lemma 1. If AS1-AS3 hold, then the set R is convex.
The proof relies on Lyapunov's convexity theorem [30]. Recently, an extension of Lyapunov's theorem [30, Extension 1] has been applied to show the zero duality gap of power control problems in DSL [26]. This extension however does not apply here, as indicated in the ensuing proof. In a related contribution [22], it is shown that the perturbation function of a problem similar to (36) is convex; the claim of Lemma 1, though, is quite different.

¹ Formally, this is equivalent to saying that $P_h$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}_+^{d_h}$. In more practical terms, it means that h has a probability density function without deltas.
Proof of Lemma 1: Let $r_1$ and $r_2$ denote arbitrary points in R, and let α ∈ (0, 1) be arbitrary. By the definition of R, there are functions $p_1$ and $p_2$ in P such that
$$r_1 = E[v(p_1(h), h)], \qquad r_2 = E[v(p_2(h), h)].$$
For every E ∈ D, let
$$u(E) := \Big[ \Big( \int_E v(p_1(h), h)\, dP_h \Big)^T, \ \Big( \int_E v(p_2(h), h)\, dP_h \Big)^T \Big]^T.$$
The set function u(E) is a nonatomic vector measure on D, because $P_h$ is nonatomic (cf. AS3) and the functions $v(p_1(h), h)$ and $v(p_2(h), h)$ are integrable (cf. AS2); see [31] for definitions. Hence, Lyapunov's theorem applies to u(E); see also [30, Extension 1] and [22, Lemma 1].
Specifically, consider a null set Φ in D, i.e., a set with $P_h(\Phi) = 0$, and the whole space Ω ∈ D. It holds that $u(\Phi) = 0$ and $u(\Omega) = [r_1^T, r_2^T]^T$. For the chosen α, Lyapunov's theorem asserts that there exists a set $E_\alpha \in$ D such that ($E_\alpha^c$ denotes the complement of $E_\alpha$)
$$u(E_\alpha) = \alpha\, u(\Omega), \qquad u(E_\alpha^c) = (1 - \alpha)\, u(\Omega). \tag{44}$$
Now, using these $E_\alpha$ and $E_\alpha^c$, define
$$p_\alpha(h) := \begin{cases} p_1(h), & h \in E_\alpha, \\ p_2(h), & h \in E_\alpha^c. \end{cases}$$
It is easy to show that $p_\alpha(h) \in P$. In particular, the function $p_\alpha(h)$ can be written as $p_\alpha(h) = p_1(h)\mathbf{1}_{E_\alpha}(h) + p_2(h)\mathbf{1}_{E_\alpha^c}(h)$, where $\mathbf{1}_E$ is the indicator function of a set E ∈ D. Hence it is measurable, as a sum of products of measurable functions. Moreover, $p_\alpha(h) \in \Pi$ for almost all h, because $p_1(h)$ and $p_2(h)$ satisfy this property. The need to show $p_\alpha(h) \in P$ makes [30, Extension 1] not directly applicable here.
Thus, $p_\alpha(h) \in P$, and it satisfies [cf. (44)]
$$E[v(p_\alpha(h), h)] = \int_{E_\alpha} v(p_1(h), h)\, dP_h + \int_{E_\alpha^c} v(p_2(h), h)\, dP_h = \alpha r_1 + (1 - \alpha) r_2,$$
so that $\alpha r_1 + (1 - \alpha) r_2 \in R$, which establishes the convexity of R.
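The mixing step can be checked on a toy scalar instance. In general, Lyapunov's theorem only guarantees that $E_\alpha$ exists; in the hypothetical instance below (h uniform on [0, 1], Π = [0, 2], $p_1(h) \equiv 1$, $p_2(h) \equiv 2$, and v(p, h) = p·h), the two integrands are proportional, so $E_\alpha = [0, \sqrt{\alpha}]$ can be written down explicitly and the convex combination verified exactly:

```python
import math

# Toy instance: h ~ uniform[0,1], Pi = [0,2], p1(h) = 1, p2(h) = 2, v(p, h) = p*h.
r1 = 0.5               # E[v(p1(h), h)] = integral of h over [0,1]
r2 = 1.0               # E[v(p2(h), h)] = integral of 2h over [0,1]
alpha = 0.3
t = math.sqrt(alpha)   # E_alpha = [0, t]: integral of h over [0,t] = t^2/2 = alpha*r1

# p_alpha follows p1 on E_alpha and p2 on its complement; integrate exactly:
mixed = t**2 / 2 + (1.0 - t**2)   # = alpha*r1 + (1 - alpha)*r2
assert abs(mixed - (alpha * r1 + (1 - alpha) * r2)) < 1e-12
```

Both components of u scale by the same factor here, so this single interval satisfies the vector equality $u(E_\alpha) = \alpha\, u(\Omega)$; for general integrands, only existence of such a set is guaranteed.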
Finally, the zero duality gap result follows from Lemma 1, and is stated in the following proposition.

Proposition 6. If AS1-AS4 hold, then problem (36) has zero duality gap, i.e., P = D. Furthermore, the values P and D are finite, the dual problem (40) has an optimal solution, and the set of optimal solutions of (40) is bounded.
Proof: Function f(y) is continuous on $B_y$ since it is concave (cf. AS1 and AS2) [27, Prop. 1.4.6]. This, combined with the compactness of $B_y$, shows that the optimal primal value P is finite. Consider the set
$$W := \big\{ (u, w) \in \mathbb{R}^d \times \mathbb{R} \,:\, g(y) + E[v(p(h), h)] \le u, \ f(y) \ge w \ \text{for some } y \in B_y,\ p \in P \big\}.$$
Using Lemma 1, it is easy to verify that the set W is convex. The rest of the proof follows that of [29, Prop. 5.3.1 and 5.1.4], using the finiteness of P and the Slater constraint qualification (cf. AS4).
The boundedness of the optimal dual set is a standard result for convex problems under the Slater constraint qualification and finiteness of the optimal primal value; see, e.g., [27, Prop. 6.4.3] and [25, p. 1762]. The proof holds also in the present setup, since P is finite, P = D, and AS4 holds.
APPENDIX B
CONVERGENCE OF THE SUBGRADIENT METHODS

This appendix formulates the synchronous and asynchronous subgradient methods for the generic problem (36), and establishes the convergence claims in Propositions 2-5. Note that Propositions 2 and 3 follow from Propositions 4 and 5, respectively, upon setting the delay D = 0.
Starting from an arbitrary ζ(1) ≥ 0, the subgradient iterations for (40), indexed by ℓ ∈ ℕ, are [cf. also (18)]
$$y(\ell) \in \arg\max_{y \in B_y} \big[ f(y) - \zeta^T(\ell)\, g(y) \big] \tag{48a}$$
$$p(\cdot\,; \ell) \in \arg\max_{p \in P} \big( -\zeta^T(\ell)\, E[v(p(h), h)] \big) \tag{48b}$$
$$\zeta(\ell + 1) = \big[ \zeta(\ell) - \epsilon \big( \check g(\ell) + v(\ell) \big) \big]^+ \tag{48c}$$
where ǧ(ℓ) and v(ℓ) are the subgradients of the functions ψ(ζ) and φ(ζ) at ζ(ℓ), defined as [cf. also (22)]
$$\check g(\ell) := -g(y(\ell)), \qquad v(\ell) := -E[v(p(h; \ell), h)],$$
with y(ℓ) and p(·; ℓ) denoting the corresponding maximizers of the Lagrangian at ζ(ℓ).
The iteration in (48c) is synchronous, because at every ℓ, both maximizations (48a) and (48b) are performed using the current Lagrange multiplier ζ(ℓ). An asynchronous method is also of interest, and operates as follows. Here, the component v of the overall subgradient used at ℓ does not necessarily correspond to the Lagrange multiplier ζ(ℓ), but to the Lagrange multiplier at a time τ(ℓ) ≤ ℓ. Noting that the corresponding maximizer in (48b) is p(·; τ(ℓ)) and the subgradient component used at ℓ is v(τ(ℓ)), the iteration takes the form
$$\zeta(\ell + 1) = \big[ \zeta(\ell) - \epsilon \big( \check g(\ell) + v(\tau(\ell)) \big) \big]^+. \tag{50}$$
The difference ℓ − τ(ℓ) is the delay with which the subgradient component v becomes available. In Algorithm 1, for example, the delayed components are $\hat C_{iJ}(\tau(\ell))$ and $\hat P_i(\tau(\ell))$.
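The mechanism behind (50) can be illustrated on a hypothetical single-link toy problem: maximize log(1 + y) − 0.5p subject to y ≤ log(1 + p), with y ∈ [0, 5] and p ∈ [0, 5]. Both Lagrangian maximizers admit closed forms, and the p-related subgradient component is evaluated at a multiplier outdated by at most D iterations, as in AS5. This is only a sketch of the delayed update, not the networking algorithm itself; all constants are made up.

```python
import math

def y_star(z):
    """Network-layer maximizer: argmax over y in [0,5] of log(1+y) - z*y."""
    return 5.0 if z <= 0 else min(max(1.0 / z - 1.0, 0.0), 5.0)

def p_star(z):
    """Physical-layer maximizer: argmax over p in [0,5] of z*log(1+p) - 0.5*p."""
    return min(max(2.0 * z - 1.0, 0.0), 5.0)

eps, D = 0.01, 3                 # constant stepsize, maximum delay (cf. AS5)
zs = [1.0]                       # multiplier iterates, zeta(1) = 1
ybar = pbar = 0.0
for ell in range(1, 2001):
    z = zs[-1]
    y = y_star(z)
    p = p_star(zs[max(ell - 1 - D, 0)])   # outdated multiplier: tau(ell) = max(ell - D, 1)
    ybar += (y - ybar) / ell              # running primal averages [cf. (24)]
    pbar += (p - pbar) / ell
    # projected subgradient step on the dual: the multiplier grows
    # when the constraint y <= log(1 + p) is violated
    zs.append(max(z + eps * (y - math.log(1.0 + p)), 0.0))
```

With ε = 0.01 and D = 3, the multiplier settles near the optimal ζ* ≈ 0.727 of this toy problem, and the running averages approach the primal optimum (y* ≈ 0.375), with the residual offset governed by the stepsize and the delay (cf. Proposition 5).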
Next, we proceed to analyze the convergence of (50). Function g(y) is continuous on $B_y$ because it is convex [27, Prop. 1.4.6]. Then, AS1 and AS2 imply that there exists a bound G such that for all y ∈ $B_y$ and p ∈ P,
$$\big\| g(y) + E[v(p(h), h)] \big\| \le G. \tag{51}$$
Due to this bound on the subgradient norm, algorithm (50) can be viewed as a special case of an approximate subgradient method [32]. We do not follow this line of analysis here, because it does not take advantage of the source of the error in the subgradient, namely, that an outdated maximizer of the Lagrangian is used. Moreover, algorithm (50) can be viewed as a particular case of an ε-subgradient method (see [29, Sec. 6.3.2] for definitions). This connection is made in [24], which only deals with diminishing stepsizes; here, results are proved for constant stepsizes. The following assumption is adopted for the delay ℓ − τ(ℓ).
AS5. There exists a finite D ∈ ℕ such that ℓ − τ(ℓ) ≤ D for all ℓ ∈ ℕ.
AS5 holds for Algorithm 1, since the maximum delay there is D = 2S − 1. The following lemma collects the results needed for Propositions 2 and 4. Specifically, it characterizes the error term in the subgradient definition when −v(τ(ℓ)) is used, and it also relates successive iterates ζ(ℓ) and ζ(ℓ + 1). The quantity Ḡ in the ensuing statement was defined in AS2.
Lemma 2. Under AS1-AS5, the following hold for the sequence {ζ(ℓ)} generated by (50) for all θ ≥ 0.

Proof of Lemma 2: a) The left-hand side of (52a) is
Applying the definition of the subgradient for φ(ζ) at ζ(τ(ℓ)) to (53), it follows that
Now, adding and subtracting the same terms on the right-hand side of (54), we obtain
Applying the definition of the subgradient for φ(ζ) at ζ(τ(ℓ) + κ) to (55), it follows that
Using the Cauchy-Schwarz inequality, (56) becomes
Now, write the subgradient iteration [cf. (50)] at τ(ℓ) + κ − 1:
Subtracting ζ(τ(ℓ) + κ − 1) from both sides of the latter, and using the nonexpansive property of the projection [27, Prop. 2.2.1] followed by (51), one finds from (58) that
Finally, recall that ‖v(ℓ)‖ ≤ Ḡ for all ℓ ∈ ℕ (cf. AS2), and that ℓ − τ(ℓ) ≤ D for all ℓ ∈ ℕ (cf. AS5). Applying these two assumptions and (59) to (57), we obtain (52a).
b) This part follows readily from part a), using (38) and the definition of the subgradient for ψ(ζ).
Due to the nonexpansive property of the projection, it follows that
Introducing (52b) and (51) into (61), (52c) follows.
The main convergence results for the synchronous and asynchronous subgradient methods are given by Propositions 2 and 4, respectively. Using Lemma 2, Proposition 4 is proved next.
Proof of Proposition 4: a) Let ζ* be an arbitrary dual solution. With $g_i$ and $v_i$ denoting the i-th entries of g and v, respectively, define
$$\delta := \min_{1 \le i \le d} \big\{ -g_i(y') - E[v_i(p'(h), h)] \big\} \tag{62}$$
where y ′ and p ′ are the strictly feasible variables in AS4. Note that δ > 0 due to AS4. We show that the following relation holds for all ℓ ≥ 1:
Eq. (63) implies that the sequence of Lagrange multipliers {ζ(ℓ)} is bounded, because the optimal dual set is bounded (cf. Proposition 6). Next, (63) is shown by induction.
It obviously holds for ℓ = 1. Assume that it holds for some ℓ ∈ ℕ; it is proved next that it also holds for ℓ + 1. Two cases are considered, depending on the value of ̺(ζ(ℓ)).
The square-bracketed quantity in (64) is positive due to the assumption of Case 1.
Next, a bound on ||ζ(ℓ)|| is developed. Specifically, it holds due to the definition of the dual function [cf. (38)] that
Rewriting the inner product in (66) using the entries of the corresponding vectors, and substituting (62) into (66) using ζ(ℓ) ≥ 0, it follows that
Using $\|\zeta(\ell)\| \le \sum_{i=1}^{d} \zeta_i(\ell)$ in (67), the following bound is obtained:
Introducing (68) into (65b), and using the assumption of Case 2, the desired relation (63) follows.
Summing the latter for ℓ = 1, . . . , s, and introducing the quantity min 1≤ℓ≤s ̺(ζ(ℓ)), it follows that
Substituting 0 for the left-hand side of (70), rearranging the resulting inequality, and dividing by 2εs, we obtain (71). Taking the limit as s → ∞ in (71) then yields (32).
Note that the sequence of Lagrange multipliers in the synchronous algorithm (48c) is also bounded. This was shown for convex primal problems in [25, Lemma 3]. Interestingly, the proof also applies in the present case, since AS1-AS4 hold and imply that the optimal value P = D is finite. Furthermore, Proposition 2 for the synchronous method follows from [27, Prop. 8.2.3], [22].
Next, the convergence of the primal variables through running averages is considered. The following lemma collects the intermediate results for the averaged sequence {ȳ(s)} [cf. (24)], and is used to establish convergence for the generic problem (36) with asynchronous subgradient updates as in (50). Note that ȳ(s) ∈ $B_y$ for all s ≥ 1, because (24) represents a convex combination of the points {y(1), . . . , y(s)}.
Eq. (72a) is an upper bound on the constraint violation, while (72b) and (72c) provide lower and upper bounds on the objective function at ȳ(s). Lemma 3 relies on Lemma 1 and on the fact that the averaged sequence {ȳ(s)} is generated from maximizers of the Lagrangian {y(ℓ)} that are not outdated. Applying the Cauchy-Schwarz inequality to the latter, (72c) follows readily. Using Lemma 3, the main convergence results for the synchronous and asynchronous subgradient methods are given correspondingly by Propositions 3 and 5, after substituting $q(\bar y(s), p(h; s)) = g(\bar y(s)) + E[v(p(h; s), h)]$.
Proof of Proposition 5: a) Take limits on both sides of (72a) as s → ∞, and use the boundedness of {ζ(s)}.
b) Using P = D and taking the lim inf in (72b), we obtain (34a). Moreover, using P = D, (72a), the boundedness of ζ * , and taking lim sup in (72c), (34b) follows.