Asymptotic optimality of the cross-entropy method for Markov chain problems
The correspondence between the cross-entropy method and the zero-variance approximation to simulate a rare event problem in Markov chains is shown. This leads to a sufficient condition that the cross-entropy estimator is asymptotically optimal.
Authors: Ad Ridder
good state 0, it will hit the failure set F before returning to 0. The associated event is denoted by A = 1{X(T ) ∈ F}.
In this paper we consider the problem of estimating the hitting probability P(A) by simulation. The typical applications that we have in mind, are system failures in models of highly reliable Markovian system, and excessive backlogs in product-form Jackson queueing networks. The failure set is a rare event that should occur only with very small probability, which makes the need for reducing the simulation variance, for instance by importance sampling. Reliability and queueing systems have been studied largely in relation to importance sampling estimation of the performance measures, we refer to overviews in [9] and in [10].
The rare event probability P(A) will be estimated by an importance sampling simulation method that implements a change of measure P * , i.e., the estimator Y is the average of i.i.d. replications of
where d P/d P * is the likelihood ratio, and where we assume that 1{A}d P is absolute continuous w.r.t. d P * . The issue is to find a good change of measure in terms of performance of the estimator Y . The optimal change of measure would be
for which Y would have zero variance [10]. It is shown in [5,8] that the associated Markov chain has transition probabilities
and p opt (x, y) = p(x, y) for good state x = 0, and bad states x ∈ F. The optimal transition probabilities cannot be used directly for simulation since they require knowledge of the unknown hitting probabilities, however they suggest to construct an importance sampling algorithm by approximating or estimating the hitting probabilities γ(x). For instance, let Π(x) be the set of all sample paths of the Markov chain of the form π = (x = x 0 → x 1 → • → x k ) with x 0 , . . . , x k-1 ∈ T , x k ∈ F and p(x j , x j+1 ) > 0 (j = 0, . . . , k -1), and where k = 1, 2, . . .. The probability of path π to occur is p(π) = k-1 j=0 p(x j , x j+1 ), and, clearly, the hitting probability becomes γ(x) = π∈Π(x) p(π). In [5] it is studied how the approximation
performs in reliability problems. In a slightly different context, [6, Section 4] considers the rare event probability that a random walk reaches high levels. It is shown that it fits in the Markov chain framework and an approximation of γ(x) is proposed based on an asymptotic approximation of these probabilities.
Another line of research has been developed in [3,4] for rare event problems in which the probability of interest can be approximated via large deviations. In this framework, the decay rate is given in terms of a variational problem, or an optimal control problem. These problems are related to a family of nonlinear partial differential equations known as Hamilton-Jacobi-Bellmann (HJB) equations. It is shown how subsolutions of the HJB equations associated with rare event problems could be used to construct efficient importance sampling schemes. This method has been successfully applied to several queueing systems, see [3,4].
The cross-entropy method for rare event simulation [12] considers to choose a change of measure P ce from a specified family of changes of measures that minimizes the Kullback-Leibler distance from the optimal P opt . It has been shown by [1] that the associated transition probabilities p ce (x, y) are of the form
where N (x, y) is the number of times that transition (x, y) occurs in a random sample path of the Markov chain until absorption in one of the good or bad states.
Again, these transition probabilities cannot be used directly for simulation since they contain the unknown variables N (x, y), however, in this case they suggest to estimate the expectations in expression (2), for instance by simulation. This approach has been applied to queueing systems in [1,2], and to reliability systems in [11].
In the following sections we shall give more background on the cross-entropy method and how it results in the expression (2). More importantly, we shall show that in fact the cross-entropy solution is the zero-variance distribution, i.e., the matrices of transition probabilities satisfy P ce = P opt . This identity will be the basis to formulate in Section 3 a sufficient condition for which the estimated crossentropy solution is asymptotically optimal.
Let (Ω, A, P) be the probability space of the sample paths of the Markov chain X(0), X(1), . . ., and denote by X a random sample path. Recall that our objective is to execute simulations of the Markov chain for estimating the rare event probability P(A) = P(X(T ) ∈ F), that we execute these simulations under a change of measure P * , and that the optimal change of measure is
Suppose that we consider only changes of measures P * under which the Markov property is retained, say with matrix of transition probabilities P * , and suppose that we minimize the Kullback-Leibler distance between these changes of measures and the optimal one, i.e., inf
where the cross-entropy is defined by
Substituting (3), minimizing the cross-entropy, and deleting constant terms yields sup
Since the sample path probability d P * (X) is a product of individual transition probabilities, we get
where N (x, y) is the number of times transition (x, y) occurring in the random sample path X. Substituting the expression (6) into the cross-entropy optimization program (5), and applying the first order condition using a Lagrange multiplier, gives the solution (2) for the individual transition probabilities.
Notice that the optimal transition matrix P opt is a feasible matrix, i.e., an element of P, and thus it must hold that it is the cross-entropy solution. We shall give a direct proof of the matrix identity P ce = P opt , based on the expressions of the transition probabilities. In fact we shall prove a relation between the expected number of transitions from x to y and the absorption probability γ(y). Denote by v(x) the expected number of visits to state x starting at x before absorption:
Proposition 1. For all x, y ∈ X :
Proof. For ease of notation we assume that we have the equivalent modelling in which all the good and bad states are absorbing. Introduce probabilities (for any
x, y ∈ X ) f (x, y) = P((X(t)) reaches state y|X(0) = x) g(y) = P((X(t)) reaches bad set F without a transition x → y|X(0) = y).
Notice that we allow x and y to be a good or bad state for which these probabilities are obviously either zero or one. Consider the event {N (x, y) = n} ∩ {reach bad set from x},
for n ≥ 1. This event can only occur if (A) n -1 times (i) a number of times [transition x → y ′ = y followed by a return to x], followed by (ii) [transition x → y followed by a return to x]; then (A) is followed by (B) which is (iii) a number of times [transition x → y ′ = y followed by a return to x], followed by (iv) [transition x → y followed by reaching the bad set without the transition x → y]. That is,
.
Let us work out the summations using geometric series and denoting
Hence,
In the same manner we determine the absorption probability γ(y) = P((X(t)) reaches the bad set |X(0) = y).
Partition this event with respect to the number of transitions x → y:
When we include the first term g(y) as the zero-th term of the summation, we obtain
From the expressions (7) and (8) we see that
To conclude, we calculate the proportionality factor:
The last equality is a well-known relation for Markov chains, e.g., see [7].
Corollary 2. For all x, y ∈ X :
Proof. The identity follows easily by noting that
In this section we assume that there is a family of rare events {A n } parameterized by n = 1, 2, . . . such that each A n satisfies the model assumptions of the previous section, and such that lim n→∞ P(A n ) = 0. Suppose that Y * n is an unbiased estimator of P(A n ) obtained by a change of measure P * . Then this estimator is asymptotically
see for instance [9]. Now, recall the cross-entropy representation p ce (x, y) in ( 2) of the zero-variance transition probabilities, and suppose that these are estimated by pce (x, y). A common approach is to apply an iterative scheme to the optimization program (5). Since the program involves the rare event, we first apply a change of measure:
This is done for a probability measure P (0) such that (i) the Markov property is retained; (ii) the associated matrix of transition probabilities is feasible P (0) ∈ P;
(iii) the set F is 'not so' rare under P (0) . The program ( 9) is solved iteratively by estimation: let X (1) , . . . , X (k) be i.i.d. sample paths of the Markov chain generated by simulating the states according to a matrix of transition probabilities P (j) until absorption in the good or bad states, then we calculate for j = 0, 1, . . .
We repeat this 'updating' of the change of measure a few times until the difference P (j+1) -P (j) is small enough (in some matrix norm). For details we refer to [12].
In this way we obtain an implementable change of measure Pce , and its associated importance sampling estimator is denoted by
Clearly, this estimator is unbiased, i.e., Êce [ Ŷ ce n ] = P(A n ) 1 . We claim that it is asymptotically optimal if the following condition holds.
Condition 1. There are finite posivite constants K 1 , K 2 such that for all n
This is a condition on the approximation of the zero-variance measure by the implementation of cross-entropy method. Actually, a weaker condition for the upper bound suffices: Proof. Notice that
Furthermore, the lower bound gives lim sup
Now consider, (using (3) in the third equality in the following lines),
E opt log dP opt d Pce (X) log P(A n ) = 0.
We illustrate the theorem by the simple example of simulating the M/M/1 queue (Poisson-λ arrivals, exponential-µ services) where λ < µ. We consider its associated discrete-time Markov chain (X(t)) ∞ t=0 by embedding the continuous-time queueing process at the jump times. The transition probabilities are p = p(x, x + 1) = λ λ + µ (x = 0, 1, . . .) q = p(x, x -1) = µ λ + µ (x = 1, 2, . . .).
The rare event is hitting state n before returning to the zero state. For this model the optimal (zero-variance) transition probabilities follow easily from calculating the hitting probabilities γ(x) = P((X(t)) reaches n before 0|X(0) = x), for x = 1, . . . , n -1, by solving the equations γ(x) = p(x, x -1)γ(x -1) + p(x, x + 1)γ(x + 1), with boundary conditions γ(0) = 0 and γ(n) = 1. Let σ = µ/λ. Then we get
Notice that p opt (1, 2) = 1.
The cross-entropy between the optimal probability measure P opt and any other probability measure Q which is associated with transition probabilities q(x, x + 1) and q(x, x -1), can be determined as follows (where we apply the product form ( 6)):
We may follow the same reasoning as in the proofs of Propositions 1 and 2 for calculating E opt [N (x, y)|X(0) = 0]. Notice that under P opt 1{A} = 1, f (0, x) = 1, and γ(y) = 1, thus
Finally, the visiting numbers v opt (x) = 1/(1 -f opt (x, x)) can be calculated numerically via recursion (using f opt (x, x + 1) = 1):
.
We have implemented the cross-entropy method of Section 3 for queueing pa-
After having constructed an importance sampling algorithm for rare-event simulation, a key issue is to assess the statistical performance of the associated impor-tance sampling estimator. In this paper we have considered rare-event problems in Markov chains, for which we have used the cross-entropy method as the engine of finding a change of measure for executing the importance sampling simulations. We have shown that for these problems the cross-entropy method coincides with the zero-variance approach. This non-implementable optimal change of measure is estimated by an implementable change of measure that is returned by the cross-entropy method. Our main result is that we give a sufficient condition for the associated importance sampling estimator to be logarithmically efficient. Further investigations are undertaken to obtain conditions for strong efficiency.
We denote the expectation w.r.t. measure P by E, w.r.t. measure Pce by Êce , w.r.t. measure P opt by E opt , etc.
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment