The exp-$G$ family of probability distributions

In this paper we introduce a new method to add a parameter to a family of distributions. The additional parameter is completely studied and a full description of its behaviour in the distribution is given. We obtain several mathematical properties of…

Authors: Wagner Barreto-Souza, Alexandre B. Simas

The present work is an enhanced and extended version of the pioneering manuscript presented at Estância de São Pedro, São Paulo, Brazil, at the 18th SINAPE, 2008; see Barreto-Souza et al. (2008). In many practical situations, the usual probability distributions do not provide an adequate fit. For example, if the data are asymmetric, the normal distribution is not a good choice. Consequently, several methods of introducing a parameter to expand a family of distributions have been studied. Marshall and Olkin (1997) introduced a new way to expand probability distributions and applied it to yield a two-parameter extension of the exponential distribution, which can serve as a competitor to such commonly used two-parameter distributions as the Weibull, gamma and lognormal distributions. They also used this method to obtain a three-parameter extension of the Weibull distribution. Moreover, Mudholkar et al. (1996) introduced a three-parameter alternative to the Weibull distribution that has the Weibull as a limiting distribution. Some methods of introducing parameters to symmetric distributions have been studied in order to add skewness. For instance, Azzalini (1985) introduced and studied the well-known skew-normal distribution, obtained by adding a shape parameter to the normal distribution. Another symmetric distribution extended by adding a skewness parameter is Student's t distribution, by Jones and Faddy (2003). Finally, Ma and Genton (2004) introduced a general class of skew-symmetric distributions, whereas Ferreira and Steel (2006) provide a general perspective on the introduction of skewness into symmetric distributions. Recently, Jones (2004) introduced a class of distributions that adds two parameters to a reference distribution.
Further, Jones and Pewsey (2009) advanced a four-parameter family that has both symmetric and skewed members and allows for tail weights both heavier and lighter than those of the generating distribution. In this article, we introduce a new method to add a parameter to a reference distribution. The resulting distribution exhibits a remarkable reciprocal property. We study this parameter in detail and give a full description of its behaviour in the distribution. The augmented distribution has several connections with the reference distribution; for instance, the Kullback-Leibler divergence of the augmented distribution with respect to the original distribution is finite and depends only on the new parameter. Several other properties in this direction are also given. The inferential aspects of this distribution are studied in detail, two special cases are discussed, and a successful empirical application shows the flexibility of the new distribution and motivates its usage. Special attention must be given to the fact that it is not straightforward that this new distribution contains the reference distribution as a special case. We show that this is the case if we enlarge the parameter space, and also that this enlargement is well behaved, in the sense that all the standard inferential procedures work if this new value in the parameter space is taken to be the true value of the parameter. The remainder of the article unfolds as follows: in Section 2 the new class of distributions is introduced, several properties are given, the new parameter is completely characterized, and the inferential aspects are discussed. Sections 3 and 4 deal with two special cases, the exp-Weibull and exp-beta distributions, respectively. In Section 5 an empirical application shows the usefulness of this distribution. Finally, Section 6 ends the article with some concluding remarks. The Appendix contains the proofs of the results presented in the article.
The cdf of a random variable with truncated exponential distribution on the interval [0, 1] with parameter λ is given by

F*_λ(x) = (1 - e^{-λx})/(1 - e^{-λ}),   λ > 0, x ∈ [0, 1].   (1)

We now observe that F*_λ(·) is a cdf for every λ ∈ R \ {0}, and that lim_{λ→0} F*_λ(x) = x. Therefore, we extend the parameter space of the distribution above to the entire real line. We now define the new class as follows. Let G(x; θ) be the cdf of a continuous or discrete random variable, with θ the vector of parameters related to G; then the class of distributions exp-G, indexed by λ, is defined by

F^G_λ(x) = (1 - e^{-λG(x;θ)})/(1 - e^{-λ}),   λ ∈ R \ {0}.   (2)

From now on, we denote a random variable X with cdf (2) by X ∼ exp-G(Θ), where Θ = (λ, θ)^T. If G(x; θ) is the cdf of a continuous random variable, then the exp-G distribution is absolutely continuous for every λ ≠ 0, and its probability density function (pdf), obtained by differentiating the cdf (2) with respect to x, is given by

f(x; λ, θ) = λ g(x; θ) e^{-λG(x;θ)}/(1 - e^{-λ}),   (3)

where g(·; θ) is the pdf associated with the cdf G(·; θ). Let G(x; θ) be the cdf of a discrete random variable taking values on the set {x_1, x_2, . . .}, where x_1 < x_2 < · · ·; then the corresponding exp-G distribution is also discrete, takes values on the same set for every λ ≠ 0, and its probability function is given by

pr(X = x_i) = {e^{-λG(x_{i-1};θ)} - e^{-λG(x_i;θ)}}/(1 - e^{-λ}),   (4)

where G(x_0) = 0. If G(x; θ) is an absolutely continuous cdf, then the hazard function of the exp-G distribution is given by

h(x; λ, θ) = λ g(x; θ)/{1 - e^{-λS(x;θ)}},   (5)

where S(x; θ) = 1 - G(x; θ) is the survival function of a random variable with cdf G(·; θ). We now state several results regarding the relation between the exp-G and G distributions; the proofs can be found in the Appendix.

Proposition 2.1 Let X and X_λ have the G distribution and the exp-G distribution with parameter λ, respectively. Let also µ be the law of X, and µ_λ be the law of X_λ. Then: i) X and X_λ have the same support for all λ ≠ 0; ii) if X is continuous, singular or discrete, then X_λ is continuous, singular or discrete, respectively, for all λ ≠ 0; iii) µ_λ ≪ µ, that is, µ_λ is absolutely continuous with respect to µ.
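The construction in (2) and (3) is straightforward to implement numerically. The following is a minimal Python sketch (not part of the original paper) that builds the exp-G cdf and pdf from an arbitrary base cdf G, here a Weibull cdf chosen purely for illustration, and checks that F^G_λ is a genuine cdf and that it approaches G as λ → 0:

```python
import math

def expg_cdf(x, base_cdf, lam):
    """exp-G cdf (2): F(x) = (1 - exp(-lam*G(x))) / (1 - exp(-lam)), lam != 0."""
    return (1.0 - math.exp(-lam * base_cdf(x))) / (1.0 - math.exp(-lam))

def expg_pdf(x, base_cdf, base_pdf, lam):
    """exp-G pdf (3): f(x) = lam * g(x) * exp(-lam*G(x)) / (1 - exp(-lam))."""
    return lam * base_pdf(x) * math.exp(-lam * base_cdf(x)) / (1.0 - math.exp(-lam))

# Illustrative base distribution: Weibull with beta = 1, alpha = 2.
G = lambda x: 1.0 - math.exp(-x * x)
g = lambda x: 2.0 * x * math.exp(-x * x)

lam = 1.5
assert abs(expg_cdf(0.0, G, lam)) < 1e-12          # F = 0 at the left end
assert abs(expg_cdf(50.0, G, lam) - 1.0) < 1e-12   # F = 1 at the right end
assert expg_pdf(1.0, G, g, lam) > 0.0              # density positive on the support
for x in (0.3, 0.7, 1.5):                          # F -> G as lam -> 0
    assert abs(expg_cdf(x, G, 1e-8) - G(x)) < 1e-6
```

The same two functions work for any base cdf and for negative λ as well, which is how the special cases below are handled.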
Moreover, the Radon-Nikodym derivative of µ_λ with respect to µ can be computed explicitly; in the absolutely continuous case it is, almost surely, dµ_λ/dµ(x) = λe^{-λG(x;θ)}/(1 - e^{-λ}). We now give a characterization of our class of distributions through Shannon entropy. This entropy was introduced by Shannon (1948) and, for a random variable X with density f(·) with respect to a σ-finite measure µ (usually the Lebesgue or counting measure), is given by

H(f) = -E{log f(X)} = -∫ f(x) log f(x) dµ(x).   (6)

Jaynes (1957) introduced one of the most powerful techniques employed in probability and statistics, the maximum entropy method. This method is closely related to the Shannon entropy and considers the class of density functions

F = {f : E_f{T_i(X)} = α_i, i = 0, 1, . . . , m},

where the T_i(X), i = 1, . . . , m, are absolutely integrable functions with respect to f dµ, and T_0(X) = α_0 = 1. In the continuous case, the maximum entropy principle suggests deriving the unknown density of the random variable X as the model that maximizes the Shannon entropy (6), subject to the information constraints defining the class F. The maximum entropy distribution is the density in F, denoted by f_ME, obtained as the solution of the optimization problem f_ME = arg max_{f ∈ F} H(f). Jaynes (1957), on page 623, states that the maximum entropy distribution f_ME, obtained from the constrained maximization problem described above, "is the only unbiased assignment we can make; to use any other would amount to arbitrary assumption of information which by hypothesis we do not have." It is the distribution which does not incorporate exterior information beyond what is specified by the constraints. In order to obtain a maximum entropy characterization of our class of distributions, we now derive suitable constraints. For this, the next result plays an important role. We assume in Propositions 2.2 and 2.3 that the reference measure µ is the Lebesgue measure and that all the random variables involved are continuous.
Proposition 2.2 Let G be the distribution of a continuous random variable with pdf g(·), and let X be a random variable with pdf f(·) given by (3). Then

C1: E{G(X; θ)} = λ^{-1} - (e^λ - 1)^{-1},
C2: E{log g(X; θ)} = E{log g(G^{-1}(U; θ); θ)},

and the Shannon entropy of f(·) is given by

H(f) = -log{λ/(1 - e^{-λ})} - E{log g(X; θ)} + 1 - λ/(e^λ - 1),

where U follows the truncated exponential distribution with parameter λ and cdf given by (1). The next proposition shows that the class exp-G of distributions has maximum entropy in the class of all probability distributions specified by the constraints stated therein.

Proposition 2.3 The pdf f(·) of a random variable X, given by (3), is the unique solution of the optimization problem max_{f ∈ F} H(f) under the constraints C1 and C2 presented in Proposition 2.2.

We provide two asymptotic results for this class, obtained by letting the parameter λ tend to ±∞. These results will allow us to give an interpretation for this parameter. Since F*_λ(x) → x as λ → 0, we have, trivially, that X_λ converges in distribution to X as λ → 0. Therefore, the definition of the family exp-G by means of (2) with λ ∈ R is well founded. This fact plays an important role in our paper because it makes the family exp-G contain G as a particular case. The following result is very important, since regular distributions in Statistics enjoy many desirable properties.

Proposition 2.4 If G is a parametric regular probability distribution, with parameter space Θ, then so is the exp-G distribution, with respect to the parameter space R × Θ.

The proof follows from a simple verification of the conditions given in Lehmann and Casella (2003). The distribution may present very different behaviour for large absolute values of λ, showing that this is a rich class of distributions. Going further in the discussion of what happens when the absolute value of λ is large, we begin by noting that F^G_λ(x) tends to one as λ tends to infinity whenever x is such that G(x) > 0, and tends to zero otherwise.
Therefore, if X_λ follows an exp-G distribution, where G is any cdf, then X_λ converges vaguely to δ_a as λ → ∞, where a = inf{x; G(x) > 0}, 'v→' stands for vague convergence, and δ_a is the Dirac measure concentrated on a, that is, δ_a({a}) = 1. Note that we needed to consider vague convergence instead of convergence in distribution to allow for the case a = -∞, in which the limit of F^G_λ is 1, the function identically equal to one, which is not the cdf of a probability measure. However, we may interpret this case as a "probability measure" concentrated at -∞, that is, if a random variable X could follow 1, then pr(X ≤ x) = 1 for all x ∈ R. We now obtain the asymptotic behaviour as λ → -∞. For this case, a simple calculus argument allows us to conclude that F^G_λ(x) tends to zero whenever x is such that G(x) < 1, and tends to one otherwise. Therefore, if X_λ follows an exp-G distribution, where G is any cdf, then X_λ converges vaguely to δ_b as λ → -∞, where b = sup{x; G(x) < 1}. Note that we also needed to use vague convergence to include the case b = ∞, in which the limit of F^G_λ is 0, the function identically equal to zero, which, again, is not the cdf of a probability measure. However, we may, accordingly, interpret this case as a "probability measure" concentrated at ∞, that is, if a random variable X could follow 0, then pr(X ≤ x) = 0 for all x ∈ R. We see from this result that the parameter λ can be interpreted as a concentration parameter: it moves the exp-G distribution towards a degenerate distribution at a (if a is finite) as λ varies from zero to infinity, and towards a degenerate distribution at b (if b is finite) as λ varies from zero to minus infinity. Furthermore, if a equals minus infinity, the distribution moves towards the left side of the axis until the mass escapes entirely as λ tends to infinity. Analogously, when b equals infinity, the distribution moves towards the right side of the axis until the mass escapes entirely as λ tends to minus infinity. This family of distributions also enjoys a very interesting reciprocal property.
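The concentration effect described above can be seen numerically. A small sketch (illustrative only), with a uniform base distribution so that a = 0 and b = 1:

```python
import math

def expg_cdf(x, base_cdf, lam):
    # exp-G cdf (2)
    return (1.0 - math.exp(-lam * base_cdf(x))) / (1.0 - math.exp(-lam))

# Uniform base on [0, 1]: G(x) = x, so a = inf{x: G(x) > 0} = 0 and
# b = sup{x: G(x) < 1} = 1.
G = lambda x: min(max(x, 0.0), 1.0)

# As lam -> +infinity the mass concentrates near a = 0 ...
assert expg_cdf(0.05, G, 400.0) > 1.0 - 1e-6
# ... and as lam -> -infinity it concentrates near b = 1.
assert expg_cdf(0.95, G, -400.0) < 1e-6
```

For large |λ| essentially all of the probability mass already sits in an arbitrarily small neighbourhood of the relevant endpoint, which is the degenerate limit discussed above.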
We begin by introducing some notation: let X_G ∼ G and 1/X_G ∼ S, where G is continuous. Therefore, we have that if X^G_λ ∼ exp-G(λ, θ), then 1/X^G_λ ∼ exp-S(-λ, θ); that is, X^G_λ has cdf F^G_λ(x) and 1/X^G_λ has cdf F^S_{-λ}(x). This means that whenever we study a special case of the exp-G distribution, we may easily study the reciprocal case. For instance, in this paper we study the exp-Weibull and, from this result, we also obtain several properties of the exp-Fréchet distribution. We now give a useful expansion for the pdf (3). With this expansion, we can obtain mathematical properties such as ordinary moments, factorial moments and the moment generating function of the exp-G distribution from the G distribution. Expanding the term e^{-λG(x;θ)} in (3), it follows that

f(x; λ, θ) = λ g(x; θ)/(1 - e^{-λ}) Σ_{k=0}^∞ (-λ)^k G(x; θ)^k/k!.   (9)

Several distributions do not have a closed-form cdf but can be written in the form

G(x; θ) = Σ_{k=0}^∞ a_k x^{k+c},   (10)

where {a_k}_{k=0}^∞ is a sequence of real numbers and c ∈ R; this is the case, for instance, for the normal, gamma and beta distributions. For n a positive integer, we have

G(x; θ)^n = Σ_{k=0}^∞ c_{n,k} x^{nc+k},   (11)

where c_{n,0} = a_0^n and c_{n,m} = (m a_0)^{-1} Σ_{k=1}^m (kn - m + k) a_k c_{n,m-k} (Gradshteyn and Ryzhik, 2000). Using (10) and (11) in (9), we obtain a useful expansion for (3) when G(·; θ) has no closed form:

f(x; λ, θ) = λ g(x; θ)/(1 - e^{-λ}) Σ_{k=0}^∞ Σ_{m=0}^∞ (-λ)^k c_{k,m} x^{kc+m}/k!.   (12)

Let now X_1, . . . , X_n be a random sample with pdf of the form (3) and let X_{i:n} denote the ith order statistic. The pdf of X_{i:n}, say f_{i:n}, is given by

f_{i:n}(x) = n!/{(i - 1)!(n - i)!} f(x; λ, θ) {1 - e^{-λG(x;θ)}}^{i-1} {e^{-λG(x;θ)} - e^{-λ}}^{n-i}/(1 - e^{-λ})^{n-1}.   (13)

By using the binomial expansion for the terms {1 - e^{-λG(x;θ)}}^{i-1} and {e^{-λG(x;θ)} - e^{-λ}}^{n-i} in (13), it follows that

f_{i:n}(x) = Σ_{j=0}^{i-1} Σ_{k=0}^{n-i} w_{j,k} f_{j,k}(x),   (14)

for constants w_{j,k} arising from the binomial expansions, where f_{j,k}(·) denotes the pdf of a random variable with exp-G(λ(j + k + 1), θ) distribution. Therefore, the pdf of X_{i:n} can be written as a linear combination of pdfs of the form (3) and, hence, the mathematical properties of the order statistics can be obtained from the associated exp-G distributions. We hardly need to emphasize the necessity and importance of moments in any statistical analysis, especially in applied work.
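The reciprocal property can be verified directly from the cdfs. The sketch below (illustrative, with hypothetical parameter values) takes G to be a Weibull cdf, so that S is the cdf of the reciprocal (a Fréchet cdf), and checks the identity pr(1/X_λ ≤ y) = F^S_{-λ}(y) at a few points:

```python
import math

def expg_cdf(x, base_cdf, lam):
    # exp-G cdf (2)
    return (1.0 - math.exp(-lam * base_cdf(x))) / (1.0 - math.exp(-lam))

beta, alpha, lam = 2.0, 1.5, 0.8          # hypothetical parameter values
weib = lambda x: 1.0 - math.exp(-((x / beta) ** alpha))   # G, Weibull cdf
frechet = lambda y: math.exp(-((beta * y) ** (-alpha)))   # S, cdf of 1/X_G

# Claim: pr(1/X_lam <= y) = 1 - F^G_lam(1/y) coincides with F^S_{-lam}(y).
for y in (0.2, 0.5, 1.0, 3.0):
    lhs = 1.0 - expg_cdf(1.0 / y, weib, lam)
    rhs = expg_cdf(y, frechet, -lam)
    assert abs(lhs - rhs) < 1e-12
```

The agreement is exact up to floating-point error, since both sides reduce to (e^{λS(y)} - 1)/(e^{λ} - 1).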
Some of the most important features and characteristics of a distribution can be studied through moments, e.g., tendency, dispersion, skewness and kurtosis. We now give general expressions for the moments of the family exp-G of distributions. Let X and Y be random variables with exp-G(λ, θ) and G distributions, respectively. When G(·; θ) has closed form, a useful expression for the rth moment of the exp-G distribution follows from (9) and is given in terms of the probability weighted moments of Y:

E(X^r) = λ/(1 - e^{-λ}) Σ_{k=0}^∞ (-λ)^k/k! E{Y^r G(Y; θ)^k}.   (15)

In particular, formula (15) provides another proof of condition v) of Proposition 2.1. If G(·; θ) has no closed form, from (12) we obtain the rth moment of X in terms of the moments of Y:

E(X^r) = λ/(1 - e^{-λ}) Σ_{k=0}^∞ Σ_{m=0}^∞ (-λ)^k c_{k,m} E(Y^{r+kc+m})/k!.   (16)

In particular, if c is a non-negative integer, the moments of X are given in terms of the ordinary moments of Y. Finally, with result (14), the rth moment of the ith order statistic is given by

E(X^r_{i:n}) = Σ_{j=0}^{i-1} Σ_{k=0}^{n-i} w_{j,k} E(Z^r_{j,k}),   (17)

where Z_{j,k} has the exp-G(λ(j + k + 1), θ) distribution. The expansions (9), (12) and (14) are the main results of this Section and play an important role in this paper. Let X be a random variable with exp-G(λ, θ) distribution, with λ ≠ 0. The log-density of X with observed value x is given by

ℓ = ℓ(λ, θ) = log λ - log(1 - e^{-λ}) + log g(x; θ) - λG(x; θ),   (18)

and the associated score function is U = (∂ℓ/∂λ, ∂ℓ/∂θ)^⊤, where

∂ℓ/∂λ = 1/λ - e^{-λ}/(1 - e^{-λ}) - G(x; θ),   ∂ℓ/∂θ = U*(θ) - λ ∂G(x; θ)/∂θ,

with U*(θ) being the score function associated with the log-density of a random variable with pdf g(·; θ). From the regularity conditions, we have E{G(X; θ)} = λ^{-1} - (e^λ - 1)^{-1} and E{∂G(X; θ)/∂θ} = λ^{-1} E{U*(θ)}. The information matrix K = K((λ, θ)^⊤) has blocks

κ_{λλ} = λ^{-2} - e^λ(e^λ - 1)^{-2},   κ_{λθ} = E{∂G(X; θ)/∂θ},   κ_{θθ} = E{-∂U*(θ)/∂θ^⊤} + λ E{∂²G(X; θ)/∂θ∂θ^⊤}.

For a random sample x = (x_1, . . . , x_n) of size n from X and Θ = (λ, θ)^T, the total log-likelihood is ℓ_n = Σ_{i=1}^n ℓ^{(i)}, where ℓ^{(i)} is the log-likelihood for the ith observation (i = 1, . . . , n) as given before. The total score function is U_n = Σ_{i=1}^n U^{(i)}, where U^{(i)} for i = 1, . . . , n has the form given earlier, and the total information matrix is K_n(Θ) = nK(Θ).
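The identity E{G(X; θ)} = λ^{-1} - (e^λ - 1)^{-1} can be checked numerically. A minimal sketch, assuming a Weibull base with hypothetical parameter values, integrates G(x)f(x) by Simpson's rule:

```python
import math

lam, beta, alpha = 1.7, 1.0, 2.0   # hypothetical parameter values
G = lambda x: 1.0 - math.exp(-((x / beta) ** alpha))
g = lambda x: (alpha / beta) * (x / beta) ** (alpha - 1.0) * math.exp(-((x / beta) ** alpha))
f = lambda x: lam * g(x) * math.exp(-lam * G(x)) / (1.0 - math.exp(-lam))  # pdf (3)

def simpson(h, a, b, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (b - a) / n
    s = h(a) + h(b)
    for i in range(1, n):
        s += h(a + i * step) * (4 if i % 2 else 2)
    return s * step / 3.0

lhs = simpson(lambda x: G(x) * f(x), 0.0, 12.0)   # E{G(X)} numerically
rhs = 1.0 / lam - 1.0 / (math.exp(lam) - 1.0)     # closed-form identity
assert abs(lhs - rhs) < 1e-6
```

The check works for any base cdf, since G(X) always follows the truncated exponential distribution (1).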
The maximum likelihood estimator (MLE) Θ̂ of Θ is obtained numerically by solving the non-linear system of equations U_n = 0. Under regularity conditions, which are fulfilled for Θ in the interior of the parameter space but not on the boundary, the asymptotic distribution of the MLE is given by

√n(Θ̂ - Θ) ∼_A N_{k+1}(0, K(Θ)^{-1}),

where '∼_A' stands for the asymptotic distribution. The asymptotic multivariate normal N_{k+1}(0, K_n(Θ̂)^{-1}) distribution of Θ̂ can be used to construct approximate confidence regions for the parameters and for the hazard and survival functions. In fact, a 100(1 - γ)% asymptotic confidence interval for each parameter Θ_i is given by

(Θ̂_i - z_{γ/2} (κ̂^{Θ_i,Θ_i})^{1/2}, Θ̂_i + z_{γ/2} (κ̂^{Θ_i,Θ_i})^{1/2}),

where κ̂^{Θ_i,Θ_i} denotes the ith diagonal element of K_n(Θ̂)^{-1} for i = 1, . . . , k + 1 and z_{γ/2} is the 1 - γ/2 quantile of the standard normal distribution. The asymptotic normality is also useful for testing the goodness of fit of the exp-G distribution and for comparing this distribution with some of its special submodels by using one of the three well-known asymptotically equivalent test statistics, namely the likelihood ratio (LR), Rao score (S_R) and Wald (W) statistics. Consider the partition Θ = (Θ_1^T, Θ_2^T)^T of the vector of parameters of the exp-G distribution. The total score function and the total Fisher information matrix and its inverse are assumed to be partitioned in the same way as Θ. The LR statistic for testing the null hypothesis H_0: Θ_1 = Θ_1^{(0)} versus the alternative hypothesis H_1: Θ_1 ≠ Θ_1^{(0)} is given by w = 2{ℓ(Θ̂) - ℓ(Θ̃)}, where Θ̃ and Θ̂ denote the MLEs under the null and the alternative hypotheses, respectively. The statistic w is asymptotically (as n → ∞) distributed as χ²_q, where q is the dimension of the vector Θ_1 of interest. The score statistic for testing H_0 is

S_R = Ũ_1^⊤ K̃^{11} Ũ_1,

where U_1 and K^{11} are the components of U_n and K_n^{-1} corresponding to Θ_1, evaluated at the restricted estimate Θ̃.
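As a concrete illustration of likelihood estimation in this family (a sketch, not the authors' computational procedure), take G uniform on [0, 1], so that the exp-G model reduces to the truncated exponential distribution (1) and the score equation for λ can be solved by Newton's method; the sample is drawn by inversion of the cdf (2):

```python
import math, random

# Inversion sampling: if V ~ Uniform(0,1), then
# X = G^{-1}( -log(1 - V*(1 - exp(-lam))) / lam ) has cdf (2).
# With G(x) = x on [0,1], G^{-1} is the identity and X is truncated exponential.
random.seed(42)
lam_true, n = 2.0, 20000
sample = [-math.log(1.0 - random.random() * (1.0 - math.exp(-lam_true))) / lam_true
          for _ in range(n)]

# Score of the exp-G log-likelihood with G(x) = x:
# U(lam) = n/lam - n*exp(-lam)/(1 - exp(-lam)) - sum(x_i); solve U(lam) = 0.
s = sum(sample)
def score(l):  return n / l - n * math.exp(-l) / (1.0 - math.exp(-l)) - s
def dscore(l): return -n / l ** 2 + n * math.exp(l) / (math.exp(l) - 1.0) ** 2

lam_hat = 1.0
for _ in range(50):                       # Newton-Raphson iterations
    lam_hat -= score(lam_hat) / dscore(lam_hat)

assert abs(lam_hat - lam_true) < 0.15     # close to the true value for large n
```

For the full exp-G(λ, θ) model the same idea applies with the complete score vector U_n, typically via a general-purpose optimizer.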
The score statistic S_R is asymptotically distributed as χ²_q and has an advantage over the LR statistic in that it only requires estimation under the null hypothesis, although it needs the inverse Fisher information matrix. The Wald statistic for testing the null hypothesis is W = (Θ̂_1 - Θ_1^{(0)})^⊤ (K̂^{11})^{-1} (Θ̂_1 - Θ_1^{(0)}), where K^{11} is the component of the inverse information matrix K_n^{-1} corresponding to Θ_1, evaluated at Θ̂. The Wald statistic W also has, under H_0, an asymptotic χ²_q distribution. The Wald and score statistics are widely used in practice, and our derivation of the information matrix will be very convenient when modelling with exp-G distributions. Since λ is a parameter added to some distribution, it can be seen as a nuisance parameter. With this in mind, we advance a modified profile estimator for θ. If ∂G(x; θ)/∂θ^⊤ ∂G(x; θ)/∂θ, which belongs to R, does not vanish for all values in some open neighbourhood of the true value of θ, the score equations can be solved for λ in terms of θ, yielding an estimate λ̂ = λ̂(θ) that involves ∂²G(x; θ)/∂θ^⊤∂θ^⊤, the row vector containing the diagonal elements of the Hessian matrix of G, ∂²G(x; θ)/∂θ∂θ^⊤, and ∂U*(θ)/∂θ, the column vector (∂U*_1(θ)/∂θ_1, . . . , ∂U*_k(θ)/∂θ_k)^⊤. We therefore obtain the modified profile likelihood function

l = l(θ) = log λ̂ - log(1 - e^{-λ̂}) + log g(x; θ) - λ̂G(x; θ).

The modified profile estimator of θ is obtained by maximizing l. Letting V denote the associated estimating function, one may also obtain the modified profile estimator by solving the equation V_n(θ) = Σ_{i=1}^n V^{(i)}(θ) = 0. We now discuss estimation and inference when λ = 0. It is very important to discuss this case because we are interested in testing the hypotheses H_0: λ = 0 versus H_1: λ ≠ 0, that is, in testing whether the exp-G fit is significantly better than the G fit. The next result plays an important role in this paper.

Theorem 2.5 Let F_λ(·) and f_λ(·) be the cdf and pdf defined by (2) and (3), respectively.
The following conditions hold: i) F_λ(·) converges uniformly to G as λ → 0; ii)-iv) analogous convergence properties hold for the density and the associated score quantities; v) the MLE is consistent and asymptotically normal when the true parameter is (λ_0, θ_0) ∈ Θ, with -∫ ∂U*(θ)/∂θ^⊤ dG being the information matrix with respect to G; vi) if G is regular and (λ_0, θ_0) ∈ Θ, then the likelihood ratio, Wald and score statistics have asymptotic null distribution χ²_q, where q is the number of parameters estimated under the alternative hypothesis minus the number estimated under the null hypothesis.

We now move to the class of distributions exp-G when G is the cdf of the Weibull distribution; we call this class exp-Weibull. More precisely, to obtain the exp-Weibull distribution we take in (2) the cdf of the Weibull distribution, G(x) = 1 - exp{-(x/β)^α}, where β > 0, α > 0 and x > 0. Therefore, the cdf of the exp-Weibull distribution is given by

F(x) = (1 - exp[-λ{1 - e^{-(x/β)^α}}])/(1 - e^{-λ}),   x > 0.

From the general expressions (3) and (5) we obtain the pdf and hazard functions,

f(x) = (λα/β)(x/β)^{α-1} e^{-(x/β)^α} exp[-λ{1 - e^{-(x/β)^α}}]/(1 - e^{-λ})

and

h(x) = (λα/β)(x/β)^{α-1} e^{-(x/β)^α}/{1 - exp(-λe^{-(x/β)^α})},

respectively. We now illustrate the flexibility of this class of distributions by presenting some plots of the pdf and hazard functions. Figure 1 shows plots of the pdf of the exp-Weibull distribution for some values of α and β, and for λ = -5, -1, 0, 1, 5, ∞. We note that as the value of λ increases the pdf becomes more 'peaked'. Figure 2 contains plots of the hazard function of the exp-Weibull distribution for different values of α and β and λ = -5, -1, 0, 1, 5, ∞. We note that the behaviour of the hazard function of the Weibull distribution is close to that of the plots with λ = 1.0 and, as the value of λ increases, the hazard function of the exp-Weibull becomes very different from that of the Weibull distribution, showing that as λ gets larger the exp-Weibull "moves away" from the Weibull distribution and gets closer to the Dirac mass at zero, as remarked at the end of the last Section.
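As a sanity check on the hazard, the sketch below (illustrative parameter values) verifies numerically that the closed form λg(x)/{1 - e^{-λS(x)}} agrees with the definition h = f/(1 - F) for the exp-Weibull distribution:

```python
import math

lam, beta, alpha = 1.2, 2.0, 1.5   # hypothetical parameter values

G = lambda x: 1.0 - math.exp(-((x / beta) ** alpha))   # Weibull cdf
g = lambda x: (alpha / beta) * (x / beta) ** (alpha - 1.0) * math.exp(-((x / beta) ** alpha))

def f(x):  # exp-Weibull pdf, from the general form (3)
    return lam * g(x) * math.exp(-lam * G(x)) / (1.0 - math.exp(-lam))

def F(x):  # exp-Weibull cdf, from (2)
    return (1.0 - math.exp(-lam * G(x))) / (1.0 - math.exp(-lam))

def h(x):  # hazard in the closed form lam*g(x) / (1 - exp(-lam*S(x))), S = 1 - G
    return lam * g(x) / (1.0 - math.exp(-lam * (1.0 - G(x))))

# The closed form agrees with the definition h = f / (1 - F).
for x in (0.3, 1.0, 2.5, 5.0):
    assert abs(h(x) - f(x) / (1.0 - F(x))) < 1e-10
```

The equivalence follows by dividing the numerator and denominator of f/(1 - F) by e^{-λG(x)}.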
The pdf of the ith order statistic of a random sample from the exp-W(λ, β, α) distribution follows from (13). We now obtain series representations for the moments of the exp-Weibull distribution and of its order statistics. To this end, let X be a random variable following the exp-Weibull distribution with parameters β > 0, α > 0 and λ > 0. From now on, we use the notation X ∼ exp-Weibull(λ, β, α) to indicate this fact. The probability weighted moments of a random variable Y following the Weibull distribution with parameter vector θ = (β, α)^⊤ can be written as

E{Y^r G(Y; θ)^j} = β^r ∫_0^1 (-log u)^{r/α} (1 - u)^j du.

Therefore, from (15) it follows that the rth moment of X is

E(X^r) = λβ^r/(1 - e^{-λ}) Σ_{k=0}^∞ (-λ)^k/k! ∫_0^1 (-log u)^{r/α} (1 - u)^k du.   (19)

[Figure 1: Plots of the pdf of the exp-Weibull distribution for some values of the parameters.]

We now give a simpler alternative expression to (19). Expanding exp{λe^{-(x/β)^α}} in a Taylor series, we get

E(X^r) = λe^{-λ}/(1 - e^{-λ}) Σ_{k=0}^∞ λ^k E(Y_k^r)/{k!(k + 1)},

where Y_k follows the Weibull distribution with scale parameter β(k + 1)^{-1/α} and shape parameter α, the interchange between series and integral being possible by Fubini's theorem together with the fact that we are dealing with a positive integrand. Hence, the rth moment of an exp-Weibull distribution can be written as

E(X^r) = λe^{-λ}β^r Γ(r/α + 1)/(1 - e^{-λ}) Σ_{k=0}^∞ λ^k/{k!(k + 1)^{r/α+1}}.   (20)

Figure 3 shows the skewness and kurtosis of the exp-Weibull distribution, obtained from the formula for the moments above, for β = 0.5 and some values of α, as functions of λ. We now note from (20) that all moments of the exp-Weibull distribution tend to zero as λ increases to infinity, which is a very remarkable fact. So, as we can note from Figure 3, as λ increases the skewness tends to zero, as does the kurtosis, once more reflecting the expected behaviour of the limiting distribution as λ → ∞.
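The series expression (20) can be validated against direct numerical integration of x^r f(x); a minimal sketch with hypothetical parameter values:

```python
import math

lam, beta, alpha, r = 1.0, 1.0, 2.0, 1   # hypothetical values; r-th moment

# Series (20): E(X^r) = lam e^{-lam} beta^r Gamma(r/alpha + 1) / (1 - e^{-lam})
#              * sum_k lam^k / (k! (k+1)^{r/alpha + 1})
tail = sum(lam ** k / (math.factorial(k) * (k + 1) ** (r / alpha + 1.0))
           for k in range(60))
moment_series = (lam * math.exp(-lam) * beta ** r * math.gamma(r / alpha + 1.0)
                 / (1.0 - math.exp(-lam)) * tail)

# Direct numerical integration of x^r f(x), with f the exp-Weibull pdf.
def f(x):
    u = (x / beta) ** alpha
    g = (alpha / beta) * (x / beta) ** (alpha - 1.0) * math.exp(-u)
    return lam * g * math.exp(-lam * (1.0 - math.exp(-u))) / (1.0 - math.exp(-lam))

n, upper = 40000, 12.0
step = upper / n
moment_quad = 0.0
for i in range(1, n):                    # Simpson rule; endpoint terms vanish here
    x = i * step
    moment_quad += (x ** r) * f(x) * (4 if i % 2 else 2)
moment_quad *= step / 3.0

assert abs(moment_series - moment_quad) < 1e-5
```

The rapid convergence of the series (factorial denominators) makes a few dozen terms ample for moderate λ.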
An expression for the rth moment of the ith order statistic X_{i:n} of the exp-Weibull distribution follows from (17) and (20):

E(X^r_{i:n}) = Σ_{j=0}^{i-1} Σ_{k=0}^{n-i} w_{j,k} E(Z^r_{j,k}),   (21)

where Z_{j,k} ∼ exp-Weibull(λ(j + k + 1), β, α). Expressions (19) and (21) show the importance of the expansions given in Subsection 2.3. Furthermore, result (20) shows that alternative expressions to (15) and (16) can be obtained depending on the G distribution. In this brief subsection we use the reciprocal property of the exp-G distributions to obtain expressions for the moments and order statistics of the exp-Fréchet distribution. Let Y ∼ exp-Fr(λ, β, α) and let Y_{i:n} be the ith order statistic of a random sample of size n from the exp-Fréchet distribution. From formulae (20) and (21), the moments of Y and Y_{i:n} follow, for r < α. Let θ = (λ, β, α)^T be the parameter vector and X a random variable with exp-Weibull(λ, β, α) distribution. The log-density ℓ = ℓ(θ) of X with observed value x is given by

ℓ(θ) = log λ - log(1 - e^{-λ}) + log α - α log β + (α - 1) log x - (x/β)^α - λ{1 - e^{-(x/β)^α}}.

The score function is obtained by differentiating ℓ(θ) with respect to λ, β and α, and from the regularity conditions one obtains closed-form expressions for the expected values of its components. For interval estimation and hypothesis tests on the model parameters, we require the information matrix, and we therefore use some of the expressions above to obtain Fisher's information matrix. The elements of the 3 × 3 unit information matrix depend on some expectations that can be easily obtained through numerical integration. Let Y be a random variable following the standard beta distribution with parameters a > 0 and b > 0. The cdf of Y is given by G(x; (a, b)^⊤) = I_x(a, b), where I_x(a, b) = B(a, b)^{-1} ∫_0^x t^{a-1}(1 - t)^{b-1} dt denotes the incomplete beta function ratio and B(a, b) = ∫_0^1 t^{a-1}(1 - t)^{b-1} dt is the beta function. The exp-beta distribution is introduced by taking G as the cdf of Y in (2). We will denote a random variable X with exp-beta distribution by X ∼ exp-beta(λ, a, b).
The pdf and cdf of the exp-beta distribution are given by

f(x) = λx^{a-1}(1 - x)^{b-1} exp{-λI_x(a, b)}/{B(a, b)(1 - e^{-λ})}

and

F(x) = {1 - exp(-λI_x(a, b))}/(1 - e^{-λ}),   x ∈ (0, 1),

respectively. Figure 4 shows plots of the pdf of the exp-beta distribution for some values of a and b, and for λ = -∞, -10, -3, 0, 3, 10, ∞. Observe that the density of the beta(2,1) distribution is a straight line, whereas the densities of the exp-beta(λ,2,1) distributions may assume various shapes, such as unimodal and strictly increasing. The cdf of Y can be written in the form (10); to see this, one uses the series expansion of the incomplete beta function. Let X_1, . . . , X_n be a random sample from the exp-beta(λ, a, b) distribution and denote the ith order statistic by X_{i:n}. The pdf of X_{i:n}, say f_{i:n}, is obtained from (13) for x ∈ (0, 1). We now give an expression for the rth moment of the exp-beta distribution. Consider X ∼ exp-beta(λ, a, b). We have seen that the pdf of X can be written in the form (12), and that E(Y^v) = B(v + a, b)/B(a, b) for v > 0. With these results, from (15) and (17) it follows easily that the moments of X and X_{i:n} can be obtained; once more, the expansions of Subsection 2.3 prove their usefulness. Figure 5 shows the skewness and kurtosis of the exp-beta distribution, obtained from the formula for the moments above, for a = 2 and some values of b, as functions of λ. Let X be a random variable with exp-beta(λ, a, b) distribution and θ = (λ, a, b)^T be the parameter vector, with λ ≠ 0. The log-density ℓ = ℓ(θ) of X with observed value x is given by

ℓ(θ) = log λ - log(1 - e^{-λ}) - log B(a, b) + (a - 1) log x + (b - 1) log(1 - x) - λI_x(a, b),

and the score function involves Ψ(y) = d log Γ(y)/dy. Under the usual regularity conditions, the expected value of the score function vanishes; hence we obtain the elements of the information matrix, which depend on some expectations that can be easily obtained through numerical integration. Our aim in this Section is to motivate the use of the class exp-G of distributions by showing a successful application to a real data set.
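Since I_x(a, b) has no elementary closed form in general, the exp-beta pdf is conveniently handled numerically. The sketch below (illustrative only, with hypothetical parameter values) tabulates I_x(a, b) by a cumulative trapezoid rule and checks that the exp-beta pdf integrates to one:

```python
import math

lam, a, b = 2.0, 2.0, 3.0
Bab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)   # beta function B(a, b)
beta_pdf = lambda t: t ** (a - 1.0) * (1.0 - t) ** (b - 1.0) / Bab

# March across (0, 1): I_x accumulates the incomplete beta ratio I_x(a, b) by
# trapezoids, while `total` accumulates the exp-beta pdf mass by the midpoint rule.
n = 20000
step = 1.0 / n
total, I_x = 0.0, 0.0
prev = beta_pdf(0.0)
for i in range(1, n + 1):
    x = i * step
    cur = beta_pdf(x)
    I_mid = I_x + 0.25 * (prev + cur) * step          # I at the cell midpoint (approx.)
    x_mid = x - 0.5 * step
    f_mid = lam * beta_pdf(x_mid) * math.exp(-lam * I_mid) / (1.0 - math.exp(-lam))
    total += f_mid * step
    I_x += 0.5 * (prev + cur) * step                  # advance the cumulative trapezoid
    prev = cur

assert abs(total - 1.0) < 1e-4                        # exp-beta pdf integrates to one
assert abs(I_x - 1.0) < 1e-6                          # I_1(a, b) = 1
```

In practice one would use a library routine for the regularized incomplete beta function; the tabulation here only keeps the sketch self-contained.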
We fit the exp-Weibull distribution to the data set given by Birnbaum and Saunders (1969). We test the null hypothesis H_0: Weibull model against the alternative hypothesis H_1: exp-Weibull model. The LR statistic is 9.5453, with p-value 2 × 10^{-3}. Hence, at any usual significance level, we reject the null model (Weibull) in favour of the alternative exp-Weibull model. Figure 6 displays the empirical density and the fitted exp-Weibull and Weibull densities. We see that the exp-Weibull distribution yields a better fit than the Weibull distribution. We have defined a family of distributions that provides a rather general and flexible framework for statistical analysis and a flexible mechanism for fitting a wide spectrum of real-world data sets. Several properties of this class of distributions were obtained, such as the Kullback-Leibler divergence between the G and exp-G distributions, a characterization based on Shannon entropy, moments, order statistics, estimation of the parameters and inference. We then moved to two special distributions, the exp-Weibull and exp-beta distributions, which were studied in some detail. The article was motivated by a successful application to fatigue life data. A similar computation applies when x is a discontinuity point of G. Suppose that G is discontinuous at the points {x_1, . . .}, and let G_c be the continuous part of G (the sum of the absolutely continuous and singular parts of G); then it is easy to observe that the jumps of F_λ at the points x_i ≤ x contribute

Σ_i 1_{x_i ≤ x} [exp{-λG(x_{i-1})} - exp{-λG(x_i)}]/(1 - e^{-λ}),

where 1_A(x) is the indicator function of the set A, which concludes the proof of iii). The proof of iv) is a simple application of iii) and, to prove v), one uses iii) and the inequality in equation (22) for λ > 0, together with a similar inequality for λ < 0. Let z(·) be a pdf which satisfies the constraints C1 and C2.
The Kullback-Leibler divergence between z and f is non-negative and vanishes if and only if z = f almost everywhere; with this, we follow Cover and Thomas (1991) to conclude the proof.

Proof of Theorem 2.5. (i) It is well known from real analysis that if a sequence of bounded, monotone and right-continuous functions converges pointwise to a continuous function, then the convergence is uniform. Therefore, since the F_λ satisfy these conditions and converge to G, which is a continuous function, the proof of (i) is complete. (ii)-(iv) The proofs are easily checked. (v) For (λ_0, θ_0) ∈ Θ with λ_0 ∈ R \ {0}, the result follows from Proposition 2.2 and Theorem 6.5.1 of Lehmann and Casella (2003). For λ_0 = 0, we use the results in (ii)-(iv) to adapt the proof given in Lehmann and Casella (2003) to our case. We begin by showing that √n λ̂ →_d N(0, 12) as n → ∞. The following lemma will be useful in this proof; its proof is similar to the usual one, but with the derivative replaced by the derivative from the right (and clearly the same result holds for the derivative from the left). Applying the lemma to the log-likelihood associated with the pdf f(x; λ) = λe^{-λx}/(1 - e^{-λ}), i.e., G(x; θ) = x, it follows that

ℓ′(λ̂) = ℓ′(0) + λ̂ℓ″(0) + (λ̂²/2)ℓ‴(λ̄).

By assumption, ℓ′(λ̂) = 0; thus

√n λ̂ = n^{-1/2}(n/2 - Σ_{j=1}^n x_j)/{12^{-1} - (2n)^{-1} λ̂ℓ‴(λ̄)}.

Since n^{-1} λ̂ℓ‴(λ̄) → 0 in probability and n^{-1/2}(n/2 - Σ_{j=1}^n x_j) →_d N(0, 1/12) as n → ∞, we conclude that √n λ̂ →_d N(0, 12). The rest of the proof is analogous to the one given in Lehmann and Casella (2003), where one may use the results in (ii)-(iv) to ensure that all the arguments hold true. (vi) It follows from the asymptotic normality of Θ̂; see Lehmann and Romano (2008) for more details.
