Counting and Locating the Solutions of Polynomial Systems of Maximum Likelihood Equations, II: The Behrens-Fisher Problem
Let $\mu$ be a $p$-dimensional vector, and let $\Sigma_1$ and $\Sigma_2$ be $p \times p$ positive definite covariance matrices. On being given random samples of sizes $N_1$ and $N_2$ from independent multivariate normal populations $N_p(\mu,\Sigma_1)$ and $N_p(\mu,\Sigma_2)$, respectively, the Behrens-Fisher problem is to solve the likelihood equations for estimating the unknown parameters $\mu$, $\Sigma_1$, and $\Sigma_2$. We shall prove that for $N_1, N_2 > p$ there are, almost surely, exactly $2p+1$ complex solutions of the likelihood equations. For the case in which $p = 2$, we utilize Monte Carlo simulation to estimate the relative frequency with which a typical Behrens-Fisher problem has multiple real solutions; we find that multiple real solutions occur infrequently.
💡 Research Summary
The paper investigates the algebraic structure of the maximum-likelihood (ML) equations that arise in the multivariate Behrens-Fisher problem. In this setting, two independent samples of sizes $N_1$ and $N_2$ are drawn from multivariate normal populations $N_p(\mu,\Sigma_1)$ and $N_p(\mu,\Sigma_2)$. The goal is to estimate the common mean vector $\mu$ together with the two covariance matrices $\Sigma_1$ and $\Sigma_2$ by solving the likelihood equations.
Derivation of the likelihood equations
Using the sample means $\bar X,\bar Y$ and the usual unbiased sample covariance matrices $S_1,S_2$, the authors write the likelihood equations in the familiar form (2.1)–(2.2). A key observation is that once $\mu$ is fixed, the covariance matrices are uniquely determined by the simple updates (2.5)–(2.6). Consequently the whole system can be reduced to a set of $p$ cubic equations in the components of $\mu$ alone. This reduction eliminates the $p(p+1)/2$ unknown entries of each covariance matrix and reveals the true algebraic degree of the problem.
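For intuition, this reduction can be carried out symbolically in the simplest case $p=1$. The sketch below (a hypothetical illustration, not the paper's code) substitutes the profiled variances into the score equation for $\mu$ and checks that clearing denominators leaves a polynomial of degree $2p+1=3$:

```python
import sympy as sp

# p = 1 illustration (hypothetical sketch; the paper treats general p).
# xbar, ybar are the sample means; s1, s2 the mean squared deviations.
mu, xbar, ybar = sp.symbols('mu xbar ybar', real=True)
s1, s2, n1, n2 = sp.symbols('s1 s2 n1 n2', positive=True)

# Profiled variance estimates for fixed mu (the univariate analogues of (2.5)-(2.6)):
sig1 = s1 + (xbar - mu)**2
sig2 = s2 + (ybar - mu)**2

# Score equation for mu after substituting the profiled variances:
score = n1*(xbar - mu)/sig1 + n2*(ybar - mu)/sig2

# Clearing denominators leaves a single polynomial equation in mu:
cubic = sp.expand(sp.numer(sp.together(score)))
print(sp.degree(cubic, mu))  # 3, i.e. 2p + 1 for p = 1
```

The leading coefficient of the cubic is $-(N_1+N_2)$, which never vanishes, so the degree count is exact for every data set.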
Algebraic‑geometric counting
The central question is how many (complex) solutions the reduced system possesses. The authors invoke a theorem of Catanese et al. (2006) concerning critical points of a product of powers of polynomials. In the Behrens-Fisher case the relevant polynomials have degree 2 (they are the denominators $D_X(\mu)$ and $D_Y(\mu)$ that appear after clearing fractions). Applying the theorem yields an upper bound given by the coefficient of $z^p$ in the generating function $(1-z)^p/(1-2z)^2$. Expanding this series shows that the coefficient equals $2p+1$.
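This coefficient extraction is easy to verify symbolically. The sketch below expands the generating function $(1-z)^p/(1-2z)^2$ (the ratio form, under which the coefficient of $z^p$ indeed equals $2p+1$) for small $p$:

```python
import sympy as sp

z = sp.symbols('z')

def ml_degree_bound(p):
    """Coefficient of z**p in (1 - z)**p / (1 - 2*z)**2."""
    f = (1 - z)**p / (1 - 2*z)**2
    return f.series(z, 0, p + 1).removeO().coeff(z, p)

print([ml_degree_bound(p) for p in range(1, 6)])  # [3, 5, 7, 9, 11]
```

Each value matches $2p+1$, in agreement with the bound stated above.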
Genericity of the data
For the bound to be tight, the data must be “generic”, i.e., the two quadratic forms $D_X(\mu)$ and $D_Y(\mu)$ must have no common zero. Lemma 3.4 proves that for almost all data sets the coefficients of these quadratics are generic; this follows from the fact that any positive-definite matrix can be expressed as a sum of outer products of sample vectors (Lemma 3.3). Hence, with probability one (under a continuous sampling model) the ML system has exactly $2p+1$ complex solutions.
Real solutions and multiplicity
Because the total count $2p+1$ is odd and non-real solutions occur in conjugate pairs, at least one solution is real for any data set. The authors call the number $2p+1$ the “ML degree” of the Behrens-Fisher problem. For $p=1$ this reproduces the known result of three complex roots (the cubic likelihood equation in the univariate case). For $p=2$ they obtain five complex roots. To assess how often more than one real root occurs, they conduct a large Monte Carlo study for $p=2$. The empirical frequency of multiple real solutions is below 5%, indicating that in practice the likelihood surface is usually unimodal with a unique real maximizer.
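The flavor of such a simulation can be reproduced in the simplest case $p=1$, where the likelihood equations reduce to a cubic in $\mu$ whose real roots can be counted directly. The sketch below is a hypothetical univariate analogue of the paper's $p=2$ study; the sample sizes, variance ratio, and tolerance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2024)

def num_real_roots(x, y, tol=1e-8):
    """Number of real roots of the p = 1 Behrens-Fisher cubic in mu."""
    n1, n2 = len(x), len(y)
    xb, yb = x.mean(), y.mean()
    s1, s2 = x.var(), y.var()  # mean squared deviations (ddof=0)
    mu = np.polynomial.Polynomial([0.0, 1.0])
    # Cleared-denominator score equation: a cubic polynomial in mu.
    cubic = n1*(xb - mu)*(s2 + (yb - mu)**2) + n2*(yb - mu)*(s1 + (xb - mu)**2)
    return int(np.sum(np.abs(cubic.roots().imag) < tol))

# Illustrative simulation: equal means, unequal variances, small samples.
trials = 2000
multiple = sum(
    num_real_roots(rng.normal(0.0, 1.0, 10), rng.normal(0.0, 3.0, 10)) == 3
    for _ in range(trials)
)
print(f"fraction of trials with three real roots: {multiple / trials:.3f}")
```

A cubic with real coefficients has either one or three real roots, so the counter directly flags the multiple-solution cases.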
Computational implications
The proof that all solutions can be obtained from a single univariate polynomial of degree $2p+1$ follows from Gröbner-basis theory and the algorithm described by Hoşten et al. (2005). Thus a practical numerical approach is to compute this univariate polynomial (e.g., via elimination) and then find its roots; the real roots correspond to all admissible ML estimates. This provides a deterministic alternative to iterative schemes that may converge to saddle points.
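This deterministic strategy can be illustrated in the case $p=1$, where the eliminated univariate polynomial is the cubic itself: find all real critical points, then pick the one maximizing the profile log-likelihood. The function name and the sample data below are hypothetical:

```python
import numpy as np

def bf_mle_p1(x, y):
    """Real critical points of the p = 1 profile likelihood, and the maximizer.
    Sketch only: for p = 1 the eliminated univariate polynomial is the cubic itself."""
    n1, n2 = len(x), len(y)
    xb, yb = x.mean(), y.mean()
    s1, s2 = x.var(), y.var()
    mu = np.polynomial.Polynomial([0.0, 1.0])
    cubic = n1*(xb - mu)*(s2 + (yb - mu)**2) + n2*(yb - mu)*(s1 + (xb - mu)**2)
    roots = cubic.roots()
    crit = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
    # Profile log-likelihood (up to an additive constant):
    loglik = lambda m: -0.5*n1*np.log(s1 + (xb - m)**2) - 0.5*n2*np.log(s2 + (yb - m)**2)
    return crit, crit[np.argmax(loglik(crit))]

crit, mle = bf_mle_p1(np.array([1.2, 0.8, 1.1, 0.9]),
                      np.array([2.0, 3.0, 2.5, 3.5]))
print(crit, mle)
```

Every real critical point lies between the two sample means (the score is positive below both and negative above both), which gives a quick sanity check on the roots returned.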
Extension to MANOVA
The authors generalize the analysis to the case of $k+1$ independent multivariate normal groups (the MANOVA problem). By the same reasoning, the ML degree becomes the coefficient of $z^p$ in the generating function

$$\frac{(1-z)^p}{(1-2z)^{k+1}},$$

which reduces to the Behrens-Fisher count $2p+1$ when $k=1$.
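Assuming the $(k+1)$-group generating function takes the form $(1-z)^p/(1-2z)^{k+1}$ (a reconstruction consistent with the two-sample case), the resulting ML degrees can be tabulated symbolically; setting $k=1$ recovers the Behrens-Fisher count $2p+1$:

```python
import sympy as sp

z = sp.symbols('z')

def manova_ml_degree(p, k):
    """Coefficient of z**p in (1 - z)**p / (1 - 2*z)**(k + 1) -- reconstructed form."""
    f = (1 - z)**p / (1 - 2*z)**(k + 1)
    return f.series(z, 0, p + 1).removeO().coeff(z, p)

# k = 1 (two groups) recovers the Behrens-Fisher ML degree 2p + 1:
print([manova_ml_degree(p, 1) for p in range(1, 5)])  # [3, 5, 7, 9]
```

For $p=1$ and $k+1$ groups the same formula gives $2k+1$, matching the classical degree of the univariate common-mean likelihood equation.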