Nonparametric Linear Discriminant Analysis for High Dimensional Matrix-Valued Data
This paper addresses classification problems with matrix-valued data, which commonly arise in applications such as neuroimaging and signal processing. Building on the assumption that the data from each class follows a matrix normal distribution, we propose a novel extension of Fisher’s Linear Discriminant Analysis (LDA) tailored for matrix-valued observations. To effectively capture structural information while maintaining estimation flexibility, we adopt a nonparametric empirical Bayes framework based on Nonparametric Maximum Likelihood Estimation (NPMLE), applied to vectorized and scaled matrices. The NPMLE method has been shown to provide robust, flexible, and accurate estimates for vector-valued data with various structures in the mean vector or covariance matrix. By leveraging its strengths, our method is effectively generalized to the matrix setting, thereby improving classification performance. Through extensive simulation studies and real data applications, including electroencephalography (EEG) and magnetic resonance imaging (MRI) analysis, we demonstrate that the proposed method tends to outperform existing approaches across a variety of data structures.
💡 Research Summary
This paper tackles the challenging problem of classifying high-dimensional matrix-valued observations, a setting that arises frequently in neuroimaging, signal processing, and spatio-temporal analysis. Assuming that each class follows a matrix-normal distribution, $X \mid Y = k \sim \mathrm{MN}(M_k, U, V)$, the authors develop a novel "Non-Parametric Linear Discriminant Analysis" (NPMLDA) that blends a non-parametric empirical Bayes (NPEB) framework with Non-Parametric Maximum Likelihood Estimation (NPMLE).
The methodology proceeds in two stages. First, the row and column covariance matrices $U$ and $V$ are estimated using existing techniques such as GEMINI, yielding an estimate of the Kronecker product $\Sigma = V \otimes U$ and its square-root inverse $\Sigma^{-1/2} = V^{-1/2} \otimes U^{-1/2}$. Second, each matrix observation is vectorized and scaled: $z_i^{(k)} = \Sigma^{-1/2}\operatorname{vec}(X_i^{(k)})$. After this transformation, the data follow a standard multivariate normal distribution with mean $\mu_k = \Sigma^{-1/2}\operatorname{vec}(M_k)$ and identity covariance.
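The scaling step can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name and the eigendecomposition-based inverse square root are ours. It uses the identity $\operatorname{vec}(AXB) = (B^{\top} \otimes A)\operatorname{vec}(X)$, so $\Sigma^{-1/2}\operatorname{vec}(X) = \operatorname{vec}(U^{-1/2} X V^{-1/2})$ and the full Kronecker product never has to be formed:

```python
import numpy as np

def whiten_matrix_obs(X, U, V):
    """Scale a matrix observation X ~ MN(M, U, V) to identity covariance.

    Applies Sigma^{-1/2} = V^{-1/2} (x) U^{-1/2} to vec(X) (column-stacking
    convention), which equals vec(U^{-1/2} X V^{-1/2}).
    """
    def inv_sqrt(A):
        # Inverse square root of a symmetric positive-definite matrix
        # via its eigendecomposition A = Q diag(w) Q^T.
        w, Q = np.linalg.eigh(A)
        return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

    # vec(U^{-1/2} X V^{-1/2}) with column-major (Fortran-order) stacking
    return (inv_sqrt(U) @ X @ inv_sqrt(V)).flatten(order="F")
```

Working on the two small factors rather than the $pq \times pq$ Kronecker matrix keeps the transformation feasible even when the vectorized dimension is large.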
The core innovation lies in estimating the scaled mean vectors $\mu_k$ without imposing sparsity, low-rank, or separability constraints. For each coordinate $j$ of the scaled vectors, the authors construct a hierarchical Bayesian model: the sample mean $\bar z_j^{(k)}$ is normally distributed around the true coordinate $\mu_j^{(k)}$ with variance $1/n_k$, while $\mu_j^{(k)}$ itself is drawn from an unknown distribution $G^{(k)}$ on the real line. The distribution $G^{(k)}$ is estimated non-parametrically by maximizing the marginal likelihood of the observed means, i.e., solving an NPMLE problem. To make this infinite-dimensional optimization tractable, they approximate $G^{(k)}$ by a finite mixture of point masses (grid points $\nu_\ell$ with weights $\omega_\ell$), using algorithms from Lindsay (1995), Koenker-Mizera (2014), Dicker-Zhao (2016), and others. The resulting posterior mean $\hat\mu_j^{(k)}$ provides a Bayes estimator of each coordinate; these are assembled into the estimated scaled mean vectors $\hat c_k$, estimating $\Sigma^{-1/2}\operatorname{vec}(M_k)$.
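A grid-based NPMLE fit can be sketched as below. This is a simplified illustration under the stated model $\bar z_j \sim N(\mu_j, 1/n_k)$, $\mu_j \sim G$; it uses plain EM fixed-point updates for the mixture weights rather than the faster convex-optimization solvers of Koenker-Mizera, and the function name and grid choices are ours:

```python
import numpy as np

def npmle_posterior_means(zbar, n_k, n_grid=100, n_iter=200):
    """Fit G on a fixed grid of point masses by (EM-style) NPMLE and
    return the posterior mean of each coordinate's true mean."""
    sd = 1.0 / np.sqrt(n_k)                                # known noise level
    nu = np.linspace(zbar.min(), zbar.max(), n_grid)       # support points nu_l
    omega = np.full(n_grid, 1.0 / n_grid)                  # initial weights omega_l
    # Likelihood matrix: L[j, l] proportional to phi((zbar_j - nu_l) / sd)
    L = np.exp(-0.5 * ((zbar[:, None] - nu[None, :]) / sd) ** 2)
    for _ in range(n_iter):
        post = L * omega                                   # responsibilities
        post /= post.sum(axis=1, keepdims=True)
        omega = post.mean(axis=0)                          # EM weight update
    post = L * omega
    post /= post.sum(axis=1, keepdims=True)
    return post @ nu                                       # posterior mean per coordinate
```

Because each estimate is a posterior average over the fitted prior, noisy coordinates are shrunk toward regions where $G$ places mass, without any parametric assumption on the shape of $G$.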
With $\hat\Sigma^{-1/2}$ and $\hat c_k$ in hand, the classic Fisher discriminant function is approximated by the plug-in rule
$$\hat\delta_k(X) = \hat c_k^{\top}\,\hat\Sigma^{-1/2}\operatorname{vec}(X) - \tfrac{1}{2}\,\hat c_k^{\top}\hat c_k + \log\hat\pi_k,$$
and a new observation $X$ is assigned to the class maximizing $\hat\delta_k(X)$. Since the covariance in the scaled space is the identity, this amounts to nearest-centroid classification adjusted by the estimated class priors $\hat\pi_k$.
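Putting the pieces together, the resulting plug-in classifier can be sketched as follows. This is an illustrative assembly, not the authors' code: `Sigma_inv_half` and `c_hats` stand in for the outputs of the two estimation stages described above, and the function name is ours:

```python
import numpy as np

def lda_classify(X, Sigma_inv_half, c_hats, log_priors):
    """Plug-in LDA rule in the whitened space (identity covariance):
    assign X to the class maximizing z . c_k - ||c_k||^2 / 2 + log pi_k,
    where z = Sigma^{-1/2} vec(X)."""
    z = Sigma_inv_half @ X.flatten(order="F")
    scores = [z @ c - 0.5 * (c @ c) + lp
              for c, lp in zip(c_hats, log_priors)]
    return int(np.argmax(scores))
```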