Spectral Thresholds in Correlated Spiked Models and Fundamental Limits of Partial Least Squares
We provide a rigorous random matrix theory analysis of spiked cross-covariance models where the signals across two high-dimensional data channels are partially aligned. These models are motivated by multi-modal learning and form the standard generative setting underlying Partial Least Squares (PLS), a widely used yet theoretically underdeveloped method. We show that the leading singular values of the sample cross-covariance matrix undergo a Baik-Ben Arous-Péché (BBP)-type phase transition, and we characterize the precise thresholds for the emergence of informative components. Our results yield the first sharp asymptotic description of the signal recovery capabilities of PLS in this setting, revealing a fundamental performance gap between PLS and the Bayes-optimal estimator. In particular, we identify the SNR and correlation regimes where PLS fails to recover any signal, despite detectability being possible in principle. These findings clarify the theoretical limits of PLS and provide guidance for the design of reliable multi-modal inference methods in high dimensions.
💡 Research Summary
The paper presents a rigorous random matrix theory (RMT) analysis of a high-dimensional "correlated spiked" cross-covariance model that underlies Partial Least Squares (PLS), a classical method for multi-modal inference. Two data matrices, X̃ ∈ ℝ^{n×d_x} and Ỹ ∈ ℝ^{n×d_y}, are modeled as Gaussian noise (X, Y) plus a low-rank signal consisting of r spikes. Spike k carries signal-to-noise ratios (SNRs) λ_{x,k} and λ_{y,k} in the two channels, and its left singular vectors u_{x,k}, u_{y,k} are partially aligned: ⟨u_{x,k}, u_{y,k}⟩ ≈ ρ_k ∈ (−1, 1). The aspect ratios d_x/n → α_x and d_y/n → α_y (α_x ≥ α_y > 0) are held fixed as n → ∞.
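To make the model concrete, here is a minimal NumPy sketch of a single-spike (r = 1) instance. The summary does not fix a normalization, so the following choices are assumptions: noise entries are i.i.d. N(0, 1/n), each channel receives a rank-one signal √λ·u vᵀ built from unit vectors, u_x, u_y ∈ ℝ^n are the sample-space directions with overlap exactly ρ, and v_x, v_y are feature-space loadings. The helper name `correlated_spiked_model` is hypothetical.

```python
import numpy as np


def correlated_spiked_model(n, d_x, d_y, lam_x, lam_y, rho, rng):
    """Sample a rank-one correlated spiked model (hypothetical normalization).

    Noise entries are i.i.d. N(0, 1/n). Each channel receives a rank-one
    signal sqrt(lam) * u v^T built from unit vectors; the sample-space
    vectors u_x, u_y have overlap <u_x, u_y> = rho (up to floating point).
    """
    X = rng.standard_normal((n, d_x)) / np.sqrt(n)
    Y = rng.standard_normal((n, d_y)) / np.sqrt(n)
    u_x = rng.standard_normal(n)
    u_x /= np.linalg.norm(u_x)
    w = rng.standard_normal(n)
    w -= (w @ u_x) * u_x  # Gram-Schmidt step: w is now orthogonal to u_x
    w /= np.linalg.norm(w)
    u_y = rho * u_x + np.sqrt(1.0 - rho**2) * w  # unit norm, overlap rho
    v_x = rng.standard_normal(d_x)
    v_x /= np.linalg.norm(v_x)
    v_y = rng.standard_normal(d_y)
    v_y /= np.linalg.norm(v_y)
    X_tilde = X + np.sqrt(lam_x) * np.outer(u_x, v_x)
    Y_tilde = Y + np.sqrt(lam_y) * np.outer(u_y, v_y)
    return X_tilde, Y_tilde, (u_x, u_y, v_x, v_y)


rng = np.random.default_rng(0)
n, alpha_x, alpha_y = 1000, 0.8, 0.5  # alpha_x >= alpha_y > 0
d_x, d_y = int(alpha_x * n), int(alpha_y * n)
X_tilde, Y_tilde, (u_x, u_y, v_x, v_y) = correlated_spiked_model(
    n, d_x, d_y, lam_x=4.0, lam_y=4.0, rho=0.6, rng=rng)
S_tilde = X_tilde.T @ Y_tilde  # sample cross-covariance, shape (d_x, d_y)
print(S_tilde.shape, round(float(u_x @ u_y), 6))  # prints: (800, 500) 0.6
```

Constructing u_y by Gram-Schmidt makes the alignment exact rather than approximate, which keeps ρ a controlled parameter in small simulations.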
First, the authors characterize the bulk spectrum of the pure-noise cross-covariance S = XᵀY. Using free probability, they show that its empirical singular-value distribution converges to a deterministic measure μ_{(α_x,α_y)} whose Stieltjes-type T-transform t(z) satisfies a cubic equation P_{(α_x,α_y)}(τ, z) = 0. The right-most edge of the bulk, ς⁺, is expressed explicitly in terms of the unique positive root τ⁺ of P.
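The cubic P is not reproduced in this summary, so a sketch cannot evaluate ς⁺ in closed form. Under an assumed normalization (X and Y with i.i.d. N(0, 1/n) entries), one can instead estimate the bulk edge and density by Monte Carlo. `noise_cross_cov_singular_values` is a hypothetical helper; the histogram is what one would overlay on a plot of the limiting measure μ_{(α_x,α_y)}.

```python
import numpy as np


def noise_cross_cov_singular_values(n, alpha_x, alpha_y, rng):
    """Singular values of the pure-noise cross-covariance S = X^T Y.

    Hypothetical normalization: X and Y have i.i.d. N(0, 1/n) entries.
    """
    d_x, d_y = int(alpha_x * n), int(alpha_y * n)
    X = rng.standard_normal((n, d_x)) / np.sqrt(n)
    Y = rng.standard_normal((n, d_y)) / np.sqrt(n)
    return np.linalg.svd(X.T @ Y, compute_uv=False)  # descending order


rng = np.random.default_rng(1)
n, alpha_x, alpha_y = 600, 0.8, 0.5
trials = [noise_cross_cov_singular_values(n, alpha_x, alpha_y, rng) for _ in range(5)]

# Monte Carlo proxy for the right edge: average the top singular value over draws.
top = np.array([sv[0] for sv in trials])
edge_estimate = top.mean()

# Empirical bulk density, i.e. the histogram one would overlay on mu_(alpha_x, alpha_y).
density, bin_edges = np.histogram(np.concatenate(trials), bins=40, density=True)
print(f"estimated bulk edge: {edge_estimate:.3f} (std over draws {top.std():.3f})")
```

At finite n the top singular value fluctuates around the limiting edge, so the average over a few draws is only a finite-size proxy for ς⁺.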
When spikes are added, up to 2r singular values of S̃ = X̃ᵀỸ can separate from the bulk. For each spike k, the paper defines the two positive roots r⁺_k and r⁻_k of another cubic polynomial R_{(λ_{x,k}, λ_{y,k}, ρ_k)}. The smaller root r⁻_k is a decreasing function of ρ_k², so stronger alignment pushes the corresponding outlier further from the bulk. The singular-value mapping b(·) is non-increasing and maps τ⁺ to the edge ς⁺; hence if r⁻_k < τ⁺, the corresponding singular value becomes an outlier at b(r⁻_k) > ς⁺, and otherwise it sticks to the bulk edge. The phase transition (detectability threshold) for the first outlier is therefore
min_k r⁻_k = τ⁺,
that is, for at least one spike the SNRs λ_{x,k}, λ_{y,k} and the alignment ρ_k² must be jointly large enough to push r⁻_k below τ⁺. This is a BBP-type transition generalized to the cross-product setting.
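The BBP-type transition can be observed numerically without the polynomial R: sweep λ_x = λ_y = λ for a single spike and count the singular values of S̃ that exceed an empirically estimated bulk edge. This is a sketch under assumptions: N(0, 1/n) noise, unit-norm spike vectors, a hypothetical helper `sample_S_tilde`, and a 10% margin over the Monte Carlo edge that is an arbitrary finite-n safeguard rather than the paper's criterion.

```python
import numpy as np


def sample_S_tilde(n, d_x, d_y, lam, rho, rng):
    """Cross-covariance X_tilde^T Y_tilde of a rank-one correlated spiked model.

    Hypothetical normalization: N(0, 1/n) noise, lam_x = lam_y = lam,
    unit-norm u_x, u_y (overlap rho) and v_x, v_y.
    """
    X = rng.standard_normal((n, d_x)) / np.sqrt(n)
    Y = rng.standard_normal((n, d_y)) / np.sqrt(n)
    u_x = rng.standard_normal(n)
    u_x /= np.linalg.norm(u_x)
    w = rng.standard_normal(n)
    w -= (w @ u_x) * u_x  # orthogonalize against u_x
    w /= np.linalg.norm(w)
    u_y = rho * u_x + np.sqrt(1.0 - rho**2) * w
    v_x = rng.standard_normal(d_x)
    v_x /= np.linalg.norm(v_x)
    v_y = rng.standard_normal(d_y)
    v_y /= np.linalg.norm(v_y)
    X_t = X + np.sqrt(lam) * np.outer(u_x, v_x)
    Y_t = Y + np.sqrt(lam) * np.outer(u_y, v_y)
    return X_t.T @ Y_t


def singular_values(M):
    return np.linalg.svd(M, compute_uv=False)  # sorted in descending order


rng = np.random.default_rng(2)
n, d_x, d_y, rho = 500, 400, 250, 0.6

# Monte Carlo bulk edge from pure-noise draws (lam = 0), with a 10% margin.
edge = np.mean([singular_values(sample_S_tilde(n, d_x, d_y, 0.0, rho, rng))[0]
                for _ in range(3)])
threshold = 1.1 * edge

lams = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
top_two, n_outliers = [], []
for lam in lams:
    sv = singular_values(sample_S_tilde(n, d_x, d_y, lam, rho, rng))
    top_two.append(sv[:2])
    n_outliers.append(int(np.sum(sv > threshold)))  # values detached from the bulk

for lam, s, k in zip(lams, top_two, n_outliers):
    print(f"lam={lam:5.1f}  top two={np.round(s, 3)}  outliers={k}")
```

Near the threshold, finite-size effects blur the transition, so the λ at which the outlier count first jumps is only a rough indicator of the asymptotic condition min_k r⁻_k = τ⁺. Tracking the top two values mirrors the summary's description of up to two outliers per spike.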
The paper then translates these spectral results into performance guarantees for PLS, which estimates the shared latent subspace by the leading singular vectors of S̃. The asymptotic overlap between the estimated and true signal directions is given in terms of the outlier location b(r⁻_k). Consequently, PLS can recover a component only when the corresponding singular value is an outlier; below the threshold the estimator is asymptotically orthogonal to the true signal. Importantly, even when the Bayes-optimal estimator (derived via approximate message passing) succeeds for arbitrarily small ρ_k, PLS requires a strictly larger SNR, because its spectral detection threshold lies strictly above the information-theoretic one. This creates a fundamental performance gap: there exist regimes where detection is information-theoretically possible but PLS (and similarly CCA) fails completely.
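The rank-one PLS estimator itself is a single SVD: the leading left and right singular vectors of S̃ estimate the feature-space directions v_x and v_y, up to sign. The sketch below measures these overlaps well below and well above the detection threshold. It assumes N(0, 1/n) noise and unit-norm spike vectors, uses a hypothetical helper `pls_overlaps`, and does not reproduce the paper's asymptotic overlap formula.

```python
import numpy as np


def pls_overlaps(n, d_x, d_y, lam, rho, rng):
    """Overlaps of the rank-one PLS estimates with the planted loadings.

    Hypothetical setup: N(0, 1/n) noise, lam_x = lam_y = lam, unit-norm
    u_x, u_y (overlap rho) in sample space and v_x, v_y in feature space.
    """
    X = rng.standard_normal((n, d_x)) / np.sqrt(n)
    Y = rng.standard_normal((n, d_y)) / np.sqrt(n)
    u_x = rng.standard_normal(n)
    u_x /= np.linalg.norm(u_x)
    w = rng.standard_normal(n)
    w -= (w @ u_x) * u_x
    w /= np.linalg.norm(w)
    u_y = rho * u_x + np.sqrt(1.0 - rho**2) * w
    v_x = rng.standard_normal(d_x)
    v_x /= np.linalg.norm(v_x)
    v_y = rng.standard_normal(d_y)
    v_y /= np.linalg.norm(v_y)
    X_t = X + np.sqrt(lam) * np.outer(u_x, v_x)
    Y_t = Y + np.sqrt(lam) * np.outer(u_y, v_y)

    # PLS (rank one): leading singular pair of the sample cross-covariance.
    U, _, Vt = np.linalg.svd(X_t.T @ Y_t, full_matrices=False)
    v_x_hat, v_y_hat = U[:, 0], Vt[0]
    # Absolute values because singular vectors are defined only up to sign.
    return abs(v_x_hat @ v_x), abs(v_y_hat @ v_y)


rng = np.random.default_rng(3)
n, d_x, d_y, rho = 500, 400, 250, 0.6
results = {lam: pls_overlaps(n, d_x, d_y, lam, rho, rng) for lam in (0.05, 1.0, 4.0, 25.0)}
for lam, (ox, oy) in results.items():
    print(f"lam={lam:5.2f}  |<v_x_hat, v_x>|={ox:.3f}  |<v_y_hat, v_y>|={oy:.3f}")
```

Below the threshold the overlaps are of the order of a random unit vector's, roughly 1/√d, which is the finite-n counterpart of asymptotic orthogonality.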
Numerical experiments corroborate the theory. Figures illustrate (i) the bulk density versus empirical histograms, (ii) the trajectories of the top two singular values as λ_x=λ_y varies, and (iii) phase diagrams in the (λ_x,λ_y) plane for PLS, CCA, single‑channel PCA, and the Bayes‑optimal method. The diagrams reveal that for modest correlations ρ, the region where PLS detects a signal can be strictly smaller than that for PCA on a single channel, highlighting a counter‑intuitive drawback of using cross‑covariance‑based methods in certain settings.
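A coarse, single-draw analogue of the phase-diagram comparison can be sketched along the diagonal λ_x = λ_y = λ at a modest ρ. For each λ, it checks whether PLS (top singular value of X̃ᵀỸ) and single-channel PCA (top singular value of X̃) each produce an outlier beyond their own empirical noise edge. The N(0, 1/n) normalization, the 10% margin, and the helpers `sample_channels` and `top_sv` are assumptions. CCA and the Bayes-optimal estimator are not included, and one draw per grid point at small n cannot pin down where the two boundaries cross.

```python
import numpy as np


def sample_channels(n, d_x, d_y, lam, rho, rng):
    """Hypothetical rank-one correlated spiked model with lam_x = lam_y = lam."""
    X = rng.standard_normal((n, d_x)) / np.sqrt(n)
    Y = rng.standard_normal((n, d_y)) / np.sqrt(n)
    u_x = rng.standard_normal(n)
    u_x /= np.linalg.norm(u_x)
    w = rng.standard_normal(n)
    w -= (w @ u_x) * u_x
    w /= np.linalg.norm(w)
    u_y = rho * u_x + np.sqrt(1.0 - rho**2) * w
    v_x = rng.standard_normal(d_x)
    v_x /= np.linalg.norm(v_x)
    v_y = rng.standard_normal(d_y)
    v_y /= np.linalg.norm(v_y)
    return (X + np.sqrt(lam) * np.outer(u_x, v_x),
            Y + np.sqrt(lam) * np.outer(u_y, v_y))


def top_sv(M):
    return np.linalg.svd(M, compute_uv=False)[0]


rng = np.random.default_rng(4)
n, d_x, d_y, rho, margin = 400, 320, 200, 0.3, 1.1

# Empirical noise edges (lam = 0) for PLS (X^T Y) and for PCA on channel x alone.
noise = [sample_channels(n, d_x, d_y, 0.0, rho, rng) for _ in range(3)]
pls_edge = np.mean([top_sv(X.T @ Y) for X, Y in noise])
pca_edge = np.mean([top_sv(X) for X, _ in noise])

lams = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 16.0]
pls_detect, pca_detect = [], []
for lam in lams:
    X_t, Y_t = sample_channels(n, d_x, d_y, lam, rho, rng)
    pls_detect.append(bool(top_sv(X_t.T @ Y_t) > margin * pls_edge))
    pca_detect.append(bool(top_sv(X_t) > margin * pca_edge))

for lam, p, q in zip(lams, pls_detect, pca_detect):
    print(f"lam={lam:5.2f}  PLS outlier: {p}  PCA(X) outlier: {q}")
```

Averaging the indicators over many draws and a two-dimensional (λ_x, λ_y) grid would turn this into an empirical phase diagram; comparing against each method's own noise edge keeps the two detection rules on an equal footing.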
The authors discuss implications: (a) the alignment ρ and the aspect ratios α_x, α_y critically shape the detectability boundary; (b) practitioners should be cautious when applying PLS in low-correlation or low-SNR regimes, as it may miss signals that are recoverable by more sophisticated Bayesian approaches; (c) the analytical framework could be extended to non-Gaussian noise (via universality), to regimes where the number of spikes grows linearly with the dimension, and to nonlinear multi-modal fusion techniques.
In summary, the paper provides the first exact asymptotic characterization of the singular‑value spectrum of a spiked cross‑covariance matrix, identifies precise BBP‑type thresholds for outlier emergence, and leverages these results to expose a fundamental limitation of Partial Least Squares. By contrasting PLS with the Bayes‑optimal estimator and Canonical Correlation Analysis, it clarifies when and why classical spectral methods fall short in high‑dimensional multi‑modal inference, offering valuable guidance for the design of more reliable algorithms.