Finite-sample performance of the maximum likelihood estimator in logistic regression
Logistic regression is a classical model for describing the probabilistic dependence of binary responses on multivariate covariates. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression, assessed in terms of logistic risk. We consider two questions: first, that of the existence of the MLE (which occurs when the dataset is not linearly separated), and second, that of its accuracy when it exists. These properties depend on both the dimension of the covariates and the signal strength. In the case of Gaussian covariates and a well-specified logistic model, we obtain sharp non-asymptotic guarantees for the existence and excess logistic risk of the MLE. We then generalize these results in two ways: first, to non-Gaussian covariates satisfying a certain two-dimensional margin condition, and second, to the general case of statistical learning with a possibly misspecified logistic model. Finally, we consider the case of a Bernoulli design, where the behavior of the MLE is highly sensitive to the parameter direction.
💡 Research Summary
This paper investigates two fundamental questions concerning the maximum‑likelihood estimator (MLE) in logistic regression: (i) under what conditions does the MLE exist, and (ii) how accurate is it when it does exist. The authors adopt the logistic loss (negative log‑likelihood) as the performance metric and study the excess risk L(θ̂_n)−L(θ*), where θ* minimizes the population risk.
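The excess risk L(θ̂_n)−L(θ*) can be approximated numerically. Below is a minimal Python sketch (not from the paper) that estimates it by paired Monte Carlo under a synthetic well-specified Gaussian design; the parameter values, sample sizes, and the stand-in estimate `theta_hat` are all illustrative choices.

```python
import math
import random

random.seed(0)
d = 3
theta_star = [1.0, -0.5, 0.25]   # hypothetical "true" parameter
theta_hat = [0.9, -0.45, 0.3]    # stand-in for a fitted estimate

def sigma(t):
    return 1.0 / (1.0 + math.exp(-t))

def logistic_loss(theta, x, y):
    # negative log-likelihood of one (x, y) pair, with y in {0, 1}
    t = sum(a * b for a, b in zip(theta, x))
    return math.log(1.0 + math.exp(t)) - y * t

def excess_risk_mc(n_mc=50000):
    # paired Monte Carlo estimate of L(theta_hat) - L(theta_star);
    # reusing each draw (x, y) for both terms sharply reduces the variance
    total = 0.0
    for _ in range(n_mc):
        x = [random.gauss(0.0, 1.0) for _ in range(d)]
        p = sigma(sum(a * b for a, b in zip(theta_star, x)))
        y = 1 if random.random() < p else 0
        total += logistic_loss(theta_hat, x, y) - logistic_loss(theta_star, x, y)
    return total / n_mc

excess = excess_risk_mc()
print(excess)  # typically small and positive: theta_star minimizes the population risk
```

Since the model is well specified, θ* minimizes the population logistic risk, so the estimated excess is nonnegative in expectation and shrinks as θ̂ approaches θ*.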
The analysis proceeds in four increasingly general settings.
- Gaussian design, well-specified model.
Assuming X∼N(0,I_d) and P(Y=1|X)=σ(⟨θ*,X⟩) with signal strength B=‖θ*‖, the paper derives explicit non-asymptotic sample-size thresholds guaranteeing existence of the MLE with probability at least 1−δ. Specifically, n ≥ C·(d + B²·log(1/δ)) suffices, where C is an absolute constant. When this condition holds, the excess logistic risk satisfies
L(θ̂_n) − L(θ*) ≤ C′·(d + log(1/δ))/n
with the same confidence level. These bounds are sharp in the sense that they match known asymptotic limits (χ²-type fluctuations) while remaining valid for arbitrary ratios among n, d, and B.
- Regular (non-Gaussian) designs.
The authors introduce a two-dimensional margin condition that captures a broad class of covariate distributions, including log-concave, sub-Gaussian, and i.i.d. coordinate models. For any distribution satisfying this "regularity" with parameter κ∈(0,1], the existence condition becomes n ≥ C·κ⁻²·(d + B²·log(1/δ)), and the excess risk bound acquires the same κ⁻² factor. Thus, the Gaussian results extend up to a multiplicative penalty that quantifies how far the design deviates from isotropy.
- Misspecified logistic models.
When the true conditional distribution P(Y|X) does not belong to the logistic family, the paper still defines θ* as the population risk minimizer and shows that, under the same regularity assumptions on X, the MLE (if it exists) enjoys the same O((d+log(1/δ))/n) excess-risk rate, up to constants that now also depend on the model misspecification error. This demonstrates the robustness of the MLE to moderate departures from the logistic link.
- Bernoulli (binary) designs.
For covariates taking values in {0,1}ᵈ, the authors uncover a pronounced sensitivity of the MLE to the direction of θ*. If θ* aligns closely with a coordinate axis, the data are far more likely to be linearly separable, causing the MLE to fail to exist. The paper quantifies this phenomenon via a direction-dependent margin function and provides explicit sample-size thresholds that reflect the geometry of θ*.
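Since the MLE exists exactly when the sample is not linearly separated, the existence question can be probed by simulation. The sketch below (illustrative, not the paper's construction) certifies separability with a capped perceptron, convergence proving separation and hitting the cap being treated heuristically as "not separated", and shows the separation probability vanishing as n grows; the design, signal strength, and caps are all hypothetical choices.

```python
import math
import random

random.seed(1)
d, B = 3, 1.0
theta_star = [B, 0.0, 0.0]       # hypothetical signal direction

def sigma(t):
    return 1.0 / (1.0 + math.exp(-t))

def draw_sample(n):
    # well-specified logistic data with Gaussian design, labels in {-1, +1}
    data = []
    for _ in range(n):
        x = [random.gauss(0.0, 1.0) for _ in range(d)]
        t = sum(a * b for a, b in zip(theta_star, x))
        data.append((x, 1 if random.random() < sigma(t) else -1))
    return data

def separable(data, max_updates=500):
    # capped perceptron: returns True only when a full pass makes no
    # mistakes, i.e. a separating direction through the origin was found
    w = [0.0] * d
    updates = 0
    while updates < max_updates:
        mistakes = 0
        for x, y in data:
            if y * sum(a * b for a, b in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                updates += 1
        if mistakes == 0:
            return True
    return False

def prob_separated(n, trials=100):
    return sum(separable(draw_sample(n)) for _ in range(trials)) / trials

p_small, p_large = prob_separated(5), prob_separated(40)
print(p_small, p_large)  # separation (hence MLE non-existence) fades as n grows
```

The same harness could be rerun with Bernoulli covariates and different directions of θ* to probe the direction sensitivity described above.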
Technical contributions include a refined convex‑localization argument, sharp deviation bounds for the empirical gradient, and lower bounds on the empirical Hessian that together enable the non‑asymptotic analysis. The proofs combine tools from empirical process theory, random matrix concentration, and geometric probability (e.g., spherical caps).
Overall, the work bridges a gap between classical asymptotic theory (which assumes fixed d and B) and high-dimensional asymptotics (which fix B while letting d/n converge to a constant). By delivering finite-sample guarantees that remain valid for arbitrary relationships among n, d, B, and the covariate distribution, the paper provides practitioners with concrete criteria for when the logistic-regression MLE can be trusted and how many samples are needed to achieve a desired predictive accuracy.