Microarrays, Empirical Bayes and the Two-Groups Model
The classic frequentist theory of hypothesis testing developed by Neyman, Pearson and Fisher has a claim to being the twentieth century’s most influential piece of applied mathematics. Something new is happening in the twenty-first century: high-throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focused on Benjamini and Hochberg’s False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals and Bayesian competitors to the two-groups model.
💡 Research Summary
The paper addresses a fundamental shift in statistical practice brought about by high‑throughput technologies such as microarrays, next‑generation sequencing, and neuro‑imaging, which routinely generate thousands to tens of thousands of simultaneous hypothesis tests. Classical Neyman‑Pearson‑Fisher theory was designed for a single test or a modest number of tests; when the number of tests becomes large, the traditional focus on controlling a family‑wise error rate (FWER) leads to overly conservative procedures and a severe loss of power.
Efron proposes the two‑groups model as a unifying Bayesian framework for large‑scale testing. In this model each of the N tests belongs either to a null group (no true effect) with prior probability p₀ or to a non‑null group (true effect) with probability p₁=1−p₀. The observed z‑scores follow a mixture density
f(z)=p₀ f₀(z)+p₁ f₁(z),
where f₀ is the null density and f₁ the non‑null density. Applying Bayes’ theorem yields the local false discovery rate
fdr(z)=Pr(null|Z=z)=p₀ f₀(z)/f(z).
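Under illustrative parameter choices (the values of p₀, f₀ and f₁ below are assumptions for the sketch, not numbers from the paper), the mixture and its local fdr take only a few lines:

```python
import numpy as np

def phi(z, mu=0.0, sd=1.0):
    """Normal density with mean mu and standard deviation sd."""
    return np.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# assumed toy parameters: 90% nulls, non-null effects centered at z = 3
p0 = 0.90
f0 = lambda z: phi(z)                 # theoretical null N(0,1)
f1 = lambda z: phi(z, mu=3.0)         # hypothetical non-null density
f = lambda z: p0 * f0(z) + (1 - p0) * f1(z)   # mixture density f(z)
fdr = lambda z: p0 * f0(z) / f(z)     # local false discovery rate

print(fdr(0.0))   # near 1: a central z-score is almost surely null
print(fdr(4.0))   # well below 0.01: a large z-score is almost surely non-null
```

The sketch makes the Bayesian reading concrete: fdr(z) is just the posterior probability of nullness at the observed z.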
The paper shows that the widely used Benjamini–Hochberg (BH) false discovery rate (FDR) procedure is essentially a frequentist implementation of a Bayesian decision rule: the BH cutoff selects all tests whose estimated posterior probability of being null does not exceed a pre‑specified level q. The tail‑area (cumulative) FDR, denoted Fdr(z), equals the conditional expectation of fdr(Z) given Z≤z, establishing a clear mathematical link between the two quantities.
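The BH step-up rule itself is short; here is a minimal textbook-style implementation (not code from the paper), which rejects the k smallest p-values for the largest k meeting the step-up condition:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """BH step-up rule: reject the k smallest p-values, where k is the
    largest index with p_(k) <= k*q/N. Controls FDR at level q under
    independence of the test statistics."""
    p = np.asarray(pvals, dtype=float)
    N = len(p)
    order = np.argsort(p)
    passes = p[order] <= q * np.arange(1, N + 1) / N
    k = int(np.max(np.nonzero(passes)[0]) + 1) if passes.any() else 0
    reject = np.zeros(N, dtype=bool)
    reject[order[:k]] = True
    return reject

# with these six p-values, only the two smallest survive at q = 0.05
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.27, 0.60], q=0.05))
```

The step-up structure is what the Fdr(z) identity explains: the ratio k·q/N plays the role of an estimated tail-area false discovery rate for the rejection region.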
A central contribution is the emphasis on empirical Bayes estimation. In practice the theoretical null N(0,1) often fails (e.g., the education data example, where the z‑scores are shifted and over‑dispersed). Efron therefore estimates the overall mixture density f(z) directly from the data, using a Poisson generalized linear model (GLM) on binned counts of z‑scores. By fitting a seventh‑degree polynomial to log f(z) (the "locfdr" method), the full shape of the mixture, including its heavy tails, is captured without imposing a parametric form on f₁.
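The density-estimation step can be sketched via Lindsey's method: bin the z-scores and fit a Poisson GLM with a polynomial basis to the bin counts. The hand-rolled IRLS below is a rough stand-in for the locfdr package, and the simulated mixture is an assumption for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
# simulated z-scores: 95% null N(0,1) plus 5% non-null N(3,1) (toy data)
z = np.concatenate([rng.normal(0, 1, 9500), rng.normal(3, 1, 500)])

edges = np.linspace(-4.5, 7.5, 121)            # bin the z-scores
counts, _ = np.histogram(z, edges)
mids = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]

# design matrix: seventh-degree polynomial in standardized z
x = (mids - mids.mean()) / mids.std()
X = np.vander(x, 8, increasing=True)

# Poisson GLM via IRLS: counts_k ~ Poisson(mu_k), log mu_k = (X beta)_k
beta, *_ = np.linalg.lstsq(X, np.log(counts + 1.0), rcond=None)  # warm start
for _ in range(30):
    eta = np.clip(X @ beta, -10, 10)
    mu = np.exp(eta)
    w = np.sqrt(mu)                            # Poisson working weights
    zwork = eta + (counts - mu) / mu           # working response
    beta, *_ = np.linalg.lstsq(X * w[:, None], w * zwork, rcond=None)

f_hat = np.exp(np.clip(X @ beta, -10, 10)) / (len(z) * width)  # density estimate
print(f_hat.sum() * width)   # should be close to 1
```

Because the fitted log-density is a smooth polynomial, f̂(z) remains positive and well behaved in the tails, where raw histogram counts are sparse.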
When the theoretical null is inadequate, an “empirical null” N(μ̂,σ̂²) is fitted to the central bulk of the data, providing a more realistic f₀(z). This adjustment prevents both overly liberal and overly conservative FDR estimates. The prior proportion p₀ can also be estimated from the data (e.g., Storey’s λ method), but the paper notes that when p₀≈0.9 the exact value has limited impact on the final fdr calculations.
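One simple route to such an empirical null is central matching: since log f₀ is exactly quadratic for a normal density, a quadratic fit to the log bin counts over the central bulk yields μ̂ and σ̂ from the vertex and curvature. The over-dispersed simulated scores below are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy scores: an over-dispersed null N(0.1, 1.4^2) plus a few real effects
z = np.concatenate([rng.normal(0.1, 1.4, 4750), rng.normal(4.0, 1.0, 250)])

edges = np.linspace(-6, 8, 141)
counts, _ = np.histogram(z, edges)
mids = 0.5 * (edges[:-1] + edges[1:])

central = np.abs(mids) < 2.0                       # fit only the central bulk
logc = np.log(np.maximum(counts[central], 1))      # guard against empty bins
c2, c1, c0 = np.polyfit(mids[central], logc, 2)

sigma_hat = np.sqrt(-1.0 / (2.0 * c2))             # curvature gives the scale
mu_hat = -c1 / (2.0 * c2)                          # vertex gives the center
print(mu_hat, sigma_hat)   # recovers roughly (0.1, 1.4), not (0, 1)
```

Plugging N(μ̂, σ̂²) in place of N(0,1) as f₀ is exactly the adjustment that keeps the fdr estimates from being badly miscalibrated when the theoretical null fails.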
Power considerations are addressed by allowing test‑specific prior probabilities p₀(i). If a researcher has prior knowledge that certain genes are “hot prospects,” assigning them a lower p₀(i) dramatically reduces their individual fdrᵢ(z) and raises the odds of being declared non‑null. This leads to a practical Bayes‑factor threshold: fdr≤0.20 corresponds roughly to a Bayes factor ≥36, a much stricter criterion than the usual one‑sided p‑value≈0.001.
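The arithmetic behind that threshold is short: with p₀ = 0.90, an fdr of 0.20 means posterior odds of 4 in favor of non-null, against prior odds of 1/9, so the data must contribute a Bayes factor of 4 × 9 = 36:

```python
p0, fdr_threshold = 0.90, 0.20
posterior_odds = (1 - fdr_threshold) / fdr_threshold   # 4:1 for non-null
prior_odds = (1 - p0) / p0                             # 1:9 against non-null
bayes_factor = posterior_odds / prior_odds
print(bayes_factor)   # 36, up to floating-point rounding
```

The same arithmetic shows how a case-specific prior helps: lowering p₀(i) for a "hot prospect" shrinks the prior odds against it, so a smaller Bayes factor from the data suffices to cross the fdr threshold.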
Correlation among tests, inevitable in imaging or spatial transcriptomics, violates the independence assumption underlying the BH theorem. The paper discusses conditional FDR, permutation‑based adjustments, and the need for multivariate empirical Bayes models that explicitly model dependence structures.
Beyond individual tests, the two‑groups framework extends naturally to group‑level inference (e.g., pathway analysis). By aggregating tests into biologically meaningful sets and computing a set‑wise fdr, researchers gain higher power and more interpretable results. The approach also supports simultaneous construction of multiple confidence intervals, offering a coherent frequentist–Bayesian hybrid for reporting uncertainty.
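A minimal sketch of the aggregation step, using the standardized mean as the set statistic (one common choice, not necessarily the paper's): each pathway's member z-scores are combined into a single score that is again N(0,1) when all members are independent nulls, so the set-level scores can re-enter the same two-groups fdr machinery:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(0, 1, 1000)                 # per-gene z-scores (toy, all null)
pathways = {"pathA": np.arange(0, 25),     # hypothetical gene sets
            "pathB": np.arange(500, 540)}

# standardized mean: N(0,1) under the null if members are independent nulls
set_scores = {name: z[idx].mean() * np.sqrt(len(idx))
              for name, idx in pathways.items()}
print(set_scores)
```

Averaging over a set shrinks the null noise by √n while preserving a shared signal, which is the source of the power gain the paragraph describes; in practice the independence assumption must be checked against the correlation issues discussed above.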
In conclusion, Efron demonstrates that Bayesian ideas—particularly empirical Bayes estimation of the null and mixture densities—can be seamlessly integrated with frequentist FDR control, yielding procedures that are both theoretically sound and practically powerful for modern high‑dimensional data. This synthesis revitalizes statistical theory for the “big data” era and provides a roadmap for applying similar ideas across genomics, proteomics, neuro‑imaging, and other fields where massive simultaneous testing is the norm.