Asymptotic Model Selection for Naive Bayesian Networks

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. This formula deviates from the standard BIC score. Our work provides a concrete example that the BIC score is generally not valid for statistical models that belong to a stratified exponential family. This stands in contrast to linear and curved exponential families, where the BIC score has been proven to provide a correct approximation for the marginal likelihood.

💡 Research Summary

The paper revisits the widely used Bayesian Information Criterion (BIC) and demonstrates that its standard form is not valid for every statistical model, in particular not for models belonging to the stratified exponential family (SEF). BIC has been proven to be a correct asymptotic approximation of the marginal likelihood for linear and curved exponential families, where the parameter space forms a single smooth manifold. The authors instead focus on a class of models whose parameter space is a union of manifolds of different dimensions, separated by singular boundaries.

The concrete example examined is a naive Bayesian network with a single hidden (latent) variable that takes two states and with binary observable features. In such a network, the overall model is “layered”: each hidden state defines its own exponential family, and the full model is a mixture of two distinct families. This structure leads to a situation where the usual BIC penalty $-\frac{k}{2}\log N$ (with $k$ the total number of free parameters and $N$ the sample size) fails to capture the true asymptotic behavior of the log-marginal likelihood.
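To make the standard penalty concrete, the following is a minimal sketch of the ordinary BIC score for this model class, not the paper's corrected formula. It assumes the usual parameterization of a two-state naive Bayes model with n binary features: one parameter for the hidden-state prior and two conditional probabilities per feature, so k = 2n + 1. The function and argument names (`num_free_params`, `log_likelihood`, `bic_score`, `t`, `a`, `b`) are illustrative, and the parameters passed in are assumed to be maximum-likelihood estimates obtained elsewhere (for example by EM, which is not shown).

```python
import math


def num_free_params(n_features):
    # One parameter for P(H=1), plus P(X_i=1 | H=h) for each feature
    # and each of the two hidden states: k = 2n + 1.
    return 2 * n_features + 1


def log_likelihood(data, t, a, b):
    """Log-likelihood of binary data under a two-state naive Bayes model.

    t    : P(H=1), the hidden-state prior (assumed name)
    a[i] : P(X_i=1 | H=1)
    b[i] : P(X_i=1 | H=0)
    data : iterable of 0/1 tuples, one tuple per sample
    """
    total = 0.0
    for x in data:
        p1 = t          # joint probability of x with H=1
        p0 = 1.0 - t    # joint probability of x with H=0
        for xi, ai, bi in zip(x, a, b):
            p1 *= ai if xi else 1.0 - ai
            p0 *= bi if xi else 1.0 - bi
        # Marginalize over the hidden state.
        total += math.log(p1 + p0)
    return total


def bic_score(loglik, n_features, n_samples):
    # Standard BIC: log-likelihood minus (k / 2) * log N.
    k = num_free_params(n_features)
    return loglik - 0.5 * k * math.log(n_samples)
```

With three features, `num_free_params(3)` gives k = 7, so the standard penalty is 3.5 log N. The paper's point is that this penalty is not the correct asymptotic term for this model class in general.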

Key contributions

Derivation of a closed-form asymptotic formula: by applying a refined Laplace approximation that accounts for the multiple singularities introduced by the hidden states, the authors obtain an explicit expression for the log-marginal likelihood.
