Consistent model selection in a collection of stochastic block models

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We introduce the penalized Krichevsky-Trofimov (KT) estimator as a consistent method for estimating the number of node clusters when observing multiple networks within both multi-layer and dynamic Stochastic Block Models. We establish the consistency of the KT estimator, showing that it converges to the correct number of clusters in both types of models as the number of nodes in the networks increases. Our estimator does not require a known upper bound on this number to be consistent. Furthermore, we show that these consistency results hold in both dense and sparse regimes, making the penalized KT estimator robust across various network configurations. We illustrate its performance on synthetic datasets.


💡 Research Summary

This paper addresses the problem of determining the number of communities (the model order) in collections of stochastic block models (SBMs), specifically in multi‑layer SBMs (MLSBM) and dynamic SBMs (DynSBM). The authors propose a penalized version of the Krichevsky‑Trofimov (KT) estimator, a Bayesian‑inspired estimator originally designed for categorical distributions, and adapt it to the network setting.
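As a rough illustration of the KT building block, the sketch below computes the standard Krichevsky-Trofimov log-probability of a binary sequence from its counts, and sums it over block pairs of a single undirected graph for a fixed labeling. It is a minimal sketch under stated assumptions, not the paper's estimator: the names `kt_log_prob` and `sbm_kt_log_likelihood` are hypothetical, the labeling is taken as given rather than optimized or integrated out, and the penalty and the multi-layer or dynamic extensions are left out.

```python
import math
from itertools import combinations


def kt_log_prob(ones: int, zeros: int) -> float:
    """Log KT probability of one binary sequence with the given counts.

    Uses the closed form Gamma(ones + 1/2) * Gamma(zeros + 1/2)
    / (pi * Gamma(ones + zeros + 1)).
    """
    return (
        math.lgamma(ones + 0.5)
        + math.lgamma(zeros + 0.5)
        - math.lgamma(ones + zeros + 1.0)
        - math.log(math.pi)
    )


def sbm_kt_log_likelihood(adj, labels, k):
    """Sum of KT log-probabilities over block pairs (a <= b).

    Assumption: one undirected simple graph given as a 0/1 adjacency
    matrix, with a fixed labeling taking values in range(k).
    """
    edges = [[0] * k for _ in range(k)]  # observed edges per block pair
    pairs = [[0] * k for _ in range(k)]  # possible node pairs per block pair
    for i, j in combinations(range(len(labels)), 2):
        a, b = sorted((labels[i], labels[j]))
        pairs[a][b] += 1
        edges[a][b] += adj[i][j]
    return sum(
        kt_log_prob(edges[a][b], pairs[a][b] - edges[a][b])
        for a in range(k)
        for b in range(a, k)
    )


# Toy graph: two communities of two nodes, one edge inside each community.
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
toy_ll = sbm_kt_log_likelihood(toy_adj, [0, 0, 1, 1], k=2)
```

A model-order criterion of the kind described here would compare such a likelihood term, minus a penalty that grows with the number of clusters, across candidate values of k.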

Key contributions are:

  1. Upper‑bound‑free consistency – The estimator converges to the true number of clusters \(k_{0}\) without requiring a pre‑specified upper bound on \(k\). This removes a restrictive assumption common in previous work.

  2. Unified treatment of dense and sparse regimes – The analysis covers both constant‑edge‑probability (dense) graphs and graphs whose edge probabilities decay at a rate \(\rho_{n}\to 0\) (sparse). In the sparse case the authors assume \(n\rho_{n}\to\infty\) (the average degree diverges), which excludes the bounded‑degree rate \(\rho_{n}=O(1/n)\) and is sufficient for weak recovery.

  3. Explicit penalty terms – For MLSBM the penalty is …
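The sparse-regime condition \(n\rho_{n}\to\infty\) from item 2 can be checked numerically for a few candidate rates. The sketch below is only illustrative: the helper `expected_average_degree` and the three example rates are assumptions chosen for the demonstration, not rates taken from the paper.

```python
import math


def expected_average_degree(n: int, rho_n: float) -> float:
    """Order of the expected degree, (n - 1) * rho_n, for a unit base probability."""
    return (n - 1) * rho_n


# Hypothetical example rates: only the first two have a diverging average degree.
regimes = {
    "dense, rho_n = 1": lambda n: 1.0,
    "sparse, rho_n = log(n)/n": lambda n: math.log(n) / n,
    "bounded degree, rho_n = 1/n": lambda n: 1.0 / n,
}

sizes = (10**2, 10**4, 10**6)
for name, rho in regimes.items():
    degrees = [round(expected_average_degree(n, rho(n)), 2) for n in sizes]
    print(f"{name}: {degrees}")
```

With \(\rho_{n}=\log(n)/n\) the average degree keeps growing with \(n\), whereas with \(\rho_{n}=1/n\) it stays below 1, so the latter rate does not satisfy the divergence condition.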

