Multivariate Species Sampling Models
Species sampling processes have long served as the fundamental framework for modeling random discrete distributions and exchangeable sequences. However, data arising from distinct but related sources require a broader notion of probabilistic invariance, making partial exchangeability a natural choice. Countless models for partially exchangeable data, collectively known as dependent nonparametric priors, have been proposed. These include hierarchical, nested and additive processes, widely used in statistics and machine learning. Still, a unifying framework is lacking and key questions about their underlying learning mechanisms remain unanswered. We fill this gap by introducing multivariate species sampling models, a new general class of nonparametric priors that encompasses most existing finite- and infinite-dimensional dependent processes. They are characterized by the induced partially exchangeable partition probability function encoding their multivariate clustering structure. We establish their core distributional properties and analyze their dependence structure, demonstrating that borrowing of information across groups is entirely determined by shared ties. This provides new insights into the underlying learning mechanisms, offering, for instance, a principled rationale for the previously unexplained correlation structure observed in existing models. Beyond providing a cohesive theoretical foundation, our approach serves as a constructive tool for developing new models and opens novel research directions for capturing richer dependence structures beyond the framework of multivariate species sampling processes.
💡 Research Summary
This paper addresses a fundamental gap in Bayesian non‑parametric modelling for data that arise from several related but distinct sources. Classical species‑sampling processes (SSPs) such as the Dirichlet process assume full exchangeability of observations, which is too restrictive when data are collected from multiple populations that are exchangeable only within each population. The authors propose a unifying framework called multivariate species‑sampling processes (mSSPs) that extends the SSP construction to a vector of dependent random probability measures \((P_{1},\dots,P_{J})\).
An mSSP is defined by a common set of atoms \(\{\theta_{h}\}_{h\ge1}\) drawn i.i.d. from a non‑atomic base measure \(P_{0}\) and by a collection of group‑specific weight sequences \(\pi_{j}=(\pi_{j,h})_{h\ge1}\). Each random probability measure takes the form
\[
P_{j}=\sum_{h\ge1}\pi_{j,h}\,\delta_{\theta_{h}},\qquad j=1,\dots,J.
\]
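The construction described above (atoms shared across groups, weights specific to each group) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: the truncation level `H`, the standard normal base measure, and the independent normalized-gamma weights are all assumptions made for illustration, and the helper names `sample_mssp` and `shared_atoms` are hypothetical.

```python
import random


def sample_mssp(J, H, n, seed=0):
    """Draw n observations per group from J discrete measures with shared atoms.

    Assumptions (not from the source): truncation at H atoms, a N(0, 1)
    base measure, and independent normalized-gamma weights in each group.
    """
    rng = random.Random(seed)
    # Shared atoms theta_h, drawn i.i.d. from a non-atomic base measure P_0.
    atoms = [rng.gauss(0.0, 1.0) for _ in range(H)]
    # Group-specific weights pi_j: normalize positive gamma draws to sum to 1.
    weights = []
    for _ in range(J):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(H)]
        total = sum(g)
        weights.append([x / total for x in g])
    # Within group j, observations are i.i.d. from sum_h pi_{j,h} delta_{theta_h}.
    labels = [rng.choices(range(H), weights=w, k=n) for w in weights]
    data = [[atoms[h] for h in row] for row in labels]
    return atoms, weights, labels, data


def shared_atoms(labels):
    """Return the atom indices that were observed in every group."""
    common = set(labels[0])
    for row in labels[1:]:
        common &= set(row)
    return common


atoms, weights, labels, data = sample_mssp(J=2, H=5, n=50)
print("atoms observed in both groups:", sorted(shared_atoms(labels)))
```

Because the atoms are common to all groups, the same value can appear in samples from different groups. These cross-group ties are the shared ties that, according to the abstract, entirely determine the borrowing of information across groups.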