Modeling asymmetry in multi-way contingency tables with ordinal categories via f-divergence

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv source.

This study introduces a novel model that effectively captures asymmetric structures in multivariate contingency tables with ordinal categories. Leveraging the principle of maximum entropy, our approach employs f-divergence to provide a rational model in the presence of a "prior guess." Inspired by the constraints used in the derivation of the multivariate normal distribution, we demonstrate that the proposed model minimizes f-divergence from complete symmetry under specific constraints. The proposed model encompasses existing asymmetry models as special cases while offering remarkably high interpretability. By modifying the divergence measure within the f-divergence class, the model provides the flexibility to adapt to specific probabilistic structures of interest. Furthermore, we establish theorems showing that the complete symmetry model can be decomposed into two or more models, each imposing less restrictive parameter constraints. We also investigate the properties of goodness-of-fit statistics, with an emphasis on the likelihood ratio and Wald test statistics. Extensive Monte Carlo simulations confirmed that the proposed tests attain their nominal size, achieve high power, and are robust to the choice of f-divergence. Finally, an application to real-world data highlights the practical utility of the proposed model for analyzing asymmetric structures in ordinal contingency tables.


💡 Research Summary

This paper introduces a unified framework for modeling asymmetric structures in multi‑way contingency tables with ordinal categories. The authors start from the principle of maximum entropy and treat the completely symmetric model (denoted S) as a “prior guess.” Under this reference distribution, they seek the distribution that minimizes an f‑divergence subject to constraints on the sums of probabilities over each symmetry class, the marginal means of the ordinal scores, and the second‑order mixed moments (which correspond to covariances once the means are fixed).

The f‑divergence class, originally defined by Ali and Silvey and by Csiszár, includes the Kullback‑Leibler (KL), reverse KL, Hellinger, Pearson χ², and many other divergences; within it, the power‑divergence family indexes a continuum of these through a single parameter λ. For a twice‑differentiable, strictly convex function f, the derivative F = f′ and its inverse F⁻¹ appear in the closed‑form solution of the constrained minimization problem. The resulting cell probabilities have the form

π_i = π_i^S · F⁻¹( u_iᵀα + u_iᵀB u_i + γ_i ),

where u_i is the vector of ordinal scores for cell i, α is a T‑dimensional vector of first‑order effects, B is a symmetric T × T matrix capturing second‑order interactions, and γ_i is constant within each symmetry class D(i). This expression mirrors the multivariate normal distribution, which maximizes differential entropy under mean‑covariance constraints, thereby providing an intuitive discrete analogue.
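To make the closed form concrete, the following sketch evaluates π_i = π_i^S · F⁻¹(u_iᵀα + u_iᵀB u_i + γ_i) for a two‑way r × r table. The symmetric baseline, scores, α, and B are illustrative choices (not values from the paper), γ_i is set to 0, and the table is renormalized for display.

```python
import numpy as np

# Hedged sketch of the closed-form cell probabilities
#   pi_i = pi_i^S * Finv(u_i' alpha + u_i' B u_i + gamma_i)
# for a two-way r x r table. All parameter values are illustrative.

r = 4
scores = np.arange(1, r + 1, dtype=float)          # ordinal scores 1..r

# Completely symmetric "prior guess" pi^S (any table with p[i,j] == p[j,i])
rng = np.random.default_rng(0)
sym = rng.random((r, r))
pi_s = (sym + sym.T) / 2
pi_s /= pi_s.sum()

alpha = np.array([0.3, -0.3])                      # first-order effects
B = np.array([[0.05, -0.02], [-0.02, 0.05]])       # symmetric interaction matrix

def cell_prob(i, j, Finv):
    u = np.array([scores[i], scores[j]])           # score vector for cell (i, j)
    # gamma_i is constant on each symmetry class {(i, j), (j, i)}; set to 0 here
    return pi_s[i, j] * Finv(u @ alpha + u @ B @ u)

def model_table(Finv):
    pi = np.array([[cell_prob(i, j, Finv) for j in range(r)] for i in range(r)])
    return pi / pi.sum()                           # renormalize for illustration

pi_kl = model_table(np.exp)                        # KL divergence: Finv(x) = e^x
pi_chi2 = model_table(lambda x: x + 1.0)           # Pearson chi^2: Finv(x) = x + 1

# With alpha != 0 the score terms break symmetry: pi[i, j] != pi[j, i]
assert not np.allclose(pi_kl, pi_kl.T)
```

Swapping F⁻¹ is the only change needed to move between members of the f‑divergence class, which is the flexibility the framework advertises.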

Two important special cases are examined. When f(x)=x log x−x+1 (KL divergence), F⁻¹(x)=eˣ and the model reduces to

π_i = π_i^S exp( u_iᵀα + u_iᵀB u_i + γ_i ).

In this setting the conditional probabilities can be written as ratios of latent parameters θ_i = exp(u_iᵀα + u_iᵀB u_i), yielding the Gaussian Symmetry (GS) model. The GS model is shown to be equivalent to the previously proposed Generalized Linear Symmetry (GLS) model and, by imposing further restrictions on B, to the LS and extended LS (ELS) models. Thus the new framework subsumes all earlier asymmetric models as special cases.
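The cancellation behind the θ_i parametrization can be checked numerically: within a symmetry class {(i, j), (j, i)} the symmetric baseline π^S and the class constant γ factor out of the conditional probability, leaving a pure ratio of θ values. The sketch below uses illustrative α, B, baseline, and γ values, not ones from the paper.

```python
import numpy as np

# Hedged sketch: in the KL / GS case, pi_ij = pi_s * theta(i, j) * e^gamma with
# theta(i, j) = exp(u' alpha + u' B u). The baseline pi_s and gamma are shared
# within a symmetry class, so they cancel in the conditional probability.

alpha = np.array([0.4, -0.1])
B = np.array([[0.03, 0.01], [0.01, 0.03]])

def theta(i, j):
    u = np.array([float(i), float(j)])
    return np.exp(u @ alpha + u @ B @ u)

i, j = 1, 3
pi_s, gamma = 0.08, 0.2                  # symmetric baseline and class constant
pi_ij = pi_s * theta(i, j) * np.exp(gamma)
pi_ji = pi_s * theta(j, i) * np.exp(gamma)

# Conditional probability given the symmetry class {(i, j), (j, i)}:
lhs = pi_ij / (pi_ij + pi_ji)
rhs = theta(i, j) / (theta(i, j) + theta(j, i))
assert np.isclose(lhs, rhs)              # pi_s and gamma cancel exactly
```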

When f(x)=(x−1)²/2 (Pearson χ² divergence), F⁻¹(x)=x+1 and the model becomes linear in the score terms, offering an alternative representation that can be more appropriate when the data are heavy‑tailed or when a linear adjustment is desired. By varying λ within the power‑divergence family, practitioners can select the divergence that best matches the underlying probabilistic structure (e.g., Hellinger distance for robustness, χ² for count data).
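Comparing divergence choices is straightforward in code: each member of the class is just a different convex f plugged into I_f(π; π^S) = Σ_i π_i^S f(π_i / π_i^S). The table below is illustrative; the f formulas for KL, Pearson χ², and squared Hellinger are standard.

```python
import numpy as np

# Hedged sketch: evaluating the f-divergence from an observed table pi to its
# completely symmetric version pi^S for several members of the class.

pi = np.array([[0.20, 0.15, 0.05],
               [0.05, 0.20, 0.10],
               [0.10, 0.05, 0.10]])     # illustrative table, sums to 1
pi_s = (pi + pi.T) / 2                   # completely symmetric reference

def f_divergence(p, q, f):
    return float(np.sum(q * f(p / q)))

fs = {
    "KL":        lambda x: x * np.log(x) - x + 1,   # f(x) = x log x - x + 1
    "Pearson":   lambda x: (x - 1) ** 2 / 2,        # f(x) = (x - 1)^2 / 2
    "Hellinger": lambda x: (np.sqrt(x) - 1) ** 2,   # squared Hellinger distance
}

results = {name: f_divergence(pi, pi_s, f) for name, f in fs.items()}
for name, d in results.items():
    print(f"{name}: {d:.4f}")           # every f-divergence is nonnegative
```

The divergences differ in how they weight cells where π_i departs strongly from π_i^S, which is the practical basis for choosing λ in the power‑divergence family.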

The authors prove that a fully symmetric model can be decomposed into two or more sub‑models, each imposing weaker constraints. This decomposition theorem provides a systematic way to attribute observed asymmetry to distinct sources (e.g., marginal versus interaction effects).

Statistical inference is addressed by deriving the asymptotic distributions of the likelihood‑ratio (LR) and Wald test statistics under the f‑divergence framework. The authors show that, regardless of the chosen λ, the tests maintain nominal size and exhibit high power when true asymmetry is present. Extensive Monte‑Carlo simulations confirm these theoretical results and demonstrate robustness of the procedure across a wide range of λ values.
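As a minimal concrete instance of such a test, the likelihood‑ratio statistic for complete symmetry S in a two‑way table can be computed directly: under S the MLE of each off‑diagonal pair is the pair average, and G² is asymptotically χ² with r(r−1)/2 degrees of freedom. The counts below are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import chi2

# Hedged sketch: LR test of complete symmetry for an r x r table.
# Fitted counts under S are the pair averages (n_ij + n_ji) / 2.

n = np.array([[50, 30, 10],
              [15, 60, 25],
              [ 5, 20, 40]], dtype=float)   # illustrative counts
r = n.shape[0]

m = (n + n.T) / 2                            # fitted counts under symmetry
mask = n > 0                                 # zero cells contribute nothing
g2 = 2 * np.sum(n[mask] * np.log(n[mask] / m[mask]))
df = r * (r - 1) // 2                        # diagonal cells are unconstrained
p_value = chi2.sf(g2, df)

print(f"G^2 = {g2:.3f}, df = {df}, p = {p_value:.4f}")
```

A small p‑value leads to rejecting complete symmetry, at which point the less restrictive asymmetry models described above become the natural next step.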

A simulation study linking the discrete model to a continuous multivariate normal distribution validates that, when the ordinal scores are integer and the constraints correspond to true means and covariances, the proposed estimator recovers the underlying parameters accurately.

The methodology is applied to two real data sets: (1) a regional migration table where flows from city A to B are not mirrored by flows from B to A, and (2) a public‑opinion survey tracking changes in policy support over time. In both cases the conventional symmetric model fails to fit (large χ² statistics), whereas the f‑divergence‑based asymmetric model yields significant parameter estimates that are readily interpretable (e.g., a higher propensity for upward mobility in income categories).

In summary, the paper contributes (i) a general maximum‑entropy formulation using f‑divergence, (ii) a closed‑form solution that unifies and extends existing asymmetric models, (iii) clear parameter interpretations linked to ordinal scores, (iv) generalized inferential tools (LR and Wald tests) that are robust to the choice of divergence, and (v) empirical evidence of practical usefulness. Future work may explore Bayesian priors for the “prior guess,” high‑dimensional variable selection, and extensions to non‑ordinal categorical data.

