Proper Correlation Coefficients for Nominal Random Variables
This paper develops an intuitive concept of perfect dependence between two variables of which at least one has a nominal scale. Perfect dependence is attainable for all marginal distributions. The paper furthermore proposes a set of dependence measures that equal 1 if and only if this perfect dependence is satisfied. These dependence measures have two advantages over classical dependence measures such as contingency coefficients, Goodman-Kruskal’s lambda and tau, and the so-called uncertainty coefficient. Firstly, they are defined even if one of the variables exhibits continuities. Secondly, they satisfy the property of attainability, that is, they can take all values in the interval [0,1] irrespective of the marginals involved. Neither property is shared by classical dependence measures, which require two discrete marginal distributions and can in some situations yield values close to 0 even though the dependence is strong or even perfect. Additionally, the paper provides a consistent estimator for one of the new dependence measures together with its asymptotic distribution under independence as well as in the general case. This allows the construction of confidence intervals and of an independence test with good finite-sample properties, as a subsequent simulation study shows. Finally, two applications, to the dependence between the variables country and income and between country and religion, illustrate the use of the new measure.
💡 Research Summary
The paper addresses a long‑standing gap in statistical dependence measurement for nominal variables, especially when one of the variables is continuous or mixed. Existing measures such as Cramér’s V, Goodman‑Kruskal’s λ and τ, and the uncertainty coefficient are limited to fully discrete margins and often cannot attain the theoretical maximum of 1, even under strong dependence, because their attainable range depends on the marginal distributions.
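The dependence of the attainable range on the marginals can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper names `cramers_v` and `max_v_2x2` and the example marginals are assumptions chosen for illustration. It scans every 2×2 probability table with fixed marginals and reports the largest Cramér's V that any of them reaches.

```python
import numpy as np

def cramers_v(joint):
    """Cramér's V of a joint probability table (entries sum to 1)."""
    joint = np.asarray(joint, dtype=float)
    # Cell probabilities expected under independence.
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    phi2 = ((joint - expected) ** 2 / expected).sum()
    return float(np.sqrt(phi2 / (min(joint.shape) - 1)))

def max_v_2x2(p1, q1, steps=1001):
    """Largest Cramér's V over all 2x2 tables with P(X=1)=p1, P(Y=1)=q1.

    Hypothetical helper: with fixed marginals a 2x2 table has one free
    cell a = P(X=1, Y=1), which ranges over the Fréchet-Hoeffding bounds.
    """
    lo = max(0.0, p1 + q1 - 1.0)
    hi = min(p1, q1)
    best = 0.0
    for a in np.linspace(lo, hi, steps):
        table = [[a, p1 - a], [q1 - a, 1.0 - p1 - q1 + a]]
        best = max(best, cramers_v(table))
    return best

# Illustrative marginals (assumed, not from the paper).
print(round(max_v_2x2(0.5, 0.5), 3))  # balanced marginals: 1.0
print(round(max_v_2x2(0.5, 0.9), 3))  # unbalanced marginals: 0.333
```

With marginals (0.5, 0.5) and (0.9, 0.1), no joint table pushes Cramér's V above 1/3, so a value of 1/3 here already corresponds to the strongest dependence these marginals permit.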
To overcome these limitations, the author first formalizes a notion of perfect dependence for nominal (or nominal‑continuous) random variables. By invoking the Fréchet–Hoeffding bounds, a joint distribution is said to be perfectly dependent if it coincides with either the upper bound min{F_X, F_Y} or the lower bound max{0, F_X+F_Y−1} after an appropriate relabeling (permutation) of the nominal categories. This definition is deliberately marginal‑invariant: for any given marginal distributions, a permutation can be found that makes the joint distribution achieve the bound, guaranteeing the property of attainability.
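For two discrete variables, the definition as summarized above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's procedure: both variables are taken to be discrete, only the rows (the nominal variable) are relabeled, only the upper bound is checked for brevity, and the function names `upper_frechet_pmf` and `attains_upper_bound` are hypothetical.

```python
from itertools import permutations
import numpy as np

def upper_frechet_pmf(p, q):
    """Joint pmf whose CDF is min(F_X, F_Y) for discrete marginals p, q."""
    Fx, Fy = np.cumsum(p), np.cumsum(q)
    H = np.zeros((len(p) + 1, len(q) + 1))
    H[1:, 1:] = np.minimum.outer(Fx, Fy)  # joint CDF on the category grid
    # Differencing the CDF along both axes recovers the cell probabilities.
    return np.diff(np.diff(H, axis=0), axis=1)

def attains_upper_bound(joint):
    """True if some relabeling of the rows makes `joint` equal the upper bound.

    Brute force over all row orders, so only suitable for a handful of
    nominal categories.
    """
    joint = np.asarray(joint, dtype=float)
    q = joint.sum(axis=0)
    for order in permutations(range(joint.shape[0])):
        rows = joint[list(order)]
        if np.allclose(rows, upper_frechet_pmf(rows.sum(axis=1), q)):
            return True
    return False

# Assumed example tables: rows are nominal categories, columns are ordered.
scrambled = [[0.0, 0.3], [0.5, 0.0], [0.0, 0.2]]
independent = np.outer([0.5, 0.3, 0.2], [0.5, 0.5])
print(attains_upper_bound(scrambled))    # True
print(attains_upper_bound(independent))  # False
```

The first table reaches the bound once its first two rows are swapped, whereas the independent table has no zero cells and therefore cannot match the bound under any row order.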
Next, the paper enumerates a set of desirable properties for a dependence measure in this context: existence, normalization to