Rethinking Disentanglement under Dependent Factors of Variation
Representation learning is an approach for discovering and extracting the factors of variation from data. Intuitively, a representation is said to be disentangled if it separates the different factors of variation in a way that is understandable to humans. Definitions of disentanglement and metrics to measure it usually assume that the factors of variation are independent of each other. However, this assumption is generally false in the real world, which limits the use of these definitions and metrics to very specific and unrealistic scenarios. In this paper we give a definition of disentanglement based on information theory that is also valid when the factors of variation are not independent. Furthermore, we relate this definition to the Information Bottleneck Method. Finally, building on this definition, we propose a method to measure the degree of disentanglement that works when the factors of variation are not independent. We show through different experiments that the proposed method correctly measures disentanglement with non-independent factors of variation, while other methods fail in this scenario.
💡 Research Summary
The paper addresses a fundamental limitation of current disentanglement research: most definitions and evaluation metrics assume that the underlying factors of variation are statistically independent. In real‑world data, factors such as color, shape, size, or lighting are often correlated, and nuisance variables further complicate the picture. Under these conditions, existing metrics (e.g., MIG, SAP, DCI) either become ill‑defined or give misleading scores.
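To make the baseline concrete, here is a minimal sketch of the standard MIG computation: per factor, the gap between the two largest latent-factor mutual informations, normalized by that factor's entropy. Nothing in this score accounts for dependence between the factors themselves, which is the failure mode the paper targets. The helper assumes discretized latents and factors and uses scikit-learn's `mutual_info_score`:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mig(latents, factors):
    """Mutual Information Gap: per factor, the gap between the two largest
    latent-factor MI values, normalized by the factor's entropy (in nats)."""
    scores = []
    for y in factors.T:  # iterate over factor columns
        # MI between each latent dimension and this factor, sorted descending
        mis = np.sort([mutual_info_score(z, y) for z in latents.T])[::-1]
        h = mutual_info_score(y, y)  # I(Y;Y) equals the entropy H(Y)
        scores.append((mis[0] - mis[1]) / h)
    return float(np.mean(scores))
```

A perfectly axis-aligned representation (each factor captured by exactly one latent) scores 1; leakage of a factor into a second latent drives the gap, and hence the score, toward 0, regardless of whether the factors are mutually independent.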
To overcome this, the authors propose an information‑theoretic definition of disentanglement that remains valid when factors are dependent. They introduce four conditional‑independence properties: (1) Factor‑Invariance – a latent variable Z_j conveys information about its target factor Y_i but is conditionally independent of all other factors given Y_i; (2) Nuisance‑Invariance – Z_j is also conditionally independent of nuisance variables N given Y_i; (3) Representation‑Invariance – Y_i can be predicted solely from Z_j, i.e., the rest of the latent vector provides no additional information; (4) Explicitness – the latent representation Z captures all information about each factor, formalized as X ↔ Z ↔ Y_i.
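These conditional-independence properties can be checked empirically with conditional mutual information estimates. As an illustrative plug-in estimator (not the paper's neural/variational one) for discretized variables, I(A; B | C) can be computed directly from empirical frequencies; for factor-invariance one would take A = Z_j, B = some other factor, C = Y_i and expect a value near zero:

```python
import numpy as np
from collections import Counter

def conditional_mutual_information(a, b, c):
    """Plug-in estimate of I(A; B | C) in nats from discrete samples."""
    n = len(a)
    p_abc = Counter(zip(a, b, c))
    p_ac = Counter(zip(a, c))
    p_bc = Counter(zip(b, c))
    p_c = Counter(c)
    cmi = 0.0
    for (ai, bi, ci), n_abc in p_abc.items():
        # I(A;B|C) = sum p(a,b,c) * log[ p(a,b,c) p(c) / (p(a,c) p(b,c)) ]
        cmi += (n_abc / n) * np.log(
            (n_abc / n) * (p_c[ci] / n) / ((p_ac[(ai, ci)] / n) * (p_bc[(bi, ci)] / n))
        )
    return cmi
```

Plug-in estimates like this are only practical for low-cardinality discrete variables; for continuous, high-dimensional latents the paper's variational and neural estimators are needed.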
When a representation satisfies all four properties, each sub‑representation Z_j is both minimal (contains no extraneous information) and sufficient (contains all information needed for its factor). This aligns directly with the Information Bottleneck (IB) principle: the representation should minimize I(Z;X) while maximizing I(Z;Y_i) for each factor, controlled by a trade‑off parameter β. By treating each factor as a separate target, the IB objective naturally enforces the four properties simultaneously.
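The resulting per-factor objective can be sketched as a standard variational IB Lagrangian: a KL compression term (an upper bound on I(Z;X) under a standard-normal prior) traded off via β against a prediction term (a lower bound on I(Z;Y_i)). This is a generic VIB-style loss, not the paper's exact objective; `log_py_given_z` stands in for the log-likelihood a factor predictor assigns to Y_i:

```python
import numpy as np

def ib_lagrangian(mu, logvar, log_py_given_z, beta):
    """Per-sample IB-style loss: compression term minus beta times prediction.

    mu, logvar: parameters of the Gaussian encoder q(z|x), shape (batch, dim).
    log_py_given_z: log-likelihood of the target factor under the decoder, shape (batch,).
    """
    # KL( q(z|x) || N(0, I) ) in closed form for diagonal Gaussians
    kl = 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0, axis=-1)
    # Minimizing this trades compression (low I(Z;X)) against sufficiency (high I(Z;Y))
    return kl - beta * log_py_given_z
```

Treating each factor Y_i as a separate prediction target, with one such term per sub-representation Z_j, is how the IB view enforces minimality and sufficiency simultaneously.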
The authors also present a practical measurement framework. They estimate conditional mutual information using variational bounds and neural estimators to quantify each property: I(Z_j; Ỹ_i | Y_i) for factor‑invariance (where Ỹ_i denotes the factors other than Y_i), I(Z_j; N | Y_i) for nuisance‑invariance, the predictive performance of Y_i from Z_j alone for representation‑invariance, and reconstruction loss for explicitness. These estimates are combined into a single scalar “minimal‑sufficient score” that can be compared across models.
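The representation-invariance check in particular reduces to comparing how well Y_i is predicted from the sub-representation Z_j alone versus from the full latent vector. A hypothetical classifier-based probe (an assumption for illustration, not the paper's estimator) might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def representation_invariance_gap(z, j_dims, y, seed=0):
    """Accuracy of predicting y from all of z minus accuracy from z[:, j_dims].

    A gap near zero suggests the rest of the latent vector carries no
    additional information about this factor (representation-invariance)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    split = len(y) // 2
    tr, te = idx[:split], idx[split:]

    def acc(features):
        clf = LogisticRegression(max_iter=1000).fit(features[tr], y[tr])
        return clf.score(features[te], y[te])

    return acc(z) - acc(z[:, j_dims])
```

A linear probe is only a lower bound on predictive performance; a nonlinear probe would give a tighter check at higher cost.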
Experiments are conducted on two families of datasets. The first includes standard benchmark suites (dSprites, 3dShapes) where factors are approximately independent. The second introduces synthetic dependencies (e.g., color strongly correlated with shape) and adds nuisance variations. Models compared are the proposed β‑IB disentangler, β‑VAE, FactorVAE, and β‑TCVAE. Results show that on independent data all methods achieve similar scores, but on dependent data the traditional metrics collapse dramatically, while the minimal‑sufficient score remains high for the proposed method. Visualizations of the latent space confirm that each Z_j encodes its designated factor with negligible leakage from other factors or nuisances.
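The dependent-factor setting can be reproduced with a toy sampler that couples two discrete factors, in the spirit of the color-correlated-with-shape construction; the `rho` knob for correlation strength is an assumption for illustration, not a parameter from the paper:

```python
import numpy as np

def sample_correlated_factors(n, rho=0.9, seed=0):
    """Draw discrete (shape, color) pairs: with probability rho the color
    copies the shape label, otherwise it is drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    shape = rng.integers(0, 3, size=n)
    copy = rng.random(n) < rho
    color = np.where(copy, shape, rng.integers(0, 3, size=n))
    return shape, color
```

Sweeping `rho` from 0 to 1 moves the data from the independent regime, where traditional metrics behave, to the strongly dependent regime where they reportedly collapse.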
The paper discusses limitations: accurate estimation of conditional mutual information in high dimensions can be sample‑inefficient, and defining or observing nuisance variables may be non‑trivial in many applications. Moreover, the choice of β critically influences the balance between compression and sufficiency, suggesting a need for adaptive tuning strategies.
In conclusion, the work reframes disentanglement as an information‑theoretic problem of achieving minimal and sufficient representations for each factor, without relying on any independence assumptions. By grounding the definition in the Information Bottleneck and providing concrete estimators for the four conditional‑independence properties, the authors deliver a robust, generalizable metric and a learning objective that work reliably even when factors are correlated. This contribution opens the door to principled disentangled representation learning in realistic, noisy, and causally entangled environments.