Overcoming Representation Bias in Fairness-Aware Data Repair using Optimal Transport
Optimal transport (OT) has an important role in transforming data distributions in a manner which engenders fairness. Typically, the OT operators are learnt from the unfair attribute-labelled data, and then used for their repair. Two significant limitations of this approach are as follows: (i) the OT operators for underrepresented subgroups are poorly learnt (i.e. they are susceptible to representation bias); and (ii) these OT repairs cannot be effected on identically distributed but out-of-sample (i.e. archival) data. In this paper, we address both of these problems by adopting a Bayesian nonparametric stopping rule for learning each attribute-labelled component of the data distribution. The induced OT-optimal quantization operators can then be used to repair the archival data. We formulate a novel definition of the fair distributional target, along with quantifiers that allow us to trade fairness against damage in the transformed data. These are used to reveal excellent performance of our representation-bias-tolerant scheme in simulated and benchmark data sets.
💡 Research Summary
The paper tackles two intertwined challenges in fairness‑aware data repair: (i) the optimal transport (OT) operators learned from training data are poorly estimated for under‑represented sub‑groups, leading to representation bias, and (ii) once learned, these operators cannot be directly applied to identically distributed but out‑of‑sample (archival) data. To overcome both issues, the authors introduce a Bayesian non‑parametric (BNP) stopping rule that governs the learning of each attribute‑conditioned component of the data distribution.
The methodology proceeds as follows. For each combination of a non‑protected attribute u and a protected attribute s (the paper assumes both are binary, i.e. Bernoulli, for simplicity), the class‑conditional feature distribution F_{u,s} is modeled with a Dirichlet Process (DP) prior F ∼ DP( \hat{F}_0, ν_0 ). As data points are observed sequentially, the posterior DP F_k = DP( \hat{F}_k, ν_k ) is updated, where \hat{F}_k is a weighted mixture of the prior mean and the empirical measure of the first k samples.
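This conjugate update is easy to picture on a finite partition of the feature space. The sketch below is an illustrative reconstruction of the standard DP posterior-mean update, not the authors' code; the names `f0_probs`, `nu0`, and `bins` are assumptions introduced here.

```python
import numpy as np

def dp_posterior_mean(f0_probs, nu0, samples, bins):
    """Posterior mean of a DP(F0_hat, nu0) prior after observing `samples`,
    evaluated on the finite partition defined by the histogram edges `bins`.

    Standard DP conjugacy gives the weighted mixture
        F_k_hat = (nu0 * F0_hat + k * P_k) / (nu0 + k),
    where P_k is the empirical measure of the first k samples and
    nu_k = nu0 + k is the posterior concentration.
    """
    f0 = np.asarray(f0_probs, dtype=float)
    k = len(samples)
    counts, _ = np.histogram(samples, bins=bins)
    empirical = counts / k if k > 0 else np.zeros_like(f0)
    return (nu0 * f0 + k * empirical) / (nu0 + k)
```

With few samples the posterior mean stays close to the prior \hat{F}_0; as k grows it converges to the empirical measure, which is exactly the behaviour a sequential stopping rule can monitor.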
The key innovation is a data‑driven stopping rule based on the Kullback‑Leibler divergence between successive Dirichlet posteriors: learning of each component stops at the smallest k for which this divergence falls below a prescribed threshold, indicating that further samples would no longer materially change the estimate for that subgroup.
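Since a DP restricted to a finite partition is a Dirichlet distribution, the KL divergence between successive posteriors has a closed form. The following sketch illustrates such a stopping rule under assumptions made here for concreteness (the binning and the threshold `tol` are not the paper's exact construction):

```python
import numpy as np
from scipy.special import gammaln, digamma

def kl_dirichlet(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) )."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    a0 = alpha.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(beta.sum()) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())

def stopping_k(samples, bins, f0_probs, nu0, tol=1e-2):
    """Smallest k at which the KL divergence between the Dirichlet
    posteriors after k-1 and k observations drops below `tol`.
    Samples are assumed to fall strictly inside the partition `bins`."""
    f0 = np.asarray(f0_probs, dtype=float)
    counts = np.zeros_like(f0)
    prev_alpha = nu0 * f0  # Dirichlet parameters of the prior
    for k, x in enumerate(samples, start=1):
        counts[np.searchsorted(bins, x, side="right") - 1] += 1
        alpha = nu0 * f0 + counts
        if kl_dirichlet(alpha, prev_alpha) < tol:
            return k
        prev_alpha = alpha
    return len(samples)  # threshold never reached
```

Because successive posteriors differ by a single observation, their KL divergence shrinks roughly like 1/k², so the rule halts once each subgroup's component has been learnt to a comparable precision, regardless of how over- or under-represented that subgroup is.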