Faithful Group Shapley Value
Data Shapley is an important tool for data valuation, which quantifies the contribution of individual data points to machine learning models. In practice, group-level data valuation is desirable when data providers contribute data in batches. However, we identify that existing group-level extensions of Data Shapley are vulnerable to shell company attacks, where strategic group splitting can unfairly inflate valuations. We propose the Faithful Group Shapley Value (FGSV), which uniquely defends against such attacks. Building on original mathematical insights, we develop a provably fast and accurate approximation algorithm for computing FGSV. Empirical experiments demonstrate that our algorithm significantly outperforms state-of-the-art methods in computational efficiency and approximation accuracy, while ensuring faithful group-level valuation.
💡 Research Summary
Data valuation is increasingly critical for machine‑learning‑driven economies, and the Shapley value has become the de‑facto standard for assigning a fair monetary worth to individual training points. In many real‑world settings, however, data are supplied in batches by organizations, institutions, or companies, which calls for a group‑level valuation method. Existing extensions, collectively referred to as Group Shapley Value (GSV), simply treat each pre‑defined group as an atomic player and apply the classic Shapley formula. The authors expose a previously unstudied vulnerability: shell‑company attacks. By strategically splitting a data‑owner’s dataset into several smaller subsidiaries, an adversary can inflate the total Shapley payout because GSV’s value for a group depends on how the remaining data are partitioned. The paper formalizes this attack theoretically, showing that under a mild “prudence” condition on the expected utility (Δ³Ū(s) > 0), the expected GSV of a merged group is strictly smaller than the sum of the GSVs of its split parts. This phenomenon is empirically demonstrated and poses a serious fairness risk for data marketplaces, copyright compensation, and collaborative learning.
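The shell-company effect described above can be reproduced on a toy example. The sketch below is an illustration under stated assumptions, not the paper's experimental setup: the utility depends only on the number of data points and is taken to be sqrt(size), chosen because its third-order difference is positive, in the spirit of the prudence condition. The function name group_shapley and the group sizes are hypothetical, and Shapley values are computed by brute-force enumeration of all player orderings, which is only feasible for a handful of groups.

```python
from itertools import permutations
from math import factorial, sqrt


def utility(size):
    # Assumed toy utility: depends only on how many points are used.
    return sqrt(size)


def group_shapley(group_sizes):
    """Exact Shapley value when each group is treated as one atomic player."""
    names = list(group_sizes)
    values = {g: 0.0 for g in names}
    for order in permutations(names):
        running = 0  # number of data points already in the coalition
        for g in order:
            before = utility(running)
            running += group_sizes[g]
            values[g] += utility(running) - before  # marginal contribution
    n_orders = factorial(len(names))
    return {g: v / n_orders for g, v in values.items()}


# Owner A holds 2 points, everyone else (B) holds 2 points.
merged = group_shapley({"A": 2, "B": 2})
# Same data, but A registers two one-point "subsidiaries".
split = group_shapley({"A1": 1, "A2": 1, "B": 2})

merged_payout = merged["A"]
split_payout = split["A1"] + split["A2"]
print(f"merged payout for A: {merged_payout:.4f}")
print(f"split payout for A:  {split_payout:.4f}")
```

In this toy setting the split payout exceeds the merged payout even though the owner contributes exactly the same data, and the gain comes at the expense of group B because the total utility to be shared is unchanged.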
To remedy the problem, the authors introduce a Faithfulness axiom: the valuation of a given group must be invariant to any partition of the other data. Together with the classic Shapley axioms (null player, symmetry, linearity, efficiency), they define a five‑axiom system for faithful group data valuation. They prove (Theorem 1) that the only valuation satisfying all five axioms is the sum of the individual Shapley values of the group’s members.
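A minimal sketch of the characterization in Theorem 1, assuming a toy utility and exact enumeration: the group's value is obtained by summing its members' individual Shapley values. The QUALITY table, the sqrt-of-total-quality utility, and the names individual_shapley and fgsv are hypothetical choices made for illustration. This brute-force computation is not the paper's approximation algorithm and scales factorially with the number of points.

```python
from itertools import permutations
from math import sqrt

# Assumed per-point "quality" scores for four data points (hypothetical).
QUALITY = {"a1": 1.0, "a2": 3.0, "b1": 2.0, "b2": 2.0}


def utility(points):
    # Assumed toy utility: concave in the total quality of the points used.
    return sqrt(sum(QUALITY[p] for p in points))


def individual_shapley(points):
    """Exact individual Shapley values by enumerating all orderings."""
    values = {p: 0.0 for p in points}
    orders = list(permutations(points))
    for order in orders:
        seen = []
        for p in order:
            before = utility(seen)
            seen.append(p)
            values[p] += utility(seen) - before  # marginal contribution
    return {p: v / len(orders) for p, v in values.items()}


def fgsv(group, shapley):
    """Group value as the sum of its members' individual Shapley values."""
    return sum(shapley[p] for p in group)


phi = individual_shapley(list(QUALITY))

# The owner of a1 and a2 gains nothing by splitting into two groups.
merged_value = fgsv(["a1", "a2"], phi)
split_value = fgsv(["a1"], phi) + fgsv(["a2"], phi)
print(f"merged: {merged_value:.4f}, split: {split_value:.4f}")
```

Because the individual values are computed once over data points, never over groups, the result for a group cannot depend on how any other data are partitioned, which is the faithfulness property in this toy setting.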