Reducing Biases in Record Matching Through Scores Calibration


Record matching models typically output a real-valued matching score that is later consumed through thresholding, ranking, or human review. While fairness in record matching has mostly been assessed using binary decisions at a fixed threshold, such evaluations can miss systematic disparities in the entire score distribution and can yield conclusions that change with the chosen threshold. We introduce a threshold-independent notion of score bias that extends the standard group-fairness criteria of demographic parity (DP), equal opportunity (EO), and equalized odds (EOD) from binary outputs to score functions by integrating group-wise metric gaps over all thresholds. Using this metric, we empirically show that several state-of-the-art deep matchers can exhibit substantial score bias even when appearing fair at commonly used thresholds. To mitigate these disparities without retraining the underlying matcher, we propose two model-agnostic post-processing methods that only require score evaluations on an (unlabeled) calibration set. Calib targets DP by aligning minority/majority score distributions to a common Wasserstein barycenter via a quantile-based optimal-transport map, with finite-sample guarantees on both residual DP bias and score distortion. C-Calib extends this idea to label-dependent notions (EO/EOD) by performing barycenter alignment conditionally on an estimated label, and we characterize how its guarantees depend on both sample size and label-estimation error. Experiments on standard record-matching benchmarks and multiple neural matchers confirm that Calib and C-Calib substantially reduce score bias with minimal loss in accuracy.


💡 Research Summary

This paper addresses a critical gap in the fairness literature for record‑matching systems: while most prior work evaluates fairness only after thresholding a matcher’s continuous score, it ignores systematic disparities that may exist across the entire score distribution. The authors therefore introduce a threshold‑independent score‑bias metric that extends the classic group‑fairness notions—demographic parity (DP), equal opportunity (EO), and equalized odds (EOD)—from binary decisions to the underlying scoring function. Concretely, the metric integrates the difference between group‑specific true‑positive‑rate (TPR), false‑positive‑rate (FPR) or positive‑rate (PR) curves over all possible thresholds, thus capturing cumulative performance gaps that would be invisible to single‑threshold analyses or to aggregate measures such as AUC.
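The DP variant of this metric can be sketched in a few lines: integrate the gap between the two groups' positive-rate curves over all thresholds. This is a minimal illustration, not the authors' code; the function name and the uniform threshold grid are assumptions.

```python
import numpy as np

def dp_score_bias(scores_a, scores_b, n_thresholds=1000):
    """Threshold-independent DP bias (hypothetical sketch):
    integrate |PR_a(t) - PR_b(t)| over thresholds t in [0, 1],
    where PR_g(t) is the fraction of group g's scores >= t."""
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    pr_a = np.array([(scores_a >= t).mean() for t in thresholds])
    pr_b = np.array([(scores_b >= t).mean() for t in thresholds])
    # Riemann-sum approximation of the integral of the gap.
    dt = thresholds[1] - thresholds[0]
    return np.sum(np.abs(pr_a - pr_b)) * dt
```

The EO and EOD variants follow the same pattern, but restrict the score samples to true matches (for TPR gaps) or true non-matches (for FPR gaps) before integrating.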

To mitigate the identified bias without retraining the matcher, the authors propose two model‑agnostic post‑processing algorithms that require only an unlabeled calibration set.

  1. Calib targets DP. It treats the minority and majority groups’ score distributions as empirical measures and transports both to a common Wasserstein barycenter using a quantile‑based optimal‑transport map. The transformation is monotonic, preserves the within‑group ranking of scores, and comes with finite‑sample guarantees on both residual DP bias and score distortion.

  2. C‑Calib extends Calib to the label‑dependent notions EO and EOD. Because these criteria condition on the true match label, which is unavailable at post‑processing time, C‑Calib performs the barycenter alignment conditionally on an estimated label, and the paper characterizes how its guarantees depend on both the calibration‑set size and the label‑estimation error.
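The quantile‑based alignment behind Calib can be sketched as follows. In one dimension, the Wasserstein‑2 barycenter of two distributions (with equal weights) has a quantile function equal to the average of the two groups' quantile functions, and the monotone optimal‑transport map sends each score to the barycenter quantile at its own within‑group rank. This is a simplified sketch under those assumptions, not the paper's implementation; the function names are hypothetical.

```python
import numpy as np

def barycenter_calibrate(scores_a, scores_b):
    """Map each group's scores onto the 1-D Wasserstein-2 barycenter
    of the two empirical score distributions (equal-weight sketch)."""
    def transport(scores, other):
        # Mid-rank (empirical CDF value) of each score within its own group.
        ranks = (np.argsort(np.argsort(scores)) + 0.5) / len(scores)
        # Barycenter quantile = average of the two groups' quantiles;
        # both quantile functions are monotone, so ranking is preserved.
        return 0.5 * (np.quantile(scores, ranks) + np.quantile(other, ranks))
    return transport(scores_a, scores_b), transport(scores_b, scores_a)
```

C‑Calib would apply the same map separately within each estimated‑label stratum, aligning conditional score distributions rather than marginal ones.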
