Labels or Preferences? Budget-Constrained Learning with Human Judgments over AI-Generated Outputs
The increasing reliance on human preference feedback to judge AI-generated pseudo-labels has created a pressing need for principled, budget-conscious data acquisition strategies. We address the question of how to optimally allocate a fixed annotation budget between ground-truth labels and pairwise preferences. Our solution, grounded in semi-parametric inference, casts the budget allocation problem as a monotone missing data framework. Building on this formulation, we introduce Preference-Calibrated Active Learning (PCAL), a novel method that learns the optimal data acquisition strategy and develops a statistically efficient estimator for functionals of the data distribution. Theoretically, we prove the asymptotic optimality of the PCAL estimator and establish a robustness guarantee ensuring stable performance even with poorly estimated nuisance models. Our framework applies to a general class of problems by directly optimizing the estimator's variance rather than requiring a closed-form solution. This work provides a principled and statistically efficient approach for budget-constrained learning in modern AI. Simulations and real-data analysis demonstrate the practical benefits and superior performance of our proposed method.
💡 Research Summary
The paper tackles a practical yet under‑explored problem: how to allocate a fixed annotation budget between expensive ground‑truth labels and cheap pairwise preference feedback when training models that rely on both sources of supervision. The authors formalize the setting as a monotone missing‑data problem under the Missing‑At‑Random (MAR) assumption. Each data point consists of covariates X, two AI‑generated pseudo‑outcomes (W₁, W₂), an optional true outcome Y, and a preference label V that indicates which pseudo‑outcome is closer to Y. Acquiring Y (together with V) costs ρ > 1 units, while acquiring V alone costs 1 unit. The budget constraint ρ·n_lab + n_pre ≤ B translates into an expected‑cost constraint on the propensity scores α₁(x,w₁,w₂), α₂(x,w₁,w₂), α₃(x,w₁,w₂) that govern whether a sample receives full labeling, preference‑only labeling, or remains unlabeled.
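To make the budget constraint concrete, the following minimal sketch checks feasibility of a candidate acquisition policy. The propensities α₁ (full label Y plus preference V), α₂ (preference V only), and α₃ (no annotation) must sum to one pointwise, and the expected cost ρ·α₁ + α₂, summed over samples, must stay within B. All numerical values (n, ρ, B, the constant propensities) are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical per-sample acquisition probabilities on n data points:
# alpha1 = P(full label Y + preference V), alpha2 = P(preference V only),
# alpha3 = P(no annotation). They must sum to one pointwise.
n, rho, B = 1000, 5.0, 2000.0
alpha1 = np.full(n, 0.15)
alpha2 = np.full(n, 0.40)
alpha3 = 1.0 - alpha1 - alpha2

# Expected annotation cost: rho units per full label, 1 unit per preference.
expected_cost = np.sum(rho * alpha1 + alpha2)
feasible = expected_cost <= B

print(round(expected_cost, 1), feasible)  # → 1150.0 True
```

In the paper the propensities are functions α(x, w₁, w₂) of the covariates and pseudo-outcomes rather than constants; the same expected-cost check applies with the averages taken over the data distribution.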
The methodological core is a semi-parametric inference framework. The target parameter θ = θ(P_{X,Y}) is any functional of the joint distribution of covariates and true outcomes (e.g., a regression coefficient). The authors derive the efficient influence function (EIF) for θ under the three-type missingness structure (Proposition 3.1). The EIF depends explicitly on the propensity scores, so the asymptotic variance of any regular estimator can be expressed as a functional V(α) of the acquisition propensities, which PCAL then minimizes directly subject to the budget constraint.
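As a rough illustration of the "minimize variance over propensities under a budget" idea, the sketch below grid-searches constant propensities against a toy inverse-propensity variance surrogate. The surrogate V(α) = σ²_lab/α₁ + σ²_pre/α₂, the variance values, and the cost cap are all hypothetical stand-ins, not the paper's EIF-based functional:

```python
import numpy as np

# Toy variance surrogate with inverse-propensity terms; this is NOT the
# paper's EIF-derived V(alpha), only an illustrative stand-in.
sigma_lab, sigma_pre = 2.0, 1.0      # hypothetical residual std devs
rho, budget_per_sample = 5.0, 1.15   # hypothetical expected-cost cap

def variance(a1, a2):
    return sigma_lab**2 / a1 + sigma_pre**2 / a2

# Grid search over constant propensities (a1, a2) satisfying both the
# probability constraint a1 + a2 <= 1 and the budget rho*a1 + a2 <= cap.
best = None
grid = np.linspace(0.01, 0.99, 99)
for a1 in grid:
    for a2 in grid:
        if a1 + a2 <= 1.0 and rho * a1 + a2 <= budget_per_sample:
            v = variance(a1, a2)
            if best is None or v < best[0]:
                best = (v, a1, a2)

v_opt, a1_opt, a2_opt = best
print(a1_opt, a2_opt)
```

The actual method optimizes over propensity *functions* of (x, w₁, w₂) with the true efficient-influence-function variance, but the structure of the problem (a variance objective traded off against an expected-cost constraint) is the same.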