Combinatorial Allocation Bandits with Nonlinear Arm Utility


A matching platform is a system that matches different types of participants, such as companies and job-seekers. On such a platform, merely maximizing the number of matches can concentrate matches on highly popular participants. This may increase dissatisfaction among other participants, such as companies, and ultimately lead to their churn, reducing the platform’s profit opportunities. To address this issue, we propose a novel online learning problem, Combinatorial Allocation Bandits (CAB), which incorporates the notion of arm satisfaction. In CAB, at each round $t=1,\dots,T$, the learner observes $K$ feature vectors, one per arm, for each of $N$ users, assigns each user to an arm, and then observes feedback that follows a generalized linear model (GLM). Unlike prior work, the learner’s objective is not to maximize the amount of positive feedback but to maximize arm satisfaction. For CAB, we provide an upper confidence bound (UCB) algorithm that achieves an approximate regret upper bound matching the existing lower bound in a special case. Furthermore, we propose a Thompson sampling (TS) algorithm and provide an approximate regret upper bound for it. Finally, we conduct experiments on synthetic data to demonstrate the effectiveness of the proposed algorithms compared to other methods.
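The interaction protocol described above (observe features, assign each user to an arm, receive GLM feedback) can be sketched as a single simulated round. This is only an illustrative sketch, not the paper's algorithm: the logistic link is one assumed instance of a GLM, and the names `simulate_round`, `matches_per_arm`, `features`, `theta`, and `assignment` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic link; an assumed choice of GLM link function."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_round(features, theta, assignment, rng):
    """Draw binary feedback for one round.

    features[i][a] is the feature vector for user i and arm a,
    theta is the (normally unknown) parameter vector, and
    assignment[i] is the arm chosen for user i.
    """
    feedback = []
    for i, a in enumerate(assignment):
        x = features[i][a]
        p = sigmoid(sum(xj * tj for xj, tj in zip(x, theta)))
        feedback.append(1 if rng.random() < p else 0)
    return feedback


def matches_per_arm(assignment, feedback, num_arms):
    """Count the positive feedback events (matches) each arm received."""
    counts = [0] * num_arms
    for a, y in zip(assignment, feedback):
        counts[a] += y
    return counts


# Hypothetical toy instance: N = 4 users, K = 3 arms, d = 2 features.
rng = random.Random(0)
N, K, d = 4, 3, 2
features = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(K)]
            for _ in range(N)]
theta = [0.5, -0.3]
assignment = [0, 0, 1, 2]  # one arm per user, chosen arbitrarily here
feedback = simulate_round(features, theta, assignment, rng)
counts = matches_per_arm(assignment, feedback, K)
```

A learner would choose `assignment` from its current estimate of `theta` instead of fixing it by hand; the per-arm counts are the quantities that an arm-satisfaction objective would then score.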


💡 Research Summary

The paper introduces a novel online learning framework called Combinatorial Allocation Bandits (CAB) to address a practical problem in matching platforms: the concentration of matches on a few popular arms (e.g., companies, popular users), which leads to dissatisfaction and churn among less‑selected arms. Traditional bandit literature focuses on maximizing the number of positive feedback events (clicks, matches), which can exacerbate this imbalance. To capture business‑relevant objectives, the authors model each arm’s “satisfaction” as a concave (diminishing‑returns) function of the number of matches it receives. The goal of CAB is to maximize the cumulative expected arm satisfaction over a horizon of T rounds, rather than the raw count of matches.
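To illustrate why a concave satisfaction function discourages concentration, the sketch below compares two allocations with the same total number of matches. The utility `log(1 + m)` is an assumed example of a concave function, not necessarily the one used in the paper, and the sketch scores realized match counts rather than the expected satisfaction the paper optimizes.

```python
import math


def satisfaction(matches):
    """Hypothetical concave arm utility with diminishing returns."""
    return math.log1p(matches)


def total_satisfaction(match_counts):
    """Sum of per-arm satisfaction for one round."""
    return sum(satisfaction(m) for m in match_counts)


# Both allocations produce four matches in total across three arms.
concentrated = [4, 0, 0]  # every match goes to one popular arm
spread = [2, 1, 1]        # matches are shared among the arms

print(total_satisfaction(concentrated))
print(total_satisfaction(spread))
```

Under this assumed utility the spread allocation scores higher even though the raw match count is identical, which is the imbalance a count-maximizing objective cannot see.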

In each round t, the learner observes K context vectors ϕ_t(i,a)∈ℝ^d for every user i∈

