Stochastic Matching Bandits with Rare Optimization Updates
We introduce a bandit framework for stochastic matching under the multinomial logit (MNL) choice model. In our setting, $N$ agents on one side are assigned to $K$ arms on the other side, where each arm stochastically selects an agent from its assigned pool according to unknown preferences and yields a corresponding reward over a horizon $T$. The objective is to minimize regret by maximizing the cumulative revenue from successful matches. A naive approach requires solving an NP-hard combinatorial optimization problem at every round, resulting in a prohibitive computational cost. To address this challenge, we propose batched algorithms that strategically limit the number of times matching assignments are updated to $\Theta(\log\log T)$ over the entire horizon. By invoking expensive combinatorial optimization only on a vanishing fraction of rounds, our algorithms substantially reduce overall computational overhead while still achieving a regret bound of $\widetilde{\mathcal{O}}(\sqrt{T})$.
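To give a sense of how few recomputations a $\Theta(\log\log T)$ budget allows, the sketch below builds the geometric batch grid $t_i = T^{1-2^{-i}}$ that is standard in the batched-bandit literature; the function name, the grid, and the constants are illustrative assumptions and may differ from the exact schedule used by the paper's algorithms.

```python
import math

def batch_grid(T: int) -> list[int]:
    """Return Theta(log log T) batch endpoints on the geometric grid
    t_i = T^(1 - 2^{-i}) -- an illustrative schedule, not the paper's exact one."""
    # Number of batches M = Theta(log log T).
    M = max(1, math.ceil(math.log2(math.log2(T))))
    grid = [min(T, math.ceil(T ** (1.0 - 2.0 ** (-i)))) for i in range(1, M + 1)]
    grid[-1] = T  # the final batch always ends at the horizon
    return grid

# Example: over a horizon of one million rounds, the matching assignment
# (the expensive combinatorial step) is recomputed only five times.
print(batch_grid(10**6))  # [1000, 31623, 177828, 421697, 1000000]
```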
💡 Research Summary
The paper introduces a novel bandit framework called Stochastic Matching Bandits (SMB) that captures the realities of modern two-sided platforms where the "arm" side makes stochastic choices among multiple assigned agents. Unlike prior matching-bandit work that assumes deterministic arm preferences and focuses on finding a stable matching, SMB models each arm's selection using a Multinomial Logit (MNL) choice model. For arm $k$ and an assigned set of agents $S_{k,t}$ at round $t$, the probability of selecting agent $n$ is
$$
\mathbb{P}_{k,t}\bigl(n \mid S_{k,t}\bigr) \;=\; \frac{\exp(v_{k,n})}{1 + \sum_{m \in S_{k,t}} \exp(v_{k,m})},
$$
where $v_{k,n}$ denotes arm $k$'s unknown utility for agent $n$, and the additive $1$ in the denominator accounts for the outside option of selecting no agent.
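To make the choice model concrete, here is a minimal Python sketch that samples an arm's selection according to the MNL probabilities above. The utility notation $v_{k,n}$ and the helper's name and signature are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def mnl_choice(utilities: dict[int, float]) -> int | None:
    """Sample one agent (or None, the outside option) under an MNL choice model.

    `utilities` maps each assigned agent n in S_{k,t} to arm k's utility v_{k,n}
    (unknown in the bandit setting; assumed given here purely for illustration).
    """
    weights = {n: math.exp(v) for n, v in utilities.items()}
    denom = 1.0 + sum(weights.values())  # the +1 is the no-selection option
    r = random.random() * denom
    for n, w in weights.items():
        if r < w:
            return n                     # agent n selected with prob exp(v_n) / denom
        r -= w
    return None                          # outside option: no agent selected
```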