Converging to stability in two-sided bandits: The case of unknown preferences on both sides of a matching market. arXiv preprint arXiv:2302.06176, 2023

Gaurab Pokharel, Sanmay Das · 2023 · arXiv 2302.06176

1 Pith paper cites this work.

representative citing papers

Learn to Match: Two-Sided Matching with Temporally Extended Feedback

cs.LG · 2026-06-04 · unverdicted · novelty 7.0 · ref 45

Learn2Match is a POMG-based MARL benchmark for two-sided matching with temporally extended feedback; independent PPO yields higher social welfare and lower regret than CA-ETC but higher information-friction loss.