Dynamic matching bandit for two-sided online markets. arXiv preprint arXiv:2205.03699, 2022.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Learn to Match: Two-Sided Matching with Temporally Extended Feedback
  Learn2Match is a POMG-based MARL benchmark for two-sided matching with temporally extended feedback; independent PPO yields higher social welfare and lower regret than CA-ETC but higher information-friction loss.
- A Linear Matching Bandit Approach to Online Multi-Human Multi-Robot Teaming
  LinMatch recasts linear matching bandits as maximum-weight matching LPs solvable by the Hungarian algorithm and proves tight regret bounds of $\tilde{\Theta}(d\sqrt{MKT})$.