Multi-agent RL with league self-play trains quadrotors to exceed champion human performance in multi-player races above 22 m/s while cutting collisions by 50% and generalizing zero-shot to safer human interaction.
Tesauro, Temporal difference learning and TD-Gammon, Communications of the ACM 58–68 (1995)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.RO (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning