R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

Connor Mattson; Daniel S. Brown; Ellen Novoseller; Nicholas Waytowich; Varun Raveendra; Vernon J. Lawhern

arxiv: 2510.18085 · v2 · pith:6SNHEQ6Znew · submitted 2025-10-20 · 💻 cs.RO · cs.AI · cs.MA

keywords: demonstrations, multi-agent, human, r2bc, behavior, approach, cloning, imitation

Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
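
The round-robin idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D world, the scripted `expert` standing in for the human teleoperator, the tabular `fit_policy` cloning step, and every function name here are assumptions made only for illustration. In each round one agent is teleoperated while the others act under their current cloned policies, and only the teleoperated agent's dataset and policy are updated.

```python
from collections import Counter, defaultdict

def fit_policy(dataset):
    """Tabular behavior cloning (assumed): most common demonstrated action per observation."""
    counts = defaultdict(Counter)
    for obs, act in dataset:
        counts[obs][act] += 1
    table = {obs: c.most_common(1)[0][0] for obs, c in counts.items()}
    return lambda obs: table.get(obs, 0)  # unseen observation: stay still

def expert(obs):
    """Hypothetical stand-in for the human teleoperator: step toward the target."""
    pos, target = obs
    return (target > pos) - (target < pos)

def r2bc_sketch(starts, targets, rounds, horizon):
    n = len(starts)
    datasets = [[] for _ in range(n)]
    policies = [fit_policy([]) for _ in range(n)]
    for r in range(rounds):
        controlled = r % n  # round-robin choice of the teleoperated agent
        pos = list(starts)
        for _ in range(horizon):
            actions = []
            for i in range(n):
                obs = (pos[i], targets[i])
                if i == controlled:
                    act = expert(obs)               # single-agent demonstration
                    datasets[i].append((obs, act))  # recorded for this agent only
                else:
                    act = policies[i](obs)          # other agents run their current policy
                actions.append(act)
            pos = [p + a for p, a in zip(pos, actions)]
        policies[controlled] = fit_policy(datasets[controlled])
    return policies, datasets

def rollout(policies, starts, targets, horizon):
    """Run all cloned policies together with no teleoperation."""
    pos = list(starts)
    for _ in range(horizon):
        pos = [p + policies[i]((p, targets[i])) for i, p in enumerate(pos)]
    return pos

policies, datasets = r2bc_sketch([0, 5], [3, 2], rounds=2, horizon=4)
print(rollout(policies, [0, 5], [3, 2], horizon=4))
```

In this toy the agents do not interact, so it shows only the data-collection schedule: no demonstration is ever given in the joint action space, yet every agent ends up with its own cloned policy.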
