Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

Ameesh Shah; Beyazit Yalcinkaya; Hanna Krasowski; Marcell Vazquez-Chanlatte; Sanjit A. Seshia

arxiv: 2511.02304 · v2 · pith:JVV5I46Qnew · submitted 2025-11-04 · 💻 cs.MA · cs.AI · cs.CL · cs.FL · cs.LG

Beyazit Yalcinkaya, Marcell Vazquez-Chanlatte, Ameesh Shah, Hanna Krasowski, Sanjit A. Seshia

classification 💻 cs.MA cs.AI cs.CL cs.FL cs.LG

keywords learning cooperative multi-agent policies tasks acc-marl agents automata-conditioned


We study learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks assigned to agents enables breaking down a team-level objective into simpler, smaller sub-tasks. However, existing approaches remain sample-inefficient and are limited to the single-task case, requiring retraining policies for each new task. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify challenges to the feasibility of ACC-MARL, propose solutions, and prove that our approach is optimal. We further show that learned value functions can be used to assign tasks optimally at test time. Experiments demonstrate emergent task-aware, multi-step coordination among agents, such as pressing a button to unlock a door, holding the door, and short-circuiting tasks.
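As a rough illustration of two ideas in the abstract, the sketch below represents sub-tasks as small deterministic finite automata (DFAs) and picks a task assignment from a table of value estimates. It is not the authors' implementation: the `DFA` class, the event names `"button"` and `"door"`, the `best_assignment` helper, and the choice to score an assignment by the sum of per-agent values are all assumptions made for this example, and the value numbers are made up.

```python
from itertools import permutations


class DFA:
    """Minimal deterministic finite automaton over string-valued events."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        # Maps (state, symbol) -> next state.
        self.transitions = dict(transitions)

    def step(self, state, symbol):
        # Assumption: events without a listed transition leave the state unchanged.
        return self.transitions.get((state, symbol), state)

    def run(self, symbols):
        state = self.start
        for symbol in symbols:
            state = self.step(state, symbol)
        return state

    def accepts(self, symbols):
        return self.run(symbols) in self.accepting


# Hypothetical sub-tasks: "press the button" and "press the button, then reach the door".
press_button = DFA(start=0, accepting={1}, transitions={(0, "button"): 1})
button_then_door = DFA(
    start=0,
    accepting={2},
    transitions={(0, "button"): 1, (1, "door"): 2},
)


def best_assignment(values):
    """Return a tuple p where agent i is given task p[i].

    values[i][j] is a hypothetical value estimate for agent i on task j.
    Brute force over one-to-one assignments (square table assumed),
    scored by the sum of per-agent values, which is an assumption here.
    """
    n = len(values)
    return max(
        permutations(range(n)),
        key=lambda p: sum(values[i][p[i]] for i in range(n)),
    )


if __name__ == "__main__":
    print(button_then_door.accepts(["button", "door"]))
    print(best_assignment([[0.9, 0.8], [0.9, 0.1]]))
```

In a task-conditioned setup, each agent's policy would take the current automaton state (or an embedding of the automaton) alongside its observation; the brute-force search is only practical for small teams.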


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 7.0

SpecRLBench is a new benchmark evaluating generalization of LTL-guided RL methods across navigation and manipulation domains with static/dynamic environments and varied robot dynamics.