When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?
5 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 5
Roles: method 1
Polarities: use method 1
citing papers explorer
- Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation
  The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.
- Sample-efficient inductive matrix completion with noise and inexact side-information
  A projected gradient descent algorithm for noisy inductive matrix completion achieves linear convergence and stable recovery at sample complexity governed by side-information dimension, extending to inexact side-information with optimal error degradation.
- Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games
  Provides the first finite-time convergence guarantees for Q-value iteration in general-sum Stackelberg Markov games.
- Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback
  Introduces robust estimators for linear Markov games in offline MARLHF that achieve O(ε^{1-o(1)}) or O(√ε) bounds on Nash or CCE gaps under uniform or unilateral coverage.
- Learning Strategic Value and Cooperation in Multi-Player Stochastic Games through Side Payments
  Introduces HS-S (aggregating dynamic threat powers) and Coco-S (fixed points of statewise HS Bellman operator) for stochastic games, proves they coincide for two players but disagree for three, shows uniqueness via extended axioms and topological degree theory, and gives sampling estimators.