The Game Changer Problem: Controlling Equilibria with Discrete Rewards

Brandon Han; Shiyun Cheng; Xiaojin Zhu; Young Wu

arxiv: 2606.29012 · v1 · pith:MSVR7MZ2new · submitted 2026-06-27 · 💻 cs.GT

The Game Changer Problem: Controlling Equilibria with Discrete Rewards

Brandon Han , Young Wu , Shiyun Cheng , Xiaojin Zhu This is my paper

Pith reviewed 2026-06-30 08:18 UTC · model grok-4.3

classification 💻 cs.GT

keywords game changer problemequilibrium controldiscrete rewardsNash equilibriumzero-sum gamesgeneral-sum gamesdynamic programmingreward redesign

0 comments

The pith

An external designer can force any chosen pure action profile to become the unique Nash equilibrium by picking all rewards from a fixed finite set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the game changer problem in which a designer alters every entry of a game's reward matrix, but must draw each value from a predetermined finite set, with the goal of making one specific pure action profile the sole equilibrium. For two-player zero-sum games and for general-sum games the authors supply direct feasibility tests that decide whether such a redesign is possible. Because rewards are restricted to discrete values rather than arbitrary reals, the resulting programs admit exact optimal solutions that can be computed by dynamic programming, in contrast to earlier continuous formulations that rely on linear programming.

Core claim

In the game changer problem an external designer modifies a game's reward matrix so that a target pure action profile becomes the unique equilibrium, subject to every matrix entry belonging to a given finite set. Simple feasibility characterizations are given for two-player zero-sum games and for general-sum games. The discrete reward structure yields exact optimality and supports efficient dynamic programming algorithms, supplying a sharper alternative to prior continuous reward redesign formulations based on linear programming.

What carries the argument

The finite discrete reward set together with the feasibility conditions that certify when a target pure profile can be made the unique equilibrium.

If this is right

Feasibility reduces to simple checks rather than solving an optimization problem.
Dynamic programming yields exact optimal reward assignments for both zero-sum and general-sum cases.
The discrete formulation avoids the approximation gaps that appear when continuous linear programs are discretized after the fact.
Any finite reward set that satisfies the characterizations immediately produces a working redesign without further search.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same discrete-reward idea might be tested on games with three or more players to see whether analogous feasibility conditions survive.
If the finite set is interpreted as a menu of practical payoff levels, the characterizations give a direct way to decide whether a desired outcome can be engineered with those levels.
Dynamic programming routines could be run on small action spaces to produce explicit redesign tables for concrete games.

Load-bearing premise

The designer may assign any value from the finite reward set to any matrix entry without further limits on which entries can be changed or on the size of the action spaces.

What would settle it

A concrete two-player game and finite reward set together with a target profile for which the stated feasibility test returns yes, yet no assignment from the set actually makes that profile the unique equilibrium.

read the original abstract

We introduce the game changer problem, where an external designer modifies a game's reward matrix to make a target pure action profile the unique equilibrium, subject to the constraint that all entries of the reward matrix come from a finite set. We give simple feasibility characterizations for two-player zero-sum games and general-sum games, and the discrete reward structure yields exact optimality and enables efficient dynamic programming algorithms, providing a sharper alternative to prior continuous reward redesign formulations based on linear programming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper frames equilibrium steering as selecting discrete rewards from a finite set to force a target pure profile, with simple feasibility rules and DP for two-player zero-sum and general-sum games.

read the letter

The core contribution is a new game changer problem: an external designer picks every entry of the reward matrix from a finite set to make one target pure action profile the unique equilibrium. They supply straightforward feasibility characterizations for two-player zero-sum and general-sum cases, and the discrete structure supports exact optimality via dynamic programming rather than the linear programs used in continuous redesign papers.

The feasibility conditions are presented cleanly and the DP approach follows naturally from the finite reward set, giving a precise alternative that avoids continuous relaxation. The problem statement is explicit about free choice from the set, so there is no hidden assumption there.

The main constraint is the two-player limit; nothing is shown for n-player games, and the DP will scale with action-space size and the cardinality of the reward set. Those are scope issues rather than errors in the two-player results.

This is useful for researchers in algorithmic game theory or mechanism design who need to control equilibria when only a discrete menu of rewards is available. A reader working on multi-agent systems with restricted incentives would find the characterizations and algorithms worth examining.

It deserves peer review. The formulation is distinct, the claims are stated without circularity, and the DP method is a concrete technical step that merits checking the details.

Referee Report

0 major / 3 minor

Summary. The paper introduces the game changer problem, in which an external designer modifies entries of a game's reward matrix—each drawn from a finite discrete set—to render a designated pure action profile the unique Nash equilibrium. For two-player zero-sum and general-sum games the authors supply feasibility characterizations; the discrete structure is further shown to admit exact optimality results together with efficient dynamic-programming algorithms, positioned as a sharper alternative to prior continuous-reward redesign formulations that rely on linear programming.

Significance. If the characterizations and DP optimality claims hold, the work supplies a precise, computationally tractable method for equilibrium control under the realistic constraint that rewards must belong to a finite set. The explicit feasibility conditions and the exact (rather than approximate) optimality results constitute a clear technical contribution relative to continuous LP baselines.

minor comments (3)

[§2] §2, Definition 1: the finite reward set R is introduced without an accompanying small-scale example that illustrates how the designer’s choice of entries interacts with the target profile; adding one would clarify the subsequent feasibility statements.
[§4.2] §4.2, Algorithm 1: the dynamic-programming recurrence is stated at a high level; the base case and the precise state representation (action-profile versus payoff-vector) should be written explicitly to allow direct implementation.
[Table 1] Table 1: the reported running times compare the DP approach only against a generic LP solver; a column showing the size of the action spaces used in the experiments would help readers assess scalability claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful summary of our work and the recommendation of minor revision. No specific major comments were provided in the report, so we have no individual points to address at this time.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces the game changer problem with discrete rewards from a finite set as an explicit modeling choice and derives feasibility characterizations plus DP algorithms directly from that structure. These are presented as theoretical results (feasibility conditions for zero-sum and general-sum cases) without any reduction to fitted parameters, self-definitional loops, or load-bearing self-citations. The discrete structure is an input assumption, not an output derived from the claimed characterizations. No equations or steps in the abstract or described claims exhibit the enumerated circular patterns; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are stated or derivable from the given text.

pith-pipeline@v0.9.1-grok · 5598 in / 958 out tokens · 28035 ms · 2026-06-30T08:18:27.404296+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

158 extracted references · 13 canonical work pages · 3 internal anchors

[1]

Apprenticeship learning via inverse reinforcement learning , author =
[2]

Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 2 , pages =

Multiagent reinforcement learning: algorithm converging to nash equilibrium in general-sum discounted stochastic games , author =. Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 2 , pages =
[3]

Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1-Volume 1 , pages =

Internal implementation , author =. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1-Volume 1 , pages =
[4]

Journal of the Operational Research Society , publisher =

On the uniqueness of solutions to linear programs , author =. Journal of the Operational Research Society , publisher =
[5]

SIAM journal on computing , publisher =

The nonstochastic multiarmed bandit problem , author =. SIAM journal on computing , publisher =
[6]

, author =

Multi-Agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study. , author =
[7]

Defense against reward poisoning attacks in reinforcement learning , author =
[8]

Admissible Policy Teaching through Reward Design , author =
[9]

International Conference on Machine Learning and Data Mining in Pattern Recognition , pages =

Vulnerability of deep reinforcement learning to policy induction attacks , author =. International Conference on Machine Learning and Data Mining in Pattern Recognition , pages =
[10]

Dota 2 with large scale deep reinforcement learning , author =
[11]

International Conference on Artificial Intelligence and Statistics , pages =

Stochastic linear bandits robust to adversarial attacks , author =. International Conference on Artificial Intelligence and Statistics , pages =
[12]

ICML , pages =

Convergence problems of general-sum multiagent reinforcement learning , author =. ICML , pages =
[13]

International joint conference on artificial intelligence , volume = 17, pages =

Rational and convergent learning in stochastic games , author =. International joint conference on artificial intelligence , volume = 17, pages =
[14]

Advances in Neural Information Processing Systems , volume = 34, pages =

Offline rl without off-policy evaluation , author =. Advances in Neural Information Processing Systems , volume = 34, pages =
[15]

, author =

Libratus: The Superhuman AI for No-Limit Poker. , author =. IJCAI , pages =
[16]

Science , publisher =

Superhuman AI for multiplayer poker , author =. Science , publisher =
[17]

Foundations and Trends

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , author =. Foundations and Trends
[18]

, year = 2021, note =

Clancey, William J. , year = 2021, note =

2021
[19]

Crime and punishment in scientific research

Crime and punishment in scientific research , author =. 0803.4058 , archiveprefix =

work page internal anchor Pith review Pith/arXiv arXiv
[20]

Pluto: The 'Other' Red Planet , author =
[21]

, year = 1979, address =

Clancey, William J. , year = 1979, address =

1979
[22]

, year = 1983, booktitle =

Clancey, William J. , year = 1983, booktitle =

1983
[23]

, year = 1984, booktitle =

Clancey, William J. , year = 1984, booktitle =

1984
[24]

When is Offline Two-Player Zero-Sum Markov Game Solvable? , author =
[25]

When is Offline Two-Player Zero-Sum

Cui, Qiwen and Du, Simon S , year = 2022, journal =. When is Offline Two-Player Zero-Sum

2022
[26]

Linear programming and extensions , author =
[27]

Autonomous Robots , publisher =

A taxonomy for multi-agent robotics , author =. Autonomous Robots , publisher =
[28]

Blackboard Systems , year = 1986, publisher =

1986
[29]

Volunteer's dilemma ---
[30]

Revisiting some common practices in cooperative multi-agent reinforcement learning , author =
[31]

Adversarial Attacks on Linear Contextual Bandits , author =
[32]

Adversarial policies: Attacking deep reinforcement learning , author =
[33]

SIAM Review , publisher =

f-Finger Morra , author =. SIAM Review , publisher =
[34]

2017 IEEE international conference on robotics and automation (ICRA) , pages =

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , author =. 2017 IEEE international conference on robotics and automation (ICRA) , pages =

2017
[35]

Proceedings of the AAAI Conference on Artificial Intelligence , volume = 34, pages =

Robust stochastic bandit algorithms under probabilistic unbounded adversarial attack , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume = 34, pages =
[36]

International Conference on Machine Learning , pages =

Adversarial policy learning in two-player competitive games , author =. International Conference on Machine Learning , pages =
[37]

Econometrica , publisher =

A simple adaptive procedure leading to correlated equilibrium , author =. Econometrica , publisher =
[38]

International Journal of Man-Machine Studies , volume = 20, number = 1, pages =

Strategic explanations for a diagnostic consultation system , author =. International Journal of Man-Machine Studies , volume = 20, number = 1, pages =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , issn =

work page doi:10.1016/s0020-7373(84)80003-6
[39]

and Rennels, Glenn R

Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas , year = 1983, journal =

1983
[40]

International conference on machine learning , pages =

Fictitious self-play in extensive-form games , author =. International conference on machine learning , pages =
[41]

International Conference on Autonomous Agents and Multiagent Systems , pages =

Towards a fast detection of opponents in repeated stochastic games , author =. International Conference on Autonomous Agents and Multiagent Systems , pages =
[42]

International Journal of Game Theory , publisher =

Uniqueness of equilibrium points in bimatrix games , author =. International Journal of Game Theory , publisher =
[43]

Journal of machine learning research , volume = 4, number =

Nash Q-learning for general-sum stochastic games , author =. Journal of machine learning research , volume = 4, number =
[44]

Adversarial attacks on neural network policies , author =
[45]

International Conference on Decision and Game Theory for Security , pages =

Deceptive reinforcement learning under adversarial manipulations on cost signals , author =. International Conference on Decision and Game Theory for Security , pages =
[46]

Proceedings of the 36th International Conference on Machine Learning , publisher =

Multi-Agent Adversarial Inverse Reinforcement Learning , author =. Proceedings of the 36th International Conference on Machine Learning , publisher =
[47]

Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 1 , location =

Value-Based Policy Teaching with Active Indirect Elicitation , author =. Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 1 , location =
[48]

Science , publisher =

Human-level performance in 3D multiplayer games with population-based reinforcement learning , author =. Science , publisher =
[49]

Offline decentralized multi-agent reinforcement learning , author =
[50]

International Conference on Machine Learning , pages =

Is pessimism provably efficient for offline rl? , author =. International Conference on Machine Learning , pages =
[51]

Advances in Neural Information Processing Systems , booktitle =

Adversarial attacks on stochastic bandits , author =. Advances in Neural Information Processing Systems , booktitle =
[52]

2020 57th ACM/IEEE Design Automation Conference (DAC) , pages =

TrojDRL: evaluation of backdoor attacks on deep reinforcement learning , author =. 2020 57th ACM/IEEE Design Automation Conference (DAC) , pages =

2020
[53]

The International Journal of Robotics Research , publisher =

Reinforcement learning in robotics: A survey , author =. The International Journal of Robotics Research , publisher =
[54]

Delving into adversarial attacks on deep policies , author =
[55]

Journal of Economic Dynamics and Control , publisher =

Learning competitive pricing strategies by multi-agent reinforcement learning , author =. Journal of Economic Dynamics and Control , publisher =
[56]

International Conference on Database and Expert Systems Applications , pages =

A multi-agent Q-learning framework for optimizing stock trading systems , author =. International Conference on Database and Expert Systems Applications , pages =
[57]

IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans , publisher =

A multiagent approach to q -learning for daily stock trading , author =. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans , publisher =
[58]

Multi-agent reinforcement learning in sequential social dilemmas , author =
[59]

IEEE Transactions on Games , publisher =

Multiagent inverse reinforcement learning for two-person zero-sum games , author =. IEEE Transactions on Games , publisher =
[60]

Tactics of adversarial attack on deep reinforcement learning agents , author =
[61]

Journal of Artificial Intelligence Research , volume = 66, pages =

Multi-agent inverse reinforcement learning for certain general-sum stochastic games , author =. Journal of Artificial Intelligence Research , volume = 66, pages =
[62]

Machine learning proceedings 1994 , publisher =

Markov games as a framework for multi-agent reinforcement learning , author =. Machine learning proceedings 1994 , publisher =

1994
[63]

International Conference on Machine Learning , pages =

Data poisoning attacks on stochastic bandits , author =. International Conference on Machine Learning , pages =
[64]

, author =

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns. , author =. IJCAI , pages =
[65]

ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages =

Action-manipulation attacks on stochastic bandits , author =. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages =

2020
[66]

Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning , author =
[67]

Algorithms in multi-agent systems: a holistic perspective from reinforcement learning and game theory , author =
[68]

Proceedings of the AAAI Conference on Artificial Intelligence , volume = 35, pages =

Stochastic Graphical Bandits with Adversarial Corruptions , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume = 35, pages =
[69]

Conference on Learning Theory , pages =

Corruption-robust exploration in episodic reinforcement learning , author =. Conference on Learning Theory , pages =
[70]

International Conference on Decision and Game Theory for Security , pages =

Data poisoning attacks in contextual bandits , author =. International Conference on Decision and Game Theory for Security , pages =
[71]

Advances in Neural Information Processing Systems , booktitle =

Policy poisoning in batch reinforcement learning and control , author =. Advances in Neural Information Processing Systems , booktitle =
[72]

Game Redesign in No-regret Game Playing , author =
[73]

Markov games of incomplete information for multi-agent reinforcement learning , author =
[74]

Uniqueness of solution in linear programming , author =
[75]

Dynamic economic emissions dispatch optimisation using multi-agent reinforcement learning , author =
[76]

Observable actions , author =

Markov perfect equilibrium: I. Observable actions , author =. Journal of Economic Theory , publisher =
[77]

Offline pre-trained multi-agent decision transformer: One big sequence model conquers all starcraftii tasks , author =
[78]

Naval Research Logistics Quarterly , publisher =

Constructing bimatrix games with special properties , author =. Naval Research Logistics Quarterly , publisher =
[79]

Journal of Dynamics & Games , publisher =

On the uniqueness of Nash equilibrium in strategic-form games , author =. Journal of Dynamics & Games , publisher =
[80]

Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm , author =

Showing first 80 references.

[1] [1]

Apprenticeship learning via inverse reinforcement learning , author =

[2] [2]

Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 2 , pages =

Multiagent reinforcement learning: algorithm converging to nash equilibrium in general-sum discounted stochastic games , author =. Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 2 , pages =

[3] [3]

Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1-Volume 1 , pages =

Internal implementation , author =. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1-Volume 1 , pages =

[4] [4]

Journal of the Operational Research Society , publisher =

On the uniqueness of solutions to linear programs , author =. Journal of the Operational Research Society , publisher =

[5] [5]

SIAM journal on computing , publisher =

The nonstochastic multiarmed bandit problem , author =. SIAM journal on computing , publisher =

[6] [6]

, author =

Multi-Agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study. , author =

[7] [7]

Defense against reward poisoning attacks in reinforcement learning , author =

[8] [8]

Admissible Policy Teaching through Reward Design , author =

[9] [9]

International Conference on Machine Learning and Data Mining in Pattern Recognition , pages =

Vulnerability of deep reinforcement learning to policy induction attacks , author =. International Conference on Machine Learning and Data Mining in Pattern Recognition , pages =

[10] [10]

Dota 2 with large scale deep reinforcement learning , author =

[11] [11]

International Conference on Artificial Intelligence and Statistics , pages =

Stochastic linear bandits robust to adversarial attacks , author =. International Conference on Artificial Intelligence and Statistics , pages =

[12] [12]

ICML , pages =

Convergence problems of general-sum multiagent reinforcement learning , author =. ICML , pages =

[13] [13]

International joint conference on artificial intelligence , volume = 17, pages =

Rational and convergent learning in stochastic games , author =. International joint conference on artificial intelligence , volume = 17, pages =

[14] [14]

Advances in Neural Information Processing Systems , volume = 34, pages =

Offline rl without off-policy evaluation , author =. Advances in Neural Information Processing Systems , volume = 34, pages =

[15] [15]

, author =

Libratus: The Superhuman AI for No-Limit Poker. , author =. IJCAI , pages =

[16] [16]

Science , publisher =

Superhuman AI for multiplayer poker , author =. Science , publisher =

[17] [17]

Foundations and Trends

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , author =. Foundations and Trends

[18] [18]

, year = 2021, note =

Clancey, William J. , year = 2021, note =

2021

[19] [19]

Crime and punishment in scientific research

Crime and punishment in scientific research , author =. 0803.4058 , archiveprefix =

work page internal anchor Pith review Pith/arXiv arXiv

[20] [20]

Pluto: The 'Other' Red Planet , author =

[21] [21]

, year = 1979, address =

Clancey, William J. , year = 1979, address =

1979

[22] [22]

, year = 1983, booktitle =

Clancey, William J. , year = 1983, booktitle =

1983

[23] [23]

, year = 1984, booktitle =

Clancey, William J. , year = 1984, booktitle =

1984

[24] [24]

When is Offline Two-Player Zero-Sum Markov Game Solvable? , author =

[25] [25]

When is Offline Two-Player Zero-Sum

Cui, Qiwen and Du, Simon S , year = 2022, journal =. When is Offline Two-Player Zero-Sum

2022

[26] [26]

Linear programming and extensions , author =

[27] [27]

Autonomous Robots , publisher =

A taxonomy for multi-agent robotics , author =. Autonomous Robots , publisher =

[28] [28]

Blackboard Systems , year = 1986, publisher =

1986

[29] [29]

Volunteer's dilemma ---

[30] [30]

Revisiting some common practices in cooperative multi-agent reinforcement learning , author =

[31] [31]

Adversarial Attacks on Linear Contextual Bandits , author =

[32] [32]

Adversarial policies: Attacking deep reinforcement learning , author =

[33] [33]

SIAM Review , publisher =

f-Finger Morra , author =. SIAM Review , publisher =

[34] [34]

2017 IEEE international conference on robotics and automation (ICRA) , pages =

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , author =. 2017 IEEE international conference on robotics and automation (ICRA) , pages =

2017

[35] [35]

Proceedings of the AAAI Conference on Artificial Intelligence , volume = 34, pages =

Robust stochastic bandit algorithms under probabilistic unbounded adversarial attack , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume = 34, pages =

[36] [36]

International Conference on Machine Learning , pages =

Adversarial policy learning in two-player competitive games , author =. International Conference on Machine Learning , pages =

[37] [37]

Econometrica , publisher =

A simple adaptive procedure leading to correlated equilibrium , author =. Econometrica , publisher =

[38] [38]

International Journal of Man-Machine Studies , volume = 20, number = 1, pages =

Strategic explanations for a diagnostic consultation system , author =. International Journal of Man-Machine Studies , volume = 20, number = 1, pages =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , issn =

work page doi:10.1016/s0020-7373(84)80003-6

[39] [39]

and Rennels, Glenn R

Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas , year = 1983, journal =

1983

[40] [40]

International conference on machine learning , pages =

Fictitious self-play in extensive-form games , author =. International conference on machine learning , pages =

[41] [41]

International Conference on Autonomous Agents and Multiagent Systems , pages =

Towards a fast detection of opponents in repeated stochastic games , author =. International Conference on Autonomous Agents and Multiagent Systems , pages =

[42] [42]

International Journal of Game Theory , publisher =

Uniqueness of equilibrium points in bimatrix games , author =. International Journal of Game Theory , publisher =

[43] [43]

Journal of machine learning research , volume = 4, number =

Nash Q-learning for general-sum stochastic games , author =. Journal of machine learning research , volume = 4, number =

[44] [44]

Adversarial attacks on neural network policies , author =

[45] [45]

International Conference on Decision and Game Theory for Security , pages =

Deceptive reinforcement learning under adversarial manipulations on cost signals , author =. International Conference on Decision and Game Theory for Security , pages =

[46] [46]

Proceedings of the 36th International Conference on Machine Learning , publisher =

Multi-Agent Adversarial Inverse Reinforcement Learning , author =. Proceedings of the 36th International Conference on Machine Learning , publisher =

[47] [47]

Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 1 , location =

Value-Based Policy Teaching with Active Indirect Elicitation , author =. Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 1 , location =

[48] [48]

Science , publisher =

Human-level performance in 3D multiplayer games with population-based reinforcement learning , author =. Science , publisher =

[49] [49]

Offline decentralized multi-agent reinforcement learning , author =

[50] [50]

International Conference on Machine Learning , pages =

Is pessimism provably efficient for offline rl? , author =. International Conference on Machine Learning , pages =

[51] [51]

Advances in Neural Information Processing Systems , booktitle =

Adversarial attacks on stochastic bandits , author =. Advances in Neural Information Processing Systems , booktitle =

[52] [52]

2020 57th ACM/IEEE Design Automation Conference (DAC) , pages =

TrojDRL: evaluation of backdoor attacks on deep reinforcement learning , author =. 2020 57th ACM/IEEE Design Automation Conference (DAC) , pages =

2020

[53] [53]

The International Journal of Robotics Research , publisher =

Reinforcement learning in robotics: A survey , author =. The International Journal of Robotics Research , publisher =

[54] [54]

Delving into adversarial attacks on deep policies , author =

[55] [55]

Journal of Economic Dynamics and Control , publisher =

Learning competitive pricing strategies by multi-agent reinforcement learning , author =. Journal of Economic Dynamics and Control , publisher =

[56] [56]

International Conference on Database and Expert Systems Applications , pages =

A multi-agent Q-learning framework for optimizing stock trading systems , author =. International Conference on Database and Expert Systems Applications , pages =

[57] [57]

IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans , publisher =

A multiagent approach to q -learning for daily stock trading , author =. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans , publisher =

[58] [58]

Multi-agent reinforcement learning in sequential social dilemmas , author =

[59] [59]

IEEE Transactions on Games , publisher =

Multiagent inverse reinforcement learning for two-person zero-sum games , author =. IEEE Transactions on Games , publisher =

[60] [60]

Tactics of adversarial attack on deep reinforcement learning agents , author =

[61] [61]

Journal of Artificial Intelligence Research , volume = 66, pages =

Multi-agent inverse reinforcement learning for certain general-sum stochastic games , author =. Journal of Artificial Intelligence Research , volume = 66, pages =

[62] [62]

Machine learning proceedings 1994 , publisher =

Markov games as a framework for multi-agent reinforcement learning , author =. Machine learning proceedings 1994 , publisher =

1994

[63] [63]

International Conference on Machine Learning , pages =

Data poisoning attacks on stochastic bandits , author =. International Conference on Machine Learning , pages =

[64] [64]

, author =

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns. , author =. IJCAI , pages =

[65] [65]

ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages =

Action-manipulation attacks on stochastic bandits , author =. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages =

2020

[66] [66]

Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning , author =

[67] [67]

Algorithms in multi-agent systems: a holistic perspective from reinforcement learning and game theory , author =

[68] [68]

Proceedings of the AAAI Conference on Artificial Intelligence , volume = 35, pages =

Stochastic Graphical Bandits with Adversarial Corruptions , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume = 35, pages =

[69] [69]

Conference on Learning Theory , pages =

Corruption-robust exploration in episodic reinforcement learning , author =. Conference on Learning Theory , pages =

[70] [70]

International Conference on Decision and Game Theory for Security , pages =

Data poisoning attacks in contextual bandits , author =. International Conference on Decision and Game Theory for Security , pages =

[71] [71]

Advances in Neural Information Processing Systems , booktitle =

Policy poisoning in batch reinforcement learning and control , author =. Advances in Neural Information Processing Systems , booktitle =

[72] [72]

Game Redesign in No-regret Game Playing , author =

[73] [73]

Markov games of incomplete information for multi-agent reinforcement learning , author =

[74] [74]

Uniqueness of solution in linear programming , author =

[75] [75]

Dynamic economic emissions dispatch optimisation using multi-agent reinforcement learning , author =

[76] [76]

Observable actions , author =

Markov perfect equilibrium: I. Observable actions , author =. Journal of Economic Theory , publisher =

[77] [77]

Offline pre-trained multi-agent decision transformer: One big sequence model conquers all starcraftii tasks , author =

[78] [78]

Naval Research Logistics Quarterly , publisher =

Constructing bimatrix games with special properties , author =. Naval Research Logistics Quarterly , publisher =

[79] [79]

Journal of Dynamics & Games , publisher =

On the uniqueness of Nash equilibrium in strategic-form games , author =. Journal of Dynamics & Games , publisher =

[80] [80]

Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm , author =