Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems

Stefano V. Albrecht; Subramanian Ramamoorthy

arxiv: 1907.09189 · v1 · pith:BGOQRQR4new · submitted 2019-07-22 · 💻 cs.MA · cs.AI

Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems

Stefano V. Albrecht , Subramanian Ramamoorthy This is my paper

Pith reviewed 2026-05-24 17:54 UTC · model grok-4.3

classification 💻 cs.MA cs.AI

keywords multiagent learningad hoc teamsrepeated matrix gamesempirical evaluationequilibrium attainmentsocial welfarefairness

0 comments

The pith

No single multiagent learning algorithm performs best across all ad hoc team problems

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares five multiagent learning algorithms in ad hoc team problems where agents are heterogeneous and adapt without prior coordination. These algorithms were originally designed for homogeneous agent teams. The tests cover a range of repeated matrix games and use criteria such as equilibrium attainment, social welfare, and fairness. The results show no overall winner, but each algorithm has advantages tied to the specific criterion.

Core claim

When five major multiagent learning algorithms are evaluated in ad hoc team problems with continuously adapting agents, no algorithm emerges as clearly superior, although the algorithms display relative strengths depending on whether the focus is equilibrium attainment, social welfare, or fairness.

What carries the argument

Comparative evaluation of algorithms in repeated matrix games using the criteria of equilibrium attainment, social welfare, and fairness

Load-bearing premise

The chosen repeated matrix games and three performance criteria represent a broad enough sample of ad hoc team problems.

What would settle it

A test in which one of the five algorithms ranks highest across every game and every criterion would contradict the no-clear-winner result.

Figures

Figures reproduced from arXiv: 1907.09189 by Stefano V. Albrecht, Subramanian Ramamoorthy.

**Figure 1.** Figure 1: shows the payoff polytope of a 2×2 game. The payoff table of the game is given in the figure. Player 1 chooses the row and player 2 chooses the column. The elements of the table contain the payoffs to player 1 and 2, respectively. 0 1 2 3 4 5 0 1 2 3 4 5 Payoff to player 1 Payoff to player 2 1, 1 4, 1 1, 4 3, 3 [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: shows the results averaged over all three parts. Note that we cannot take the average of the payoffs since we consider ordinal games. However, for the final expected payoffs, we first normalised the values by dividing through the respective maximum, after which we took the average of all results. Thus, the final expected payoffs in [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

read the original abstract

This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogenous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows no single MAL algorithm dominates in heterogeneous adapting ad hoc teams, but different ones trade off better on welfare versus equilibrium in repeated matrix games.

read the letter

The core finding is that five standard multiagent learning algorithms, when placed in teams with mixed strategies and ongoing adaptation, produce no overall winner. Performance instead splits by criterion: some do better on social welfare, others on reaching equilibrium or fairness measures. This comes from running them across a range of repeated matrix games chosen to cover different payoff structures. The setup moves beyond the usual homogeneous-agent tests or fixed-opponent evaluations that dominate earlier work. That extension to continuously adapting heterogeneous teams is the main new element, and it is carried out with multiple performance criteria rather than a single metric. The comparison is systematic enough to highlight practical tradeoffs for algorithm selection. The main soft spot is scope. All results stay inside repeated normal-form games with full observability and discrete actions. Many real ad hoc team problems involve partial observability, non-stationary environments beyond matrix repetition, or continuous action spaces, so the observed patterns may not transfer. The abstract gives no numbers, error bars, or details on game selection, which makes it hard to judge how sensitive the relative-strength claims are to those choices. If the full paper includes clear statistical tests and justifies the game set against the broader literature, that concern shrinks. Readers who work on deployed multiagent systems where agents cannot assume identical code or prior agreements will get the most from this. It supplies concrete pointers on when one algorithm might be preferable over another under different objectives. The work is a solid empirical exercise that fills a documented gap, even if it does not claim to cover the full space of ad hoc problems. It deserves peer review because the question is relevant and the design is reproducible in principle. I would send it out rather than desk reject.

Referee Report

3 major / 2 minor

Summary. The manuscript empirically evaluates five multiagent learning algorithms (originally developed for homogeneous teams) in ad hoc team problems consisting entirely of continuously adapting agents. The evaluation is performed on a set of repeated matrix games using three performance criteria: equilibrium attainment, social welfare, and fairness. The central claim is that no algorithm is clearly superior overall, but the algorithms exhibit relative strengths depending on the criterion (e.g., some favor social welfare while others better attain equilibrium).

Significance. If the empirical patterns hold, the work supplies a concrete comparative baseline for choosing among standard MAL algorithms when agents must coordinate without prior agreements or homogeneity assumptions. The use of multiple orthogonal criteria and a claimed diverse game set allows the identification of metric-specific trade-offs that are directly relevant to practical multiagent deployments.

major comments (3)

[§3 (Game Selection) and §4 (Experimental Setup)] The central claim that the observed relative strengths generalize to ad hoc team problems rests on the representativeness of the chosen repeated matrix games. The manuscript should provide an explicit argument (with reference to the literature on ad hoc teams) showing how the selected games cover key dimensions such as partial observability, continuous action spaces, or richer non-stationarity that appear in many ad hoc team formulations outside normal-form repeated games.
[§5 (Results) and §6 (Discussion)] The conclusion of 'no clear winner' and the reported relative strengths are presented without reference to statistical significance tests, confidence intervals, or effect-size measures across the repeated trials. Without these, it is impossible to determine whether observed differences between algorithms on a given criterion are robust or could be explained by sampling variability.
[§4.2 (Performance Criteria)] The performance criteria are defined with respect to the joint action profiles that arise during play, yet the paper does not report how ties or non-convergence are handled when computing equilibrium attainment or fairness scores; this choice directly affects the ranking of algorithms on those metrics.

minor comments (2)

[Throughout] Notation for the five algorithms and the three criteria should be introduced once in a table and then used consistently; currently the text alternates between full names and ad-hoc abbreviations.
[Figure captions] Figure captions should state the number of independent runs and whether error bars represent standard deviation or standard error.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly where appropriate.

read point-by-point responses

Referee: [§3 (Game Selection) and §4 (Experimental Setup)] The central claim that the observed relative strengths generalize to ad hoc team problems rests on the representativeness of the chosen repeated matrix games. The manuscript should provide an explicit argument (with reference to the literature on ad hoc teams) showing how the selected games cover key dimensions such as partial observability, continuous action spaces, or richer non-stationarity that appear in many ad hoc team formulations outside normal-form repeated games.

Authors: We agree that an explicit justification of the game selection would strengthen the paper. Our diverse set of repeated matrix games is motivated by standard benchmarks in the MAL and ad hoc team literature (e.g., works using normal-form games to study coordination without prior agreements). However, we acknowledge that these do not capture partial observability, continuous actions, or richer non-stationarity. We will add a dedicated paragraph in §3 discussing the scope, citing relevant ad hoc team papers that employ matrix games as foundational settings, and noting limitations for generalization to more complex environments. revision: partial
Referee: [§5 (Results) and §6 (Discussion)] The conclusion of 'no clear winner' and the reported relative strengths are presented without reference to statistical significance tests, confidence intervals, or effect-size measures across the repeated trials. Without these, it is impossible to determine whether observed differences between algorithms on a given criterion are robust or could be explained by sampling variability.

Authors: We thank the referee for this important point. The manuscript reports mean performance over repeated trials but omits formal statistical analysis. In the revision we will incorporate paired statistical tests (e.g., Wilcoxon signed-rank with Bonferroni correction), 95% confidence intervals, and effect-size measures (Cohen's d) for all pairwise algorithm comparisons on each criterion. These will be added to §5 and referenced in the discussion to support claims of relative strengths and the absence of a clear overall winner. revision: yes
Referee: [§4.2 (Performance Criteria)] The performance criteria are defined with respect to the joint action profiles that arise during play, yet the paper does not report how ties or non-convergence are handled when computing equilibrium attainment or fairness scores; this choice directly affects the ranking of algorithms on those metrics.

Authors: We apologize for the lack of detail. In the revised §4.2 we will explicitly state the handling rules: equilibrium attainment counts a trial as successful only if a Nash equilibrium is played in at least the final 100 rounds (with non-convergent trials excluded from this metric); ties among multiple equilibria are resolved by selecting the one with highest social welfare; fairness and welfare scores always use the full trajectory average, including non-convergent runs. This specification will ensure the metrics are fully reproducible. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison against external criteria

full rationale

The paper conducts an empirical evaluation of five MAL algorithms across a set of repeated matrix games, measuring performance via externally defined criteria (equilibrium attainment, social welfare, fairness). No equations, parameters, or predictions are fitted to the target outcomes and then re-reported as results. No self-citation chain supports a uniqueness claim or ansatz that reduces the central conclusion to prior author work. The derivation chain consists of running the algorithms and tabulating observed metrics; conclusions follow directly from those measurements without self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are identifiable. The evaluation implicitly relies on the representativeness of the chosen matrix games and criteria, but these are not formalized as axioms in the provided text.

pith-pipeline@v0.9.0 · 5763 in / 1141 out tokens · 24746 ms · 2026-05-24T17:54:38.107051+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1]

Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems

INTRODUCTION Game theory provides a mathematically well deﬁned frame- work for the analysis of multiagent interactive decision mak- Appears in: Proceedings of the 11th International Con- ference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Conitzer, Winikoff, Padgham, and van der Hoek (eds.), 4-8 June 2012, V alencia, Spain. Copyright c⃝ 2012...

work page internal anchor Pith review Pith/arXiv arXiv 2012
[2]

winning”, and it uses the rate δl if it believes itself to be “losing

EXPERIMENTAL SETUP 2.1 Algorithms Our selection of algorithms is motivated by the range of approaches it covers. We tested two algorithms that model their opponents and three that do not model their opponents: • Joint Action Learning (JAL) [5,32] • Conditional Joint Action Learning (CJAL) [1] • Win or Learn Fast with PHC (WOLF-PHC) [3] • Modiﬁed Regret-Ma...

work page
[3]

Alg1 / Alg2

EXPERIMENTAL RESULTS This section presents and analyses the results of our exper- iments. It is important to note that the performance of an algorithm may depend on the player position it takes on. To account for this, we repeated each play once for every per- mutation of the agent order. We call this process a sweep. In the following, whenever we refer t...

work page 1956
[4]

In his 1967 paper [10], he describes the Bayesian game, a game in which players have beliefs about missing information

RELATED WORK Harsanyi pioneered the study of incomplete information games. In his 1967 paper [10], he describes the Bayesian game, a game in which players have beliefs about missing information. He develops the concept of the Bayesian Nash equilibrium [11] in which each player plays a best response against the other players, based on the personal beliefs ...

work page 1967
[5]

The algorithms were evaluated in a comprehensive range of repeated games, and the teams consisted of agents which were themselves learning

CONCLUSION In this work, we compared the performance of ﬁve multia- gent learning algorithms in a set of ad hoc team problems. The algorithms were evaluated in a comprehensive range of repeated games, and the teams consisted of agents which were themselves learning. Our intention was to characterise the performance of salient types of multiagent learning ...

work page
[6]

Banerjee and S

REFERENCES[1] D. Banerjee and S. Sen. Reaching pareto-optimality in prisoner’s dilemma using conditional joint action learning. Autonomous Agents and Multi-Agent Systems , 15(1):91–108, 2007

work page 2007
[7]

Barrett, P

S. Barrett, P. Stone, and S. Kraus. Empirical evaluation of ad hoc teamwork in the pursuit domain. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems , May 2011

work page 2011
[8]

Bowling and M

M. Bowling and M. Veloso. Multiagent learning using a variable learning rate. Artiﬁcial Intelligence, 136(2):215–250, 2002

work page 2002
[9]

G. Brown. Iterative solution of games by ﬁctitious play. In T. Koopmans, editor, Activity Analysis of Production and Allocation. Wiley, 1951

work page 1951
[10]

Claus and C

C. Claus and C. Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artiﬁcial Intelligence , pages 746–752, 1998

work page 1998
[11]

Conitzer and T

V. Conitzer and T. Sandholm. Awesome: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In Proceedings of the 20th International Conference on Machine Learning, volume 20, pages 83–90, 2003

work page 2003
[12]

Genter, N

K. Genter, N. Agmon, and P. Stone. Role-based ad hoc teamwork. In Proceedings of the Plan, Activity, and Intent Recognition Workshop at the 25th Conference on Artiﬁcial Intelligence, August 2011

work page 2011
[13]

Greenwald and K

A. Greenwald and K. Hall. Correlated q-learning. In Proceedings of the 20th International Conference on Machine Learning, pages 242–249, 2003

work page 2003
[14]

Harsanyi

J. Harsanyi. Bargaining in ignorance of the opponents’ utility function. Journal of Conﬂict Resolution , 6(1):29–38, 1962

work page 1962
[15]

bayesian

J. Harsanyi. Games with incomplete information played by “bayesian” players, i-iii. part i. the basic model. Management Science, 14(3):159–182, 1967

work page 1967
[16]

bayesian

J. Harsanyi. Games with incomplete information played by “bayesian” players, i-iii. part ii. bayesian equilibrium points. Management Science, 14(5):320–334, 1968

work page 1968
[17]

bayesian

J. Harsanyi. Games with incomplete information played by “bayesian” players, i-iii. part iii. the basic probability distribution of the game. Management Science, 14(7):486–502, 1968

work page 1968
[18]

Hart and A

S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000

work page 2000
[19]

Hart and A

S. Hart and A. Mas-Colell. A reinforcement procedure leading to correlated equilibrium. Economic Essays: A Festschrift for Werner Hildenbrand , pages 181–200, 2001

work page 2001
[20]

Hu and M

J. Hu and M. Wellman. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning , volume 242, page 250, 1998

work page 1998
[21]

Hu and M

J. Hu and M. Wellman. Experimental results on q-learning for general-sum stochastic games. In Proceedings of the 17th International Conference on Machine Learning , page 414. Morgan Kaufmann Publishers Inc., 2000

work page 2000
[22]

Hu and M

J. Hu and M. Wellman. Nash q-learning for general-sum stochastic games. The Journal of Machine Learning Research, 4:1039–1069, 2003

work page 2003
[23]

Jaakkola, M

T. Jaakkola, M. Jordan, and S. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185–1201, 1994

work page 1994
[24]

J. Jordan. Bayesian learning in normal form games. Games and Economic Behavior , 3(1):60–81, 1991

work page 1991
[25]

Kimbrough and M

S. Kimbrough and M. Lu. Simple reinforcement learning agents: Pareto beats nash in an algorithmic game theory study. Information Systems and E-Business Management , 3(1):1–19, 2005

work page 2005
[26]

Lemke and J

C. Lemke and J. Howson. Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics, 12(2):413–423, 1964

work page 1964
[27]

M. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th International Conference on Machine Learning , volume 157, page 163, 1994

work page 1994
[28]

M. Littman. Friend-or-foe q-learning in general-sum games. In Proceedings of the 18th International Conference on Machine Learning, ICML ’01, pages 322–328. Morgan Kaufmann Publishers Inc., 2001

work page 2001
[29]

Littman and C

M. Littman and C. Szepesv´ ari. A generalized reinforcement-learning model: Convergence and applications. In In Proceedings of the 13th International Conference on Machine Learning, pages 310–318. Morgan Kaufmann, 1996

work page 1996
[30]

Rapoport and M

A. Rapoport and M. Guyer. A taxonomy of 2 × 2 games. General Systems: Yearbook of the Society for General Systems Research, 11:203–214, 1966

work page 1966
[31]

Schelling

T. Schelling. The Strategy of Conﬂict . Harvard University Press, 1980

work page 1980
[32]

S. Sen, S. Airiau, and R. Mukherjee. Towards a pareto-optimal solution in general-sum games. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems , pages 153–160. ACM, 2003

work page 2003
[33]

L. Shapley. Stochastic games. Proceedings of the National Academy of Sciences of the United States of America , 39(10):1095, 1953

work page 1953
[34]

Stone, G

P. Stone, G. Kaminka, S. Kraus, and J. Rosenschein. Ad hoc autonomous agent teams: Collaboration without pre-coordination. In Proceedings of the 24th Conference on Artiﬁcial Intelligence, July 2010

work page 2010
[35]

Stone, G

P. Stone, G. Kaminka, and J. Rosenschein. Leading a best-response teammate in an ad hoc team. In Agent-Mediated Electronic Commerce: Designing Trading Strategies and Mechanisms for Electronic Markets , pages 132–146, November 2010

work page 2010
[36]

Stone and S

P. Stone and S. Kraus. To teach or not to teach? decision making under uncertainty in ad hoc teams. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems , May 2010

work page 2010
[37]

Uther and M

W. Uther and M. Veloso. Adversarial reinforcement learning. Technical report, Computer Science Department, Carnegie Mellon University, 1997

work page 1997
[38]

Watkins and P

C. Watkins and P. Dayan. Q-learning. Machine learning, 8(3):279–292, 1992

work page 1992
[39]

Wolpert and W

D. Wolpert and W. Macready. No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute, 1995

work page 1995
[40]

Wolpert and W

D. Wolpert and W. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997

work page 1997
[41]

F. Wu, S. Zilberstein, and X. Chen. Online planning for ad hoc autonomous agent teams. In Proceedings of the 22nd International Joint Conference on Artiﬁcial Intelligence , 2011

work page 2011
[42]

H. Young. Strategic learning and its limits . Oxford University Press, 2004

work page 2004

[1] [1]

Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems

INTRODUCTION Game theory provides a mathematically well deﬁned frame- work for the analysis of multiagent interactive decision mak- Appears in: Proceedings of the 11th International Con- ference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Conitzer, Winikoff, Padgham, and van der Hoek (eds.), 4-8 June 2012, V alencia, Spain. Copyright c⃝ 2012...

work page internal anchor Pith review Pith/arXiv arXiv 2012

[2] [2]

winning”, and it uses the rate δl if it believes itself to be “losing

EXPERIMENTAL SETUP 2.1 Algorithms Our selection of algorithms is motivated by the range of approaches it covers. We tested two algorithms that model their opponents and three that do not model their opponents: • Joint Action Learning (JAL) [5,32] • Conditional Joint Action Learning (CJAL) [1] • Win or Learn Fast with PHC (WOLF-PHC) [3] • Modiﬁed Regret-Ma...

work page

[3] [3]

Alg1 / Alg2

EXPERIMENTAL RESULTS This section presents and analyses the results of our exper- iments. It is important to note that the performance of an algorithm may depend on the player position it takes on. To account for this, we repeated each play once for every per- mutation of the agent order. We call this process a sweep. In the following, whenever we refer t...

work page 1956

[4] [4]

In his 1967 paper [10], he describes the Bayesian game, a game in which players have beliefs about missing information

RELATED WORK Harsanyi pioneered the study of incomplete information games. In his 1967 paper [10], he describes the Bayesian game, a game in which players have beliefs about missing information. He develops the concept of the Bayesian Nash equilibrium [11] in which each player plays a best response against the other players, based on the personal beliefs ...

work page 1967

[5] [5]

The algorithms were evaluated in a comprehensive range of repeated games, and the teams consisted of agents which were themselves learning

CONCLUSION In this work, we compared the performance of ﬁve multia- gent learning algorithms in a set of ad hoc team problems. The algorithms were evaluated in a comprehensive range of repeated games, and the teams consisted of agents which were themselves learning. Our intention was to characterise the performance of salient types of multiagent learning ...

work page

[6] [6]

Banerjee and S

REFERENCES[1] D. Banerjee and S. Sen. Reaching pareto-optimality in prisoner’s dilemma using conditional joint action learning. Autonomous Agents and Multi-Agent Systems , 15(1):91–108, 2007

work page 2007

[7] [7]

Barrett, P

S. Barrett, P. Stone, and S. Kraus. Empirical evaluation of ad hoc teamwork in the pursuit domain. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems , May 2011

work page 2011

[8] [8]

Bowling and M

M. Bowling and M. Veloso. Multiagent learning using a variable learning rate. Artiﬁcial Intelligence, 136(2):215–250, 2002

work page 2002

[9] [9]

G. Brown. Iterative solution of games by ﬁctitious play. In T. Koopmans, editor, Activity Analysis of Production and Allocation. Wiley, 1951

work page 1951

[10] [10]

Claus and C

C. Claus and C. Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artiﬁcial Intelligence , pages 746–752, 1998

work page 1998

[11] [11]

Conitzer and T

V. Conitzer and T. Sandholm. Awesome: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In Proceedings of the 20th International Conference on Machine Learning, volume 20, pages 83–90, 2003

work page 2003

[12] [12]

Genter, N

K. Genter, N. Agmon, and P. Stone. Role-based ad hoc teamwork. In Proceedings of the Plan, Activity, and Intent Recognition Workshop at the 25th Conference on Artiﬁcial Intelligence, August 2011

work page 2011

[13] [13]

Greenwald and K

A. Greenwald and K. Hall. Correlated q-learning. In Proceedings of the 20th International Conference on Machine Learning, pages 242–249, 2003

work page 2003

[14] [14]

Harsanyi

J. Harsanyi. Bargaining in ignorance of the opponents’ utility function. Journal of Conﬂict Resolution , 6(1):29–38, 1962

work page 1962

[15] [15]

bayesian

J. Harsanyi. Games with incomplete information played by “bayesian” players, i-iii. part i. the basic model. Management Science, 14(3):159–182, 1967

work page 1967

[16] [16]

bayesian

J. Harsanyi. Games with incomplete information played by “bayesian” players, i-iii. part ii. bayesian equilibrium points. Management Science, 14(5):320–334, 1968

work page 1968

[17] [17]

bayesian

J. Harsanyi. Games with incomplete information played by “bayesian” players, i-iii. part iii. the basic probability distribution of the game. Management Science, 14(7):486–502, 1968

work page 1968

[18] [18]

Hart and A

S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000

work page 2000

[19] [19]

Hart and A

S. Hart and A. Mas-Colell. A reinforcement procedure leading to correlated equilibrium. Economic Essays: A Festschrift for Werner Hildenbrand , pages 181–200, 2001

work page 2001

[20] [20]

Hu and M

J. Hu and M. Wellman. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning , volume 242, page 250, 1998

work page 1998

[21] [21]

Hu and M

J. Hu and M. Wellman. Experimental results on q-learning for general-sum stochastic games. In Proceedings of the 17th International Conference on Machine Learning , page 414. Morgan Kaufmann Publishers Inc., 2000

work page 2000

[22] [22]

Hu and M

J. Hu and M. Wellman. Nash q-learning for general-sum stochastic games. The Journal of Machine Learning Research, 4:1039–1069, 2003

work page 2003

[23] [23]

Jaakkola, M

T. Jaakkola, M. Jordan, and S. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185–1201, 1994

work page 1994

[24] [24]

J. Jordan. Bayesian learning in normal form games. Games and Economic Behavior , 3(1):60–81, 1991

work page 1991

[25] [25]

Kimbrough and M

S. Kimbrough and M. Lu. Simple reinforcement learning agents: Pareto beats nash in an algorithmic game theory study. Information Systems and E-Business Management , 3(1):1–19, 2005

work page 2005

[26] [26]

Lemke and J

C. Lemke and J. Howson. Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics, 12(2):413–423, 1964

work page 1964

[27] [27]

M. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th International Conference on Machine Learning , volume 157, page 163, 1994

work page 1994

[28] [28]

M. Littman. Friend-or-foe q-learning in general-sum games. In Proceedings of the 18th International Conference on Machine Learning, ICML ’01, pages 322–328. Morgan Kaufmann Publishers Inc., 2001

work page 2001

[29] [29]

Littman and C

M. Littman and C. Szepesv´ ari. A generalized reinforcement-learning model: Convergence and applications. In In Proceedings of the 13th International Conference on Machine Learning, pages 310–318. Morgan Kaufmann, 1996

work page 1996

[30] [30]

Rapoport and M

A. Rapoport and M. Guyer. A taxonomy of 2 × 2 games. General Systems: Yearbook of the Society for General Systems Research, 11:203–214, 1966

work page 1966

[31] [31]

Schelling

T. Schelling. The Strategy of Conﬂict . Harvard University Press, 1980

work page 1980

[32] [32]

S. Sen, S. Airiau, and R. Mukherjee. Towards a pareto-optimal solution in general-sum games. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems , pages 153–160. ACM, 2003

work page 2003

[33] [33]

L. Shapley. Stochastic games. Proceedings of the National Academy of Sciences of the United States of America , 39(10):1095, 1953

work page 1953

[34] [34]

Stone, G

P. Stone, G. Kaminka, S. Kraus, and J. Rosenschein. Ad hoc autonomous agent teams: Collaboration without pre-coordination. In Proceedings of the 24th Conference on Artiﬁcial Intelligence, July 2010

work page 2010

[35] [35]

Stone, G

P. Stone, G. Kaminka, and J. Rosenschein. Leading a best-response teammate in an ad hoc team. In Agent-Mediated Electronic Commerce: Designing Trading Strategies and Mechanisms for Electronic Markets , pages 132–146, November 2010

work page 2010

[36] [36]

Stone and S

P. Stone and S. Kraus. To teach or not to teach? decision making under uncertainty in ad hoc teams. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems , May 2010

work page 2010

[37] [37]

Uther and M

W. Uther and M. Veloso. Adversarial reinforcement learning. Technical report, Computer Science Department, Carnegie Mellon University, 1997

work page 1997

[38] [38]

Watkins and P

C. Watkins and P. Dayan. Q-learning. Machine learning, 8(3):279–292, 1992

work page 1992

[39] [39]

Wolpert and W

D. Wolpert and W. Macready. No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute, 1995

work page 1995

[40] [40]

Wolpert and W

D. Wolpert and W. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997

work page 1997

[41] [41]

F. Wu, S. Zilberstein, and X. Chen. Online planning for ad hoc autonomous agent teams. In Proceedings of the 22nd International Joint Conference on Artiﬁcial Intelligence , 2011

work page 2011

[42] [42]

H. Young. Strategic learning and its limits . Oxford University Press, 2004

work page 2004