The Invisible Handshake: Persistent Overpricing by Adaptive Market Agents

Alfio Ferrara; Emanuele Guidotti; Luigi Foscari; Nicolò Cesa-Bianchi; Tatjana Chavdarova

arXiv: 2510.15995 · v3 · submitted 2025-10-14 · q-fin.TR · cs.GT · cs.LG

Pith reviewed 2026-05-18 08:03 UTC · model grok-4.3

keywords: overpricing, market making, decentralized learning, price impact, stochastic gradient ascent, repeated games, financial markets, adaptive agents

The pith

Decentralized learning by adaptive market agents can produce persistent overpricing while respecting cash and inventory limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models a repeated game between a market maker who sets liquidity and a market taker who picks trade sizes, with prices driven by both trade impacts and external shocks. Overpricing is measured against a counterfactual path that keeps the same shocks but removes all price impact from trades. The authors identify strategy profiles that sustain this overpricing indefinitely under realistic constraints and give a sufficient condition under which decentralized learning reaches those profiles in finite time. They decompose the interaction into a competitive part that pushes toward zero impact and a collaborative part that rewards overpricing when total inventory is positive. The same split governs both immediate and forward-looking objectives, so the overpricing outcome emerges from individual adaptation rather than explicit agreement.

Core claim

What carries the argument

The decomposition of the game into a competitive component that favors zero price impact and a collaborative component that makes overpricing jointly profitable when aggregate inventory is positive.
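
As a rough illustration of what such a split can look like, the sketch below applies one standard decomposition of a two-player payoff pair into a zero-sum part and a common-interest part. It is only an analogy under stated assumptions: the 2x2 payoff tables are made up, the function name `decompose` is hypothetical, and reading the zero-sum part as "competitive" and the common-interest part as "collaborative" is an assumption rather than the paper's own construction.

```python
# Hypothetical illustration, not the paper's construction.
# Split two payoff tables so that:
#   u1 = common + zero_sum
#   u2 = common - zero_sum

def decompose(u1, u2):
    """Return (zero_sum, common) for payoff tables given as nested lists."""
    zero_sum = [[(a - b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]
    common = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]
    return zero_sum, common

# Made-up payoffs (rows: player 1's actions, columns: player 2's actions).
u1 = [[3.0, 0.0], [5.0, 1.0]]
u2 = [[3.0, 5.0], [0.0, 1.0]]

zero_sum, common = decompose(u1, u2)
```

In this toy table the common-interest part is largest in the top-left cell, where the zero-sum part is exactly zero, so the two parts pull the players toward different cells.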

If this is right

Projected stochastic gradient ascent satisfies the sufficient condition and therefore reaches the overpricing region in finite time.
The competitive-collaborative decomposition applies equally to myopic and farsighted payoff objectives.
Persistent overpricing can be maintained while strictly obeying cash and inventory constraints.
The same structural incentives arise whether agents optimize one-period or multi-period returns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Regulators monitoring trading algorithms could target gradient-based learning rules rather than final positions alone.
The mechanism may help explain mispricing in markets where most volume comes from adaptive algorithmic traders.
Extending the model to three or more agents would test whether the collaborative component scales or fragments.
Laboratory experiments with human subjects or AI agents using gradient learning could directly observe whether overpricing emerges.

Load-bearing premise

That the competitive-collaborative decomposition governs incentives for both myopic and farsighted objectives and that projected stochastic gradient ascent satisfies the condition needed to reach the overpricing region.

What would settle it

Simulate the two-agent market under projected stochastic gradient ascent for many periods and measure whether average prices remain above the counterfactual shock-only path whenever aggregate inventory is positive; absence of sustained overpricing would refute the claim.
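
A minimal harness for the measurement step of such a test might look like the sketch below. It covers only the comparison against the counterfactual: a fixed strategy profile stands in for the learners, and the projected stochastic gradient ascent loop is deliberately left out. The linear price recursion, the parameter names `impact` and `trade_size`, and the Gaussian shocks are all assumptions made for illustration, not the paper's model.

```python
import random

def simulate(impact, trade_size, shocks, p0=100.0):
    """Toy price recursion (an assumption, not the paper's model):
        p_{t+1} = p_t + impact * trade_size + shock_t
    The counterfactual path reuses the same shocks with the impact term removed.
    Returns the per-period gap between the price and the counterfactual."""
    price, counterfactual = p0, p0
    gaps = []
    for eps in shocks:
        price = price + impact * trade_size + eps
        counterfactual = counterfactual + eps
        gaps.append(price - counterfactual)
    return gaps

rng = random.Random(0)  # fixed seed so both runs see the same shock sequence
shocks = [rng.gauss(0.0, 1.0) for _ in range(1000)]

gaps_no_impact = simulate(impact=0.0, trade_size=1.0, shocks=shocks)
gaps_with_impact = simulate(impact=0.05, trade_size=1.0, shocks=shocks)

# Fraction of periods in which the price sits above the shock-only path.
share_overpriced = sum(g > 0 for g in gaps_with_impact) / len(gaps_with_impact)
```

To turn this into the test described above, the fixed `impact` and `trade_size` would be replaced by values that each agent updates every period with its own learning rule, and the gap would be tracked only over periods in which aggregate inventory is positive.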

Figures

Figures reproduced from arXiv: 2510.15995 by Alfio Ferrara, Emanuele Guidotti, Luigi Foscari, Nicolò Cesa-Bianchi, Tatjana Chavdarova.

**Figure 2.** Market impact of a fixed collusive strategy profile.

Original abstract

We study overpricing in a repeated game between two representative agents: a market maker, who controls market liquidity, and a market taker, who chooses trade quantities. Market prices evolve through the endogenous price impact of trades and exogenous shocks. We define overpricing relative to a counterfactual price path that holds fixed the same sequence of shocks while shutting down price impact, and characterize the set of feasible strategy profiles that generate persistent overpricing while respecting cash and inventory constraints. We provide a sufficient condition for decentralized learning to reach the overpricing region in finite time, and we show that this condition is satisfied, in particular, by projected stochastic gradient ascent. A key step in the analysis is a decomposition of the game into a competitive component, which favors zero price impact, and a collaborative component, which makes overpricing jointly profitable when aggregate inventory is positive. We further show that the same structural incentives govern both myopic and farsighted objectives. Together, these results show how decentralized learning by adaptive market agents can lead to persistent overpricing in financial markets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a clean decomposition showing how projected stochastic gradient ascent can push a market-maker/taker game into persistent overpricing, but the claim that the projection preserves the collaborative term rests on an unshown step.

The main takeaway is that decentralized learning by a market maker and taker can reach a region of persistent overpricing, defined against a no-impact counterfactual, when a competitive-collaborative decomposition favors the collaborative side once aggregate inventory turns positive. The same structure is said to hold for both myopic and farsighted payoffs, and projected stochastic gradient ascent is claimed to satisfy the sufficient condition for finite-time entry into that region while respecting cash and inventory bounds.

Referee Report

2 major / 2 minor

Summary. The paper analyzes a repeated game between a market maker controlling liquidity and a market taker choosing quantities, with prices driven by endogenous impact and exogenous shocks. Overpricing is defined relative to a counterfactual path that fixes shocks but eliminates price impact. The authors characterize feasible strategy profiles that sustain overpricing under cash and inventory constraints, derive a sufficient condition for decentralized learning to enter the overpricing region in finite time, and prove that projected stochastic gradient ascent satisfies this condition. A competitive-collaborative decomposition of incentives is shown to govern both myopic and farsighted objectives, implying that adaptive agents can generate persistent overpricing.

Significance. If the sufficient-condition claim holds after projection, the work supplies a precise mechanism by which decentralized adaptive learning produces persistent overpricing without explicit collusion, via inventory-driven collaborative incentives. The competitive-collaborative split offers a reusable structural tool for analyzing repeated market games and unifies myopic and forward-looking behavior. These results bear on market-efficiency debates and microstructure policy, particularly for settings with binding position limits.

major comments (2)

[Abstract / sufficient-condition statement] Abstract and the section stating the sufficient condition: the claim that projected stochastic gradient ascent satisfies the condition for entering the overpricing region in finite time is asserted without exhibiting the post-projection inner-product verification or boundary-case analysis. When cash or inventory constraints bind, the projection operator can clip the collaborative component, leaving the update inside the competitive region; the manuscript provides no explicit check that the collaborative term still dominates after projection.
[Definition of overpricing region] The definition of the overpricing region (relative to the no-price-impact counterfactual) makes membership depend on the realized shock sequence. The convergence result therefore requires that the sufficient condition remain satisfied uniformly across shock paths; the paper does not supply a uniform bound or robustness argument that would prevent the dynamics from exiting the region under alternative shock realizations.

minor comments (2)

[Decomposition section] The notation for the competitive and collaborative components should be introduced with explicit equations rather than descriptive text alone, to facilitate checking the inner-product condition after projection.
[Figures] Figure captions and axis labels for any simulation or phase-plane diagrams should explicitly indicate whether trajectories respect the cash/inventory projection or are shown in the unconstrained plane.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. The comments identify opportunities to strengthen the exposition of the sufficient-condition claim and the pathwise nature of the overpricing result. We respond to each major comment below and indicate the revisions we will make in the next version.

Referee: Abstract / sufficient-condition statement: the claim that projected stochastic gradient ascent satisfies the condition for entering the overpricing region in finite time is asserted without exhibiting the post-projection inner-product verification or boundary-case analysis. When cash or inventory constraints bind, the projection operator can clip the collaborative component, leaving the update inside the competitive region; the manuscript provides no explicit check that the collaborative term still dominates after projection.

Authors: We agree that the current presentation would benefit from an explicit post-projection verification. The underlying proof already establishes that the inner product of the projected update with the collaborative direction remains positive when the step size is sufficiently small, but this step is only sketched. In the revised manuscript we will add a short lemma (or appendix paragraph) that directly computes the inner product after projection onto the cash- and inventory-feasible set and verifies the boundary cases. This addition will make the argument self-contained without altering the result.
Revision: yes

Referee: Definition of overpricing region: The definition of the overpricing region (relative to the no-price-impact counterfactual) makes membership depend on the realized shock sequence. The convergence result therefore requires that the sufficient condition remain satisfied uniformly across shock paths; the paper does not supply a uniform bound or robustness argument that would prevent the dynamics from exiting the region under alternative shock realizations.

Authors: The overpricing region and the sufficient condition are formulated pathwise for any fixed exogenous shock sequence; the finite-time entry result therefore holds conditionally on the realized path that the agents actually observe. Uniformity over all possible shock realizations is not required for the stated theorem. We will revise the text to state this pathwise character explicitly and add a brief remark noting that, under standard bounded-shock assumptions, a uniform bound follows immediately from the same inner-product argument. If the referee prefers a fully uniform statement without additional assumptions, we can discuss the necessary restrictions on the shock process.
Revision: partial
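
The post-projection check at issue in the first exchange can be pictured with a small sketch. It assumes the feasible set is a simple box, so that Euclidean projection reduces to coordinate-wise clipping; the paper's cash- and inventory-feasible set may have a different shape. The gradient, the bounds, and the vector `direction` standing in for the collaborative direction are made-up values, and the helper names are hypothetical.

```python
def project_box(x, lower, upper):
    """Euclidean projection onto a box: clip each coordinate to [lower, upper]."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def projected_step(x, grad, step_size, lower, upper):
    """One projected gradient ascent step; returns the realized update new_x - x."""
    raw = [xi + step_size * gi for xi, gi in zip(x, grad)]
    new_x = project_box(raw, lower, upper)
    return [ni - xi for ni, xi in zip(new_x, x)]

def inner(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

lower, upper = [0.0, 0.0], [1.0, 1.0]
grad = [1.0, -0.2]       # made-up gradient
direction = [1.0, 0.0]   # made-up stand-in for the collaborative direction

# Same gradient, two starting points: one interior, one on the upper bound.
interior = inner(projected_step([0.5, 0.5], grad, 0.1, lower, upper), direction)
boundary = inner(projected_step([1.0, 0.5], grad, 0.1, lower, upper), direction)
```

In this toy case the inner product is positive at the interior point and exactly zero at the boundary point, where clipping removes the component along `direction`. That is the kind of boundary case the referee asks to see verified; the sketch says nothing about whether the paper's proof already covers it.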

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper defines overpricing relative to an explicit counterfactual price path that shuts down price impact while holding shocks fixed; this is a modeling choice, not a self-referential construction. It then states a sufficient condition for decentralized learning to reach the overpricing region in finite time and asserts that projected stochastic gradient ascent satisfies the condition via the competitive/collaborative decomposition. These steps rest on independent game-theoretic analysis of the repeated interaction, cash/inventory constraints, and the decomposition, without reducing the central claim to a fit, a tautology, or a self-citation chain. The derivation is self-contained against the stated mathematical structure.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract relies on an unstated decomposition of the game into competitive and collaborative components and on the existence of a sufficient condition for finite-time convergence that is asserted to hold for projected stochastic gradient ascent.

axioms (2)

domain assumption: The game admits a decomposition into a competitive component favoring zero price impact and a collaborative component that makes overpricing jointly profitable when aggregate inventory is positive.
This decomposition is invoked to characterize feasible strategy profiles that generate persistent overpricing.
ad hoc to paper: Projected stochastic gradient ascent satisfies the sufficient condition for reaching the overpricing region in finite time.
The abstract states that the condition is satisfied, in particular, by this algorithm.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear

We decompose the per-round game into a competitive component (Game 2) and collaborative component (Game 3) ... R^p_t = 1/2 Z + 1/2 U (eq. 18). Projected gradient ascent on κ̃ satisfies the assumptions of theorem 6.2.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Ibrahim Abada, Xavier Lambin, and Nikolay Tchakarov. 2024. Collusion by mis- take: Does algorithmic sophistication drive supra-competitive profits? European Journal of Operational Research 318, 3 (2024), 927–953. Publisher: Elsevier

work page 2024

[2] [2]

Jacob Abernethy, Yiling Chen, and Jennifer Wortman Vaughan. 2013. Efficient market making via convex optimization, and a connection to online learning. ACM Transactions on Economics and Computation 1, 2 (May 2013). doi:10.1145/ 2465769.2465777 Number of pages: 39 Place: New York, NY, USA Publisher: Association for Computing Machinery tex.articleno: 12 tex...

work page arXiv 2013

[3] [3]

Jacob Abernethy and Satyen Kale. 2013. Adaptive market making via online learning. Advances in Neural Information Processing Systems 26 (2013)

work page 2013

[4] [4]

Robert J Aumann. 1964. Markets with a continuum of traders. Econometrica (1964), 39–50

work page 1964

[5] [5]

Martino Banchio and Giacomo Mantegazza. 2023. Adaptive Algorithms and Collusion via Coupling. In Proceedings of the 24th ACM Conference on Economics and Computation. ACM, London United Kingdom, 208–208. doi:10.1145/3580507. 3597726

work page doi:10.1145/3580507 2023

[6] [6]

Martino Banchio and Andrzej Skrzypacz. 2022. Artificial Intelligence and Auction Design. doi:10.48550/arXiv.2202.05947 arXiv:2202.05947 [econ]

work page doi:10.48550/arxiv.2202.05947 2022

[7] [7]

Yogev Bar-On and Yishay Mansour. 2023. Uniswap Liquidity Provision: An Online Learning Approach. http://arxiv.org/abs/2302.00610 arXiv:2302.00610 [cs]

work page arXiv 2023

[8] [8]

Fischer Black. 1971. Toward a fully automated stock exchange, part I. Financial Analysts Journal 27, 4 (1971), 28–35

work page 1971

[9] [9]

Jean-Philippe Bouchaud, Julius Bonart, Jonathan Donier, and Martin Gould. 2018. Trades, quotes and prices: financial markets under the microscope . Cambridge University Press

work page 2018

[10] [12]

Emilio Calvano, Giacomo Calzolari, Vincenzo Denicolò, and Sergio Pastorello

work page

[11] [13]

American Economic Review 110, 10 (Oct

Artificial Intelligence, Algorithmic Pricing, and Collusion. American Economic Review 110, 10 (Oct. 2020), 3267–3297. doi:10.1257/aer.20190623

work page doi:10.1257/aer.20190623 2020

[12] [14]

Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Luigi Foscari, and Vinayak Pathak. 2025. Market Making without Regret. arXiv:2411.13993 [cs.GT] https://arxiv.org/abs/2411.13993

work page arXiv 2025

[13] [15]

Andrea Coletta, Aymeric Moulin, Svitlana Vyetrenko, and Tucker Balch. 2022. Learning to simulate realistic limit order book markets from data as a World Agent. In Proceedings of the Third ACM International Conference on AI in Finance . 428–436. doi:10.1145/3533271.3561753 arXiv:2210.09897 [cs, q-fin]

work page doi:10.1145/3533271.3561753 2022

[14] [16]

Jean-Edouard Colliard, Thierry Foucault, and Stefano Lovo. 2022. Algorithmic Pricing and Liquidity in Securities Markets. SSRN Electronic Journal (2022). doi:10.2139/ssrn.4252858 Publisher: Elsevier BV

work page doi:10.2139/ssrn.4252858 2022

[15] [17]

Cover and E

T.M. Cover and E. Ordentlich. 1996. On-line portfolio selection. In Proceedings of the ninth annual conference on Computational learning theory - COLT ’96 . ACM Press, Desenzano del Garda, Italy, 310–313. doi:10.1145/238061.238161

work page doi:10.1145/238061.238161 1996

[16] [18]

Cover and E

T.M. Cover and E. Ordentlich. 1996. Universal portfolios with side information. IEEE Transactions on Information Theory 42, 2 (March 1996), 348–363. doi:10. 1109/18.485708

work page 1996

[17] [19]

Sanmay Das and Malik Magdon-Ismail. 2008. Adapting to a market shock: opti- mal sequential market-making. In Proceedings of the 22nd international conference on neural information processing systems (NIPS’08). Curran Associates Inc., Van- couver, British Columbia, Canada and Red Hook, NY, USA, 361–368. Number of pages: 8

work page 2008

[18] [20]

Goldberg, and Christos H

Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. 2009. The complexity of computing a Nash equilibrium. Commun. ACM 52, 2 (2009), 89–97. doi:10.1145/1461928.1461951

work page doi:10.1145/1461928.1461951 2009

[19] [21]

Winston Wei Dou, Itay Goldstein, and Yan Ji. 2025. Ai-powered trading, algorith- mic collusion, and price efficiency. Jacobs Levy Equity Management Center for Quantitative Financial Research Paper, The Wharton School Research Paper (2025)


[20] [22]

Eugene F Fama. 1970. Efficient capital markets: A review of theory and empirical work. The journal of Finance 25, 2 (1970), 383–417


[21] [23]

J Doyne Farmer, Austin Gerig, Fabrizio Lillo, and Henri Waelbroeck. 2013. How efficiency shapes market impact. Quantitative Finance 13, 11 (2013), 1743–1758


[22] [24]

Sumitra Ganesh, Nelson Vadori, Mengda Xu, Hua Zheng, Prashant Reddy, and Manuela Veloso. 2019. Reinforcement Learning for Market Making in a Multi-agent Dealer Market. doi:10.48550/ARXIV.1911.05892 Version Number: 1


[23] [25]

Sanford J Grossman and Joseph E Stiglitz. 1980. On the impossibility of informationally efficient markets. The American economic review 70, 3 (1980), 393–408


[24] [26]

Chengyan Gu. 2023. Deep Q-Learning in Airline Dynamic Pricing and Tacit Collusion: Experimental deep RL in airline revenue management in a duopoly setting. (2023). Paper presented at the American Economic Association (AEA) conference


[25] [27]


Joseph E. Harrington. 2018. Developing Competition Law for Collusion by Autonomous Artificial Agents. Journal of Competition Law & Economics 14, 3 (2018), 331–363. doi:10.1093/joclec/nhy016


[26] [28]

Elad Hazan and Satyen Kale. 2009. On stochastic and worst-case models for investing. In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS’09). Curran Associates Inc., Red Hook, NY, USA, 709–717. event-place: Vancouver, British Columbia, Canada


[27] [29]

Junling Hu and Michael P. Wellman. 1998. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML ’98). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 242–250


[28] [30]

Sung-Ha Hwang and Luc Rey-Bellet. 2020. Strategic decompositions of normal form games: Zero-sum games and potential games. Games and Economic Behavior 122 (July 2020), 370–390. doi:10.1016/j.geb.2020.05.003


[29] [31]


A. Kalai and S. Vempala. 2000. Efficient algorithms for universal portfolios. In Proceedings 41st annual symposium on foundations of computer science. 486–491. doi:10.1109/SFCS.2000.892136


[30] [32]

Pankaj Kumar. 2023. Deep reinforcement learning for high-frequency market making. In Proceedings of the 14th asian conference on machine learning (Proceedings of machine learning research, Vol. 189), Emtiyaz Khan and Mehmet Gonen (Eds.). PMLR, 531–546. https://proceedings.mlr.press/v189/kumar23a.html


[31] [33]

Albert S. Kyle. 1985. Continuous Auctions and Insider Trading. Econometrica 53, 6 (Nov. 1985), 1315. doi:10.2307/1913210


[32] [34]

Fabrizio Lillo, J Doyne Farmer, and Rosario N Mantegna. 2003. Master curve for price-impact function. Nature 421, 6919 (2003), 129–130


[33] [35]

Michael L. Littman. 1994. Markov Games as a Framework for Multi-Agent Reinforcement Learning. In ICML. Morgan Kaufmann, 157–163


[34] [36]

Michael L. Littman. 1994. Markov games as a framework for multi-agent reinforcement learning. In ICML (ICML’94). Morgan Kaufmann Publishers Inc., 157–163


[35] [37]

Michael Maschler, Eilon Solan, and Shmuel Zamir. 2013. Game Theory (1 ed.). Cambridge University Press. doi:10.1017/CBO9780511794216


[36] [38]

Iacopo Mastromatteo, Bence Toth, and Jean-Philippe Bouchaud. 2014. Agent-based models for latent liquidity and concave price impact. Physical Review E 89, 4 (2014), 042805


[37] [39]


Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. 2015. Human-level control through deep reinforce...


[38] [40]

Dov Monderer and Lloyd S Shapley. 1996. Potential games. Games and economic behavior (1996)


[39] [41]

Stephen Morris and Takashi Ui. 2004. Best response equivalence. Games and Economic Behavior 49, 2 (2004), 260–287. Publisher: Elsevier


[40] [42]

John W. Pratt. 1978. Risk aversion in the small and in the large. In Uncertainty in Economics. Elsevier, 59–79. doi:10.1016/B978-0-12-214850-7.50010-3


[41] [43]

Jakka Sairamesh and Jeffrey O Kephart. 2000. Price dynamics and quality in information markets. Decision Support Systems 28, 1 (2000), 35–47. doi:10.1016/S0167-9236(99)00073-1


[42] [44]


Paul A. Samuelson. 1965. Proof That Properly Anticipated Prices Fluctuate Randomly. Industrial Management Review 6, 2 (1965), 41


[43] [45]

Tuomas Sandholm and Robert H. Crites. 1995. On Multiagent Q-Learning in a Semi-Competitive Domain. In Adaption and Learning in Multi-Agent Systems. Springer Berlin Heidelberg, 191–205


[44] [46]

Lloyd S. Shapley. 1953. Stochastic Games. Proceedings of the National Academy of Sciences 39, 10 (1953), 1095–1100. doi:10.1073/pnas.39.10.1095


[45] [47]

Thomas Spooner, John Fearnley, Rahul Savani, and Andreas Koukorinis. 2018. Market making via reinforcement learning. In Proceedings of the 17th international conference on autonomous agents and MultiAgent systems (Aamas ’18). International Foundation for Autonomous Agents and Multiagent Systems, Stockholm, Sweden and Richland, SC, 434–442. Number of pages: 9


[46] [48]

Thomas Spooner and Rahul Savani. 2020. Robust market making via adversarial reinforcement learning. In Proceedings of the 19th international conference on autonomous agents and MultiAgent systems (Aamas ’20). International Foundation for Autonomous Agents and Multiagent Systems, Auckland, New Zealand and Richland, SC, 2014–2016. Number of pages: 3


[47] [49]

Gerald Tesauro and Jeffrey O. Kephart. 2002. Pricing in agent economies using multi-agent Q-learning. Autonomous Agents and Multi-Agent Systems 5, 3 (2002), 289–304. doi:10.1023/A:1015504423309


[48] [50]


Gerald J. Tesauro and Jeffrey O. Kephart. 1998. Foresight-based pricing algorithms in an economy of software agents. In Proceedings of the First International Conference on Information and Computation Economies. Association for Computing Machinery, New York, NY, USA, 37–44. doi:10.1145/288994.289002


[49] [51]


Gerald J. Tesauro and Jeffrey O. Kephart. 2000. Foresight-based pricing algorithms in agent economies. Decision Support Systems 28, 1 (2000), 49–60. doi:10.1016/S0167-9236(99)00074-3


[50] [52]

Bence Tóth, Zoltán Eisler, and J-P Bouchaud. 2016. The Square-Root Impact Law Also Holds for Option Markets. Wilmott 2016, 85 (2016), 70–73


[51] [53]

Bence Tóth, Yves Lemperiere, Cyril Deremble, Joachim De Lataillade, Julien Kockelkoren, and J-P Bouchaud. 2011. Anomalous price impact and the critical nature of liquidity in financial markets. Physical Review X 1, 2 (2011), 021006


[52] [54]

Ludo Waltman and Uzay Kaymak. 2008. Q-learning agents in a Cournot oligopoly model. Journal of Economic Dynamics and Control 32, 10 (2008), 3275–3293


[53] [55]

Christopher J. C. H. Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8, 3 (May 1992), 279–292. doi:10.1007/BF00992698


[54] [56]

Haoran Wei, Yuanbo Wang, Lidia Mangu, and Keith Decker. 2019. Model-based Reinforcement Learning for Predictions and Control for Limit Order Books. http://arxiv.org/abs/1910.03743 arXiv:1910.03743 [cs].

A TECHNICAL APPENDIX

In this section, we present the remaining proofs of the results presented in the paper.

A.1 Proof of Theorem 5.2

Theorem 5.2. For any...
