The Invisible Handshake: Persistent Overpricing by Adaptive Market Agents
Pith reviewed 2026-05-18 08:03 UTC · model grok-4.3
The pith
Decentralized learning by adaptive market agents can produce persistent overpricing while respecting cash and inventory limits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study overpricing in a repeated game between two representative agents: a market maker, who controls market liquidity, and a market taker, who chooses trade quantities. Market prices evolve through the endogenous price impact of trades and exogenous shocks. We define overpricing relative to a counterfactual price path that holds fixed the same sequence of shocks while shutting down price impact, and characterize the set of feasible strategy profiles that generate persistent overpricing while respecting cash and inventory constraints. We provide a sufficient condition for decentralized learning to reach the overpricing region in finite time, and we show that this condition is satisfied, in particular, by projected stochastic gradient ascent.
What carries the argument
The decomposition of the game into a competitive component that favors zero price impact and a collaborative component that makes overpricing jointly profitable when aggregate inventory is positive.
If this is right
- Projected stochastic gradient ascent satisfies the sufficient condition and therefore reaches the overpricing region in finite time.
- The competitive-collaborative decomposition applies equally to myopic and farsighted payoff objectives.
- Persistent overpricing can be maintained while strictly obeying cash and inventory constraints.
- The same structural incentives arise whether agents optimize one-period or multi-period returns.
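The first bullet refers to projected stochastic gradient ascent. As a rough illustration of that update rule only, here is a minimal Python sketch on a toy concave objective, with a box constraint standing in for the cash and inventory limits. The objective, the box, the step size, the noise level, and the names `project_box`, `projected_sga`, and `toy_grad` are all assumptions made for illustration; none of them come from the paper.

```python
import random

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi], i.e. coordinate-wise clipping."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]

def projected_sga(grad, x0, lo, hi, step=0.05, noise=0.1, n_steps=2000, seed=0):
    """Iterate x <- P(x + step * (grad(x) + Gaussian noise)) and return the last iterate."""
    rng = random.Random(seed)
    x = project_box(x0, lo, hi)
    for _ in range(n_steps):
        g = [gi + rng.gauss(0.0, noise) for gi in grad(x)]
        x = project_box([xi + step * gi for xi, gi in zip(x, g)], lo, hi)
    return x

# Toy objective (an assumption): f(x) = -(x0 - 2)^2 - (x1 + 1)^2, maximized at (2, -1).
# The feasible box [0, 1] x [0, 1] excludes that point, so the constraint binds.
def toy_grad(x):
    return [-2.0 * (x[0] - 2.0), -2.0 * (x[1] + 1.0)]

x_final = projected_sga(toy_grad, [0.5, 0.5], lo=[0.0, 0.0], hi=[1.0, 1.0])
print(x_final)
```

In this toy case the unconstrained maximizer lies outside the box, so the iterate should end up pinned at the corner (1, 0), where both constraints bind. Whether the paper's game behaves similarly at its boundaries is exactly what the analysis has to establish; the sketch says nothing about that.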
Where Pith is reading between the lines
- Regulators monitoring trading algorithms could target gradient-based learning rules rather than final positions alone.
- The mechanism may help explain mispricing in markets where most volume comes from adaptive algorithmic traders.
- Extending the model to three or more agents would test whether the collaborative component scales or fragments.
- Laboratory experiments with human subjects or AI agents using gradient learning could directly observe whether overpricing emerges.
Load-bearing premise
That the competitive-collaborative decomposition governs incentives for both myopic and farsighted objectives and that projected stochastic gradient ascent satisfies the condition needed to reach the overpricing region.
What would settle it
Simulate the two-agent market under projected stochastic gradient ascent for many periods and measure whether average prices remain above the counterfactual shock-only path whenever aggregate inventory is positive; absence of sustained overpricing would refute the claim.
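The measurement described above could be sketched as follows. This is a minimal Python sketch that assumes an additive price model (each period's price change is the shock plus an impact term), which is a simplification chosen for illustration and not necessarily the paper's specification. The function names and the sample numbers are hypothetical.

```python
def counterfactual_path(p0, shocks):
    """Price path driven by the shocks alone, with price impact shut down."""
    path = [p0]
    for eps in shocks:
        path.append(path[-1] + eps)
    return path

def realized_path(p0, shocks, impacts):
    """Price path with the same shocks plus a per-period impact term (assumed additive)."""
    path = [p0]
    for eps, imp in zip(shocks, impacts):
        path.append(path[-1] + eps + imp)
    return path

def overpricing_share(prices, counterfactual, inventory):
    """Share of positive-inventory periods in which the price exceeds the counterfactual."""
    flags = [p > c for p, c, inv in zip(prices, counterfactual, inventory) if inv > 0]
    return sum(flags) / len(flags) if flags else float("nan")

# Hypothetical inputs: in a real test, impacts and inventory would come from the simulated agents.
shocks = [0.5, -1.0, 0.25]
impacts = [0.125, 0.125, -0.5]
inventory = [1, 1, 0, 2]  # aggregate inventory at each point of the path

cf = counterfactual_path(10.0, shocks)
px = realized_path(10.0, shocks, impacts)
share = overpricing_share(px, cf, inventory)
print(share)
```

A share that stays near zero over long horizons, across many seeds, would count against the claim; a share near one would be consistent with it. The statistic only compares paths that use the same shock sequence, which mirrors the counterfactual definition.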
Original abstract
We study overpricing in a repeated game between two representative agents: a market maker, who controls market liquidity, and a market taker, who chooses trade quantities. Market prices evolve through the endogenous price impact of trades and exogenous shocks. We define overpricing relative to a counterfactual price path that holds fixed the same sequence of shocks while shutting down price impact, and characterize the set of feasible strategy profiles that generate persistent overpricing while respecting cash and inventory constraints. We provide a sufficient condition for decentralized learning to reach the overpricing region in finite time, and we show that this condition is satisfied, in particular, by projected stochastic gradient ascent. A key step in the analysis is a decomposition of the game into a competitive component, which favors zero price impact, and a collaborative component, which makes overpricing jointly profitable when aggregate inventory is positive. We further show that the same structural incentives govern both myopic and farsighted objectives. Together, these results show how decentralized learning by adaptive market agents can lead to persistent overpricing in financial markets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes a repeated game between a market maker controlling liquidity and a market taker choosing quantities, with prices driven by endogenous impact and exogenous shocks. Overpricing is defined relative to a counterfactual path that fixes shocks but eliminates price impact. The authors characterize feasible strategy profiles that sustain overpricing under cash and inventory constraints, derive a sufficient condition for decentralized learning to enter the overpricing region in finite time, and prove that projected stochastic gradient ascent satisfies this condition. A competitive-collaborative decomposition of incentives is shown to govern both myopic and farsighted objectives, implying that adaptive agents can generate persistent overpricing.
Significance. If the sufficient-condition claim holds after projection, the work supplies a precise mechanism by which decentralized adaptive learning produces persistent overpricing without explicit collusion, via inventory-driven collaborative incentives. The competitive-collaborative split offers a reusable structural tool for analyzing repeated market games and unifies myopic and forward-looking behavior. These results bear on market-efficiency debates and microstructure policy, particularly for settings with binding position limits.
major comments (2)
- [Abstract / sufficient-condition statement] Abstract and the section stating the sufficient condition: the claim that projected stochastic gradient ascent satisfies the condition for entering the overpricing region in finite time is asserted without exhibiting the post-projection inner-product verification or boundary-case analysis. When cash or inventory constraints bind, the projection operator can clip the collaborative component, leaving the update inside the competitive region; the manuscript provides no explicit check that the collaborative term still dominates after projection.
- [Definition of overpricing region] The definition of the overpricing region (relative to the no-price-impact counterfactual) makes membership depend on the realized shock sequence. The convergence result therefore requires that the sufficient condition remain satisfied uniformly across shock paths; the paper does not supply a uniform bound or robustness argument that would prevent the dynamics from exiting the region under alternative shock realizations.
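The concern in the first major comment can be made concrete with a small numerical check. The following Python sketch compares the move actually taken after projection with a fixed direction, once in the interior of a box and once on its boundary. The box, the direction `d` (standing in for the collaborative direction), and the raw update `g` are hypothetical values chosen for illustration; they are not taken from the manuscript.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi], i.e. coordinate-wise clipping."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]

def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def projected_displacement(x, g, step, lo, hi):
    """Return P(x + step * g) - x, the move actually taken after projection."""
    y = project_box([xi + step * gi for xi, gi in zip(x, g)], lo, hi)
    return [yi - xi for yi, xi in zip(y, x)]

lo, hi = [0.0, 0.0], [1.0, 1.0]
d = [1.0, 0.0]    # hypothetical collaborative direction
g = [1.0, -0.5]   # hypothetical raw update with a component along d

interior = projected_displacement([0.5, 0.5], g, 0.25, lo, hi)
boundary = projected_displacement([1.0, 0.5], g, 0.25, lo, hi)

print(dot(interior, d))  # inner product when no constraint binds
print(dot(boundary, d))  # inner product when the first coordinate is at its upper bound
```

In the interior the inner product with `d` is strictly positive, while on the boundary the projection removes the component along `d` entirely and the inner product drops to zero. An explicit post-projection lemma would need to rule out, or otherwise handle, cases of this kind.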
minor comments (2)
- [Decomposition section] The notation for the competitive and collaborative components should be introduced with explicit equations rather than descriptive text alone, to facilitate checking the inner-product condition after projection.
- [Figures] Figure captions and axis labels for any simulation or phase-plane diagrams should explicitly indicate whether trajectories respect the cash/inventory projection or are shown in the unconstrained plane.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive report. The comments identify opportunities to strengthen the exposition of the sufficient-condition claim and the pathwise nature of the overpricing result. We respond to each major comment below and indicate the revisions we will make in the next version.
Point-by-point responses
- Referee: Abstract / sufficient-condition statement: the claim that projected stochastic gradient ascent satisfies the condition for entering the overpricing region in finite time is asserted without exhibiting the post-projection inner-product verification or boundary-case analysis. When cash or inventory constraints bind, the projection operator can clip the collaborative component, leaving the update inside the competitive region; the manuscript provides no explicit check that the collaborative term still dominates after projection.
  Authors: We agree that the current presentation would benefit from an explicit post-projection verification. The underlying proof already establishes that the inner product of the projected update with the collaborative direction remains positive when the step size is sufficiently small, but this step is only sketched. In the revised manuscript we will add a short lemma (or appendix paragraph) that directly computes the inner product after projection onto the cash- and inventory-feasible set and verifies the boundary cases. This addition will make the argument self-contained without altering the result.
  Revision: yes
- Referee: Definition of overpricing region: The definition of the overpricing region (relative to the no-price-impact counterfactual) makes membership depend on the realized shock sequence. The convergence result therefore requires that the sufficient condition remain satisfied uniformly across shock paths; the paper does not supply a uniform bound or robustness argument that would prevent the dynamics from exiting the region under alternative shock realizations.
  Authors: The overpricing region and the sufficient condition are formulated pathwise for any fixed exogenous shock sequence; the finite-time entry result therefore holds conditionally on the realized path that the agents actually observe. Uniformity over all possible shock realizations is not required for the stated theorem. We will revise the text to state this pathwise character explicitly and add a brief remark noting that, under standard bounded-shock assumptions, a uniform bound follows immediately from the same inner-product argument. If the referee prefers a fully uniform statement without additional assumptions, we can discuss the necessary restrictions on the shock process.
  Revision: partial
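The distinction between a pathwise and a uniform statement in the second exchange can be written down schematically. The notation below is assumed for illustration and is not taken from the paper: $\varepsilon$ is the exogenous shock sequence, $\mathcal{O}(\varepsilon)$ the overpricing region for that sequence, $\theta_t$ the strategy profile produced by the learning rule at round $t$, and $\mathcal{E}$ a class of admissible (for example, bounded) shock sequences.

```latex
% Hypothetical notation, introduced only to contrast the two readings.
\[
\tau(\varepsilon) \;=\; \inf\{\, t \ge 0 \;:\; \theta_t \in \mathcal{O}(\varepsilon) \,\}
\]
\[
\text{pathwise:}\quad \tau(\varepsilon) < \infty \ \text{ for each fixed } \varepsilon \in \mathcal{E},
\qquad
\text{uniform:}\quad \sup_{\varepsilon \in \mathcal{E}} \tau(\varepsilon) < \infty .
\]
```

The uniform statement implies the pathwise one but not conversely, which is why the referee asks for a bound and the authors reply that the stated theorem only needs the pathwise version.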
Circularity Check
No significant circularity in the derivation chain
The paper defines overpricing relative to an explicit counterfactual price path that shuts down price impact while holding shocks fixed; this is a modeling choice, not a self-referential construction. It then states a sufficient condition for decentralized learning to reach the overpricing region in finite time and asserts that projected stochastic gradient ascent satisfies the condition via the competitive/collaborative decomposition. These steps rest on independent game-theoretic analysis of the repeated interaction, cash/inventory constraints, and the decomposition, without reducing the central claim to a fit, a tautology, or a self-citation chain. The derivation is self-contained against the stated mathematical structure.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The game admits a decomposition into a competitive component favoring zero price impact and a collaborative component that makes overpricing jointly profitable when aggregate inventory is positive.
- ad hoc to paper: Projected stochastic gradient ascent satisfies the sufficient condition for reaching the overpricing region in finite time.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We decompose the per-round game into a competitive component (Game 2) and collaborative component (Game 3) ... R^p_t = 1/2 Z + 1/2 U (eq. 18). Projected gradient ascent on κ̃ satisfies the assumptions of theorem 6.2.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Technical appendix (excerpt)
In this section, we present the remaining proofs of the results presented in the paper.
A.1 Proof of Theorem 5.2
Theorem 5.2. For any...