Market Informedness and Market-Maker Profitability: The Trade-Off Between Adverse Selection and Price Discovery
Pith reviewed 2026-06-27 22:40 UTC · model grok-4.3
The pith
As market informedness increases, market-maker profitability trends upward overall because informed trading aids price discovery enough to offset adverse selection costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The model combines heterogeneous information sets, inventory-risk aversion, endogenous prices, exogenous fundamental values, and a state-dependent self-exciting market-taker order flow that satisfies finite-horizon stability. Within it, multi-agent reinforcement learning yields market-maker strategies whose profitability trends upward overall as aggregate informedness rises, despite local non-monotonicities from stochastic learning. This indicates that price-discovery benefits can offset adverse-selection costs.
What carries the argument
Multi-agent reinforcement learning with centralized training and decentralized execution applied to market makers holding heterogeneous information sets inside an agent-based market that endogenously forms prices and maintains stable order-flow dynamics.
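As a rough structural illustration of centralized training with decentralized execution, the sketch below uses only the Python standard library. Each hypothetical market maker's actor sees its own observation alone, while a single critic scores the concatenated observations and actions during training. The linear maps, dimensions, and names (`Actor`, `CentralCritic`, `make_matrix`) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Actor:
    """Decentralized policy: sees only one agent's local observation."""

    def __init__(self, obs_dim, act_dim, rng):
        self.weights = make_matrix(act_dim, obs_dim, rng)

    def act(self, local_obs):
        # tanh bounds each illustrative action component between -1 and 1.
        return [math.tanh(sum(w * x for w, x in zip(row, local_obs)))
                for row in self.weights]


class CentralCritic:
    """Centralized value estimate over all observations and actions."""

    def __init__(self, joint_dim, rng):
        self.weights = [rng.gauss(0.0, 0.1) for _ in range(joint_dim)]

    def value(self, all_obs, all_actions):
        joint = [x for vec in list(all_obs) + list(all_actions) for x in vec]
        if len(joint) != len(self.weights):
            raise ValueError("joint input has the wrong dimension")
        return sum(w * x for w, x in zip(self.weights, joint))


rng = random.Random(0)
obs_dims = [3, 5, 4]  # heterogeneous information sets (assumed sizes)
act_dim = 2           # e.g. bid and ask offsets (assumed)

actors = [Actor(d, act_dim, rng) for d in obs_dims]
critic = CentralCritic(sum(obs_dims) + act_dim * len(obs_dims), rng)

observations = [[rng.gauss(0.0, 1.0) for _ in range(d)] for d in obs_dims]
# Execution: each actor uses its own observation only.
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]
# Training: the critic may condition on every observation and action.
joint_value = critic.value(observations, actions)
print(len(actions), joint_value)
```

The point of the split is that the critic is needed only while learning; at execution time each actor runs on its own information set, which is what allows agents with different observation sizes to coexist.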
Load-bearing premise
The reinforcement learning produces strategies that match real-world market-making behavior under heterogeneous information, and the order-flow process stays stable over the finite horizon.
What would settle it
Simulations at successively higher informedness levels that instead show flat or declining market-maker profitability without an overall upward trend would refute the central claim.
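One way to make this criterion operational is a rank-based trend statistic that tolerates local dips. The sketch below computes Kendall's tau between informedness level and profitability using only the standard library. The data values are made up purely for illustration and are not results from the paper, and the function name `kendall_tau` is a local helper, not a library call.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x2 - x1) * (y2 - y1)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical profitability by informedness level, with one local dip.
informedness = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
profit = [-1.5, -0.4, -0.7, 0.3, 0.9, 1.2]

tau = kendall_tau(informedness, profit)
print(f"Kendall tau = {tau:.3f}")
```

A tau close to +1 is consistent with an overall upward trend even when adjacent levels are out of order; a tau near zero or below would correspond to the refuting pattern. A fuller check would work with seed-level replicates and attach a significance test.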
Original abstract
This paper studies how market informedness affects market makers' profitability in a computational market environment with heterogeneous learning agents. We develop an agent-based market model in which market makers differ in their information sets and inventory-risk aversion, prices form endogenously, fundamental values evolve exogenously, and market-taker order flow follows a state-dependent self-exciting process. The model provides a controlled computational laboratory for analyzing the interaction between informed trading, adverse selection, price discovery, and liquidity provision. We establish finite-horizon stability properties of the market-taker order-flow process and solve the market-making problem using multi-agent reinforcement learning with centralized training and decentralized execution. The results show that informed market order flow is particularly harmful when aggregate market informedness is low, exposing market makers to severe adverse-selection risk. However, as market informedness increases, market-maker profitability displays an overall upward trend despite local non-monotonicities arising from complex market dynamics and stochastic learning. This suggests that the price-discovery benefits of informed trading can offset its adverse-selection costs. The findings contribute to computational economics by showing how agent heterogeneity, endogenous price formation, and learning-based liquidity provision jointly shape market outcomes.
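To give a concrete picture of a self-exciting order-flow process, the sketch below simulates the simplest textbook case, a linear Hawkes process with an exponential kernel, using thinning in the style of Ogata. It is an assumption-laden stand-in: the paper's process is state-dependent, and the parameters `mu`, `alpha`, `beta` and the function name `simulate_hawkes` are chosen here for illustration only.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on (0, horizon] by thinning.

    Intensity: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i))
    over past events t_i, with mu > 0.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the summed kernel
    while True:
        # Between events the intensity only decays, so its current
        # value is a valid upper bound until the next event.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        if t + wait > horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)  # decay to the candidate time
        # Accept the candidate with probability lambda(t) / upper.
        if rng.random() * upper <= mu + excitation:
            events.append(t)
            excitation += alpha  # each arrival raises future intensity
    return events


# Illustrative parameters with branching ratio alpha / beta = 0.5.
events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0)
print(f"{len(events)} simulated market-taker orders")
```

Each accepted order raises the intensity by `alpha`, so arrivals cluster in time. Making `mu` or `alpha` depend on the market state would be the natural next step toward the kind of process the abstract describes.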
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops an agent-based computational market model with heterogeneous market makers (differing in information sets and inventory-risk aversion), endogenous price formation, exogenous fundamental values, and market-taker order flow governed by a state-dependent self-exciting process. Finite-horizon stability of the order-flow process is established, after which the market-making problem is solved via multi-agent reinforcement learning with centralized training and decentralized execution. The central claim is that informed order flow is especially harmful at low aggregate informedness (severe adverse selection), but as informedness rises, market-maker profitability exhibits an overall upward trend despite local non-monotonicities, implying that price-discovery benefits can offset adverse-selection costs.
Significance. If the simulation results prove robust to training variation, the work supplies a controlled computational laboratory for quantifying the adverse-selection versus price-discovery trade-off under agent heterogeneity and learning-based liquidity provision. This extends traditional microstructure models by endogenizing both prices and strategies through MARL, offering potential insights for market-design questions in computational finance.
major comments (2)
- [The section describing the multi-agent reinforcement learning implementation and results] The headline result—an overall upward trend in market-maker profitability with rising informedness—rests on the multi-agent RL (CTDE) component producing reliable, near-optimal policies under heterogeneous information. The abstract and setup supply no quantitative evidence on policy convergence, variance across random seeds, training stability, or ablation against heuristic or optimal benchmarks. This is load-bearing because the claimed offset between price-discovery benefits and adverse-selection costs does not follow if the observed trend is an artifact of a particular training run or unstable learning.
- [The section establishing finite-horizon stability of the order-flow process] The finite-horizon stability properties of the market-taker order-flow process are asserted as a prerequisite for the simulation framework, yet the abstract provides neither the explicit conditions under which stability holds nor a reference to the proof or derivation. Because the entire computational laboratory depends on this property, its verification is necessary to support the subsequent profitability claims.
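For orientation only, the simplest self-exciting benchmark shows what an explicit finite-horizon statement can look like. The derivation below assumes a linear Hawkes process with an exponential kernel and constant parameters $\mu, \alpha, \beta > 0$; this is an assumption made here and not the paper's state-dependent specification.

```latex
\begin{align*}
\lambda(t) &= \mu + \int_0^t \alpha\, e^{-\beta (t-s)}\, dN_s, \\
\frac{d}{dt}\,\mathbb{E}[\lambda(t)] &= \beta\mu - (\beta - \alpha)\,\mathbb{E}[\lambda(t)],
  \qquad \mathbb{E}[\lambda(0)] = \mu, \\
\mathbb{E}[\lambda(t)] &= \frac{\mu\beta}{\beta-\alpha}
  - \frac{\mu\alpha}{\beta-\alpha}\, e^{-(\beta-\alpha)t}
  \quad (\alpha \neq \beta), \\
\mathbb{E}[N_T] &= \int_0^T \mathbb{E}[\lambda(t)]\, dt < \infty
  \quad \text{for every finite } T.
\end{align*}
```

When $\alpha < \beta$ the mean intensity converges to $\mu\beta/(\beta-\alpha)$; when $\alpha \ge \beta$ it grows without bound as $t \to \infty$ but remains finite on any finite horizon. The comment above asks for the analogous explicit conditions in the paper's own setting.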
minor comments (2)
- The abstract would benefit from a concise statement of the number of market makers, the range of informedness levels simulated, and the precise definition of profitability used (e.g., whether it includes inventory penalties or only realized P&L).
- Notation for the self-exciting process parameters and the inventory-risk aversion levels should be introduced consistently when first mentioned.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below and indicate the revisions we will make.
Point-by-point responses
-
Referee: [The section describing the multi-agent reinforcement learning implementation and results] The headline result—an overall upward trend in market-maker profitability with rising informedness—rests on the multi-agent RL (CTDE) component producing reliable, near-optimal policies under heterogeneous information. The abstract and setup supply no quantitative evidence on policy convergence, variance across random seeds, training stability, or ablation against heuristic or optimal benchmarks. This is load-bearing because the claimed offset between price-discovery benefits and adverse-selection costs does not follow if the observed trend is an artifact of a particular training run or unstable learning.
Authors: We agree that the robustness of the MARL policies is critical to the validity of the headline result. The current manuscript focuses on the economic implications rather than the technical training details. In the revision, we will expand the relevant section to include quantitative convergence metrics (e.g., reward curves), variance across at least five random seeds, and ablations comparing the learned policies with simple heuristic market-making strategies. These additions will let readers assess whether the profitability trend is reliable.
Revision: yes
-
Referee: [The section establishing finite-horizon stability of the order-flow process] The finite-horizon stability properties of the market-taker order-flow process are asserted as a prerequisite for the simulation framework, yet the abstract provides neither the explicit conditions under which stability holds nor a reference to the proof or derivation. Because the entire computational laboratory depends on this property, its verification is necessary to support the subsequent profitability claims.
Authors: The stability properties are formally established in Section 3 of the manuscript, which states the explicit conditions; the proof is given in the appendix. To address the concern, we will update the abstract to mention the key stability conditions and direct readers to the theorem and proof location.
Revision: yes
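The robustness checks promised in the first response could be summarized along the following lines. The sketch aggregates per-seed profitability at each informedness level into a mean and standard error and compares the mean with a heuristic baseline. All numbers, the dictionary layout, and the function name `summarize_seeds` are hypothetical; the paper's actual metrics are not given in this review.

```python
import math
import statistics


def summarize_seeds(runs):
    """Return {level: (mean, standard_error)} from {level: [profit per seed]}."""
    summary = {}
    for level, profits in sorted(runs.items()):
        if len(profits) < 2:
            raise ValueError("need at least two seeds per level")
        mean = statistics.fmean(profits)
        stderr = statistics.stdev(profits) / math.sqrt(len(profits))
        summary[level] = (mean, stderr)
    return summary


# Made-up profitability for five seeds at three informedness levels.
learned = {
    0.1: [-1.2, -0.9, -1.4, -1.0, -1.1],
    0.5: [0.1, -0.2, 0.3, 0.0, 0.2],
    0.9: [0.8, 1.1, 0.9, 1.3, 1.0],
}
# Hypothetical fixed-spread heuristic evaluated at the same levels.
baseline = {0.1: -1.5, 0.5: -0.3, 0.9: 0.4}

summary = summarize_seeds(learned)
for level, (mean, stderr) in summary.items():
    gap = mean - baseline[level]
    print(f"informedness={level:.1f} mean={mean:+.2f} se={stderr:.2f} gap={gap:+.2f}")
```

Reporting the standard error next to each mean makes it possible to judge whether a local non-monotonicity exceeds seed-to-seed noise, which is the distinction the referee's first comment turns on.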
Circularity Check
No circularity: results emerge from forward simulation of RL agents
Full rationale
The paper defines a market model with exogenous fundamentals, self-exciting order flow, and heterogeneous agents, then solves for market-maker policies via multi-agent RL (CTDE) and reports simulated profitability trends as informedness varies. No equations are shown that define a quantity in terms of itself or rename a fitted parameter as a prediction. Stability properties are stated as established prior to simulation rather than derived from the target result. No load-bearing self-citation chain or ansatz smuggling is present; the central claim is an observed simulation outcome, not a tautological reduction to inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- inventory-risk aversion levels
- self-exciting process parameters
axioms (1)
- domain assumption: finite-horizon stability properties of the market-taker order-flow process