arxiv: 2606.05882 · v2 · pith:DSNJQQRDnew · submitted 2026-06-04 · 💱 q-fin.TR

Market Informedness and Market-Maker Profitability: The Trade-Off Between Adverse Selection and Price Discovery

Pith reviewed 2026-06-27 22:40 UTC · model grok-4.3

classification 💱 q-fin.TR
keywords market informedness · market makers · profitability · adverse selection · price discovery · agent-based model · reinforcement learning · liquidity provision

The pith

As market informedness increases, market-maker profitability trends upward overall because informed trading aids price discovery enough to offset adverse selection costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an agent-based computational market with market makers that differ in information access and risk aversion, where prices form endogenously and order flow follows a self-exciting process. Agents learn strategies through multi-agent reinforcement learning. Simulations show that low informedness exposes makers to severe adverse selection from informed orders, but rising informedness produces an overall profit increase despite some local drops from market complexity. A sympathetic reader would care because the result suggests informed trading can deliver net benefits to liquidity providers rather than only costs.

Core claim

In this model with heterogeneous information sets, inventory-risk aversion, endogenous prices, exogenous fundamental values, and a state-dependent self-exciting market-taker order flow that satisfies finite-horizon stability, multi-agent reinforcement learning yields market-maker strategies under which profitability displays an overall upward trend with rising aggregate informedness, even amid local non-monotonicities from stochastic learning, indicating that price-discovery benefits can offset adverse-selection costs.
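To make the "self-exciting order flow" ingredient concrete, the following is a minimal sketch of a linear Hawkes process with an exponential kernel, simulated by thinning. It is a stand-in only: the paper's intensity is state-dependent and its exact form is not given in the abstract, so the function name `simulate_hawkes` and the parameters `mu`, `alpha`, `beta` are assumptions made for illustration.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of a linear Hawkes process on [0, horizon].

    Assumed intensity (not the paper's state-dependent one):
        lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i))
    Requires mu > 0, alpha >= 0, beta > 0.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the self-exciting term
    while True:
        # Between events the intensity only decays, so its current
        # value is a valid upper bound for the thinning step.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)  # decay up to the candidate time
        lam_t = mu + excitation
        # Accept the candidate with probability lam_t / lam_bar.
        if rng.random() * lam_bar <= lam_t:
            events.append(t)
            excitation += alpha  # each accepted order raises future intensity
    return events


# Hypothetical parameters: baseline rate 1, branching ratio alpha/beta = 0.5.
events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0, seed=0)
```

Each accepted arrival raises the intensity by `alpha`, which then decays at rate `beta`; this clustering is what "self-exciting" refers to.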

What carries the argument

Multi-agent reinforcement learning with centralized training and decentralized execution applied to market makers holding heterogeneous information sets inside an agent-based market that endogenously forms prices and maintains stable order-flow dynamics.
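The information structure behind centralized training with decentralized execution can be sketched as follows. This is an illustration of who sees what, not the paper's architecture: the linear, untrained `Actor` and `CentralCritic` classes, the observation layout, and the half-spread action are all hypothetical choices, and no learning step is shown.

```python
import random


class Actor:
    """Decentralized policy: uses only one market maker's own observation."""

    def __init__(self, obs_dim, rng):
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]

    def half_spread(self, own_obs):
        raw = sum(w * o for w, o in zip(self.weights, own_obs))
        return max(0.01, 1.0 + raw)  # keep the quoted half-spread positive


class CentralCritic:
    """Training-time value estimate built from every maker's observation and action."""

    def __init__(self, joint_dim, rng):
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(joint_dim)]

    def value(self, all_obs, all_actions):
        features = [x for obs in all_obs for x in obs] + list(all_actions)
        if len(features) != len(self.weights):
            raise ValueError("joint feature size does not match critic size")
        return sum(w * f for w, f in zip(self.weights, features))


rng = random.Random(0)
# Hypothetical observations: (inventory, mid-price change[, private signal]).
observations = [
    (2.0, 0.1, 0.5),  # informed maker: also sees a signal about the fundamental
    (-1.0, 0.1),      # uninformed maker: no signal
]
actors = [Actor(len(obs), rng) for obs in observations]
# Execution: each actor acts on its own observation only.
actions = [actor.half_spread(obs) for actor, obs in zip(actors, observations)]
# Training: the critic sees the joint observations and actions.
critic = CentralCritic(sum(len(obs) for obs in observations) + len(actions), rng)
baseline = critic.value(observations, actions)
```

The heterogeneity in information sets shows up as different observation lengths per actor, while the critic's input covers all of them at once.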

Load-bearing premise

The reinforcement learning produces strategies that match real-world market-making behavior under heterogeneous information and the order-flow process stays stable over the finite horizon.

What would settle it

Simulations at successively higher informedness levels that instead show flat or declining market-maker profitability without an overall upward trend would refute the central claim.
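One way to operationalize "overall upward trend despite local drops" is a rank correlation between informedness level and mean profitability. The sketch below computes Kendall's tau-a by hand; the choice of statistic is an assumption (the paper does not say how it assesses the trend), and the profitability numbers are made up for illustration.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[j] - xs[i]) * (ys[j] - ys[i])
            if product > 0:
                score += 1  # pair moves in the same direction
            elif product < 0:
                score -= 1  # pair moves in opposite directions
    return score / (n * (n - 1) / 2)


# Hypothetical data: two local drops inside an overall rise.
informedness = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
profit = [-3.0, -1.0, -1.5, 0.5, 0.2, 1.0]
tau = kendall_tau(informedness, profit)
```

Here 13 of the 15 pairs are concordant and 2 are discordant, so `tau` is 11/15, roughly 0.73. A value near zero or below across the simulated informedness grid would be the flat or declining pattern that refutes the claim.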

Original abstract

This paper studies how market informedness affects market makers' profitability in a computational market environment with heterogeneous learning agents. We develop an agent-based market model in which market makers differ in their information sets and inventory-risk aversion, prices form endogenously, fundamental values evolve exogenously, and market-taker order flow follows a state-dependent self-exciting process. The model provides a controlled computational laboratory for analyzing the interaction between informed trading, adverse selection, price discovery, and liquidity provision. We establish finite-horizon stability properties of the market-taker order-flow process and solve the market-making problem using multi-agent reinforcement learning with centralized training and decentralized execution. The results show that informed market order flow is particularly harmful when aggregate market informedness is low, exposing market makers to severe adverse-selection risk. However, as market informedness increases, market-maker profitability displays an overall upward trend despite local non-monotonicities arising from complex market dynamics and stochastic learning. This suggests that the price-discovery benefits of informed trading can offset its adverse-selection costs. The findings contribute to computational economics by showing how agent heterogeneity, endogenous price formation, and learning-based liquidity provision jointly shape market outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops an agent-based computational market model with heterogeneous market makers (differing in information sets and inventory-risk aversion), endogenous price formation, exogenous fundamental values, and market-taker order flow governed by a state-dependent self-exciting process. Finite-horizon stability of the order-flow process is established, after which the market-making problem is solved via multi-agent reinforcement learning with centralized training and decentralized execution. The central claim is that informed order flow is especially harmful at low aggregate informedness (severe adverse selection), but as informedness rises, market-maker profitability exhibits an overall upward trend despite local non-monotonicities, implying that price-discovery benefits can offset adverse-selection costs.

Significance. If the simulation results prove robust to training variation, the work supplies a controlled computational laboratory for quantifying the adverse-selection versus price-discovery trade-off under agent heterogeneity and learning-based liquidity provision. This extends traditional microstructure models by endogenizing both prices and strategies through MARL, offering potential insights for market-design questions in computational finance.

major comments (2)
  1. [The section describing the multi-agent reinforcement learning implementation and results] The headline result—an overall upward trend in market-maker profitability with rising informedness—rests on the multi-agent RL (CTDE) component producing reliable, near-optimal policies under heterogeneous information. The abstract and setup supply no quantitative evidence on policy convergence, variance across random seeds, training stability, or ablation against heuristic or optimal benchmarks. This is load-bearing because the claimed offset between price-discovery benefits and adverse-selection costs does not follow if the observed trend is an artifact of a particular training run or unstable learning.
  2. [The section establishing finite-horizon stability of the order-flow process] The finite-horizon stability properties of the market-taker order-flow process are asserted as a prerequisite for the simulation framework, yet the abstract provides neither the explicit conditions under which stability holds nor a reference to the proof or derivation. Because the entire computational laboratory depends on this property, its verification is necessary to support the subsequent profitability claims.
minor comments (2)
  1. The abstract would benefit from a concise statement of the number of market makers, the range of informedness levels simulated, and the precise definition of profitability used (e.g., whether it includes inventory penalties or only realized P&L).
  2. Notation for the self-exciting process parameters and the inventory-risk aversion levels should be introduced consistently when first mentioned.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [The section describing the multi-agent reinforcement learning implementation and results] The headline result—an overall upward trend in market-maker profitability with rising informedness—rests on the multi-agent RL (CTDE) component producing reliable, near-optimal policies under heterogeneous information. The abstract and setup supply no quantitative evidence on policy convergence, variance across random seeds, training stability, or ablation against heuristic or optimal benchmarks. This is load-bearing because the claimed offset between price-discovery benefits and adverse-selection costs does not follow if the observed trend is an artifact of a particular training run or unstable learning.

    Authors: We concur that the robustness of the MARL policies is critical for the validity of the headline result. The current manuscript focuses on the economic implications rather than the technical training details. In the revision, we will expand the relevant section to include quantitative metrics on policy convergence (e.g., reward curves), variance across at least five random seeds, and ablations comparing the learned policies to simple heuristic market-making strategies. This will confirm that the profitability trend is reliable. revision: yes

  2. Referee: [The section establishing finite-horizon stability of the order-flow process] The finite-horizon stability properties of the market-taker order-flow process are asserted as a prerequisite for the simulation framework, yet the abstract provides neither the explicit conditions under which stability holds nor a reference to the proof or derivation. Because the entire computational laboratory depends on this property, its verification is necessary to support the subsequent profitability claims.

    Authors: The stability properties are formally established in Section 3 of the manuscript, including the explicit conditions and the proof in the appendix. To address the concern, we will update the abstract to mention the key stability conditions and direct readers to the theorem and proof location. revision: yes
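The seed-variance reporting promised in the first response could look roughly like the sketch below, which reduces per-seed results to a mean and standard error at each informedness level. The function name, the data layout, and the numbers are hypothetical; the manuscript's actual metrics are not described in the abstract.

```python
import statistics


def summarize_across_seeds(profit_by_level):
    """Map each informedness level to (mean, standard error) over seeds.

    profit_by_level: dict of level -> list of per-seed mean profits.
    """
    summary = {}
    for level, values in sorted(profit_by_level.items()):
        if len(values) < 2:
            raise ValueError("need at least two seeds to estimate variance")
        mean = statistics.fmean(values)
        # Sample standard deviation divided by sqrt(number of seeds).
        sem = statistics.stdev(values) / len(values) ** 0.5
        summary[level] = (mean, sem)
    return summary


# Hypothetical results from five training seeds at two informedness levels.
runs = {
    0.2: [-1.0, -1.4, -0.8, -1.2, -1.1],
    0.8: [0.4, 0.1, 0.6, 0.3, 0.6],
}
summary = summarize_across_seeds(runs)
```

If the gap between levels is small relative to the standard errors, the reported trend could plausibly be a training artifact, which is the referee's concern.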

Circularity Check

0 steps flagged

No circularity: results emerge from forward simulation of RL agents

full rationale

The paper defines a market model with exogenous fundamentals, self-exciting order flow, and heterogeneous agents, then solves for market-maker policies via multi-agent RL (CTDE) and reports simulated profitability trends as informedness varies. No equations are shown that define a quantity in terms of itself or rename a fitted parameter as a prediction. Stability properties are stated as established prior to simulation rather than derived from the target result. No load-bearing self-citation chain or ansatz smuggling is present; the central claim is an observed simulation outcome, not a tautological reduction to inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim depends on the validity of the order-flow process stability, the RL training procedure, and several unspecified parameters governing agent heterogeneity and learning; these are not independently verified in the provided abstract.

free parameters (2)
  • inventory-risk aversion levels
    Market makers are described as differing in inventory-risk aversion, which must be parameterized in the model.
  • self-exciting process parameters
    The state-dependent self-exciting order-flow process requires parameters that are not specified in the abstract.
axioms (1)
  • domain assumption: finite-horizon stability properties of the market-taker order-flow process
    The paper states that these properties are established as part of the model foundation.
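
For intuition on what a finite-horizon stability property can look like, consider the textbook linear Hawkes process with an exponential kernel. This is an assumption made for illustration; the paper's process is state-dependent and its actual conditions are not stated in the abstract. The symbols $\mu$, $\alpha$, $\beta$ and $m(t)$ are introduced here and do not come from the paper.

```latex
% Assumed intensity of a linear Hawkes process with no events before t = 0:
\[
  \lambda(t) = \mu + \alpha \int_0^t e^{-\beta (t-s)} \, dN(s),
  \qquad \mu > 0,\ \alpha \ge 0,\ \beta > 0 .
\]
% The mean intensity m(t) = E[\lambda(t)] solves a linear ODE:
\[
  m'(t) = \beta\mu - (\beta - \alpha)\, m(t), \qquad m(0) = \mu ,
\]
% with solution
\[
  m(t) =
  \begin{cases}
    \dfrac{\beta\mu}{\beta-\alpha}
      - \dfrac{\alpha\mu}{\beta-\alpha}\, e^{-(\beta-\alpha)t},
      & \alpha \neq \beta, \\[2ex]
    \mu\,(1 + \beta t), & \alpha = \beta .
  \end{cases}
\]
% Hence the expected number of orders on any finite horizon [0, T] is finite:
\[
  \mathbb{E}[N(T)] = \int_0^T m(t)\, dt < \infty .
\]
```

In this simplified setting the expected order count is finite on every finite horizon for any parameter values, whereas a stationary long-run regime additionally requires the branching ratio $\alpha/\beta$ to be below 1. Whether the paper's result has this form is not determinable from the abstract.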

pith-pipeline@v0.9.1-grok · 5744 in / 1088 out tokens · 28792 ms · 2026-06-27T22:40:38.592860+00:00 · methodology
