Market Informedness and Market-Maker Profitability: The Trade-Off Between Adverse Selection and Price Discovery

Konrad Ochędzan; Nino Antulov-Fantulin

arxiv: 2606.05882 · v2 · pith:DSNJQQRDnew · submitted 2026-06-04 · 💱 q-fin.TR


Pith reviewed 2026-06-27 22:40 UTC · model grok-4.3

classification 💱 q-fin.TR

keywords market informedness · market makers · profitability · adverse selection · price discovery · agent-based model · reinforcement learning · liquidity provision


The pith

As market informedness increases, market-maker profitability trends upward overall because informed trading aids price discovery enough to offset adverse selection costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an agent-based computational market with market makers that differ in information access and risk aversion, where prices form endogenously and order flow follows a self-exciting process. Agents learn strategies through multi-agent reinforcement learning. Simulations show that low informedness exposes makers to severe adverse selection from informed orders, but rising informedness produces an overall profit increase despite some local drops from market complexity. A sympathetic reader would care because the result suggests informed trading can deliver net benefits to liquidity providers rather than only costs.

Core claim

In this model, market makers hold heterogeneous information sets and differ in inventory-risk aversion, prices form endogenously, fundamental values evolve exogenously, and market-taker order flow follows a state-dependent self-exciting process that satisfies finite-horizon stability. Multi-agent reinforcement learning yields market-maker strategies under which profitability trends upward overall as aggregate informedness rises, despite local non-monotonicities from complex market dynamics and stochastic learning. This indicates that price-discovery benefits can offset adverse-selection costs.

What carries the argument

Multi-agent reinforcement learning with centralized training and decentralized execution applied to market makers holding heterogeneous information sets inside an agent-based market that endogenously forms prices and maintains stable order-flow dynamics.

Load-bearing premise

The reinforcement learning produces strategies that match real-world market-making behavior under heterogeneous information and the order-flow process stays stable over the finite horizon.

What would settle it

Simulations at successively higher informedness levels that instead show flat or declining market-maker profitability without an overall upward trend would refute the central claim.
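One way to make this criterion operational is a rank-based trend statistic. The sketch below computes Kendall's tau between informedness level and mean market-maker profit in plain Python. The function name `kendall_tau` and the sample numbers are illustrative assumptions, not values or methods taken from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x2 - x1) * (y2 - y1)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: informedness levels and mean maker profit at each level.
# The profit series has two local dips but rises overall.
informedness = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
profit = [-3.0, -1.0, -1.5, 0.5, 0.2, 1.4]

tau = kendall_tau(informedness, profit)
```

A tau close to +1 is consistent with an overall upward trend that tolerates local dips, while a value near zero or below would match the refuting scenario described above. With several training seeds, the same statistic can be computed per seed and its spread reported.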

Original abstract

This paper studies how market informedness affects market makers' profitability in a computational market environment with heterogeneous learning agents. We develop an agent-based market model in which market makers differ in their information sets and inventory-risk aversion, prices form endogenously, fundamental values evolve exogenously, and market-taker order flow follows a state-dependent self-exciting process. The model provides a controlled computational laboratory for analyzing the interaction between informed trading, adverse selection, price discovery, and liquidity provision. We establish finite-horizon stability properties of the market-taker order-flow process and solve the market-making problem using multi-agent reinforcement learning with centralized training and decentralized execution. The results show that informed market order flow is particularly harmful when aggregate market informedness is low, exposing market makers to severe adverse-selection risk. However, as market informedness increases, market-maker profitability displays an overall upward trend despite local non-monotonicities arising from complex market dynamics and stochastic learning. This suggests that the price-discovery benefits of informed trading can offset its adverse-selection costs. The findings contribute to computational economics by showing how agent heterogeneity, endogenous price formation, and learning-based liquidity provision jointly shape market outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's new agent-based RL setup with self-exciting flow and heterogeneous info produces a simulation result of overall rising market-maker profits with informedness, but the RL training details are missing.


The main thing to know is that this work builds a controlled simulation market with market makers who have different information sets and risk aversions, endogenous prices, exogenous fundamentals, and state-dependent self-exciting order flow from takers. They prove finite-horizon stability for that flow process and then use multi-agent RL with centralized training and decentralized execution to let the makers learn liquidity provision. The headline output is that maker profitability shows an overall upward trend as aggregate informedness rises, even with some local non-monotonic dips, which the authors read as price discovery benefits outweighing adverse selection costs.

The setup itself is the clearest addition. Mixing heterogeneous information, inventory control, and RL in one endogenous-price environment with a Hawkes-style flow is not standard in the microstructure simulation literature. The stability result is also concrete and useful for anyone wanting to run similar forward simulations.
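For readers who want to try such forward simulations, the following is a minimal sketch of a self-exciting arrival process simulated by Ogata's thinning method. It assumes a plain univariate Hawkes process with an exponential kernel, which is simpler than the state-dependent process the paper describes. The parameter names `mu`, `alpha`, and `beta` and all numeric values are assumptions made for illustration.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on (0, horizon] by Ogata's thinning.

    Assumed intensity:
        lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i))
    Requires mu > 0.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the self-exciting sum at time t
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)  # decay to the candidate time
        lam_t = mu + excitation
        # Accept the candidate with probability lam_t / lam_bar.
        if rng.random() * lam_bar <= lam_t:
            events.append(t)
            excitation += alpha  # each accepted event raises the intensity
    return events


# Illustrative run: branching ratio alpha / beta = 0.5.
events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0, seed=1)
```

Fixing the seed makes a run reproducible, which is useful when comparing order-flow paths across informedness levels.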

The soft spot is the lack of any reported checks on whether the RL policies actually converged or whether the profitability trend holds across random seeds and hyperparameter choices. The abstract supplies no parameter values, no error bars, no ablation of the learning component, and no quantitative description of how profitability was measured. Without those, the claimed offset between adverse selection and price discovery rests on unverified simulation outputs.

This paper is aimed at researchers who already work with agent-based models or RL in quantitative finance. A reader interested in computational experiments on informed trading would get value from the model construction and the stability proof. It deserves a serious referee because the modeling choices are explicit and the question is well-posed, even though the current evidence is thin on the learning side.

Referee Report

2 major / 2 minor

Summary. The paper develops an agent-based computational market model with heterogeneous market makers (differing in information sets and inventory-risk aversion), endogenous price formation, exogenous fundamental values, and market-taker order flow governed by a state-dependent self-exciting process. Finite-horizon stability of the order-flow process is established, after which the market-making problem is solved via multi-agent reinforcement learning with centralized training and decentralized execution. The central claim is that informed order flow is especially harmful at low aggregate informedness (severe adverse selection), but as informedness rises, market-maker profitability exhibits an overall upward trend despite local non-monotonicities, implying that price-discovery benefits can offset adverse-selection costs.

Significance. If the simulation results prove robust to training variation, the work supplies a controlled computational laboratory for quantifying the adverse-selection versus price-discovery trade-off under agent heterogeneity and learning-based liquidity provision. This extends traditional microstructure models by endogenizing both prices and strategies through MARL, offering potential insights for market-design questions in computational finance.

major comments (2)

[The section describing the multi-agent reinforcement learning implementation and results] The headline result—an overall upward trend in market-maker profitability with rising informedness—rests on the multi-agent RL (CTDE) component producing reliable, near-optimal policies under heterogeneous information. The abstract and setup supply no quantitative evidence on policy convergence, variance across random seeds, training stability, or ablation against heuristic or optimal benchmarks. This is load-bearing because the claimed offset between price-discovery benefits and adverse-selection costs does not follow if the observed trend is an artifact of a particular training run or unstable learning.
[The section establishing finite-horizon stability of the order-flow process] The finite-horizon stability properties of the market-taker order-flow process are asserted as a prerequisite for the simulation framework, yet the abstract provides neither the explicit conditions under which stability holds nor a reference to the proof or derivation. Because the entire computational laboratory depends on this property, its verification is necessary to support the subsequent profitability claims.

minor comments (2)

The abstract would benefit from a concise statement of the number of market makers, the range of informedness levels simulated, and the precise definition of profitability used (e.g., whether it includes inventory penalties or only realized P&L).
Notation for the self-exciting process parameters and the inventory-risk aversion levels should be introduced consistently when first mentioned.
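The distinction raised in the first minor comment can be made concrete with a small sketch. It computes a mark-to-market P&L from a list of fills and, separately, a version reduced by a quadratic inventory penalty. The `Fill` type, the function `maker_pnl`, the `risk_aversion` parameter, and the quadratic form of the penalty are all assumptions for illustration; the abstract does not say which definition the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    side: str    # "buy" or "sell", from the market maker's perspective
    price: float
    qty: float


def maker_pnl(fills, mark_price, risk_aversion=0.0):
    """Return (mark_to_market_pnl, penalized_pnl) for a sequence of fills."""
    cash = 0.0
    inventory = 0.0
    for f in fills:
        if f.side == "buy":
            cash -= f.price * f.qty
            inventory += f.qty
        elif f.side == "sell":
            cash += f.price * f.qty
            inventory -= f.qty
        else:
            raise ValueError(f"unknown side: {f.side!r}")
    # Value the remaining inventory at the mark price.
    mtm = cash + inventory * mark_price
    # Assumed penalty: quadratic in terminal inventory.
    penalized = mtm - risk_aversion * inventory ** 2
    return mtm, penalized


# Hypothetical fills: buy 2 at 99, sell 1 at 101, mark at 100.
fills = [Fill("buy", 99.0, 2.0), Fill("sell", 101.0, 1.0)]
mtm, penalized = maker_pnl(fills, mark_price=100.0, risk_aversion=0.5)
```

In this example the two measures differ by the penalty on the one unit still held, which shows why a reported profitability trend depends on which definition is used.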

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's constructive comments on our manuscript. We address each of the major comments point by point below, indicating the revisions we will make.


Referee: [The section describing the multi-agent reinforcement learning implementation and results] The headline result—an overall upward trend in market-maker profitability with rising informedness—rests on the multi-agent RL (CTDE) component producing reliable, near-optimal policies under heterogeneous information. The abstract and setup supply no quantitative evidence on policy convergence, variance across random seeds, training stability, or ablation against heuristic or optimal benchmarks. This is load-bearing because the claimed offset between price-discovery benefits and adverse-selection costs does not follow if the observed trend is an artifact of a particular training run or unstable learning.

Authors: We concur that the robustness of the MARL policies is critical for the validity of the headline result. The current manuscript focuses on the economic implications rather than the technical training details. In the revision, we will expand the relevant section to include quantitative metrics on policy convergence (e.g., reward curves), variance across at least five random seeds, and ablations comparing the learned policies to simple heuristic market-making strategies. This will show whether the profitability trend is robust to training variation. revision: yes
Referee: [The section establishing finite-horizon stability of the order-flow process] The finite-horizon stability properties of the market-taker order-flow process are asserted as a prerequisite for the simulation framework, yet the abstract provides neither the explicit conditions under which stability holds nor a reference to the proof or derivation. Because the entire computational laboratory depends on this property, its verification is necessary to support the subsequent profitability claims.

Authors: The stability properties are formally established in Section 3 of the manuscript, including the explicit conditions and the proof in the appendix. To address the concern, we will update the abstract to mention the key stability conditions and direct readers to the theorem and proof location. revision: yes
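The seed-variance reporting promised in the first response could take roughly the following form: a mean and standard error of final profit per informedness level, computed over independent training seeds. The function name `summarize_across_seeds` and the numbers are hypothetical and are not results from the paper.

```python
import math
import statistics


def summarize_across_seeds(profits_by_level):
    """Map each informedness level to (mean, standard error) over seeds."""
    summary = {}
    for level, profits in sorted(profits_by_level.items()):
        if len(profits) < 2:
            raise ValueError("need at least two seeds per level")
        mean = statistics.fmean(profits)
        # Sample standard deviation divided by sqrt(number of seeds).
        sem = statistics.stdev(profits) / math.sqrt(len(profits))
        summary[level] = (mean, sem)
    return summary


# Hypothetical final profits for five seeds at three informedness levels.
runs = {
    0.1: [-2.0, -1.0, -3.0, -2.0, -2.0],
    0.5: [0.0, 1.0, -1.0, 0.0, 0.0],
    0.9: [2.0, 3.0, 1.0, 2.0, 2.0],
}
summary = summarize_across_seeds(runs)
```

Plotting the means with error bars of one or two standard errors would let a reader judge whether a local dip exceeds seed-to-seed noise.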

Circularity Check

0 steps flagged

No circularity: results emerge from forward simulation of RL agents


The paper defines a market model with exogenous fundamentals, self-exciting order flow, and heterogeneous agents, then solves for market-maker policies via multi-agent RL (CTDE) and reports simulated profitability trends as informedness varies. No equations are shown that define a quantity in terms of itself or rename a fitted parameter as a prediction. Stability properties are stated as established prior to simulation rather than derived from the target result. No load-bearing self-citation chain or ansatz smuggling is present; the central claim is an observed simulation outcome, not a tautological reduction to inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim depends on the validity of the order-flow process stability, the RL training procedure, and several unspecified parameters governing agent heterogeneity and learning; these are not independently verified in the provided abstract.

free parameters (2)

inventory-risk aversion levels
Market makers are described as differing in inventory-risk aversion, which must be parameterized in the model.
self-exciting process parameters
The state-dependent self-exciting order-flow process requires parameters that are not specified in the abstract.

axioms (1)

domain assumption · finite-horizon stability properties of the market-taker order-flow process
The paper states that these properties are established as part of the model foundation.
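As orientation for what a finite-horizon property can mean, consider the textbook linear Hawkes process with an exponential kernel. This is an assumption made for illustration and is not necessarily the paper's state-dependent specification; the symbols $\mu$ (baseline rate), $\alpha$ (jump size), and $\beta$ (decay rate) are introduced here and do not come from the source.

```latex
\begin{align*}
\lambda(t) &= \mu + \int_0^t \alpha\, e^{-\beta (t-s)}\,\mathrm{d}N(s), \\
\frac{\mathrm{d}}{\mathrm{d}t}\,\mathbb{E}[\lambda(t)]
  &= \beta\mu + (\alpha-\beta)\,\mathbb{E}[\lambda(t)],
  \qquad \mathbb{E}[\lambda(0)] = \mu, \\
\mathbb{E}[\lambda(t)]
  &= \frac{\mu\beta}{\beta-\alpha}
   - \frac{\mu\alpha}{\beta-\alpha}\, e^{-(\beta-\alpha)t}
  \qquad (\alpha \neq \beta).
\end{align*}
```

In this simple case the mean intensity is finite at every finite $t$ for any positive $\alpha$ and $\beta$, so the expected number of orders on a bounded horizon $[0, T]$ is finite. It stays bounded as $t \to \infty$ only when the branching ratio $\alpha/\beta$ is below one. A finite-horizon result is therefore a weaker requirement than stationarity, which is why the explicit conditions requested by the referee matter.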

pith-pipeline@v0.9.1-grok · 5744 in / 1088 out tokens · 28792 ms · 2026-06-27T22:40:38.592860+00:00 · methodology



[1] [1]

Amihud, Y., & Mendelson, H. (1980). Dealership market: Market-making with inven- tory.Journal of Financial Economics,8(1), 31–53, https://doi.org/10.1016/ 0304-405X(80)90020-3

1980

[2] [2]

Avellaneda, M., & Stoikov, S. (2008). High-frequency trading in a limit order book.Quantitative Finance,8(3), 217–224, https://doi.org/10.1080/ 14697680701381228

2008

[3] [3]

Bank, P., Cartea, ´A., K¨ orber, L. (2023). Optimal execution and speculation with trade signals.arXiv preprint arXiv:2306.00621,2306.00621, arXiv:2306.00621, https://doi.org/10.48550/arXiv.2306.00621

work page doi:10.48550/arxiv.2306.00621 2023

[4] [4]

Barucci, E., Mathieu, A., Sanchez-Betancourt, L. (2025). Market making with fads, informed, and uninformed traders.arXiv preprint arXiv:2501.03658, , https:// doi.org/10.48550/arXiv.2501.03658

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.03658 2025

[5] [5]

Black, F., & Scholes, M.S. (1973). The pricing of options and corporate liabilities. Journal of Political Economy,81(3), 637–654, https://doi.org/10.1086/260062

work page doi:10.1086/260062 1973

[6] [6]

Bodor, H., & Carlier, L. (2025). Deep learning meets queue-reactive: A frame- work for realistic limit order book simulation.arXiv preprint arXiv:2501.08822, 2501.08822, arXiv:2501.08822, https://doi.org/10.48550/arXiv.2501.08822

work page doi:10.48550/arxiv.2501.08822 2025

[7] [7]

Bormetti, G., Calcagnile, L.M., Treccani, M., Corsi, F., Marmi, S., Lillo, F. (2015). Modelling systemic price cojumps with Hawkes factor models.Quantitative Finance,15(7), 1137–1156, https://doi.org/10.1080/14697688.2014.996586

work page doi:10.1080/14697688.2014.996586 2015

[8] [8]

Bowsher, C.G. (2007). Modelling security market events in continuous time: Intensity based, multivariate point process models.Journal of Econometrics,141(2), 876–912, https://doi.org/10.1016/j.jeconom.2006.11.007

work page doi:10.1016/j.jeconom.2006.11.007 2007

[9] [9]

Campi, L., & Zabaljauregui, D. (2020). Optimal market making under partial infor- mation with general intensities.Applied Mathematical Finance,27(1-2), 1–45, https://doi.org/10.1080/1350486X.2020.1758587 43 Cartea, ´A., & Jaimungal, S. (2016). Incorporating order-flow into optimal execution. Mathematics and Financial Economics,10(3), 339–364, https://doi....

work page doi:10.1080/1350486x.2020.1758587 2020

[10] [10]

Cartea, ´A., & Wang, Y

Cambridge, UK: Cambridge University Press. Cartea, ´A., & Wang, Y. (2020). Market making with alpha signals.International Journal of Theoretical and Applied Finance,23(3), 2050016, https://doi.org/ 10.1142/S0219024920500168

work page doi:10.1142/s0219024920500168 2020

[11] [11]

Chakraborti, A., Muni Toke, I., Patriarca, M., Abergel, F. (2011). Econophysics review: I. empirical facts.Quantitative Finance,11(7), 991–1012, https:// doi.org/10.1080/14697688.2010.539248

work page doi:10.1080/14697688.2010.539248 2011

[12] [12]

Cheridito, P., Dupret, J.-L., Wu, Z. (2025). Abides-marl: A multi-agent reinforcement learning environment for endogenous price formation and execution in a limit order book.arXiv preprint arXiv:2511.02016, , https://doi.org/10.48550/ arXiv.2511.02016

arXiv 2025

[13] [13]

Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y. (2014). Learning phrase representations using rnn encoder– decoder for statistical machine translation.Proceedings of the 2014 conference on empirical methods in natural language processing (emnlp)(pp. 1724–1734)

2014

[14] [14]

Cohen, K.J., Maier, S.F., Schwartz, R.A., Whitcomb, D.K. (1981). Transaction costs, order placement strategy, and existence of the bid-ask spread.Journal of Political Economy,89(2), 287–305, https://doi.org/10.1086/260966

work page doi:10.1086/260966 1981

[15] [15]

Copeland, T.E., & Galai, D. (1983). Information effects on the bid-ask spread.Jour- nal of Finance,38(5), 1457–1469, https://doi.org/10.1111/j.1540-6261.1983 .tb03834.x

work page doi:10.1111/j.1540-6261.1983 1983

[16] [16]

(2003).An introduction to the theory of point processes (2nd ed.)

Daley, D.J., & Vere-Jones, D. (2003).An introduction to the theory of point processes (2nd ed.). Springer

2003

[17] [17]

Demsetz, H. (1968). The cost of transacting.Quarterly Journal of Economics,82(1), 33–53, https://doi.org/10.2307/1882244 44

work page doi:10.2307/1882244 1968

[18] [18]

Drissi, F. (2022). Solvability of differential Riccati equations and applications to algorithmic trading with signals.Applied Mathematical Finance,29(6), 457–493, https://doi.org/10.1080/1350486X.2023.2241130

work page doi:10.1080/1350486x.2023.2241130 2022

[19] [19]

Filimonov, V., & Sornette, D. (2015). Apparent criticality and calibration issues in the Hawkes self-excited point process model: application to high-frequency financial data.Quantitative Finance,15(8), 1293–1314, https://doi.org/10 .1080/14697688.2015.1032544

arXiv 2015

[20] [20]

Garman, M.B. (1976). Market microstructure.Journal of Financial Economics,3(3), 257–275, https://doi.org/10.1016/0304-405X(76)90006-4 Gaˇ sperov, B., Beguˇ si´ c, S., PosedelˇSimovi´ c, P., Kostanjˇ car, Z. (2021). Reinforce- ment learning approaches to optimal market making.Mathematics,9(21), 2689, https://doi.org/10.3390/math9212689

work page doi:10.1016/0304-405x(76)90006-4 1976

[21] [21]

(2018).A crisis of beliefs: Investor psychology and financial fragility

Gennaioli, N., & Shleifer, A. (2018).A crisis of beliefs: Investor psychology and financial fragility. Princeton, NJ: Princeton University Press

2018

[22] [22]

Glosten, L.R., & Milgrom, P.R. (1985). Bid, ask and transaction prices in a specialist market with heterogeneously informed traders.Journal of Financial Economics, 14(1), 71–100, https://doi.org/10.1016/0304-405X(85)90044-3

work page doi:10.1016/0304-405x(85)90044-3 1985

[23] [23]

Gould, M.D., Porter, M.A., Williams, S., McDonald, M., Fenn, D.J., Howison, S.D. (2013). Limit order books.Quantitative Finance,13(11), 1709–1742, https:// doi.org/10.1080/14697688.2013.803148

work page doi:10.1080/14697688.2013.803148 2013

[24] [24]

Hawkes, A.G. (1971). Spectra of some self-exciting and mutually exciting point processes.Biometrika,58(1), 83–90, https://doi.org/10.1093/biomet/58.1.83

work page doi:10.1093/biomet/58.1.83 1971

[25] [25]

He, X.-Z., & Lin, S. (2022). Reinforcement learning equilibrium in limit order markets. Journal of Economic Dynamics and Control,144, 104497, https://doi.org/ 10.1016/j.jedc.2022.104497

work page doi:10.1016/j.jedc.2022.104497 2022

[26] [26]

Ho, T.S.Y., & Stoll, H.R. (1981). Optimal dealer pricing under transactions and return uncertainty.Journal of Financial Economics,9(1), 47–73, https://doi.org/ 10.1016/0304-405X(81)90020-9 45

work page doi:10.1016/0304-405x(81)90020-9 1981

[27] [27]

Ho, T.S.Y., & Stoll, H.R. (1983). The dynamics of dealer markets under com- petition.Journal of Finance,38(4), 1053–1074, https://doi.org/10.1111/ j.1540-6261.1983.tb02282.x

arXiv 1983

[28] [28]

(2024, November)

Jain, K., Firoozye, N., Kochems, J., Treleaven, P. (2024, November). Limit order book dynamics and order size modelling using compound Hawkes process. Finance Research Letters, 69(Part A), 106157, https://doi.org/10.1016/j.frl.2024.106157

[29]

Lehalle, C.-A., & Neuman, E. (2019). Incorporating signals into optimal trading. Finance and Stochastics, 23(2), 275–311, https://doi.org/10.1007/s00780-019-00382-7

[30]

Liang, E., Liaw, R., Nishihara, R., Moritz, P., Fox, R., Goldberg, K., . . . Stoica, I. (2018). RLlib: Abstractions for distributed reinforcement learning. Proceedings of the 35th international conference on machine learning (pp. 3053–3062). Retrieved from https://proceedings.mlr.press/v80/liang18b.html

[31]

Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in neural information processing systems 30 (NeurIPS 2017) (pp. 6379–6390)

[32]

Mildenstein, E., & Schleef, H.J. (1983). The optimal pricing policy of a monopolistic marketmaker in the equity market. Journal of Finance, 38(1), 218–231, https://doi.org/10.1111/j.1540-6261.1983.tb03637.x

[33]

Neuman, E., & Voß, M. (2022). Optimal signal-adaptive trading with temporary and transient price-impact. SIAM Journal on Financial Mathematics, 13(2), 551–575, https://doi.org/10.1137/20M1375486

[34]

Ogata, Y. (1981). On Lewis’ simulation method for point processes. IEEE Transactions on Information Theory, 27(1), 23–31, https://doi.org/10.1109/TIT.1981.1056305

[35]

Paddrik, M., Hayes, R., Todd, A., Yang, S.Y., Beling, P.A., Scherer, W.T. (2012, March). An agent based model of the E-Mini S&P 500 applied to flash crash analysis. 2012 IEEE conference on computational intelligence for financial engineering & economics (CIFEr 2012) (pp. 257–264). New York City, NY, USA

[36]

Rambaldi, M., Bacry, E., Lillo, F. (2017). The role of volume in order book dynamics: a multivariate Hawkes process analysis. Quantitative Finance, 17(7), 999–1020, https://doi.org/10.1080/14697688.2016.1260759

[37]

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, https://doi.org/10.48550/arXiv.1707.06347

[38]

Shleifer, A. (2000). Inefficient markets: An introduction to behavioral finance. Oxford: Oxford University Press

[39]

Stoll, H.R. (1978). The supply of dealer services in securities markets. Journal of Finance, 33(4), 1133–1151, https://doi.org/10.1111/j.1540-6261.1978.tb02053.x

[40]

Sutton, R.S., & Barto, A.G. (2018). Reinforcement learning: An introduction (2nd ed.). The MIT Press.

Szepesvári, C. (2010). Algorithms for reinforcement learning. Morgan & Claypool Publishers

[41]

Terry, J.K., Black, B.J., Grammel, N., Jayakumar, M., Hari, A., Sullivan, R., . . . Ravi, P. (2021). PettingZoo: Gym for multi-agent reinforcement learning. Advances in neural information processing systems 34 (datasets and benchmarks track). Retrieved from https://openreview.net/forum?id=fLnsj7fpbPI

[42]

Tinic, S.M. (1972). The economics of liquidity services. Quarterly Journal of Economics, 86(1), 79–93, https://doi.org/10.2307/1880494

[43]

Yu, C., Velu, A., Vinitsky, E., Gao, J., Wang, Y., Bayen, A.M., Wu, Y. (2022). The surprising effectiveness of PPO in cooperative multi-agent games. Advances in neural information processing systems 35 (pp. 24611–24624). Retrieved from https://openreview.net/forum?id=YVXaxB6L2Pl
