
arxiv: 1906.11046 · v1 · pith:N3V4LSV6new · submitted 2019-06-24 · 💱 q-fin.TR · cs.LG · stat.ML

Multi-Agent Deep Reinforcement Learning for Liquidation Strategy Analysis

Pith reviewed 2026-05-25 16:43 UTC · model grok-4.3

classification 💱 q-fin.TR · cs.LG · stat.ML
keywords liquidation strategy · multi-agent reinforcement learning · Almgren-Chriss model · market impact · trading simulation · deep reinforcement learning · optimal execution

The pith

Multi-agent deep reinforcement learning extends the Almgren-Chriss model to generate liquidation strategies that handle market impact and risk.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that multi-agent deep reinforcement learning can solve the problem of liquidating large share positions more effectively than single-agent or traditional methods by modeling interactions among traders. It extends the Almgren and Chriss framework into a multi-agent trading environment so that agents learn sequential selling decisions while accounting for costs from market impact and risk aversion. Adjusting reward functions lets the model capture both cooperative and competitive dynamics between agents. Simulations then produce trading policies that respect practical constraints, demonstrating that reinforcement learning can address realistic liquidation tasks.

Core claim

The paper claims that extending the Almgren and Chriss model to a multi-agent setting supplies a suitable environment for deep reinforcement learning agents, which then learn optimal liquidation policies by balancing market impact costs against risk aversion; reward adjustments further allow analysis of cooperative versus competitive agent behaviors, and the resulting strategies satisfy practical trading constraints.
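For reference, the trade-off between impact cost and risk aversion has a well-known closed form in the classic single-agent Almgren-Chriss model with linear impact. The sketch below computes that single-agent schedule only; it is not the paper's multi-agent method, and the function name and parameter names are illustrative.

```python
import math


def almgren_chriss_holdings(X, T, N, sigma, eta, gamma, lam):
    """Remaining holdings x_0..x_N under the discrete-time Almgren-Chriss
    solution with linear temporary (eta) and permanent (gamma) impact.

    X: initial shares, T: horizon, N: number of trading intervals,
    sigma: price volatility per unit time, lam: risk aversion (>= 0).
    """
    tau = T / N
    eta_tilde = eta - 0.5 * gamma * tau
    if eta_tilde <= 0:
        raise ValueError("requires eta > gamma * tau / 2")
    kappa_tilde_sq = lam * sigma ** 2 / eta_tilde
    # kappa solves cosh(kappa * tau) = 1 + kappa_tilde^2 * tau^2 / 2
    kappa = math.acosh(1.0 + 0.5 * kappa_tilde_sq * tau ** 2) / tau
    if kappa * T < 1e-12:
        # Risk-neutral limit: sell at a constant rate.
        return [X * (1.0 - k / N) for k in range(N + 1)]
    return [
        X * math.sinh(kappa * (T - k * tau)) / math.sinh(kappa * T)
        for k in range(N + 1)
    ]


# Illustrative parameter values, not taken from the paper.
schedule = almgren_chriss_holdings(1e6, 60.0, 60, 0.95, 2.5e-6, 2.5e-7, 1e-6)
```

A larger `lam` front-loads the selling, which is the behavior a learned single-agent policy would be expected to approximate.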

What carries the argument

The multi-agent deep reinforcement learning setup built on the extended Almgren-Chriss model, in which agents interact through shared market impact and risk terms to decide share sales over time.
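One way to picture "interaction through shared market impact" is an environment step in which every agent's execution price depends on the total quantity sold in that interval. The sketch below is an assumption made for illustration: it aggregates linear Almgren-Chriss impact over agents, and the class name `SharedImpactMarket`, its fields, and its default coefficients are hypothetical. This page does not reproduce the paper's actual equations.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SharedImpactMarket:
    """Hypothetical shared-price market; NOT the paper's environment."""
    price: float
    holdings: list            # remaining shares per agent
    sigma: float = 0.0        # volatility per unit time
    eta: float = 2.5e-6       # temporary impact coefficient (assumed linear)
    gamma: float = 2.5e-7     # permanent impact coefficient (assumed linear)
    tau: float = 1.0          # length of one trading interval
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def step(self, sales):
        # Clip each order to [0, remaining holdings]: no buying, no shorting.
        sales = [min(max(s, 0.0), h) for s, h in zip(sales, self.holdings)]
        total = sum(sales)
        # Temporary impact: all agents execute at the same depressed price.
        exec_price = self.price - self.eta * total / self.tau
        proceeds = [s * exec_price for s in sales]
        self.holdings = [h - s for h, s in zip(self.holdings, sales)]
        # Permanent impact plus random walk moves the price for the next step.
        noise = self.sigma * self.tau ** 0.5 * self.rng.gauss(0.0, 1.0)
        self.price = self.price + noise - self.gamma * total
        return exec_price, proceeds


market = SharedImpactMarket(price=50.0, holdings=[1000.0, 1000.0])
exec_price, proceeds = market.step([600.0, 400.0])
```

Under this assumed form, an agent that sells nothing still sees a lower price after others sell, which is the coupling the multi-agent setting is meant to capture.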

If this is right

  • Agents can learn the best selling decisions while accounting for high-level market complexities.
  • Adjusting rewards lets the model analyze cooperative and competitive behaviors among traders.
  • The extended Almgren-Chriss model supplies a foundation for future multi-agent trading studies.
  • Reinforcement learning methods can generate liquidation strategies that respect practical constraints.
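The reward adjustment mentioned in the list above can be illustrated with a single mixing weight. This is a hypothetical parameterization, since the abstract states only that reward functions are adjusted per agent; the function name `mix_rewards` and the `weight` parameter are not from the paper.

```python
def mix_rewards(own_rewards, weight):
    """Return r_i + weight * sum of the other agents' rewards, for each agent i.

    weight > 0 pushes agents toward cooperation, weight < 0 toward
    competition, and weight == 0 leaves each agent independent.
    """
    total = sum(own_rewards)
    return [r + weight * (total - r) for r in own_rewards]


# Rewards are negative costs here, e.g. minus each agent's shortfall.
independent = mix_rewards([-3.0, -1.0], 0.0)
cooperative = mix_rewards([-3.0, -1.0], 1.0)
competitive = mix_rewards([-3.0, -1.0], -1.0)
```

With `weight=1.0` both agents optimize the same team total; with `weight=-1.0` each agent is rewarded for doing better than the other.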

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same multi-agent structure could be tested on problems like simultaneous liquidation of correlated assets.
  • Competitive reward settings might model interactions between institutional traders and market makers.
  • If the learned policies transfer to live markets, they could reduce average slippage for large orders.
  • Adding order-book dynamics to the environment would provide a direct test of whether the current abstraction remains sufficient.

Load-bearing premise

The Almgren and Chriss model can be extended to multiple interacting agents while still accurately representing the essential effects of market impact and risk aversion.

What would settle it

Running the learned multi-agent policies on historical large-block liquidation data and finding that their total implementation shortfall and risk exposure do not improve on single-agent reinforcement learning or standard Almgren-Chriss schedules.
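The comparison described above needs two quantities per policy: the implementation shortfall of each run and a risk measure across runs. A minimal sketch, assuming shortfall is measured against the arrival price and risk is the variance of shortfall; the helper names are illustrative.

```python
import statistics


def implementation_shortfall(initial_price, sales, exec_prices):
    """Value of the position at the arrival price minus the cash captured."""
    shares = sum(sales)
    captured = sum(n * p for n, p in zip(sales, exec_prices))
    return shares * initial_price - captured


def risk_adjusted_cost(shortfalls, lam):
    """Mean shortfall plus lam times its population variance across runs."""
    return statistics.mean(shortfalls) + lam * statistics.pvariance(shortfalls)


# Made-up numbers for illustration only.
one_run = implementation_shortfall(50.0, [100, 200], [49.5, 49.0])
score = risk_adjusted_cost([250.0, 150.0], lam=0.01)
```

A multi-agent policy would pass the test only if its `risk_adjusted_cost` on held-out liquidation episodes is lower than that of the single-agent and Almgren-Chriss baselines.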

Figures

Figures reproduced from arXiv: 1906.11046 by Wenhang Bao, Xiao-Yang Liu.

Figure 1. Liquidation: multiple agents sell stocks in the market, and their selling decisions affect each other's selling cost.
Figure 2. Comparison of expected implementation shortfalls: there are three agents A, B1 and B2. The expected shortfall of agent A is higher than the sum of two expected shortfalls B1 and B2.
Figure 3. Trading trajectory: compared with their original trading trajectories, the agents' current trading trajectories are closer to each other when they are trained in a multi-agent environment.
Figure 5. Trading trajectory: compared with independent training, introducing a competitor makes the host agent learn to adapt to the new environment and sell all shares of stock in the first two days.

Original abstract

Liquidation is the process of selling a large number of shares of one stock sequentially within a given time frame, taking into consideration the costs arising from market impact and a trader's risk aversion. The main challenge in optimizing liquidation is to find an appropriate modeling system that can incorporate the complexities of the stock market and generate practical trading strategies. In this paper, we propose to use a multi-agent deep reinforcement learning model, which better captures high-level complexities compared with various machine learning methods, such that agents can learn how to make the best selling decisions. First, we theoretically analyze the Almgren and Chriss model and extend its fundamental mechanism so it can be used as the multi-agent trading environment. Our work builds the foundation for future multi-agent environment trading analysis. Secondly, we analyze the cooperative and competitive behaviours between agents by adjusting the reward functions for each agent, which overcomes the limitation of single-agent reinforcement learning algorithms. Finally, we simulate trading and develop an optimal trading strategy with practical constraints by using a reinforcement learning method, which shows the capabilities of reinforcement learning methods in solving realistic liquidation problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes extending the Almgren-Chriss liquidation model to a multi-agent setting as the basis for a multi-agent deep reinforcement learning environment. It claims this extension captures essential market-impact and risk dynamics, then uses reward-function adjustments to study cooperative versus competitive agent behavior, and finally simulates trading to produce optimal liquidation strategies under practical constraints.

Significance. A verified multi-agent extension of Almgren-Chriss that recovers the single-agent limit and yields reproducible RL policies would supply a useful test-bed for liquidation analysis. The abstract, however, supplies neither the required reduction check nor any numerical or analytic evidence that the claimed extension preserves the original dynamics, so the significance cannot be assessed from the provided text.

major comments (1)
  1. [Abstract, first contribution paragraph] The claim that the Almgren-Chriss model is extended 'so it can be used as the multi-agent trading environment' while still capturing 'essential market impact and risk dynamics' is load-bearing for the entire contribution, yet the manuscript provides no derivation or numerical verification that the multi-agent price-impact and optimal-trajectory equations reduce exactly to the classic single-agent Almgren-Chriss solution when all but one agent are removed. Without this reduction property the environment may contain spurious interaction terms that invalidate the fidelity claim.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript to incorporate the requested reduction verification.

Point-by-point responses
  1. Referee: [Abstract, first contribution paragraph] The claim that the Almgren-Chriss model is extended 'so it can be used as the multi-agent trading environment' while still capturing 'essential market impact and risk dynamics' is load-bearing for the entire contribution, yet the manuscript provides no derivation or numerical verification that the multi-agent price-impact and optimal-trajectory equations reduce exactly to the classic single-agent Almgren-Chriss solution when all but one agent are removed. Without this reduction property the environment may contain spurious interaction terms that invalidate the fidelity claim.

    Authors: We agree that an explicit reduction check is necessary to substantiate the fidelity claim. The manuscript's theoretical analysis extends the Almgren-Chriss mechanism to multiple agents but does not include the requested derivation or numerical verification of the single-agent limit. In the revised version we will add both an analytic derivation (showing that the multi-agent price-impact and trajectory equations recover the original Almgren-Chriss solution when all but one agent are removed) and a numerical confirmation that no spurious interaction terms remain. This addition will be placed in a new subsection or appendix.
    Revision: yes
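To make the requested reduction concrete, here is what the check looks like under one assumed form of the extension, in which impact depends only on the aggregate trading rate. The aggregated equations are an illustrative assumption, not the paper's derivation; the single-agent equations are the standard Almgren-Chriss price dynamics, with n_{i,k} the shares agent i sells in interval k, τ the interval length, g the permanent impact function, and h the temporary impact function.

```latex
% Assumed multi-agent form: impact driven by the aggregate rate over m agents
S_k = S_{k-1} + \sigma \tau^{1/2} \xi_k - \tau\, g\!\left(\frac{1}{\tau}\sum_{i=1}^{m} n_{i,k}\right),
\qquad
\tilde{S}_k = S_{k-1} - h\!\left(\frac{1}{\tau}\sum_{i=1}^{m} n_{i,k}\right)

% Remove all agents but one: n_{j,k} = 0 for j \neq 1, and write n_k = n_{1,k}
S_k = S_{k-1} + \sigma \tau^{1/2} \xi_k - \tau\, g\!\left(\frac{n_k}{\tau}\right),
\qquad
\tilde{S}_k = S_{k-1} - h\!\left(\frac{n_k}{\tau}\right)
```

The second pair is exactly the single-agent Almgren-Chriss dynamics, so this assumed form passes the check by construction. An extension containing any term that does not vanish when the other agents' sales are zero would fail it.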

Circularity Check

0 steps flagged

No circularity: multi-agent extension stated as theoretical analysis without reduction to fitted inputs or self-citations

Full rationale

The abstract states that the authors 'theoretically analyze the Almgren and Chriss model and extend its fundamental mechanism' to a multi-agent setting, then use RL with adjusted rewards to analyze behaviors and simulate strategies. No equations are provided that define the extension in terms of its own outputs, no parameters are fitted and then relabeled as predictions, and no self-citations are invoked as load-bearing uniqueness theorems. The derivation chain therefore remains self-contained as an explicit modeling choice whose consistency with the single-agent limit is asserted rather than derived from prior fitted results within the paper itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger reflects the high-level claims. The central claim rests on the extendability of the Almgren-Chriss model and on the premise that reward tuning in DRL will produce realistic cooperative or competitive liquidation behavior. No numerical free parameters or new entities are explicitly introduced in the abstract.

free parameters (1)
  • reward function scaling parameters
    The abstract states that reward functions are adjusted to produce cooperative versus competitive agent behaviors; these scaling choices are free parameters chosen to achieve the desired interaction regime.
axioms (1)
  • domain assumption: The fundamental mechanism of the Almgren-Chriss model remains valid when extended to a multi-agent trading environment.
    The abstract explicitly states that the model is theoretically analyzed and extended for use as the multi-agent environment.

pith-pipeline@v0.9.0 · 5718 in / 1274 out tokens · 36091 ms · 2026-05-25T16:43:07.036392+00:00 · methodology

