Multi-Agent Deep Reinforcement Learning for Liquidation Strategy Analysis

Wenhang Bao; Xiao-Yang Liu

arXiv: 1906.11046 · v1 · pith:N3V4LSV6new · submitted 2019-06-24 · 💱 q-fin.TR · cs.LG · stat.ML

Pith reviewed 2026-05-25 16:43 UTC · model grok-4.3

classification: 💱 q-fin.TR · cs.LG · stat.ML

keywords: liquidation strategy, multi-agent reinforcement learning, Almgren-Chriss model, market impact, trading simulation, deep reinforcement learning, optimal execution

The pith

Multi-agent deep reinforcement learning extends the Almgren-Chriss model to generate liquidation strategies that handle market impact and risk.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that multi-agent deep reinforcement learning can solve the problem of liquidating large share positions more effectively than single-agent or traditional methods by modeling interactions among traders. It extends the Almgren and Chriss framework into a multi-agent trading environment so that agents learn sequential selling decisions while accounting for costs from market impact and risk aversion. Adjusting reward functions lets the model capture both cooperative and competitive dynamics between agents. Simulations then produce trading policies that respect practical constraints, demonstrating that reinforcement learning can address realistic liquidation tasks.

Core claim

The paper claims that extending the Almgren and Chriss model to a multi-agent setting supplies a suitable environment for deep reinforcement learning agents, which then learn optimal liquidation policies by balancing market impact costs against risk aversion; reward adjustments further allow analysis of cooperative versus competitive agent behaviors, and the resulting strategies satisfy practical trading constraints.

What carries the argument

The multi-agent deep reinforcement learning setup built on the extended Almgren-Chriss model, in which agents interact through shared market impact and risk terms to decide share sales over time.
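
As a rough illustration of what "interacting through shared market impact" could look like, the sketch below implements one time step of an Almgren-Chriss style price process in which the linear permanent and temporary impact terms depend on the aggregate shares sold by all agents. The paper's own multi-agent equations are not reproduced on this page, so the aggregation rule, the names `ImpactParams` and `step`, and every parameter value are assumptions made for illustration only.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ImpactParams:
    # All values are hypothetical placeholders, not taken from the paper.
    sigma: float = 0.5       # price volatility per unit time (price units)
    gamma: float = 2.5e-7    # linear permanent impact per share
    eta: float = 2.5e-6      # linear temporary impact per (share / time)
    epsilon: float = 0.0625  # fixed cost per share, e.g. half the spread
    tau: float = 1.0         # length of one trading period


def step(price, sells, params, rng):
    """Advance one period; return (new_price, proceeds per agent).

    Assumption: impact is driven by the SUM of all agents' sales, so each
    agent's execution price worsens when the others sell alongside it.
    """
    total = sum(sells)
    rate = total / params.tau
    fixed = params.epsilon if total > 0 else 0.0
    exec_price = price - fixed - params.eta * rate      # temporary impact
    proceeds = [n * exec_price for n in sells]
    noise = params.sigma * math.sqrt(params.tau) * rng.gauss(0.0, 1.0)
    new_price = price + noise - params.gamma * total    # permanent impact
    return new_price, proceeds


if __name__ == "__main__":
    rng = random.Random(0)
    calm = ImpactParams(sigma=0.0)  # switch noise off to isolate impact
    _, alone = step(100.0, [1000, 0], calm, rng)
    _, crowded = step(100.0, [1000, 1000], calm, rng)
    print(alone[0], crowded[0])  # agent 0 earns less when agent 1 also sells
```

With a single non-zero entry in `sells`, the step collapses to the familiar single-agent linear-impact update, which is the property the referee report asks the authors to verify for their own formulation.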

If this is right

Agents can learn the best selling decisions while incorporating high-level market complexities.
Adjusting rewards lets the model analyze cooperative and competitive behaviors among traders.
The extended Almgren-Chriss model supplies a foundation for future multi-agent trading studies.
Reinforcement learning methods can generate liquidation strategies that respect practical constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same multi-agent structure could be tested on problems like simultaneous liquidation of correlated assets.
Competitive reward settings might model interactions between institutional traders and market makers.
If the learned policies transfer to live markets, they could reduce average slippage for large orders.
Adding order-book dynamics to the environment would provide a direct test of whether the current abstraction remains sufficient.

Load-bearing premise

The Almgren and Chriss model can be extended to multiple interacting agents while still accurately representing the essential effects of market impact and risk aversion.

What would settle it

Running the learned multi-agent policies on historical large-block liquidation data and finding that their total implementation shortfall and risk exposure do not improve on single-agent reinforcement learning or standard Almgren-Chriss schedules.
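
The comparison described above could be scored along the following lines. This is a minimal sketch, assuming each backtest episode yields the shares sold and the execution prices achieved; the function names and the mean-variance summary are illustrative choices, not the paper's evaluation protocol.

```python
from statistics import mean, pvariance


def implementation_shortfall(initial_price, shares_sold, exec_prices):
    """Paper value of the position minus the cash actually captured."""
    paper_value = initial_price * sum(shares_sold)
    captured = sum(n * p for n, p in zip(shares_sold, exec_prices))
    return paper_value - captured


def risk_adjusted_cost(shortfalls, risk_aversion):
    """Mean-variance summary: E[shortfall] + risk_aversion * Var[shortfall]."""
    return mean(shortfalls) + risk_aversion * pvariance(shortfalls)


def policy_improves(candidate, baseline, risk_aversion):
    """True if the candidate's risk-adjusted cost is strictly lower."""
    return (risk_adjusted_cost(candidate, risk_aversion)
            < risk_adjusted_cost(baseline, risk_aversion))
```

Here `candidate` and `baseline` are lists of per-episode shortfalls, for example from the multi-agent policy and from a single-agent or Almgren-Chriss schedule replayed on the same episodes. The claim would be in trouble if `policy_improves` came back false across a reasonable range of `risk_aversion` values.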

Figures

Figures reproduced from arXiv: 1906.11046 by Wenhang Bao, Xiao-Yang Liu.

**Figure 1.** Liquidation: multiple agents sell stocks in the market, and their selling decisions would affect each other's selling cost.

**Figure 2.** Comparison of expected implementation shortfalls: there are three agents A, B1 and B2. The expected shortfall of agent A is higher than the sum of two expected shortfalls B1 and B2.

**Figure 3.** Trading trajectory: comparing to their original trading trajectories, their current trading trajectories are closer to each other when they are trained in a multi-agent environment.

**Figure 5.** Trading trajectory: comparing to independent training, introducing a competitor makes the host agent learn to adapt to new environment and sell all shares of stock in the first two days.

Original abstract

Liquidation is the process of selling a large number of shares of one stock sequentially within a given time frame, taking into consideration the costs arising from market impact and a trader's risk aversion. The main challenge in optimizing liquidation is to find an appropriate modeling system that can incorporate the complexities of the stock market and generate practical trading strategies. In this paper, we propose to use multi-agent deep reinforcement learning model, which better captures high-level complexities comparing to various machine learning methods, such that agents can learn how to make the best selling decisions. First, we theoretically analyze the Almgren and Chriss model and extend its fundamental mechanism so it can be used as the multi-agent trading environment. Our work builds the foundation for future multi-agent environment trading analysis. Secondly, we analyze the cooperative and competitive behaviours between agents by adjusting the reward functions for each agent, which overcomes the limitation of single-agent reinforcement learning algorithms. Finally, we simulate trading and develop an optimal trading strategy with practical constraints by using a reinforcement learning method, which shows the capabilities of reinforcement learning methods in solving realistic liquidation problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper extends Almgren-Chriss to a multi-agent RL setting for liquidation and tunes rewards for cooperation or competition, but supplies no derivations or checks that the extension preserves the original single-agent dynamics.

This paper takes the Almgren-Chriss liquidation model and extends it to multiple agents so they can trade in the same environment using deep reinforcement learning. The main move is to adjust each agent's reward function to produce either cooperative or competitive behavior, which is presented as a way around the limits of single-agent RL. They also run simulations that include practical constraints and claim this shows RL can handle realistic liquidation problems. That combination of an extended environment plus reward tuning is the concrete new element here, and it sits on top of a well-known single-agent baseline.

The intent to study interaction between agents is reasonable for execution problems where multiple traders can affect the same stock. The simulation step tries to keep things grounded in constraints that matter in practice.

The central weakness is that none of the supporting material is visible. There are no equations for the multi-agent price-impact or risk terms, and no test showing that the multi-agent version reduces exactly to the classic Almgren-Chriss solution when all but one agent are removed. Without that reduction property, it is unclear whether the extension adds spurious interaction effects that change the original market-impact and risk dynamics. The abstract also gives no performance numbers, no comparison to single-agent baselines, and no details on how the learned strategies were validated. This leaves the claim that the method solves realistic problems without evidence that can be checked.

The work is aimed at people already working on reinforcement learning for quantitative execution. A reader who knows Almgren-Chriss and wants to explore multi-agent variants might pick up the reward-adjustment idea as a starting point, but the paper would need the missing derivations and verification results before it could be used for anything further.

I would not bring this to a reading group unless the group is narrowly focused on RL applications in trading. I would not cite it because there is no reproducible result or verified extension to build on. It does not deserve peer review until the model details and the single-agent reduction check are supplied.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes extending the Almgren-Chriss liquidation model to a multi-agent setting as the basis for a multi-agent deep reinforcement learning environment. It claims this extension captures essential market-impact and risk dynamics, then uses reward-function adjustments to study cooperative versus competitive agent behavior, and finally simulates trading to produce optimal liquidation strategies under practical constraints.

Significance. A verified multi-agent extension of Almgren-Chriss that recovers the single-agent limit and yields reproducible RL policies would supply a useful test-bed for liquidation analysis. The abstract, however, supplies neither the required reduction check nor any numerical or analytic evidence that the claimed extension preserves the original dynamics, so the significance cannot be assessed from the provided text.

major comments (1)

[Abstract, first contribution paragraph] The claim that the Almgren-Chriss model is extended 'so it can be used as the multi-agent trading environment' while still capturing 'essential market impact and risk dynamics' is load-bearing for the entire contribution, yet the manuscript provides no derivation or numerical verification that the multi-agent price-impact and optimal-trajectory equations reduce exactly to the classic single-agent Almgren-Chriss solution when all but one agent are removed. Without this reduction property the environment may contain spurious interaction terms that invalidate the fidelity claim.
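
One way to make the requested reduction check concrete is to compare the schedule an agent follows when it is alone in the environment against the closed-form single-agent solution. The sketch below computes the standard Almgren-Chriss (2001) optimal holdings for linear impact; the function names, argument names, and the idea of comparing schedules by maximum absolute deviation are assumptions for illustration and do not come from the manuscript.

```python
import math


def ac_holdings(total_shares, n_steps, tau, risk_aversion, sigma, eta, gamma):
    """Almgren-Chriss (2001) optimal holdings x_0..x_N for linear impact."""
    eta_tilde = eta - 0.5 * gamma * tau
    if eta_tilde <= 0:
        raise ValueError("requires eta > gamma * tau / 2")
    horizon = n_steps * tau
    kappa_tilde_sq = risk_aversion * sigma ** 2 / eta_tilde
    # kappa solves (2 / tau^2) * (cosh(kappa * tau) - 1) = kappa_tilde^2
    kappa = math.acosh(1.0 + 0.5 * kappa_tilde_sq * tau ** 2) / tau
    if kappa * horizon < 1e-9:
        # Risk-neutral limit: the schedule degenerates to a straight line.
        return [total_shares * (1 - j / n_steps) for j in range(n_steps + 1)]
    denom = math.sinh(kappa * horizon)
    return [total_shares * math.sinh(kappa * (horizon - j * tau)) / denom
            for j in range(n_steps + 1)]


def max_deviation(schedule_a, schedule_b):
    """Largest absolute gap between two holdings schedules."""
    return max(abs(a - b) for a, b in zip(schedule_a, schedule_b))
```

A reduction test would then run the multi-agent environment with every other agent's sales fixed at zero, record the remaining agent's holdings, and require `max_deviation` against `ac_holdings` to fall below a chosen tolerance. What tolerance is acceptable for a learned policy is a judgment call that this sketch does not settle.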

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript to incorporate the requested reduction verification.

Referee: [Abstract, first contribution paragraph] The claim that the Almgren-Chriss model is extended 'so it can be used as the multi-agent trading environment' while still capturing 'essential market impact and risk dynamics' is load-bearing for the entire contribution, yet the manuscript provides no derivation or numerical verification that the multi-agent price-impact and optimal-trajectory equations reduce exactly to the classic single-agent Almgren-Chriss solution when all but one agent are removed. Without this reduction property the environment may contain spurious interaction terms that invalidate the fidelity claim.

Authors: We agree that an explicit reduction check is necessary to substantiate the fidelity claim. The manuscript's theoretical analysis extends the Almgren-Chriss mechanism to multiple agents but does not include the requested derivation or numerical verification of the single-agent limit. In the revised version we will add both an analytic derivation (showing that the multi-agent price-impact and trajectory equations recover the original Almgren-Chriss solution when all but one agent are removed) and a numerical confirmation that no spurious interaction terms remain. This addition will be placed in a new subsection or appendix.

Revision: yes

Circularity Check

0 steps flagged

No circularity: multi-agent extension stated as theoretical analysis without reduction to fitted inputs or self-citations

The abstract states that the authors 'theoretically analyze the Almgren and Chriss model and extend its fundamental mechanism' to multi-agent, then use RL with adjusted rewards to analyze behaviors and simulate strategies. No equations are provided that define the extension in terms of its own outputs, no parameters are fitted then relabeled as predictions, and no self-citations are invoked as load-bearing uniqueness theorems. The derivation chain therefore remains self-contained as an explicit modeling choice whose consistency with the single-agent limit is asserted rather than derived from prior fitted results within the paper itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger reflects the high-level claims. The central claim rests on the extendability of the Almgren-Chriss model and on the premise that reward tuning in DRL will produce realistic cooperative or competitive liquidation behavior. No numerical free parameters or new entities are explicitly introduced in the abstract.

free parameters (1)

reward function scaling parameters
The abstract states that reward functions are adjusted to produce cooperative versus competitive agent behaviors; these scaling choices are free parameters chosen to achieve the desired interaction regime.
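
To show how a single scaling parameter could move agents between interaction regimes, here is a minimal sketch in which each agent's training reward is its own reward plus a weighted sum of the other agents' rewards. The abstract does not give the paper's actual reward definitions, so the function `mix_rewards`, the `weight` parameter, and this particular mixing rule are hypothetical.

```python
def mix_rewards(own_rewards, weight):
    """Return each agent's reward plus `weight` times the others' total.

    weight > 0 rewards an agent when the others do well (cooperative),
    weight = 0 leaves agents independent, and weight < 0 rewards an agent
    when the others do badly (competitive).
    """
    total = sum(own_rewards)
    return [r + weight * (total - r) for r in own_rewards]
```

For instance, with per-agent rewards `[-1.0, -3.0]` (negative shortfalls), `weight=1.0` gives both agents the shared total of `-4.0`, while `weight=-1.0` gives `[2.0, -2.0]`, so the agent with the smaller shortfall is favoured.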

axioms (1)

Domain assumption: the fundamental mechanism of the Almgren-Chriss model remains valid when extended to a multi-agent trading environment.
The abstract explicitly states that the model is theoretically analyzed and extended for use as the multi-agent environment.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 4 internal anchors

[2] Almgren, R. and Chriss, N. Optimal execution of portfolio transactions. Journal of Risk, 3:5-40, 2001.
[3] Bansal, T., Pachocki, J., Sidor, S., Sutskever, I., and Mordatch, I. Emergent complexity via multi-agent competition. arXiv preprint arXiv:1710.03748, 2017.
[4] Brogaard, J. A., Brennan, T., Korajczyk, R., Mcdonald, R., and Vissing-jorgensen, A. High frequency trading and its impact on market quality, 2010.
[5] Buehler, H., Gonon, L., Teichmann, J., and Wood, B. Deep hedging. Quantitative Finance, pp. 1-21, 2019.
[6] Foerster, J., Nardelli, N., Farquhar, G., Afouras, T., Torr, P. H., Kohli, P., and Whiteson, S. Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1146-1155. JMLR.org, 2017.
[7] Gomber, P., Arndt, B., Lutat, M., and Elko Uhle, T. High-frequency trading. SSRN Electronic Journal, 01 2011. doi:10.2139/ssrn.1858626
[8] Hendricks, D. and Wilcox, D. A reinforcement learning extension to the Almgren-Chriss framework for optimal trade execution. In IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), pp. 457-464. IEEE, 2014.
[9] Li, X., Li, Y., Liu, X.-Y., and Wang, C. Risk management via anomaly circumvent: Mnemonic deep learning for midterm stock prediction. In KDD Workshop on Anomaly Detection in Finance, 2019a.
[10] Li, X., Li, Y., Zhan, Y., and Liu, X.-Y. Optimistic bull or pessimistic bear: adaptive deep reinforcement learning for stock portfolio allocation. In ICML Workshop on Applications and Infrastructure for Multi-Agent Learning, 2019b.
[11] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. ICLR, 2016.
[12] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., and Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6379-6390, 2017.
[13] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
[14] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928-1937, 2016.
[15] Omidshafiei, S., Pazis, J., Amato, C., How, J. P., and Vian, J. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2681-2690. JMLR.org, 2017.
[16] Schaul, T., Quan, J., Antonoglou, I., and Silver, D. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
[17] Silver, D., Huang, A., Maddison, C. J., Guez, A., et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484, 2016.
[18] Sutton, R. S. and Barto, A. G. Reinforcement learning: An introduction. MIT press, 2018.
[19] Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., and Vicente, R. Multiagent cooperation and competition with deep reinforcement learning. PloS One, 12(4):e0172395, 2017.
[20] Van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with double q-learning. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[21] Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., and de Freitas, N. Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224, 2016.
[22] Xiong, Z., Liu, X.-Y., Zhong, S., Walid, A., et al. Practical deep reinforcement learning approach for stock trading. NeurIPS Workshop on Challenges and Opportunities for AI in Financial Services, 2018.
[23] Yang, H., Liu, X.-Y., and Wu, Q. A practical machine learning approach for dynamic stock recommendation. In IEEE International Conference On Trust, Security And Privacy (TrustCom), pp. 1693-1697. IEEE, 2018a.
[24] Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., and Wang, J. Mean field multi-agent reinforcement learning. In 35th International Conference on Machine Learning, ICML 2018, volume 80, pp. 5571-5580. PMLR, 2018b.
[25] Yu, P., Lee, J. S., Kulyatin, I., Shi, Z., and Dasgupta, S. Model-based deep reinforcement learning for dynamic portfolio optimization. arXiv preprint arXiv:1901.08740, 2019.
