Augmenting Game AI with Deep Reinforcement Learning

Alessandro Sestini; Amir Baghi; Florian Fuchs; Jean-Philippe Barrette-LaPierre; Joakim Bergdahl; Linus Gisslén

arxiv: 2606.20210 · v1 · pith:SUGRIS56new · submitted 2026-06-18 · 💻 cs.AI


Pith reviewed 2026-06-26 17:25 UTC · model grok-4.3

classification 💻 cs.AI

keywords: reinforcement learning, game AI, video games, machine learning deployment, believable characters, deep learning


The pith

A framework with game-specific requirements allows reinforcement learning to train believable AI for video games.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that hand-coded systems struggle to produce complex, human-like behavior in game characters, limiting immersion, while reinforcement learning promises more authentic agents by learning from interaction or player data. Current research limitations block wide use across game genres, so the authors put forward a training framework built around requirements matched to game AI and development needs. They illustrate the approach with concrete game examples, outline the steps for putting machine learning agents into live player-facing roles, and flag bottlenecks that must be solved next. A sympathetic reader would care because successful adoption could replace rigid scripts with adaptive characters that maintain the illusion of realism across many titles.

Core claim

The authors propose a framework for training reinforcement learning models with a set of requirements in mind that are suited towards game AI and game development, present examples of games with reinforcement learning-augmented game AI, describe the practicalities of deploying player-facing machine learning agents in modern games, and identify bottlenecks and hard problems that offer promising research directions.

What carries the argument

The proposed framework of training requirements tailored for game AI and game development.

If this is right

Reinforcement learning agents can be trained and deployed in player-facing roles inside modern commercial games.
Identified bottlenecks become concrete targets for future research to speed industry adoption.
Game development teams gain a structured way to move beyond hand-coded AI toward learned behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could shorten iteration cycles in game studios by letting designers specify high-level requirements rather than detailed rules.
Similar requirement-driven training might apply to other real-time interactive systems such as robotics or training simulators.
If the bottlenecks are resolved, game AI quality could scale with compute rather than with manual authoring effort.

Load-bearing premise

Current research limitations block broad deployment of reinforcement learning across game genres, so a specialized framework is required.

What would settle it

Successful application of unmodified general-purpose reinforcement learning methods to a wide variety of game genres with production-level performance would show the tailored framework is unnecessary.

Figures

Figures reproduced from arXiv: 2606.20210 by Alessandro Sestini, Amir Baghi, Florian Fuchs, Jean-Philippe Barrette-LaPierre, Joakim Bergdahl, Linus Gisslén.

**Figure 1.** Environments used as research testbeds. For this study, we use two popular AAA games to showcase the challenges in applying RL for game AI. Top shows an in-game screenshot of EA SPORTS FC 25, a realistic physics-based football simulation game. In this environment, we try to improve the positioning system of the goalkeeper AI with RL. Bottom shows an in-game screenshot of Battlefield 6, a team-based, large-…

**Figure 2.** Training performance using our approach in EA SPORTS FC 25. Left shows a comparison between an agent trained with standard SAC and our modified variant designed to satisfy the Short Training Time requirement. The dotted line indicates the performance of the built-in hand-coded AI that the RL agent aims to augment. Through targeted modifications, the proposed approach improves sample efficiency, enabling ef…
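One way to read the comparison in the Figure 2 caption is as a steps-to-baseline measurement: how many training steps each learning curve needs before it reaches the dotted line of the hand-coded AI. The sketch below illustrates that reading only. The function name `steps_to_reach` is hypothetical, and every number in it is an invented placeholder, not a value from the paper.

```python
def steps_to_reach(steps, scores, baseline):
    """Return the first training step whose score meets or exceeds the baseline.

    Returns None if the curve never reaches the baseline.
    """
    for step, score in zip(steps, scores):
        if score >= baseline:
            return step
    return None


# Placeholder curves invented for this sketch; they are NOT results from the paper.
steps = [0, 100_000, 200_000, 300_000, 400_000]
standard_curve = [0.10, 0.20, 0.35, 0.50, 0.62]   # stands in for standard SAC
modified_curve = [0.10, 0.40, 0.63, 0.70, 0.72]   # stands in for the modified variant
baseline_score = 0.60                             # stands in for the dotted hand-coded-AI line

print(steps_to_reach(steps, standard_curve, baseline_score))
print(steps_to_reach(steps, modified_curve, baseline_score))
```

A smaller value for the modified variant is what "improved sample efficiency" would look like under this metric. Other summaries, such as area under the curve, would be equally reasonable.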

**Figure 3.** Modes of perception explored in Battlefield 6. Left: Raycast fan of 24 rays with high enough density to allow detection of all obstacles within a 10 m radius of the agent. Right: Occupancy map of size 50 × 50. Each pixel color represents either: agent (yellow), terrain (purple), obstacle (blue) or target waypoint (green). When the waypoint is out of range, it is mapped to the border of the occupancy map as a…
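The two perception modes named in the Figure 3 caption can be sketched roughly as follows. The ray count, range, map size and the four cell classes come from the caption. Everything else is an assumption made for illustration: the helper names, the integer cell codes, the 0.5 m cell size, the even 360-degree ray spacing, and the per-axis clamping used to place an out-of-range waypoint on the border. The paper may do any of these differently.

```python
import math

import numpy as np

# The four classes come from the caption; the integer codes are arbitrary.
TERRAIN, OBSTACLE, AGENT, WAYPOINT = 0, 1, 2, 3

GRID_SIZE = 50       # 50 x 50 map, as stated in the caption
CELL_METERS = 0.5    # assumption: metres per cell is not given in the caption
NUM_RAYS = 24        # stated in the caption
RAY_RANGE_M = 10.0   # stated in the caption


def ray_directions(num_rays=NUM_RAYS):
    """Unit vectors for the ray fan, assumed evenly spaced over 360 degrees."""
    angles = np.arange(num_rays) * (2.0 * math.pi / num_rays)
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)


def world_to_cell(offset_xy, grid_size=GRID_SIZE, cell_m=CELL_METERS):
    """Map an offset from the agent (metres) to a (row, col) cell, clamped to the map.

    The clamping is what puts an out-of-range waypoint on the border.
    """
    center = grid_size // 2
    col = center + int(math.floor(offset_xy[0] / cell_m))
    row = center + int(math.floor(offset_xy[1] / cell_m))
    row = min(max(row, 0), grid_size - 1)
    col = min(max(col, 0), grid_size - 1)
    return row, col


def build_occupancy_map(obstacle_offsets, waypoint_offset):
    """Build an agent-centred grid with obstacles, the waypoint and the agent."""
    grid = np.full((GRID_SIZE, GRID_SIZE), TERRAIN, dtype=np.uint8)
    half_extent = GRID_SIZE * CELL_METERS / 2.0
    for off in obstacle_offsets:
        # Obstacles outside the window are dropped rather than clamped.
        if abs(off[0]) < half_extent and abs(off[1]) < half_extent:
            grid[world_to_cell(off)] = OBSTACLE
    grid[world_to_cell(waypoint_offset)] = WAYPOINT
    center = GRID_SIZE // 2
    grid[center, center] = AGENT  # the agent always sits at the centre cell
    return grid


# Usage: one nearby obstacle, one far obstacle, and a waypoint far to the right.
grid = build_occupancy_map([(2.0, 0.0), (40.0, 0.0)], (100.0, 0.0))
ray_endpoints = ray_directions() * RAY_RANGE_M  # ray tips relative to the agent
```

In a real engine the rays would be cast against collision geometry. Here only the ray directions and end points are computed, since the caption does not describe the query itself.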

**Figure 5.** Comparison of locomotion systems in Battlefield 6. Top: RL-augmented locomotion system. Bottom: NavMesh-based game AI. The RL-augmented agent exhibits smoother, more natural trajectories that resemble human player behavior. In contrast, the hand-coded system, mainly due to the discretization of the NavMesh representation, produces more rigid and less realistic movement patterns. Given these conditions, w…

Original abstract

Immersion in video games depends not only on graphics, audio, and game mechanics, but also on the quality of in-game characters. Producing believable characters, or game AI, remains a significant challenge as behavioral complexity is hard to capture with hand-coded systems. Game AI is a source of immersion and engagement; however, the limitations stemming from the challenges of creating game AI often lead to frustration and the breaking of the illusion of realism within the game. The introduction of machine learning models opens the door to creating more believable, authentic, and relatable characters in games. The promise is that they either learn from interacting with the game, or from player data, to develop true human-like behavior. In this paper, we envision more applications of reinforcement learning for game AI in the future. For this to materialize, current research limitations are prohibitive to broad deployment across game genres. Therefore, we propose a framework for training reinforcement learning models with a set of requirements in mind that are suited towards game AI and game development. We present examples of games with reinforcement learning-augmented game AI and describe the practicalities of deploying player-facing machine learning agents in modern games. Furthermore, we identify bottlenecks and hard problems in these areas, which we believe offer promising research directions to accelerate the adoption of machine learning in game AI for the video game industry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a vision paper that restates RL challenges for game AI and calls for a requirements-based framework but supplies no details, tests, or concrete proposals.


The main takeaway is that this paper is a position statement. It argues that RL could make game characters more believable than hand-coded systems, but current research limits prevent wide use across genres, so a tailored framework is needed. That is the entire contribution.

It does a solid job of connecting standard RL motivations to game-specific concerns like maintaining immersion, handling player-facing agents in shipped titles, and noting real deployment frictions. The discussion of bottlenecks as future research directions is straightforward and matches what many in the field already discuss.

The soft spots are central. No requirements list is given, no framework is defined or compared to existing RL setups, and no evidence supports the claim that limitations are prohibitive enough to need this approach. The paper mentions examples of RL-augmented AI but does not analyze them or present any results. Everything rests on assertion rather than data or derivation.

Citation patterns are ordinary for the area and do not reveal gaps or new connections. There are no equations, parameters, or reproducible elements to evaluate.

This is for readers already working in game AI who want a high-level summary of why RL adoption has been slow. It offers no new tools, findings, or falsifiable claims that a researcher could build on directly.

I would not send it for serious peer review. It reads more like a workshop position piece or industry note than a paper with verifiable content.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that limitations in current reinforcement learning research prevent broad deployment of RL for game AI across genres, and therefore proposes a high-level framework for training RL models with game-AI-specific requirements in mind. It discusses examples of RL-augmented game AI, practical issues in deploying player-facing agents in modern games, and identifies bottlenecks as promising research directions.

Significance. A well-specified and validated framework could help translate RL advances into industry practice by focusing on game-specific constraints such as believability, scalability, and integration with existing engines. The identification of hard problems offers a useful research agenda. However, because the manuscript presents only a conceptual proposal without formal definitions, concrete requirements, or any empirical results, its significance remains prospective rather than demonstrated.

major comments (2)

[Abstract] The central motivation, that 'current research limitations are prohibitive to broad deployment across game genres', is asserted without citing specific studies, failure modes, or quantitative evidence. This claim is load-bearing for justifying the need for a new framework.
[Framework proposal (main body)] The manuscript states that it proposes 'a framework for training reinforcement learning models with a set of requirements in mind' but provides neither an explicit enumeration of those requirements nor a formal description of the framework components, training procedure, or evaluation criteria. This renders the core contribution unevaluable.

minor comments (1)

[Abstract] The abstract refers to 'examples of games with reinforcement learning-augmented game AI' and 'practicalities of deploying player-facing machine learning agents' without indicating which sections contain the details or how many examples are provided.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.


Referee: [Abstract] The central motivation, that 'current research limitations are prohibitive to broad deployment across game genres', is asserted without citing specific studies, failure modes, or quantitative evidence. This claim is load-bearing for justifying the need for a new framework.

Authors: We agree that the motivation would be strengthened by explicit citations to studies documenting RL limitations in game AI. In the revision we will add references to relevant work on sample inefficiency, poor cross-genre generalization, and integration difficulties with existing game engines, thereby grounding the claim that these issues currently limit broad deployment.

Revision: yes

Referee: [Framework proposal (main body)] The manuscript states that it proposes 'a framework for training reinforcement learning models with a set of requirements in mind' but provides neither an explicit enumeration of those requirements nor a formal description of the framework components, training procedure, or evaluation criteria. This renders the core contribution unevaluable.

Authors: The paper is a conceptual vision piece whose primary goal is to articulate why game-specific requirements matter and to outline high-level research directions. We therefore did not include formal definitions or a training procedure. We accept that an explicit (even if high-level) enumeration of the key requirements and framework components would make the proposal more actionable. In revision we will add a concise section that enumerates the main requirements (believability, scalability, engine integration, etc.) and sketches the corresponding framework elements without introducing empirical results or formal algorithms.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; position paper without derivations


The manuscript is a high-level vision and position statement proposing a framework for RL-augmented game AI. It contains no equations, fitted parameters, formal derivations, or testable predictions that could reduce to inputs by construction. The argument rests on a stated premise about current research limitations and the need for tailored requirements, but this premise is presented as motivation rather than derived from any self-referential step or self-citation chain within the paper. No load-bearing technical claims exist that match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The proposal rests on the domain assumption that RL can produce human-like behavior and that current limitations block deployment. It introduces no free parameters, and its one invented entity, the framework itself, comes with no independent evidence.

axioms (1)

Domain assumption: Reinforcement learning models can develop true human-like behavior from interacting with the game or from player data.
Stated directly in the abstract as the core promise of the approach.

invented entities (1)

Framework for training RL models with game-AI-specific requirements (no independent evidence)
purpose: To overcome current research limitations and enable broad deployment
Introduced as the central proposal without external validation or falsifiable predictions.



[1] [1]

Grandmaster level in starcraft ii using multi-agent reinforcement learning,

O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik et al., “Grandmaster level in starcraft ii using multi-agent reinforcement learning,”Nature, 2019

2019

[2] [2]

Outracing champion gran turismo drivers with deep reinforcement learning,

P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subrama- nian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchset al., “Outracing champion gran turismo drivers with deep reinforcement learning,”Nature, 2022

2022

[3] [3]

Dota 2 with Large Scale Deep Reinforcement Learning

C. Berner, G. Brockman, B. Chanet al., “Dota 2 with large scale deep reinforcement learning,”arXiv preprint arXiv:1912.06680, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1912

[4] [4]

Minimax exploiter: A data efficient approach for competitive self-play,

D. Bairamian, P. Marcotte, J. Romoff, G. Robert, and D. Nowrouzezahrai, “Minimax exploiter: A data efficient approach for competitive self-play,” inProceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

2024

[5] [5]

Honor of kings arena: an environment for gen- eralization in competitive reinforcement learning,

H. Wei, J. Chen, X. Ji, H. Qin, M. Deng, S. Li, L. Wang, W. Zhang, Y . Yu, L. Lincet al., “Honor of kings arena: an environment for gen- eralization in competitive reinforcement learning,”Advances in Neural Information Processing Systems, 2022

2022

[6] [6]

Towards informed design and validation assistance in computer games using imitation learning,

A. Sestini, J. Bergdahl, K. Tollmar, A. D. Bagdanov, and L. Gissl ´en, “Towards informed design and validation assistance in computer games using imitation learning,” inIEEE Conference on Games (CoG), 2023

2023

[7] [7]

Automated gameplay testing and validation with curiosity-conditioned proximal trajectories,

A. Sestini, L. Gissl ´en, J. Bergdahl, K. Tollmar, and A. D. Bagdanov, “Automated gameplay testing and validation with curiosity-conditioned proximal trajectories,”IEEE Transactions on Games, 2022

2022

[8] [8]

Technical challenges of deploying reinforcement learning agents for game testing in aaa games,

J. Gillberg, J. Bergdahl, A. Sestini, A. Eakins, and L. Gissl ´en, “Technical challenges of deploying reinforcement learning agents for game testing in aaa games,” inIEEE Conference on Games (CoG), 2023

2023

[9] [9]

modl.ai,

modl, “modl.ai,” 2026, https://modl.ai/ [Accessed: 2026]

2026

[10] [10]

nunu, “nunu,” 2026, https://nunu.ai/ [Accessed: 2026]

2026

[11] [11]

Human-like goalkeeping in a realistic football simulation: a sample-efficient reinforcement learning approach,

A. Sestini, J. Bergdahl, J.-P. Barrette-LaPierre, F. Fuchs, B. Chen, M. Jones, and L. Gissl ´en, “Human-like goalkeeping in a realistic football simulation: a sample-efficient reinforcement learning approach,” inReinforcement Learning Conference, 2026

2026

[12] [12]

Deepcrawl: Deep re- inforcement learning for turn-based strategy games,

A. Sestini, A. Kuhnle, and A. D. Bagdanov, “Deepcrawl: Deep re- inforcement learning for turn-based strategy games,”arXiv preprint arXiv:2012.01914, 2020

work page arXiv 2012

[13] [13]

“it’s unwieldy and it takes a lot of time

M. Jacob, S. Devlin, and K. Hofmann, ““it’s unwieldy and it takes a lot of time”—challenges and opportunities for creating agents in commercial games,” inAAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2020

2020

[14] [14]

Ai methods for games,

G. N. Yannakakis and J. Togelius, “Ai methods for games,” inArtificial Intelligence and Games. Springer, 2025

2025

[15] [15]

Improving the performance of backward chained behavior trees that use reinforcement learning,

M. Kartasev, J. Saler, and P. ¨Ogren, “Improving the performance of backward chained behavior trees that use reinforcement learning,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

2023

[16] [16]

Applying goal-oriented action planning to games,

O. Jeff, “Applying goal-oriented action planning to games,”AI game programming wisdom, 2003

2003

[17] [17]

Pufferlib 2.0: Reinforcement learning at 1m steps/s,

J. Suarez, “Pufferlib 2.0: Reinforcement learning at 1m steps/s,” in Reinforcement Learning Conference, 2025

2025

[18] [18]

Soft Actor-Critic Algorithms and Applications

T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V . Ku- mar, H. Zhu, A. Gupta, P. Abbeelet al., “Soft actor-critic algorithms and applications,”arXiv preprint arXiv:1812.05905, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[19] [19]

Replay across experiments: A natural extension of off-policy rl,

D. Tirumala, T. Lampe, J. E. Chen, T. Haarnoja, S. Huang, G. Lever, B. Moran, T. Hertweck, L. Hasenclever, M. Riedmilleret al., “Replay across experiments: A natural extension of off-policy rl,”arXiv preprint arXiv:2311.15951, 2023

work page arXiv 2023

[20] [20]

Human-like bots for tactical shooters using compute-efficient sensors,

N. Justesen, M. Kaselimi, S. Snodgrass, M. V ozaru, M. Schlegelet al., “Human-like bots for tactical shooters using compute-efficient sensors,” IEEE Transactions on Games, 2025

2025

[21] [21]

Deep reinforce- ment learning for navigation in aaa video games,

E. Alonso, M. Peter, D. Goumard, and J. Romoff, “Deep reinforce- ment learning for navigation in aaa video games,”arXiv preprint arXiv:2011.04764, 2020

work page arXiv 2011

[22] [22]

Counter-strike deathmatch with large-scale behavioural cloning,

T. Pearce and J. Zhu, “Counter-strike deathmatch with large-scale behavioural cloning,” inIEEE Conference on Games (CoG), 2022

2022

[23] [23]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwrightet al., “Training language models to follow instructions with human feedback,”Advances in neural information processing systems, 2022

2022

[24] [24]

Flora: sample-efficient preference-based rl via low-rank style adaptation of reward functions,

D. Marta, S. Holk, M. Vasco, J. Lundell, T. Homberger, F. Buschet al., “Flora: sample-efficient preference-based rl via low-rank style adaptation of reward functions,” inIEEE International Conference on Robotics and Automation (ICRA), 2025

2025

[25] Y. Zhao, I. Borovikov, F. de Mesentier Silva, A. Beirami, J. Rupert, C. Somers, J. Harder, J. Kolen, J. Pinto, R. Pourabolghasem et al., “Winning is not everything: Enhancing game development with intelligent agents,” IEEE Transactions on Games, 2020

[26] M. Schwarzer, J. S. O. Ceron, A. Courville et al., “Bigger, better, faster: Human-level Atari with human-level efficiency,” in International Conference on Machine Learning, 2023

[27] C. Romeo, G. Macaluso, A. Sestini, and A. D. Bagdanov, “SPEQ: Offline stabilization phases for efficient Q-learning in high update-to-data ratio reinforcement learning,” in Reinforcement Learning Conference, 2025

[28] O. Mastikhina, D. Sreenivas, and P. S. Castro, “Optimistic critics can empower small actors,” arXiv preprint arXiv:2506.01016, 2025

[29] M. Nauman, M. Ostaszewski, K. Jankowski et al., “Bigger, regularized, optimistic: scaling for compute and sample efficient continuous control,” Advances in Neural Information Processing Systems, 2024

[30] M. Wołczyk, B. Cupiał, M. Ostaszewski, M. Bortkiewicz et al., “Fine-tuning reinforcement learning models is secretly a forgetting mitigation problem,” arXiv preprint arXiv:2402.02868, 2024

[31] C. Zhang, O. Vinyals, R. Munos, and S. Bengio, “A study on overfitting in deep reinforcement learning,” arXiv preprint arXiv:1804.06893, 2018

[32] M. Faldor, J. Zhang, A. Cully, and J. Clune, “OMNI-EPIC: Open-endedness via models of human notions of interestingness with environments programmed in code,” arXiv preprint arXiv:2405.15568, 2024

[33] S. Dohare, J. F. Hernandez-Garcia, Q. Lan et al., “Loss of plasticity in deep continual learning,” Nature, 2024

[34] Z. Ying, N. Edwards, and M. Kutuzov, “Efficient visibility approximation for game AI using neural omnidirectional distance fields,” ACM on Computer Graphics and Interactive Techniques, 2024

[35] Y. Yue, I. Salia, S. Hunt, C. Green, W. Shi, and J. J. Hunt, “Scaling behavior cloning improves causal reasoning: An open model for real-time video game playing,” arXiv preprint arXiv:2601.04575, 2026

[36] L. Magne, A. Awadalla, G. Wang, Y. Xu et al., “Nitrogen: An open foundation model for generalist gaming agents,” arXiv preprint arXiv:2601.02427, 2026

[37] W. Ahlberg, A. Sestini, K. Tollmar, and L. Gisslén, “Generating personas for games with multimodal adversarial imitation learning,” in IEEE Conference on Games (CoG), 2023

[38] A. Sestini, A. Kuhnle, and A. D. Bagdanov, “Policy fusion for adaptive and customizable reinforcement learning agents,” in IEEE Conference on Games (CoG), 2021

[39] J. Fu, K. Luo, and S. Levine, “Learning robust rewards with adversarial inverse reinforcement learning,” arXiv preprint arXiv:1710.11248, 2017

[40] N. Justesen, P. Bontrager, J. Togelius, and S. Risi, “Deep learning for video game playing,” IEEE Transactions on Games, 2019

[41] R. Gallotta, G. Todd, M. Zammit, S. Earle, A. Liapis, J. Togelius, and G. N. Yannakakis, “Large language models and games: A survey and roadmap,” IEEE Transactions on Games, 2024