Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks

Bo Ma; Jie Cao; Min Xu; Ngoc Hung Nguyen; Nguyen Cong Luong; Nguyen Duc Duy Anh; Nguyen Duc Hai; Nguyen Quoc Khanh; Qiushi Zhao; Shaohan Feng

arxiv: 2606.05208 · v1 · pith:5LX6JDDVnew · submitted 2026-05-26 · 📡 eess.SP · cs.LG


Nguyen Cong Luong, Shaohan Feng, Nguyen Duc Hai, Zeping Sui, Bo Ma, Min Xu, Zhihao Dong, Qiushi Zhao, Nguyen Duc Duy Anh, Nguyen Quoc Khanh, Ngoc Hung Nguyen, Zitian Zhang, Jie Cao


Pith reviewed 2026-06-29 16:12 UTC · model grok-4.3

classification 📡 eess.SP cs.LG

keywords Transformer · Reinforcement Learning · Communication Networks · Self-Attention · Resource Allocation · Survey


The pith

The self-attention mechanism in Transformers allows RL to model long-range dependencies and global correlations in communication networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines how Transformer architectures improve reinforcement learning for communication network problems. Traditional RL requires many environment interactions, struggles to capture long-term relationships, and handles partial observability poorly. The paper shows that self-attention overcomes these constraints by computing global correlations across sequences, speeding training, and processing heterogeneous data types. It reviews applications in resource allocation, computation offloading, routing, trajectory control, and network security while listing open challenges and directions such as semantic communication.

Core claim

The paper claims that integrating the Transformer with RL overcomes three limitations of traditional RL: the large number of environment interactions it needs, its weak modeling of long-term relationships, and its poor handling of partial observability. Self-attention is said to do this by capturing long-range dependencies and global correlations efficiently, while also accelerating training and handling multiple data modalities in network tasks.

What carries the argument

Self-attention mechanism, which computes pairwise relationships across an entire input sequence to model long-range dependencies and global correlations.
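As a rough illustration of what "pairwise relationships across an entire input sequence" means computationally, the sketch below implements single-head scaled dot-product self-attention in plain Python. It is a minimal sketch, not code from the paper: the helper names (`softmax`, `matmul`, `random_matrix`, `self_attention`), the matrix sizes, and the random inputs are all assumptions chosen for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence.

    x is (seq_len, d_model); each projection matrix is (d_model, d_head).
    Returns the attended outputs and the (seq_len, seq_len) weight matrix.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    # scores[i][j]: how strongly position i attends to position j.
    scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical helper: Gaussian random matrix as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, d_head = 6, 8, 4  # illustrative sizes, not from the paper
x = random_matrix(seq_len, d_model, rng)
w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(len(weights), len(weights[0]))  # one weight per (query, key) pair
```

Each row of `weights` is a probability distribution over all positions, so any time step can draw on any other within a single layer. The full (seq_len × seq_len) weight matrix is also why the cost of standard self-attention grows quadratically with sequence length.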

If this is right

Resource allocation and computation offloading decisions require fewer environment samples.
Routing and trajectory control perform better under partial observability.
Network security tasks gain from handling heterogeneous data modalities.
Overall training time for RL agents in networks decreases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could extend to large-scale dynamic networks where standard RL fails to converge quickly.
Combining the approach with semantic communication may create new optimization objectives beyond bit-level metrics.
Real-time deployment in live network testbeds would test whether the reported efficiency gains hold outside simulation.

Load-bearing premise

That the self-attention mechanism of Transformers directly resolves the interaction volume, long-term modeling, and partial observability problems that limit traditional RL in communication network settings.

What would settle it

An experiment in which a Transformer-augmented RL agent requires the same number of environment interactions as a standard RL agent to reach target performance in a resource allocation or routing task would falsify the central claim.
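One way to make this test concrete is to count the environment steps each agent consumes before its smoothed episodic return first reaches the target. The helper below is a hypothetical sketch: the function name, the moving-average window, the constant episode length, and the two learning curves are placeholders invented for illustration, not results from the paper.

```python
def interactions_to_target(returns, steps_per_episode, target, window=5):
    """Environment steps used before the moving-average return first reaches `target`.

    returns: per-episode returns in training order.
    steps_per_episode: environment steps per episode (assumed constant here).
    Returns None if the target is never reached.
    """
    for end in range(window, len(returns) + 1):
        moving_avg = sum(returns[end - window:end]) / window
        if moving_avg >= target:
            return end * steps_per_episode
    return None


# Synthetic placeholder curves (NOT measurements from any surveyed work).
curve_a = [float(i) for i in range(20)]
curve_b = [2.0 * i for i in range(20)]

steps_a = interactions_to_target(curve_a, steps_per_episode=100, target=10.0)
steps_b = interactions_to_target(curve_b, steps_per_episode=100, target=10.0)
print(steps_a, steps_b)  # prints: 1300 800
```

Under this metric, equal step counts for the two agents on the same task would be the falsifying outcome described above. The window size is a design choice that trades robustness to noisy returns against delay in detecting that the target has been reached.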

Figures

Figures reproduced from arXiv: 2606.05208 by Bo Ma, Jie Cao, Min Xu, Ngoc Hung Nguyen, Nguyen Cong Luong, Nguyen Duc Duy Anh, Nguyen Duc Hai, Nguyen Quoc Khanh, Qiushi Zhao, Shaohan Feng, Zeping Sui, Zhihao Dong, Zitian Zhang.

**Figure 1.** Structure of the original Transformer [11]. Accompanying text (B. Fundamentals of Transformer and Attention Mechanism, 1) Transformer Architecture): The Transformer [11] was originally introduced as a deep learning architecture for addressing NLP tasks, but it has been applied in every domain and fundamentally transformed the AI landscape. It consists of an encoder and a decoder fueled by a self-attention mechanism, fully-conne…

**Figure 2.** (a) Average episodic return of standard offline DRL and Transformer-enabled RL compared to the dataset mean, (b) …

**Figure 3.** Transformer-based link-state aggregation framework, in …

**Figure 4.** A representative architecture where the multimodal …

**Figure 5.** A representative architecture where a Transformer …

**Figure 6.** A summary of four application patterns of Transformer …

Original abstract

Reinforcement Learning (RL) has long been a powerful solution to various problems in communication networks. However, traditional RL models still face several limitations. Not only do they rely on large numbers of interactions with the environment, but they are also limited in modeling long-term relationships and tackling partial observability. In recent years, the Transformer model has demonstrated the ability to enhance RL models, allowing them to overcome these issues. In particular, the self-attention mechanism within the Transformer enables efficient modeling of long-range dependencies and global correlations, accelerates training, and handles heterogeneous data modalities. In this paper, we present a comprehensive survey of Transformer-based RL algorithms and their applications in communication networks. Specifically, the paper provides the mathematical background of RL and Transformer architectures, along with insights into key issues such as resource allocation, computation offloading, routing, trajectory control, and network security. We conclude the paper by discussing challenges, open issues, and notable future research directions, including Transformer-enhanced DRL algorithms for semantic communication and network optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a straightforward survey compiling existing work on Transformer-enhanced RL for communication networks, with no new technical results.


This paper is a literature survey on using Transformers to address limitations in traditional RL for network problems like resource allocation and routing. It walks through RL and Transformer basics, then maps applications in offloading, trajectory control, and security before listing open issues.

What it does is organize scattered papers into one place and restate the standard advantages of self-attention for long-range dependencies. That can save time for readers who want an entry point into the intersection of these areas.

The soft spot is that the value rests entirely on coverage and accuracy of the cited works. The abstract gives no indication of novel synthesis or critical gaps identified beyond the usual list, and the claims about overcoming partial observability are the same ones already made in the Transformer literature. Without seeing the full reference list or how the authors handled recent versus foundational papers, it is impossible to judge whether the survey is balanced or misses key counter-examples.

A reader already working in RL for networks will not find new derivations or experiments here. Someone entering the topic might get a useful map. The paper deserves peer review if the editors want a survey in this niche; the central contribution is compilation rather than resolution of open questions, so referees should focus on completeness rather than originality of claims.

Referee Report

0 major / 2 minor

Summary. The paper is a survey presenting the mathematical background of reinforcement learning and Transformer architectures, reviewing Transformer-enhanced RL algorithms, and surveying their applications to communication network problems including resource allocation, computation offloading, routing, trajectory control, and network security; it concludes with challenges, open issues, and future directions such as semantic communication.

Significance. As a compilation and organization of existing literature on integrating self-attention mechanisms with RL to address sample inefficiency, long-range dependencies, and partial observability in communication networks, the survey can serve as a useful reference point for the field when coverage is representative.

minor comments (2)

[Abstract] The statement that self-attention 'accelerates training processes' is presented without a supporting citation or concrete example from the surveyed works; a brief pointer to a key reference would strengthen the claim.
The survey structure would benefit from an explicit table or taxonomy that maps specific Transformer-RL variants to the listed application areas (resource allocation, offloading, etc.) to improve navigability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the constructive summary and for recommending minor revision. The report does not enumerate specific major comments, so we have no points requiring detailed rebuttal or revision at this stage. We will incorporate any minor editorial suggestions during the revision process and confirm that the survey coverage remains representative of the literature.

Circularity Check

0 steps flagged

No significant circularity; survey of external literature


The paper is explicitly a survey compiling mathematical background and applications from prior external work on RL and Transformers. No novel derivations, predictions, or fitted parameters are advanced whose validity reduces to self-referential logic or unverified self-citations. Standard properties of self-attention are cited from the established Transformer literature rather than derived here.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper with no new mathematical models, free parameters, axioms, or invented entities; it reviews existing methods from the literature without introducing original derivations.

pith-pipeline@v0.9.1-grok · 5754 in / 1080 out tokens · 55277 ms · 2026-06-29T16:12:15.041779+00:00 · methodology


Reference graph

Works this paper leans on

217 extracted references · 25 canonical work pages · 10 internal anchors

[1] B. R. Kiran, I. Sobh, V. Talpaert, P. Mannion, A. A. Al Sallab, S. Yogamani, and P. Pérez, “Deep reinforcement learning for autonomous driving: A survey,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 6, pp. 4909–4926, 2021.
[2] L. C. Garaffa, M. Basso, A. A. Konzen, and E. P. de Freitas, “Reinforcement learning for mobile robotics exploration: A survey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 8, pp. 3796–3810, 2021.
[3] M. M. Afsar, T. Crump, and B. Far, “Reinforcement learning based recommender systems: A survey,” ACM Comput. Surv., vol. 55, no. 7, pp. 1–38, 2022.
[4] A. Alwarafy, M. Abdallah, B. S. Ciftler, A. Al-Fuqaha, and M. Hamdi, “Deep reinforcement learning for radio resource allocation and management in next generation heterogeneous wireless networks: A survey,” arXiv preprint arXiv:2106.00574, 2021.
[5] M. L. Betalo, S. Leng, H. N. Abishu, F. A. Dharejo, A. M. Seid, A. Erbad, R. A. Naqvi, L. Zhou, and M. Guizani, “Multi-agent deep reinforcement learning-based task scheduling and resource sharing for o-ran-empowered multi-uav-assisted wireless sensor networks,” IEEE Trans. Veh. Technol., vol. 73, no. 7, pp. 9247–9261, 2023.
[6] Y. Bai, H. Zhao, X. Zhang, Z. Chang, R. Jäntti, and K. Yang, “Toward autonomous multi-uav wireless network: A survey of reinforcement learning-based approaches,” IEEE Commun. Surveys Tuts., vol. 25, no. 4, pp. 3038–3067, 2023.
[7] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3133–3174, 2019.
[8] P. Agarwal, A. A. Rahman, P.-L. St-Charles, S. J. Prince, and S. E. Kahou, “Transformers in reinforcement learning: a survey,” arXiv preprint arXiv:2307.05979, 2023.
[9] C. Chen, Y.-F. Wu, J. Yoon, and S. Ahn, “Transdreamer: Reinforcement learning with transformer world models,” arXiv preprint arXiv:2202.09481, 2022.
[10] K. Esslinger, R. Platt, and C. Amato, “Deep transformer q-networks for partially observable reinforcement learning,” arXiv preprint arXiv:2206.01078, 2022.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Adv. Neural Inf. Process. Syst., vol. 30, 2017.
[12] C. Che, G. Liang, K. Zheng, L. Xiang, J. Hu, K. Yang, Q. H. Abbasi, J. Cooper, and M. A. Imran, “Autonomous link control in digital twin aided mobile network: From virtual channel generation to intelligent power allocation,” IEEE Internet Things J., vol. 12, no. 19, pp. 39745–39761, 2025.
[13] Z. Ma, Z. Liu, G. Han, J. Li, T. Li, and Q. Guo, “Dsaf-former: Drl based sub-channel assignment framework using transformer in mmwave iabn,” IEEE Internet Things J., vol. 12, no. 19, pp. 40576–40591, 2025.
[14] N. Gholipour, M. D. de Assuncao, P. Agarwal, J. Gascon-Samson, and R. Buyya, “Tpto: A transformer-ppo based task offloading solution for edge computing environments,” in IEEE 29th ICPADS, 2023, pp. 1115–1122.
[15] M. Han, X. Sun, X. Wang, W. Zhan, and X. Chen, “Transformer-based distributed task offloading and resource management in cloud-edge computing networks,” IEEE J. Sel. Areas Commun., vol. 43, no. 9, pp. 2938–2953, 2025.
[16] B. Abdelkader, N. Emira, and E. Nadjib, “From perception to action: Transformer-enhanced deep reinforcement learning for autonomous robot navigation,” in IEEE 7th PAIS, 2025, pp. 1–6.
[17] X. Xu, H. Xu, D. Wei, W. Saad, M. Bennis, and M. Chen, “Transformer based collaborative reinforcement learning for fluid antenna system (fas)-enabled 3d uav positioning,” IEEE J. Sel. Areas Commun., vol. 44, pp. 1128–1143, 2026.
[18] G. Asemian, M. Amini, and B. Kantarci, “Anti-jamming task scheduling in mec-o-ran with hierarchical drl and transformer-based control,” IEEE Internet Things J., vol. 13, no. 4, pp. 7714–7729, 2026.
[19] Y. A. Ergu and V.-L. Nguyen, “Radar: Robust drl-based resource allocation against adversarial attacks in intelligent o-ran,” IEEE Trans. Green Commun. Netw., vol. 9, no. 4, pp. 2305–2318, 2025.
[20] G. Rjoub, S. Islam, J. Bentahar, M. A. Almaiah, and R. Alrawashdeh, “Enhancing iot intelligence: A transformer-based reinforcement learning methodology,” in IEEE IWCMC, 2024, pp. 1418–1423.
[21] O. Stenhammar, G. Fodor, and C. Fischione, “A comparison of neural networks for wireless channel prediction,” IEEE Wirel. Commun., vol. 31, no. 3, pp. 235–241, 2024.
[22] H. Kim, J. Choi, and D. J. Love, “Machine learning for future wireless communications: Channel prediction perspectives,” arXiv preprint arXiv:2502.18196, 2025.
[23] G. Sun, W. Xie, D. Niyato, F. Mei, J. Kang, H. Du, and S. Mao, “Generative ai for deep reinforcement learning: Framework, analysis, and use cases,” IEEE Wirel. Commun., vol. 32, no. 3, pp. 186–195, 2025.
[24] Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas, “Dueling network architectures for deep reinforcement learning,” in ICML, 2016, pp. 1995–2003.
[25] S. Hu, L. Shen, Y. Zhang, Y. Chen, and D. Tao, “On transforming reinforcement learning with transformers: The development trajectory,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 12, pp. 8580–8599, 2024.
[26] J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel et al., “Mastering atari, go, chess and shogi by planning with a learned model,” Nature, vol. 588, no. 7839, pp. 604–609, 2020.
[27] D. Hafner, T. Lillicrap, M. Norouzi, and J. Ba, “Mastering atari with discrete world models,” arXiv preprint arXiv:2010.02193, 2020.
[28] S. Mohanty, J. Poonganam, A. Gaidon, A. Kolobov, B. Wulfe, D. Chakraborty, G. Šemetulskis, J. Schapke, J. Kubilius, J. Pašukonis et al., “Measuring sample efficiency and generalization in reinforcement learning benchmarks: Neurips 2020 procgen benchmark,” arXiv preprint arXiv:2103.15332, 2021.
[29] W. Li, H. Luo, Z. Lin, C. Zhang, Z. Lu, and D. Ye, “A survey on transformers in reinforcement learning,” arXiv preprint arXiv:2301.03044, 2023.
[30] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
[31] R. S. Sutton, A. G. Barto et al., Reinforcement learning: An introduction. MIT press Cambridge, 1998, vol. 1, no. 1.
[32] S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,” arXiv preprint arXiv:2005.01643, 2020.
[33] J. Chen, D. Li, Q. Chen, W. Zhou, and X. Liu, “Diaformer: Automatic diagnosis via symptoms sequence generation,” in AAAI, vol. 36, no. 4, 2022, pp. 4432–4440.
[34] A. R. Villaflor, Z. Huang, S. Pande, J. M. Dolan, and J. Schneider, “Addressing optimism bias in sequence modeling for reinforcement learning,” in ICML, 2022, pp. 22270–22283.
[35] A. Mandlekar, F. Ramos, B. Boots, S. Savarese, L. Fei-Fei, A. Garg, and D. Fox, “Iris: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data,” in IEEE Int. Conf. Robot. Autom., 2020, pp. 4414–4420.
[36] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[37] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
[38] A. F. Agarap, “Deep learning using rectified linear units (relu),” ArXiv, vol. abs/1803.08375, 2018.
[39] D. Hendrycks and K. Gimpel, “Gaussian error linear units (gelus),” arXiv: Learning, 2016.
[40] T. Lin, Y. Wang, X. Liu, and X. Qiu, “A survey of transformers,” AI open, vol. 3, pp. 111–132, 2022.
[41] R. Child, S. Gray, A. Radford, and I. Sutskever, “Generating long sequences with sparse transformers,” arXiv preprint arXiv:1904.10509, 2019.
[42] A. Katharopoulos, A. Vyas, N. Pappas, and F. Fleuret, “Transformers are rnns: Fast autoregressive transformers with linear attention,” in ICLR, 2020, pp. 5156–5165.
[43] K. M. Choromanski, V. Likhosherstov, D. Dohan, X. Song, A. Gane, T. Sarlos, P. Hawkins, J. Q. Davis, A. Mohiuddin, L. Kaiser, D. B. Belanger, L. J. Colwell, and A. Weller, “Rethinking attention with performers,” in ICLR, 2021, pp. 1–14.
[44] I. Schlag, K. Irie, and J. Schmidhuber, “Linear transformers are secretly fast weight programmers,” in ICML, 2021, pp. 9355–9366.
[45] P. J. Liu, M. Saleh, E. Pot, B. Goodrich, R. Sepassi, L. Kaiser, and N. Shazeer, “Generating wikipedia by summarizing long sequences,” in ICLR, 2018, pp. 1–18.
[46] A. Vyas, A. Katharopoulos, and F. Fleuret, “Fast transformers with clustered attention,” Adv. Neural Inf. Process. Syst., vol. 33, pp. 21665–21674, 2020.
[47] H. Zhang, Y. Gong, Y. Shen, W. Li, J. Lv, N. Duan, and W. Chen, “Poolingformer: Long document modeling with pooling attention,” in ICML, 2021, pp. 12437–12446.
[48] Z. Chen, M. Gong, L. Ge, and B. Du, “Compressed self-attention for deep metric learning with low-rank approximation,” in IJCAI, 2021, pp. 2058–2064.
[49] Y. Xiong, Z. Zeng, R. Chakraborty, M. Tan, G. Fung, Y. Li, and V. Singh, “Nyströmformer: A nyström-based algorithm for approximating self-attention,” in AAAI, vol. 35, no. 16, 2021, pp. 14138–14148.
[50] K. Choromanski, V. Likhosherstov, D. Dohan, X. Song, A. Gane, T. Sarlos, P. Hawkins, J. Davis, D. Belanger, L. Colwell et al., “Masked language modeling for proteins via linearly scalable long-context transformers,” arXiv preprint arXiv:2006.03555, 2020.
[51] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” J. Mach. Learn. Res., vol. 21, no. 140, pp. 1–67, 2020.
[52] G. Ke, D. He, and T. Liu, “Rethinking positional encoding in language pre-training,” in ICLR, 2021, pp. 1–14.
[53] B. Yang, Z. Tu, D. F. Wong, F. Meng, L. S. Chao, and T. Zhang, “Modeling localness for self-attention networks,” in Proc. Conf. Empir. Methods Nat. Lang. Process., 2018, pp. 4449–4458.
[54] J. Li, Z. Tu, B. Yang, M. R. Lyu, and T. Zhang, “Multi-head attention with disagreement regularization,” in EMNLP, 2018, pp. 2897–2903.
[55] O. Kovaleva, A. Romanov, A. Rogers, and A. Rumshisky, “Revealing the dark secrets of bert,” in Proc. Conf. Empir. Methods Nat. Lang. Process., 2019, pp. 4365–4374.
[56] S. Sukhbaatar, E. Grave, P. Bojanowski, and A. Joulin, “Adaptive attention span in transformers,” in ACL, 2019, pp. 331–335.
[57] Q. Guo, X. Qiu, P. Liu, X. Xue, and Z. Zhang, “Multi-scale self-attention for text classification,” in AAAI, vol. 34, no. 05, 2020, pp. 7847–7854.
[58] J. Li, B. Yang, Z.-Y. Dou, X. Wang, M. R. Lyu, and Z. Tu, “Information aggregation for multi-head attention with routing-by-agreement,” in NAACL. Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 3566–3575.
[59] S. Gu and Y. Feng, “Improving multi-head attention with capsule networks,” in Proc. CCF Int. Conf. Nat. Lang. Process. Chin. Comput. Springer, 2019, pp. 314–326.
[60] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16×16 words: Transformers for image recognition at scale,” in ICLR, 2021, pp. 1–21.
[61] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in ICCV, 2021, pp. 9992–10002.
[62] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in Proc. Eur. Conf. Comput. Vis. Springer, 2020, pp. 213–229.
[63] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 4015–4026.
[64] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph Attention Networks,” ICLR, 2018.
[65] S. Yun, M. Jeong, R. Kim, J. Kang, and H. J. Kim, “Graph transformer networks,” Adv. Neural Inf. Process. Syst., vol. 32, 2019.
[66] Z. Hu, Y. Dong, K. Wang, and Y. Sun, “Heterogeneous graph transformer,” in Proc. Web Conf., 2020, pp. 2704–2710.
[67] C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, and T.-Y. Liu, “Do transformers really perform badly for graph representation?” Adv. Neural Inf. Process. Syst., vol. 34, pp. 28877–28888, 2021.
[68] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[69] P. Xu, X. Zhu, and D. A. Clifton, “Multimodal Learning With Transformers: A Survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 10, pp. 12113–12132, October 2023.
[70] K.-H. Lee, O. Nachum, M. Yang, L. Y. Lee, D. Freeman, W. Xu, S. Guadarrama, I. S. Fischer, E. Jang, H. Michalewski, and I. Mordatch, “Multi-game decision transformers,” ArXiv, vol. abs/2205.15241, 2022.
[71]

Q-learning,

C. J. Watkins and P. Dayan, “Q-learning,”Mach. Learn., vol. 8, no. 3, pp. 279–292, 1992

1992
[72]

Playing Atari with Deep Reinforcement Learning

V . Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,”arXiv preprint arXiv:1312.5602, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[73]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[74]

Monotonic value function factorisation for deep multi- agent reinforcement learning,

T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multi- agent reinforcement learning,”J. Mach. Learn. Res., vol. 21, no. 178, pp. 1–51, 2020

2020
[75]

Transfqmix: Transformers for leveraging the graph structure of multi-agent reinforcement learning problems,

M. Gallici, M. Martin, and I. Masmitja, “Transfqmix: Transformers for leveraging the graph structure of multi-agent reinforcement learning problems,”arXiv preprint arXiv:2301.05334, 2023

work page arXiv 2023
[76]

A transformer-based thermal surrogate model for cooling control in data centers,

H. Zhou, N. Mu, and Q.-S. Jia, “A transformer-based thermal surrogate model for cooling control in data centers,”IEEE Robot. Autom. Lett., vol. 10, no. 1, pp. 644–651, 2025

2025
[77]

Trandrl: A transformer-driven deep reinforcement learning enabled prescriptive maintenance framework,

Y . Zhao, J. Yang, W. Wang, H. Yang, and D. Niyato, “Trandrl: A transformer-driven deep reinforcement learning enabled prescriptive maintenance framework,”IEEE Internet Things J., vol. 11, no. 21, pp. 35 432–35 444, 2024

2024
[78]

A deep reinforcement learning with transformer integration for directed acyclic graph scheduling in edge networks,

X. Song, J. Feng, L. Liu, Q. Pei, F. R. Yu, and N. Zhang, “A deep reinforcement learning with transformer integration for directed acyclic graph scheduling in edge networks,”IEEE Trans. Wireless Commun, vol. 25, pp. 5506–5520, 2026

2026
[79]

Robust downlink data transmission in leo satellite-terrestrial networks: A rate-splitting multiple access approach,

X. Zhang, X. Qin, Y. Wang, Y. Xu, H. Zhou, and W. Zhuang, “Robust downlink data transmission in leo satellite-terrestrial networks: A rate-splitting multiple access approach,” IEEE Internet Things J., vol. 12, no. 14, pp. 27 364–27 378, 2025

2025
[80]

Learning-based task-centric multi-user semantic communication solution for vehicle networks,

Y. Yuan, J. Zhang, X. Xu, B. Wang, S. Han, M. Sun, and P. Zhang, “Learning-based task-centric multi-user semantic communication solution for vehicle networks,” IEEE Trans. Veh. Technol., vol. 74, no. 6, pp. 9328–9342, 2025

2025
