Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors

Hieu Le; Jian Tao; Mostafa Ibrahim; Oguz Bedir; Sabit Ekin

arxiv: 2604.05165 · v2 · submitted 2026-04-06 · 💻 cs.AI · eess.SP

Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3

keywords Reconfigurable Intelligent Surfaces · Multi-Agent Reinforcement Learning · CSI-free · Hierarchical MARL · mmWave networks · Beam focusing · Localization data · Ray-tracing evaluation

The pith

A hierarchical multi-agent RL system controls reconfigurable reflectors using only user locations instead of channel estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to address the high computational cost and pilot overhead of CSI estimation plus centralized optimization in large RIS deployments for mmWave networks. It replaces channel estimates with user localization data inside a two-tier HMARL controller: a high-level layer makes discrete, temporally extended user-to-reflector assignments, while low-level MAPPO agents optimize continuous focal points under CTDE training. This split allows decentralized execution at run time while still coordinating through central training. Sympathetic readers would care because the approach removes the main practical barriers to scaling smart radio environments. Ray-tracing tests show the method delivers up to 7.79 dB RSSI gains over centralized baselines and stays effective even with sub-meter localization noise.

Core claim

By substituting pilot-based channel estimation with accessible user localization data, the CSI-free hierarchical MARL architecture decomposes the reflector control problem into high-level discrete user-to-reflector allocations and low-level continuous focal-point optimization via MAPPO under a CTDE scheme, producing up to 7.79 dB RSSI improvements over centralized baselines in deterministic ray-tracing evaluations while remaining resilient to sub-meter localization errors.
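For a sense of scale, a gain expressed in decibels maps to a linear power ratio through the standard relation ratio = 10^(dB/10). The snippet below is a small sketch of that conversion applied to the reported 7.79 dB figure; the function name is illustrative and does not come from the paper.

```python
def db_to_power_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear power ratio (10*log10 convention)."""
    return 10.0 ** (gain_db / 10.0)


# The headline figure quoted in the summary above.
ratio = db_to_power_ratio(7.79)
print(f"7.79 dB corresponds to about {ratio:.2f}x received power")
```

Under this convention the best-case improvement is roughly a sixfold increase in received power, which is the upper end of the reported range rather than a typical value.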

What carries the argument

The two-tier neural architecture with a high-level controller for temporally extended discrete allocations and low-level MAPPO controllers for continuous focal points, trained centrally but executed decentrally.
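The control flow of such a two-tier scheme can be sketched as follows. This is only an illustration of the timing structure (a slow discrete allocation held for several steps, with fast per-reflector continuous actions in between). Both policies are hand-written placeholders, and every name here (`HighLevelAllocator`, `hold`, `low_level_focal_point`, `run_episode`) is hypothetical rather than taken from the paper.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class HighLevelAllocator:
    """Placeholder high-level policy: re-assigns users every `hold` steps."""
    hold: int        # assumed duration of one temporally extended allocation
    calls: int = 0   # how many times an allocation was actually made

    def allocate(self, users: list[Point], reflectors: list[Point]) -> list[int]:
        # Stand-in rule (NOT a learned policy): each reflector serves its nearest user.
        self.calls += 1
        return [min(range(len(users)), key=lambda u: math.dist(users[u], r))
                for r in reflectors]


def low_level_focal_point(user: Point) -> Point:
    """Placeholder low-level policy: aim straight at the reported user location.

    A trained actor would instead output a continuous focal point from its
    own local observation.
    """
    return user


def run_episode(users, reflectors, steps, allocator):
    """Run the two-tier loop and return the focal points chosen at each step."""
    history = []
    assignment: list[int] = []
    for t in range(steps):
        if t % allocator.hold == 0:
            # Slow timescale: discrete user-to-reflector allocation.
            assignment = allocator.allocate(users, reflectors)
        # Fast timescale: each reflector acts only on its own assigned user.
        history.append([low_level_focal_point(users[u]) for u in assignment])
    return history
```

With `hold=3` and six steps, the allocator is consulted twice while the low-level placeholders act at every step, which is the separation of timescales the decomposition relies on.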

Load-bearing premise

That user localization data alone supplies enough spatial information to manage macro-scale wave propagation and achieve reliable beam focusing without any pilot-based channel measurements.
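One way to probe this premise is to perturb the reported user positions and see how far the implied pointing direction drifts. The sketch below assumes independent zero-mean Gaussian noise on each coordinate, which is an assumption made for illustration (the noise model used in the paper is not stated here), and the helper names are hypothetical.

```python
import math
import random


def perturb_location(pos, sigma_m, rng):
    """Add zero-mean Gaussian noise (std sigma_m, in metres) to each coordinate."""
    return tuple(c + rng.gauss(0.0, sigma_m) for c in pos)


def pointing_error_deg(reflector, true_pos, reported_pos):
    """Angle between the directions from the reflector to the true and reported positions."""
    a = [t - r for t, r in zip(true_pos, reflector)]
    b = [p - r for p, r in zip(reported_pos, reflector)]
    dot = sum(x * y for x, y in zip(a, b))
    cos_angle = dot / (math.hypot(*a) * math.hypot(*b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


rng = random.Random(0)                 # fixed seed so runs are repeatable
true_user = (3.0, 4.0, 1.5)            # illustrative coordinates in metres
noisy_user = perturb_location(true_user, sigma_m=0.5, rng=rng)
print(pointing_error_deg((0.0, 0.0, 2.0), true_user, noisy_user))
```

The same position error produces a larger angular error when the user is close to the reflector, so robustness to sub-meter noise depends on the room geometry as well as on the controller.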

What would settle it

A controlled ray-tracing or measurement campaign in which user positions are known to sub-meter accuracy yet the hierarchical system produces lower RSSI than a CSI-based centralized optimizer under identical environmental conditions.

Figures

Figures reproduced from arXiv: 2604.05165 by Hieu Le, Jian Tao, Mostafa Ibrahim, Oguz Bedir, Sabit Ekin.

**Figure 1.** Reflector Design. … $l$, the mechanical orientation of tile $(i, j)$ at position $r_{i,j}$ is deterministically governed by its normal vector:

$$\vec{n}_{i,j}(f_l) = \frac{1}{2}\left(\frac{f_l - r_{i,j}}{\lVert f_l - r_{i,j} \rVert_2} + \frac{s - r_{i,j}}{\lVert s - r_{i,j} \rVert_2}\right) \tag{1}$$

This geometric formulation allows us to derive the necessary elevation $\theta_{i,j}$ and azimuth $\phi_{i,j}$ angles without requiring instantaneous electromagnetic CSI. To manage the massive combinatorial complexity of …
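Equation (1) in the Figure 1 caption sets each tile's normal to the average of two unit vectors: one from the tile toward the focal point and one from the tile toward s. The sketch below evaluates that expression in plain Python. Two things are assumptions on my part and are marked as such in the comments: that s denotes the source (transmitter) position, which the caption excerpt does not define, and the angle convention used to turn the normal into elevation and azimuth.

```python
import math

Vec = tuple[float, float, float]


def unit(v: Vec) -> Vec:
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / norm for c in v)


def tile_normal(r: Vec, f: Vec, s: Vec) -> Vec:
    """Evaluate Eq. (1): half the sum of the unit vectors from tile r to f and to s.

    r: tile position, f: focal point, s: assumed to be the source position.
    """
    to_focus = unit(tuple(fc - rc for fc, rc in zip(f, r)))
    to_source = unit(tuple(sc - rc for sc, rc in zip(s, r)))
    return tuple(0.5 * (a + b) for a, b in zip(to_focus, to_source))


def normal_angles_deg(n: Vec) -> tuple[float, float]:
    """Elevation above the x-y plane and azimuth from the +x axis, in degrees.

    This convention is an assumption; the paper may define the angles differently.
    """
    nx, ny, nz = n
    elevation = math.degrees(math.atan2(nz, math.hypot(nx, ny)))
    azimuth = math.degrees(math.atan2(ny, nx))
    return elevation, azimuth


# Tile at the origin, focal point along +x, source along +y.
n = tile_normal((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(n, normal_angles_deg(n))
```

Because the normal bisects the two directions, a flat tile oriented this way reflects a ray arriving from s toward the focal point, which is why only positions, and no channel measurements, enter the calculation.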

**Figure 2.** Hierarchical Multi-Agent Reinforcement Learning Architecture.

**Figure 3.** Experimental setup of the conference room simulation environment.

**Figure 4.** Performance evaluation for the 4-user configuration. (a) Episode …

**Figure 5.** RSSI performance under varying degrees of user localization …

Original abstract

Reconfigurable Intelligent Surfaces (RIS) have the potential to engineer smart radio environments for next-generation millimeter-wave (mmWave) networks. However, the prohibitive computational overhead of Channel State Information (CSI) estimation and the dimensionality explosion inherent in centralized optimization severely hinder practical large-scale deployments. To overcome these bottlenecks, we introduce a "CSI-free" paradigm powered by a Hierarchical Multi-Agent Reinforcement Learning (HMARL) architecture to control mechanically reconfigurable reflective surfaces. By substituting pilot-based channel estimation with accessible user localization data, our framework leverages spatial intelligence for macro-scale wave propagation management. The control problem is decomposed into a two-tier neural architecture: a high-level controller executes temporally extended, discrete user-to-reflector allocations, while low-level controllers autonomously optimize continuous focal points utilizing Multi-Agent Proximal Policy Optimization (MAPPO) under a Centralized Training with Decentralized Execution (CTDE) scheme. Comprehensive deterministic ray-tracing evaluations demonstrate that this hierarchical framework achieves massive RSSI improvements of up to 7.79 dB over centralized baselines. Furthermore, the system exhibits robust multi-user scalability and maintains highly resilient beam-focusing performance under practical sub-meter localization tracking errors. By eliminating CSI overhead while maintaining high-fidelity signal redirection, this work establishes a scalable and cost-effective blueprint for intelligent wireless environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a workable CSI-free hierarchical MARL split for RIS beam control that gets decent RSSI gains in ray-tracing, but the headline 7.79 dB number only holds if the centralized baselines were run under the exact same localization-only constraints.

The punchline is that this work replaces CSI estimation with user location data and splits the RIS control problem into a high-level discrete allocator and low-level MAPPO agents under CTDE. That decomposition is the actual new piece for mechanically reconfigurable surfaces, and the simulations indicate it scales to multiple users while staying robust to sub-meter localization noise. The ray-tracing results are deterministic and reproducible in principle, which is a plus for this kind of applied work. They also report concrete RSSI lifts and show the system avoids the usual dimensionality blow-up of centralized optimization. That part is useful for anyone thinking about 5G/6G deployment constraints.

The soft spot is the baseline comparison. The 7.79 dB figure only isolates the benefit of the two-tier structure if the centralized methods were also limited to position data and given comparable compute; the abstract does not spell this out, so the delta could partly reflect easier problem formulations rather than superior wave management. No error bars or run counts appear in the summary either, which makes it harder to judge how stable the gains are.

The paper is aimed at wireless systems researchers who already know RIS and RL basics. It is worth sending to peer review because the architecture is concrete, the motivation is practical, and the simulation setup is falsifiable. Reviewers should focus on the baseline information parity and any additional real-world channel effects not captured in the ray tracer.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a CSI-free hierarchical multi-agent reinforcement learning (HMARL) framework for controlling mechanically reconfigurable intelligent surfaces (RIS) in mmWave networks. It substitutes pilot-based CSI estimation with user localization data, decomposes control into a high-level discrete user-to-reflector allocator and low-level continuous focal-point optimizers trained with MAPPO under CTDE, and reports up to 7.79 dB RSSI gains over centralized baselines in deterministic ray-tracing simulations, along with robustness to sub-meter localization errors and multi-user scalability.

Significance. If the reported gains are shown to arise from the hierarchical decomposition under identical information constraints, the work would provide a concrete, scalable alternative to CSI-dependent RIS control, addressing both estimation overhead and dimensionality issues in large deployments. The localization-based paradigm and two-tier architecture represent a practical step toward intelligent radio environments.

major comments (2)

[Abstract] The central claim of 'massive RSSI improvements of up to 7.79 dB over centralized baselines' is load-bearing for the paper's contribution, yet the abstract (and presumably the evaluation section) provides no information on whether the centralized baselines receive equivalent localization data only, or instead assume perfect CSI or employ non-RL methods that do not face the same dimensionality constraints the paper criticizes. Without this, the numerical delta cannot be attributed specifically to the HMARL decomposition rather than unequal problem difficulty.
[Evaluation] The manuscript must supply concrete details on baseline implementations, exact ray-tracing setups, statistical significance of the 7.79 dB figure, error bars or variance across runs, and whether baselines were given the same optimization budget and localization inputs. These elements are required to substantiate the performance claims and the assertion of 'robust multi-user scalability.'

minor comments (2)

[Abstract] The abstract could more precisely state the number of reflectors, users, and frequency bands used in the ray-tracing experiments to allow immediate assessment of scale.
[Method] Notation for the high-level and low-level policies (e.g., how the discrete allocation output interfaces with the continuous focal-point actions) should be introduced earlier and used consistently.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications on the baseline comparisons and commitments to expand the evaluation details in the revised manuscript.

Referee: [Abstract] The central claim of 'massive RSSI improvements of up to 7.79 dB over centralized baselines' is load-bearing for the paper's contribution, yet the abstract (and presumably the evaluation section) provides no information on whether the centralized baselines receive equivalent localization data only, or instead assume perfect CSI or employ non-RL methods that do not face the same dimensionality constraints the paper criticizes. Without this, the numerical delta cannot be attributed specifically to the HMARL decomposition rather than unequal problem difficulty.

Authors: We appreciate the referee highlighting the need for explicit clarification. All methods in our evaluations, including the centralized baselines, operate strictly under the CSI-free paradigm and receive only user localization data as input; none assume perfect CSI. The centralized baselines are implemented as non-hierarchical RL agents (standard MAPPO without the high-level discrete allocator) that directly optimize reflector parameters using identical localization inputs and face the same dimensionality issues. We will revise both the abstract and evaluation section to state explicitly that all comparisons use equivalent information constraints, allowing the reported gains to be attributed to the hierarchical decomposition. (Revision: yes.)

Referee: [Evaluation] The manuscript must supply concrete details on baseline implementations, exact ray-tracing setups, statistical significance of the 7.79 dB figure, error bars or variance across runs, and whether baselines were given the same optimization budget and localization inputs. These elements are required to substantiate the performance claims and the assertion of 'robust multi-user scalability.'

Authors: We agree that these details are essential for rigor. In the revised manuscript we will expand the evaluation section to include: precise descriptions of baseline architectures and training procedures; full ray-tracing parameters (environment geometry, material properties, and propagation settings); statistical analysis with mean RSSI gains, standard deviations across independent runs, error bars on figures, and significance testing for the 7.79 dB result; and explicit confirmation that every method received identical localization inputs and the same optimization budget. These additions will directly support the scalability and robustness claims. (Revision: yes.)
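The statistical reporting requested here (mean, spread across independent runs, and a significance check) can be sketched with the Python standard library alone. The helper names are hypothetical, and Welch's t statistic is only one reasonable choice of test; neither the report nor the rebuttal specifies which test the authors would use.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) across independent runs."""
    return statistics.mean(values), statistics.stdev(values)


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error
```

Each input would be a list with one final RSSI value per training seed for a given method. Reporting `summarize` for both the hierarchical controller and each baseline, alongside `welch_t` between them, would show whether the gap is large relative to run-to-run variation.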

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper applies established RL methods (MAPPO under CTDE) to RIS beam control using localization data in place of CSI. Performance is reported via external deterministic ray-tracing benchmarks against centralized baselines. No equations, self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations reduce any claim to its own inputs by construction. The hierarchical decomposition and RSSI gains are presented as empirical outcomes, not tautological derivations.
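For readers less familiar with the method family, MAPPO builds on the clipped surrogate objective of PPO, typically paired with a centralized critic during training. The per-sample clipped term can be sketched as below. The clip range of 0.2 is a common default used here as an assumption, not a hyperparameter reported for this paper, and the function name is illustrative.

```python
def ppo_clipped_term(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is the new-to-old policy probability ratio for the taken action and
    advantage is the estimated advantage of that action.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

The clipping removes any incentive to push the probability ratio far from 1 in a single update, which is part of why the method is treated as an established, off-the-shelf component rather than something derived in the paper.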

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the domain assumption that localization data suffices for wave management and on standard RL training assumptions; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption: User localization data provides sufficient spatial intelligence to substitute for CSI in macro-scale wave propagation management.
Explicitly stated as the core substitution enabling the CSI-free paradigm.

domain assumption: Deterministic ray-tracing simulations accurately model real-world RIS performance and multi-user interactions.
Underpins all reported RSSI improvements and robustness claims.

