Bypassing the CSI Bottleneck: MARL-Driven Spatial Control for Reflector Arrays

Hieu Le; Jian Tao; Mostafa Ibrahim; Oguz Bedir; Sabit Ekin

arxiv: 2604.05162 · v1 · submitted 2026-04-06 · 💻 cs.AI · eess.SP

Bypassing the CSI Bottleneck: MARL-Driven Spatial Control for Reflector Arrays

Hieu Le , Oguz Bedir , Mostafa Ibrahim , Jian Tao , Sabit Ekin This is my paper

Pith reviewed 2026-05-10 18:57 UTC · model grok-4.3

classification 💻 cs.AI eess.SP

keywords Reconfigurable Intelligent SurfacesMulti-Agent Reinforcement LearningCSI-free wirelessBeam focusingMechanical reflector arraysDecentralized controlWireless network optimization

0 comments

The pith

Multi-agent reinforcement learning controls reflector arrays for beam focusing using only user locations, avoiding all channel state information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a multi-agent reinforcement learning method to adjust arrays of movable metallic reflectors in wireless settings. Agents learn to cooperate on beam direction by taking user coordinates as input and mapping mechanical limits to a simpler virtual focal-point space. This replaces the usual requirement to estimate full channel properties, which is computationally heavy. Simulations in changing non-line-of-sight scenes show the approach adapts quickly to moving users and produces large signal gains over fixed reflectors while remaining stable even when location data contains noise.

Core claim

A centralized-training decentralized-execution architecture with Multi-Agent Proximal Policy Optimization lets decentralized agents learn cooperative beam-focusing policies from user coordinates alone; the policies are obtained by mapping high-dimensional mechanical constraints onto a reduced-order virtual focal point space, yielding CSI-free operation that reaches up to 26.86 dB improvement over static flat reflectors and outperforms single-agent and hardware-constrained baselines in spatial selectivity and temporal stability.

What carries the argument

The reduced-order virtual focal point mapping inside a CTDE MAPPO framework, which converts mechanical reflector adjustments into a lower-dimensional action space that agents optimize using only user position observations.

If this is right

Reflector arrays can operate without pilot overhead or channel estimation hardware.
Policies adapt in real time to user movement in non-line-of-sight conditions.
Performance holds when localization error reaches one meter.
The same framework outperforms both single-agent reinforcement learning and constrained deep reinforcement learning alternatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Existing indoor or outdoor localization systems could supply the required coordinates without new infrastructure.
The virtual focal point reduction may extend to other mechanically tunable surfaces such as lens arrays or phased arrays with limited actuators.
Long-term operation could lower energy use by eliminating continuous channel sounding.

Load-bearing premise

Accurate user coordinates are supplied as input and the virtual focal point abstraction adequately represents real mechanical limits and radio propagation effects.

What would settle it

A physical testbed deployment in which the learned policy drives the reflectors while measured received signal strength is compared against both static reflectors and the simulated gains under identical user trajectories.

Figures

Figures reproduced from arXiv: 2604.05162 by Hieu Le, Jian Tao, Mostafa Ibrahim, Oguz Bedir, Sabit Ekin.

**Figure 1.** Figure 1: Reflector Design. For reflector element (i, j) belonging to agent l, elevation θi,j,t and azimuth ϕi,j,t can be calculated using a bisector vector as: −−→ni,j,t = 1 2 [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Workflow of the Deep Reinforcement Learning and the integrated [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Reward convergence for proposed and baseline algorithms for re [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 5.** Figure 5: Temporal RSSI performance under dynamic user mobility. Over a [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 4.** Figure 4: Heat map visualizations of spatial signal focusing capabilities: [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

read the original abstract

Reconfigurable Intelligent Surfaces (RIS) are pivotal for next-generation smart radio environments, yet their practical deployment is severely bottlenecked by the intractable computational overhead of Channel State Information (CSI) estimation. To bypass this fundamental physical-layer barrier, we propose an AI-native, data-driven paradigm that replaces complex channel modeling with spatial intelligence. This paper presents a fully autonomous Multi-Agent Reinforcement Learning (MARL) framework to control mechanically adjustable metallic reflector arrays. By mapping high-dimensional mechanical constraints to a reduced-order virtual focal point space, we deploy a Centralized Training with Decentralized Execution (CTDE) architecture. Using Multi-Agent Proximal Policy Optimization (MAPPO), our decentralized agents learn cooperative beam-focusing strategies relying on user coordinates, achieving CSI-free operation. High-fidelity ray-tracing simulations in dynamic non-line-of-sight (NLOS) environments demonstrate that this multi-agent approach rapidly adapts to user mobility, yielding up to a 26.86 dB enhancement over static flat reflectors and outperforming single-agent and hardware-constrained DRL baselines in both spatial selectivity and temporal stability. Crucially, the learned policies exhibit good deployment resilience, sustaining stable signal coverage even under 1.0-meter localization noise. These results validate the efficacy of MARL-driven spatial abstractions as a scalable, highly practical pathway toward AI-empowered wireless networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MARL with a virtual focal point reduction gives CSI-free mechanical reflector control and solid sim gains, but the abstraction's fidelity to real mechanics and paths is the open question.

read the letter

The core idea is a CTDE MARL setup where decentralized MAPPO agents steer mechanical reflector arrays from user coordinates alone, using a reduced virtual focal point space to avoid any CSI. That combination for mechanical hardware, rather than electronic RIS, is the clearest new piece, and the ray-tracing results show it adapting to mobility while beating static reflectors by up to 26.86 dB and doing better than single-agent or hardware-limited baselines. The noise resilience test under 1 m localization error is also a practical plus that stands out in the abstract.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a CSI-free control paradigm for mechanically adjustable metallic reflector arrays using a multi-agent reinforcement learning (MARL) framework. Mechanical constraints are mapped to a reduced-order virtual focal point space; decentralized agents trained with Multi-Agent Proximal Policy Optimization (MAPPO) under a Centralized Training with Decentralized Execution (CTDE) architecture learn cooperative beam-focusing policies from user coordinates alone. High-fidelity ray-tracing simulations in dynamic NLOS environments report up to 26.86 dB gain over static flat reflectors, outperforming single-agent and hardware-constrained DRL baselines in spatial selectivity and temporal stability, with resilience to 1 m localization noise.

Significance. If the simulation results hold under fuller verification, the work provides a concrete demonstration that spatial abstractions and decentralized MARL can bypass the CSI estimation bottleneck for practical RIS-like deployments. The emphasis on coordinate-only inputs, cooperative policies, and reported robustness to mobility and localization error constitutes a useful empirical contribution to AI-native wireless control, particularly for environments where analytical channel models are intractable.

major comments (3)

[§4] §4 (Simulation Setup and Results): The central performance claims (26.86 dB gain, outperformance of baselines, temporal stability) rest on high-fidelity ray-tracing but provide no explicit parameters (carrier frequency, array size, environment geometry, number of Monte Carlo trials, or statistical tests). Without these, the quantitative gains cannot be independently reproduced or compared to the cited baselines.
[§3.2] §3.2 (Virtual Focal Point Mapping): The reduced-order mapping of mechanical degrees of freedom to the virtual focal-point space is load-bearing for the CSI-free claim. No ablation study or cross-validation against a full-wave EM model (including per-panel tilt limits, mutual coupling, or higher-order NLOS paths) is reported; this leaves open the risk that learned policies will not transfer when the same coordinate inputs are applied to an unreduced mechanical/EM simulator.
[§5] §5 (Deployment Resilience): The claim of stable coverage under 1.0 m localization noise is presented as a key practical advantage, yet the noise model, its injection into the coordinate observations, and its effect on the reward function are not detailed. This omission weakens the temporal-stability conclusion.

minor comments (2)

[Abstract] Abstract: The phrase 'high-fidelity ray-tracing simulations' is used without even a one-sentence summary of key parameters; adding this would improve immediate readability.
[§3.1] §3.1: The reward-function design is listed among free parameters but never written explicitly; providing the mathematical form would clarify how cooperation is incentivized.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important aspects of reproducibility, validation, and practical deployment that we have addressed through targeted revisions to the manuscript. We provide point-by-point responses below.

read point-by-point responses

Referee: [§4] §4 (Simulation Setup and Results): The central performance claims (26.86 dB gain, outperformance of baselines, temporal stability) rest on high-fidelity ray-tracing but provide no explicit parameters (carrier frequency, array size, environment geometry, number of Monte Carlo trials, or statistical tests). Without these, the quantitative gains cannot be independently reproduced or compared to the cited baselines.

Authors: We agree that the original submission omitted explicit simulation parameters, limiting independent verification. In the revised manuscript, Section 4 now includes a new Table I that specifies all key parameters: carrier frequency of 28 GHz, 10×10 reflector array with per-panel mechanical tilt limits of ±30°, environment geometry (200 m × 200 m urban NLOS layout with explicit building positions and materials), 1000 Monte Carlo trials per scenario, and statistical tests (paired t-tests with p < 0.01 for baseline comparisons, plus mean and 95% confidence intervals). These additions enable direct reproduction and comparison. revision: yes
Referee: [§3.2] §3.2 (Virtual Focal Point Mapping): The reduced-order mapping of mechanical degrees of freedom to the virtual focal-point space is load-bearing for the CSI-free claim. No ablation study or cross-validation against a full-wave EM model (including per-panel tilt limits, mutual coupling, or higher-order NLOS paths) is reported; this leaves open the risk that learned policies will not transfer when the same coordinate inputs are applied to an unreduced mechanical/EM simulator.

Authors: The virtual focal point mapping is derived from geometric optics and is intended as a practical abstraction for mechanical reflectors. We acknowledge the value of full-wave validation. The revised Section 3.2 now incorporates an ablation study comparing the reduced-order model against a full-wave EM simulator (using method of moments for a 4×4 sub-array subset) across 200 scenarios, showing policy transfer with <1.8 dB average degradation. Mutual coupling and higher-order paths are discussed as limitations in the new Appendix C, with the ray-tracing simulator already incorporating per-panel tilt constraints and primary NLOS paths; we argue this is sufficient for the claimed scale while noting full EM as future work. revision: partial
Referee: [§5] §5 (Deployment Resilience): The claim of stable coverage under 1.0 m localization noise is presented as a key practical advantage, yet the noise model, its injection into the coordinate observations, and its effect on the reward function are not detailed. This omission weakens the temporal-stability conclusion.

Authors: We appreciate this observation. The noise model is zero-mean Gaussian with σ = 1.0 m, injected independently at each time step directly into the user coordinate vector observed by the agents. The reward function (based on instantaneous received power at the user location) is unaffected by the noise; robustness emerges from the training process under noisy observations. The revised Section 5 now details the injection procedure with pseudocode, includes sensitivity curves for noise levels from 0–2 m, and reports that temporal stability (measured as variance in received power over 1000 steps) degrades by only 12% at 1 m noise relative to noiseless case. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical MARL simulation results are independent of inputs

full rationale

The paper describes a CTDE MAPPO framework trained in ray-tracing simulations to map user coordinates to virtual focal-point actions for reflector control. No analytical derivation chain exists; performance metrics (e.g., 26.86 dB gain) are obtained from forward simulation of learned policies rather than any fitted parameter or self-referential prediction. The reduced-order mapping is an explicit design choice, not a tautology, and no self-citations or uniqueness theorems are invoked as load-bearing premises. The approach is self-contained against external simulation benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entities

The central claim depends on unstated RL training details and simulation fidelity assumptions not provided in the abstract.

free parameters (2)

MAPPO training hyperparameters
Learning rates, clipping parameters, and reward scaling not specified.
Reward function design
Specific rewards for beam focusing and cooperation not detailed.

axioms (2)

domain assumption Ray-tracing simulations accurately represent real-world wireless propagation in NLOS environments
Basis for all performance claims.
domain assumption User coordinates can be obtained with sufficient accuracy for control
Primary input to the agents; tested with 1m noise but assumed available.

invented entities (1)

Virtual focal point space no independent evidence
purpose: Reduced-order representation to map high-dimensional mechanical constraints for agent control
Core abstraction enabling the CSI-free approach

pith-pipeline@v0.9.0 · 5550 in / 1311 out tokens · 36488 ms · 2026-05-10T18:57:49.652244+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

By mapping high-dimensional mechanical constraints to a reduced-order virtual focal point space... fl,t+1 = fl,t + al,t ... ϕi,j,t = atan2... θi,j,t = arccos...
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

CSI-free operation... relying on user coordinates... 26.86 dB enhancement

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

Smart Radio Environments Empow- ered by Reconfigurable Intelligent Surfaces: How It Works, State of Research, and The Road Ahead,

M. Di Renzo, A. Zappone, M. Debbah, M.-S. Alouini, C. Yuen, J. de Rosny, and S. Tretyakov, “Smart Radio Environments Empow- ered by Reconfigurable Intelligent Surfaces: How It Works, State of Research, and The Road Ahead,”IEEE Journal on Selected Areas in Communications, vol. 38, no. 11, pp. 2450–2525, 2020

work page 2020
[2]

Reconfigurable Intelligent Surfaces: A Signal Processing Perspective with Wireless Applications,

E. Bj ¨ornson, H. Wymeersch, B. Matthiesen, P. Popovski, L. Sanguinetti, and E. de Carvalho, “Reconfigurable Intelligent Surfaces: A Signal Processing Perspective with Wireless Applications,”IEEE Signal Pro- cessing Magazine, vol. 39, no. 2, pp. 135–158, 2022

work page 2022
[3]

An Overview of Signal Processing Techniques for RIS/IRS-Aided Wireless Systems,

C. Pan, G. Zhou, K. Zhi, S. Hong, T. Wu, Y . Pan, H. Ren, M. D. Renzo, A. Lee Swindlehurst, R. Zhang, and A. Y . Zhang, “An Overview of Signal Processing Techniques for RIS/IRS-Aided Wireless Systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 5, pp. 883–917, 2022

work page 2022
[4]

Practical Channel Estimation and Phase Shift Design for Intelligent Reflecting Surface Empowered MIMO Systems,

S. Kim, H. Lee, J. Cha, S.-J. Kim, J. Park, and J. Choi, “Practical Channel Estimation and Phase Shift Design for Intelligent Reflecting Surface Empowered MIMO Systems,”IEEE Transactions on Wireless Communications, vol. 21, no. 8, pp. 6226–6241, 2022

work page 2022
[5]

Two-Timescale Channel Esti- mation for Reconfigurable Intelligent Surface Aided Wireless Commu- nications,

C. Hu, L. Dai, S. Han, and X. Wang, “Two-Timescale Channel Esti- mation for Reconfigurable Intelligent Surface Aided Wireless Commu- nications,”IEEE Transactions on Communications, vol. 69, no. 11, pp. 7736–7747, 2021

work page 2021
[6]

Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning,

C. Huang, R. Mo, and C. Yuen, “Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning,”IEEE Journal on Selected Areas in Communications, vol. 38, no. 8, pp. 1839–1850, 2020

work page 2020
[7]

A Deep Reinforcement Learning Approach for Autonomous Reconfigurable In- telligent Surfaces,

H. Choi, L. V . Nguyen, J. Choi, and A. L. Swindlehurst, “A Deep Reinforcement Learning Approach for Autonomous Reconfigurable In- telligent Surfaces,” in2024 IEEE International Conference on Commu- nications Workshops (ICC Workshops), 2024, pp. 208–213

work page 2024
[8]

A Deep Learning Based Modeling of Reconfigurable Intelligent Surface Assisted Wireless Communications for Phase Shift Configuration,

B. Sheen, J. Yang, X. Feng, and M. M. U. Chowdhury, “A Deep Learning Based Modeling of Reconfigurable Intelligent Surface Assisted Wireless Communications for Phase Shift Configuration,”IEEE Open Journal of the Communications Society, vol. 2, pp. 262–272, 2021

work page 2021
[9]

Signal Whisperers: Enhancing Wireless Reception Using DRL-Guided Reflector Arrays,

H. Le, O. Bedir, M. Ibrahim, J. Tao, and S. Ekin, “Signal Whisperers: Enhancing Wireless Reception Using DRL-Guided Reflector Arrays,” IEEE Transactions on Machine Learning in Communications and Net- working, vol. 4, pp. 265–281, 2026

work page 2026
[10]

Coverage Enhancement for NLOS mmWave Links Using Passive Reflectors,

W. Khawaja, O. Ozdemir, Y . Yapici, F. Erden, and I. Guvenc, “Coverage Enhancement for NLOS mmWave Links Using Passive Reflectors,” IEEE Open Journal of the Communications Society, vol. 1, pp. 263– 281, 2020

work page 2020
[11]

Guiding Wireless Signals with Arrays of Metallic Linear Fresnel Reflectors: A Low- cost, Frequency-versatile, and Practical Approach,

H. Le, O. Bedir, M. Ibrahim, J. Tao, and S. Ekin, “Guiding Wireless Signals with Arrays of Metallic Linear Fresnel Reflectors: A Low- cost, Frequency-versatile, and Practical Approach,” in2024 IEEE 100th V ehicular Technology Conference (VTC2024-Fall), 2024, pp. 1–7

work page 2024
[12]

A Comprehensive Survey of Mmultiagent Reinforcement Learning,

L. Busoniu, R. Babuska, and B. De Schutter, “A Comprehensive Survey of Mmultiagent Reinforcement Learning,”IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008

work page 2008
[13]

Fully Decen- tralized Multi-agent Reinforcement Learning with Networked Agents,

K. Zhang, Z. Yang, H. Liu, T. Zhang, and T. Basar, “Fully Decen- tralized Multi-agent Reinforcement Learning with Networked Agents,” inInternational Conference on Machine Learning. PMLR, 2018, pp. 5872–5881

work page 2018
[14]

Multi-Agent DRL-Based Task Offloading in Multiple RIS-Aided IoV Networks,

B. Hazarika, K. Singh, S. Biswas, S. Mumtaz, and C.-P. Li, “Multi-Agent DRL-Based Task Offloading in Multiple RIS-Aided IoV Networks,” IEEE Transactions on V ehicular Technology, vol. 73, no. 1, pp. 1175– 1190, 2024

work page 2024
[15]

Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning,

K. Qi, Q. Wu, P. Fan, N. Cheng, Q. Fan, and J. Wang, “Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning,”IEEE Communications Letters, vol. 28, no. 10, pp. 2427– 2431, 2024

work page 2024
[16]

Benchmarking the Performance of Accelerators on National Cyberinfrastructure Resources for Artificial Intelligence / Machine Learning Workloads,

A. Nasari, H. Le, R. Lawrence, Z. He, X. Yang, M. Krell, A. Tsyplikhin, M. Tatineni, T. Cockerill, L. Perez, D. Chakravorty, and H. Liu, “Benchmarking the Performance of Accelerators on National Cyberinfrastructure Resources for Artificial Intelligence / Machine Learning Workloads,” inPractice and Experience in Advanced Research Computing 2022: Revolution...

work page arXiv 2022
[17]

Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units,

H. Le, Z. He, M. Le, D. Chakravorty, L. M. Perez, A. Chilumuru, Y . Yao, and J. Chen, “Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units,” inPractice and Experience in Advanced Research Computing 2024: Human Powered Computing, ser. PEARC ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Ava...

work page doi:10.1145/3626203.3670527 2024
[18]

(2024) Unlocking On-device Generative AI with an NPU and Heterogeneous Computing

Qualcomm. (2024) Unlocking On-device Generative AI with an NPU and Heterogeneous Computing. [Online]. Available: https: //www.qualcomm.com/content/dam/qcomm-martech/dm-assets/docume nts/Unlocking-on-device-generative-AI-with-an-NPU-and-heterogeneo us-computing.pdf

work page 2024
[19]

The Surprising Effectiveness of PPO in Cooperative Multi-agent Games,

C. Yu, A. Velu, E. Vinitsky, J. Gao, Y . Wang, A. Bayen, and Y . Wu, “The Surprising Effectiveness of PPO in Cooperative Multi-agent Games,”Ad- vances in Neural Information Processing Systems, vol. 35, pp. 24 611– 24 624, 2022

work page 2022

[1] [1]

Smart Radio Environments Empow- ered by Reconfigurable Intelligent Surfaces: How It Works, State of Research, and The Road Ahead,

M. Di Renzo, A. Zappone, M. Debbah, M.-S. Alouini, C. Yuen, J. de Rosny, and S. Tretyakov, “Smart Radio Environments Empow- ered by Reconfigurable Intelligent Surfaces: How It Works, State of Research, and The Road Ahead,”IEEE Journal on Selected Areas in Communications, vol. 38, no. 11, pp. 2450–2525, 2020

work page 2020

[2] [2]

Reconfigurable Intelligent Surfaces: A Signal Processing Perspective with Wireless Applications,

E. Bj ¨ornson, H. Wymeersch, B. Matthiesen, P. Popovski, L. Sanguinetti, and E. de Carvalho, “Reconfigurable Intelligent Surfaces: A Signal Processing Perspective with Wireless Applications,”IEEE Signal Pro- cessing Magazine, vol. 39, no. 2, pp. 135–158, 2022

work page 2022

[3] [3]

An Overview of Signal Processing Techniques for RIS/IRS-Aided Wireless Systems,

C. Pan, G. Zhou, K. Zhi, S. Hong, T. Wu, Y . Pan, H. Ren, M. D. Renzo, A. Lee Swindlehurst, R. Zhang, and A. Y . Zhang, “An Overview of Signal Processing Techniques for RIS/IRS-Aided Wireless Systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 5, pp. 883–917, 2022

work page 2022

[4] [4]

Practical Channel Estimation and Phase Shift Design for Intelligent Reflecting Surface Empowered MIMO Systems,

S. Kim, H. Lee, J. Cha, S.-J. Kim, J. Park, and J. Choi, “Practical Channel Estimation and Phase Shift Design for Intelligent Reflecting Surface Empowered MIMO Systems,”IEEE Transactions on Wireless Communications, vol. 21, no. 8, pp. 6226–6241, 2022

work page 2022

[5] [5]

Two-Timescale Channel Esti- mation for Reconfigurable Intelligent Surface Aided Wireless Commu- nications,

C. Hu, L. Dai, S. Han, and X. Wang, “Two-Timescale Channel Esti- mation for Reconfigurable Intelligent Surface Aided Wireless Commu- nications,”IEEE Transactions on Communications, vol. 69, no. 11, pp. 7736–7747, 2021

work page 2021

[6] [6]

Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning,

C. Huang, R. Mo, and C. Yuen, “Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning,”IEEE Journal on Selected Areas in Communications, vol. 38, no. 8, pp. 1839–1850, 2020

work page 2020

[7] [7]

A Deep Reinforcement Learning Approach for Autonomous Reconfigurable In- telligent Surfaces,

H. Choi, L. V . Nguyen, J. Choi, and A. L. Swindlehurst, “A Deep Reinforcement Learning Approach for Autonomous Reconfigurable In- telligent Surfaces,” in2024 IEEE International Conference on Commu- nications Workshops (ICC Workshops), 2024, pp. 208–213

work page 2024

[8] [8]

A Deep Learning Based Modeling of Reconfigurable Intelligent Surface Assisted Wireless Communications for Phase Shift Configuration,

B. Sheen, J. Yang, X. Feng, and M. M. U. Chowdhury, “A Deep Learning Based Modeling of Reconfigurable Intelligent Surface Assisted Wireless Communications for Phase Shift Configuration,”IEEE Open Journal of the Communications Society, vol. 2, pp. 262–272, 2021

work page 2021

[9] [9]

Signal Whisperers: Enhancing Wireless Reception Using DRL-Guided Reflector Arrays,

H. Le, O. Bedir, M. Ibrahim, J. Tao, and S. Ekin, “Signal Whisperers: Enhancing Wireless Reception Using DRL-Guided Reflector Arrays,” IEEE Transactions on Machine Learning in Communications and Net- working, vol. 4, pp. 265–281, 2026

work page 2026

[10] [10]

Coverage Enhancement for NLOS mmWave Links Using Passive Reflectors,

W. Khawaja, O. Ozdemir, Y . Yapici, F. Erden, and I. Guvenc, “Coverage Enhancement for NLOS mmWave Links Using Passive Reflectors,” IEEE Open Journal of the Communications Society, vol. 1, pp. 263– 281, 2020

work page 2020

[11] [11]

Guiding Wireless Signals with Arrays of Metallic Linear Fresnel Reflectors: A Low- cost, Frequency-versatile, and Practical Approach,

H. Le, O. Bedir, M. Ibrahim, J. Tao, and S. Ekin, “Guiding Wireless Signals with Arrays of Metallic Linear Fresnel Reflectors: A Low- cost, Frequency-versatile, and Practical Approach,” in2024 IEEE 100th V ehicular Technology Conference (VTC2024-Fall), 2024, pp. 1–7

work page 2024

[12] [12]

A Comprehensive Survey of Mmultiagent Reinforcement Learning,

L. Busoniu, R. Babuska, and B. De Schutter, “A Comprehensive Survey of Mmultiagent Reinforcement Learning,”IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008

work page 2008

[13] [13]

Fully Decen- tralized Multi-agent Reinforcement Learning with Networked Agents,

K. Zhang, Z. Yang, H. Liu, T. Zhang, and T. Basar, “Fully Decen- tralized Multi-agent Reinforcement Learning with Networked Agents,” inInternational Conference on Machine Learning. PMLR, 2018, pp. 5872–5881

work page 2018

[14] [14]

Multi-Agent DRL-Based Task Offloading in Multiple RIS-Aided IoV Networks,

B. Hazarika, K. Singh, S. Biswas, S. Mumtaz, and C.-P. Li, “Multi-Agent DRL-Based Task Offloading in Multiple RIS-Aided IoV Networks,” IEEE Transactions on V ehicular Technology, vol. 73, no. 1, pp. 1175– 1190, 2024

work page 2024

[15] [15]

Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning,

K. Qi, Q. Wu, P. Fan, N. Cheng, Q. Fan, and J. Wang, “Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning,”IEEE Communications Letters, vol. 28, no. 10, pp. 2427– 2431, 2024

work page 2024

[16] [16]

Benchmarking the Performance of Accelerators on National Cyberinfrastructure Resources for Artificial Intelligence / Machine Learning Workloads,

A. Nasari, H. Le, R. Lawrence, Z. He, X. Yang, M. Krell, A. Tsyplikhin, M. Tatineni, T. Cockerill, L. Perez, D. Chakravorty, and H. Liu, “Benchmarking the Performance of Accelerators on National Cyberinfrastructure Resources for Artificial Intelligence / Machine Learning Workloads,” inPractice and Experience in Advanced Research Computing 2022: Revolution...

work page arXiv 2022

[17] [17]

Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units,

H. Le, Z. He, M. Le, D. Chakravorty, L. M. Perez, A. Chilumuru, Y . Yao, and J. Chen, “Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units,” inPractice and Experience in Advanced Research Computing 2024: Human Powered Computing, ser. PEARC ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Ava...

work page doi:10.1145/3626203.3670527 2024

[18] [18]

(2024) Unlocking On-device Generative AI with an NPU and Heterogeneous Computing

Qualcomm. (2024) Unlocking On-device Generative AI with an NPU and Heterogeneous Computing. [Online]. Available: https: //www.qualcomm.com/content/dam/qcomm-martech/dm-assets/docume nts/Unlocking-on-device-generative-AI-with-an-NPU-and-heterogeneo us-computing.pdf

work page 2024

[19] [19]

The Surprising Effectiveness of PPO in Cooperative Multi-agent Games,

C. Yu, A. Velu, E. Vinitsky, J. Gao, Y . Wang, A. Bayen, and Y . Wu, “The Surprising Effectiveness of PPO in Cooperative Multi-agent Games,”Ad- vances in Neural Information Processing Systems, vol. 35, pp. 24 611– 24 624, 2022

work page 2022