pith.

arXiv: 2604.05165 · v2 · submitted 2026-04-06 · 💻 cs.AI · eess.SP

Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors

Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3

classification 💻 cs.AI eess.SP
keywords Reconfigurable Intelligent Surfaces · Multi-Agent Reinforcement Learning · CSI-free · Hierarchical MARL · mmWave networks · Beam focusing · Localization data · Ray-tracing evaluation

The pith

A hierarchical multi-agent RL system controls reconfigurable reflectors using only user locations instead of channel estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to address the high computational cost and pilot overhead of CSI estimation plus centralized optimization in large RIS deployments for mmWave networks. It replaces channel estimates with user localization data inside a two-tier HMARL controller: a high-level layer makes discrete, temporally extended user-to-reflector assignments, while low-level MAPPO agents optimize continuous focal points under CTDE training. This split allows decentralized execution at run time while still coordinating through central training. Sympathetic readers would care because the approach targets the main practical barriers to scaling smart radio environments. Ray-tracing tests show the method delivers up to 7.79 dB RSSI gains over centralized baselines and stays effective even with sub-meter localization noise.

Core claim

By substituting pilot-based channel estimation with accessible user localization data, the CSI-free hierarchical MARL architecture decomposes the reflector control problem into high-level discrete user-to-reflector allocations and low-level continuous focal-point optimization via MAPPO under a CTDE scheme, producing up to 7.79 dB RSSI improvements over centralized baselines in deterministic ray-tracing evaluations while remaining resilient to sub-meter localization errors.
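Because RSSI is a logarithmic power quantity, the headline figure is easier to judge as a linear ratio. The short sketch below converts a dB gain G to a power ratio with 10^(G/10). The helper names are illustrative, not taken from the paper. Note that "up to" marks a best case, not an average.

```python
import math

def db_to_power_ratio(gain_db: float) -> float:
    """Convert a power gain in decibels to a linear power ratio: 10**(dB/10)."""
    return 10 ** (gain_db / 10)

def power_ratio_to_db(ratio: float) -> float:
    """Inverse conversion: 10 * log10(ratio)."""
    return 10 * math.log10(ratio)

# Best-case RSSI gain reported over the centralized baselines.
ratio = db_to_power_ratio(7.79)
print(f"7.79 dB corresponds to about {ratio:.2f}x the received power")
```

Under this conversion, the 7.79 dB best case is roughly a sixfold increase in received power. In field amplitude it is about 2.45 times, using 10^(G/20).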

What carries the argument

The two-tier neural architecture with a high-level controller for temporally extended discrete allocations and low-level MAPPO controllers for continuous focal points, trained centrally but executed decentrally.
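The two timescales can be pictured with the toy loop below. The high-level allocation is re-chosen only every `option_len` steps. Between those points, each reflector's low-level controller updates its continuous focal point at every step from current user positions. Several parts are hypothetical placeholders and are not the paper's learned policies: the nearest-reflector rule, the centroid aiming, `option_len`, and all coordinates.

```python
from __future__ import annotations

import math

Point = tuple[float, float, float]  # (x, y, z) in metres

def high_level_allocate(users: list[Point], reflectors: list[Point]) -> list[int]:
    """Hypothetical stand-in for the learned high-level policy: one discrete
    choice per user (here, simply the nearest reflector)."""
    return [min(range(len(reflectors)), key=lambda r: math.dist(u, reflectors[r]))
            for u in users]

def low_level_focal_point(assigned_users: list[Point]) -> Point | None:
    """Hypothetical stand-in for one low-level actor: a continuous 3-D focal
    point (here, the centroid of the users allocated to this reflector)."""
    if not assigned_users:
        return None  # idle reflector: nothing to focus on
    n = len(assigned_users)
    return tuple(sum(p[k] for p in assigned_users) / n for k in range(3))

def run_hierarchy(user_traj: list[list[Point]], reflectors: list[Point], option_len: int = 4):
    """The allocation is held for `option_len` steps (a temporally extended
    decision), while every reflector re-aims at every step."""
    allocation: list[int] = []
    log = []
    for t, users in enumerate(user_traj):
        if t % option_len == 0:
            allocation = high_level_allocate(users, reflectors)
        focal_points = [
            low_level_focal_point([users[i] for i, a in enumerate(allocation) if a == r])
            for r in range(len(reflectors))
        ]
        log.append((list(allocation), focal_points))
    return log

# Toy scenario: two reflectors; user 0 moves from reflector 0 to near reflector 1.
reflectors = [(0.0, 0.0, 2.0), (10.0, 0.0, 2.0)]
traj = [[(1.0, 0.0, 1.0), (9.0, 1.0, 1.0)]] + [[(9.0, 0.0, 1.0), (9.0, 1.0, 1.0)]] * 4
log = run_hierarchy(traj, reflectors, option_len=4)
```

In this toy run, reflector 0 keeps serving user 0 for three steps after the user has moved next to reflector 1. This illustrates the price of holding a discrete decision. The benefit is a smaller, slower-changing decision for the high-level policy.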

Load-bearing premise

That user localization data alone supplies enough spatial information to manage macro-scale wave propagation and achieve reliable beam focusing without any pilot-based channel measurements.

What would settle it

A controlled ray-tracing or measurement campaign in which user positions are known to sub-meter accuracy yet the hierarchical system produces lower RSSI than a CSI-based centralized optimizer under identical environmental conditions.

Figures

Figures reproduced from arXiv: 2604.05165 by Hieu Le, Jian Tao, Mostafa Ibrahim, Oguz Bedir, Sabit Ekin.

Figure 1. Reflector Design. … l, the mechanical orientation of tile $(i, j)$ at position $\mathbf{r}_{i,j}$ is deterministically governed by its normal vector:

$$\vec{n}_{i,j}(\mathbf{f}_l) = \frac{1}{2}\left(\frac{\mathbf{f}_l - \mathbf{r}_{i,j}}{\lVert \mathbf{f}_l - \mathbf{r}_{i,j} \rVert_2} + \frac{\mathbf{s} - \mathbf{r}_{i,j}}{\lVert \mathbf{s} - \mathbf{r}_{i,j} \rVert_2}\right). \qquad (1)$$

This geometric formulation allows us to derive the necessary elevation $\theta_{i,j}$ and azimuth $\phi_{i,j}$ angles without requiring instantaneous electromagnetic CSI. To manage the massive combinatorial complexity of …
Figure 2. Hierarchical Multi-Agent Reinforcement Learning Architecture.
Figure 3. Experimental setup of the conference room simulation environment.
Figure 4. Performance evaluation for the 4-user configuration. (a) Episode …
Figure 5. RSSI performance under varying degrees of user localization …
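Equation (1) in the Figure 1 excerpt points each tile's normal along the bisector of two unit vectors. One runs from the tile to the focal point f_l and the other to the source s. Only positions enter the computation; no channel measurement does. The NumPy sketch below evaluates that formula under stated assumptions. The coordinates are hypothetical. The excerpt does not state the angle convention, so elevation is assumed to be measured from the x-y plane and azimuth from +x toward +y. The sketch also checks how far a 0.5 m focal-point error tilts the normal, which is the quantity that sub-meter localization noise perturbs.

```python
import numpy as np

def tile_normal(focal: np.ndarray, source: np.ndarray, tile: np.ndarray) -> np.ndarray:
    """Eq. (1): half the sum of the unit vectors from the tile toward the focal
    point and toward the source. The result is a bisector, not unit length."""
    to_focal = (focal - tile) / np.linalg.norm(focal - tile)
    to_source = (source - tile) / np.linalg.norm(source - tile)
    return 0.5 * (to_focal + to_source)

def elevation_azimuth_deg(n: np.ndarray) -> tuple[float, float]:
    """Assumed convention: elevation from the x-y plane, azimuth from +x toward +y."""
    unit = n / np.linalg.norm(n)
    elevation = np.degrees(np.arcsin(unit[2]))
    azimuth = np.degrees(np.arctan2(unit[1], unit[0]))
    return float(elevation), float(azimuth)

def normal_tilt_deg(n_a: np.ndarray, n_b: np.ndarray) -> float:
    """Angle between two (not necessarily unit) normal vectors, in degrees."""
    cos = np.dot(n_a, n_b) / (np.linalg.norm(n_a) * np.linalg.norm(n_b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Hypothetical geometry (metres): tile at the origin, source and user each 5 m away.
tile = np.array([0.0, 0.0, 0.0])
source = np.array([-3.0, 4.0, 0.0])
focal = np.array([3.0, 4.0, 0.0])
n = tile_normal(focal, source, tile)
print(elevation_azimuth_deg(n))  # normal along +y: elevation 0, azimuth 90 degrees

# A 0.5 m localization error in the focal point tilts the normal by about 2.16 degrees.
n_err = tile_normal(focal + np.array([0.5, 0.0, 0.0]), source, tile)
print(round(normal_tilt_deg(n, n_err), 2))
```

Because the normal is a bisector, it rotates by half the angular change in the direction to the focal point. In this geometry, the 0.5 m shift moves that direction by about 4.3 degrees, so the normal moves by about 2.16 degrees.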
Original abstract

Reconfigurable Intelligent Surfaces (RIS) have the potential to engineer smart radio environments for next-generation millimeter-wave (mmWave) networks. However, the prohibitive computational overhead of Channel State Information (CSI) estimation and the dimensionality explosion inherent in centralized optimization severely hinder practical large-scale deployments. To overcome these bottlenecks, we introduce a "CSI-free" paradigm powered by a Hierarchical Multi-Agent Reinforcement Learning (HMARL) architecture to control mechanically reconfigurable reflective surfaces. By substituting pilot-based channel estimation with accessible user localization data, our framework leverages spatial intelligence for macro-scale wave propagation management. The control problem is decomposed into a two-tier neural architecture: a high-level controller executes temporally extended, discrete user-to-reflector allocations, while low-level controllers autonomously optimize continuous focal points utilizing Multi-Agent Proximal Policy Optimization (MAPPO) under a Centralized Training with Decentralized Execution (CTDE) scheme. Comprehensive deterministic ray-tracing evaluations demonstrate that this hierarchical framework achieves massive RSSI improvements of up to 7.79 dB over centralized baselines. Furthermore, the system exhibits robust multi-user scalability and maintains highly resilient beam-focusing performance under practical sub-meter localization tracking errors. By eliminating CSI overhead while maintaining high-fidelity signal redirection, this work establishes a scalable and cost-effective blueprint for intelligent wireless environments.
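In MAPPO, each agent's actor is trained with PPO's clipped surrogate objective. A centralized critic, used only during training, sees the joint observation. At execution time, each actor acts on its own observation alone. The NumPy sketch below shows those two pieces. It is a generic illustration, not the paper's code. Treating each reflector's local observation as a small vector of user coordinates is an assumption.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate, negated for minimization; MAPPO applies it per agent."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    adv = np.asarray(advantages)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return float(-np.mean(np.minimum(unclipped, clipped)))

def critic_input(local_obs: list[np.ndarray]) -> np.ndarray:
    """Centralized training: the value network sees all agents' observations."""
    return np.concatenate(local_obs)

def actor_input(local_obs: list[np.ndarray], agent_id: int) -> np.ndarray:
    """Decentralized execution: each actor sees only its own observation."""
    return local_obs[agent_id]

# Hypothetical local observations for three reflector agents (e.g., x, y of a user).
obs = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
print(critic_input(obs).shape, actor_input(obs, 1))  # (6,) [3. 4.]

# Probability ratio 2 exceeds 1 + clip_eps, so a positive advantage is credited
# only up to 1.2, while a negative advantage is penalized at the full ratio.
print(ppo_clip_loss(np.log([2.0]), np.log([1.0]), [1.0]))
print(ppo_clip_loss(np.log([2.0]), np.log([1.0]), [-1.0]))
```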

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a CSI-free hierarchical multi-agent reinforcement learning (HMARL) framework for controlling mechanically reconfigurable intelligent surfaces (RIS) in mmWave networks. It substitutes pilot-based CSI estimation with user localization data, decomposes control into a high-level discrete user-to-reflector allocator and low-level continuous focal-point optimizers trained with MAPPO under CTDE, and reports up to 7.79 dB RSSI gains over centralized baselines in deterministic ray-tracing simulations, along with robustness to sub-meter localization errors and multi-user scalability.

Significance. If the reported gains are shown to arise from the hierarchical decomposition under identical information constraints, the work would provide a concrete, scalable alternative to CSI-dependent RIS control, addressing both estimation overhead and dimensionality issues in large deployments. The localization-based paradigm and two-tier architecture represent a practical step toward intelligent radio environments.

major comments (2)
  1. [Abstract] The central claim of 'massive RSSI improvements of up to 7.79 dB over centralized baselines' is load-bearing for the paper's contribution. Yet the abstract (and presumably the evaluation section) does not say whether the centralized baselines receive only equivalent localization data, or instead assume perfect CSI or use non-RL methods that do not face the dimensionality constraints the paper criticizes. Without this, the numerical delta cannot be attributed specifically to the HMARL decomposition rather than to unequal problem difficulty.
  2. [Evaluation] The manuscript must supply concrete details on baseline implementations, exact ray-tracing setups, statistical significance of the 7.79 dB figure, error bars or variance across runs, and whether baselines were given the same optimization budget and localization inputs. These elements are required to substantiate the performance claims and the assertion of 'robust multi-user scalability.'
minor comments (2)
  1. [Abstract] The abstract could more precisely state the number of reflectors, users, and frequency bands used in the ray-tracing experiments to allow immediate assessment of scale.
  2. [Method] Notation for the high-level and low-level policies (e.g., how the discrete allocation output interfaces with the continuous focal-point actions) should be introduced earlier and used consistently.
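Major comment 2 asks for variance across runs and significance testing. One minimal way to report this assumes each method is evaluated on the same seeds or user placements, so runs can be paired. Compute per-run gains in dB, each the difference of two dBm readings. Then report their mean, sample standard deviation, standard error and paired t statistic; `scipy.stats.ttest_rel` would supply the corresponding p-value. The function name and output keys below are illustrative.

```python
import math
import statistics

def paired_gain_summary(rssi_method_dbm: list[float], rssi_baseline_dbm: list[float]) -> dict:
    """Summarize per-run RSSI gains (dB) of a method over a baseline evaluated
    on the same runs. t is the paired t statistic with n - 1 degrees of freedom."""
    if len(rssi_method_dbm) != len(rssi_baseline_dbm) or len(rssi_method_dbm) < 2:
        raise ValueError("need at least two paired runs of equal length")
    gains = [m - b for m, b in zip(rssi_method_dbm, rssi_baseline_dbm)]
    mean = statistics.mean(gains)
    sd = statistics.stdev(gains)  # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(len(gains))
    t = mean / se if se > 0 else float("nan")  # undefined when every gain is identical
    return {"n": len(gains), "mean_db": mean, "std_db": sd,
            "se_db": se, "t": t, "max_db": max(gains)}
```

Reporting `max_db` next to `mean_db` also makes explicit whether a headline number such as 7.79 dB is a best case or a typical one.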

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications on the baseline comparisons and commitments to expand the evaluation details in the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim of 'massive RSSI improvements of up to 7.79 dB over centralized baselines' is load-bearing for the paper's contribution, yet the abstract (and presumably the evaluation section) provides no information on whether the centralized baselines receive equivalent localization data only, or instead assume perfect CSI or employ non-RL methods that do not face the same dimensionality constraints the paper criticizes. Without this, the numerical delta cannot be attributed specifically to the HMARL decomposition rather than unequal problem difficulty.

    Authors: We appreciate the referee highlighting the need for explicit clarification. All methods in our evaluations, including the centralized baselines, operate strictly under the CSI-free paradigm and receive only user localization data as input; none assume perfect CSI. The centralized baselines are implemented as non-hierarchical RL agents (standard MAPPO without the high-level discrete allocator) that directly optimize reflector parameters using identical localization inputs and face the same dimensionality issues. We will revise both the abstract and evaluation section to state explicitly that all comparisons use equivalent information constraints, allowing the reported gains to be attributed to the hierarchical decomposition. revision: yes

  2. Referee: [Evaluation] The manuscript must supply concrete details on baseline implementations, exact ray-tracing setups, statistical significance of the 7.79 dB figure, error bars or variance across runs, and whether baselines were given the same optimization budget and localization inputs. These elements are required to substantiate the performance claims and the assertion of 'robust multi-user scalability.'

    Authors: We agree that these details are essential for rigor. In the revised manuscript we will expand the evaluation section to include: precise descriptions of baseline architectures and training procedures; full ray-tracing parameters (environment geometry, material properties, and propagation settings); statistical analysis with mean RSSI gains, standard deviations across independent runs, error bars on figures, and significance testing for the 7.79 dB result; and explicit confirmation that every method received identical localization inputs and the same optimization budget. These additions will directly support the scalability and robustness claims. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

Full rationale

The paper applies established RL methods (MAPPO under CTDE) to RIS beam control using localization data in place of CSI. Performance is reported via external deterministic ray-tracing benchmarks against centralized baselines. No equations, self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations reduce any claim to its own inputs by construction. The hierarchical decomposition and RSSI gains are presented as empirical outcomes, not tautological derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claim depends on domain assumption that localization data suffices for wave management and on standard RL training assumptions; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption User localization data provides sufficient spatial intelligence to substitute for CSI in macro-scale wave propagation management
    Explicitly stated as the core substitution enabling the CSI-free paradigm.
  • domain assumption Deterministic ray-tracing simulations accurately model real-world RIS performance and multi-user interactions
    Underpins all reported RSSI improvements and robustness claims.


