pith. machine review for the scientific record.

arxiv: 2604.17201 · v1 · submitted 2026-04-19 · 📡 eess.SP

Robust Resource Allocation in RIS-Assisted Wireless Networks Integrating NOMA and Over-the-Air Federated Learning

Pith reviewed 2026-05-10 06:28 UTC · model grok-4.3

classification 📡 eess.SP
keywords reconfigurable intelligent surface · non-orthogonal multiple access · over-the-air federated learning · deep reinforcement learning · resource allocation · imperfect channel state information · successive interference cancellation

The pith

LSTM-DDPG achieves faster convergence and lower variance than standard deep reinforcement learning for resource allocation in RIS-assisted NOMA-AirFL networks under channel uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a wireless network that uses reconfigurable intelligent surfaces to support simultaneous non-orthogonal multiple access for data transmission and over-the-air federated learning for distributed model training. It formulates a joint optimization problem that minimizes the optimality gap while handling co-channel interference, imperfect channel state information, and successive interference cancellation errors. The resulting non-convex problem is recast as a Markov decision process and solved with an LSTM-DDPG algorithm that retains memory of past states to improve decision making over time. Simulations indicate that this approach converges more quickly, shows less performance variation, and maintains better robustness to channel inaccuracies than baseline algorithms such as DDPG, SAC, and A2C. A sympathetic reader would care because it points to a concrete method for using limited spectrum to run both communication and learning tasks at once.

Core claim

The central claim is that reformulating the joint resource allocation problem for RIS phase shifts and power allocation in a network serving both NOMA communication users and AirFL learning users as a Markov decision process, then solving it with an LSTM-enhanced deep deterministic policy gradient algorithm, reduces the optimality gap more effectively than standard deep reinforcement learning methods when imperfect channel state information and successive interference cancellation errors are present.

What carries the argument

The LSTM-DDPG algorithm, which augments the deep deterministic policy gradient with long short-term memory to retain temporal information when learning policies for power allocation and RIS phase shifts in the Markov decision process.
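
For orientation, the usual DDPG updates with a recurrent actor can be written as below. This is the textbook form, not equations quoted from the paper. The symbols are assumptions introduced here: $o_t$ is the observation, $(h_t, c_t)$ the LSTM hidden and cell state, $n_t$ exploration noise, $\theta, \phi$ the actor and critic parameters, primes the target networks, $\gamma$ the discount factor, and $\tau$ the soft-update rate.

```latex
\begin{align}
(h_t, c_t) &= \mathrm{LSTM}(o_t, h_{t-1}, c_{t-1}), \\
a_t &= \mu_\theta(h_t) + n_t, \\
y_t &= r_t + \gamma\, Q_{\phi'}\big(h_{t+1}, \mu_{\theta'}(h_{t+1})\big), \\
L(\phi) &= \mathbb{E}\big[(Q_\phi(h_t, a_t) - y_t)^2\big], \\
\nabla_\theta J &\approx \mathbb{E}\big[\nabla_a Q_\phi(h_t, a)\big|_{a=\mu_\theta(h_t)}\, \nabla_\theta \mu_\theta(h_t)\big], \\
\theta' &\leftarrow \tau\theta + (1-\tau)\theta', \quad \phi' \leftarrow \tau\phi + (1-\tau)\phi'.
\end{align}
```

The only change from plain DDPG in this sketch is that the actor and critic condition on $h_t$, which summarises past observations, rather than on the current observation alone.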

Load-bearing premise

The Markov decision process formulation accurately captures the interactions between NOMA users, AirFL users, imperfect channel state information, and successive interference cancellation errors.
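
To make the two impairments concrete, here is a minimal sketch of one common way to model them. It is not taken from the paper. The assumptions are an additive complex Gaussian CSI error with variance `sigma_e**2`, and an imperfect-SIC model in which a fraction `xi` of every already-decoded signal remains as interference. The function names and parameters are hypothetical.

```python
import math
import random

def noisy_csi(h_true, sigma_e, rng):
    """Return channel estimates: true channel plus a CN(0, sigma_e^2) error (assumed model)."""
    s = sigma_e / math.sqrt(2.0)  # split the variance across real and imaginary parts
    return [h + complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for h in h_true]

def uplink_noma_sinr(rx_powers, noise_power, xi):
    """SINR per user under imperfect SIC (assumed model).

    rx_powers: received powers p_k * |h_k|^2, listed in decoding order.
    xi: residual fraction of each decoded signal, 0 = perfect SIC, 1 = no cancellation.
    """
    sinrs = []
    for k, p in enumerate(rx_powers):
        residual = xi * sum(rx_powers[:k])   # leftovers from users decoded earlier
        pending = sum(rx_powers[k + 1:])     # users that have not been decoded yet
        sinrs.append(p / (residual + pending + noise_power))
    return sinrs

# Usage: three users decoded strongest first, with half of each decoded signal left over.
print(uplink_noma_sinr([4.0, 2.0, 1.0], noise_power=1.0, xi=0.5))
```

With `xi = 0` the last-decoded user sees only noise; raising `xi` degrades every user after the first, which is the coupling the MDP reward would need to reflect.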

What would settle it

A real-world deployment on a hardware testbed with measured channels and actual RIS hardware would settle it: the claim fails if the learned LSTM-DDPG policy does not converge faster than the DDPG, SAC, and A2C baselines, or if it exhibits higher variance than they do.
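
One way such a comparison could be scored is sketched below. The definitions are assumptions chosen for illustration, not the paper's metrics: convergence is the first episode after which the moving-average reward stays within a relative tolerance of its final value, and variance is taken across random seeds over the last few episodes.

```python
from statistics import mean, pvariance

def moving_average(xs, window):
    """Trailing moving average; entry i covers xs[i : i + window]."""
    return [mean(xs[i - window + 1:i + 1]) for i in range(window - 1, len(xs))]

def convergence_episode(rewards, window=10, tol=0.05):
    """First episode from which the moving average stays within tol of its final value."""
    if len(rewards) < window:
        raise ValueError("need at least `window` episodes")
    ma = moving_average(rewards, window)
    final = ma[-1]
    band = tol * abs(final) if final != 0 else tol
    for i in range(len(ma)):
        if all(abs(v - final) <= band for v in ma[i:]):
            return i + window - 1  # convert moving-average index to episode index

def across_seed_variance(runs, last_n=10):
    """Population variance, across seeds, of the mean reward over the last last_n episodes."""
    return pvariance([mean(r[-last_n:]) for r in runs])

# Usage: a reward curve that jumps to its final level at episode 4.
curve = [0, 0, 0, 0, 10, 10, 10, 10, 10, 10]
print(convergence_episode(curve, window=2))
```

Applying the same two functions to every algorithm's reward curves gives a like-for-like check of the "faster convergence, lower variance" claim.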

Figures

Figures reproduced from arXiv: 2604.17201 by Ghosheh Abed Hodtani, Gongpu Wang, Ji Wang, Ming Zeng, Mohsen Ahmadzadeh, Saeid Pakravan, Xingwang Li.

Figure 1. Illustration of the multi-RIS-assisted system integrating
Figure 3. Average reward versus different number of AirFL users.
Figure 4. Average reward versus power budgets of users.
Figure 2. Comparison of DRL algorithms based on the average
Figure 5. Average reward versus the CSI imperfection.
Figure 6. Average reward versus the SIC imperfection.
Figure 8. The learning performance over different communication rounds with MNIST dataset.
Original abstract

This paper addresses the critical issue of spectrum scarcity and the need to support diverse services, including communication and learning tasks, by presenting a reconfigurable intelligent surface (RIS)-aided wireless network framework that integrates non-orthogonal multiple access (NOMA) with over-the-air federated learning (AirFL). The proposed system leverages the ability of RIS to adaptively shape wireless channels, aiming to enhance overall network performance for both communication and learning through concurrent uplink transmissions. To tackle critical challenges such as co-channel interference, imperfect channel state information (CSI), and successive interference cancellation (SIC), we develop an optimization framework that focuses on minimizing the optimality gap. This joint optimization is formulated as a non-convex problem, complicated by the intricate interactions between NOMA and AirFL users as well as the impact of imperfect CSI and SIC. To overcome these challenges and reduce the optimality gap, we reformulate the optimization problem as a Markov decision process and solve it using a long short-term memory deep deterministic policy gradient (LSTM-DDPG) algorithm, a memory-based approach within deep reinforcement learning (DRL). Simulation results demonstrate that the proposed approach achieves faster convergence, lower variance, and improved robustness under channel uncertainty, outperforming baseline DRL algorithms such as DDPG, soft actor-critic (SAC), and advantage actor-critic (A2C).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. This manuscript proposes a RIS-assisted wireless network integrating NOMA and over-the-air federated learning (AirFL) to support concurrent communication and learning tasks. It formulates a joint non-convex optimization problem to minimize the optimality gap under co-channel interference, imperfect CSI, and SIC errors, then reformulates it as a Markov decision process solved via an LSTM-DDPG algorithm. Simulations claim faster convergence, lower variance, and improved robustness compared to DDPG, SAC, and A2C baselines.

Significance. If the MDP formulation is shown to faithfully capture the system interactions, the work could offer a practical DRL-based method for resource allocation in integrated RIS-NOMA-AirFL systems, addressing spectrum scarcity while handling real-world impairments like channel uncertainty.

major comments (2)
  1. [Abstract] Abstract (optimization framework and MDP reformulation): The claim that LSTM-DDPG reduces the optimality gap rests on the assumption that the MDP reward computed in simulation equals the true gap. No derivation or bound is supplied demonstrating that this holds when underlying channel statistics or SIC error rates differ from the training ensemble; this directly undermines the reported robustness under channel uncertainty.
  2. [Abstract] Abstract (simulation results): The outperformance in convergence and variance over DDPG/SAC/A2C is presented without evidence that the MDP state/action/reward definitions accurately encode the coupled effects of NOMA user interference, AirFL aggregation, imperfect CSI estimation, and SIC decoding errors, making the headline performance claims dependent on unverified model fidelity.
minor comments (1)
  1. [Abstract] The abstract could specify key simulation parameters (e.g., number of NOMA/AirFL users, RIS elements, or CSI error variance) to aid reproducibility of the reported convergence curves.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's comments highlighting the need for stronger validation of the MDP formulation and its connection to the optimality gap. Below, we respond to each major comment and describe the planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract (optimization framework and MDP reformulation): The claim that LSTM-DDPG reduces the optimality gap rests on the assumption that the MDP reward computed in simulation equals the true gap. No derivation or bound is supplied demonstrating that this holds when underlying channel statistics or SIC error rates differ from the training ensemble; this directly undermines the reported robustness under channel uncertainty.

    Authors: We agree that a formal bound relating the simulated MDP reward to the true optimality gap under shifts in channel statistics or SIC error rates would provide stronger guarantees. The MDP is constructed directly from the system model in Section III, with the reward defined as the negative of the optimality gap expression that incorporates imperfect CSI and SIC errors, and the state including CSI estimates and error parameters. Simulations already test robustness by evaluating policies under channel distributions and SIC rates outside the training ensemble. In the revision we will add a dedicated discussion subsection deriving the MDP components from the optimization problem and presenting additional empirical results under mismatched conditions to support the robustness claims. revision: partial

  2. Referee: [Abstract] Abstract (simulation results): The outperformance in convergence and variance over DDPG/SAC/A2C is presented without evidence that the MDP state/action/reward definitions accurately encode the coupled effects of NOMA user interference, AirFL aggregation, imperfect CSI estimation, and SIC decoding errors, making the headline performance claims dependent on unverified model fidelity.

    Authors: The MDP definitions are intended to encode these couplings explicitly: the state comprises the estimated CSI vectors for all NOMA and AirFL users; the action space consists of joint power allocation coefficients and RIS phase shifts; and the reward is computed from the closed-form optimality gap that includes NOMA interference terms, SIC error propagation, and AirFL over-the-air aggregation noise. We will revise the manuscript to include a new subsection with explicit equations mapping each system effect to the MDP elements, together with sensitivity analysis and ablation results that isolate the impact of each impairment on learning performance. revision: yes
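
The action and reward described in the responses above can be sketched as follows. This is a minimal illustration under stated assumptions: the actor emits values in [-1, 1], the first `n_users` entries are mapped to transmit powers and the remainder to RIS phase shifts, and the reward is the negative of an optimality gap computed elsewhere. The layout, the names `decode_action` and `reward`, and the scaling are hypothetical.

```python
import math

def decode_action(raw, n_users, p_max):
    """Map actor outputs in [-1, 1] to (powers, phases).

    Powers land in [0, p_max]; phases land in [0, 2*pi).
    Out-of-range inputs are clipped first.
    """
    clipped = [max(-1.0, min(1.0, x)) for x in raw]
    powers = [0.5 * (x + 1.0) * p_max for x in clipped[:n_users]]
    phases = [(math.pi * (x + 1.0)) % (2.0 * math.pi) for x in clipped[n_users:]]
    return powers, phases

def reward(optimality_gap):
    """Reward is the negative gap, so maximising reward minimises the gap."""
    return -optimality_gap

# Usage: two users and two RIS elements.
powers, phases = decode_action([0.0, 2.0, -1.0, 0.0], n_users=2, p_max=4.0)
print(powers, phases)
```

Keeping the actor output bounded and rescaling it outside the network is a common choice for DDPG-style methods, since the power budget and the phase range are then satisfied by construction.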

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained empirical simulation

full rationale

The paper formulates a joint non-convex optimization for RIS-NOMA-AirFL resource allocation to minimize optimality gap under imperfect CSI and SIC errors, then reformulates it as an MDP solved via LSTM-DDPG. Reported results are simulation-based comparisons of convergence speed, variance, and robustness against DDPG/SAC/A2C baselines. No quoted equations, self-citations, or steps reduce the central performance claims by construction to fitted inputs or prior author results; the MDP reward and policy learning are independent of the final comparative metrics, and no uniqueness theorem or ansatz is smuggled in.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard wireless channel models, the ability of DRL to solve the non-convex joint optimization, and simulation-based validation; no new physical entities are introduced.

free parameters (1)
  • LSTM-DDPG hyperparameters
    Learning rate, memory size, and exploration parameters are chosen to achieve the reported convergence and must be tuned for the specific MDP.
axioms (1)
  • domain assumption: Standard models for RIS phase shifts, NOMA power allocation, and imperfect CSI estimation errors hold.
    Invoked when formulating the optimality gap and the MDP state space.
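
As an illustration of what a standard RIS phase-shift model usually means, the sketch below computes the effective channel for a single-antenna user and a single-antenna base station as the direct link plus a sum of reflected paths, each multiplied by a unit-modulus coefficient. The single-antenna setup and all names are assumptions made here, not details from the paper.

```python
import cmath

def effective_channel(h_direct, h_ris_to_bs, h_user_to_ris, phases):
    """Direct link plus RIS-reflected link with unit-modulus phase shifts."""
    reflected = sum(
        g * cmath.exp(1j * theta) * f
        for g, theta, f in zip(h_ris_to_bs, phases, h_user_to_ris)
    )
    return h_direct + reflected

def align_phases(h_direct, h_ris_to_bs, h_user_to_ris):
    """Phases that co-phase every reflected path with the direct path."""
    ref = cmath.phase(h_direct)
    return [ref - cmath.phase(g * f) for g, f in zip(h_ris_to_bs, h_user_to_ris)]

# Usage: two RIS elements whose reflections cancel each other at zero phase shift.
h_d, g, f = 1 + 0j, [1j, -1 + 0j], [1 + 0j, 1j]
print(abs(effective_channel(h_d, g, f, [0.0, 0.0])))
print(abs(effective_channel(h_d, g, f, align_phases(h_d, g, f))))
```

Co-phasing is the single-user optimum only when the channels are known exactly; with estimation errors the aligned phases are computed from estimates, which is where the imperfect-CSI assumption enters.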

pith-pipeline@v0.9.0 · 5568 in / 1246 out tokens · 34023 ms · 2026-05-10T06:28:44.975365+00:00 · methodology

