Robust Resource Allocation in RIS-Assisted Wireless Networks Integrating NOMA and Over-the-Air Federated Learning
Pith reviewed 2026-05-10 06:28 UTC · model grok-4.3
The pith
LSTM-DDPG achieves faster convergence and lower variance than standard deep reinforcement learning for resource allocation in RIS-assisted NOMA-AirFL networks under channel uncertainty.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the joint allocation of RIS phase shifts and transmit power, in a network serving both NOMA communication users and AirFL learning users, can be recast as a Markov decision process and solved with an LSTM-enhanced deep deterministic policy gradient (LSTM-DDPG) algorithm. Under imperfect channel state information and successive interference cancellation errors, this approach is claimed to reduce the optimality gap more effectively than standard deep reinforcement learning methods.
What carries the argument
The LSTM-DDPG algorithm, which augments the deep deterministic policy gradient with long short-term memory to retain temporal information when learning policies for power allocation and RIS phase shifts in the Markov decision process.
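To illustrate what "retaining temporal information" means here, the following is a minimal sketch of one step of a scalar LSTM cell in plain Python. It is not the paper's network: the gate weights, the scalar input, and the names `lstm_step` and `weights` are assumptions chosen for illustration only. The point is that the cell state `c` still carries a trace of an early observation after later inputs are zero.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    w maps a gate name ("f", "i", "o", "g") to (w_x, w_h, b).
    All weights are hypothetical; a real actor or critic would use vectors.
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))      # forget gate: how much old memory to keep
    i = sigmoid(pre("i"))      # input gate: how much new content to write
    o = sigmoid(pre("o"))      # output gate: how much memory to expose
    g = math.tanh(pre("g"))    # candidate content
    c = f * c_prev + i * g     # updated cell state (the memory)
    h = o * math.tanh(c)       # hidden state passed to the next layer
    return h, c


# Hypothetical weights, identical for every gate.
weights = {gate: (0.5, 0.5, 0.0) for gate in "fiog"}

h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0]:      # one informative observation, then silence
    h, c = lstm_step(x, h, c, weights)
```

In an LSTM-DDPG setup, a recurrent layer of this kind would sit inside the actor and critic so that the policy can condition on a history of channel observations rather than a single snapshot.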
Load-bearing premise
The Markov decision process formulation accurately captures the interactions between NOMA users, AirFL users, imperfect channel state information, and successive interference cancellation errors.
What would settle it
A real-world deployment on a hardware testbed, with measured channels and actual RIS hardware, in which the learned LSTM-DDPG policy fails to converge faster than the DDPG, SAC, and A2C baselines, or exhibits higher variance than they do, would count against the claim.
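One way such a comparison could be scored is sketched below. The two metrics (episodes until a moving average reaches a threshold, and across-seed variance of the final reward) are assumptions about how "faster convergence" and "lower variance" might be quantified; the paper's own definitions are not given here. The reward curves are toy numbers for illustration, not results from the paper.

```python
import statistics


def episodes_to_threshold(rewards, threshold, window=3):
    """Index of the first episode whose trailing moving average reaches threshold.

    Returns None if the threshold is never reached.
    """
    for end in range(window, len(rewards) + 1):
        if statistics.fmean(rewards[end - window:end]) >= threshold:
            return end - 1
    return None


def final_variance(curves, tail=2):
    """Sample variance across seeds of each run's mean reward over its last `tail` episodes."""
    finals = [statistics.fmean(curve[-tail:]) for curve in curves]
    return statistics.variance(finals)


# Toy reward curves, two seeds per algorithm (illustrative only).
runs_a = [[-5, -3, -1, -1, -1], [-5, -2, -1, -1, -1]]
runs_b = [[-5, -4, -3, -2, -1], [-5, -5, -4, -3, -3]]

speed_a = episodes_to_threshold(runs_a[0], threshold=-2)
speed_b = episodes_to_threshold(runs_b[0], threshold=-2)
spread_a = final_variance(runs_a)
spread_b = final_variance(runs_b)
```

Reporting both numbers per algorithm, over the same seeds and the same threshold, keeps the convergence and variance claims separately checkable.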
Original abstract
This paper addresses the critical issue of spectrum scarcity and the need to support diverse services, including communication and learning tasks, by presenting a reconfigurable intelligent surface (RIS)-aided wireless network framework that integrates non-orthogonal multiple access (NOMA) with over-the-air federated learning (AirFL). The proposed system leverages the ability of RIS to adaptively shape wireless channels, aiming to enhance overall network performance for both communication and learning through concurrent uplink transmissions. To tackle critical challenges such as co-channel interference, imperfect channel state information (CSI), and successive interference cancellation (SIC), we develop an optimization framework that focuses on minimizing the optimality gap. This joint optimization is formulated as a non-convex problem, complicated by the intricate interactions between NOMA and AirFL users as well as the impact of imperfect CSI and SIC. To overcome these challenges and reduce the optimality gap, we reformulate the optimization problem as a Markov decision process and solve it using a long short-term memory deep deterministic policy gradient (LSTM-DDPG) algorithm, a memory-based approach within deep reinforcement learning (DRL). Simulation results demonstrate that the proposed approach achieves faster convergence, lower variance, and improved robustness under channel uncertainty, outperforming baseline DRL algorithms such as DDPG, soft actor-critic (SAC), and advantage actor-critic (A2C).
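The effect of imperfect SIC on uplink NOMA users can be sketched with a commonly used residual-interference model. This is an assumption for illustration, not the paper's exact expression: users are decoded in list order, and a fraction `sic_residual` of each already-decoded user's received power is assumed to remain as interference. The function name and the numeric values are hypothetical.

```python
def uplink_sinr(gains, powers, noise, sic_residual):
    """Per-user SINR for uplink NOMA with users decoded in list order.

    gains:        channel power gains |h_k|^2
    powers:       transmit powers p_k
    noise:        receiver noise power
    sic_residual: fraction in [0, 1] of an already-decoded user's received
                  power left over as interference (0 means perfect SIC)
    """
    received = [p * g for p, g in zip(powers, gains)]
    sinrs = []
    for k, signal in enumerate(received):
        residual = sic_residual * sum(received[:k])   # imperfectly cancelled users
        pending = sum(received[k + 1:])               # users not yet decoded
        sinrs.append(signal / (residual + pending + noise))
    return sinrs


# Two users, strong user decoded first (illustrative values).
perfect = uplink_sinr([4.0, 1.0], [1.0, 1.0], noise=1.0, sic_residual=0.0)
imperfect = uplink_sinr([4.0, 1.0], [1.0, 1.0], noise=1.0, sic_residual=0.5)
```

Under this model the first-decoded user is unaffected by the SIC error, while every later user sees additional interference, which is one reason power allocation and decoding quality are coupled.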
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This manuscript proposes a RIS-assisted wireless network integrating NOMA and over-the-air federated learning (AirFL) to support concurrent communication and learning tasks. It formulates a joint non-convex optimization problem to minimize the optimality gap under co-channel interference, imperfect CSI, and SIC errors, then reformulates it as a Markov decision process solved via an LSTM-DDPG algorithm. Simulations claim faster convergence, lower variance, and improved robustness compared to DDPG, SAC, and A2C baselines.
Significance. If the MDP formulation is shown to faithfully capture the system interactions, the work could offer a practical DRL-based method for resource allocation in integrated RIS-NOMA-AirFL systems, addressing spectrum scarcity while handling real-world impairments like channel uncertainty.
major comments (2)
- [Abstract] Abstract (optimization framework and MDP reformulation): The claim that LSTM-DDPG reduces the optimality gap rests on the assumption that the MDP reward computed in simulation equals the true gap. No derivation or bound is supplied demonstrating that this holds when underlying channel statistics or SIC error rates differ from the training ensemble; this directly undermines the reported robustness under channel uncertainty.
- [Abstract] Abstract (simulation results): The outperformance in convergence and variance over DDPG/SAC/A2C is presented without evidence that the MDP state/action/reward definitions accurately encode the coupled effects of NOMA user interference, AirFL aggregation, imperfect CSI estimation, and SIC decoding errors, making the headline performance claims dependent on unverified model fidelity.
minor comments (1)
- [Abstract] The abstract could specify key simulation parameters (e.g., number of NOMA/AirFL users, RIS elements, or CSI error variance) to aid reproducibility of the reported convergence curves.
Simulated Author's Rebuttal
We appreciate the referee's comments highlighting the need for stronger validation of the MDP formulation and its connection to the optimality gap. Below, we respond to each major comment and describe the planned revisions to the manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract (optimization framework and MDP reformulation): The claim that LSTM-DDPG reduces the optimality gap rests on the assumption that the MDP reward computed in simulation equals the true gap. No derivation or bound is supplied demonstrating that this holds when underlying channel statistics or SIC error rates differ from the training ensemble; this directly undermines the reported robustness under channel uncertainty.
Authors: We agree that a formal bound relating the simulated MDP reward to the true optimality gap under shifts in channel statistics or SIC error rates would provide stronger guarantees. The MDP is constructed directly from the system model in Section III, with the reward defined as the negative of the optimality gap expression that incorporates imperfect CSI and SIC errors, and the state including CSI estimates and error parameters. Simulations already test robustness by evaluating policies under channel distributions and SIC rates outside the training ensemble. In the revision we will add a dedicated discussion subsection deriving the MDP components from the optimization problem and presenting additional empirical results under mismatched conditions to support the robustness claims. revision: partial
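The mismatched conditions mentioned in this response could be generated along the following lines. The sketch assumes the common additive error model, in which the estimate equals the true channel plus a complex Gaussian error, and sweeps the error variance beyond a hypothetical training value. The training value `train_var`, the sweep points, and the function names are illustrative assumptions. A trained policy and the paper's gap expression are deliberately left out, since neither is specified here.

```python
import random


def complex_gaussian(rng, variance):
    """Draw one CN(0, variance) sample, splitting variance evenly over real and imaginary parts."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def estimation_mse(error_variance, num_samples=20000, seed=0):
    """Empirical mean squared CSI error under h_hat = h + e, e ~ CN(0, error_variance)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        h = complex_gaussian(rng, 1.0)                       # true channel (unit variance, assumed)
        h_hat = h + complex_gaussian(rng, error_variance)    # imperfect estimate
        total += abs(h_hat - h) ** 2
    return total / num_samples


train_var = 0.01                                             # hypothetical training-time error variance
sweep = {v: estimation_mse(v) for v in (train_var, 0.05, 0.1)}
```

In a robustness study, each sweep point would feed estimates of that quality to the frozen policy, and the resulting optimality gap would be reported against the training-condition gap.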
- Referee: [Abstract] Abstract (simulation results): The outperformance in convergence and variance over DDPG/SAC/A2C is presented without evidence that the MDP state/action/reward definitions accurately encode the coupled effects of NOMA user interference, AirFL aggregation, imperfect CSI estimation, and SIC decoding errors, making the headline performance claims dependent on unverified model fidelity.
Authors: The MDP definitions are intended to encode these couplings explicitly: the state comprises the estimated CSI vectors for all NOMA and AirFL users; the action space consists of joint power allocation coefficients and RIS phase shifts; and the reward is computed from the closed-form optimality gap that includes NOMA interference terms, SIC error propagation, and AirFL over-the-air aggregation noise. We will revise the manuscript to include a new subsection with explicit equations mapping each system effect to the MDP elements, together with sensitivity analysis and ablation results that isolate the impact of each impairment on learning performance. revision: yes
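The action space described in this response (joint power allocation and RIS phase shifts) might be decoded from a bounded actor output as sketched below. The tanh-style range of [-1, 1], the linear scaling, and the names `decode_action` and `p_max` are assumptions for illustration; the paper's actual parameterization is not stated here.

```python
import cmath
import math


def decode_action(raw, num_users, p_max):
    """Map actor outputs in [-1, 1] to transmit powers and RIS reflection coefficients.

    raw[:num_users]  -> powers in [0, p_max]
    raw[num_users:]  -> phase shifts in [0, 2*pi], one per RIS element
    """
    if any(not -1.0 <= a <= 1.0 for a in raw):
        raise ValueError("actor outputs must lie in [-1, 1]")
    powers = [0.5 * (a + 1.0) * p_max for a in raw[:num_users]]
    phases = [math.pi * (a + 1.0) for a in raw[num_users:]]
    # Unit-modulus constraint: each element only rotates the phase.
    ris = [cmath.exp(1j * theta) for theta in phases]
    return powers, phases, ris


# Two users and two RIS elements (illustrative values).
powers, phases, ris = decode_action([-1.0, 0.0, 1.0, 0.0], num_users=2, p_max=2.0)
```

Encoding the unit-modulus and power-budget constraints in the decoding step means the agent cannot emit an infeasible action, so the reward does not need a separate constraint penalty for them.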
Circularity Check
No significant circularity; derivation is self-contained empirical simulation
The paper formulates a joint non-convex optimization for RIS-NOMA-AirFL resource allocation to minimize optimality gap under imperfect CSI and SIC errors, then reformulates it as an MDP solved via LSTM-DDPG. Reported results are simulation-based comparisons of convergence speed, variance, and robustness against DDPG/SAC/A2C baselines. No quoted equations, self-citations, or steps reduce the central performance claims by construction to fitted inputs or prior author results; the MDP reward and policy learning are independent of the final comparative metrics, and no uniqueness theorem or ansatz is smuggled in.
Axiom & Free-Parameter Ledger
free parameters (1)
- LSTM-DDPG hyperparameters
axioms (1)
- domain assumption: Standard models for RIS phase shifts, NOMA power allocation, and imperfect CSI estimation errors hold.
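As a concrete instance of the standard RIS phase-shift model referred to above, the sketch below computes the effective channel of a single-antenna link as the direct path plus the sum of reflected paths, each rotated by its element's phase shift. The single-antenna setting, the function names, and the numeric channels are assumptions for illustration and are not taken from the paper.

```python
import cmath


def effective_channel(direct, user_to_ris, ris_to_bs, phases):
    """Direct path plus the RIS-reflected paths, with unit-modulus reflection coefficients."""
    coeffs = [cmath.exp(1j * p) for p in phases]
    reflected = sum(g * c * r for g, c, r in zip(ris_to_bs, coeffs, user_to_ris))
    return direct + reflected


def co_phase(direct, user_to_ris, ris_to_bs):
    """Phase shifts that align every reflected path with the direct path."""
    target = cmath.phase(direct)
    return [target - cmath.phase(g * r) for g, r in zip(ris_to_bs, user_to_ris)]


# One user, two RIS elements (illustrative values).
direct = 1 + 0j
user_to_ris = [1j, -1 + 0j]
ris_to_bs = [1 + 0j, 1j]

unaligned = effective_channel(direct, user_to_ris, ris_to_bs, [0.0, 0.0])
aligned = effective_channel(direct, user_to_ris, ris_to_bs,
                            co_phase(direct, user_to_ris, ris_to_bs))
```

With zero phase shifts the two reflected paths cancel each other in this example, while co-phasing makes all three paths add coherently. With several users sharing one RIS, no single phase vector aligns every user at once, which is part of why the joint problem is non-convex.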