arxiv: 2604.17201 · v1 · submitted 2026-04-19 · 📡 eess.SP

Robust Resource Allocation in RIS-Assisted Wireless Networks Integrating NOMA and Over-the-Air Federated Learning

Saeid Pakravan, Mohsen Ahmadzadeh, Ming Zeng, Ghosheh Abed Hodtani, Xingwang Li, Ji Wang, Gongpu Wang

Pith reviewed 2026-05-10 06:28 UTC · model grok-4.3

classification 📡 eess.SP

keywords reconfigurable intelligent surface · non-orthogonal multiple access · over-the-air federated learning · deep reinforcement learning · resource allocation · imperfect channel state information · successive interference cancellation

The pith

LSTM-DDPG achieves faster convergence and lower variance than standard deep reinforcement learning for resource allocation in RIS-assisted NOMA-AirFL networks under channel uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a wireless network that uses reconfigurable intelligent surfaces to support simultaneous non-orthogonal multiple access for data transmission and over-the-air federated learning for distributed model training. It formulates a joint optimization problem that minimizes the optimality gap while handling co-channel interference, imperfect channel state information, and successive interference cancellation errors. The resulting non-convex problem is recast as a Markov decision process and solved with an LSTM-DDPG algorithm that retains memory of past states to improve decision making over time. Simulations indicate that this approach converges more quickly, shows less performance variation, and maintains better robustness to channel inaccuracies than baseline algorithms such as DDPG, SAC, and A2C. A sympathetic reader would care because it points to a concrete method for using limited spectrum to run both communication and learning tasks at once.

Core claim

The central claim is that reformulating the joint resource allocation problem for RIS phase shifts and power allocation in a network serving both NOMA communication users and AirFL learning users as a Markov decision process, then solving it with an LSTM-enhanced deep deterministic policy gradient algorithm, reduces the optimality gap more effectively than standard deep reinforcement learning methods when imperfect channel state information and successive interference cancellation errors are present.
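To make the action space in this claim concrete, the sketch below shows one way a bounded actor output could be decoded into per-user transmit powers and unit-modulus RIS reflection coefficients. The function name `decode_action`, the [-1, 1] output range (as from a tanh layer), and the linear scaling are illustrative assumptions, not the paper's stated parameterization.

```python
import cmath
import math


def decode_action(raw, num_users, num_ris_elements, p_max):
    """Map a raw actor output in [-1, 1] to powers and RIS coefficients.

    Hypothetical layout (an assumption, not from the paper): the first
    `num_users` entries control transmit power, the remaining
    `num_ris_elements` entries control RIS phase shifts.
    """
    if len(raw) != num_users + num_ris_elements:
        raise ValueError("raw action has the wrong length")

    # Clip defensively so every power and phase stays in its valid range.
    clipped = [max(-1.0, min(1.0, a)) for a in raw]

    # [-1, 1] -> [0, p_max] for each user's transmit power.
    powers = [0.5 * (a + 1.0) * p_max for a in clipped[:num_users]]

    # [-1, 1] -> [0, 2*pi] for each phase, then a unit-modulus coefficient.
    phases = [math.pi * (a + 1.0) for a in clipped[num_users:]]
    reflection = [cmath.exp(1j * theta) for theta in phases]
    return powers, reflection
```

Because every reflection coefficient is built as `exp(j * theta)`, the unit-modulus constraint on the RIS holds by construction and never has to be penalized in the reward.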

What carries the argument

The LSTM-DDPG algorithm, which augments the deep deterministic policy gradient with long short-term memory to retain temporal information when learning policies for power allocation and RIS phase shifts in the Markov decision process.
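As a rough illustration of what "retaining temporal information" means, the sketch below implements the standard LSTM cell update for a hidden size of one and folds a sequence of observations into a single summary value that an actor or critic could consume. The scalar weights in the dictionary `p` and the helper `encode_history` are hypothetical; the paper's network sizes and wiring are not given in the text above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x, h_prev, c_prev, p):
    """One step of a scalar (hidden size 1) LSTM cell.

    `p` holds hypothetical scalar weights: w_* act on the input,
    u_* on the previous hidden state, b_* are biases.
    """
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate
    c = f * c_prev + i * g   # cell state carries information across steps
    h = o * math.tanh(c)     # hidden state exposed to the policy
    return h, c


def encode_history(observations, p):
    """Fold a sequence of scalar observations into one hidden value."""
    h, c = 0.0, 0.0
    for x in observations:
        h, c = lstm_cell_step(x, h, c, p)
    return h
```

In a DDPG-style agent the hidden value would be concatenated with, or used in place of, the current observation before the fully connected layers; that wiring is a design choice and is not specified here.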

Load-bearing premise

The Markov decision process formulation accurately captures the interactions between NOMA users, AirFL users, imperfect channel state information, and successive interference cancellation errors.
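One of the couplings this premise depends on, residual interference from imperfect SIC, is often modeled with a scalar residual factor. The sketch below uses that common model for an uplink NOMA receiver; the function name, the factor `sic_residual`, and the fixed decoding order are assumptions, and the paper's actual SINR expressions may differ.

```python
def uplink_noma_sinrs(rx_powers, sic_residual, noise_power):
    """SINR of each uplink NOMA user under imperfect SIC.

    rx_powers: received signal powers p_k * |h_k|^2, listed in decoding
        order (the first entry is decoded first).
    sic_residual: fraction in [0, 1] of each already-decoded signal that
        remains as interference (0 means perfect cancellation).
    """
    sinrs = []
    for k, signal in enumerate(rx_powers):
        # Users decoded later are still fully present as interference.
        not_yet_decoded = sum(rx_powers[k + 1:])
        # Users decoded earlier leave a residual after cancellation.
        residual = sic_residual * sum(rx_powers[:k])
        sinrs.append(signal / (not_yet_decoded + residual + noise_power))
    return sinrs
```

Setting `sic_residual` to zero recovers the ideal SIC case, which makes it easy to see how much of a reward change is attributable to the SIC impairment alone.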

What would settle it

The claim would be refuted if a deployment on a hardware testbed, with measured channels and actual RIS hardware, showed the learned LSTM-DDPG policy converging no faster than, or with higher variance than, the DDPG, SAC, and A2C baselines.
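Any such comparison needs "converges faster" and "higher variance" turned into numbers. The sketch below shows one possible pair of metrics computed from per-episode reward logs; the function names, the moving-average window, and the reward threshold are illustrative choices, not the definitions used in the paper.

```python
from statistics import mean, pvariance


def episodes_to_threshold(rewards, threshold, window=3):
    """Index of the first episode at which the moving average of the last
    `window` rewards reaches `threshold`, or None if it never does."""
    for end in range(window, len(rewards) + 1):
        if mean(rewards[end - window:end]) >= threshold:
            return end - 1
    return None


def final_variance_across_seeds(runs, tail=2):
    """Population variance, across independent seeds, of the mean reward
    over the last `tail` episodes of each run."""
    tails = [mean(run[-tail:]) for run in runs]
    return pvariance(tails)
```

A smaller `episodes_to_threshold` at the same threshold indicates faster convergence, and a smaller `final_variance_across_seeds` indicates more consistent final performance; both should be reported over the same seeds for every algorithm.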

Figures

Figures reproduced from arXiv: 2604.17201 by Ghosheh Abed Hodtani, Gongpu Wang, Ji Wang, Ming Zeng, Mohsen Ahmadzadeh, Saeid Pakravan, Xingwang Li.

**Figure 2.** Comparison of DRL algorithms based on the average

**Figure 3.** Average reward versus different number of AirFL users.

**Figure 4.** Average reward versus power budgets of users.

**Figure 5.** Average reward versus the CSI imperfection.

**Figure 6.** Average reward versus the SIC imperfection.

**Figure 8.** The learning performance over different communication rounds with MNIST dataset.

Original abstract

This paper addresses the critical issue of spectrum scarcity and the need to support diverse services, including communication and learning tasks, by presenting a reconfigurable intelligent surface (RIS)-aided wireless network framework that integrates non-orthogonal multiple access (NOMA) with over-the-air federated learning (AirFL). The proposed system leverages the ability of RIS to adaptively shape wireless channels, aiming to enhance overall network performance for both communication and learning through concurrent uplink transmissions. To tackle critical challenges such as co-channel interference, imperfect channel state information (CSI), and successive interference cancellation (SIC), we develop an optimization framework that focuses on minimizing the optimality gap. This joint optimization is formulated as a non-convex problem, complicated by the intricate interactions between NOMA and AirFL users as well as the impact of imperfect CSI and SIC. To overcome these challenges and reduce the optimality gap, we reformulate the optimization problem as a Markov decision process and solve it using a long short-term memory deep deterministic policy gradient (LSTM-DDPG) algorithm, a memory-based approach within deep reinforcement learning (DRL). Simulation results demonstrate that the proposed approach achieves faster convergence, lower variance, and improved robustness under channel uncertainty, outperforming baseline DRL algorithms such as DDPG, soft actor-critic (SAC), and advantage actor-critic (A2C).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This applies LSTM-DDPG to joint RIS-NOMA-AirFL allocation and reports simulation gains over baselines, but the robustness rests on unverified MDP fidelity for imperfect CSI and SIC.

The paper applies an LSTM-enhanced DDPG algorithm to resource allocation in a RIS-assisted network that combines NOMA communications with over-the-air federated learning. It claims faster convergence and better robustness in simulations compared to standard DRL methods. The new element is the joint handling of NOMA and AirFL users under one optimization framework that accounts for imperfect CSI and SIC errors. The authors reformulate the non-convex problem as an MDP and use the memory capabilities of LSTM to improve policy learning in this setting. This is a legitimate extension of existing DRL techniques to a more complex wireless scenario, and the simulations appear to demonstrate practical gains in convergence speed and variance reduction.

What the work does well is clearly defining the system with concurrent uplink transmissions and focusing on the optimality gap as the objective. The choice of LSTM-DDPG makes sense for capturing sequential dependencies in channel states or interference patterns.

The main soft spot is the reliance on simulation fidelity. The performance improvements depend on how accurately the MDP state, action, and reward capture the real effects of co-channel interference, CSI estimation errors, and SIC decoding failures. Without analytical guarantees or tests on mismatched channel distributions, the outperformance over DDPG, SAC, and A2C might not generalize. The abstract does not provide details on the exact reward formulation or hyperparameter tuning, which leaves some uncertainty about whether the baselines were given equivalent effort.

This paper is aimed at researchers in wireless networks and machine learning for communications, particularly those interested in 6G systems that support both data transmission and distributed training. A reader looking for applied DRL examples in integrated setups would find the results useful. It deserves a serious referee because the problem is well-motivated and the method is implemented with comparisons, even though the central claims need verification on the modeling assumptions. I would recommend sending this to peer review. The simulation results provide a starting point for discussion, and referees can push for more on robustness or alternative optimization approaches.

Referee Report

2 major / 1 minor

Summary. This manuscript proposes a RIS-assisted wireless network integrating NOMA and over-the-air federated learning (AirFL) to support concurrent communication and learning tasks. It formulates a joint non-convex optimization problem to minimize the optimality gap under co-channel interference, imperfect CSI, and SIC errors, then reformulates it as a Markov decision process solved via an LSTM-DDPG algorithm. Simulations claim faster convergence, lower variance, and improved robustness compared to DDPG, SAC, and A2C baselines.

Significance. If the MDP formulation is shown to faithfully capture the system interactions, the work could offer a practical DRL-based method for resource allocation in integrated RIS-NOMA-AirFL systems, addressing spectrum scarcity while handling real-world impairments like channel uncertainty.

major comments (2)

[Abstract] Abstract (optimization framework and MDP reformulation): The claim that LSTM-DDPG reduces the optimality gap rests on the assumption that the MDP reward computed in simulation equals the true gap. No derivation or bound is supplied demonstrating that this holds when underlying channel statistics or SIC error rates differ from the training ensemble; this directly undermines the reported robustness under channel uncertainty.
[Abstract] Abstract (simulation results): The outperformance in convergence and variance over DDPG/SAC/A2C is presented without evidence that the MDP state/action/reward definitions accurately encode the coupled effects of NOMA user interference, AirFL aggregation, imperfect CSI estimation, and SIC decoding errors, making the headline performance claims dependent on unverified model fidelity.

minor comments (1)

[Abstract] The abstract could specify key simulation parameters (e.g., number of NOMA/AirFL users, RIS elements, or CSI error variance) to aid reproducibility of the reported convergence curves.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We appreciate the referee's comments highlighting the need for stronger validation of the MDP formulation and its connection to the optimality gap. Below, we respond to each major comment and describe the planned revisions to the manuscript.

Referee: [Abstract] Abstract (optimization framework and MDP reformulation): The claim that LSTM-DDPG reduces the optimality gap rests on the assumption that the MDP reward computed in simulation equals the true gap. No derivation or bound is supplied demonstrating that this holds when underlying channel statistics or SIC error rates differ from the training ensemble; this directly undermines the reported robustness under channel uncertainty.

Authors: We agree that a formal bound relating the simulated MDP reward to the true optimality gap under shifts in channel statistics or SIC error rates would provide stronger guarantees. The MDP is constructed directly from the system model in Section III, with the reward defined as the negative of the optimality gap expression that incorporates imperfect CSI and SIC errors, and the state including CSI estimates and error parameters. Simulations already test robustness by evaluating policies under channel distributions and SIC rates outside the training ensemble. In the revision we will add a dedicated discussion subsection deriving the MDP components from the optimization problem and presenting additional empirical results under mismatched conditions to support the robustness claims.

revision: partial

Referee: [Abstract] Abstract (simulation results): The outperformance in convergence and variance over DDPG/SAC/A2C is presented without evidence that the MDP state/action/reward definitions accurately encode the coupled effects of NOMA user interference, AirFL aggregation, imperfect CSI estimation, and SIC decoding errors, making the headline performance claims dependent on unverified model fidelity.

Authors: The MDP definitions are intended to encode these couplings explicitly: the state comprises the estimated CSI vectors for all NOMA and AirFL users; the action space consists of joint power allocation coefficients and RIS phase shifts; and the reward is computed from the closed-form optimality gap that includes NOMA interference terms, SIC error propagation, and AirFL over-the-air aggregation noise. We will revise the manuscript to include a new subsection with explicit equations mapping each system effect to the MDP elements, together with sensitivity analysis and ablation results that isolate the impact of each impairment on learning performance.

revision: yes
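The mismatched-condition evaluation discussed in these responses could be organized as a sweep over CSI error variances that were not used in training. The sketch below is a minimal version under stated assumptions: the error is circularly symmetric complex Gaussian, and `policy_score` is a hypothetical callback standing in for whatever evaluates a trained policy on a set of true channels. Neither is taken from the paper.

```python
import random


def perturb_channel(h_hat, error_variance, rng):
    """Return true channels h = h_hat + e with e ~ CN(0, error_variance).

    Each of the real and imaginary parts carries half the variance.
    """
    std = (error_variance / 2.0) ** 0.5
    return [h + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
            for h in h_hat]


def sweep_error_variance(h_hat, policy_score, variances, trials=200, seed=0):
    """Average `policy_score` over random channel realizations for each
    CSI error variance in `variances`."""
    rng = random.Random(seed)  # fixed seed so the sweep is repeatable
    results = {}
    for var in variances:
        scores = [policy_score(perturb_channel(h_hat, var, rng))
                  for _ in range(trials)]
        results[var] = sum(scores) / trials
    return results
```

Running the same sweep for every algorithm, with variances both inside and outside the training range, would separate in-distribution performance from the robustness the referee asks about.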

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained empirical simulation

The paper formulates a joint non-convex optimization for RIS-NOMA-AirFL resource allocation to minimize optimality gap under imperfect CSI and SIC errors, then reformulates it as an MDP solved via LSTM-DDPG. Reported results are simulation-based comparisons of convergence speed, variance, and robustness against DDPG/SAC/A2C baselines. No quoted equations, self-citations, or steps reduce the central performance claims by construction to fitted inputs or prior author results; the MDP reward and policy learning are independent of the final comparative metrics, and no uniqueness theorem or ansatz is smuggled in.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard wireless channel models, the ability of DRL to solve the non-convex joint optimization, and simulation-based validation; no new physical entities are introduced.

free parameters (1)

LSTM-DDPG hyperparameters
Learning rate, memory size, and exploration parameters are chosen to achieve the reported convergence and must be tuned for the specific MDP.

axioms (1)

domain assumption: Standard models for RIS phase shifts, NOMA power allocation, and imperfect CSI estimation errors hold.
Invoked when formulating the optimality gap and the MDP state space.
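For orientation, the "standard models" referred to here are commonly written as follows. The notation is assumed for illustration and is not taken from the paper: $h_{d,k}$ is the direct channel of user $k$, $\mathbf{h}_{r,k}$ the user-to-RIS channel, $\mathbf{g}$ the RIS-to-base-station channel, $N$ the number of RIS elements, $\hat{h}_k$ the channel estimate, and $\sigma_e^2$ the estimation error variance.

```latex
% Effective channel of user k through an N-element RIS (assumed notation)
\[
  h_k = h_{d,k} + \mathbf{g}^{H} \boldsymbol{\Theta}\, \mathbf{h}_{r,k},
  \qquad
  \boldsymbol{\Theta} = \operatorname{diag}\!\left(e^{j\theta_1}, \dots, e^{j\theta_N}\right),
  \quad \theta_n \in [0, 2\pi)
\]

% Additive CSI estimation error model (assumed notation)
\[
  h_k = \hat{h}_k + e_k,
  \qquad
  e_k \sim \mathcal{CN}\!\left(0, \sigma_e^{2}\right)
\]
```

Under this model a larger $\sigma_e^2$ means the agent acts on estimates $\hat{h}_k$ that are further from the channels that actually determine the reward.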

pith-pipeline@v0.9.0 · 5568 in / 1246 out tokens · 34023 ms · 2026-05-10T06:28:44.975365+00:00 · methodology
