Energy Saving for Cell-Free Massive MIMO Networks: A Multi-Agent Deep Reinforcement Learning Approach
Pith reviewed 2026-05-10 17:32 UTC · model grok-4.3
The pith
Multi-agent reinforcement learning lets each access point in cell-free massive MIMO networks independently select antenna setups and sleep modes to cut power use.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The multi-agent DRL framework trains each access point as an independent agent that observes local traffic and channel conditions, then selects both the number of active antennas and an advanced sleep mode to minimize power draw while meeting service requirements. After training, the agents operate in a fully distributed manner. In simulation they achieve 56.23 percent lower power consumption than a baseline with no energy-saving scheme and 30.12 percent lower than a non-learning lightest-sleep-only policy, with only a modest rise in drop ratio.
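The two reported percentages can be put on a common scale with a short calculation. The sketch below assumes both figures come from the same simulation scenario, which this summary does not confirm; the function name is illustrative. Under that assumption, the lightest-sleep-only policy by itself would save roughly 37 percent relative to no energy-saving scheme.

```python
# Reported reductions, expressed as fractions of the respective baseline.
SAVING_VS_NO_SCHEME = 0.5623       # proposed vs. no energy-saving scheme
SAVING_VS_LIGHTEST_SLEEP = 0.3012  # proposed vs. lightest-sleep-only policy


def implied_baseline_saving(saving_vs_none: float, saving_vs_other: float) -> float:
    """Saving of the 'other' baseline relative to no scheme.

    Assumes both reported percentages refer to the same scenario.
    """
    # Power of the proposed scheme, with the no-scheme power normalised to 1.
    proposed = 1.0 - saving_vs_none
    # Power of the other baseline on the same normalised scale.
    other = proposed / (1.0 - saving_vs_other)
    return 1.0 - other


lightest_sleep_saving = implied_baseline_saving(
    SAVING_VS_NO_SCHEME, SAVING_VS_LIGHTEST_SLEEP
)
print(f"Implied lightest-sleep-only saving: {lightest_sleep_saving:.1%}")
```

Read this way, the learned policy adds a further saving on top of what the simplest sleep mode already delivers, again only if the two baselines share a scenario.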
What carries the argument
Multi-agent deep reinforcement learning in which each access point acts as a separate agent that chooses its own antenna reconfiguration and advanced sleep mode based on local observations of traffic and channels.
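As a rough illustration of the per-agent decision described above, the sketch below models one access point choosing a joint (antenna count, sleep mode) action from a local observation. Everything specific here is an assumption: the antenna options, the sleep-mode labels, the `APAgent` class, and the use of a plain lookup table in place of the deep network that the paper trains.

```python
import random
from dataclasses import dataclass
from itertools import product

# Hypothetical discrete choices; the actual antenna counts and sleep-mode
# set used in the paper are not given in this summary.
ANTENNA_OPTIONS = (4, 8, 16)
SLEEP_MODES = ("active", "sm1", "sm2", "sm3")

# Joint action space: one (antennas, sleep mode) pair per decision.
ACTIONS = tuple(product(ANTENNA_OPTIONS, SLEEP_MODES))


@dataclass
class APAgent:
    """One access point acting on its own local observation only."""

    q_values: dict        # maps (observation, action) -> estimated value
    epsilon: float = 0.0  # exploration rate; 0 gives purely greedy behaviour

    def act(self, observation, rng: random.Random):
        # Explore with probability epsilon, otherwise pick the best-valued action.
        if rng.random() < self.epsilon:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q_values.get((observation, a), 0.0))


# Illustrative use: a table that prefers deep sleep under low traffic.
agent = APAgent(q_values={(("low", "good"), (4, "sm3")): 1.0})
action = agent.act(("low", "good"), random.Random(0))
print(action)
```

Because each agent reads only its own observation and its own value estimates, no message exchange is needed at decision time, which is the property the distributed post-training operation relies on.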
If this is right
- Cell-free massive MIMO networks can reduce power consumption substantially while adapting to real-time traffic changes without central coordination.
- Distributed agent decisions eliminate the signaling overhead of centralized controllers.
- The same framework can be compared against other reinforcement learning variants to identify the best trade-off between power and service quality.
- Advanced sleep modes combined with antenna scaling provide a practical lever for energy management in dense deployments.
Where Pith is reading between the lines
- The distributed nature suggests the method could scale to very large numbers of access points where centralized training becomes impractical.
- Policies learned in one network layout might transfer to nearby layouts with similar traffic statistics, reducing the need for full retraining.
- Incorporating predicted future traffic into the agent observations could further lower the drop ratio without sacrificing power gains.
Load-bearing premise
The simulation model captures real-world dynamic traffic and channel variations closely enough that agents trained in simulation will maintain their performance when deployed without retraining.
What would settle it
Running the trained agents on live cell-free massive MIMO hardware under measured varying user loads and recording whether the achieved power savings and drop ratio remain within a few percentage points of the simulated figures.
Original abstract
This paper focuses on energy savings in downlink operation of cell-free massive MIMO (CF mMIMO) networks under dynamic traffic conditions. We propose a multi-agent deep reinforcement learning (MADRL) algorithm that enables each access point (AP) to autonomously control antenna re-configuration and advanced sleep mode (ASM) selection. After the training process, the proposed framework operates in a fully distributed manner, eliminating the need for centralized control and allowing each AP to dynamically adjust to real-time traffic fluctuations. Simulation results show that the proposed algorithm reduces power consumption (PC) by 56.23% compared to systems without any energy-saving scheme and by 30.12% relative to a non-learning mechanism that only utilizes the lightest sleep mode, with only a slight increase in drop ratio. Moreover, compared to the widely used deep Q-network (DQN) algorithm, it achieves a similar PC level but with a significantly lower drop ratio.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a multi-agent deep reinforcement learning (MADRL) algorithm for energy-efficient downlink operation in cell-free massive MIMO networks under dynamic traffic. Each access point autonomously selects antenna reconfigurations and advanced sleep modes (ASM) during training, then operates in a fully distributed fashion without central coordination. Simulations report that the approach reduces power consumption by 56.23% versus no energy-saving scheme and by 30.12% versus a non-learning lightest-sleep-mode baseline, with only a slight increase in drop ratio; it also matches DQN power levels while achieving a lower drop ratio.
Significance. If the simulation results are reproducible, the work demonstrates that MADRL can deliver substantial energy savings in CF mMIMO while preserving QoS under varying loads, with the distributed post-training operation offering clear scalability advantages over centralized schemes. The quantitative comparisons to both heuristic and single-agent RL baselines provide concrete evidence of practical benefit in the simulated regime.
major comments (2)
- Abstract and results section: the specific power-consumption reductions (56.23% and 30.12%) and drop-ratio claims are presented without accompanying details on the number of Monte Carlo runs, variance across seeds, statistical significance tests, or exact traffic/channel parameter ranges, which are required to assess whether the reported gains are robust rather than artifacts of a single simulation configuration.
- Simulation setup (presumed §IV): the traffic model and channel variation assumptions underlying the training and testing environments are not shown to guarantee generalization; the weakest assumption that agents perform well on unseen scenarios without retraining therefore remains untested by the current experimental design.
minor comments (2)
- The acronym ASM is used before its expansion; define it at first occurrence.
- Figure captions for the power-consumption and drop-ratio plots should explicitly state the number of independent runs and any error bars or confidence intervals.
Simulated Author's Rebuttal
We thank the referee for the positive recommendation of minor revision and the constructive comments on result robustness and generalization. We address each point below and will revise the manuscript accordingly.
- Referee: Abstract and results section: the specific power-consumption reductions (56.23% and 30.12%) and drop-ratio claims are presented without accompanying details on the number of Monte Carlo runs, variance across seeds, statistical significance tests, or exact traffic/channel parameter ranges, which are required to assess whether the reported gains are robust rather than artifacts of a single simulation configuration.
  Authors: We agree that these supporting details are essential for evaluating result robustness. In the revised manuscript we will expand the abstract and results section to report the number of Monte Carlo runs performed, variance or standard deviations across random seeds, outcomes of statistical significance tests against the baselines, and the precise ranges and distributions of traffic arrival rates and channel parameters used in the simulations.
  Revision: yes
- Referee: Simulation setup (presumed §IV): the traffic model and channel variation assumptions underlying the training and testing environments are not shown to guarantee generalization; the weakest assumption that agents perform well on unseen scenarios without retraining therefore remains untested by the current experimental design.
  Authors: The training process already exposes agents to a range of dynamic traffic loads and channel realizations within the modeled parameter space. We acknowledge, however, that explicit out-of-distribution testing on completely unseen traffic or channel statistics without retraining was not performed. In revision we will clarify the diversity of the training environment in §IV and add a dedicated discussion of generalization limits; we will also include any feasible additional experiments on modified test scenarios.
  Revision: partial
Circularity Check
No significant circularity in derivation chain
The paper proposes a MADRL algorithm for AP control in CF mMIMO and reports performance via simulations. The central claims (PC reductions of 56.23% and 30.12%) are direct numerical outputs from executing the trained policy under the stated traffic/channel models and baselines. No analytical derivation, prediction step, or uniqueness theorem is present that reduces to fitted inputs or self-citations by construction. RL training on simulated data is standard and does not create the enumerated circular patterns; the results remain falsifiable by re-running the described setup.