Energy Saving for Cell-Free Massive MIMO Networks: A Multi-Agent Deep Reinforcement Learning Approach

Cicek Cavdar; Keyu Li; Mustafa Ozger; Ozan Alp Topal; Özlem Tugfe Demir; Qichen Wang

arXiv: 2604.07133 · v1 · submitted 2026-04-08 · 💻 cs.IT · cs.AI · cs.LG · math.IT

Qichen Wang, Keyu Li, Ozan Alp Topal, Özlem Tugfe Demir, Mustafa Ozger, Cicek Cavdar

Pith reviewed 2026-05-10 17:32 UTC · model grok-4.3

classification 💻 cs.IT · cs.AI · cs.LG · math.IT

keywords: energy saving, cell-free massive MIMO, multi-agent deep reinforcement learning, advanced sleep mode, power consumption, distributed control, dynamic traffic

The pith

Multi-agent reinforcement learning lets each access point in cell-free massive MIMO networks independently select antenna setups and sleep modes to cut power use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a multi-agent deep reinforcement learning method for energy savings in the downlink of cell-free massive MIMO networks facing changing traffic loads. Each access point learns on its own to reconfigure active antennas and pick advanced sleep modes, running without any central controller once trained. Simulations indicate the approach lowers total power consumption by 56.23 percent versus no energy-saving actions and by 30.12 percent versus a simple lightest-sleep-mode rule, while the increase in user drop ratio stays small. It matches the power performance of standard deep Q-network methods yet produces a noticeably lower drop ratio.
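
The two headline percentages can be cross-checked against each other with simple arithmetic. The sketch below assumes, which the abstract does not state explicitly, that both reductions are measured on the same scenario. Under that assumption it backs out the saving that the lightest-sleep-only baseline would itself achieve relative to no energy saving. The function name `implied_baseline_reduction` is hypothetical, and the result is a derived estimate, not a figure reported in the paper.

```python
def implied_baseline_reduction(red_vs_none: float, red_vs_baseline: float) -> float:
    """Fractional saving of the baseline vs. no energy saving, implied by two reductions.

    If P_m = (1 - a) * P_none and P_m = (1 - b) * P_base, then
    P_base / P_none = (1 - a) / (1 - b).
    """
    if not (0.0 <= red_vs_none < 1.0 and 0.0 <= red_vs_baseline < 1.0):
        raise ValueError("reductions must be fractions in [0, 1)")
    return 1.0 - (1.0 - red_vs_none) / (1.0 - red_vs_baseline)


# Reductions quoted in the abstract: 56.23% and 30.12%.
# Assumption: both are measured on the same simulated scenario.
implied = implied_baseline_reduction(0.5623, 0.3012)
print(f"Implied lightest-sleep-only saving vs. no scheme: {implied:.2%}")
```

If the same-scenario assumption holds, the lightest-sleep rule alone would account for a saving of roughly 37 percent, with the learned policy supplying the remainder.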

Core claim

The multi-agent DRL framework trains each access point as an independent agent that observes local traffic and channel conditions, then selects both the number of active antennas and the appropriate advanced sleep mode to minimize power draw while meeting service requirements. After training, the agents operate in a fully distributed manner and achieve 56.23 percent lower power consumption than a baseline with no energy-saving scheme and 30.12 percent lower than a non-learning lightest-sleep-only policy, with only a modest rise in drop ratio.

What carries the argument

Multi-agent deep reinforcement learning in which each access point acts as a separate agent that chooses its own antenna reconfiguration and advanced sleep mode based on local observations of traffic and channels.
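
As a rough illustration of what "each access point acts as a separate agent" can mean in code, the sketch below gives every agent its own value table and lets it choose a joint (antenna count, sleep mode) action from a local observation. Several things here are assumptions rather than details from the paper: the option sets `ANTENNA_OPTIONS` and `SLEEP_MODES`, the class name `ApAgent`, the choice to form the action space as a Cartesian product of the two decisions, and the use of tabular Q-learning with epsilon-greedy exploration as a small stand-in for the deep networks the paper trains.

```python
import itertools
import random

# Hypothetical discrete options; the paper's actual sets are not given in this summary.
ANTENNA_OPTIONS = (2, 4, 8)                      # number of active antennas
SLEEP_MODES = ("active", "sm1", "sm2", "sm3")    # illustrative ASM labels

# Assumed joint action space: every (antenna count, sleep mode) pair.
ACTIONS = list(itertools.product(ANTENNA_OPTIONS, SLEEP_MODES))


class ApAgent:
    """One access point with its own value table (tabular stand-in for a deep Q-network)."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = {}  # maps (observation, action index) -> estimated value

    def act(self, observation):
        """Epsilon-greedy choice over the joint antenna/sleep-mode actions."""
        if self.rng.random() < self.epsilon:
            idx = self.rng.randrange(len(ACTIONS))
        else:
            idx = max(range(len(ACTIONS)),
                      key=lambda a: self.q.get((observation, a), 0.0))
        return ACTIONS[idx]

    def update(self, observation, action, reward, next_observation,
               lr: float = 0.1, gamma: float = 0.9):
        """One-step Q-learning update using only this agent's local experience."""
        a = ACTIONS.index(action)
        best_next = max(self.q.get((next_observation, b), 0.0)
                        for b in range(len(ACTIONS)))
        old = self.q.get((observation, a), 0.0)
        self.q[(observation, a)] = old + lr * (reward + gamma * best_next - old)


# Each AP holds its own agent; no agent reads another agent's table.
agents = [ApAgent(seed=i) for i in range(3)]
choices = [agent.act("low_traffic") for agent in agents]
```

Observations must be hashable in this tabular form (a string or tuple); a deep variant would replace the dictionary with a network that maps an observation vector to one value per action.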

If this is right

Cell-free massive MIMO networks can reduce power consumption substantially while adapting to real-time traffic changes without central coordination.
Distributed agent decisions eliminate the signaling overhead of centralized controllers.
The same framework can be compared against other reinforcement learning variants to identify the best trade-off between power and service quality.
Advanced sleep modes combined with antenna scaling provide a practical lever for energy management in dense deployments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The distributed nature suggests the method could scale to very large numbers of access points where centralized training becomes impractical.
Policies learned in one network layout might transfer to nearby layouts with similar traffic statistics, reducing the need for full retraining.
Incorporating predicted future traffic into the agent observations could further lower the drop ratio without sacrificing power gains.

Load-bearing premise

The simulation model captures real-world dynamic traffic and channel variations closely enough that agents trained in simulation will maintain their performance when deployed without retraining.

What would settle it

Running the trained agents on live cell-free massive MIMO hardware under measured varying user loads and recording whether the achieved power savings and drop ratio remain within a few percentage points of the simulated figures.

Figures

Figures reproduced from arXiv: 2604.07133 by Cicek Cavdar, Keyu Li, Mustafa Ozger, Ozan Alp Topal, Özlem Tugfe Demir, Qichen Wang.

**Figure 1.** CF MIMO system model. Adjoining source text (A. Channel Model): The channel between AP l and UE k is characterized by a large-scale fading coefficient β_{l,k}, capturing path loss and shadowing. The small-scale fading is assumed to follow independent and identically distributed Rayleigh fading. The coherence block has τ_c symbols, of which τ_p are used for uplink channel estimation and τ_c − τ_p are used for downlink data. During the uplin…

**Figure 2.** … a smaller ϕ yields a slower growth of the positive term with ρ, thus keeping the optimization biased toward reducing data drops.

**Figure 3.** Time-varying number of APs in different sleep modes.

**Figure 4.** Total demand rate.

Original abstract

This paper focuses on energy savings in downlink operation of cell-free massive MIMO (CF mMIMO) networks under dynamic traffic conditions. We propose a multi-agent deep reinforcement learning (MADRL) algorithm that enables each access point (AP) to autonomously control antenna re-configuration and advanced sleep mode (ASM) selection. After the training process, the proposed framework operates in a fully distributed manner, eliminating the need for centralized control and allowing each AP to dynamically adjust to real-time traffic fluctuations. Simulation results show that the proposed algorithm reduces power consumption (PC) by 56.23% compared to systems without any energy-saving scheme and by 30.12% relative to a non-learning mechanism that only utilizes the lightest sleep mode, with only a slight increase in drop ratio. Moreover, compared to the widely used deep Q-network (DQN) algorithm, it achieves a similar PC level but with a significantly lower drop ratio.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

MADRL for antenna and sleep control in cell-free MIMO gives clear simulation power cuts but rests on limited baselines and sim-only evidence.

This paper applies multi-agent deep RL so that each access point in a cell-free massive MIMO setup can pick its own antenna configuration and advanced sleep mode on the fly. After training, the agents run without any central coordinator, which is the practical part that stands out. The simulations report about 56% lower power use than a system with no energy-saving scheme and about 30% lower than a basic lightest-sleep baseline, with only a small rise in drop ratio and better drop performance than plain DQN at similar power levels. That combination of MADRL with antenna reconfiguration and ASM under dynamic traffic looks like a new specific application, even if the underlying RL tools are established. The distributed operation after training is a genuine plus for scalability. The numbers are concrete, and the internal logic of the claim holds inside the simulated regime, with no circular definitions or missing derivations that would break the argument.

The soft spots are straightforward. Everything depends on the simulation model for traffic and channels, and the abstract gives no count of runs, variance numbers, or statistical tests. The non-learning baseline is described only at a high level, so it is hard to know exactly what the 30% gain is measured against. Generalization to real deployments, or to traffic patterns not seen in training, is assumed rather than shown in detail. These are typical limits for this style of work rather than fatal gaps.

The paper is for people who work on energy efficiency in large wireless networks or on RL control for communications systems. It offers a complete enough method and quantitative outcome to deserve a serious referee, even if the review will focus on experimental rigor and extra baselines. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes a multi-agent deep reinforcement learning (MADRL) algorithm for energy-efficient downlink operation in cell-free massive MIMO networks under dynamic traffic. Each access point learns to select antenna reconfigurations and advanced sleep modes (ASM) autonomously and, once trained, operates in a fully distributed fashion without central coordination. Simulations report that the approach reduces power consumption by 56.23% versus no energy-saving scheme and by 30.12% versus a non-learning lightest-sleep-mode baseline, with only a slight increase in drop ratio; it also matches DQN power levels while achieving a lower drop ratio.

Significance. If the simulation results are reproducible, the work demonstrates that MADRL can deliver substantial energy savings in CF mMIMO while preserving QoS under varying loads, with the distributed post-training operation offering clear scalability advantages over centralized schemes. The quantitative comparisons to both heuristic and single-agent RL baselines provide concrete evidence of practical benefit in the simulated regime.

major comments (2)

Abstract and results section: the specific power-consumption reductions (56.23% and 30.12%) and drop-ratio claims are presented without accompanying details on the number of Monte Carlo runs, variance across seeds, statistical significance tests, or exact traffic/channel parameter ranges, which are required to assess whether the reported gains are robust rather than artifacts of a single simulation configuration.
Simulation setup (presumed §IV): the traffic model and channel variation assumptions underlying the training and testing environments are not shown to guarantee generalization; the weakest assumption that agents perform well on unseen scenarios without retraining therefore remains untested by the current experimental design.
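
The robustness reporting requested in the first major comment could take a form like the sketch below, which summarizes a metric over independent seeds as a mean, a sample standard deviation, and a confidence-interval half-width. The helper name `summarize_runs` and the per-seed values are hypothetical placeholders, not results from the paper, and the interval uses a normal approximation (z = 1.96) as a simplifying assumption.

```python
import math
import statistics


def summarize_runs(values, z: float = 1.96):
    """Return mean, sample standard deviation, and a normal-approximation CI half-width."""
    if len(values) < 2:
        raise ValueError("need at least two independent runs")
    mean = statistics.mean(values)
    std = statistics.stdev(values)                 # sample (n - 1) standard deviation
    half_width = z * std / math.sqrt(len(values))  # z = 1.96 gives a ~95% interval
    return mean, std, half_width


# Placeholder per-seed power reductions in percent; NOT results from the paper.
placeholder_reductions = [55.0, 57.0, 56.0, 58.0, 54.0]
mean, std, hw = summarize_runs(placeholder_reductions)
print(f"reduction = {mean:.2f}% +/- {hw:.2f}% "
      f"(n={len(placeholder_reductions)}, std={std:.2f})")
```

With only a handful of seeds, a Student-t critical value would give a wider and more appropriate interval than the normal approximation used here.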

minor comments (2)

The acronym ASM is used before its expansion; define it at first occurrence.
Figure captions for the power-consumption and drop-ratio plots should explicitly state the number of independent runs and any error bars or confidence intervals.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive recommendation of minor revision and the constructive comments on result robustness and generalization. We address each point below and will revise the manuscript accordingly.

Referee: Abstract and results section: the specific power-consumption reductions (56.23% and 30.12%) and drop-ratio claims are presented without accompanying details on the number of Monte Carlo runs, variance across seeds, statistical significance tests, or exact traffic/channel parameter ranges, which are required to assess whether the reported gains are robust rather than artifacts of a single simulation configuration.

Authors: We agree that these supporting details are essential for evaluating result robustness. In the revised manuscript we will expand the abstract and results section to report the number of Monte Carlo runs performed, variance or standard deviations across random seeds, outcomes of statistical significance tests against the baselines, and the precise ranges and distributions of traffic arrival rates and channel parameters used in the simulations.

revision: yes

Referee: Simulation setup (presumed §IV): the traffic model and channel variation assumptions underlying the training and testing environments are not shown to guarantee generalization; the weakest assumption that agents perform well on unseen scenarios without retraining therefore remains untested by the current experimental design.

Authors: The training process already exposes agents to a range of dynamic traffic loads and channel realizations within the modeled parameter space. We acknowledge, however, that explicit out-of-distribution testing on completely unseen traffic or channel statistics without retraining was not performed. In revision we will clarify the diversity of the training environment in §IV and add a dedicated discussion of generalization limits; we will also include any feasible additional experiments on modified test scenarios.

revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper proposes a MADRL algorithm for AP control in CF mMIMO and reports performance via simulations. The central claims (PC reductions of 56.23% and 30.12%) are direct numerical outputs from executing the trained policy under the stated traffic/channel models and baselines. No analytical derivation, prediction step, or uniqueness theorem is present that reduces to fitted inputs or self-citations by construction. RL training on simulated data is standard and does not create the enumerated circular patterns; the results remain falsifiable by re-running the described setup.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; the method uses standard MADRL techniques applied to the problem, relying on simulation-based training which typically involves many unspecified hyperparameters and standard RL assumptions such as Markov decision processes.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1] G. Interdonato, E. Björnson, H. Quoc Ngo, P. Frenger, and E. G. Larsson, “Ubiquitous cell-free massive MIMO communications,” EURASIP J. on Wireless Commun. and Netw., vol. 2019, no. 1, pp. 1–13, 2019.

[2] T. Van Chien, E. Björnson, and E. G. Larsson, “Optimal design of energy-efficient cell-free massive MIMO: Joint power allocation and load balancing,” in IEEE ICASSP, 2020, pp. 5145–5149.

[3] S. S. Mohammed and A. N. Almamori, “Cell-free massive MIMO energy efficiency improvement by access points iterative selection,” Journal of Engineering, vol. 30, no. 03, pp. 129–142, 2024.

[4] E. Shi, J. Zhang, Z. Liu, Y. Zhu, C. Yuen, D. W. K. Ng, M. Di Renzo, and B. Ai, “Joint precoding and AP selection for energy efficient RIS-aided cell-free massive MIMO using multi-agent reinforcement learning,” arXiv preprint arXiv:2411.11070, 2024.

[5] Ö. T. Demir, L. Méndez-Monsanto, N. Bastianello, E. Fitzgerald, and G. Callebaut, “Energy reduction in cell-free massive MIMO through fine-grained resource management,” in 2024 Joint European Conference on Networks and Communications & 6G Summit (EuCNC/6G Summit). IEEE, 2024, pp. 547–552.

[6] J. García-Morales, G. Femenias, and F. Riera-Palou, “Energy-efficient access-point sleep-mode techniques for cell-free mmWave massive MIMO networks with non-uniform spatial traffic density,” IEEE Access, vol. 8, pp. 137587–137605, 2020.

[7] Ö. T. Demir, M. Masoudi, E. Björnson, and C. Cavdar, “Cell-free massive MIMO in O-RAN: Energy-aware joint orchestration of cloud, fronthaul, and radio resources,” IEEE Journal on Selected Areas in Communications, vol. 42, no. 2, pp. 356–372, 2024.

[8] F. Riera-Palou, G. Femenias, D. López-Pérez, N. Piovesan, and A. De Domenico, “Energy efficient cell-free massive MIMO on 5G deployments: Sleep modes strategies and user stream management,” arXiv preprint arXiv:2306.06404, 2023.

[9] O. A. Topal, Ö. T. Demir, E. Björnson, and C. Cavdar, “Energy-efficient cell-free massive MIMO with wireless fronthaul,” in 2024 58th Asilomar Conference on Signals, Systems, and Computers, 2024, pp. 1591–1596.

[10] T. Cai, Q. Wang, S. Zhang, Ö. T. Demir, and C. Cavdar, “Multi-agent reinforcement learning for energy saving in multi-cell massive MIMO systems,” in IEEE ICMLCN, 2024, pp. 480–485.

[11] Ö. T. Demir, E. Björnson, and L. Sanguinetti, “Foundations of user-centric cell-free massive MIMO,” Foundations and Trends® in Signal Processing, vol. 14, no. 3-4, pp. 162–472, 2021.

[12] G. Interdonato, M. Karlsson, E. Björnson, and E. G. Larsson, “Local partial zero-forcing precoding for cell-free massive MIMO,” IEEE Trans. on Wireless Commun., vol. 19, no. 7, pp. 4758–4774, 2020.

[13] A. A. Razzac et al., “Advanced sleep modes in 5G multiple base stations using non-cooperative multi-agent reinforcement learning,” in IEEE GLOBECOM, 2023, pp. 7025–7030.

[14] J. Lozano, J. A. Ayala-Romero, A. Garcia-Saavedra, and X. Costa-Perez, “Kairos: Energy-efficient radio unit control for O-RAN via advanced sleep modes,” arXiv preprint arXiv:2501.15853, 2025.

[15] S. K. G. Peesapati, M. Olsson, M. Masoudi, S. Andersson, and C. Cavdar, “An analytical energy performance evaluation methodology for 5G base stations,” in IEEE WiMob, 2021, pp. 202–207.

[16] C. Tianzhang, “Mobile traffic classification and multi-cell base station control for energy-efficient 5G networks,” M.S. thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2023, available at https://kth.diva-portal.org/smash/get/diva2:1752823/FULLTEXT01.pdf

[17] 3GPP, “Study on channel model for frequencies from 0.5 to 100 GHz (Release 14),” 3GPP, Tech. Rep. TR 38.901 V14.0.0, 2017, ETSI TR 138 901 V14.0.0. [Online]. Available: https://www.etsi.org/deliver/etsi_tr/138900_138999/138901/14.00.00_60/tr_138901v140000p.pdf
