Multi-agent Reinforcement Learning-based Joint Design of Low-Carbon P2P Market and Bidding Strategy in Microgrids

Aniq Ashan; Gaoxi Xiao; Honglin Gao; Junhao Ren; Lan Zhao; Qiyu Kang; Sijie Wang; Yajuan Sun

arxiv: 2604.02728 · v1 · submitted 2026-04-03 · 💻 cs.MA

Junhao Ren, Honglin Gao, Sijie Wang, Lan Zhao, Qiyu Kang, Aniq Ashan, Yajuan Sun, Gaoxi Xiao

Pith reviewed 2026-05-13 19:05 UTC · model grok-4.3

classification 💻 cs.MA

keywords: multi-agent reinforcement learning · peer-to-peer energy trading · microgrids · low-carbon market design · decentralized partially observable Markov decision process · renewable energy utilization · bidding strategy

The pith

Multi-agent reinforcement learning lets self-interested microgrids trade peer-to-peer while a market operator maximizes low-carbon community welfare.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a joint design where microgrids use multi-agent reinforcement learning to set their own bids in an intraday peer-to-peer market, and a novel clearing rule steers the outcome toward higher renewable use and lower carbon emissions. Existing P2P and microgrid methods rely on centralized optimization or rigid coordination rules that prove difficult to deploy. By modeling decisions as a decentralized partially observable Markov decision process, the framework gives each microgrid autonomy to chase its own economic gains while the clearing mechanism supplies macro-level incentives for local clean energy consumption. Simulations show the combination raises renewable utilization inside the community and cuts dependence on high-emission external power.

Core claim

Formulating microgrid bidding as a DEC-POMDP and solving it via multi-agent reinforcement learning, together with a new market clearing mechanism, produces bidding strategies that improve renewable energy utilization and reduce reliance on external high-carbon electricity while preserving individual economic incentives.
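
For readers unfamiliar with the formalism, the generic DEC-POMDP tuple and the objective each agent optimizes can be written as follows. This is a sketch of the textbook structure only; the abstract does not give the paper's concrete state, observation, or reward design, so every symbol below is an assumption chosen for illustration. Because the abstract describes self-interested microgrids, the reward is indexed by agent here, which departs from the strictly shared-reward textbook definition.

```latex
\[
\mathcal{M} = \bigl\langle \mathcal{N},\ \mathcal{S},\ \{\mathcal{A}_i\}_{i\in\mathcal{N}},\ P,\ \{r_i\}_{i\in\mathcal{N}},\ \{\Omega_i\}_{i\in\mathcal{N}},\ O,\ \gamma \bigr\rangle
\]
\[
s^{t+1} \sim P\bigl(\cdot \mid s^{t}, a_1^{t},\dots,a_N^{t}\bigr), \qquad
o_i^{t} \sim O\bigl(\cdot \mid s^{t}, i\bigr), \quad o_i^{t} \in \Omega_i
\]
\[
\pi_i : (o_i^{0},\dots,o_i^{t}) \mapsto a_i^{t} \in \mathcal{A}_i, \qquad
J_i(\pi_1,\dots,\pi_N) = \mathbb{E}\Bigl[\sum_{t=0}^{T-1} \gamma^{t}\, r_i\bigl(s^{t}, a_1^{t},\dots,a_N^{t}\bigr)\Bigr]
\]
```

Here $\mathcal{N}$ is the set of $N$ microgrids, $\mathcal{S}$ the global state space, $\mathcal{A}_i$ and $\Omega_i$ the action (bid) and observation spaces of microgrid $i$, $P$ the transition kernel, $O$ the observation function, and $\gamma$ the discount factor. Each policy $\pi_i$ conditions only on the agent's own observation history, which is what partial observability means in this setting.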

What carries the argument

The multi-agent reinforcement learning solver for the Decentralized Partially Observable Markov Decision Process (DEC-POMDP) that generates autonomous bids, combined with the novel market clearing mechanism that rewards local renewable consumption to maximize social welfare.
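
To make the role of a clearing rule concrete, the sketch below implements a plain greedy double auction with midpoint pricing. It is not the paper's clearing mechanism, whose formal statement is not given in the abstract; the `Order` type, the field names, and the midpoint price rule are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Order:
    agent: str       # hypothetical microgrid identifier
    price: float     # bid (buyer) or ask (seller) price per kWh
    quantity: float  # energy offered or requested, in kWh


def clear_double_auction(bids, asks):
    """Greedy double auction: match highest bids with lowest asks.

    Returns a list of (buyer, seller, quantity, price) tuples.
    Each trade is priced at the midpoint of the matched bid and ask,
    which is an illustrative choice, not the paper's pricing rule.
    """
    # Copy into mutable lists so the caller's orders are left untouched.
    buyers = sorted(([b.agent, b.price, b.quantity] for b in bids),
                    key=lambda o: -o[1])
    sellers = sorted(([a.agent, a.price, a.quantity] for a in asks),
                     key=lambda o: o[1])
    trades = []
    i = j = 0
    while i < len(buyers) and j < len(sellers):
        buyer, seller = buyers[i], sellers[j]
        if buyer[1] < seller[1]:
            break  # best remaining bid is below best remaining ask
        qty = min(buyer[2], seller[2])
        if qty > 0:
            trades.append((buyer[0], seller[0], qty,
                           (buyer[1] + seller[1]) / 2))
        buyer[2] -= qty
        seller[2] -= qty
        # Advance past any order that has been fully filled.
        if buyer[2] <= 1e-9:
            i += 1
        if seller[2] <= 1e-9:
            j += 1
    return trades
```

Unmatched demand or surplus would then be settled with the external grid. A low-carbon clearing rule of the kind the paper proposes would add an incentive for local renewable consumption on top of this matching step; how it does so is not reproduced here.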

If this is right

Each microgrid earns higher net revenue while the community as a whole emits less carbon.
Local renewable generation is consumed inside the microgrid cluster instead of being curtailed or exported at low value.
Dependence on the main grid for high-emission power falls measurably during peak renewable hours.
The design scales to larger numbers of microgrids without requiring a central optimizer to dictate every bid.
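
The consequences listed above are only checkable once the two community-level quantities are defined. A minimal sketch of how they might be computed from per-interval simulation logs is given below; the function names, the input layout, and the metric definitions are assumptions made here, not the paper's evaluation code.

```python
def renewable_utilization(generated_kwh, consumed_locally_kwh):
    """Share of renewable generation consumed inside the community.

    Both arguments are per-interval totals in kWh (hypothetical layout).
    Returns 0.0 when nothing was generated.
    """
    total_generated = sum(generated_kwh)
    if total_generated == 0:
        return 0.0
    return sum(consumed_locally_kwh) / total_generated


def external_emissions(grid_import_kwh, intensity_kg_per_kwh):
    """Carbon emitted by energy bought from the external grid, in kg.

    intensity_kg_per_kwh is the assumed carbon intensity of grid power
    in each interval, aligned with grid_import_kwh.
    """
    if len(grid_import_kwh) != len(intensity_kg_per_kwh):
        raise ValueError("inputs must cover the same intervals")
    return sum(e * c for e, c in zip(grid_import_kwh, intensity_kg_per_kwh))
```

Comparing these two numbers between the proposed design and a baseline market, over the same demand and PV profiles, is one way to test whether renewable use rises and high-emission imports fall.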

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same MARL-plus-clearing structure could be tested on networks that include electric-vehicle charging or battery storage to check whether the low-carbon incentive still holds.
Policy makers could examine whether the observed autonomy-plus-regulation balance reduces the need for strict feed-in tariffs or capacity markets.
Weather forecast errors and sudden demand spikes not modeled in the simulations remain open variables that would need field measurement.

Load-bearing premise

The novel market clearing mechanism can be implemented in real applications without restrictive coordination rules, and the simulation improvements will persist under actual uncertainties and participant behaviors.

What would settle it

A real microgrid deployment in which the framework runs for several months yet shows no measurable rise in local renewable consumption or drop in high-carbon external purchases relative to a baseline P2P market without the learning and clearing rules.

Figures

Figures reproduced from arXiv: 2604.02728 by Aniq Ashan, Gaoxi Xiao, Honglin Gao, Junhao Ren, Lan Zhao, Qiyu Kang, Sijie Wang, Yajuan Sun.

**Figure 2.** The training process of LSTM-MAPPO algorithm for Intraday P2P

**Figure 3.** 24-hour Normalized Demand and PV Profiles of 4 microgrids

**Figure 4.** 24-hour dynamical change of emergency price and FiT over times. For the market clearing mechanisms, we compare the proposed JPQ market clearing mechanism with the following double auction mechanisms: (1) Greedy used in [16]; (2) Multi-round double auction (MRDA) proposed in [21]; (3) Vickrey-variant double auction (VVDA) presented in [24]. For the learning algorithms, we compare the LSTM-MAPPO algorithm w…

**Figure 5.** Training performance of four market clearing mechanism with LSTM

Original abstract

The challenges of the uncertainties in renewable energy generation and the instability of the real-time market limit the effective utilization of clean energy in microgrid communities. Existing peer-to-peer (P2P) and microgrid coordination approaches typically rely on certain centralized optimization or restrictive coordination rules which are difficult to be implemented in real-life applications. To address the challenge, we propose an intraday P2P trading framework that allows self-interested microgrids to pursue their economic benefits, while allowing the market operator to maximize the social welfare, namely the low carbon emission objective, of the entire community. Specifically, the decision-making processes of the microgrids are formulated as a Decentralized Partially Observable Markov Decision Process (DEC-POMDP) and solved using a Multi-Agent Reinforcement Learning (MARL) framework. Such an approach grants each microgrid a high degree of decision-making autonomy, while a novel market clearing mechanism is introduced to provide macro-regulation, incentivizing microgrids to prioritize local renewable energy consumption and hence reduce carbon emissions. Simulation results demonstrate that the combination of the self-interested bidding strategy and the P2P market design helps significantly improve renewable energy utilization and reduce reliance on external electricity with high carbon-emissions. The framework achieves a balanced integration of local autonomy, self-interest pursuit, and improved community-level economic and environmental benefits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies existing MARL to a joint P2P market and bidding setup in microgrids and reports simulation gains in renewable use, but those gains rest on untested assumptions about uncertainty and behavior.

Colleague, the core of this work is a DEC-POMDP formulation solved by multi-agent RL that lets individual microgrids bid selfishly while a new clearing rule steers the community toward higher local renewable consumption and lower external high-carbon imports. That combination is the main thing worth noting: it tries to square local autonomy with a macro low-carbon objective without falling back on centralized optimization or heavy coordination rules. The abstract presents this as a practical step for microgrid communities facing generation uncertainty and real-time market volatility. The simulation results are said to show clear improvements on both utilization and emissions metrics, which fits the stated motivation.

What the paper does reasonably well is spell out how the market operator can inject incentives through clearing without removing agent decision rights, and the DEC-POMDP framing is a standard but appropriate choice for partial observability across microgrids.

The stress-test note on generalization is fair. The reported gains come from simulation only, with no visible details on baseline algorithms, statistical significance, error bars, or sensitivity to higher renewable variance, forecast errors, or shifting participant strategies. If the learned policies degrade outside the training distribution, the claimed benefits become hard to trust for real deployments. No real-world traces or robustness checks are referenced in the abstract, so the central claim stays plausible but thin on evidence.

This is aimed at the applied RL-for-energy crowd rather than core RL or market-design theorists. A reader already working on smart-grid coordination might pick up the specific clearing mechanism as a concrete example, but the work does not introduce new RL algorithms or prove new theoretical properties.

I would send it to peer review. The formulation is coherent and the application timely, so referees can check the simulation rigor and see whether the robustness gaps can be closed.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an intraday P2P trading framework for microgrids in which each microgrid's bidding is modeled as a DEC-POMDP and solved by MARL to allow self-interested economic optimization, while a novel market-clearing rule supplies macro-level regulation that incentivizes local renewable consumption and thereby reduces community carbon emissions. Simulation results are presented as evidence that the combination yields significantly higher renewable utilization and lower reliance on high-carbon external electricity.

Significance. If the simulation outcomes prove robust, the work would supply a concrete mechanism for reconciling decentralized autonomy with community-scale low-carbon objectives, addressing a recognized limitation of existing centralized or rule-based P2P designs.

major comments (2)

[Simulation results] Simulation results section: the central claim of significant improvement in renewable utilization rests on simulation outcomes whose setup, baselines, number of runs, error bars, and statistical significance are not reported, directly undermining assessment of the headline performance gains.
[Market-clearing mechanism] Market-clearing mechanism description: the novel clearing rule is asserted to provide macro-regulation without restrictive coordination, yet no formal statement of the rule, incentive-compatibility proof, or sensitivity analysis to non-stationary bidding strategies is supplied, leaving the autonomy-plus-welfare claim unsupported beyond the chosen simulation parameters.

minor comments (1)

[Abstract] Abstract: the phrase 'significantly improve' is used without accompanying quantitative deltas or baseline comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of our results. We address each major point below.

Referee: [Simulation results] Simulation results section: the central claim of significant improvement in renewable utilization rests on simulation outcomes whose setup, baselines, number of runs, error bars, and statistical significance are not reported, directly undermining assessment of the headline performance gains.

Authors: We agree that the simulation results section requires additional detail to substantiate the performance claims. In the revised manuscript we will expand this section to specify the full simulation setup (number of microgrids, renewable profiles, load data, and market parameters), the exact baselines employed (no-P2P, centralized optimization, and rule-based P2P), the number of independent runs (ten trials), error bars or standard deviations on all reported metrics, and statistical significance tests (paired t-tests) confirming the observed gains in renewable utilization and external high-carbon purchases. revision: yes
Referee: [Market-clearing mechanism] Market-clearing mechanism description: the novel clearing rule is asserted to provide macro-regulation without restrictive coordination, yet no formal statement of the rule, incentive-compatibility proof, or sensitivity analysis to non-stationary bidding strategies is supplied, leaving the autonomy-plus-welfare claim unsupported beyond the chosen simulation parameters.

Authors: We will insert a formal mathematical statement of the market-clearing rule (including the priority-matching objective and price-formation equations) in Section III of the revision. We will also add a sensitivity analysis that perturbs bidding strategies away from the learned MARL policies and reports the resulting changes in community welfare. A complete incentive-compatibility proof for arbitrary non-stationary strategies is difficult within the current MARL setting; we will therefore add a discussion of the mechanism's alignment properties and its empirical robustness rather than a general proof. revision: partial
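
The first response promises paired t-tests over ten independent runs. A minimal sketch of that statistic, using only the Python standard library, is shown below; the function name and the pairing of runs by shared random seed are assumptions made here, and no numbers from the paper are used.

```python
import math
import statistics


def paired_t_statistic(treatment, baseline):
    """Paired t statistic for per-run metrics of two methods.

    treatment[k] and baseline[k] are assumed to come from the same
    run configuration (for example the same seed), so the test is
    performed on the per-run differences.
    Returns (t_statistic, degrees_of_freedom).
    """
    if len(treatment) != len(baseline):
        raise ValueError("both methods need the same number of runs")
    if len(treatment) < 2:
        raise ValueError("at least two paired runs are required")
    diffs = [t - b for t, b in zip(treatment, baseline)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1
```

With ten runs the statistic has nine degrees of freedom, and it is compared against the critical value of Student's t distribution at the chosen significance level. Reporting the mean difference and its standard deviation alongside the test would also address the referee's request for error bars.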

Circularity Check

0 steps flagged

No circularity: results are simulation outcomes from standard DEC-POMDP + MARL plus proposed clearing rule

The paper formulates microgrid bidding as a DEC-POMDP solved via MARL and introduces a novel market-clearing mechanism to incentivize local renewables. The headline improvements in renewable utilization are reported from simulation runs of this framework; no equation, parameter fit, or self-citation reduces the claimed gains to an input by construction. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard RL assumptions and a proposed clearing rule whose performance is shown only in simulation; no new physical entities are introduced.

free parameters (1)

MARL training hyperparameters
Learning rates, exploration parameters, and network sizes chosen to train the agents; values not specified in abstract.

axioms (1)

Domain assumption: Microgrid decision processes can be accurately modeled as a DEC-POMDP
Invoked when formulating the bidding problem; standard in multi-agent RL but requires partial observability to hold in practice.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 1 internal anchor

[1] C. A. Horowitz, “Paris agreement,” Int. Leg. Mater., vol. 55, no. 4, pp. 740–755, 2016
[2] G. He, J. Lin, F. Sifuentes, X. Liu, N. Abhyankar, and A. Phadke, “Rapid cost decrease of renewables and storage accelerates the decarbonization of China’s power system,” Nat Commun, vol. 11, no. 1, p. 2486, 2020
[3] X. Deng and T. Lv, “Power system planning with increasing variable renewable energy: A review of optimization models,” J. Cleaner Prod., vol. 246, p. 118962, 2020
[4] K. Alanne and A. Saari, “Distributed energy generation and sustainable development,” Renew. Sustain. Energy Rev., vol. 10, no. 6, pp. 539–558, 2006
[5] S. Kwon, L. Ntaimo, and N. Gautam, “Optimal Day-Ahead Power Procurement With Renewable Energy and Demand Response,” IEEE Trans. Power Syst., vol. 32, no. 5, pp. 3924–3933, 2017
[6] T. Morstyn, N. Farrell, S. J. Darby, and M. D. McCulloch, “Using peer-to-peer energy-trading platforms to incentivize prosumers to form federated power plants,” Nat Energy, vol. 3, no. 2, pp. 94–101, 2018
[7] C. Liu and Z. Li, “Comparison of Centralized and Peer-to-Peer Decentralized Market Designs for Community Markets,” IEEE Trans. Smart Grid, vol. 58, no. 1, pp. 67–77, 2022
[8] X. Luo, Y. Liu, P. Feng, Y. Gao, and Z. Guo, “Optimization of a solar-based integrated energy system considering interaction between generation, network, and demand side,” Appl. Energy, vol. 294, p. 116931, 2021
[9] L. Wang, Y. Zhang, W. Song, and Q. Li, “Stochastic cooperative bidding strategy for multiple microgrids with peer-to-peer energy trading,” IEEE Trans. Ind. Informat., vol. 18, no. 3, pp. 1447–1457, 2022
[10] Z. Lu, L. Bai, J. Wang, J. Wei, Y. Xiao, and Y. Chen, “Peer-to-peer joint electricity and carbon trading based on carbon-aware distribution locational marginal pricing,” IEEE Trans. Power Syst., vol. 38, no. 1, pp. 835–852, 2023
[11] W. Xu, F. Lin, R. Jia, C. Tang, Z. Zheng, and M. Li, “Game-Based Pricing for Joint Carbon and Electricity Trading in Microgrids,” IEEE Internet of Things Journal, vol. 11, no. 16, pp. 27732–27743, 2024
[12] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of PPO in cooperative multi-agent games,” in Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 24611–24624
[13] C. Li, Y. Xu, X. Yu, C. Ryan, and T. Huang, “Risk-averse energy trading in multienergy microgrids: A two-stage stochastic game approach,” IEEE Trans. Ind. Informat., vol. 13, no. 5, pp. 2620–2630, 2017
[14] Z. Wang, H. Hou, B. Zhao, L. Zhang, Y. Shi, and C. Xie, “Risk-averse stochastic capacity planning and P2P trading collaborative optimization for multi-energy microgrids considering carbon emission limitations: An asymmetric nash bargaining approach,” Appl. Energy, vol. 357, p. 122505, 2024
[15] Z. Liang and L. Mu, “Multi-agent low-carbon optimal dispatch of regional integrated energy system based on mixed game theory,” Energy, vol. 295, p. 130953, 2024
[16] D. Qiu, J. Wang, J. Wang, and G. Strbac, “Multi-agent reinforcement learning for automated peer-to-peer energy trading in double-side auction market,” in Proc. 30th Int. Joint Conf. Artif. Intell., 2021, pp. 2913–2920
[17] R. May and P. Huang, “A multi-agent reinforcement learning approach for investigating and optimising peer-to-peer prosumer energy markets,” Appl. Energy, vol. 334, p. 120705, 2023
[18] Z. Yang, Z. Ren, H. Li, Z. Sun, J. Feng, and W. Xia, “A multi-stage stochastic dispatching method for electricity-hydrogen integrated energy systems driven by model and data,” Appl. Energy, vol. 371, p. 123668, Oct. 2024
[19] M. Chen, Z. Shen, L. Wang, and G. Zhang, “Combined carbon capture and utilization with peer-to-peer energy trading for multimicrogrids using multiagent proximal policy optimization,” IEEE Trans. Control Netw. Syst., vol. 11, no. 4, pp. 2173–2186, 2024
[20] Y. Zhou, Z. Ma, T. Wang, J. Zhang, X. Shi, and S. Zou, “Joint energy and carbon trading for multi-microgrid system based on multi-agent deep reinforcement learning,” IEEE Trans. Power Syst., vol. 39, no. 6, pp. 7376–7388, 2024
[21] H. Haggi and W. Sun, “Multi-Round Double Auction-Enabled Peer-to-Peer Energy Exchange in Active Distribution Networks,” IEEE Trans. Smart Grid, vol. 12, no. 5, pp. 4403–4414, 2021
[22] B. Yin, H. Weng, Y. Hu, J. Xi, P. Ding, and J. Liu, “Multi-Agent Deep Reinforcement Learning for Simulating Centralized Double-Sided Auction Electricity Market,” IEEE Trans. Power Syst., vol. 40, no. 1, pp. 518–529, 2025
[23] E. L. Ratnam, S. R. Weller, C. M. Kellett, and A. T. Murray, “Residential load and rooftop PV generation: An Australian distribution network dataset,” Int. J. Sustain. Energy, vol. 36, no. 8, pp. 787–806, Sep. 2017
[24] Z. Zhao, C. Feng, and A. L. Liu, “Comparisons of auction designs through multiagent learning in peer-to-peer energy trading,” IEEE Trans. Smart Grid, vol. 14, no. 1, pp. 593–605, 2023
[25] H. Weng, Y. Hu, M. Liang, J. Xi, and B. Yin, “Optimizing bidding strategy in electricity market based on graph convolutional neural network and deep reinforcement learning,” Appl. Energy, vol. 380, p. 124978, 2025
[26] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017
[27] Y. Cui, Y. Xu, Y. Wang, Y. Zhao, H. Zhu, and D. Cheng, “Peer-to-peer energy trading with energy trading consistency in interconnected multi-energy microgrids: A multi-agent deep reinforcement learning approach,” International Journal of Electrical Power & Energy Systems, vol. 156, p. 109753, 2024
[28] Y. Wang, Y. Cui, Y. Li, and Y. Xu, “Collaborative optimization of multi-microgrids system with shared energy storage based on multi-agent stochastic game and reinforcement learning,” Energy, vol. 280, p. 128182, 2023
[29] OpenAI, “ChatGPT: [Large Language Model],” 2023, [Online], Available: https://openai.com/chat
