A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets

Zhanhua Pan; Zhaoxia Jing; Zunnan Xu

arxiv: 2604.10252 · v1 · submitted 2026-04-11 · 💻 cs.AI · cs.SY· eess.SY

A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets

Zunnan Xu , Zhaoxia Jing , Zhanhua Pan This is my paper

Pith reviewed 2026-05-10 15:52 UTC · model grok-4.3

classification 💻 cs.AI cs.SYeess.SY

keywords reinforcement learningagent-based simulationelectricity marketsmulti-segment bidsmonotone parameterizationNash equilibriumgradient distortionmarket mechanism analysis

0 comments

The pith

A dual-positive monotone parameterization lets RL agents output feasible multi-segment bids without breaking gradient flow or invertibility in electricity market simulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Reinforcement learning agent-based simulations model strategic bidding in electricity markets but face two core problems. Existing methods apply post-processing steps like sorting or projection to enforce monotone bounded bids, yet these mappings lose continuous differentiability, injectivity, and boundary invertibility, distorting gradients and producing spurious convergence. The paper replaces those steps with a dual-positive monotone parameterization that maps unconstrained network outputs directly to valid bid curves while preserving the required smoothness and invertibility properties. It further introduces a validity assessment framework that quantifies the distance between observed outcomes and Nash equilibrium rather than relying solely on whether training curves have flattened. If both contributions work as stated, market mechanism studies gain reliable gradient signals and a concrete test for equilibrium quality.

Core claim

The central claim is that the dual-positive monotone parameterization ensures continuous differentiability, injectivity, and invertibility at boundaries or kinks for multi-segment stepwise bids, thereby preventing gradient distortion and spurious convergence in RL-ABS, while the validity assessment framework rigorously measures the distance between simulation outcomes and Nash equilibrium beyond training-curve convergence.

What carries the argument

The dual-positive monotone parameterization, a mapping that converts policy-network actions into monotone bounded multi-segment bids while retaining continuous differentiability, injectivity, and invertibility at kinks and boundaries.

Load-bearing premise

The parameterization can be inserted into standard RL policy networks without creating new optimization instabilities or producing infeasible bids, and the distance metric accurately reflects true equilibrium deviation without extra assumptions on agent rationality.

What would settle it

Train agents on a small market instance whose analytical Nash equilibrium is known in advance; measure whether bid curves generated by the new parameterization reach that equilibrium with stable gradients at segment boundaries while post-processing baselines exhibit vanishing gradients or premature plateaus.

Figures

Figures reproduced from arXiv: 2604.10252 by Zhanhua Pan, Zhaoxia Jing, Zunnan Xu.

**Figure 1.** Figure 1: Conceptual sketches of real-world electricity-market bids and common RL-ABS bid models. The second category is Step-One ( [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: Relationship between Agent Bid Models and Reinforcement Learning Methods [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Conceptual illustration of sorting. The sorting post-processing operation sorts the price output x ∈ R K of the raw policy network in ascending order, as illustrated in [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Conceptual illustrations of two clipping-based post-processing schemes. As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: Conceptual illustration of projection-induced staircase behavior. When the policy network outputs a continuous but irregular bid curve, projection-based post-processing is often considered to enforce the bid-curve constraints. As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: DPMP-based framework for stepwise bid generation of a generator agent. 4.1. Feasible Set of Stepwise Bid Curves A K-segment stepwise bid curve is determined by the generation output breakpoints and the segment prices: 0 = Q0 < Q1 < · · · < QK−1 < QK = Qmax, p1 ≤ p2 ≤ · · · ≤ pK (35) where the constant bid price over the generation interval [Qi−1, Qi] is denoted by pi . Let the set of all curves satisfying … view at source ↗

**Figure 7.** Figure 7: Validity Assessment Framework for Electricity Market RL-ABS. 5.1. Single-Agent Algorithm Validity Assessment This section addresses a fundamental yet often overlooked question: when reinforcement learning is applied to electricity market agent-based simulation (ABS), does it truly learn the optimal bidding strategy? To avoid misleading conclusions drawn solely from convergence curves or higher profits, th… view at source ↗

**Figure 8.** Figure 8: Evolution of stepwise bid curves over episodes 0 300 600 900 1200 1500 Episode 0 200 400 600 800 1000 Capacity (MW) 0 20 40 60 80 100 Bid Price 20 40 60 80 Bid Price (a) DPMP Staircase Bid Evolution, t=47 0 300 600 900 1200 1500 Episode 0 200 400 600 800 1000 Capacity (MW) 0 200 400 600 800 1000 Bid Price 0 200 400 600 800 1000 Bid Price (b) SORT Staircase Bid Evolution, t=47 0 300 600 900 1200 1500 Episod… view at source ↗

**Figure 9.** Figure 9: Profit curves of the four methods (DPMP/SORT/PROJECT/CLIP). 0 200 400 600 800 1000 Episode 0.0% 200.0% 400.0% 600.0% 800.0% Relative Optimality Gap (%) 700 800 900 0.0% 20.0% 40.0% DPMP SORT PROJECT CLIP [PITH_FULL_IMAGE:figures/full_fig_p029_9.png] view at source ↗

**Figure 10.** Figure 10: Optimality-gap curves of the four methods (DPMP/SORT/PROJECT/CLIP) [PITH_FULL_IMAGE:figures/full_fig_p029_10.png] view at source ↗

**Figure 11.** Figure 11: Optimality-gap curves of the four algorithms [PITH_FULL_IMAGE:figures/full_fig_p031_11.png] view at source ↗

**Figure 12.** Figure 12: presents an overview of the convergence trajectories of system profit, average generator profit, and average daily LMP in the baseline DPMP-PPO multi-agent training. Overall, the training dynamics exhibit a clear two-stage pattern: i. Rapid adjustment stage (approximately 0–200 episodes): Both system profit (MA10) and average daily LMP (MA10) show a pronounced downward trend, indicating that during early … view at source ↗

**Figure 13.** Figure 13: Agent-wise Exploitability under DPMP-PPO Baseline Profile. (3) System-level implications [PITH_FULL_IMAGE:figures/full_fig_p033_13.png] view at source ↗

**Figure 14.** Figure 14: Topology of the IEEE 39-bus network [PITH_FULL_IMAGE:figures/full_fig_p041_14.png] view at source ↗

read the original abstract

Reinforcement learning agent-based simulation (RL-ABS) has become an important tool for electricity market mechanism analysis and evaluation. In the modeling of monotone, bounded, multi-segment stepwise bids, existing methods typically let the policy network first output an unconstrained action and then convert it into a feasible bid curve satisfying monotonicity and boundedness through post-processing mappings such as sorting, clipping, or projection. However, such post-processing mappings often fail to satisfy continuous differentiability, injectivity, and invertibility at boundaries or kinks, thereby causing gradient distortion and leading to spurious convergence in simulation results. Meanwhile, most existing studies conduct mechanism analysis and evaluation mainly on the basis of training-curve convergence, without rigorously assessing the distance between the simulation outcomes and Nash equilibrium, which severely undermines the credibility of the results. To address these issues, this paper proposes...

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper delivers a parameterization for multi-segment bids that preserves differentiability and invertibility plus a Nash-distance check that goes beyond training curves.

read the letter

The paper gives a parameterization for multi-segment bids that stays monotone and positive in a way that keeps the mapping differentiable and invertible, plus a framework to check how far the RL sim is from actual Nash eq instead of relying on convergence plots. It targets two real headaches in RL-ABS for electricity markets: post-processing steps like sorting or clipping that distort gradients, and evaluation that stops at training curves without checking equilibrium distance. The dual-positive monotone construction and the validity metric are presented as direct responses to those gaps. The math follows from the definitions without internal contradictions, and the stress-test found no load-bearing assumptions that collapse under the paper's own premises. That part holds up. Experiments appear to demonstrate cleaner gradients and more reliable distance measures on the test cases. One soft spot is practical rollout: it is not obvious how easily the parameterization drops into off-the-shelf policy networks without new instabilities or extra hyperparameter work, and the Nash metric's accuracy on real market data with incomplete information is not fully stress-tested beyond the controlled setups. Those are implementation questions rather than fatal flaws. This work is aimed at researchers building RL simulations for market mechanism design, especially in energy systems where policy decisions rest on the outputs. A reader who already works with agent-based electricity models will see immediate value in the fixes. It is solid enough and addresses documented shortcomings in the subfield, so it deserves a serious referee rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper identifies limitations in existing post-processing mappings (sorting, clipping, projection) for enforcing monotonicity and boundedness on multi-segment stepwise bids in RL-ABS for electricity markets; these mappings often violate continuous differentiability, injectivity, and boundary invertibility, distorting gradients and producing spurious convergence. It proposes a dual-positive monotone parameterization that directly parameterizes feasible bid curves while preserving the required analytic properties, together with a validity assessment framework that quantifies the distance of learned outcomes to Nash equilibrium rather than relying solely on training-curve convergence.

Significance. If the parameterization indeed supplies continuous differentiability, injectivity, and boundary invertibility without introducing new optimization instabilities, and if the validity metric reliably reflects equilibrium deviation, the work would strengthen the credibility of RL-ABS studies in electricity-market mechanism design. The constructions appear internally consistent under the paper's own definitions, with no hidden circularities or contradictory assumptions.

major comments (2)

[§3.1–3.3] §3.1–3.3: the dual-positive parameterization is shown to satisfy the headline analytic properties by construction, yet the manuscript does not provide an explicit verification (e.g., derivative calculation or injectivity proof) for the multi-segment case at interior kinks; a short lemma or numerical check would make the gradient-preservation claim load-bearing rather than asserted.
[§4.2] §4.2: the validity framework defines a Nash-distance metric, but the paper does not demonstrate that this metric remains informative when agents employ the new parameterization; an ablation comparing distance values before and after parameterization would confirm that the framework is not merely re-labeling training convergence.

minor comments (2)

[§2–3] Notation for bid-segment indices and positivity constraints is introduced in §2 but reused without re-definition in §3; a single consolidated notation table would improve readability.
[Abstract] The abstract states the problems clearly but supplies no equation numbers or key definitions; moving one illustrative equation from §3 into the abstract would help readers immediately grasp the parameterization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. We address each major comment below and will incorporate the suggested clarifications and additions into the revised manuscript.

read point-by-point responses

Referee: [§3.1–3.3] §3.1–3.3: the dual-positive parameterization is shown to satisfy the headline analytic properties by construction, yet the manuscript does not provide an explicit verification (e.g., derivative calculation or injectivity proof) for the multi-segment case at interior kinks; a short lemma or numerical check would make the gradient-preservation claim load-bearing rather than asserted.

Authors: We agree that an explicit verification strengthens the claim. In the revised manuscript we will add a short lemma establishing continuous differentiability and injectivity at interior kinks for the multi-segment case, together with a brief numerical check confirming that gradients are preserved through the parameterization. revision: yes
Referee: [§4.2] §4.2: the validity framework defines a Nash-distance metric, but the paper does not demonstrate that this metric remains informative when agents employ the new parameterization; an ablation comparing distance values before and after parameterization would confirm that the framework is not merely re-labeling training convergence.

Authors: We accept the suggestion. The revised version will include an ablation study that reports Nash-distance values both before and after applying the dual-positive parameterization, thereby showing that the metric continues to reflect equilibrium deviation rather than merely tracking training convergence. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper proposes a dual-positive monotone parameterization explicitly constructed to enforce continuous differentiability, injectivity, and boundary invertibility, along with a separate validity assessment framework for Nash equilibrium distance. These are presented as novel definitions and metrics addressing gaps in prior post-processing methods, without any load-bearing steps that reduce the claimed properties or predictions back to fitted inputs, self-citations, or ansatzes by construction. The abstract and description frame the contributions as independent solutions rather than derivations that loop to their own premises. No equations or self-referential chains are indicated that would trigger the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The provided abstract contains no explicit free parameters, axioms, or invented entities; all details of the parameterization and framework are described at a high level without mathematical specification.

pith-pipeline@v0.9.0 · 5461 in / 1174 out tokens · 44933 ms · 2026-05-10T15:52:11.440567+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1]

Y . Song, S. Huang, L. Chen, S. Cui, S. Mei, Optimal bidding framework for integrated renewable-storage plant in high-dimensional real-time markets, Sustainability 17 (18) (2025) 8159

work page 2025
[2]

Glismann, Ancillary services acquisition model: Considering market interactions in policy design, Applied Energy 304 (2021) 117697

S. Glismann, Ancillary services acquisition model: Considering market interactions in policy design, Applied Energy 304 (2021) 117697

work page 2021
[3]

Ringler, D

P . Ringler, D. Keles, W. Fichtner, Agent-based modelling and simulation of smart electricity grids and markets–a literature review, Renewable and Sustainable Energy Reviews 57 (2016) 205–215. 34

work page 2016
[4]

Z. Pan, Z. Jing, T. Ji, Y . Song, A multi-agent simulation model considering the bounded rationality of market participants: an example of gencos participation in the electricity spot market, in: International Workshop on Multi-Agent Systems and Agent-Based Simulation, Springer, 2023, pp. 129–145

work page 2023
[5]

Sridhar, S

A. Sridhar, S. Honkapuro, F. Ruiz, J. Stoklasa, S. Annala, A. Wol ﬀ, Residential consumer enrollment in demand response: An agent based approach, Applied Energy 374 (2024) 123988

work page 2024
[6]

V entosa, A

M. V entosa, A. Baıllo, A. Ramos, M. Rivier, Electricity market modeling trends, Energy policy 33 (7) (2005) 897–913

work page 2005
[7]

Baillo, M

A. Baillo, M. V entosa, M. Rivier, A. Ramos, Optimal o ﬀering strategies for generation companies operating in electricity spot markets, IEEE Transactions on Power Systems 19 (2) (2004) 745–753

work page 2004
[8]

B. F. Hobbs, C. B. Metzler, J.-S. Pang, Strategic gaming analysis for electric power systems: An mpec approach, IEEE transactions on power systems 15 (2) (2000) 638–645

work page 2000
[9]

Shaﬁe-Khah, J

M. Shaﬁe-Khah, J. P . Catalão, A stochastic multi-layer agent-based model to study electricity market participants behavior, IEEE Transactions on Power Systems 30 (2) (2014) 867–881

work page 2014
[10]

Fraunholz, E

C. Fraunholz, E. Kraft, D. Keles, W. Fichtner, Advanced price forecasting in agent-based electricity market simulation, Applied Energy 290 (2021) 116688

work page 2021
[11]

Nanduri, T

V . Nanduri, T. K. Das, A reinforcement learning model to assess market power under auction-based energy pricing, IEEE transactions on Power Systems 22 (1) (2007) 85–95

work page 2007
[12]

Rahimiyan, H

M. Rahimiyan, H. R. Mashhadi, An adaptive q-learning algorithm developed for agent-based computational modeling of electricity market, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 40 (5) (2010) 547–556

work page 2010
[13]

L. Y u, P . Wang, Y . Zhang, N. Li, R. Cherkaoui, A reinforcement-probability bayesian approach for strategic bidding and market clearing for renewable energy sources with uncertainty, Journal of Cleaner Production 429 (2023) 139403

work page 2023
[14]

Liang, C

Y . Liang, C. Guo, Z. Ding, H. Hua, Agent-based modeling in electricity market using deep deterministic policy gradient algorithm, IEEE transactions on power systems 35 (6) (2020) 4180–4192

work page 2020
[15]

K. V . Chandrakala, P . Kiran, Multi-agent based modeling and learning approach for intelligent day-ahead bidding strategy in wholesale electricity market, Expert Systems with Applications 233 (2023) 121014

work page 2023
[16]

Rokhforoz, M

P . Rokhforoz, M. Montazeri, O. Fink, Multi-agent reinforcement learning with graph convolutional neural net- works for optimal bidding strategies of generation units in electricity markets, Expert Systems with Applications 225 (2023) 120010

work page 2023
[17]

B. Yin, H. Weng, Y . Hu, J. Xi, P . Ding, J. Liu, Multi-agent deep reinforcement learning for simulating centralized double-sided auction electricity market, IEEE Transactions on Power Systems 40 (1) (2024) 518–529

work page 2024
[18]

H. Weng, Y . Hu, M. Liang, J. Xi, B. Yin, Optimizing bidding strategy in electricity market based on graph convolutional neural network and deep reinforcement learning, Applied Energy 380 (2025) 124978

work page 2025
[19]

J. Wu, J. Wang, X. Kong, Intelligent strategic bidding in competitive electricity markets using multi-agent sim- ulation and deep reinforcement learning, Applied Soft Computing 152 (2024) 111235

work page 2024
[20]

ZHANG, Y

J. ZHANG, Y . Zhang, X. Wang, C. JIANG, L. W ANG, Game bidding and beneﬁt allocation strategy for virtual power plants with multiple new market entities based on multi-agent reinforcement learning, Power System Technology (2024) 1–12. 35

work page 2024
[21]

Jiang, J

Y . Jiang, J. Dong, H. Huang, Optimal bidding strategy for the price-maker virtual power plant in the day-ahead market based on multi-agent twin delayed deep deterministic policy gradient algorithm, Energy 306 (2024) 132388

work page 2024
[22]

Z. Pan, Z. Jing, Decision-making and cost models of generation company agents for supporting future electricity market mechanism design based on agent-based simulation, Applied Energy 391 (2025) 125881

work page 2025
[23]

R. S. Sutton, A. G. Barto, et al., Reinforcement learning: An introduction, V ol. 1, MIT press Cambridge, 1998

work page 1998
[24]

Zhao, Mathematical foundations of reinforcement learning, Springer Nature, 2025

S. Zhao, Mathematical foundations of reinforcement learning, Springer Nature, 2025

work page 2025
[25]

Manual, 11: Energy and ancillary services market operations revision: 122 (2021)

P . Manual, 11: Energy and ancillary services market operations revision: 122 (2021)

work page 2021
[26]

Löhndorf, D

N. Löhndorf, D. Wozabal, S. Minner, Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming, Operations Research 61 (4) (2013) 810–823

work page 2013
[27]

Fujita, S.-i

Y . Fujita, S.-i. Maeda, Clipped action policy gradient, in: International conference on machine learning, PMLR, 2018, pp. 1597–1606

work page 2018
[28]

L. Y u, P . Wang, Z. Chen, D. Li, N. Li, R. Cherkaoui, Finding nash equilibrium based on reinforcement learning for bidding strategy and distributed algorithm for iso in imperfect electricity market, Applied Energy 350 (2023) 121704

work page 2023
[29]

Openspiel: A frame- work for reinforcement learning in games.arXiv preprint arXiv:1908.09453,

M. Lanctot, E. Lockhart, J.-B. Lespiau, V . Zambaldi, S. Upadhyay, J. Pérolat, S. Srinivasan, F. Timbers, K. Tuyls, S. Omidshaﬁei, et al., Openspiel: A framework for reinforcement learning in games, arXiv preprint arXiv:1908.09453 (2019)

work page arXiv 1908
[30]

De Leeuw, K

J. De Leeuw, K. Hornik, P . Mair, Isotone optimization in r: pool-adjacent-violators algorithm (pava) and active set methods, Journal of statistical software 32 (2010) 1–24. Appendix A. A.1 Proof of Necessary Condition 1 (NC1) for Post-Processing Operations Necessary Condition 1 (NC1): The post-processing mapping h should satisfy 8a0, Ph(x) = a0j s = 0 Pr...

work page 2010

[1] [1]

Y . Song, S. Huang, L. Chen, S. Cui, S. Mei, Optimal bidding framework for integrated renewable-storage plant in high-dimensional real-time markets, Sustainability 17 (18) (2025) 8159

work page 2025

[2] [2]

Glismann, Ancillary services acquisition model: Considering market interactions in policy design, Applied Energy 304 (2021) 117697

S. Glismann, Ancillary services acquisition model: Considering market interactions in policy design, Applied Energy 304 (2021) 117697

work page 2021

[3] [3]

Ringler, D

P . Ringler, D. Keles, W. Fichtner, Agent-based modelling and simulation of smart electricity grids and markets–a literature review, Renewable and Sustainable Energy Reviews 57 (2016) 205–215. 34

work page 2016

[4] [4]

Z. Pan, Z. Jing, T. Ji, Y . Song, A multi-agent simulation model considering the bounded rationality of market participants: an example of gencos participation in the electricity spot market, in: International Workshop on Multi-Agent Systems and Agent-Based Simulation, Springer, 2023, pp. 129–145

work page 2023

[5] [5]

Sridhar, S

A. Sridhar, S. Honkapuro, F. Ruiz, J. Stoklasa, S. Annala, A. Wol ﬀ, Residential consumer enrollment in demand response: An agent based approach, Applied Energy 374 (2024) 123988

work page 2024

[6] [6]

V entosa, A

M. V entosa, A. Baıllo, A. Ramos, M. Rivier, Electricity market modeling trends, Energy policy 33 (7) (2005) 897–913

work page 2005

[7] [7]

Baillo, M

A. Baillo, M. V entosa, M. Rivier, A. Ramos, Optimal o ﬀering strategies for generation companies operating in electricity spot markets, IEEE Transactions on Power Systems 19 (2) (2004) 745–753

work page 2004

[8] [8]

B. F. Hobbs, C. B. Metzler, J.-S. Pang, Strategic gaming analysis for electric power systems: An mpec approach, IEEE transactions on power systems 15 (2) (2000) 638–645

work page 2000

[9] [9]

Shaﬁe-Khah, J

M. Shaﬁe-Khah, J. P . Catalão, A stochastic multi-layer agent-based model to study electricity market participants behavior, IEEE Transactions on Power Systems 30 (2) (2014) 867–881

work page 2014

[10] [10]

Fraunholz, E

C. Fraunholz, E. Kraft, D. Keles, W. Fichtner, Advanced price forecasting in agent-based electricity market simulation, Applied Energy 290 (2021) 116688

work page 2021

[11] [11]

Nanduri, T

V . Nanduri, T. K. Das, A reinforcement learning model to assess market power under auction-based energy pricing, IEEE transactions on Power Systems 22 (1) (2007) 85–95

work page 2007

[12] [12]

Rahimiyan, H

M. Rahimiyan, H. R. Mashhadi, An adaptive q-learning algorithm developed for agent-based computational modeling of electricity market, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 40 (5) (2010) 547–556

work page 2010

[13] [13]

L. Y u, P . Wang, Y . Zhang, N. Li, R. Cherkaoui, A reinforcement-probability bayesian approach for strategic bidding and market clearing for renewable energy sources with uncertainty, Journal of Cleaner Production 429 (2023) 139403

work page 2023

[14] [14]

Liang, C

Y . Liang, C. Guo, Z. Ding, H. Hua, Agent-based modeling in electricity market using deep deterministic policy gradient algorithm, IEEE transactions on power systems 35 (6) (2020) 4180–4192

work page 2020

[15] [15]

K. V . Chandrakala, P . Kiran, Multi-agent based modeling and learning approach for intelligent day-ahead bidding strategy in wholesale electricity market, Expert Systems with Applications 233 (2023) 121014

work page 2023

[16] [16]

Rokhforoz, M

P . Rokhforoz, M. Montazeri, O. Fink, Multi-agent reinforcement learning with graph convolutional neural net- works for optimal bidding strategies of generation units in electricity markets, Expert Systems with Applications 225 (2023) 120010

work page 2023

[17] [17]

B. Yin, H. Weng, Y . Hu, J. Xi, P . Ding, J. Liu, Multi-agent deep reinforcement learning for simulating centralized double-sided auction electricity market, IEEE Transactions on Power Systems 40 (1) (2024) 518–529

work page 2024

[18] [18]

H. Weng, Y . Hu, M. Liang, J. Xi, B. Yin, Optimizing bidding strategy in electricity market based on graph convolutional neural network and deep reinforcement learning, Applied Energy 380 (2025) 124978

work page 2025

[19] [19]

J. Wu, J. Wang, X. Kong, Intelligent strategic bidding in competitive electricity markets using multi-agent sim- ulation and deep reinforcement learning, Applied Soft Computing 152 (2024) 111235

work page 2024

[20] [20]

ZHANG, Y

J. ZHANG, Y . Zhang, X. Wang, C. JIANG, L. W ANG, Game bidding and beneﬁt allocation strategy for virtual power plants with multiple new market entities based on multi-agent reinforcement learning, Power System Technology (2024) 1–12. 35

work page 2024

[21] [21]

Jiang, J

Y . Jiang, J. Dong, H. Huang, Optimal bidding strategy for the price-maker virtual power plant in the day-ahead market based on multi-agent twin delayed deep deterministic policy gradient algorithm, Energy 306 (2024) 132388

work page 2024

[22] [22]

Z. Pan, Z. Jing, Decision-making and cost models of generation company agents for supporting future electricity market mechanism design based on agent-based simulation, Applied Energy 391 (2025) 125881

work page 2025

[23] [23]

R. S. Sutton, A. G. Barto, et al., Reinforcement learning: An introduction, V ol. 1, MIT press Cambridge, 1998

work page 1998

[24] [24]

Zhao, Mathematical foundations of reinforcement learning, Springer Nature, 2025

S. Zhao, Mathematical foundations of reinforcement learning, Springer Nature, 2025

work page 2025

[25] [25]

Manual, 11: Energy and ancillary services market operations revision: 122 (2021)

P . Manual, 11: Energy and ancillary services market operations revision: 122 (2021)

work page 2021

[26] [26]

Löhndorf, D

N. Löhndorf, D. Wozabal, S. Minner, Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming, Operations Research 61 (4) (2013) 810–823

work page 2013

[27] [27]

Fujita, S.-i

Y . Fujita, S.-i. Maeda, Clipped action policy gradient, in: International conference on machine learning, PMLR, 2018, pp. 1597–1606

work page 2018

[28] [28]

L. Y u, P . Wang, Z. Chen, D. Li, N. Li, R. Cherkaoui, Finding nash equilibrium based on reinforcement learning for bidding strategy and distributed algorithm for iso in imperfect electricity market, Applied Energy 350 (2023) 121704

work page 2023

[29] [29]

Openspiel: A frame- work for reinforcement learning in games.arXiv preprint arXiv:1908.09453,

M. Lanctot, E. Lockhart, J.-B. Lespiau, V . Zambaldi, S. Upadhyay, J. Pérolat, S. Srinivasan, F. Timbers, K. Tuyls, S. Omidshaﬁei, et al., Openspiel: A framework for reinforcement learning in games, arXiv preprint arXiv:1908.09453 (2019)

work page arXiv 1908

[30] [30]

De Leeuw, K

J. De Leeuw, K. Hornik, P . Mair, Isotone optimization in r: pool-adjacent-violators algorithm (pava) and active set methods, Journal of statistical software 32 (2010) 1–24. Appendix A. A.1 Proof of Necessary Condition 1 (NC1) for Post-Processing Operations Necessary Condition 1 (NC1): The post-processing mapping h should satisfy 8a0, Ph(x) = a0j s = 0 Pr...

work page 2010