arxiv: 2605.14043 · v1 · submitted 2026-05-13 · 📡 eess.SY · cs.SY

Recognition: 1 theorem link


Optimal design of solar-battery hybrid resources considering multi-market participation under weather and price uncertainty

Hikaru Hoshino , Taiyo Mantani , Eiko Furutani


Pith reviewed 2026-05-15 05:24 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords: solar-battery hybrid · deep reinforcement learning · multi-market bidding · system sizing · stochastic optimization · ancillary services · uncertainty modeling


The pith

A deep reinforcement learning framework jointly optimizes solar-battery hybrid sizing and multi-market bidding strategies under uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method that treats solar panel and battery capacities as variables inside a reinforcement learning policy rather than fixing them first and optimizing bids later. This unified approach learns how to size the hybrid resource and how to allocate its limited power and energy across energy and ancillary services markets at the same time, while facing stochastic weather and price conditions. Traditional two-step methods separate sizing from operation and often produce designs that underperform when real variability arrives. By keeping everything inside one stochastic learning process, the framework aims to discover capacities that remain profitable across many possible scenarios drawn from historical data.

Core claim

The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation.

What carries the argument

A deep reinforcement learning policy conditioned on continuous design variables for solar and battery capacities (per the paper's Figure 3 description, the design parameter ω enters the state and stays fixed within each episode), allowing the agent to learn resource sizing and market bidding decisions together under uncertainty.

If this is right

The learned policy produces hybrid capacities that allocate power and energy across markets in ways that respond to realized conditions rather than to fixed forecasts.
Economic assessment of the hybrid resource incorporates the value of flexibility across multiple revenue streams within a single optimization run.
Designs remain effective when renewable output and market prices deviate from the scenarios seen during training.
The method avoids the need for separate scenario reduction or robust optimization steps before sizing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding technique could be tested on other hybrid combinations such as wind plus storage or solar plus demand response.
If policy gradients become unstable for very large capacity ranges, discretization or hierarchical RL might be required to keep learning tractable.
Real-time market participation would require extending the state to include live price signals and state-of-charge limits.

Load-bearing premise

Embedding continuous system sizes directly into the reinforcement learning policy produces stable learning and effective designs when uncertainty is present.

What would settle it

Train the policy on one set of historical weather and price traces, then evaluate the resulting fixed sizes and bidding policy on a completely held-out set of traces; if the achieved profit is lower than that of a well-tuned sequential optimization baseline, the joint-optimization claim would be falsified.
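The falsification test above amounts to an out-of-sample comparison on whole traces. The following is a minimal sketch. It assumes each trace is one episode of hourly (PV, price) pairs and that each approach has already been reduced to a function returning the profit of its fixed design and policy on a trace. `split_traces`, `mean_profit`, and `joint_claim_survives` are hypothetical helpers, not the paper's code.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical trace format: one (pv_mw, price_usd_per_mwh) pair per hour.
Trace = List[Tuple[float, float]]
ProfitFn = Callable[[Trace], float]


def split_traces(traces: Sequence[Trace], held_out_frac: float, seed: int = 0):
    """Split whole traces (not individual hours) so held-out data never leaks into training."""
    idx = list(range(len(traces)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(traces) * held_out_frac))
    held_out = [traces[i] for i in idx[:n_test]]
    train = [traces[i] for i in idx[n_test:]]
    return train, held_out


def mean_profit(profit_fn: ProfitFn, traces: Sequence[Trace]) -> float:
    """Average profit of a fixed design and bidding policy over a set of traces."""
    return mean(profit_fn(t) for t in traces)


def joint_claim_survives(joint_fn: ProfitFn, sequential_fn: ProfitFn,
                         held_out: Sequence[Trace]) -> bool:
    """False if the jointly optimized design earns less than the sequential baseline."""
    return mean_profit(joint_fn, held_out) >= mean_profit(sequential_fn, held_out)
```

Splitting by whole trace rather than by hour keeps weather and price autocorrelation from leaking between the training and evaluation sets. A paired, per-trace comparison would also show whether a few extreme days drive the difference in means.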

Figures

Figures reproduced from arXiv: 2605.14043 by Eiko Furutani, Hikaru Hoshino, Taiyo Mantani.

**Figure 1.** Comparison of PV-battery coupling architectures. Surrounding text (§2.1, Background and Definitions): “PV-battery systems can be deployed under several architectural and market-participation configurations. Despite market-specific differences, two key dimensions determine how these systems interact with the grid: • Electrical configuration: How PV and battery are interconnected behind the Point of Interconnection (POI) to the g…”

**Figure 2.** Recovery of clipped energy in hybrid resources. Surrounding text: “First, hybrid architectures enable the recovery of ‘clipped’ PV energy in plants with a high DC/AC ratio, as illustrated in Figure 2.”
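To make the clipping mechanism concrete, here is a toy hourly calculation. It assumes a DC-coupled battery that can charge from PV power above the inverter's AC rating, and it ignores conversion losses. The paper's converter model is not reproduced here, and all capacities and profiles are hypothetical.

```python
from typing import List, Tuple


def clipped_energy_recovery(
    pv_dc_mw: List[float],   # hypothetical hourly DC PV output; 1-h steps, so MW ~ MWh
    inverter_ac_mw: float,   # AC rating of the shared inverter
    bat_charge_mw: float,    # battery DC charge power limit
    bat_energy_mwh: float,   # usable battery energy capacity
    soc_mwh: float = 0.0,    # initial state of charge
) -> Tuple[float, float]:
    """Return (energy clipped without a battery, energy the battery recovers)."""
    clipped_total = 0.0
    recovered_total = 0.0
    for p in pv_dc_mw:
        excess = max(0.0, p - inverter_ac_mw)          # PV above the AC limit would be clipped
        clipped_total += excess
        headroom = bat_energy_mwh - soc_mwh            # remaining storage space
        charge = min(excess, bat_charge_mw, headroom)  # battery absorbs what it can
        soc_mwh += charge
        recovered_total += charge
    return clipped_total, recovered_total
```

With a 50 MW inverter and the profile `[0, 60, 80, 90, 70, 40]` MW, 100 MWh would be clipped. A 25 MW / 60 MWh battery recovers 60 MWh of it, because the battery fills before the afternoon excess ends. This is why the DC/AC ratio and the battery energy capacity interact in the sizing problem.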

**Figure 3.** Surrounding text: “Figure 3 illustrates the overall co-optimization framework. Let ω represent the system design parameter as in Eq. (1), and π_θ the operational policy parameterized by θ. The upper part of the figure shows the operational learning component, which follows a standard DRL framework except that the design parameter ω is included in the state and remains fixed within each episode. Given a sampled design ω, the agent in…”
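The Figure 3 mechanism works in three steps: sample a design ω, include it in the state, and hold it fixed for the whole episode. The following is a minimal sketch. It assumes a toy arbitrage-only environment and an arbitrary policy callable. `sample_design`, `rollout`, the capacity bounds, and the reward are hypothetical placeholders, not the paper's environment or its DRL algorithm. The sketch also does not show how the design distribution itself is updated.

```python
import random
from typing import Callable, Dict, List, Tuple

Design = Tuple[float, float]               # hypothetical ω = (pv_mw, battery_mwh)
Policy = Callable[[List[float]], float]


def sample_design(bounds: Dict[str, Tuple[float, float]], rng: random.Random) -> Design:
    """Draw one design ω uniformly within capacity bounds; called once per episode."""
    return (rng.uniform(*bounds["pv_mw"]), rng.uniform(*bounds["battery_mwh"]))


def rollout(policy: Policy, design: Design, prices: List[float]) -> float:
    """Run one episode with ω held fixed; the policy observes [price, soc, *ω] every step.

    Toy dynamics: hourly energy arbitrage only (PV output omitted for brevity),
    with the action in [-1, 1] scaled by the battery energy capacity.
    """
    pv_mw, bat_mwh = design
    soc, episode_return = 0.0, 0.0
    for price in prices:
        obs = [price, soc, pv_mw, bat_mwh]      # ω appended to the operational state
        a = max(-1.0, min(1.0, policy(obs)))    # > 0 discharge, < 0 charge
        delta = a * bat_mwh
        delta = min(delta, soc) if delta > 0 else max(delta, soc - bat_mwh)
        soc -= delta
        episode_return += price * delta         # sell when discharging, pay when charging
    return episode_return


if __name__ == "__main__":
    rng = random.Random(0)
    bounds = {"pv_mw": (0.0, 50.0), "battery_mwh": (0.0, 100.0)}

    def threshold_policy(obs: List[float]) -> float:
        return -1.0 if obs[0] < 50.0 else 1.0   # charge when cheap, discharge when dear

    for episode in range(3):
        omega = sample_design(bounds, rng)      # new ω per episode, fixed within it
        print(episode, omega, rollout(threshold_policy, omega, [20.0, 80.0, 30.0, 90.0]))
```

Because ω is part of the observation, a single trained policy can be evaluated on any candidate design. A design-level search can then compare capacities without retraining the operational policy for each one.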

**Figure 4.** Schematic overview of the serial strategy. Surrounding text: “Among three AS bids, the capacity for contingency reserve is first allocated. For simplicity of exposition, we assume that the contingency reserve is provided exclusively by the battery, without loss of generality (footnote 2). In this case, the feasible reserve capacity is constrained by the POI capacity in Eq. (2) and the discharge power limit of the battery converter in E…”
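A serial allocation of this kind can be sketched as a sequence of `min` operations over remaining headroom. Only two elements come from the excerpt: the reserve is allocated first from the battery alone, and it is bounded by the POI capacity and the battery discharge limit. Everything else is hypothetical. This includes the order of the remaining products (Reg-Up, then Reg-Down), the headroom formulas, and the assumption that the POI import limit equals its export limit. The paper's actual constraints (Eq. (2) onward) are truncated above.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    poi_mw: float      # POI limit (assumed symmetric for export and import)
    bat_dis_mw: float  # battery converter discharge limit
    bat_ch_mw: float   # battery converter charge limit


@dataclass
class ASBids:
    reserve_mw: float
    reg_up_mw: float
    reg_down_mw: float


def serial_as_allocation(lim: Limits, energy_export_mw: float, bat_power_mw: float,
                         want_reserve_mw: float, want_up_mw: float,
                         want_down_mw: float) -> ASBids:
    """Allocate AS capacity in a fixed order: contingency reserve, Reg-Up, Reg-Down.

    energy_export_mw: net POI export scheduled in the energy market (MW).
    bat_power_mw: scheduled battery power, > 0 discharging, < 0 charging.
    """
    # Upward headroom is bounded by both the POI and the battery converter,
    # since the reserve (and here also Reg-Up) is assumed to come from the battery.
    up_poi = max(0.0, lim.poi_mw - energy_export_mw)
    up_bat = max(0.0, lim.bat_dis_mw - bat_power_mw)
    reserve = min(want_reserve_mw, up_poi, up_bat)                # allocated first
    reg_up = min(want_up_mw, up_poi - reserve, up_bat - reserve)  # from what is left
    # Downward headroom: the battery can charge harder, and the POI can swing toward import.
    down_bat = max(0.0, lim.bat_ch_mw + bat_power_mw)
    down_poi = max(0.0, lim.poi_mw + energy_export_mw)
    reg_down = min(want_down_mw, down_bat, down_poi)
    return ASBids(reserve, max(0.0, reg_up), reg_down)
```

With a 20 MW POI, a 12 MW energy export, and 2 MW of scheduled discharge from a 10 MW converter, only 8 MW of upward headroom remains. A 5 MW reserve request therefore leaves at most 3 MW for Reg-Up. The ordering decides which product is curtailed when headroom runs short.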

**Figure 5.** Progress of episode rewards during training. (Adjacent plot residue: average revenue ($) for Hybrid and Co-located resources, split into energy revenue, ancillary revenue, capacity payment, imbalance penalty, and degradation cost.)

**Figure 6.** Surrounding text: “Figure 6 presents the breakdown of the revenue components, including energy market revenue (the first term in Eq. (12)), AS revenue, capacity payment, battery degradation cost, and imbalance penalties (the second term in Eq. (12)). The results show that the imbalance penalties are …” (Adjacent plot residue: capacity (MW) and price ($/MWh) over 160 hours; legend includes b_e and energy price.)
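The revenue bookkeeping behind the Figure 6 breakdown can be sketched as a per-step tally. Only the grouping comes from the excerpt: energy revenue and the imbalance penalty are the two terms of Eq. (12), and AS revenue, capacity payment, and degradation cost are reported alongside them. Every functional form below is a hypothetical placeholder rather than the paper's Eq. (12). This covers settling the energy bid at the market price, a symmetric imbalance price on the absolute deviation, AS revenue as price times awarded MW, and a linear throughput degradation cost.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StepOutcome:
    energy_bid_mwh: float          # energy committed in the market for this step
    delivered_mwh: float           # energy actually delivered at the POI
    energy_price: float            # $/MWh
    as_awards_mw: Dict[str, float] # e.g. {"reserve": 5.0, "reg_up": 3.0}
    as_prices: Dict[str, float]    # $/MW for this step, per product
    bat_throughput_mwh: float      # charge + discharge energy this step


def revenue_breakdown(step: StepOutcome, imbalance_price: float,
                      degradation_cost_per_mwh: float,
                      capacity_payment: float = 0.0) -> Dict[str, float]:
    """Split one step's profit into the components reported in the breakdown figure."""
    energy = step.energy_price * step.energy_bid_mwh
    imbalance = -imbalance_price * abs(step.delivered_mwh - step.energy_bid_mwh)
    ancillary = sum(step.as_prices[k] * mw for k, mw in step.as_awards_mw.items())
    degradation = -degradation_cost_per_mwh * step.bat_throughput_mwh
    parts = {"energy": energy, "imbalance": imbalance, "ancillary": ancillary,
             "capacity": capacity_payment, "degradation": degradation}
    parts["total"] = sum(parts.values())  # computed before "total" is inserted
    return parts
```

Keeping the components separate, rather than only the total reward, is what makes a breakdown like Figure 6 possible. Penalties and costs are stored with a negative sign so that the total is a plain sum.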

**Figure 8.** Caption not recovered in extraction; surrounding text fragment: “…, the breakdown of revenues in …”

**Figure 9.** Breakdown of revenue components of hybrid and co-located resources. Surrounding text: “…improvement demonstrates the importance of jointly optimizing system design and operational strategies, as the co-optimization framework identifies a configuration that more effectively exploits multi-market revenue opportunities. In this baseline case, the optimal design results in a battery with a duration of 2.71 h, which is below the c…”

**Figure 10.** Time-series behaviors under hypothetical scenarios. Surrounding text: “…resulting battery duration remains around 2.50 h. This result suggests that, for hybrid resources, the incentive provided by the capacity market reform is not sufficient to promote long-duration storage, even under reduced battery costs, and the economic value of the battery is primarily derived from energy and AS markets. At the same time, the result hig…”

**Figure 11.** Breakdown of revenue components over the year. (Adjacent plot residue: weekly time series of capacity (MW) and price ($/MWh); panel (a) 1st week of July, 2022-07-01 to 2022-07-07, and a second panel for 2022-09-01 to 2022-09-07; legend: b_e and energy price, b_res and spinning-reserve price, b_up and Reg-Up price, b_dn and Reg-Down price, actual PV power.)

**Figure 12.** Representative time series in selected months. Surrounding text: “…accounting for operational constraints and stochastic variations in prices and renewable generation. Numerical results demonstrated that the framework can effectively identify economically rational system configurations and operational policies, highlighting the advantages of hybrid resources over co-located alternatives. Furthermore, the applicability to lon…”

Original abstract

The rapid growth of variable renewable energy has increased the need for flexible and efficiently coordinated energy resources. In this context, hybrid resources that combine renewable generation and battery storage within a single market-participating entity have attracted growing attention. Such hybrid resources can have multiple revenue streams, while allocating limited power and energy capacity across multiple electricity markets including energy and ancillary services. This multi-market coordination increases operational complexity and complicates profitability assessment, making optimal system sizing a challenging design problem. In addition, uncertainty in renewable generation and market prices makes it difficult for conventional optimization approaches to determine system designs that remain effective under stochastic operating conditions. To address these challenges, this paper proposes a deep reinforcement learning-based co-optimization framework for hybrid solar-battery resources. The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation. Case studies using historical renewable generation and market data demonstrate the effectiveness of the proposed framework in identifying economically rational hybrid system design considering multi-market operation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's DRL co-optimization for solar-battery sizing and multi-market bidding is a reasonable framing but lacks the quantitative backing needed to judge its effectiveness.


The core idea is to use deep reinforcement learning for both the one-time decision on how large to make the solar array and battery and the ongoing decisions on how to bid that capacity into energy and ancillary service markets. By putting the sizing variables inside the policy, the approach aims to find designs that perform well across uncertain weather and price scenarios without a separate outer loop for sizing. It is new in the sense that most prior work either fixes the sizes first or treats sizing and operation as separate, sequential steps. The unified formulation lets the learning process trade off capital costs against expected revenues from coordinated bidding. The case studies use historical data, which is a reasonable starting point for showing practical value in real power systems. The paper does a decent job of framing why multi-market participation matters for hybrids and why uncertainty makes conventional methods struggle. It correctly identifies that allocating limited capacity across markets adds complexity that standard stochastic programming may not handle at scale.

The soft spots are more about what's missing than what's wrong. The abstract claims the framework identifies economically rational designs, but it reports no specific metrics, such as a percentage improvement in profit or a comparison against a two-stage benchmark, and no details on how uncertainty is represented in the training episodes. Without those, it's hard to know whether the method actually delivers better designs or just learns something reasonable. There's also the open question of whether including continuous sizing variables in the same policy as time-varying bids leads to training instability; the abstract doesn't describe any special parameterization or variance-reduction steps that would address that.
For readers working on renewable integration and market participation, this could be worth a look if the full paper includes ablation studies and reproducible code. It targets people who want to move beyond deterministic sizing tools toward data-driven methods that capture operational flexibility. I'd recommend sending it for peer review. The problem is timely, the approach is coherent on paper, and even if revisions are needed for the empirical section, the framing alone justifies referee time.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a deep reinforcement learning (DRL) co-optimization framework for solar-battery hybrid resources that embeds system design variables (solar and battery capacities) directly into the policy learning process. This enables joint optimization of static system sizing and dynamic multi-market bidding strategies (energy and ancillary services) within a single stochastic formulation that accounts for uncertainty in renewable generation and market prices. Case studies using historical data are presented to demonstrate the framework's ability to identify economically rational designs.

Significance. If the results hold, the work would offer a practical advance in hybrid resource planning by unifying design and operational decisions under realistic multi-market and stochastic conditions, potentially improving profitability assessments beyond sequential or deterministic methods. The approach addresses a timely problem in variable renewable integration where conventional optimization struggles with joint sizing and bidding.

major comments (2)

[Abstract and §3] Framework description: The central claim that embedding continuous design variables into the RL policy enables stable joint optimization lacks any specification of action-space parameterization (e.g., a Gaussian policy for continuous sizes vs. a discretized one), policy network architecture, or variance-reduction techniques. Without these, the formulation risks unstable gradients when static sizing decisions are mixed with sequential bidding actions, the same concern raised under the load-bearing premise above. This directly undermines any evaluation of the unified stochastic formulation's effectiveness.
[§4] Case studies: No quantitative results, error metrics, baseline comparisons (e.g., against separate sizing-then-bidding optimization or a deterministic MILP), or details on uncertainty modeling (scenario generation, probability distributions) are provided. This prevents assessment of whether the learned designs are economically rational or superior under weather and price uncertainty, leaving the claimed effectiveness unverifiable.

minor comments (2)

[§2] Notation for design variables and market bids should be introduced consistently with clear units and bounds in the problem formulation section.
[§4] Figure captions for case-study results should explicitly state the number of scenarios, training episodes, and any hyperparameter settings used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We have revised the manuscript to address both major comments by expanding the methodological specifications in §3 and providing quantitative results, metrics, and comparisons in §4.

Point-by-point responses

Referee: [Abstract and §3] Framework description: The central claim that embedding continuous design variables into the RL policy enables stable joint optimization lacks any specification of action-space parameterization (e.g., a Gaussian policy for continuous sizes vs. a discretized one), policy network architecture, or variance-reduction techniques. Without these, the formulation risks unstable gradients when static sizing decisions are mixed with sequential bidding actions, the same concern raised under the load-bearing premise above. This directly undermines any evaluation of the unified stochastic formulation's effectiveness.

Authors: We agree that the original description of the DRL implementation was insufficiently detailed. In the revised manuscript we have added a dedicated subsection in §3 that specifies the action-space parameterization (continuous Gaussian policy for the design variables with separate heads for bidding actions), the policy network architecture (MLP with two hidden layers), and the variance-reduction techniques employed (GAE and entropy regularization). These additions directly address the concern about gradient stability when jointly optimizing static and sequential decisions. revision: yes
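Generalized advantage estimation (GAE) is one of the variance-reduction techniques named in this simulated response. The sketch below is the standard GAE recursion (Schulman et al.), not code from the paper. The γ and λ defaults are common placeholder values.

```python
from typing import List


def gae(rewards: List[float], values: List[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one finite episode.

    values has len(rewards) + 1 entries; the last is the bootstrap value
    of the final state (use 0.0 if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD residual
        running = delta + gamma * lam * running                 # exponentially weighted sum
        advantages[t] = running
    return advantages
```

Setting λ = 1 recovers the high-variance Monte Carlo return minus the baseline, and λ = 0 gives the low-variance but biased one-step TD residual. Intermediate values trade the two off, which is why GAE is a common choice for stabilizing policy gradients on long episodes such as multi-day bidding horizons.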
Referee: [§4] Case studies: No quantitative results, error metrics, baseline comparisons (e.g., against separate sizing-then-bidding optimization or a deterministic MILP), or details on uncertainty modeling (scenario generation, probability distributions) are provided. This prevents assessment of whether the learned designs are economically rational or superior under weather and price uncertainty, leaving the claimed effectiveness unverifiable.

Authors: We acknowledge that the original §4 presented only high-level demonstrations. The revised version expands the case studies with concrete numerical outcomes (optimal capacities and expected profits), error metrics, direct comparisons against a sequential sizing-then-bidding baseline and a deterministic MILP formulation, and explicit uncertainty modeling details (scenario generation from historical data using fitted distributions and Monte Carlo sampling). These additions allow verification of economic rationality under the stochastic conditions. revision: yes

Circularity Check

0 steps flagged

No circularity: novel RL co-optimization framework stands as independent proposal

Full rationale

The paper proposes a deep reinforcement learning-based co-optimization framework that embeds system design variables directly into the policy learning process for joint optimization of hybrid system sizing and multi-market bidding. No equations, fitted parameters, or derivations are shown that reduce the claimed joint optimization to a tautology, self-definition, or prior self-citation. The approach is framed as a new unified stochastic formulation supported by case studies on historical data, with no evidence of load-bearing self-citations, ansatz smuggling, or renaming of known results. The central claim remains self-contained and does not collapse by construction to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The framework implicitly assumes that historical renewable and price data are sufficient to train policies that generalize to future conditions.

pith-pipeline@v0.9.0 · 5489 in / 1136 out tokens · 31826 ms · 2026-05-15T05:24:59.078382+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear

The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
