pith. machine review for the scientific record.

arxiv: 2605.14043 · v1 · submitted 2026-05-13 · 📡 eess.SY · cs.SY

Recognition: 1 theorem link

· Lean Theorem

Optimal design of solar-battery hybrid resources considering multi-market participation under weather and price uncertainty


Pith reviewed 2026-05-15 05:24 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords solar-battery hybrid · deep reinforcement learning · multi-market bidding · system sizing · stochastic optimization · ancillary services · uncertainty modeling

The pith

A deep reinforcement learning framework jointly optimizes solar-battery hybrid sizing and multi-market bidding strategies under uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method that treats solar panel and battery capacities as variables inside a reinforcement learning policy rather than fixing them first and optimizing bids later. This unified approach learns how to size the hybrid resource and how to allocate its limited power and energy across energy and ancillary services markets at the same time, while facing stochastic weather and price conditions. Traditional two-step methods separate sizing from operation and often produce designs that underperform when real variability arrives. By keeping everything inside one stochastic learning process, the framework aims to discover capacities that remain profitable across many possible scenarios drawn from historical data.

Core claim

The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation.

What carries the argument

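A deep reinforcement learning policy conditioned on continuous design variables for solar and battery capacities, allowing the agent to learn resource sizing and market bidding decisions together under uncertainty. Per the Figure 3 caption, the design parameter ω is included in the state and held fixed within each episode, rather than re-chosen at every time step.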
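Here is a minimal sketch of what "design variables inside the learning loop" can look like, following the Figure 3 caption on this page: a design ω is sampled, appended to the state, and held fixed for the whole episode. Everything concrete here is an assumption for illustration, not the authors' code. That includes the three design variables, the hypothetical helpers `sample_design`, `build_state`, and `run_episode`, uniform design sampling, hourly lossless battery steps, and a placeholder policy.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Hypothetical design vector omega (held fixed within an episode)."""
    pv_mw: float     # PV capacity (MW)
    batt_mw: float   # battery converter power rating (MW)
    batt_mwh: float  # battery energy capacity (MWh)


def sample_design(rng, bounds):
    """Draw one candidate design uniformly within (lo, hi) bounds per variable."""
    return Design(*(rng.uniform(lo, hi) for lo, hi in bounds))


def build_state(design, soc_mwh, price, pv_mw, hour):
    """Time-varying features first, then the static design vector."""
    return (hour, price, pv_mw, soc_mwh / design.batt_mwh,
            design.pv_mw, design.batt_mw, design.batt_mwh)


def run_episode(policy, prices, pv_per_mw, design):
    """Roll out one episode; `design` is never modified inside the loop."""
    soc = 0.5 * design.batt_mwh
    states = []
    for hour, (price, pv_unit) in enumerate(zip(prices, pv_per_mw)):
        state = build_state(design, soc, price, pv_unit * design.pv_mw, hour)
        states.append(state)
        # Assumed sign convention: positive = discharge, negative = charge (MW).
        p_batt = max(-design.batt_mw, min(design.batt_mw, policy(state)))
        # One-hour, lossless step: SOC falls when discharging, rises when charging.
        soc = max(0.0, min(design.batt_mwh, soc - p_batt))
    return states
```

The design occupies fixed slots of every observation, so one policy network can be trained across many sampled designs. The truncated caption does not show how the framework then updates or selects the final ω.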

If this is right

  • The learned policy produces hybrid capacities that allocate power and energy across markets in ways that respond to realized conditions rather than to fixed forecasts.
  • Economic assessment of the hybrid resource incorporates the value of flexibility across multiple revenue streams within a single optimization run.
  • Designs remain effective when renewable output and market prices deviate from the scenarios seen during training.
  • The method avoids the need for separate scenario reduction or robust optimization steps before sizing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same embedding technique could be tested on other hybrid combinations such as wind plus storage or solar plus demand response.
  • If policy gradients become unstable for very large capacity ranges, discretization or hierarchical RL might be required to keep learning tractable.
  • Real-time market participation would require extending the state to include live price signals and state-of-charge limits.
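The second bullet above mentions discretization as a fallback. The sketch below shows the two usual ways of mapping a policy output onto a capacity. It assumes the actor emits a tanh-squashed value in [-1, 1], which is a common convention in continuous-control DRL but is not confirmed for this paper. The bounds and the helper names `scale_action` and `discretize` are hypothetical.

```python
def scale_action(a, lo, hi):
    """Affinely map a squashed action a in [-1, 1] onto the capacity range [lo, hi]."""
    a = max(-1.0, min(1.0, a))  # guard against values outside the squashing range
    return lo + 0.5 * (a + 1.0) * (hi - lo)


def discretize(value, lo, hi, n_levels):
    """Snap a capacity to the nearest of n_levels evenly spaced levels in [lo, hi]."""
    if n_levels < 2:
        raise ValueError("need at least two levels")
    step = (hi - lo) / (n_levels - 1)
    k = round((value - lo) / step)
    k = max(0, min(n_levels - 1, k))  # keep the index on the grid
    return lo + k * step
```

A coarse grid (for example, 11 levels) shrinks the design search to a finite set. It trades resolution for a smaller, potentially more stable learning problem.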

Load-bearing premise

Embedding continuous system sizes directly into the reinforcement learning policy produces stable learning and effective designs when uncertainty is present.

What would settle it

Train the policy on one set of historical weather and price traces, then evaluate the resulting fixed sizes and bidding policy on a completely held-out set of traces; if the achieved profit is lower than that of a well-tuned sequential optimization baseline, the joint-optimization claim would be falsified.
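The test above can be scripted as a chronological hold-out comparison. This is a minimal sketch with three assumptions:

- Each trace is one time-ordered unit, for example a day of weather and prices.
- `joint_profit` and `baseline_profit` are hypothetical callables that run a frozen design plus policy over a trace and return its profit.
- Only the mean is compared. A real evaluation would also report the spread across held-out traces.

```python
from statistics import mean


def chronological_split(traces, train_frac=0.8):
    """Keep time order: the earliest traces train, the latest are held out."""
    cut = int(len(traces) * train_frac)
    return traces[:cut], traces[cut:]


def mean_profit(profit_fn, traces):
    """Average profit of a frozen design + policy over a set of traces."""
    return mean(profit_fn(t) for t in traces)


def joint_claim_survives(joint_profit, baseline_profit, held_out):
    """False means the joint policy earned less than the sequential baseline."""
    return mean_profit(joint_profit, held_out) >= mean_profit(baseline_profit, held_out)
```

Splitting chronologically rather than at random avoids leaking seasonal patterns from the evaluation period into training.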

Figures

Figures reproduced from arXiv: 2605.14043 by Eiko Furutani, Hikaru Hoshino, Taiyo Mantani.

Figure 1
Figure 1: Comparison of PV-battery coupling architectures. 2.1. Background and Definitions. PV-battery systems can be deployed under several architectural and market-participation configurations. Despite market-specific differences, two key dimensions determine how these systems interact with the grid: • Electrical configuration: How PV and battery are interconnected behind the Point of Interconnection (POI) to the g…
Figure 2
Figure 2: Recovery of clipped energy in hybrid resources. First, hybrid architectures enable the recovery of "clipped" PV energy in plants with a high DC/AC ratio, as illustrated in Figure 2.
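Clipping happens when DC-side PV output exceeds the inverter's AC rating. The DC/AC ratio is the ratio of PV DC capacity to inverter AC capacity. The arithmetic behind the Figure 2 entry can be sketched as follows, assuming:

- a DC-coupled battery that charges only from clipped energy,
- lossless conversion,
- hourly steps,
- no discharge during the window.

The function names are hypothetical.

```python
def clipped_energy(pv_dc_mw, inverter_ac_mw, dt_h=1.0):
    """Energy (MWh) above the inverter AC rating that a PV-only plant would spill."""
    return sum(max(0.0, p - inverter_ac_mw) * dt_h for p in pv_dc_mw)


def recover_clipping(pv_dc_mw, inverter_ac_mw, batt_mw, batt_mwh, dt_h=1.0):
    """Charge a DC-coupled battery from clipped energy only; return recovered MWh."""
    soc = 0.0
    for p in pv_dc_mw:
        surplus = max(0.0, p - inverter_ac_mw)
        # Charging is limited by the surplus, the power rating, and the remaining room.
        charge = min(surplus, batt_mw, (batt_mwh - soc) / dt_h)
        soc += charge * dt_h
    return soc  # battery starts empty and never discharges, so SOC = energy recovered
```

Take the DC profile [0, 60, 120, 150, 130, 80, 0] MW behind a 100 MW inverter. `clipped_energy` gives 100 MWh of spill. A 40 MW / 70 MWh battery recovers 70 MWh of it, limited first by its power rating and then by its energy capacity.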
Figure 3
Figure 3 illustrates the overall co-optimization framework. Let ω represent the system design parameter as in Eq. (1), and π_θ the operational policy parameterized by θ. The upper part of the figure shows the operational learning component, which follows a standard DRL framework except that the design parameter ω is included in the state and remains fixed within each episode. Given a sampled design ω, the agent in…
Figure 4
Figure 4: Schematic overview of the serial strategy. Among the three AS bids, the capacity for contingency reserve is allocated first. For simplicity of exposition, we assume that the contingency reserve is provided exclusively by the battery, without loss of generality (footnote 2). In this case, the feasible reserve capacity is constrained by the POI capacity in Eq. (2) and the discharge power limit of the battery converter in E…
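The serial strategy in the Figure 4 entry allocates contingency reserve first, from the battery alone. That allocation is subject to the POI limit and the battery converter's discharge limit. The sketch below generalizes this to a priority-ordered list of upward products.

The caption's constraint equations are truncated, so two parts are assumptions: the headroom definitions (POI capacity minus the energy bid; converter limit minus scheduled discharge) and the helper name `serial_upward_allocation`.

```python
def serial_upward_allocation(requests, poi_mw, energy_bid_mw,
                             batt_dis_max_mw, batt_dis_sched_mw=0.0):
    """Clip upward AS bids in priority order to the remaining headroom.

    Assumes every listed product is served by the battery, so each award
    consumes both POI headroom and battery-converter discharge headroom.
    """
    poi_room = max(0.0, poi_mw - energy_bid_mw)
    batt_room = max(0.0, batt_dis_max_mw - batt_dis_sched_mw)
    awarded = {}
    for product, requested_mw in requests:  # earlier entries get priority
        q = max(0.0, min(requested_mw, poi_room, batt_room))
        awarded[product] = q
        poi_room -= q
        batt_room -= q
    return awarded
```

Swapping the order of `requests` changes the awards when headroom is scarce. This is why the ordering is itself a design choice of a serial strategy.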
Figure 5
Figure 5: Progress of episode rewards during training. (Extracted plot labels: Hybrid, Co-located; Average Revenue ($); legend: Energy Revenue, Ancillary Revenue, Capacity Payment, Imbalance Penalty, Degradation Cost.)
Figure 6
Figure 6 presents the breakdown of the revenue components, including energy market revenue (the first term in Eq. (12)), AS revenue, capacity payment, battery degradation cost, and imbalance penalties (the second term in Eq. (12)). The results show that the imbalance penalties are … (Extracted plot labels: Time (hours), 0 to 160; Capacity (MW); Price ($/MWh); legend: b_e, Energy pric…)
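The five components listed in the Figure 6 entry can be sketched as a per-step profit function. Eq. (12) itself is not reproduced on this page, so the functional forms here are assumptions:

- energy revenue settled on the bid,
- linear AS revenue,
- an imbalance penalty proportional to the absolute deviation between delivered and bid power,
- a degradation cost proportional to battery throughput,
- a capacity payment passed in as a per-step lump sum.

The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StepOutcome:
    """Hypothetical per-step quantities for one market interval."""
    energy_price: float      # $/MWh
    energy_bid_mw: float     # energy-market bid (MW)
    delivered_mw: float      # actual output at the POI (MW)
    as_prices: dict = field(default_factory=dict)     # $/MW-h by AS product
    as_awards_mw: dict = field(default_factory=dict)  # cleared AS capacity by product
    batt_throughput_mwh: float = 0.0                  # charge + discharge energy


def step_profit(x, imbalance_price, deg_cost_per_mwh, capacity_payment, dt_h=1.0):
    """Return total profit and the five signed components named in Figure 6."""
    parts = {
        "energy_revenue": x.energy_price * x.energy_bid_mw * dt_h,
        "ancillary_revenue": sum(
            price * x.as_awards_mw.get(product, 0.0) * dt_h
            for product, price in x.as_prices.items()
        ),
        "capacity_payment": capacity_payment,
        # Penalty charged on any shortfall or excess relative to the energy bid.
        "imbalance_penalty": -imbalance_price * abs(x.delivered_mw - x.energy_bid_mw) * dt_h,
        "degradation_cost": -deg_cost_per_mwh * x.batt_throughput_mwh,
    }
    return sum(parts.values()), parts
```

Returning the components alongside the total makes a Figure 6-style breakdown a matter of summing each key over an evaluation run.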
Figure 8
… Figure 8, the breakdown of revenues in …
Figure 9
Figure 9: Breakdown of revenue components of hybrid and co-located resources. … improvement demonstrates the importance of jointly optimizing system design and operational strategies, as the co-optimization framework identifies a configuration that more effectively exploits multi-market revenue opportunities. In this baseline case, the optimal design results in a battery with a duration of 2.71 h, which is below the c…
Figure 10
Figure 10: Time-series behaviors under hypothetical scenarios. … resulting battery duration remains around 2.50 h. This result suggests that, for hybrid resources, the incentive provided by the capacity market reform is not sufficient to promote long-duration storage, even under reduced battery costs, and the economic value of the battery is primarily derived from energy and AS markets. At the same time, the result hig…
Figure 11
Figure 11: Breakdown of revenue components over the year. (Extracted plot labels: Time (hours), 0 to 160; Capacity (MW); Price ($/MWh); legend: b_e Energy price, b_res Spinning-Reserve price, b_up Reg-Up price, b_dn Reg-Down price, actual PV power; panels: (a) 1st week of July, 2022-07-01 to 2022-07-07, and a panel for 2022-09-01 to 2022-09-07.)
Figure 12
Figure 12: Representative time series in selected months. … accounting for operational constraints and stochastic variations in prices and renewable generation. Numerical results demonstrated that the framework can effectively identify economically rational system configurations and operational policies, highlighting the advantages of hybrid resources over co-located alternatives. Furthermore, the applicability to lon…
Original abstract

The rapid growth of variable renewable energy has increased the need for flexible and efficiently coordinated energy resources. In this context, hybrid resources that combine renewable generation and battery storage within a single market-participating entity have attracted growing attention. Such hybrid resources can have multiple revenue streams, while allocating limited power and energy capacity across multiple electricity markets including energy and ancillary services. This multi-market coordination increases operational complexity and complicates profitability assessment, making optimal system sizing a challenging design problem. In addition, uncertainty in renewable generation and market prices makes it difficult for conventional optimization approaches to determine system designs that remain effective under stochastic operating conditions. To address these challenges, this paper proposes a deep reinforcement learning-based co-optimization framework for hybrid solar-battery resources. The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation. Case studies using historical renewable generation and market data demonstrate the effectiveness of the proposed framework in identifying economically rational hybrid system design considering multi-market operation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a deep reinforcement learning (DRL) co-optimization framework for solar-battery hybrid resources that embeds system design variables (solar and battery capacities) directly into the policy learning process. This enables joint optimization of static system sizing and dynamic multi-market bidding strategies (energy and ancillary services) within a single stochastic formulation that accounts for uncertainty in renewable generation and market prices. Case studies using historical data are presented to demonstrate the framework's ability to identify economically rational designs.

Significance. If the results hold, the work would offer a practical advance in hybrid resource planning by unifying design and operational decisions under realistic multi-market and stochastic conditions, potentially improving profitability assessments beyond sequential or deterministic methods. The approach addresses a timely problem in variable renewable integration where conventional optimization struggles with joint sizing and bidding.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (framework description): The central claim that embedding continuous design variables into the RL policy enables stable joint optimization lacks any specification of action-space parameterization (e.g., Gaussian policy for continuous sizes vs. discretized), policy network architecture, or variance-reduction techniques. Without these, the formulation risks unstable gradients when mixing static sizing decisions with sequential bidding actions, as highlighted by the stress-test concern; this directly undermines evaluation of the unified stochastic formulation's effectiveness.
  2. [§4] §4 (case studies): No quantitative results, error metrics, baseline comparisons (e.g., against separate sizing-then-bidding optimization or deterministic MILP), or details on uncertainty modeling (scenario generation, probability distributions) are provided. This prevents assessment of whether the learned designs are economically rational or superior under weather/price uncertainty, making the demonstration of effectiveness unverifiable.
minor comments (2)
  1. [§2] Notation for design variables and market bids should be introduced consistently with clear units and bounds in the problem formulation section.
  2. [§4] Figure captions for case-study results should explicitly state the number of scenarios, training episodes, and any hyperparameter settings used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We have revised the manuscript to address both major comments by expanding the methodological specifications in §3 and providing quantitative results, metrics, and comparisons in §4.

  1. Referee: [Abstract and §3] Abstract and §3 (framework description): The central claim that embedding continuous design variables into the RL policy enables stable joint optimization lacks any specification of action-space parameterization (e.g., Gaussian policy for continuous sizes vs. discretized), policy network architecture, or variance-reduction techniques. Without these, the formulation risks unstable gradients when mixing static sizing decisions with sequential bidding actions, as highlighted by the stress-test concern; this directly undermines evaluation of the unified stochastic formulation's effectiveness.

    Authors: We agree that the original description of the DRL implementation was insufficiently detailed. In the revised manuscript we have added a dedicated subsection in §3 that specifies the action-space parameterization (continuous Gaussian policy for the design variables with separate heads for bidding actions), the policy network architecture (MLP with two hidden layers), and the variance-reduction techniques employed (GAE and entropy regularization). These additions directly address the concern about gradient stability when jointly optimizing static and sequential decisions. revision: yes

  2. Referee: [§4] §4 (case studies): No quantitative results, error metrics, baseline comparisons (e.g., against separate sizing-then-bidding optimization or deterministic MILP), or details on uncertainty modeling (scenario generation, probability distributions) are provided. This prevents assessment of whether the learned designs are economically rational or superior under weather/price uncertainty, making the demonstration of effectiveness unverifiable.

    Authors: We acknowledge that the original §4 presented only high-level demonstrations. The revised version expands the case studies with concrete numerical outcomes (optimal capacities and expected profits), error metrics, direct comparisons against a sequential sizing-then-bidding baseline and a deterministic MILP formulation, and explicit uncertainty modeling details (scenario generation from historical data using fitted distributions and Monte Carlo sampling). These additions allow verification of economic rationality under the stochastic conditions. revision: yes
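The simulated rebuttal names generalized advantage estimation (GAE) as a variance-reduction technique. These implementation details come from the simulated exchange, not from the paper. The paper's reference list cites DDPG [44] and distributed prioritized experience replay [51], so the actual training algorithm may differ. For reference, here is a minimal stdlib sketch of standard GAE over one rollout; the function name and argument layout are illustrative.

```python
def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`, the last
    being the bootstrap value of the state reached after the final step.
    `dones[t]` is True if the episode terminated at step t.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap term vanishes at terminal steps.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of future TD errors, reset at episode ends.
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    # Value-function targets: advantage plus the baseline it was measured against.
    returns = [a + v for a, v in zip(advantages, values[:-1])]
    return advantages, returns
```

Setting `lam=0` reduces each advantage to the one-step TD error. Setting `lam=1` gives the discounted return minus the value baseline. Intermediate values trade bias against variance.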

Circularity Check

0 steps flagged

No circularity: novel RL co-optimization framework stands as independent proposal


The paper proposes a deep reinforcement learning-based co-optimization framework that embeds system design variables directly into the policy learning process for joint optimization of hybrid system sizing and multi-market bidding. No equations, fitted parameters, or derivations are shown that reduce the claimed joint optimization to a tautology, self-definition, or prior self-citation. The approach is framed as a new unified stochastic formulation supported by case studies on historical data, with no evidence of load-bearing self-citations, ansatz smuggling, or renaming of known results. The central claim remains self-contained and does not collapse by construction to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The framework implicitly assumes that historical renewable and price data are sufficient to train policies that generalize to future conditions.
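The implicit assumption above concerns how historical data become training scenarios. The simulated rebuttal mentions fitted distributions with Monte Carlo sampling. A simpler alternative, swapped in here purely for illustration, is a day-block bootstrap: it resamples whole historical days, so each day's joint PV and price pattern stays intact. The function name and the day-level granularity are assumptions.

```python
import random


def sample_scenarios(daily_traces, n_scenarios, days_per_scenario, seed=None):
    """Day-block bootstrap: build synthetic episodes from whole historical days.

    Each element of `daily_traces` is one day of aligned PV and price data;
    days are drawn with replacement, so a scenario may repeat a day.
    """
    rng = random.Random(seed)  # local RNG keeps sampling reproducible per seed
    return [
        [rng.choice(daily_traces) for _ in range(days_per_scenario)]
        for _ in range(n_scenarios)
    ]
```

Every sampled day already occurred, so this sampler cannot produce conditions outside the historical record. That is exactly the generalization gap the ledger flags.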

pith-pipeline@v0.9.0 · 5489 in / 1136 out tokens · 31826 ms · 2026-05-15T05:24:59.078382+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 2 internal anchors

  1. [1]

    M. Ahlstrom, J. Mays, E. Gimon, A. Gelston, C. Murphy, P. Denholm, G. Nemet, Hybrid resources: Challenges, implications, opportunities, and innovation, IEEE Power and Energy Magazine 19 (6) (2021) 37–44. doi:10.1109/MPE.2021.3104077

  2. [2]

    L. Olatomiwa, S. Mekhilef, M. Ismail, M. Moghavvemi, Energy management strategies in hybrid renewable energy systems: A review, Renewable and Sustainable Energy Reviews 62 (2016) 821–835. doi:10.1016/j.rser.2016.05.040

  3. [3]

    G. He, Q. Chen, C. Kang, Q. Xia, Optimal offering strategy for concentrating solar power plants in joint energy, reserve and regulation markets, IEEE Transactions on Sustainable Energy 7 (3) (2016) 1245–1254. doi:10.1109/TSTE.2016.2533637

  4. [4]

    K. Das, A. L. T. Philippe Grapperon, P. E. Sørensen, A. D. Hansen, Optimal battery operation for revenue maximization of wind-storage hybrid power plant, Electric Power Systems Research 189 (2020) 106631. doi:10.1016/j.epsr.2020.106631

  5. [5]

    Y. Xie, W. Guo, Q. Wu, K. Wang, Robust MPC-based bidding strategy for wind storage systems in real-time energy and regulation markets, International Journal of Electrical Power & Energy Systems 124 (2021) 106361. doi:10.1016/j.ijepes.2020.106361

  6. [6]

    W. B. Powell, Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions, John Wiley & Sons, 2022

  7. [7]

    Y. Dong, Z. Dong, T. Zhao, Z. Ding, A strategic day-ahead bidding strategy and operation for battery energy storage system by reinforcement learning, Electric Power Systems Research 196 (2021) 107229

  8. [8]

    M. Anwar, C. Wang, F. de Nijs, H. Wang, Proximal policy optimization based reinforcement learning for joint bidding in energy and frequency regulation markets, in: 2022 IEEE Power & Energy Society General Meeting (PESGM), 2022, pp. 1–5. doi:10.1109/PESGM48719.2022.9917082

  9. [9]

    J. Li, C. Wang, Y. Zhang, H. Wang, Temporal-aware deep reinforcement learning for energy storage bidding in energy and contingency reserve markets, IEEE Transactions on Energy Markets, Policy and Regulation 2 (3) (2024) 392–406. doi:10.1109/TEMPR.2024.3372656

  10. [10]

    S. Kortmann, N. Zoller, S. Bouchkati, L. Böttcher, A. Ulbig, Reinforcement learning for optimized multi-use operation of battery energy storage systems, SIGENERGY Energy Informatics Review 5 (3) (2025) 169–178. doi:10.1145/3777518.3777532

  11. [11]

    B. Huang, J. Wang, Deep-reinforcement-learning-based capacity scheduling for PV-battery storage system, IEEE Trans. Smart Grid 12 (3) (2021) 2272–2283. doi:10.1109/TSG.2020.3047890

  12. [12]

    J. Li, C. Wang, H. Wang, Deep reinforcement learning for wind and energy storage coordination in wholesale energy and ancillary service markets, Energy and AI 14 (2023) 100280

  13. [13]

    J. Cardo-Miota, H. Beltran, E. Pérez, S. Khadem, M. Bahloul, Deep reinforcement learning-based strategy for maximizing returns from renewable energy and energy storage systems in multi-electricity markets, Applied Energy 388 (2025) 125561. doi:10.1016/j.apenergy.2025.125561

  14. [14]

    California ISO, Initiative: Hybrid resources, Available online: https://stakeholdercenter.caiso.com/StakeholderInitiatives/Hybrid-resources (accessed 2025-12-08)

  15. [15]

    MISO, Hybrid resource participation model: Co-located market participation, Available online: https://cdn.misoenergy.org/20250821%20MSC%20Item%2009%20Hybrid%20Resource%20Participation%20Model%20(MSC-2020-2)714029.pdf (accessed 2025-12-08)

  16. [16]

    F. Kahrl, H. Kim, A. D. Mills, R. H. Wiser, C. Crespo Montañés, W. Gorman, Variable renewable energy participation in US ancillary services markets: Economic evaluation and key issues, Tech. rep., Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States) (2021)

  17. [17]

    S. Ericson, S. Koebrich, S. Awara, A. Schleifer, J. Heeter, K. Cory, C. Murphy, P. Denholm, Influence of hybridization on the capacity value of PV and battery resources, Tech. Rep. NREL/TP-5R00-75864, National Renewable Energy Laboratory (NREL), Golden, CO, USA (2022)

  18. [18]

    I. Gomes, H. Pousinho, R. Melício, V. Mendes, Stochastic coordination of joint wind and photovoltaic systems with energy storage in day-ahead market, Energy 124 (2017) 310–320. doi:10.1016/j.energy.2017.02.080

  19. [19]

    S. Bhattacharjee, R. Sioshansi, H. Zareipour, Comparing participation models in electricity markets for hybrid energy-storage resources, IEEE Transactions on Power Systems 40 (1) (2025) 650–661. doi:10.1109/TPWRS.2024.3397590

  20. [20]

    T. F. Agajie, A. Ali, A. Fopah-Lele, I. Amoussou, B. Khan, C. L. R. Velasco, E. Tanyi, A comprehensive review on techno-economic analysis and optimal sizing of hybrid renewable energy sources with energy storage systems, Energies 16 (2) (2023) 642

  21. [21]

    HOMER Energy, HOMER Pro, https://homerenergy.com/homer-pro, accessed: 2026-03-13

  22. [22]

    EMD, energyPRO, https://www.emd-international.com/software/energypro, accessed: 2026-03-13

  23. [23]

    D. Guittet, P. Stanley, B. Hamilton, J. King, A. Barker, HOPP: Hybrid optimization and performance platform, Tech. rep., National Renewable Energy Laboratory (NREL), Golden, CO (United States) (2022)

  24. [24]

    M. Gupta, J. P. M. Leon, K. Das, Optimal sizing of hybrid power plants considering multiple electricity market participation, IEEE Transactions on Energy Markets, Policy and Regulation 3 (4) (2025) 498–510. doi:10.1109/TEMPR.2025.3625065

  25. [25]

    J. Achiam, D. Held, A. Tamar, P. Abbeel, Constrained policy optimization, in: International Conference on Machine Learning, 2017, pp. 22–31

  26. [26]

    A. Ray, J. Achiam, D. Amodei, Benchmarking safe exploration in deep reinforcement learning, arXiv preprint arXiv:1910.01708 7 (1) (2019) 2

  27. [27]

    M. Cauz, A. Bolland, C. Ballif, N. Wyrsch, Reinforcement learning for efficient design and control co-optimisation of energy systems, in: ICML 2024 AI for Science Workshop, 2024, p. 68

  28. [28]

    T. Mantani, H. Hoshino, E. Furutani, Optimal battery sizing for real-time renewable energy bidding based on reinforcement learning, IEEE Transactions on Energy Markets, Policy and Regulation (2025) 1–12. doi:10.1109/TEMPR.2025.3645733

  29. [29]

    M. Rahimiyan, L. Baringo, Strategic bidding for a virtual power plant in the day-ahead and real-time markets: A price-taker robust optimization approach, IEEE Transactions on Power Systems 31 (4) (2016) 2676–2687. doi:10.1109/TPWRS.2015.2483781

  30. [30]

    H. Mehdipourpicha, R. Bo, Optimal bidding strategy for physical market participants with virtual bidding capability in day-ahead electricity markets, IEEE Access 9 (2021) 85392–85402. doi:10.1109/ACCESS.2021.3087728

  31. [31]

    J. Jeong, S. W. Kim, H. Kim, Deep reinforcement learning based real-time renewable energy bidding with battery control, IEEE Trans. Energy Markets, Policy and Regulation 1 (2) (2023) 85–96

  32. [32]

    X. Yang, L. Fan, X. Li, L. Meng, Day-ahead and real-time market bidding and scheduling strategy for wind power participation based on shared energy storage, Electric Power Systems Research 214 (2023) 108903. doi:10.1016/j.epsr.2022.108903

  33. [33]

    NERC, Balancing and frequency control reference document, Available online: https://www.nerc.com/globalassets/who-we-are/standing-committees/rstc/rs/reference_document_nerc_balancing_and_frequency_control.pdf (accessed 2026-01-06) (2021)

  34. [34]

    A. E. Brooks, B. C. Lesieutre, A review of frequency regulation markets in three US ISO/RTOs, The Electricity Journal 32 (10) (2019) 106668

  35. [35]

    California Public Utilities Commission, Decision D.20-06-031, Available online: https://docs.cpuc.ca.gov/PublishedDocs/Published/G000/M342/K083/342083913.PDF (accessed 2025-12-28) (2020)

  36. [36]

    C. J. Dent, R. Sioshansi, J. Reinhart, A. L. Wilson, S. Zachary, M. Lynch, C. Bothwell, C. Steele, Capacity value of solar power: Report of the IEEE PES task force on capacity value of solar power, in: 2016 International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), 2016, pp. 1–7. doi:10.1109/PMAPS.2016.7764197

  37. [37]

    P. Denholm, W. Cole, N. Blair, Moving beyond 4-hour Li-ion batteries: Challenges and opportunities for long(er)-duration energy storage, Tech. Rep. NREL/TP-6A40-85878, National Renewable Energy Laboratory (NREL), Golden, CO, USA (2023)

  38. [38]

    A. T. D. Perera, P. U. Wickramasinghe, V. M. Nik, J.-L. Scartezzini, Machine learning methods to assist energy system optimization, Applied Energy 243 (2019) 191–205

  39. [39]

    J. Li, H. Wang, H. He, Z. Wei, Q. Yang, P. Igic, Battery optimal sizing under a synergistic framework with DQN-based power managements for the fuel cell hybrid powertrain, IEEE Trans. Transportation Electrification 8 (1) (2022) 36–47. doi:10.1109/TTE.2021.3074792

  40. [40]

    H. Kang, S. Jung, H. Kim, J. Hong, J. Jeoung, T. Hong, Multi-objective sizing and real-time scheduling of battery energy storage in energy-sharing community based on reinforcement learning, Renew. and Sust. Energ. Rev. 185 (2023) 113655. doi:10.1016/j.rser.2023.113655

  41. [41]

    Y. Pan, Y. Shen, J. Qin, L. Zhang, Deep reinforcement learning for multi-objective optimization in BIM-based green building design, Automation in Construction 166 (2024) 105598. doi:10.1016/j.autcon.2024.105598

  42. [42]

    H. Zhou, Y. Zhang, L. Yang, Q. Liu, K. Yan, Y. Du, Short-term photovoltaic power forecasting based on long short term memory neural network and attention mechanism, IEEE Access 7 (2019) 78063–78074

  43. [43]

    S. Zhou, L. Zhou, M. Mao, H.-M. Tai, Y. Wan, An optimized heterogeneous structure LSTM network for electricity price forecasting, IEEE Access 7 (2019) 108161–108173. doi:10.1109/ACCESS.2019.2932999

  44. [44]

    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning, arXiv preprint arXiv:1509.02971

  45. [45]

    S. Han, S. Han, H. Aki, A practical battery wear model for electric vehicle charging applications, Applied Energy 113 (2014) 1100–1108

  46. [46]

    K. Kim, Y. Choi, H. Kim, Data-driven battery degradation model leveraging average degradation function fitting, Electronics Letters 53 (2) (2017) 102–104

  47. [47]

    R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press, 2018

  48. [48]

    J. Seel, J. M. Kemp, A. Cheyette, D. Millstein, W. Gorman, S. Jeong, D. Robson, R. Setiawan, M. Bolinger, Utility-scale solar, 2024 edition: Empirical trends in deployment, technology, cost, performance, PPA pricing, and value in the United States, Available online: https://escholarship.org/uc/item/4q73115g (accessed 2026-03-30)

  49. [49]

    W. Cole, A. W. Frazier, C. Augustine, Cost projections for utility-scale battery storage: 2021 update, Tech. Rep. NREL/TP-6A20-79236, National Renewable Energy Lab. (NREL) (2021)

  50. [50]

    California Public Utilities Commission, 2022 resource adequacy report, Available online: https://www.cpuc.ca.gov/industries-and-topics/electrical-energy/resource-adequacy (accessed 2026-03-30) (2022)

  51. [51]

    D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. Van Hasselt, D. Silver, Distributed prioritized experience replay, arXiv preprint arXiv:1803.00933