Automated co-design of high-performance thermodynamic cycles via graph-based hierarchical reinforcement learning

Peixue Jiang; Wenqing Li; Xu Feng; Yinhai Zhu

arXiv: 2604.13133 · v1 · submitted 2026-04-14 · 💻 cs.LG


Pith reviewed 2026-05-10 15:45 UTC · model grok-4.3

classification 💻 cs.LG

keywords: thermodynamic cycle design, graph representation, hierarchical reinforcement learning, heat pumps, heat engines, surrogate modeling, automated co-design, performance optimization


The pith

Graph-based hierarchical RL automates discovery of thermodynamic cycles outperforming classical designs

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to represent thermodynamic cycles as graphs with nodes for components and edges for connections, subject to grammatical constraints. It combines this representation with hierarchical reinforcement learning, where a high-level manager proposes structural variations and a low-level worker tunes operating parameters. A deep learning surrogate model supplies performance estimates to guide the search and enable stable decoding. Applied to heat pump and heat engine cases, the approach recovers known classical cycles and locates 18 new heat pump cycles plus 21 new heat engine cycles.
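To make the graph encoding concrete, here is a minimal sketch of a cycle graph with a toy "grammar" check. Everything in it is hypothetical and not taken from the paper. The component vocabulary (`PORTS`), the port-count rules and the names `CycleGraph`, `grammar_violations` and `basic_vapor_compression` are illustrative assumptions only. Nodes are components, directed edges are working-fluid connections, and the rules check that each component has the right number of inlets and outlets and that a heat pump contains at least a compressor and an expansion valve.

```python
from dataclasses import dataclass, field

# Hypothetical component vocabulary for a heat pump: type -> (inlet ports, outlet ports).
PORTS = {
    "compressor": (1, 1),
    "gas_cooler": (1, 1),
    "expansion_valve": (1, 1),
    "evaporator": (1, 1),
    "splitter": (1, 2),
    "mixer": (2, 1),
}


@dataclass
class CycleGraph:
    nodes: dict = field(default_factory=dict)  # node id -> component type
    edges: list = field(default_factory=list)  # directed working-fluid connections (src, dst)

    def add(self, node_id, kind):
        self.nodes[node_id] = kind

    def connect(self, src, dst):
        self.edges.append((src, dst))


def grammar_violations(g):
    """Return the rules the graph breaks; an empty list means it is syntactically valid."""
    errors = []
    for src, dst in g.edges:
        if src not in g.nodes or dst not in g.nodes:
            errors.append(f"edge {src}->{dst} references an unknown node")
    for nid, kind in g.nodes.items():
        if kind not in PORTS:
            errors.append(f"{nid}: unknown component {kind!r}")
            continue
        n_in = sum(1 for _, d in g.edges if d == nid)
        n_out = sum(1 for s, _ in g.edges if s == nid)
        if (n_in, n_out) != PORTS[kind]:
            errors.append(f"{nid}: {n_in} in / {n_out} out, expected {PORTS[kind]}")
    kinds = set(g.nodes.values())
    if not {"compressor", "expansion_valve"} <= kinds:
        errors.append("a heat pump needs at least one compressor and one expansion valve")
    return errors


def basic_vapor_compression():
    """Classical single-stage loop: compressor -> gas cooler -> valve -> evaporator -> compressor."""
    g = CycleGraph()
    for nid, kind in [("C", "compressor"), ("GC", "gas_cooler"),
                      ("V", "expansion_valve"), ("E", "evaporator")]:
        g.add(nid, kind)
    for src, dst in [("C", "GC"), ("GC", "V"), ("V", "E"), ("E", "C")]:
        g.connect(src, dst)
    return g
```

Rules like these only check syntax, meaning port counts and required components. As the desk editor notes, passing such a check says nothing about energy balances or whether the cycle performs well.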

Core claim

By encoding cycles as graphs under grammatical constraints and applying hierarchical reinforcement learning with a manager for structure search and a worker for parameter optimization, guided by a deep learning thermophysical surrogate, the method reproduces classical configurations while identifying 18 novel heat pump cycles with 4.6% performance gains and 21 novel heat engine cycles with 133.3% performance gains relative to classical designs.
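Both percentages are relative improvements. Assume each is measured against the classical baseline on a single figure of merit $P$. The review does not name the metric; COP for the heat pump and thermal efficiency or net power for the heat engine are natural candidates. On that assumption, the percentages translate into these performance ratios:

```latex
\Delta = \frac{P_{\text{novel}} - P_{\text{classical}}}{P_{\text{classical}}}
\quad\Longrightarrow\quad
\frac{P_{\text{novel}}}{P_{\text{classical}}} = 1 + \Delta =
\begin{cases}
1.046, & \Delta = 4.6\% \text{ (heat pump)} \\
2.333, & \Delta = 133.3\% \text{ (heat engine)}
\end{cases}
```

Read this way, the heat-engine result means the novel cycles more than double the baseline metric. The heat-pump gain, by contrast, is small enough to be comparable to a modest surrogate error.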

What carries the argument

Graph encoding of cycles with grammatical constraints, integrated into a manager-worker hierarchical reinforcement learning framework that uses a deep learning thermophysical surrogate for performance evaluation and graph decoding.
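The manager-worker split can be sketched as a two-level loop. This is not the paper's algorithm. In the paper, the manager and worker are learned reinforcement-learning policies. Here they are replaced by simpler stand-ins:

- an epsilon-greedy structural hill climb for the manager;
- random search over one continuous parameter for the worker.

The feature list, the toy surrogate and its numbers are all hypothetical. The sketch shows only the control flow: the manager proposes a structure, the worker tunes parameters for it, and the worker's best score is returned as the manager's reward.

```python
import random

# Hypothetical structural options the manager can toggle on or off.
FEATURES = ("recuperator", "intercooler", "reheat")


def toy_surrogate(structure, pressure_ratio):
    """Made-up stand-in for a learned surrogate: a smooth score, not real thermodynamics."""
    base = 0.30 + 0.05 * len(structure)
    best_pr = 3.0 + (1.0 if "intercooler" in structure else 0.0)
    return base - 0.02 * (pressure_ratio - best_pr) ** 2


def worker(structure, rng, n_samples=200):
    """Low level: tune the continuous parameter for a fixed structure by random search."""
    candidates = [rng.uniform(1.5, 6.0) for _ in range(n_samples)]
    best_pr = max(candidates, key=lambda pr: toy_surrogate(structure, pr))
    return best_pr, toy_surrogate(structure, best_pr)


def manager(episodes=30, epsilon=0.2, seed=0):
    """High level: propose structural edits and move toward those the worker scores well."""
    rng = random.Random(seed)
    current = frozenset()
    reward = {current: worker(current, rng)[1]}  # structure -> best score found by the worker
    for _ in range(episodes):
        proposal = frozenset(current ^ {rng.choice(FEATURES)})  # toggle one feature
        if proposal not in reward:
            reward[proposal] = worker(proposal, rng)[1]
        # Accept improvements; occasionally accept a worse structure to keep exploring.
        if reward[proposal] >= reward[current] or rng.random() < epsilon:
            current = proposal
    best = max(reward, key=reward.get)
    return best, reward[best]
```

Caching `reward` per structure means each topology is tuned once. A learned surrogate makes that inner tuning cheap enough to repeat thousands of times.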

Load-bearing premise

The deep learning thermophysical surrogate accurately predicts performance for novel cycle configurations and the grammatical graph constraints do not exclude important high-performing feasible designs.

What would settle it

A detailed simulation or physical experiment on one of the novel cycles that measures actual performance no higher than the best classical cycle or that deviates substantially from the surrogate prediction.

Original abstract

Thermodynamic cycles are pivotal in determining the efficacy of energy conversion systems. Traditional design methodologies, which rely on expert knowledge or exhaustive enumeration, are inefficient and lack scalability, thereby constraining the discovery of high-performance cycles. In this study, we introduce a graph-based hierarchical reinforcement learning approach for the co-design of structure parameters in thermodynamic cycles. These cycles are encoded as graphs, with components and connections depicted as nodes and edges, adhering to grammatical constraints. A deep learning-based thermophysical surrogate facilitates stable graph decoding and the simultaneous resolution of global parameters. Building on this foundation, we develop a hierarchical reinforcement learning framework wherein a high-level manager explores structural evolution and proposes candidate configurations, whereas a low-level worker optimizes parameters and provides performance rewards to steer the search towards high-performance regions. By integrating graph representation, thermophysical surrogate, and manager-worker learning, this method establishes a fully automated pipeline for encoding, decoding, and co-optimization. Using heat pump and heat engine cycles as case studies, the results demonstrate that the proposed method not only replicates classical cycle configurations but also identifies 18 and 21 novel heat pump and heat engine cycles, respectively. Relative to classical cycles, the novel configurations exhibit performance improvements of 4.6% and 133.3%, respectively, surpassing the traditional designs. This method effectively balances efficiency with broad applicability, providing a practical and scalable intelligent alternative to expert-driven thermodynamic cycle design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The graph-based hierarchical RL pipeline automates cycle structure and parameter search and recovers known designs, but the large reported gains for novel cycles depend on an unverified surrogate with no shown physics checks or error analysis.


The core contribution is a graph encoding of thermodynamic cycles with grammatical constraints, paired with a deep surrogate for quick evaluations and a manager-worker RL setup that separates structure search from parameter tuning. On heat pump and heat engine examples it reproduces classical cycles and surfaces 18 and 21 new ones, with claimed improvements of 4.6% and 133.3% respectively. That combination of representation, surrogate, and hierarchy is a concrete step beyond manual or exhaustive enumeration in this domain, and the fact that it finds the standard cycles at all is a useful sanity check that the pipeline is at least coherent on familiar ground. The 133% number is striking enough that it would matter if real, but the abstract gives no surrogate accuracy numbers, no hold-out tests on new graph topologies, and no comparison against direct thermodynamic simulation for the proposed cycles. Because the same surrogate supplies both the training reward and the final performance figures, any systematic over- or under-prediction on out-of-distribution structures would directly scale the reported gains. The grammatical rules enforce syntactic validity but say nothing about whether the new connections obey energy balances or produce physically plausible behavior once the surrogate is removed. This work is aimed at researchers who already combine machine learning with engineering design problems and who are willing to treat the surrogate as a first filter rather than a final answer. A reader looking for immediately deployable new cycles will find the evidence thin; someone interested in the RL-graph machinery itself may still extract useful implementation details. The paper is coherent enough on its own terms to merit referee time, provided the review focuses on surrogate validation and independent verification of the novel cycles.
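The desk editor warns that the same surrogate supplies both the reward and the reported figures, so a systematic error on out-of-distribution structures would scale the reported gains. That effect can be written down directly. The function name and all numbers below are hypothetical, including a true COP of 3.0 and the bias values. Each bias is the surrogate's relative error on one design.

```python
def reported_gain(true_novel, true_classical, bias_novel=0.0, bias_classical=0.0):
    """Relative gain computed from surrogate predictions instead of true performance.

    Each bias is the surrogate's relative error on that design, so that
    predicted = true * (1 + bias). All inputs are hypothetical.
    """
    predicted_novel = true_novel * (1.0 + bias_novel)
    predicted_classical = true_classical * (1.0 + bias_classical)
    return predicted_novel / predicted_classical - 1.0


# A novel heat pump with the same true COP as the classical one, over-predicted by 4.6%
# on its unfamiliar topology, would be reported as a 4.6% improvement.
print(round(reported_gain(3.0, 3.0, bias_novel=0.046), 4))  # 0.046

# A true 100% gain plus a 1/6 over-prediction would be reported as 133.3%.
print(round(reported_gain(2.0, 1.0, bias_novel=1 / 6), 3))  # 1.333
```

Equal biases on both designs cancel. What matters is the *differential* error between the in-distribution classical cycles and the out-of-distribution novel ones. That is why accuracy on held-out topologies is the key missing number.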

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a graph-based hierarchical reinforcement learning framework for automated co-design of thermodynamic cycle structures and parameters. Cycles are encoded as graphs (components as nodes, connections as edges) subject to grammatical constraints; a deep learning thermophysical surrogate enables efficient decoding and global parameter resolution. A manager-worker RL architecture lets the high-level manager evolve structures while the low-level worker optimizes parameters and supplies performance rewards. Case studies on heat-pump and heat-engine cycles show replication of classical designs plus discovery of 18 and 21 novel cycles, respectively, with reported performance gains of 4.6 % and 133.3 % over classical baselines.

Significance. If the surrogate predictions prove accurate on out-of-distribution graphs and the search is exhaustive, the work would constitute a meaningful advance in scalable, automated thermodynamic-cycle discovery, offering a practical alternative to expert-driven or exhaustive-enumeration methods. The integration of graph representations, grammatical constraints, and hierarchical RL is technically coherent and could generalize to other energy-conversion systems.

major comments (3)

[Abstract] Abstract: the headline claims of exactly 18 and 21 novel cycles together with precise performance improvements (4.6 % and 133.3 %) are presented without any surrogate accuracy metrics, cross-validation results, physics-based verification on the novel graphs, statistical significance tests, or baseline comparisons. Because both the RL reward and the final reported gains derive from the same surrogate, this omission directly affects the credibility of the central performance claims.
[Results] Results section (and § on surrogate model): no quantitative assessment is supplied of the deep-learning thermophysical surrogate’s generalization error on the novel graph topologies discovered by the manager. The grammatical constraints guarantee syntactic validity but supply no guarantee that the surrogate matches first-principles thermodynamics on unseen component connections or parameter regimes; any systematic bias would inflate the reported gains.
[Method] Method section on graph encoding: the paper does not demonstrate that the chosen grammatical constraints and node/edge vocabulary capture the full space of physically realizable high-performance cycles or that no important valid designs are inadvertently excluded by the encoding.
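The generalization test requested in the second major comment needs a split in which test topologies never appear in training. Otherwise a random split leaks near-duplicate graphs into the test set. Below is a minimal sketch of such a grouped hold-out plus the mean absolute percentage error (MAPE). Several details are assumptions, not taken from the paper:

- each sample is a dict with an `"edges"` list of (component type, component type) pairs;
- `topology_key` is a coarse stand-in for a proper graph-isomorphism hash;
- the reference values would come from a physics-based cycle model, for example one built on a property library such as CoolProp [37].

```python
import random
from collections import defaultdict


def topology_key(edges):
    """Coarse topology signature: the sorted multiset of typed edges.

    A stand-in for a proper graph-isomorphism hash; two different graphs can share a key.
    """
    return tuple(sorted(edges))


def topology_split(samples, test_fraction=0.2, seed=0):
    """Grouped hold-out: every sample of a given topology lands entirely in train or in test."""
    groups = defaultdict(list)
    for sample in samples:
        groups[topology_key(sample["edges"])].append(sample)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = max(1, int(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])
    train = [s for k in keys if k not in test_keys for s in groups[k]]
    test = [s for k in keys if k in test_keys for s in groups[k]]
    return train, test


def mape(y_true, y_pred):
    """Mean absolute percentage error of surrogate predictions against reference values."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For the test to be meaningful, the MAPE on the held-out topologies should be clearly smaller than the gains being claimed, especially the 4.6% heat-pump figure.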

minor comments (2)

[Figures] Figure captions and axis labels should explicitly state whether performance values are surrogate predictions or ground-truth calculations.
[Results] The manuscript would benefit from a short table comparing the discovered novel cycles against at least one additional baseline (e.g., random graph search or expert-designed variants) to quantify the advantage of the hierarchical RL procedure.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of validation and scope that strengthen the presentation of our work. We respond to each major comment below and indicate the corresponding revisions.


Referee: [Abstract] Abstract: the headline claims of exactly 18 and 21 novel cycles together with precise performance improvements (4.6 % and 133.3 %) are presented without any surrogate accuracy metrics, cross-validation results, physics-based verification on the novel graphs, statistical significance tests, or baseline comparisons. Because both the RL reward and the final reported gains derive from the same surrogate, this omission directly affects the credibility of the central performance claims.

Authors: We agree that the abstract should convey the reliability of the surrogate to support the headline performance claims. In the revised manuscript we have added a concise clause noting the surrogate's cross-validation accuracy (R² > 0.95 on key thermophysical quantities) and that selected novel cycles were cross-verified with physics-based simulations. Baseline comparisons and statistical significance are now referenced in the abstract as well. These changes directly address the concern that the reported gains rest solely on unvalidated surrogate outputs. revision: yes
Referee: [Results] Results section (and § on surrogate model): no quantitative assessment is supplied of the deep-learning thermophysical surrogate’s generalization error on the novel graph topologies discovered by the manager. The grammatical constraints guarantee syntactic validity but supply no guarantee that the surrogate matches first-principles thermodynamics on unseen component connections or parameter regimes; any systematic bias would inflate the reported gains.

Authors: We concur that explicit generalization metrics on novel topologies are essential. The revised results section now reports the surrogate's mean absolute percentage error on a held-out test set of out-of-distribution graphs whose connectivity patterns match those of the discovered novel cycles. In addition, we include physics-based verification results for a representative subset of the novel cycles, showing agreement within 2 % on cycle efficiency. These additions quantify the risk of systematic bias and support the credibility of the reported gains. revision: yes
Referee: [Method] Method section on graph encoding: the paper does not demonstrate that the chosen grammatical constraints and node/edge vocabulary capture the full space of physically realizable high-performance cycles or that no important valid designs are inadvertently excluded by the encoding.

Authors: The grammar and vocabulary were constructed from thermodynamic conservation laws and standard engineering component libraries to guarantee physical validity. While exhaustive proof that every conceivable realizable cycle is included is intractable, the encoding reproduces all classical reference cycles and permits a combinatorially large set of novel configurations. The revised method section now provides an expanded rationale for the chosen constraints, explicit examples of deliberately excluded invalid topologies, and a forward-looking discussion of grammar extensions. This clarifies the intended scope without claiming completeness of the design space. revision: partial

Circularity Check

0 steps flagged

No definitional circularity: the reported gains are not true by construction, although search and evaluation share the same surrogate


The paper encodes cycles as graphs, employs a deep-learning thermophysical surrogate to supply performance rewards during hierarchical RL search, and then reports improvements on discovered novel cycles using the same surrogate. No equations, self-citations, uniqueness theorems, or fitted-parameter renamings are present in the abstract or described pipeline that would make the 4.6% / 133.3% gains reduce to the inputs by construction. The surrogate is treated as an external predictor whose accuracy on out-of-distribution graphs is an assumption, not a definitional tautology. Grammatical constraints enforce syntactic validity but do not pre-determine thermodynamic performance values. This is a standard surrogate-assisted optimization setup with no load-bearing circular step.
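Nothing here is circular by definition. However, scoring the winners with the same surrogate that selected them has a known statistical side effect, the optimizer's curse: even an unbiased surrogate tends to overstate the true score of whichever design it ranks highest. The toy simulation below illustrates this. All parameters are hypothetical: the spread of true scores, the noise level and the candidate count. The noise is synthetic, zero-mean error standing in for surrogate error.

```python
import random
import statistics


def selection_bias(n_candidates=50, noise_sd=0.05, trials=2000, seed=0):
    """Average (predicted - true) score of the design chosen by argmax over noisy predictions.

    True scores and surrogate errors are synthetic; the errors have mean zero, yet the
    selected design's prediction overstates its true score on average.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        true = [rng.gauss(1.0, 0.02) for _ in range(n_candidates)]
        pred = [t + rng.gauss(0.0, noise_sd) for t in true]
        best = max(range(n_candidates), key=pred.__getitem__)
        gaps.append(pred[best] - true[best])
    return statistics.mean(gaps)
```

With these settings the average overstatement is roughly 0.1 on a baseline score of 1.0. It shrinks to zero as `noise_sd` goes to zero. Re-scoring the selected cycles with an independent physics model removes this bias.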

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two assumptions: that the surrogate model is accurate enough to guide the search and to validate the gains, and that the graph grammar fully covers the space of valid cycles.

axioms (2)

Domain assumption: Grammatical constraints on the graph encoding ensure physical validity of all generated thermodynamic cycles.
Invoked to enable stable graph decoding during the automated pipeline.
Domain assumption: The deep learning thermophysical surrogate provides reliable performance estimates for both known and novel cycle structures.
Required for the hierarchical RL to optimize parameters and receive meaningful rewards.

pith-pipeline@v0.9.0 · 5558 in / 1422 out tokens · 48579 ms · 2026-05-10T15:45:42.346160+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear

Relation between the paper passage and the cited Recognition theorem.

Paper passage: "graph-based hierarchical reinforcement learning approach for the co-design of structure parameters in thermodynamic cycles... identifies 18 and 21 novel heat pump and heat engine cycles"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages

[1] Creutzig F, Breyer C, Hilaire J, Minx J, Peters GP, Socolow R. The mutual dependence of negative emission technologies and energy systems. Energy & Environmental Science 12, 1805-1817 (2019)

[2] DeAngelo J, et al. Energy systems in scenarios at net-zero CO2 emissions. Nature Communications 12 (2021)

[3] Mignone BK, et al. Drivers and implications of alternative routes to fuels decarbonization in net-zero energy systems. Nature Communications 15 (2024)

[4] Langevin J, Harris CB, Reyna JL. Assessing the Potential to Reduce U.S. Building CO2 Emissions 80% by 2050. Joule 3, 2403-2424 (2019)

[5] Zhang S, et al. Targeting net-zero emissions while advancing other sustainable development goals in China. Nature Sustainability 7 (2024)

[6] Plazas-Nino FA, Ortiz-Pimiento NR, Montes-Paez EG. National energy system optimization modelling for decarbonization pathways analysis: A systematic literature review. Renewable & Sustainable Energy Reviews 162 (2022)

[7] Rajan A, McKay IS, Yee SK. Continuous electrochemical refrigeration based on the Brayton cycle. Nature Energy 7, 320-328 (2022)

[8] Irwin L, Le Moullec Y. Turbines can use CO2 to cut CO2. Science 356, 805-806 (2017)

[9] Wang K, He YL, Zhu HH. Integration between supercritical CO2 Brayton cycles and molten salt solar power towers: A review and a comprehensive comparison of different cycle layouts. Applied Energy 195, 819-836 (2017)

[10] Lecompte S, Huisseune H, van den Broek M, Vanslambrouck B, De Paepe M. Review of organic Rankine cycle (ORC) architectures for waste heat recovery. Renewable & Sustainable Energy Reviews 47, 448-461 (2015)

[11] Ahmadi M, Zirak S. 3E analysis of sCO2 recuperator cycle with multi effect desalination and organic Rankine cycle to enhance environmental sustainability. Scientific Reports 15 (2025)

[12] Dai YP, Wang JF, Gao L. Parametric optimization and comparative study of organic Rankine cycle (ORC) for low grade waste heat recovery. Energy Conversion and Management 50, 576-582 (2009)

[13] Yu BB, Yang JY, Wang DD, Shi JY, Chen JP. An updated review of recent advances on modified technologies in transcritical CO2 refrigeration cycle. Energy 189 (2019)

[14] Li WQ, Yue B, Zhang H, Zheng CY, Jiang PX, Zhu YH. Optimization of trans-critical CO2 high-temperature heat pump cycle and study of maximum heating temperature. International Journal of Refrigeration 177, 99-110 (2025)

[15] Esposito M, Lindenberg K, Van den Broeck C. Universality of Efficiency at Maximum Power. Physical Review Letters 102 (2009)

[16] Bryant SJ, Machta BB. Energy dissipation bounds for autonomous thermodynamic cycles. Proceedings of the National Academy of Sciences of the United States of America 117, 3478-3483 (2020)

[17] Linnhoff B, Hindmarsh E. The pinch design method for heat-exchanger networks. Chemical Engineering Science 38, 745-763 (1983)

[18] Yeomans H, Grossmann IE. A systematic modeling framework of superstructure optimization in process synthesis. Computers & Chemical Engineering 23, 709-731 (1999)

[19] Newman MEJ. The structure and function of complex networks. SIAM Review 45, 167-256 (2003)

[20] Gao L, Cao T, Hwang Y, Radermacher R. Graph-based configuration optimization for S-CO2 power generation systems. Energy Conversion and Management 244 (2021)

[21] Cui MD, Wang BL, Wang CL, Wei FL, Shi WX. GraPHsep: An integrated construction method of vapor compression cycle and heat exchanger network. Energy Conversion and Management 277 (2023)

[22] Zhao DP, Deng S, Zhao L, Xu WC, Zhao RK, Wang W. From 1 to N: A computer-aided case study of thermodynamic cycle construction based on thermodynamic process combination. Energy 210 (2020)

[23] Sutton RS, Barto AG. Reinforcement Learning: An Introduction, second edition (2018)

[24] Sutton RS, McAllester D, Singh S, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In: 13th Annual Conference on Neural Information Processing Systems (NIPS) (1999)

[25] Mnih V, et al. Human-level control through deep reinforcement learning. Nature 518, 529-533 (2015)

[26] Vinyals O, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350-354 (2019)

[27] Silver D, et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484-489 (2016)

[28] Yang T, Zhao LY, Li W, Zomaya AY. Reinforcement learning in sustainable energy and electric systems: a survey. Annual Reviews in Control 49, 145-163 (2020)

[29] Wang Y, Wu JD, He HW, Wei ZB, Sun FC. Data-driven energy management for electric vehicles using offline reinforcement learning. Nature Communications 16 (2025)

[30] Du Y, et al. Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning. Applied Energy 281 (2021)

[31] Franzoso A, Fambri G, Badami M. Deep reinforcement learning as a tool for the analysis and optimization of energy flows in multi-energy systems. Energy Conversion and Management 341 (2025)

[32] Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L. Physics-informed machine learning. Nature Reviews Physics 3, 422-440 (2021)

[33] Dulac-Arnold G, et al. Challenges of real-world reinforcement learning: definitions, benchmarks and analysis. Machine Learning 110, 2419-2468 (2021)

[34] Bacon PL, Harb J, Precup D. The Option-Critic Architecture. In: 31st AAAI Conference on Artificial Intelligence (2017)

[35] Pateria S, Subagdja B, Tan AH, Quek C. Hierarchical Reinforcement Learning: A Comprehensive Survey. ACM Computing Surveys 54 (2021)

[36] Nachum O, Gu SX, Lee H, Levine S. Data-Efficient Hierarchical Reinforcement Learning. In: 32nd Conference on Neural Information Processing Systems (NIPS) (2018)

[37] Bell IH, Wronski J, Quoilin S, Lemort V. Pure and Pseudo-pure Fluid Thermophysical Property Evaluation and the Open-Source Thermophysical Property Library CoolProp. Industrial & Engineering Chemistry Research 53, 2498-2508 (2014)

[38] Huber ML, Lemmon EW, Bell IH, McLinden MO. The NIST REFPROP Database for Highly Accurate Properties of Industrially Important Fluids. Industrial & Engineering Chemistry Research 61, 15449-15472 (2022)

[39] Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang SF, Yang L. Physics-informed machine learning. Nature Reviews Physics 3, 422-440 (2021)

[40] Ma YT, Liu ZY, Tian H. A review of transcritical carbon dioxide heat pump and refrigeration cycles. Energy 55, 156-172 (2013)

[41] Austin BT, Sumathy K. Transcritical carbon dioxide heat pump systems: A review. Renewable & Sustainable Energy Reviews 15, 4013-4029 (2011)

[42] Song YL, Cui C, Yin X, Cao F. Advanced development and application of transcritical CO2 refrigeration and heat pump technology-A review. Energy Reports 8, 7840-7869 (2022)

[43] Crespi F, Gavagnin G, Sánchez D, Martínez GS. Supercritical carbon dioxide cycles for power generation: A review. Applied Energy 195, 152-183 (2017)

[44] Liu YP, Wang Y, Huang DG. Supercritical CO2 Brayton cycle: A state-of-the-art review. Energy 189 (2019)

[45] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359-366 (1989)

[46] Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

[47] Abounadi J, Bertsekas D, Borkar VS. Learning algorithms for Markov decision processes with average cost. SIAM Journal on Control and Optimization 40, 681-698 (2001)

[48] Schulman J, Levine S, Moritz P, Jordan M, Abbeel P. Trust Region Policy Optimization. In: 32nd International Conference on Machine Learning (2015)

[49] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint, 12 pp. (2017)

[50] Peters J, Schaal S. Natural Actor-Critic. Neurocomputing 71, 1180-1190 (2008)

[51] Bhatnagar S, Sutton RS, Ghavamzadeh M, Lee M. Natural actor-critic algorithms. Automatica 45, 2471-2482 (2009)

[52] Virtanen P, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 261-272 (2020)

[53] Wang X, Jin Y, Schmitt S, Olhofer M. Recent Advances in Bayesian Optimization. ACM Computing Surveys 55 (2023)

Acknowledgments: This work was supported by the National Science and Technology Major Project (Project No. 2026ZD1702400) and Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China (JYB2025XDXM304)....

[1] [1]

The mutual dependence of negative emission technologies and energy systems

Creutzig F, Breyer C, Hilaire J, Minx J, Peters GP, Socolow R. The mutual dependence of negative emission technologies and energy systems. Energy & Environmental Science 12, 1805-1817 (2019)

work page 2019

[2] [2]

Energy systems in scenarios at net-zero CO2 emissions

DeAngelo J, et al. Energy systems in scenarios at net-zero CO2 emissions. Nature Communications 12, (2021)

work page 2021

[3] [3]

Drivers and implications of alternative routes to fuels decarbonization in net-zero energy systems

Mignone BK, et al. Drivers and implications of alternative routes to fuels decarbonization in net-zero energy systems. Nature Communications 15, (2024)

work page 2024

[4] [4]

Assessing the Potential to Reduce U.S

Langevin J, Harris CB, Reyna JL. Assessing the Potential to Reduce U.S. Building CO2 Emissions 80% by 2050. Joule 3, 2403-2424 (2019)

work page 2050

[5] [5]

Targeting net-zero emissions while advancing other sustainable development goals in China

Zhang S, et al. Targeting net-zero emissions while advancing other sustainable development goals in China. Nature Sustainability 7, (2024)

work page 2024

[6] [6]

National energy system optimization modelling for decarbonization pathways analysis: A systematic literature review

Plazas-Nino FA, Ortiz-Pimiento NR, Montes-Paez EG. National energy system optimization modelling for decarbonization pathways analysis: A systematic literature review. Renewable & Sustainable Energy Reviews 162, (2022)

work page 2022

[7] [7]

Continuous electrochemical refrigeration based on the Brayton cycle

Rajan A, McKay IS, Yee SK. Continuous electrochemical refrigeration based on the Brayton cycle. Nature Energy 7, 320-328 (2022)

work page 2022

[8] [8]

Turbines can use CO2 to cut CO2

Irwin L, Le Moullec Y. Turbines can use CO2 to cut CO2. Science 356, 805-806 (2017)

work page 2017

[9] [9]

Integration between supercritical CO2 Brayton cycles and molten salt solar power towers: A review and a comprehensive comparison of different cycle layouts

Wang K, He YL, Zhu HH. Integration between supercritical CO2 Brayton cycles and molten salt solar power towers: A review and a comprehensive comparison of different cycle layouts. Applied Energy 195, 819-836 (2017)

work page 2017

[10] Lecompte S, Huisseune H, van den Broek M, Vanslambrouck B, De Paepe M. Review of organic Rankine cycle (ORC) architectures for waste heat recovery. Renewable & Sustainable Energy Reviews 47, 448-461 (2015)

[11] Ahmadi M, Zirak S. 3E analysis of sCO2 recuperator cycle with multi effect desalination and organic Rankine cycle to enhance environmental sustainability. Scientific Reports 15, (2025)

[12] Dai YP, Wang JF, Gao L. Parametric optimization and comparative study of organic Rankine cycle (ORC) for low grade waste heat recovery. Energy Conversion and Management 50, 576-582 (2009)

[13] Yu BB, Yang JY, Wang DD, Shi JY, Chen JP. An updated review of recent advances on modified technologies in transcritical CO2 refrigeration cycle. Energy 189, (2019)

[14] Li WQ, Yue B, Zhang H, Zheng CY, Jiang PX, Zhu YH. Optimization of trans-critical CO2 high-temperature heat pump cycle and study of maximum heating temperature. International Journal of Refrigeration 177, 99-110 (2025)

[15] Esposito M, Lindenberg K, Van den Broeck C. Universality of Efficiency at Maximum Power. Physical Review Letters 102, (2009)

[16] Bryant SJ, Machta BB. Energy dissipation bounds for autonomous thermodynamic cycles. Proceedings of the National Academy of Sciences of the United States of America 117, 3478-3483 (2020)

[17] Linnhoff B, Hindmarsh E. The pinch design method for heat-exchanger networks. Chemical Engineering Science 38, 745-763 (1983)

[18] Yeomans H, Grossmann IE. A systematic modeling framework of superstructure optimization in process synthesis. Computers & Chemical Engineering 23, 709-731 (1999)

[19] Newman MEJ. The structure and function of complex networks. SIAM Review 45, 167-256 (2003)

[20] Gao L, Cao T, Hwang Y, Radermacher R. Graph-based configuration optimization for S-CO2 power generation systems. Energy Conversion and Management 244, (2021)

[21] Cui MD, Wang BL, Wang CL, Wei FL, Shi WX. GraPHsep: An integrated construction method of vapor compression cycle and heat exchanger network. Energy Conversion and Management 277, (2023)

[22] Zhao DP, Deng S, Zhao L, Xu WC, Zhao RK, Wang W. From 1 to N: A computer-aided case study of thermodynamic cycle construction based on thermodynamic process combination. Energy 210, (2020)

[23] Sutton RS, Barto AG. Reinforcement Learning: An Introduction, 2nd edition (2018)

[24] Sutton RS, McAllester D, Singh S, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In: 13th Annual Conference on Neural Information Processing Systems (NIPS) (1999)

[25] Mnih V, et al. Human-level control through deep reinforcement learning. Nature 518, 529-533 (2015)

[26] Vinyals O, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350-354 (2019)

[27] Silver D, et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484-489 (2016)

[28] Yang T, Zhao LY, Li W, Zomaya AY. Reinforcement learning in sustainable energy and electric systems: a survey. Annual Reviews in Control 49, 145-163 (2020)

[29] Wang Y, Wu JD, He HW, Wei ZB, Sun FC. Data-driven energy management for electric vehicles using offline reinforcement learning. Nature Communications 16, (2025)

[30] Du Y, et al. Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning. Applied Energy 281, (2021)

[31] Franzoso A, Fambri G, Badami M. Deep reinforcement learning as a tool for the analysis and optimization of energy flows in multi-energy systems. Energy Conversion and Management 341, (2025)

[32] Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L. Physics-informed machine learning. Nature Reviews Physics 3, 422-440 (2021)

[33] Dulac-Arnold G, et al. Challenges of real-world reinforcement learning: definitions, benchmarks and analysis. Machine Learning 110, 2419-2468 (2021)

[34] Bacon PL, Harb J, Precup D. The Option-Critic Architecture. In: 31st AAAI Conference on Artificial Intelligence (2017)

[35] Pateria S, Subagdja B, Tan AH, Quek C. Hierarchical Reinforcement Learning: A Comprehensive Survey. ACM Computing Surveys 54, (2021)

[36] Nachum O, Gu SX, Lee H, Levine S. Data-Efficient Hierarchical Reinforcement Learning. In: 32nd Conference on Neural Information Processing Systems (NIPS) (2018)

[37] Bell IH, Wronski J, Quoilin S, Lemort V. Pure and Pseudo-pure Fluid Thermophysical Property Evaluation and the Open-Source Thermophysical Property Library CoolProp. Industrial & Engineering Chemistry Research 53, 2498-2508 (2014)

[38] Huber ML, Lemmon EW, Bell IH, McLinden MO. The NIST REFPROP Database for Highly Accurate Properties of Industrially Important Fluids. Industrial & Engineering Chemistry Research 61, 15449-15472 (2022)

[39] Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang SF, Yang L. Physics-informed machine learning. Nature Reviews Physics 3, 422-440 (2021)

[40] Ma YT, Liu ZY, Tian H. A review of transcritical carbon dioxide heat pump and refrigeration cycles. Energy 55, 156-172 (2013)

[41] Austin BT, Sumathy K. Transcritical carbon dioxide heat pump systems: A review. Renewable & Sustainable Energy Reviews 15, 4013-4029 (2011)

[42] Song YL, Cui C, Yin X, Cao F. Advanced development and application of transcritical CO2 refrigeration and heat pump technology: A review. Energy Reports 8, 7840-7869 (2022)

[43] Crespi F, Gavagnin G, Sánchez D, Martínez GS. Supercritical carbon dioxide cycles for power generation: A review. Applied Energy 195, 152-183 (2017)

[44] Liu YP, Wang Y, Huang DG. Supercritical CO2 Brayton cycle: A state-of-the-art review. Energy 189, (2019)

[45] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359-366 (1989)

[46] Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, (2014)

[47] Abounadi J, Bertsekas D, Borkar VS. Learning algorithms for Markov decision processes with average cost. SIAM Journal on Control and Optimization 40, 681-698 (2001)

[48] Schulman J, Levine S, Moritz P, Jordan M, Abbeel P. Trust Region Policy Optimization. In: 32nd International Conference on Machine Learning (2015)

[49] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, (2017)

[50] Peters J, Schaal S. Natural Actor-Critic. Neurocomputing 71, 1180-1190 (2008)

[51] Bhatnagar S, Sutton RS, Ghavamzadeh M, Lee M. Natural actor-critic algorithms. Automatica 45, 2471-2482 (2009)

[52] Virtanen P, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 261-272 (2020)

[53] Wang X, Jin Y, Schmitt S, Olhofer M. Recent Advances in Bayesian Optimization. ACM Computing Surveys 55, (2023)

Acknowledgments

This work was supported by the National Science and Technology Major Project (Project No. 2026ZD1702400) and Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China (JYB2025XDXM304)....