pith. sign in

arxiv: 2605.28356 · v1 · pith:5IUFKBUNnew · submitted 2026-05-27 · 🧮 math.OC

Machine Learning for Exact Time Series Aggregation in Generation Expansion Planning with Energy Storage

Pith reviewed 2026-06-29 10:57 UTC · model grok-4.3

classification 🧮 math.OC
keywords generation expansion planningtime series aggregationmachine learningmarginal costsenergy storagetemporal aggregationoptimizationactive constraints
0
0 comments X

The pith

Machine learning estimates of marginal costs guide time series aggregation to preserve active constraints and achieve exact aggregation in generation expansion planning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops an iterative time series aggregation method for generation expansion planning models that include renewables, thermal generation, storage, market participation, and investment decisions. The method uses machine learning predictions of marginal costs to identify which constraints are binding in the full-scale model and then clusters time periods to keep those constraints active in the reduced model. Traditional aggregation relies only on input data patterns and remains heuristic, but this approach can evaluate the optimality gap and targets exact equivalence between aggregated and full models. Numerical tests show that adding the estimated marginal costs as clustering features improves the quality of the resulting aggregated model compared with standard input-only methods.

Core claim

By leveraging machine learning-based estimates of the GEP model marginal costs, the algorithm guides TSA to construct an aggregated model that preserves the active constraints of its full-scale counterpart, which has been shown to yield exact temporal aggregation.

What carries the argument

Iterative time series aggregation guided by machine learning estimates of marginal costs, used to identify and preserve active constraints from the full-scale model.

If this is right

  • The aggregated model can be solved with substantially lower computational effort while still matching the binding constraints of the original model.
  • An explicit optimality gap between the aggregated and full-scale solutions becomes computable at each iteration.
  • Adding estimated marginal costs as features measurably improves clustering quality over methods that use only raw input time series.
  • The approach applies directly to models that jointly optimize investment, operations, and market participation for systems containing storage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same constraint-preservation logic could be tested on other large-scale linear programs where marginal-cost signals indicate binding limits.
  • If the ML predictor generalizes across similar GEP instances, repeated full-scale solves might be replaced by one-time training plus fast aggregated solves.
  • Extending the method to nonlinear or stochastic variants would require verifying that the marginal-cost estimates still reliably flag active constraints.

Load-bearing premise

The machine learning estimates of marginal costs must be accurate enough to correctly identify which constraints are active in the full-scale model.

What would settle it

Solve both the full-scale GEP model and the ML-guided aggregated model on identical data instances; check whether the sets of active constraints and the investment/operational decisions match exactly.

Figures

Figures reproduced from arXiv: 2605.28356 by Jakub Rybka, Luca Santosuosso, Sonja Wogrin, Thomas Klatzer.

Figure 1
Figure 1. Figure 1: Objective function bounds for a stylized cost minimiza [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: TSA performance relative to full-scale (F-S) optimiza [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: TSA performance relative to full-scale (F-S) optimiza [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
read the original abstract

This paper investigates a generation expansion planning (GEP) problem encompassing renewable, thermal, and storage technologies while simultaneously optimizing market participation, operational expenditures, and capital investment. To alleviate the computational burden of the GEP model, we propose a novel iterative time series aggregation (TSA) method that constructs a temporally aggregated counterpart of the original full-scale GEP model. Unlike traditional TSA methods, which are purely heuristic, our method enables the assessment of the optimality gap between the aggregated and full-scale models. Moreover, by leveraging machine learning-based estimates of the GEP model marginal costs, the algorithm guides TSA to construct an aggregated model that preserves the active constraints of its full-scale counterpart, which has been shown to yield exact temporal aggregation. Numerical results show that incorporating estimated marginal costs as clustering features substantially improves the quality of temporal aggregation compared with traditional TSA methods that rely solely on input data analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a novel iterative time series aggregation (TSA) method for a generation expansion planning (GEP) problem that includes renewable, thermal, and storage technologies. The method uses machine learning estimates of marginal costs to guide the construction of an aggregated model that preserves the active constraints of the full-scale GEP model, which is asserted to yield exact temporal aggregation; numerical results are said to show improved aggregation quality over traditional TSA approaches that rely only on input data.

Significance. If the central claim holds, the work would offer a principled way to reduce the computational burden of large-scale GEP models while retaining exactness guarantees, which is valuable for long-term energy system planning. The explicit linkage of ML-derived marginal costs to active-constraint preservation and the provision of an optimality-gap assessment distinguish it from purely heuristic TSA methods.

major comments (2)
  1. [Abstract] Abstract: the central claim that ML-based marginal cost estimates enable preservation of active constraints (and thereby exact aggregation) is load-bearing, yet the manuscript supplies neither the ML architecture, training data source, error metrics on the marginal-cost predictions, nor any verification that the predicted active set matches the true active set obtained from the full-scale model.
  2. [Abstract] Abstract (and method description): the optimality-gap assessment is presented as independent of the ML step, but without reported checks on how misclassification of binding constraints propagates into the aggregated model, it is impossible to confirm that the exactness guarantee transfers from the referenced prior result on active-constraint preservation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of the ML component and the transfer of the exactness guarantee. We address each major comment below, indicating planned revisions to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that ML-based marginal cost estimates enable preservation of active constraints (and thereby exact aggregation) is load-bearing, yet the manuscript supplies neither the ML architecture, training data source, error metrics on the marginal-cost predictions, nor any verification that the predicted active set matches the true active set obtained from the full-scale model.

    Authors: We agree that additional details on the machine learning component are necessary to support the central claim. The revised manuscript will include: (i) the specific ML architecture (a feed-forward neural network with two hidden layers), (ii) the training data source (marginal costs obtained by solving the full-scale GEP model on a representative subset of scenarios), (iii) error metrics (MAE and classification accuracy on the binding-constraint predictions), and (iv) a verification procedure that compares the predicted active set against the true active set on held-out instances. These additions will be placed in a new subsection of the methods and referenced from the abstract. revision: yes

  2. Referee: [Abstract] Abstract (and method description): the optimality-gap assessment is presented as independent of the ML step, but without reported checks on how misclassification of binding constraints propagates into the aggregated model, it is impossible to confirm that the exactness guarantee transfers from the referenced prior result on active-constraint preservation.

    Authors: The referee correctly notes that the current text does not quantify the effect of ML prediction errors on the optimality-gap bound. While the gap assessment itself follows directly from the prior active-set preservation theorem when the active set is correctly identified, we will add an empirical robustness study in the numerical results section. This study will report the frequency and impact of misclassified binding constraints on the aggregated solution and the resulting gap estimate across the test cases. If the analysis shows material degradation, we will also discuss a fallback mechanism (e.g., iterative refinement of the ML predictions). revision: yes

Circularity Check

0 steps flagged

Exactness follows from active-constraint preservation (prior result); ML estimates are an input, not a definitional reduction

full rationale

The derivation chain is: ML marginal-cost estimates → identify active constraints → preserve them in aggregated model → exact TSA (by the cited prior result on active-constraint preservation). The prior result is invoked as an external fact rather than derived inside this paper; the ML step supplies an estimate but does not redefine or tautologically force the exactness guarantee. No equation equates a fitted quantity to its own prediction, no self-citation chain is load-bearing for the core claim, and the optimality-gap assessment is presented as an independent check. This yields only a minor self-citation risk at most.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.1-grok · 5689 in / 1046 out tokens · 31012 ms · 2026-06-29T10:57:17.038047+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 1 canonical work pages

  1. [1]

    State-of-the-art generation expansion planning: A review,

    N. E. Koltsaklis and A. S. Dagoumas, “State-of-the-art generation expansion planning: A review,”Appl. Energy, vol. 230, pp. 563-589, Nov. 2018

  2. [2]

    An integrated planning model in centralized power systems,

    F. L ´opez-Ramos, S. Nasini, and M. H. Sayed, “An integrated planning model in centralized power systems,”Eur . J. Oper . Res., vol. 287, no. 1, pp. 361-377, Nov. 2020

  3. [3]

    Economic model predictive control for the energy management problem of a virtual power plant including resources at different voltage levels,

    L. Santosuosso et al., “Economic model predictive control for the energy management problem of a virtual power plant including resources at different voltage levels,” in27th Int. Conf. Electricity Distrib. (CIRED 2023), Rome, Italy, 2023, pp. 2044–2048

  4. [4]

    A comprehensive sequential review study through the generation expansion planning,

    H. Sadeghi, M. Rashidinejad, and A. Abdollahi, “A comprehensive sequential review study through the generation expansion planning,” Renew. Sustain. Energy Rev., vol. 67, pp. 1369–1394, Jan. 2017

  5. [5]

    Stochastic economic model predictive control for renewable energy and ancillary services trading with storage,

    L. Santosuosso, S. Camal, F. Liberati, A. Di Giorgio, A. Michiorri, and G. Kariniotakis, “Stochastic economic model predictive control for renewable energy and ancillary services trading with storage,”Sustain. Energy, Grids Netw., vol. 38, pp. 101373, June 2024

  6. [6]

    Mixed-integer linear programming models and algorithms for generation and transmission expansion planning of power systems,

    C. Li, A. J. Conejo, P. Liu, B. P. Omell, J. D. Siirola, and I. E. Grossmann, “Mixed-integer linear programming models and algorithms for generation and transmission expansion planning of power systems,” Eur . J. Oper . Res., vol. 297, no. 3, pp. 1071-1082, Mar. 2022

  7. [7]

    High temporal resolution generation expansion planning for the clean energy transition,

    T. Levin, P. L. Blaisdell-Pijuan, J. Kwon, and W. N. Mann, “High temporal resolution generation expansion planning for the clean energy transition,”Renew. Sustain. Energy Transit., vol. 5, p. 100072, Aug. 2024

  8. [8]

    A modeler’s guide to handle complexity in energy systems optimization,

    L. Kotzur et al., “A modeler’s guide to handle complexity in energy systems optimization,”Adv. Appl. Energy, vol. 4, p. 100063, Nov. 2021

  9. [9]

    A review on time series aggregation methods for energy system models,

    M. Hoffmann, L. Kotzur, D. Stolten, and M. Robinius, “A review on time series aggregation methods for energy system models,”Energies, vol. 13, no. 3, p. 641, Feb. 2020

  10. [10]

    Chronological time-period clustering for optimal capacity expansion planning with storage,

    S. Pineda and J. M. Morales, “Chronological time-period clustering for optimal capacity expansion planning with storage,”IEEE Trans. Power Syst., vol. 33, no. 6, pp. 7162–7170, Nov. 2018

  11. [11]

    Time series aggregation for energy system design: Modeling seasonal storage,

    L. Kotzur, P. Markewitz, M. Robinius, and D. Stolten, “Time series aggregation for energy system design: Modeling seasonal storage,”Appl. Energy, vol. 213, pp. 123–135, Mar. 2018

  12. [12]

    Clustering methods to find rep- resentative periods for the optimization of energy systems: an initial framework and comparison,

    H. Teichgraeber and A. R. Brandt, “Clustering methods to find rep- resentative periods for the optimization of energy systems: an initial framework and comparison,”Appl. Energy, vol. 239, pp. 1283–1293, Apr. 2019

  13. [13]

    Capturing chronology and extreme values of representative days for planning of transmission lines and long-term energy storage systems,

    M. Moradi-Sepahvand and S. H. Tindemans, “Capturing chronology and extreme values of representative days for planning of transmission lines and long-term energy storage systems,” in Proc.2023 IEEE Belgrade PowerTech, Belgrade, Serbia, 2023

  14. [14]

    Time-series aggregation for the optimization of energy systems: Goals, challenges, approaches, and opportunities,

    H. Teichgraeber and A. R. Brandt, “Time-series aggregation for the optimization of energy systems: Goals, challenges, approaches, and opportunities,”Renew. Sustain. Energy Rev., vol. 157, p. 111984, Apr. 2022

  15. [15]

    A model- adaptive clustering-based time aggregation method for low-carbon en- ergy system optimization,

    Y . Zhang, V . Cheng, D. S. Mallapragada, J. Song, and G. He, “A model- adaptive clustering-based time aggregation method for low-carbon en- ergy system optimization,”IEEE Trans. Sustain. Energy, vol. 14, no. 1, pp. 55–64, Aug. 2023

  16. [16]

    Data-driven representative day selection for investment decisions: A cost-oriented approach,

    M. Sun, F. Teng, X. Zhang, G. Strbac, and D. Pudjianto, “Data-driven representative day selection for investment decisions: A cost-oriented approach,”IEEE Trans. Power Syst., vol. 34, no. 4, pp. 2925–2936, July 2019

  17. [17]

    On represen- tative day selection for capacity expansion planning of power systems under extreme operating conditions,

    C. Li, A. J. Conejo, J. D. Siirola, and I. E. Grossmann, “On represen- tative day selection for capacity expansion planning of power systems under extreme operating conditions,”Int. J. Electr . Power Energy Syst., vol. 137, p. 107697, May 2022

  18. [18]

    Reducing climate risk in energy system planning: A posteriori time series aggregation for models with storage,

    A. P. Hilbers, D. J. Brayshaw, and A. Gandy, “Reducing climate risk in energy system planning: A posteriori time series aggregation for models with storage,”Appl. Energy, vol. 334, p. 120624, Mar. 2023

  19. [19]

    Time series aggregation for optimization: One-size-fits-all?,

    S. Wogrin, “Time series aggregation for optimization: One-size-fits-all?,” IEEE Trans. Smart Grid, vol. 14, no. 3, pp. 2489–2492, Feb. 2023

  20. [20]

    Towards exact temporal aggregation of time-coupled energy storage models via active constraint set identification and machine learning,

    T. Klatzer, D. Cardona-Vasquez, L. Santosuosso, and S. Wogrin, “Towards exact temporal aggregation of time-coupled energy storage models via active constraint set identification and machine learning,” arXiv:2504.19699, Oct. 2025

  21. [21]

    A tutorial on kernel density estimation and recent ad- vances,

    Y . C. Chen, “A tutorial on kernel density estimation and recent ad- vances,”Biostatistics Epidemiol., vol. 1, no. 1, pp. 161–187, Oct. 2017

  22. [22]

    A review on random forest: An ensemble classifier,

    A. Parmar, R. Katariya, and V . Patel, “A review on random forest: An ensemble classifier,” inInt. Conf. Intell. Data Commun. Technol. Internet Things (ICICI), Coimbatore, India, 2018, pp. 758–763

  23. [23]

    Optimal virtual power plant investment planning via time series aggregation with bounded error,

    L. Santosuosso and S. Wogrin, “Optimal virtual power plant investment planning via time series aggregation with bounded error,” in2025 IEEE PES Innov. Smart Grid Technol. Conf. Europe (ISGT Europe), Valletta, Malta, 2025, pp. 1–5

  24. [24]

    The ENTSO-E trans- parency platform – A review of Europe’s most ambitious electricity data platform,

    L. Hirth, J. M ¨uhlenpfordt, and M. Bulkeley, “The ENTSO-E trans- parency platform – A review of Europe’s most ambitious electricity data platform,”Appl. Energy, vol. 225, pp. 1054–1067, Sep. 2018. APPENDIXA FULL-SCALEMODEL The aggregated model presented in Subsection II-A col- lapses to the full-scale model when each representative time step corresponds ...