arxiv: 2605.23194 · v1 · pith:3KWC23CT · submitted 2026-05-22 · 💻 cs.LG · cs.AI

Scalable Heterogeneous Graph Foundation Models for Data-Driven Optimal Power Flow in Smart Grids

Pith reviewed 2026-05-25 05:25 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords optimal power flow · heterogeneous graph neural networks · graph foundation models · feasibility classification · N-1 contingency analysis · smart grids · distributed training · hyperparameter optimization

The pith

Pretraining a heterogeneous graph neural network on three million power-grid instances produces a foundation model whose fine-tuning improves low-data accuracy on optimal power flow tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a distributed workflow on HydraGNN that keeps the native node and edge types of power networks intact while training across millions of graph instances drawn from ten different PGLib cases. Compact models identified through large-scale hyperparameter search serve as foundation models that can be adapted to new downstream problems. Experiments on feasibility classification and N-1 contingency regression demonstrate that partial or head-only fine-tuning of these pretrained models raises accuracy, stabilizes training, speeds convergence, and lowers the data and compute needed for adaptation. A sympathetic reader cares because real grids frequently present new topologies or operating points with only limited labeled data, so a reusable pretrained surrogate could reduce the engineering effort required for fast, reliable OPF approximations.

Core claim

Training compact heterogeneous graph neural networks (approximately 1.6-1.7 million parameters) on three million instances that span ten PGLib-OPF cases from 14 to 13,659 buses yields OPF foundation models; when these models are fine-tuned with partial-layer or head-only updates on feasibility classification and N-1 contingency regression tasks, low-data accuracy rises, training stabilizes, convergence accelerates, and adaptation cost falls relative to training from random initialization.

What carries the argument

The HydraGNN-based scalable heterogeneous GNN workflow that preserves distinct node types (buses, generators, loads, shunts) and edge types (AC lines, transformers, device-to-bus couplings) and supports distributed preprocessing, training, and hyperparameter optimization on leadership-class supercomputers.
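To make the typed structure concrete, here is a minimal sketch of a container that keeps node and edge types separate instead of flattening them. The node and edge type names follow the list above; the `HeteroGrid` class, its methods, the relation name `connects`, and the feature values are all hypothetical and are not HydraGNN's API.

```python
from dataclasses import dataclass, field

# Type names mirror the text; everything else here is an illustrative assumption.
NODE_TYPES = ("bus", "generator", "load", "shunt")
EDGE_TYPES = (
    ("bus", "ac_line", "bus"),
    ("bus", "transformer", "bus"),
    ("generator", "connects", "bus"),  # device-to-bus couplings
    ("load", "connects", "bus"),
    ("shunt", "connects", "bus"),
)


@dataclass
class HeteroGrid:
    # nodes[t][i] is the feature list of the i-th node of type t.
    nodes: dict = field(default_factory=lambda: {t: [] for t in NODE_TYPES})
    # edges[(src_type, relation, dst_type)] is a list of (src, dst) index pairs.
    edges: dict = field(default_factory=lambda: {t: [] for t in EDGE_TYPES})

    def add_node(self, node_type, features):
        """Append a node and return its index within its own type."""
        self.nodes[node_type].append(list(features))
        return len(self.nodes[node_type]) - 1

    def add_edge(self, src_type, relation, dst_type, src, dst):
        """Add a typed edge; indices are local to each node type."""
        key = (src_type, relation, dst_type)
        if key not in self.edges:
            raise KeyError(f"unknown edge type: {key}")
        if not 0 <= src < len(self.nodes[src_type]):
            raise IndexError(f"no {src_type} node with index {src}")
        if not 0 <= dst < len(self.nodes[dst_type]):
            raise IndexError(f"no {dst_type} node with index {dst}")
        self.edges[key].append((src, dst))


# Two buses joined by an AC line, with one generator attached to the first bus.
grid = HeteroGrid()
b0 = grid.add_node("bus", [1.0, 0.0])   # placeholder features
b1 = grid.add_node("bus", [1.0, 0.0])
g0 = grid.add_node("generator", [0.5])  # placeholder feature
grid.add_edge("bus", "ac_line", "bus", b0, b1)
grid.add_edge("generator", "connects", "bus", g0, b0)
```

Keeping one feature table per node type lets generators, loads, and buses carry different feature widths, which a flattened homogeneous graph would have to pad or discard.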

If this is right

  • Partial or head-only fine-tuning of the pretrained model reduces the data volume and compute required to reach target accuracy on new OPF surrogate tasks.
  • Pretraining across multiple grid topologies stabilizes the training trajectory and shortens the number of epochs needed for convergence on downstream feasibility and contingency problems.
  • Models discovered by the DeepHyper campaign on Frontier achieve the lowest validation losses among the compact architectures tested.
  • The workflow scales preprocessing and training to grids containing more than thirteen thousand buses while maintaining the heterogeneous structure.
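The adaptation-cost point in the first bullet can be illustrated by counting how many parameters each fine-tuning mode updates. The sketch below assumes that "partial" means the last few encoder layers plus the head, which the summary does not define; the layer names and the per-layer split of a roughly 1.6M-parameter model are hypothetical.

```python
def select_trainable(layer_names, mode, n_last=1):
    """Return the layer names to update; all other layers stay frozen.

    Assumes encoder layers are listed in order and head layers start with "head".
    """
    encoder = [n for n in layer_names if not n.startswith("head")]
    head = [n for n in layer_names if n.startswith("head")]
    if mode == "head_only":
        return head
    if mode == "partial":
        # Assumption: unfreeze only the last n_last encoder layers plus the head.
        start = max(0, len(encoder) - n_last)
        return encoder[start:] + head
    if mode == "full":
        return encoder + head
    raise ValueError(f"unknown fine-tuning mode: {mode!r}")


def trainable_fraction(param_counts, mode, n_last=1):
    """Fraction of all parameters that receive gradient updates in this mode."""
    chosen = select_trainable(list(param_counts), mode, n_last)
    return sum(param_counts[n] for n in chosen) / sum(param_counts.values())


# Hypothetical split of a ~1.6M-parameter model; not taken from the paper.
PARAM_COUNTS = {
    "encoder.0": 500_000,
    "encoder.1": 500_000,
    "encoder.2": 500_000,
    "head": 100_000,
}

fractions = {
    mode: trainable_fraction(PARAM_COUNTS, mode)
    for mode in ("head_only", "partial", "full")
}
```

Under these assumed sizes, head-only fine-tuning touches about 6% of the parameters and partial fine-tuning under 40%, which is the kind of saving the bullet refers to.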

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the representation learned on the ten cases captures enough common structure, the same foundation model could be adapted to grid instances never seen during pretraining without full retraining.
  • The approach suggests a path toward reusable surrogates that could be updated incrementally as new sensor data or topology changes arrive in operational smart-grid settings.
  • Extending the heterogeneous typing to additional device classes (for example, renewable inverters or storage) would require only modest changes to the same workflow.

Load-bearing premise

The three million graph instances drawn from only ten PGLib-OPF cases are representative enough that fine-tuning benefits transfer to arbitrary real-world grids and operating conditions.

What would settle it

Apply the same pretraining-plus-fine-tuning protocol to a power-grid case that lies outside the original ten PGLib instances and measure whether the pretrained model still shows accuracy, stability, or convergence gains over random initialization in the low-data regime.
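One way to organize such a test is a leave-one-case-out split, sketched below. This is a variant suggested here for illustration, not the paper's protocol, and the case names are placeholders because the summary does not list the ten PGLib cases.

```python
def leave_one_case_out(case_names):
    """Yield (pretrain_cases, held_out_case) pairs, one per case."""
    for i, held_out in enumerate(case_names):
        pretrain = case_names[:i] + case_names[i + 1:]
        yield pretrain, held_out


# Placeholder identifiers standing in for the ten pretraining cases.
cases = [f"case_{k}" for k in range(10)]
splits = list(leave_one_case_out(cases))

for pretrain, held_out in splits:
    # Pretrain on `pretrain`, then fine-tune on a small labeled subset of
    # `held_out` and compare against training from random initialization.
    assert held_out not in pretrain
```

Each split keeps the held-out topology entirely out of pretraining, so any low-data gain measured on it reflects transfer rather than memorization of that case.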

Figures

Figures reproduced from arXiv: 2605.23194 by Kibaek Kim, Massimiliano Lupo Pasini, Teja Kuruganti, Yijiang Li.

Figure 1. Distribution of per-trial best validation losses across …
Figure 2. Validation loss vs. number of trainable parameters per …
Figure 3. FT1 feasibility classification: validation BCE loss vs. training epoch for HeteroSAGE (top row) and HeteroHEAT …
Figure 4. FT2 N−1 contingency regression: validation MSE loss versus training epoch for HeteroSAGE and HeteroHEAT across multiple labeled-data regimes.

… and computational cost. Full fine-tuning is less reliable because unconstrained updates to all layers can overwrite useful pretrained representations, consistent with catastrophic forgetting [23], and should be reserved for cases where the downstream dataset is lar…
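The captions for Figures 3 and 4 name binary cross-entropy (BCE) and mean squared error (MSE) as the validation losses. A minimal sketch of both, using their standard definitions; the function names and the clipping constant `eps` are choices made here, not details from the paper.

```python
import math


def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def mse_loss(preds, targets):
    """Mean squared error between predictions and regression targets."""
    return sum((a - b) ** 2 for a, b in zip(preds, targets)) / len(targets)
```

An uninformative classifier that always predicts 0.5 has a BCE of ln 2, about 0.693, which is a useful reference level when reading the feasibility curves.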
Original abstract

Fast and reliable optimal power flow (OPF) approximation is essential for reliable smart-grid operation, yet many learning-based surrogates either flatten the native heterogeneous structure of power networks, target a limited set of grid topologies, or lack scalable infrastructure for graph foundation model (GFM) training. This paper presents a scalable heterogeneous graph neural network (GNN) workflow, built on HydraGNN, for data-driven OPF surrogate modeling and OPF-GFM development. The workflow preserves the distinct node and edge types of power grids -- buses, generators, loads, shunts, AC lines, transformers, and device-to-bus couplings -- and supports distributed preprocessing, training, hyperparameter optimization (HPO), and downstream fine-tuning on leadership-class supercomputers. Using three million heterogeneous graph instances spanning ten PGLib-OPF cases, from 14 to 13,659 buses, we conduct DeepHyper-driven HPO on the ORNL Frontier supercomputer. The campaign identifies compact models ($\sim$1.6--1.7M parameters) with the lowest validation losses. Downstream experiments on feasibility classification and N-1 contingency regression show that fine-tuning pretrained OPF GFM improves low-data accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost when partial or head-only fine-tuning is used.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a scalable heterogeneous graph neural network workflow based on HydraGNN for developing graph foundation models (GFMs) for data-driven optimal power flow (OPF) approximation. It generates three million heterogeneous graph instances from ten PGLib-OPF cases (14–13,659 buses), performs DeepHyper-driven hyperparameter optimization on the ORNL Frontier supercomputer to identify compact models (~1.6–1.7M parameters), and reports that fine-tuning the resulting pretrained OPF GFM improves low-data accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost on downstream feasibility classification and N-1 contingency regression tasks when using partial or head-only fine-tuning.

Significance. If the fine-tuning benefits hold under broader evaluation, the work could advance scalable, structure-preserving surrogates for OPF that exploit large-scale pretraining on heterogeneous power-grid graphs. The distributed preprocessing/training infrastructure and leadership-class HPO campaign are concrete strengths that address scalability barriers in the field.

major comments (2)
  1. [Abstract, experiments paragraph] The claim that fine-tuning improves accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost supplies no quantitative metrics, baseline comparisons, error bars, or data-split details, so the magnitude and reliability of the reported gains cannot be assessed.
  2. [Abstract, experiments paragraph] All three million pretraining instances are drawn from only ten PGLib-OPF base cases; the downstream experiments give no indication of evaluation on held-out topologies, an eleventh PGLib case, or real utility data. This leaves the transferability required for the foundation-model claim untested.
minor comments (1)
  1. [Abstract] The bus-count range '14 to 13,659 buses' should explicitly list the ten cases or clarify whether these are the exact sizes used.
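The error bars requested in the first major comment could be reported as a mean and standard deviation over repeated runs. The sketch below shows one crude way to do that; the loss values are placeholders and not results from the paper, and the non-overlap check is a rough heuristic rather than a statistical test.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) of per-seed results."""
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


def gain_is_clear(pretrained_losses, scratch_losses):
    """True if the mean ± std intervals do not overlap (lower loss is better)."""
    p_mean, p_std = summarize(pretrained_losses)
    s_mean, s_std = summarize(scratch_losses)
    return p_mean + p_std < s_mean - s_std


# Placeholder validation losses over three seeds; illustrative only.
pretrained = [0.10, 0.12, 0.11]
scratch = [0.20, 0.22, 0.18]

p_mean, p_std = summarize(pretrained)
s_mean, s_std = summarize(scratch)
clear = gain_is_clear(pretrained, scratch)
```

Reporting both numbers per labeled-data regime would let a reader judge whether a gap between pretrained and randomly initialized models exceeds seed-to-seed variation.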

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and experimental claims. We address each major comment below and will revise the manuscript to improve clarity and evidence presentation.

Point-by-point responses
  1. Referee: [Abstract, experiments paragraph] The claim that fine-tuning improves accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost supplies no quantitative metrics, baseline comparisons, error bars, or data-split details, so the magnitude and reliability of the reported gains cannot be assessed.

    Authors: We agree that the abstract would benefit from quantitative support. The body of the manuscript reports detailed results including baselines, error bars, and data splits for the fine-tuning experiments. In the revision we will augment the abstract with representative quantitative metrics (e.g., accuracy deltas, convergence iterations, and adaptation-cost reductions) drawn from those sections.

    revision: yes

  2. Referee: [Abstract, experiments paragraph] All three million pretraining instances are drawn from only ten PGLib-OPF base cases; the downstream experiments give no indication of evaluation on held-out topologies, an eleventh PGLib case, or real utility data. This leaves the transferability required for the foundation-model claim untested.

    Authors: The ten PGLib-OPF cases were deliberately chosen to cover a wide range of bus counts (14–13,659) and structural characteristics, allowing the pretraining to expose the model to substantial topological diversity. Downstream fine-tuning results are reported across these varied instances. We acknowledge that explicit evaluation on entirely held-out topologies would provide stronger support for the foundation-model transferability claim. We will revise the manuscript to state this scope limitation explicitly and, where feasible, add results on an eleventh case or note it as future work.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical results on external benchmarks

The paper presents a data-driven workflow for training heterogeneous GNNs on three million graph instances generated from ten PGLib-OPF cases, followed by empirical fine-tuning experiments measuring accuracy, convergence, and adaptation cost on feasibility classification and N-1 regression tasks. No equations, self-definitions, or self-citation chains reduce any reported prediction or benefit to quantities defined by the same fitted parameters. All central claims rest on measured outcomes against external benchmark data rather than internal tautologies or renamed fits.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Central claims rest on the domain assumption that heterogeneous graph structure adds predictive value over flattened representations and that pretraining across the ten chosen PGLib cases produces transferable features; model architecture choices and loss functions are free parameters selected by HPO but not enumerated.

free parameters (2)
  • GNN layer counts, hidden dimensions, and attention heads
    Determined by DeepHyper HPO; specific values not reported in abstract
  • Training set size and topology sampling strategy
    Fixed at three million instances from ten PGLib cases; choice affects claimed generalization
axioms (2)
  • domain assumption: Preserving distinct node and edge types (buses, generators, AC lines, transformers, etc.) improves surrogate accuracy for OPF
    Invoked to justify the heterogeneous GNN design over flattening approaches
  • domain assumption: Pretraining on diverse PGLib topologies yields features that stabilize and accelerate fine-tuning on downstream tasks
    Load-bearing premise for the reported fine-tuning benefits

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 3 internal anchors

  1. [1]

    History of optimal power flow and formulations,

    M. B. Cain, R. P. O’Neill, and A. Castillo, “History of optimal power flow and formulations,” Federal Energy Regulatory Commission (FERC), Tech. Rep., 2012

  2. [2]

    Optimal power flow: A bibliographic survey I – formulations and deterministic methods,

    S. Frank, I. Steponavice, and S. Rebennack, “Optimal power flow: A bibliographic survey I – formulations and deterministic methods,” Energy Systems, vol. 3, no. 3, pp. 221–258, 2012

  3. [3]

    Optimal power flow using graph neural networks,

    D. Owerko, F. Gama, and A. Ribeiro, “Optimal power flow using graph neural networks,” arXiv preprint arXiv:1910.09658, 2019

  4. [4]

    Topology-aware graph neural networks for learning feasible and adaptive ac-opf solutions,

    S. Liu, C. Wu, and H. Zhu, “Topology-aware graph neural networks for learning feasible and adaptive ac-opf solutions,” IEEE Transactions on Power Systems, 2023

  5. [5]

    A directed acyclic graph neural network for ac optimal power flow,

    Z. Guo, K. Sun, B. Park, S. Simunovic, and W. Kang, “A directed acyclic graph neural network for ac optimal power flow,” in 2023 IEEE Power & Energy Society General Meeting (PESGM), 2023

  6. [6]

    Initial estimate of ac optimal power flow with graph neural networks,

    A. Deihim, D. Apostolopoulou, and E. Alonso, “Initial estimate of ac optimal power flow with graph neural networks,” Electric Power Systems Research, vol. 234, p. 110782, 2024

  7. [7]

    Physics-informed neural networks for ac optimal power flow,

    F. Fioretto, T. W. K. Mak, and P. Van Hentenryck, “Physics-informed neural networks for ac optimal power flow,” Electric Power Systems Research, vol. 212, p. 108412, 2022

  8. [8]

    OPF-HGNN: Generalizable heterogeneous graph neural networks for ac optimal power flow,

    S. Ghamizi, A. Ma, J. Cao, and P. Rodriguez Cortes, “OPF-HGNN: Generalizable heterogeneous graph neural networks for ac optimal power flow,” in 2024 IEEE Power & Energy Society General Meeting (PESGM), 2024

  9. [9]

    Graph-based attention mechanisms for solving the ac optimal power flow problem in electrical power networks,

    A. Trigui, M. Olama, G. Siopsis, H. Eldakhakhni, and M. Salhi, “Graph-based attention mechanisms for solving the ac optimal power flow problem in electrical power networks,” in 2025 57th North American Power Symposium (NAPS), 2025

  10. [10]

    Heterogeneous graph neural network with local and global message passing for ac-optimal power flow solutions,

    A. Wen, B. Wen, J. Li, and J. Xu, “Heterogeneous graph neural network with local and global message passing for ac-optimal power flow solutions,” Applied System Innovation, vol. 9, no. 1, p. 18, 2026

  11. [11]

    LUMINA: Foundation Models for Topology Transferable ACOPF

    Y. Li, Z. Memon, H. Jin, S. Fenu, K. Song, S. B. Sharma, P. Gasana, H. Kim, L. Zhao, and K. Kim, “LUMINA: Foundation models for topology transferable ACOPF,” in International Conference on Learning Representations (ICLR), 2026, arXiv:2603.04300. [Online]. Available: https://arxiv.org/abs/2603.04300

  12. [12]

    LUMINA: A Grid Foundation Model for Benchmarking AC Optimal Power Flow Surrogate Learning

    H. Jin, K. Song, Z. Memon, Y. Li, S. Fenu, H. Kim, L. Zhao, and K. Kim, “LUMINA: A grid foundation model for benchmarking AC optimal power flow surrogate learning,” arXiv preprint arXiv:2605.02133, 2026. [Online]. Available: https://arxiv.org/abs/2605.02133

  13. [13]

    Towards Systematic Generalization for Power Grid Optimization Problems

    Z. Memon, Y. Li, H. Jin, K. Kim, and L. Zhao, “Towards systematic generalization for power grid optimization problems,” arXiv preprint arXiv:2605.02026, 2026. [Online]. Available: https://arxiv.org/abs/2605.02026

  14. [14]

    HydraGNN,

    M. Lupo Pasini, S. T. Reeve, P. Zhang, and J. Y. Choi, “HydraGNN,” Distributed PyTorch implementation of multi-headed graph convolutional neural networks, United States, Oct. 2021. [Online]. Available: https://www.osti.gov/biblio/code-65891

  15. [15]

    Inductive representation learning on large graphs,

    W. L. Hamilton, R. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems, 2017

  16. [16]

    Graph attention networks,

    P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in International Conference on Learning Representations, 2018

  17. [17]

    How attentive are graph attention networks?

    S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” in International Conference on Learning Representations, 2022

  18. [18]

    Principal neighbourhood aggregation for graph nets,

    G. Corso, L. Cavalleri, D. Beaini, P. Liò, and P. Veličković, “Principal neighbourhood aggregation for graph nets,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 13260–13271

  19. [19]

    Heterogeneous graph transformer,

    Z. Hu, Y. Dong, K. Wang, and Y. Sun, “Heterogeneous graph transformer,” in Proceedings of The Web Conference 2020, 2020, pp. 2704–2710

  20. [20]

    Heterogeneous edge-enhanced graph attention network for multi-agent trajectory prediction,

    X. Mo, Y. Xing, and C. Lv, “Heterogeneous edge-enhanced graph attention network for multi-agent trajectory prediction,” arXiv preprint arXiv:2106.07161, 2021

  21. [21]

    OPFData: Large-scale datasets for machine learning-accelerated ac optimal power flow,

    T. Lovett, A. Buovich, A. Sharma, S. Pegg, S. Cohen, S. Stephens, A. Tucker, P. Pope, J. Eiselen, F. Buchaca, C. Sutton, J. Mantilla-Bilbao, T. Roeder, Y. Lin, E. Bridgett-Tomkinson, J. Garratt, J. Patterson, S. Lyons, A. Hales, and V. Petar, “OPFData: Large-scale datasets for machine learning-accelerated ac optimal power flow,” arXiv preprint arXiv:240...

  22. [22]

    The power grid library for benchmarking ac optimal power flow algorithms,

    S. Babaeinejadsarookolaee, A. Birchfield, R. D. Christie, C. Coffrin, C. DeMarco, R. Diao, M. Ferris, S. Fliscounakis, S. Greene, C. Josz, R. Korab, B. Lesieutre, J. Maeght, D. K. Molzahn, T. J. Overbye, P. Panciatici, B. Park, J. Snodgrass, A. Tbaileh, and R. D. Zimmerman, “The power grid library for benchmarking ac optimal power flow algorithms,” arXiv p...

  23. [23]

    Overcoming catastrophic forgetting in neural networks,

    J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming catastrophic forgetting in neural networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017