Scalable Heterogeneous Graph Foundation Models for Data-Driven Optimal Power Flow in Smart Grids
Pith reviewed 2026-05-25 05:25 UTC · model grok-4.3
The pith
Pretraining a heterogeneous graph neural network on three million power-grid instances produces a foundation model whose fine-tuning improves low-data accuracy on optimal power flow tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training compact heterogeneous graph neural networks (approximately 1.6-1.7 million parameters) on three million instances that span ten PGLib-OPF cases from 14 to 13,659 buses yields OPF foundation models; when these models are fine-tuned with partial-layer or head-only updates on feasibility classification and N-1 contingency regression tasks, low-data accuracy rises, training stabilizes, convergence accelerates, and adaptation cost falls relative to training from random initialization.
What carries the argument
The HydraGNN-based scalable heterogeneous GNN workflow that preserves distinct node types (buses, generators, loads, shunts) and edge types (AC lines, transformers, device-to-bus couplings) and supports distributed preprocessing, training, and hyperparameter optimization on leadership-class supercomputers.
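As a rough illustration of what preserving node and edge types means in practice, the sketch below stores a toy grid as typed node tables and typed edge lists, then runs one round of relation-specific mean aggregation toward the buses. It is plain Python and does not use the HydraGNN API; the feature values, the relation names, and the `aggregate_by_relation` helper are all hypothetical.

```python
from collections import defaultdict

# Toy heterogeneous grid: node features are keyed by node type.
# All numbers are made-up placeholders, not PGLib-OPF data.
nodes = {
    "bus": {0: [1.0, 0.0], 1: [0.98, -0.02], 2: [1.01, 0.01]},
    "generator": {0: [0.5]},
    "load": {0: [0.3], 1: [0.2]},
}

# Edges keyed by (source type, relation, target type);
# each entry is a (src_id, dst_id) pair.
edges = {
    ("bus", "ac_line", "bus"): [(0, 1), (1, 2)],
    ("bus", "transformer", "bus"): [(0, 2)],
    ("generator", "gen_to_bus", "bus"): [(0, 0)],
    ("load", "load_to_bus", "bus"): [(0, 1), (1, 2)],
}


def aggregate_by_relation(nodes, edges, target_type):
    """Mean-aggregate neighbor features separately for every relation.

    Returns {relation: {dst_id: mean feature vector}}, so messages from
    generators, loads, lines, and transformers stay in separate channels.
    """
    out = {}
    for (src_type, relation, dst_type), pairs in edges.items():
        if dst_type != target_type:
            continue
        buckets = defaultdict(list)
        for src_id, dst_id in pairs:
            buckets[dst_id].append(nodes[src_type][src_id])
        out[relation] = {
            dst: [sum(col) / len(col) for col in zip(*feats)]
            for dst, feats in buckets.items()
        }
    return out


bus_messages = aggregate_by_relation(nodes, edges, "bus")
for relation, per_bus in bus_messages.items():
    print(relation, per_bus)
```

Keeping one channel per relation is what lets feature widths differ by type (here generators carry one feature and buses two), which a flattened homogeneous graph would have to pad or discard.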
If this is right
- Partial or head-only fine-tuning of the pretrained model reduces the data volume and compute required to reach target accuracy on new OPF surrogate tasks.
- Pretraining across multiple grid topologies stabilizes the training trajectory and shortens the number of epochs needed for convergence on downstream feasibility and contingency problems.
- Models discovered by the DeepHyper campaign on Frontier achieve the lowest validation losses among the compact architectures tested.
- The workflow scales preprocessing and training to grids containing more than thirteen thousand buses while maintaining the heterogeneous structure.
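The difference between full, partial-layer, and head-only fine-tuning can be sketched as a choice of which parameter groups receive updates. The example below is a minimal illustration under stated assumptions: the group names and parameter counts are hypothetical placeholders, not the architecture found by the HPO campaign.

```python
# Hypothetical parameter groups for a small GNN; names and counts are
# illustrative placeholders, not the paper's architecture.
PARAM_GROUPS = [
    ("encoder.layer0", 400_000),
    ("encoder.layer1", 400_000),
    ("encoder.layer2", 400_000),
    ("head.task", 50_000),
]


def trainable_groups(mode, unfrozen_encoder_layers=1):
    """Return the names of parameter groups updated under a fine-tuning mode."""
    encoder = [name for name, _ in PARAM_GROUPS if name.startswith("encoder.")]
    head = [name for name, _ in PARAM_GROUPS if name.startswith("head.")]
    if mode == "full":
        return encoder + head
    if mode == "partial":
        # Unfreeze only the last few encoder layers plus the task head.
        start = len(encoder) - unfrozen_encoder_layers
        return encoder[start:] + head
    if mode == "head_only":
        return head
    raise ValueError(f"unknown mode: {mode}")


def trainable_fraction(mode, **kwargs):
    """Fraction of all parameters that receive gradient updates."""
    counts = dict(PARAM_GROUPS)
    total = sum(counts.values())
    active = sum(counts[name] for name in trainable_groups(mode, **kwargs))
    return active / total


for mode in ("full", "partial", "head_only"):
    print(mode, trainable_groups(mode), round(trainable_fraction(mode), 3))
```

In a real training loop the frozen groups would simply be excluded from the optimizer, which is where the lower adaptation cost would come from.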
Where Pith is reading between the lines
- If the representation learned on the ten cases captures enough common structure, the same foundation model could be adapted to grid instances never seen during pretraining without full retraining.
- The approach suggests a path toward reusable surrogates that could be updated incrementally as new sensor data or topology changes arrive in operational smart-grid settings.
- Extending the heterogeneous typing to additional device classes (for example, renewable inverters or storage) would require only modest changes to the same workflow.
Load-bearing premise
The three million graph instances drawn from only ten PGLib-OPF cases are representative enough that fine-tuning benefits transfer to arbitrary real-world grids and operating conditions.
What would settle it
Apply the same pretraining-plus-fine-tuning protocol to a power-grid case that lies outside the original ten PGLib instances and measure whether the pretrained model still shows accuracy, stability, or convergence gains over random initialization in the low-data regime.
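One way to organize such a test is a leave-one-case-out protocol, sketched below. The helper names, the sample count, and the data fractions are assumptions made for illustration, and the case labels are placeholders because the ten PGLib-OPF case names are not listed on this page.

```python
def leave_one_case_out(case_names):
    """Yield (pretrain_cases, held_out_case) pairs, one per case."""
    for held_out in case_names:
        pretrain = [c for c in case_names if c != held_out]
        yield pretrain, held_out


def low_data_subsets(n_samples, fractions):
    """Number of fine-tuning samples to use at each data fraction (at least 1)."""
    return {f: max(1, int(n_samples * f)) for f in fractions}


# Placeholder labels; the actual ten PGLib-OPF case names are not given here.
cases = [f"case_{i}" for i in range(10)]

for pretrain, held_out in leave_one_case_out(cases):
    budgets = low_data_subsets(n_samples=10_000, fractions=(0.01, 0.1, 1.0))
    # For each budget one would fine-tune twice on the held-out case
    # (pretrained weights vs. random initialization) and compare accuracy,
    # run-to-run variance, and epochs to convergence.
    print(held_out, len(pretrain), budgets)
```

A gain that persists on the held-out case at the smallest budgets would support the transfer claim; a gain that vanishes would suggest the benefit is specific to topologies seen during pretraining.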
Original abstract
Fast and reliable optimal power flow (OPF) approximation is essential for reliable smart-grid operation, yet many learning-based surrogates either flatten the native heterogeneous structure of power networks, target a limited set of grid topologies, or lack scalable infrastructure for graph foundation model (GFM) training. This paper presents a scalable heterogeneous graph neural network (GNN) workflow, built on HydraGNN, for data-driven OPF surrogate modeling and OPF-GFM development. The workflow preserves the distinct node and edge types of power grids -- buses, generators, loads, shunts, AC lines, transformers, and device-to-bus couplings -- and supports distributed preprocessing, training, hyperparameter optimization (HPO), and downstream fine-tuning on leadership-class supercomputers. Using three million heterogeneous graph instances spanning ten PGLib-OPF cases, from 14 to 13,659 buses, we conduct DeepHyper-driven HPO on the ORNL Frontier supercomputer. The campaign identifies compact models ($\sim$1.6--1.7M parameters) with the lowest validation losses. Downstream experiments on feasibility classification and N-1 contingency regression show that fine-tuning pretrained OPF GFM improves low-data accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost when partial or head-only fine-tuning is used.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a scalable heterogeneous graph neural network workflow based on HydraGNN for developing graph foundation models (GFMs) for data-driven optimal power flow (OPF) approximation. It generates three million heterogeneous graph instances from ten PGLib-OPF cases (14–13,659 buses), performs DeepHyper-driven hyperparameter optimization on the ORNL Frontier supercomputer to identify compact models (~1.6–1.7M parameters), and reports that fine-tuning the resulting pretrained OPF GFM improves low-data accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost on downstream feasibility classification and N-1 contingency regression tasks when using partial or head-only fine-tuning.
Significance. If the fine-tuning benefits hold under broader evaluation, the work could advance scalable, structure-preserving surrogates for OPF that exploit large-scale pretraining on heterogeneous power-grid graphs. The distributed preprocessing/training infrastructure and leadership-class HPO campaign are concrete strengths that address scalability barriers in the field.
major comments (2)
- [Abstract, experiments paragraph] The claim that fine-tuning improves accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost supplies no quantitative metrics, baseline comparisons, error bars, or data-split details, so the magnitude and reliability of the reported gains cannot be assessed.
- [Abstract, experiments paragraph] All three million pretraining instances are drawn from only ten PGLib-OPF base cases; the downstream experiments give no indication of evaluation on held-out topologies, an eleventh PGLib case, or real utility data. This leaves the transferability required for the foundation-model claim untested.
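The kind of evidence the first major comment asks for can be sketched as a per-budget comparison with error bars over random seeds. The snippet below is a minimal illustration only: the `summarize` and `compare` helpers are hypothetical, and the accuracy values are made-up placeholders, not results from the paper.

```python
from statistics import mean, stdev


def summarize(runs):
    """Mean and sample standard deviation over per-seed scores."""
    return mean(runs), stdev(runs)


def compare(pretrained_runs, scratch_runs):
    """Report both arms and the difference in means for one data budget."""
    p_mean, p_std = summarize(pretrained_runs)
    s_mean, s_std = summarize(scratch_runs)
    return {
        "pretrained": (p_mean, p_std),
        "scratch": (s_mean, s_std),
        "delta": p_mean - s_mean,
    }


# Made-up placeholder accuracies for five seeds; NOT results from the paper.
example = compare(
    pretrained_runs=[0.90, 0.91, 0.89, 0.92, 0.90],
    scratch_runs=[0.80, 0.85, 0.78, 0.83, 0.79],
)
for key, value in example.items():
    print(key, value)
```

Reporting the spread alongside the mean for both arms would also speak to the training-stability claim, since a smaller seed-to-seed standard deviation is one plausible reading of "stabilizes training".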
minor comments (1)
- [Abstract] The bus-count range '14 to 13,659 buses' should be accompanied by an explicit list of the ten cases, or by a clarification of whether these are the exact sizes used.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract and experimental claims. We address each major comment below and will revise the manuscript to improve clarity and evidence presentation.
- Referee: [Abstract, experiments paragraph] The claim that fine-tuning improves accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost supplies no quantitative metrics, baseline comparisons, error bars, or data-split details, so the magnitude and reliability of the reported gains cannot be assessed.
  Authors: We agree that the abstract would benefit from quantitative support. The body of the manuscript reports detailed results including baselines, error bars, and data splits for the fine-tuning experiments. In the revision we will augment the abstract with representative quantitative metrics (e.g., accuracy deltas, convergence iterations, and adaptation-cost reductions) drawn from those sections.
  Revision: yes
- Referee: [Abstract, experiments paragraph] All three million pretraining instances are drawn from only ten PGLib-OPF base cases; the downstream experiments give no indication of evaluation on held-out topologies, an eleventh PGLib case, or real utility data. This leaves the transferability required for the foundation-model claim untested.
  Authors: The ten PGLib-OPF cases were deliberately chosen to cover a wide range of bus counts (14–13,659) and structural characteristics, allowing the pretraining to expose the model to substantial topological diversity. Downstream fine-tuning results are reported across these varied instances. We acknowledge that explicit evaluation on entirely held-out topologies would provide stronger support for the foundation-model transferability claim. We will revise the manuscript to state this scope limitation explicitly and, where feasible, add results on an eleventh case or note it as future work.
  Revision: partial
Circularity Check
No significant circularity; empirical results on external benchmarks
The paper presents a data-driven workflow for training heterogeneous GNNs on three million graph instances generated from ten PGLib-OPF cases, followed by empirical fine-tuning experiments measuring accuracy, convergence, and adaptation cost on feasibility classification and N-1 regression tasks. No equations, self-definitions, or self-citation chains reduce any reported prediction or benefit to quantities defined by the same fitted parameters. All central claims rest on measured outcomes against external benchmark data rather than internal tautologies or renamed fits.
Axiom & Free-Parameter Ledger
free parameters (2)
- GNN layer counts, hidden dimensions, and attention heads
- Training set size and topology sampling strategy
axioms (2)
- domain assumption: Preserving distinct node and edge types (buses, generators, AC lines, transformers, etc.) improves surrogate accuracy for OPF
- domain assumption: Pretraining on diverse PGLib topologies yields features that stabilize and accelerate fine-tuning on downstream tasks
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction [unclear]
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Using three million heterogeneous graph instances spanning ten PGLib-OPF cases... fine-tuning pretrained OPF GFM improves low-data accuracy..."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel [unclear]
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "HeteroSAGE... relation-specific message passing... variable edge-attribute dimensions"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.