Scalable Heterogeneous Graph Foundation Models for Data-Driven Optimal Power Flow in Smart Grids

Kibaek Kim; Massimiliano Lupo Pasini; Teja Kuruganti; Yijiang Li

arXiv: 2605.23194 · v1 · pith:3KWC23CT · submitted 2026-05-22 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-25 05:25 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: optimal power flow, heterogeneous graph neural networks, graph foundation models, feasibility classification, N-1 contingency analysis, smart grids, distributed training, hyperparameter optimization


The pith

Pretraining a heterogeneous graph neural network on three million power-grid instances produces a foundation model whose fine-tuning improves low-data accuracy on optimal power flow tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a distributed workflow on HydraGNN that keeps the native node and edge types of power networks intact while training across millions of graph instances drawn from ten different PGLib cases. Compact models identified through large-scale hyperparameter search serve as foundation models that can be adapted to new downstream problems. Experiments on feasibility classification and N-1 contingency regression demonstrate that partial or head-only fine-tuning of these pretrained models raises accuracy, stabilizes training, speeds convergence, and lowers the data and compute needed for adaptation. A sympathetic reader cares because real grids frequently present new topologies or operating points with only limited labeled data, so a reusable pretrained surrogate could reduce the engineering effort required for fast, reliable OPF approximations.

Core claim

Training compact heterogeneous graph neural networks (approximately 1.6-1.7 million parameters) on three million instances that span ten PGLib-OPF cases from 14 to 13,659 buses yields OPF foundation models; when these models are fine-tuned with partial-layer or head-only updates on feasibility classification and N-1 contingency regression tasks, low-data accuracy rises, training stabilizes, convergence accelerates, and adaptation cost falls relative to training from random initialization.

What carries the argument

The HydraGNN-based scalable heterogeneous GNN workflow that preserves distinct node types (buses, generators, loads, shunts) and edge types (AC lines, transformers, device-to-bus couplings) and supports distributed preprocessing, training, and hyperparameter optimization on leadership-class supercomputers.
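To make the typed structure concrete, the sketch below stores one toy grid as per-type node feature lists plus edge lists keyed by (source type, relation, destination type) triples, the same keying convention PyTorch Geometric's `HeteroData` uses, and then runs one relation-specific mean-aggregation step in the spirit of HeteroSAGE. This is a minimal illustration under stated assumptions: the toy topology, the relation names such as `generator_link` and `load_link`, and the use of scalar per-relation weights in place of learned weight matrices (with no nonlinearity) are all hypothetical, not HydraGNN's schema or the paper's layer.

```python
from collections import defaultdict

# Toy grid (hypothetical): 3 buses, 1 generator, 1 load; shunts omitted.
# Each node carries a short feature list; real OPF features are richer.
x = {
    "bus": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "generator": [[2.0, 2.0]],
    "load": [[-1.0, 0.5]],
}

# Edges keyed by (source type, relation, destination type); relation names are illustrative.
edges = {
    ("bus", "ac_line", "bus"): [(0, 1), (1, 0), (1, 2), (2, 1)],
    ("bus", "transformer", "bus"): [(0, 2), (2, 0)],
    ("generator", "generator_link", "bus"): [(0, 0)],
    ("load", "load_link", "bus"): [(0, 2)],
}


def hetero_sage_step(x, edges, rel_weight, self_weight=1.0):
    """One relation-specific mean-aggregation step (simplified HeteroSAGE-style update).

    For every destination node: scaled self features plus, for each relation,
    that relation's weight times the mean of incoming source features.
    Scalars stand in for per-relation weight matrices; no nonlinearity.
    """
    out = {t: [[self_weight * v for v in feat] for feat in feats] for t, feats in x.items()}
    for rel, pairs in edges.items():
        src_type, _, dst_type = rel
        sums, counts = {}, defaultdict(int)
        for s, d in pairs:
            feat = x[src_type][s]
            prev = sums.get(d, [0.0] * len(feat))
            sums[d] = [a + b for a, b in zip(prev, feat)]
            counts[d] += 1
        for d, total in sums.items():
            mean = [v / counts[d] for v in total]
            out[dst_type][d] = [o + rel_weight[rel] * m for o, m in zip(out[dst_type][d], mean)]
    return out
```

Calling `hetero_sage_step(x, edges, {rel: 1.0 for rel in edges})` updates only the bus features in this toy graph, because every relation points into buses; adding reverse bus-to-device relations would let generators and loads receive messages as well. Keeping a separate weight per relation is what distinguishes this from a flattened, single-type GNN.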

If this is right

Partial or head-only fine-tuning of the pretrained model reduces the data volume and compute required to reach target accuracy on new OPF surrogate tasks.
Pretraining across multiple grid topologies stabilizes the training trajectory and shortens the number of epochs needed for convergence on downstream feasibility and contingency problems.
The DeepHyper campaign on Frontier identifies compact models (~1.6–1.7M parameters) as those with the lowest validation losses among the configurations explored.
The workflow scales preprocessing and training to grids containing more than thirteen thousand buses while maintaining the heterogeneous structure.
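The difference between full, partial, and head-only fine-tuning comes down to which parameter groups receive gradient updates. A minimal sketch, assuming a model described only as an ordered list of named layer groups with the task head last; the layer names and the `n_unfrozen` parameter are hypothetical. In PyTorch the same choice is usually expressed by setting `requires_grad = False` on the frozen parameters.

```python
def trainable_mask(layer_names, regime, n_unfrozen=1):
    """Return {layer_name: trainable?} for a fine-tuning regime.

    layer_names is ordered from input to output with the task head last.
    "full": update every layer; "head": update only the head;
    "partial": update the head plus the last n_unfrozen backbone layers.
    """
    head, backbone = layer_names[-1], layer_names[:-1]
    if regime == "full":
        unfrozen = set(layer_names)
    elif regime == "head":
        unfrozen = {head}
    elif regime == "partial":
        start = max(0, len(backbone) - n_unfrozen)
        unfrozen = set(backbone[start:]) | {head}
    else:
        raise ValueError(f"unknown regime: {regime!r}")
    return {name: name in unfrozen for name in layer_names}


def trainable_fraction(param_counts, mask):
    """Fraction of parameters that receive gradient updates under a mask."""
    total = sum(param_counts.values())
    trainable = sum(n for name, n in param_counts.items() if mask[name])
    return trainable / total
```

A smaller trainable fraction typically means fewer gradients and optimizer states to store and update, which is one concrete sense in which head-only and partial fine-tuning can lower adaptation cost.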

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the representation learned on the ten cases captures enough common structure, the same foundation model could be adapted to grid instances never seen during pretraining without full retraining.
The approach suggests a path toward reusable surrogates that could be updated incrementally as new sensor data or topology changes arrive in operational smart-grid settings.
Extending the heterogeneous typing to additional device classes (for example, renewable inverters or storage) would require only modest changes to the same workflow.

Load-bearing premise

The three million graph instances drawn from only ten PGLib-OPF cases are representative enough that fine-tuning benefits transfer to arbitrary real-world grids and operating conditions.

What would settle it

Apply the same pretraining-plus-fine-tuning protocol to a power-grid case that lies outside the original ten PGLib instances and measure whether the pretrained model still shows accuracy, stability, or convergence gains over random initialization in the low-data regime.
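One way to set up that test is a leave-case-out split: every instance from the held-out case is excluded from pretraining, and the downstream labeled budget is then subsampled into nested low-data fractions. A minimal sketch, assuming instances are available as `(case_name, sample)` pairs; the function names are hypothetical and the PGLib case names in any usage are illustrative only.

```python
import random


def leave_cases_out(instances, held_out_cases):
    """Split (case_name, sample) pairs so held-out cases never reach pretraining."""
    held = set(held_out_cases)
    pretrain = [r for r in instances if r[0] not in held]
    evaluation = [r for r in instances if r[0] in held]
    return pretrain, evaluation


def low_data_subsets(records, fractions, seed=0):
    """Nested labeled subsets: each smaller fraction is a prefix of one shuffle."""
    order = list(records)
    random.Random(seed).shuffle(order)
    return {f: order[:max(1, round(f * len(order)))] for f in sorted(fractions)}
```

Keeping the subsets nested means each larger budget contains the smaller one, so differences between pretrained and randomly initialized models across budgets reflect added data rather than a different random draw.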

Figures

Figures reproduced from arXiv: 2605.23194 by Kibaek Kim, Massimiliano Lupo Pasini, Teja Kuruganti, Yijiang Li.

**Figure 2.** Validation loss vs. number of trainable parameters per …

**Figure 3.** FT1 feasibility classification: validation BCE loss vs. training epoch for HeteroSAGE (top row) and HeteroHEAT …

**Figure 4.** FT2 N−1 contingency regression: validation MSE loss versus training epoch for HeteroSAGE and HeteroHEAT across multiple labeled-data regimes.

…and computational cost. Full fine-tuning is less reliable because unconstrained updates to all layers can overwrite useful pretrained representations, consistent with catastrophic forgetting [23], and should be reserved for cases where the downstream dataset is lar…
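The two downstream tasks are tracked with different validation losses: binary cross-entropy (BCE) for FT1 feasibility classification and mean squared error (MSE) for FT2 N−1 contingency regression. A minimal sketch of both, assuming the classifier outputs a probability of feasibility with label 1 meaning feasible, and that the regression targets are scalar values; the paper's exact label convention and regression targets are not given here.

```python
import math


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy; labels are 1 (feasible) or 0 (infeasible) by assumption."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def mse_loss(preds, targets):
    """Mean squared error over scalar regression targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
```

Because BCE and MSE live on different scales, curves like those in Figures 3 and 4 are comparable within a task across initializations and data regimes, not across the two tasks.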

Original abstract

Fast and reliable optimal power flow (OPF) approximation is essential for reliable smart-grid operation, yet many learning-based surrogates either flatten the native heterogeneous structure of power networks, target a limited set of grid topologies, or lack scalable infrastructure for graph foundation model (GFM) training. This paper presents a scalable heterogeneous graph neural network (GNN) workflow, built on HydraGNN, for data-driven OPF surrogate modeling and OPF-GFM development. The workflow preserves the distinct node and edge types of power grids (buses, generators, loads, shunts, AC lines, transformers, and device-to-bus couplings) and supports distributed preprocessing, training, hyperparameter optimization (HPO), and downstream fine-tuning on leadership-class supercomputers. Using three million heterogeneous graph instances spanning ten PGLib-OPF cases, from 14 to 13,659 buses, we conduct DeepHyper-driven HPO on the ORNL Frontier supercomputer. The campaign identifies compact models (~1.6–1.7M parameters) with the lowest validation losses. Downstream experiments on feasibility classification and N-1 contingency regression show that fine-tuning the pretrained OPF GFM improves low-data accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost when partial or head-only fine-tuning is used.
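The N-1 contingency regression task presupposes a set of single-outage scenarios derived from each base grid. The sketch below is a minimal illustration, assuming that contingencies are single-branch outages on an undirected edge list with buses numbered from 0: it enumerates each N-1 case and flags outages that island part of the grid using a breadth-first connectivity check. Whether the paper also considers generator outages, how it treats islanding cases, and which quantity it regresses per contingency are not stated in this summary, so all three are assumptions here.

```python
from collections import defaultdict, deque


def is_connected(n_buses, branches):
    """Breadth-first search from bus 0; True if every bus is reachable."""
    adj = defaultdict(list)
    for a, b in branches:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n_buses


def n_minus_1_cases(n_buses, branches):
    """Yield (outaged_index, remaining_branches, stays_connected) for each single-branch outage."""
    for k in range(len(branches)):
        remaining = branches[:k] + branches[k + 1:]
        yield k, remaining, is_connected(n_buses, remaining)
```

On a triangle of buses 0, 1, 2 with a radial branch from bus 2 to bus 3, losing any triangle branch keeps the grid connected, while losing the radial branch islands bus 3.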

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Scalable heterogeneous GNN pretraining for OPF works at supercomputer scale but transferability is untested beyond ten PGLib cases.


This paper shows how to train heterogeneous graph models for optimal power flow at scale on a supercomputer, but the benefits of pretraining are hard to assess without numbers, and the data come from only ten base cases.

The authors build on HydraGNN to keep buses, generators, loads, lines, and so on as distinct types instead of flattening everything. They generate three million graphs from ten PGLib-OPF cases ranging up to 13,659 buses, run DeepHyper hyperparameter search on Frontier, and then study fine-tuning for feasibility classification and N-1 regression. The workflow includes distributed preprocessing and supports partial or head-only fine-tuning to cut adaptation cost.

The engineering part is solid. Running that volume of heterogeneous graphs through HPO on leadership hardware is not trivial, and they end up with models of about 1.6–1.7 million parameters. Preserving the native types is a clear improvement over approaches that lose the structure.

The main limitation is the narrow set of source cases. All pretraining data come from those ten topologies, and the abstract gives no sign that the test sets include grids outside them. That makes it difficult to know whether the reported gains in accuracy, stability, and convergence come from learning transferable features or simply from seeing similar data during pretraining. The lack of any quantitative metrics, baselines, or error bars in the abstract also leaves the performance claims unverified for now.

This work is aimed at people building machine learning surrogates for power system control, especially those who need to handle large or varying grids. A reader interested in graph foundation models for engineering domains would get value from the scaling details. It deserves a serious referee to verify the experimental setup and check whether the authors ran any cross-topology tests. I would send it to review.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a scalable heterogeneous graph neural network workflow based on HydraGNN for developing graph foundation models (GFMs) for data-driven optimal power flow (OPF) approximation. It generates three million heterogeneous graph instances from ten PGLib-OPF cases (14–13,659 buses), performs DeepHyper-driven hyperparameter optimization on the ORNL Frontier supercomputer to identify compact models (~1.6–1.7M parameters), and reports that fine-tuning the resulting pretrained OPF GFM improves low-data accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost on downstream feasibility classification and N-1 contingency regression tasks when using partial or head-only fine-tuning.

Significance. If the fine-tuning benefits hold under broader evaluation, the work could advance scalable, structure-preserving surrogates for OPF that exploit large-scale pretraining on heterogeneous power-grid graphs. The distributed preprocessing/training infrastructure and leadership-class HPO campaign are concrete strengths that address scalability barriers in the field.

major comments (2)

[Abstract, experiments paragraph] The claim that fine-tuning improves accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost supplies no quantitative metrics, baseline comparisons, error bars, or data-split details, so the magnitude and reliability of the reported gains cannot be assessed.
[Abstract, experiments paragraph] All three million pretraining instances are drawn from only ten PGLib-OPF base cases; the downstream experiments give no indication of evaluation on held-out topologies, an eleventh PGLib case, or real utility data. This leaves the transferability required for the foundation-model claim untested.

minor comments (1)

[Abstract] The bus-count range "14 to 13,659 buses" should explicitly list the ten cases or clarify whether these are the exact sizes used.
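The request for error bars usually amounts to repeating each fine-tuning regime over several random seeds and reporting the spread of the final validation loss. A minimal sketch, assuming per-seed final losses are collected in a dict keyed by a hypothetical regime name; it reports the mean and sample standard deviation with Python's standard `statistics` module.

```python
import statistics


def summarize_seeds(losses_by_regime):
    """Map regime name -> (mean, sample std) over per-seed final validation losses.

    Requires at least two seeds per regime, since the sample standard
    deviation is undefined for a single run.
    """
    summary = {}
    for regime, losses in losses_by_regime.items():
        if len(losses) < 2:
            raise ValueError(f"{regime}: need at least two seeds for an error bar")
        summary[regime] = (statistics.mean(losses), statistics.stdev(losses))
    return summary
```

Reporting the seed count alongside each (mean, std) pair lets a reader judge whether a gap between pretrained and randomly initialized runs exceeds run-to-run noise.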

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and experimental claims. We address each major comment below and will revise the manuscript to improve clarity and evidence presentation.


Referee: [Abstract, experiments paragraph] The claim that fine-tuning improves accuracy, stabilizes training, accelerates convergence, and reduces adaptation cost supplies no quantitative metrics, baseline comparisons, error bars, or data-split details, so the magnitude and reliability of the reported gains cannot be assessed.

Authors: We agree that the abstract would benefit from quantitative support. The body of the manuscript reports detailed results including baselines, error bars, and data splits for the fine-tuning experiments. In the revision we will augment the abstract with representative quantitative metrics (e.g., accuracy deltas, convergence iterations, and adaptation-cost reductions) drawn from those sections. revision: yes
Referee: [Abstract, experiments paragraph] All three million pretraining instances are drawn from only ten PGLib-OPF base cases; the downstream experiments give no indication of evaluation on held-out topologies, an eleventh PGLib case, or real utility data. This leaves the transferability required for the foundation-model claim untested.

Authors: The ten PGLib-OPF cases were deliberately chosen to cover a wide range of bus counts (14–13,659) and structural characteristics, allowing the pretraining to expose the model to substantial topological diversity. Downstream fine-tuning results are reported across these varied instances. We acknowledge that explicit evaluation on entirely held-out topologies would provide stronger support for the foundation-model transferability claim. We will revise the manuscript to state this scope limitation explicitly and, where feasible, add results on an eleventh case or note it as future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical results on external benchmarks


The paper presents a data-driven workflow for training heterogeneous GNNs on three million graph instances generated from ten PGLib-OPF cases, followed by empirical fine-tuning experiments measuring accuracy, convergence, and adaptation cost on feasibility classification and N-1 regression tasks. No equations, self-definitions, or self-citation chains reduce any reported prediction or benefit to quantities defined by the same fitted parameters. All central claims rest on measured outcomes against external benchmark data rather than internal tautologies or renamed fits.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Central claims rest on the domain assumption that heterogeneous graph structure adds predictive value over flattened representations and that pretraining across the ten chosen PGLib cases produces transferable features; model architecture choices and loss functions are free parameters selected by HPO but not enumerated.

free parameters (2)

GNN layer counts, hidden dimensions, and attention heads
Determined by DeepHyper HPO; specific values not reported in abstract
Training set size and topology sampling strategy
Fixed at three million instances from ten PGLib cases; choice affects claimed generalization

axioms (2)

Domain assumption: Preserving distinct node and edge types (buses, generators, AC lines, transformers, etc.) improves surrogate accuracy for OPF.
Invoked to justify the heterogeneous GNN design over flattening approaches
Domain assumption: Pretraining on diverse PGLib topologies yields features that stabilize and accelerate fine-tuning on downstream tasks.
Load-bearing premise for the reported fine-tuning benefits

pith-pipeline@v0.9.0 · 5779 in / 1633 out tokens · 43085 ms · 2026-05-25T05:25:59.433864+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "Using three million heterogeneous graph instances spanning ten PGLib-OPF cases... fine-tuning pretrained OPF GFM improves low-data accuracy..."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "HeteroSAGE... relation-specific message passing... variable edge-attribute dimensions"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 7 canonical work pages · 3 internal anchors

[1] M. B. Cain, R. P. O’Neill, and A. Castillo, “History of optimal power flow and formulations,” Federal Energy Regulatory Commission (FERC), Tech. Rep., 2012.

[2] S. Frank, I. Steponavice, and S. Rebennack, “Optimal power flow: A bibliographic survey I – formulations and deterministic methods,” Energy Systems, vol. 3, no. 3, pp. 221–258, 2012.

[3] D. Owerko, F. Gama, and A. Ribeiro, “Optimal power flow using graph neural networks,” arXiv preprint arXiv:1910.09658, 2019.

[4] S. Liu, C. Wu, and H. Zhu, “Topology-aware graph neural networks for learning feasible and adaptive ac-opf solutions,” IEEE Transactions on Power Systems, 2023.

[5] Z. Guo, K. Sun, B. Park, S. Simunovic, and W. Kang, “A directed acyclic graph neural network for ac optimal power flow,” in 2023 IEEE Power & Energy Society General Meeting (PESGM), 2023.

[6] A. Deihim, D. Apostolopoulou, and E. Alonso, “Initial estimate of ac optimal power flow with graph neural networks,” Electric Power Systems Research, vol. 234, p. 110782, 2024.

[7] F. Fioretto, T. W. K. Mak, and P. Van Hentenryck, “Physics-informed neural networks for ac optimal power flow,” Electric Power Systems Research, vol. 212, p. 108412, 2022.

[8] S. Ghamizi, A. Ma, J. Cao, and P. Rodriguez Cortes, “OPF-HGNN: Generalizable heterogeneous graph neural networks for ac optimal power flow,” in 2024 IEEE Power & Energy Society General Meeting (PESGM), 2024.

[9] A. Trigui, M. Olama, G. Siopsis, H. Eldakhakhni, and M. Salhi, “Graph-based attention mechanisms for solving the ac optimal power flow problem in electrical power networks,” in 2025 57th North American Power Symposium (NAPS), 2025.

[10] A. Wen, B. Wen, J. Li, and J. Xu, “Heterogeneous graph neural network with local and global message passing for ac-optimal power flow solutions,” Applied System Innovation, vol. 9, no. 1, p. 18, 2026.

[11] Y. Li, Z. Memon, H. Jin, S. Fenu, K. Song, S. B. Sharma, P. Gasana, H. Kim, L. Zhao, and K. Kim, “LUMINA: Foundation models for topology transferable ACOPF,” in International Conference on Learning Representations (ICLR), 2026, arXiv:2603.04300. [Online]. Available: https://arxiv.org/abs/2603.04300

[12] H. Jin, K. Song, Z. Memon, Y. Li, S. Fenu, H. Kim, L. Zhao, and K. Kim, “LUMINA: A grid foundation model for benchmarking AC optimal power flow surrogate learning,” arXiv preprint arXiv:2605.02133, 2026. [Online]. Available: https://arxiv.org/abs/2605.02133

[13] Z. Memon, Y. Li, H. Jin, K. Kim, and L. Zhao, “Towards systematic generalization for power grid optimization problems,” arXiv preprint arXiv:2605.02026, 2026. [Online]. Available: https://arxiv.org/abs/2605.02026

[14] M. Lupo Pasini, S. T. Reeve, P. Zhang, and J. Y. Choi, “HydraGNN,” Distributed PyTorch implementation of multi-headed graph convolutional neural networks, United States, Oct. 2021. [Online]. Available: https://www.osti.gov/biblio/code-65891

[15] W. L. Hamilton, R. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in Neural Information Processing Systems, 2017.

[16] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in International Conference on Learning Representations, 2018.

[17] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” in International Conference on Learning Representations, 2022.

[18] G. Corso, L. Cavalleri, D. Beaini, P. Liò, and P. Veličković, “Principal neighbourhood aggregation for graph nets,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 13260–13271.

[19] Z. Hu, Y. Dong, K. Wang, and Y. Sun, “Heterogeneous graph transformer,” in Proceedings of The Web Conference 2020, 2020, pp. 2704–2710.

[20] X. Mo, Y. Xing, and C. Lv, “Heterogeneous edge-enhanced graph attention network for multi-agent trajectory prediction,” arXiv preprint arXiv:2106.07161, 2021.

[21] T. Lovett, A. Buovich, A. Sharma, S. Pegg, S. Cohen, S. Stephens, A. Tucker, P. Pope, J. Eiselen, F. Buchaca, C. Sutton, J. Mantilla-Bilbao, T. Roeder, Y. Lin, E. Bridgett-Tomkinson, J. Garratt, J. Patterson, S. Lyons, A. Hales, and V. Petar, “OPFData: Large-scale datasets for machine learning-accelerated ac optimal power flow,” arXiv preprint arXiv:240...

[22] S. Babaeinejadsarookolaee, A. Birchfield, R. D. Christie, C. Coffrin, C. DeMarco, R. Diao, M. Ferris, S. Fliscounakis, S. Greene, C. Josz, R. Korab, B. Lesieutre, J. Maeght, D. K. Molzahn, T. J. Overbye, P. Panciatici, B. Park, J. Snodgrass, A. Tbaileh, and R. D. Zimmerman, “The power grid library for benchmarking ac optimal power flow algorithms,” arXiv p...

[23] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming catastrophic forgetting in neural networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017.
