arxiv: 2605.04289 · v1 · submitted 2026-05-05 · 📡 eess.SY · cs.SY

Building Power Grid Models from Open Data: A Complete Pipeline from OpenStreetMap to Optimal Power Flow

Pith reviewed 2026-05-08 17:33 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords transmission grid modeling · OpenStreetMap · optimal power flow · open data · AC-OPF · power systems · EIA data · grid topology

The pith

A pipeline constructs complete, OPF-solvable transmission grid models for each of the 48 contiguous US states using only open data from OpenStreetMap, the EIA, and the US Census.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that realistic transmission network models can be assembled without access to restricted critical-infrastructure data. It does this by chaining five automated steps: pulling power assets from OpenStreetMap, rebuilding the bus-branch structure, filling in electrical values from public tables, spreading hourly demand according to population, and running both DC and AC optimal power flow. A sympathetic reader would care because the method removes the main barrier to reproducible power-systems studies. Validation covers all 48 contiguous states: 42 single-state models solve AC-OPF at the strictest relaxation level at peak hour and 44 off-peak, while median dispatch cost ($22/MWh) and median losses (1.0%) are consistent with observed wholesale-market outcomes.

Core claim

The paper claims that its five-stage pipeline produces complete models for all 48 contiguous US states and six multi-state regions. The stages are: extracting infrastructure via a local Overpass API instance; reconstructing topology through voltage inference and transformer detection; estimating parameters from EIA-calibrated voltage-class tables; allocating demand using Census population as a proxy; and solving OPF in PowerModels.jl with automatic relaxation. Of the 48 single-state models, 42 converge at the strictest AC-OPF relaxation level at peak hour and 44 off-peak, with median dispatch costs of $22 per MWh and median losses of 1.0 percent, both consistent with real market outcomes. All 54 models are released publicly.
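The percentages quoted alongside these counts in the abstract (88% and 92%) follow directly from the counts themselves. A quick arithmetic check, using only the numbers stated above:

```python
# Convergence counts reported for the 48 single-state models.
TOTAL_STATES = 48
converged = {"peak": 42, "off_peak": 44}


def convergence_rate(n_converged: int, n_total: int) -> float:
    """Return the share of converged models as a percentage."""
    return 100.0 * n_converged / n_total


rates = {hour: convergence_rate(n, TOTAL_STATES) for hour, n in converged.items()}
for hour, rate in rates.items():
    print(f"{hour}: {rate:.1f}%")
```

Put the other way round, six single-state models at peak and four off-peak needed a looser relaxation level or did not converge at the strictest one.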

What carries the argument

The five-stage open-data pipeline that extracts OSM infrastructure, reconstructs topology with voltage rules and transformer detection, populates parameters from EIA-calibrated lookup tables, distributes demand by population, and solves relaxed DC and AC optimal power flow.

If this is right

  • All 48 single-state and six multi-state models are released as open data for any researcher to use.
  • The progressive relaxation strategy allows the same pipeline to produce usable solutions even when input data contain moderate errors.
  • Dispatch costs and losses produced by the models fall in the same range as observed wholesale-market outcomes.
  • The approach works at both single-state and full-interconnection scales, including the 21,697-bus Eastern Interconnection.
  • Every step relies exclusively on publicly available sources, removing dependence on proprietary grid data.
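The progressive relaxation idea in the second bullet can be sketched as a loop that tries the strictest setting first and loosens only on failure. This is a minimal illustration, not the paper's implementation: the `solve` callback, the slack values, and the notion of a single scalar slack are all assumptions, since this summary does not say which constraints are loosened or by how much.

```python
from typing import Callable, Optional, Sequence, Tuple


def solve_with_progressive_relaxation(
    solve: Callable[[float], bool],
    slack_levels: Sequence[float],
) -> Optional[Tuple[int, float]]:
    """Try each relaxation level in order, strictest first.

    `solve(slack)` is a hypothetical callback that returns True when the
    OPF converges with constraints loosened by `slack`.
    Returns (level_index, slack) for the first success, or None if no
    level converges.
    """
    for level, slack in enumerate(slack_levels):
        if solve(slack):
            return level, slack
    return None


# Toy stand-in for a solver: this "model" only converges once limits
# are loosened by at least 10%. Purely illustrative.
def toy_solver(slack: float) -> bool:
    return slack >= 0.10


result = solve_with_progressive_relaxation(toy_solver, [0.0, 0.05, 0.10, 0.25])
print(result)
```

Recording the level index alongside the solution matters for interpretation: a model that converges only at a loose level is a weaker piece of evidence than one that converges at level 0.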

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The released models could serve as standard test cases for comparing new optimal-power-flow algorithms or renewable-integration studies.
  • The same extraction and estimation steps could be applied to other countries that publish open mapping and census data.
  • Periodic re-runs of the pipeline would let the models track infrastructure changes visible in OpenStreetMap over time.

Load-bearing premise

Electrical parameters estimated via voltage-class lookup tables calibrated on EIA plant data, combined with population-based demand allocation and voltage inference rules, produce sufficiently accurate models for OPF convergence and realistic cost and loss outputs.
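To make the premise concrete, here is a minimal sketch of what a voltage-class lookup followed by per-unit conversion might look like. The per-kilometre values, the 100 MVA base, and all names are placeholders chosen for illustration; they are not the paper's calibrated tables, which this summary does not reproduce. Only the base-impedance formula, Z_base = V² / S_base, is standard.

```python
from typing import Tuple

# Hypothetical per-km series parameters by voltage class (ohm/km).
# Placeholder values for illustration only; NOT the paper's tables.
LINE_PARAMS_OHM_PER_KM = {
    138: {"r": 0.10, "x": 0.50},
    230: {"r": 0.06, "x": 0.45},
    345: {"r": 0.04, "x": 0.37},
}

S_BASE_MVA = 100.0  # assumed system base power


def line_impedance_pu(voltage_kv: int, length_km: float) -> Tuple[float, float]:
    """Look up per-km values for the voltage class and convert to per unit."""
    params = LINE_PARAMS_OHM_PER_KM[voltage_kv]
    z_base_ohm = voltage_kv ** 2 / S_BASE_MVA  # Z_base = kV^2 / MVA
    r_pu = params["r"] * length_km / z_base_ohm
    x_pu = params["x"] * length_km / z_base_ohm
    return r_pu, x_pu


r_pu, x_pu = line_impedance_pu(345, 50.0)
print(f"r = {r_pu:.5f} pu, x = {x_pu:.5f} pu")
```

Because every line in a voltage class shares the same per-km values, any bias in a table entry shifts all lines of that class together, which is why the premise is load-bearing.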

What would settle it

An independent comparison of one or more pipeline-generated state models against a known real transmission network would show whether the solved OPF costs, losses, and line flows fall within a few percent of actual recorded values.
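Such a comparison could be scored with a simple relative-error check per quantity. The sketch below is an assumption about how one might do it: the function names are hypothetical, the 5% tolerance stands in for "a few percent", and the recorded values are invented placeholders, not real measurements.

```python
def relative_error_pct(modeled: float, recorded: float) -> float:
    """Absolute relative error of a modeled value, in percent."""
    return 100.0 * abs(modeled - recorded) / abs(recorded)


def within_tolerance(modeled: dict, recorded: dict, tol_pct: float) -> dict:
    """Flag, per quantity, whether the model is within tol_pct of the record."""
    return {
        key: relative_error_pct(modeled[key], recorded[key]) <= tol_pct
        for key in recorded
    }


# Modeled values reuse the medians quoted above; the "recorded" values
# are made-up placeholders purely to exercise the function.
modeled = {"cost_usd_per_mwh": 22.0, "loss_pct": 1.0}
recorded = {"cost_usd_per_mwh": 21.5, "loss_pct": 2.0}

checks = within_tolerance(modeled, recorded, tol_pct=5.0)
print(checks)
```

A per-quantity result is more informative than a single pass/fail, because aggregate cost can match well even when losses or individual line flows do not.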

Figures

Figures reproduced from arXiv: 2605.04289 by Andrea Britto, Baosen Zhang, Chris White, Spencer Fowers, Thiago Spina, Weiwei Yang.

Figure 1. Voltage inference for Virginia: 3,878 OSM-tagged lines (green), 60 inferred by neighbor …
Figure 2. Transmission filter (≥69 kV) for Virginia: 3,735 segments retained, colored by voltage class (69 kV blue through 765 kV brown).
    Text captured with this caption (Section 3.3.4, Circuit Count Parsing): A single OSM way may carry multiple electrical circuits: for example, a double-circuit tower supports two independent three-phase circuits on the same structure, and a multi-voltage corridor may carry circuits at 345 kV and 138 kV side by side. Correct…
Figure 3. Circuit classification for Virginia: 875 inter-facility circuits (green solid), 441 single …
Figure 4. Final bus-branch model for Virginia (largest connected component): 661 buses colored …
Figure 5. Generator parameters for Virginia: 65 generators colored by fuel type, sized proportional …
Figure 6. Capacity and fuel mix for Virginia: 65 generators totaling 9,273 MW nameplate, sized …
Figure 7. BA detection for Virginia at 4 PM: PJM serves 650 buses (6.1% of BA capacity …
Figure 8. Final load distribution for Virginia at 4 PM: 9,299 MW allocated to 661 buses (sized …
Figure 9. AC-OPF economic dispatch for Virginia at 4 PM: 9,392 MW total generation across …
Figure 10. AC-OPF line congestion for Virginia at 4 PM: 1,263 branches colored by loading ratio …
Figure 11. DC-OPF vs. AC-OPF generation cost ($/hr) for all 48 contiguous states at 4 PM (peak).
Figure 12. AC-OPF cost premium over DC-OPF (%) for each state. The average premium across …
Figure 13. States ranked by DC-OPF dispatch cost ($/MWh, left panel) alongside their installed …
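The Figure 2 caption mentions a ≥69 kV transmission filter and multi-voltage corridors such as 345 kV and 138 kV on one OSM way. In OpenStreetMap tagging, the `voltage` key is given in volts and multiple values are separated by semicolons, so a filter of this kind might be sketched as below. The function names are hypothetical and the handling of malformed tags is an assumption, not the paper's rule set.

```python
from typing import List

TRANSMISSION_MIN_KV = 69.0  # threshold quoted in the Figure 2 caption


def parse_voltage_tag(tag: str) -> List[float]:
    """Parse an OSM `voltage` tag (volts, ';'-separated) into kV values."""
    kv = []
    for part in tag.split(";"):
        part = part.strip()
        if part.isdigit():  # skip empty or non-numeric pieces
            kv.append(int(part) / 1000.0)
    return kv


def transmission_voltages(tag: str) -> List[float]:
    """Keep only the voltage levels at or above the transmission threshold."""
    return [v for v in parse_voltage_tag(tag) if v >= TRANSMISSION_MIN_KV]


print(parse_voltage_tag("345000;138000"))
print(transmission_voltages("138000;34500"))
```

A way whose tag yields more than one transmission-level voltage is a candidate multi-voltage corridor and would need to be split into separate circuits before building the bus-branch model.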
Original abstract

Access to realistic transmission grid models is essential for power systems research, yet detailed network data in the United States remains restricted under critical-infrastructure regulations. We present a pipeline that constructs complete, OPF-solvable transmission network models entirely from publicly available data. The five-stage pipeline (1) extracts power infrastructure from OpenStreetMap via a local Overpass API instance, (2) reconstructs bus-branch topology through voltage inference, line merging, and transformer detection, (3) estimates electrical parameters using voltage-class lookup tables calibrated with U.S. Energy Information Administration (EIA) plant-level data, (4) allocates hourly demand from EIA-930 to individual buses using US Census population as a spatial proxy, and (5) solves both DC and AC optimal power flow using PowerModels.jl with a progressive relaxation strategy that automatically loosens constraints on imprecise models. We validate the pipeline on all 48 contiguous US states and six multi-state regions, including the full Western (5,076 buses) and Eastern (21,697 buses) Interconnections. Of the 48 single-state models, 42 (88%) converge at the strictest relaxation level for AC-OPF at peak hour and 44 (92%) off-peak. Dispatch costs (median $22/MWh) and system losses (median 1.0%) are consistent with real wholesale-market outcomes. The pipeline relies exclusively on open data sources, enabling reproducible grid analysis without proprietary data. All 54 models (48 single-state and 6 multi-state) are publicly released at https://github.com/microsoft/GridSFM.
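Stage (1) of the abstract queries an Overpass API instance for power infrastructure. As a rough sketch of what such a request could contain, the helper below builds an Overpass QL query string for one state. The choice of `power=*` values, the ISO 3166-2 area lookup, and the function name are assumptions; the abstract does not list the exact tags or query the authors use, and the sketch only builds the text without sending it anywhere.

```python
def build_overpass_query(iso_code: str, timeout_s: int = 300) -> str:
    """Build an Overpass QL query for power infrastructure in one state.

    `iso_code` is an ISO 3166-2 code such as "US-VA". The selected tags
    are an illustrative guess, not the paper's extraction rules.
    """
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'area["ISO3166-2"="{iso_code}"]->.state;\n'
        "(\n"
        '  way["power"="line"](area.state);\n'
        '  nwr["power"="substation"](area.state);\n'
        '  nwr["power"="plant"](area.state);\n'
        '  nwr["power"="generator"](area.state);\n'
        ");\n"
        "out body geom;\n"
    )


query = build_overpass_query("US-VA")
print(query)
```

Running state-sized queries like this against a local instance, as the abstract describes, avoids the rate limits and timeouts of public Overpass servers.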

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a five-stage pipeline to build complete, OPF-solvable transmission grid models for the 48 contiguous US states and six multi-state regions entirely from open data (OpenStreetMap via Overpass, EIA-930, and US Census). The stages extract infrastructure, reconstruct bus-branch topology via voltage inference and line merging, estimate parameters with voltage-class lookup tables calibrated on EIA plant data, allocate hourly demand via population proxies, and solve DC/AC OPF in PowerModels.jl using progressive relaxation. Validation reports 88-92% convergence at the strictest AC-OPF relaxation level across single-state models, with median dispatch costs of $22/MWh and losses of 1.0% consistent with real markets; all 54 models are released publicly on GitHub.

Significance. If the models prove sufficiently accurate, the work would enable large-scale, fully reproducible power-systems research without proprietary data, covering full interconnections up to the 21,697-bus Eastern Interconnection. The exclusive use of open sources, public model release, and integration with PowerModels.jl are clear strengths that support broader adoption. The scale and automation are notable, but the validation is indirect (convergence plus median statistics), which limits immediate significance for studies requiring high-fidelity local flows or voltages until direct fidelity metrics are added.

major comments (2)
  1. [Validation results (48 single-state models)] Validation of the 48 single-state models: the reported 88% (peak) and 92% (off-peak) convergence at the strictest relaxation level, together with median cost/loss statistics, provides only indirect support for the central claim of sufficient accuracy for OPF studies. Convergence can occur even when topology, line parameters, or bus-level loads are systematically biased, especially given the progressive relaxation that loosens constraints on imprecise models; no direct numerical comparison of estimated parameters or loads against independent real-grid measurements is described.
  2. [Pipeline stage 3] Stage 3 (electrical parameter estimation): the voltage-class lookup tables are calibrated solely on EIA plant-level data, yet the manuscript provides neither the explicit calibration factors nor any sensitivity analysis or error bounds on how these approximations propagate into OPF solutions and convergence rates. This is load-bearing for the accuracy claim because the tables directly determine impedances and admittances used in all subsequent OPF solves.
minor comments (2)
  1. [Abstract and results] The abstract and results would benefit from an explicit statement of the limitations of population-based demand allocation and voltage inference rules, including any known failure modes for rural or high-voltage-only areas.
  2. [Abstract] The GitHub repository link should be accompanied by a permanent archive (e.g., Zenodo DOI) to ensure long-term reproducibility of the released models.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback. We respond to each major comment below and have revised the manuscript to improve documentation and expand discussion of limitations.

point-by-point responses
  1. Referee: Validation of the 48 single-state models: the reported 88% (peak) and 92% (off-peak) convergence at the strictest relaxation level, together with median cost/loss statistics, provides only indirect support for the central claim of sufficient accuracy for OPF studies. Convergence can occur even when topology, line parameters, or bus-level loads are systematically biased, especially given the progressive relaxation that loosens constraints on imprecise models; no direct numerical comparison of estimated parameters or loads against independent real-grid measurements is described.

    Authors: We agree that convergence under progressive relaxation provides only indirect evidence and does not rule out systematic biases. Direct numerical comparisons to real-grid measurements are not possible, as detailed US transmission data are restricted. In the revised manuscript we have added aggregate validation against public EIA statistics (total capacity, generation mix) and an explicit limitations subsection discussing potential biases in topology and load allocation. These changes strengthen the presentation while preserving the paper's focus on open-data reproducibility. revision: partial

  2. Referee: Stage 3 (electrical parameter estimation): the voltage-class lookup tables are calibrated solely on EIA plant-level data, yet the manuscript provides neither the explicit calibration factors nor any sensitivity analysis or error bounds on how these approximations propagate into OPF solutions and convergence rates. This is load-bearing for the accuracy claim because the tables directly determine impedances and admittances used in all subsequent OPF solves.

    Authors: We thank the referee for noting this gap. The revised manuscript now includes the full voltage-class lookup tables, the calibration procedure using EIA plant data, and a sensitivity study that perturbs impedances within literature-derived ranges. The study reports resulting changes in AC-OPF convergence rates and objective values, confirming that errors remain bounded and consistent with the observed 88-92% success rates. revision: yes
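To illustrate the kind of impedance perturbation the second response describes, here is a toy sketch, not the authors' study. It uses the standard DC power-flow result that two parallel lines share flow in inverse proportion to their reactances, then perturbs both reactances by up to ±20%. The reactance values, the error range, and the function names are all assumptions made for the example.

```python
import random


def flow_share_line1(x1: float, x2: float) -> float:
    """DC power-flow share carried by line 1 of two parallel lines.

    Under the DC approximation, flow divides in inverse proportion
    to series reactance.
    """
    return x2 / (x1 + x2)


def perturbation_range(x1, x2, rel_err, n_samples=1000, seed=0):
    """Sample independent relative errors on both reactances and
    return the (min, max) flow share observed on line 1."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_samples):
        p1 = x1 * (1.0 + rng.uniform(-rel_err, rel_err))
        p2 = x2 * (1.0 + rng.uniform(-rel_err, rel_err))
        shares.append(flow_share_line1(p1, p2))
    return min(shares), max(shares)


nominal = flow_share_line1(0.02, 0.03)  # placeholder per-unit reactances
lo, hi = perturbation_range(0.02, 0.03, rel_err=0.20)
print(f"nominal share {nominal:.3f}, sampled range [{lo:.3f}, {hi:.3f}]")
```

Even in this two-line case a ±20% reactance error can move the flow share between roughly 0.50 and 0.69, which shows why aggregate cost and loss statistics can look reasonable while individual line flows remain uncertain.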

standing simulated objections not resolved
  • Direct numerical comparison of estimated parameters or loads against independent real-grid measurements, as such proprietary data are not publicly available.

Circularity Check

0 steps flagged

No circularity: constructive pipeline from external open data with computed outputs

full rationale

The paper describes a five-stage pipeline that extracts infrastructure from OSM, reconstructs topology, estimates parameters from voltage-class tables calibrated on EIA plant data, allocates demand via census proxies, and solves OPF. All inputs are external public datasets. The reported convergence statistics, median dispatch costs, and losses are direct outputs of the PowerModels.jl solver applied to the constructed models; they are not used to define, fit, or calibrate any stage of the pipeline. No self-citations, self-definitional equations, fitted-input predictions, or ansatzes appear in the derivation chain. The approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The pipeline depends on several domain assumptions for topology inference and parameter estimation plus calibration steps that introduce fitted values; no new physical entities are postulated.

free parameters (1)
  • voltage-class lookup table calibration factors
    Electrical parameters (resistance, reactance, capacity) are estimated from voltage-class tables that are calibrated against EIA plant-level data.
axioms (2)
  • domain assumption: Population density is a valid spatial proxy for allocating EIA-930 hourly demand to individual buses
    Demand allocation step uses US Census population as the spatial weighting factor.
  • domain assumption: Voltage inference, line merging, and transformer detection rules from OSM tags produce a correct bus-branch topology
    Stage 2 of the pipeline relies on these reconstruction heuristics.
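The first axiom amounts to proportional allocation: each bus receives a share of total demand equal to its share of population. A minimal sketch follows, assuming population has already been assigned to buses (how that assignment is done is not described in this summary). The bus names and population figures are hypothetical; the 9,299 MW total is the Virginia 4 PM figure from the Figure 8 caption.

```python
def allocate_demand(total_mw: float, population_by_bus: dict) -> dict:
    """Split total demand across buses in proportion to population."""
    total_pop = sum(population_by_bus.values())
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return {
        bus: total_mw * pop / total_pop
        for bus, pop in population_by_bus.items()
    }


# Hypothetical bus populations; only the MW total comes from the source.
loads = allocate_demand(
    9299.0,
    {"bus_a": 50_000, "bus_b": 30_000, "bus_c": 20_000},
)
for bus, mw in loads.items():
    print(f"{bus}: {mw:.1f} MW")
```

The allocation conserves total demand by construction, but it cannot represent large industrial loads in sparsely populated areas, which is one way this assumption could fail.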
