Building Power Grid Models from Open Data: A Complete Pipeline from OpenStreetMap to Optimal Power Flow
Pith reviewed 2026-05-08 17:33 UTC · model grok-4.3
The pith
A pipeline constructs complete, OPF-solvable transmission grid models for the 48 contiguous US states and six multi-state regions using only open data from OpenStreetMap, the EIA, and the US Census.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a five-stage pipeline produces complete models for the 48 contiguous US states and six multi-state regions. The stages are: extract infrastructure through a local Overpass API instance; reconstruct topology through voltage inference, line merging, and transformer detection; estimate electrical parameters from EIA-calibrated voltage-class lookup tables; allocate EIA-930 hourly demand to buses using Census population as a spatial proxy; and solve OPF in PowerModels.jl with an automatically applied progressive relaxation. Of the 48 single-state models, 42 (88%) converge at the strictest AC-OPF relaxation level at peak hour and 44 (92%) off-peak. Median dispatch cost is $22/MWh and median losses are 1.0 percent, both reported as consistent with real wholesale-market outcomes. All 54 models are released publicly.
What carries the argument
The five-stage open-data pipeline that extracts OSM infrastructure, reconstructs topology with voltage rules and transformer detection, populates parameters from EIA-calibrated lookup tables, distributes demand by population, and solves DC and AC optimal power flow under progressive relaxation.
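To make the first stage concrete, here is a minimal sketch of how power infrastructure for one state could be requested from a local Overpass instance. It is not the authors' code. The endpoint URL, the function names, and the set of `power=*` values are assumptions chosen for illustration; only the Overpass QL syntax itself is standard.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint of a locally hosted Overpass instance (assumption).
OVERPASS_URL = "http://localhost:12345/api/interpreter"

# OSM `power=*` values to request; the paper's exact tag set is not given here.
POWER_VALUES = ("line", "cable", "substation", "plant", "generator", "transformer")


def build_query(iso_code: str, timeout_s: int = 600) -> str:
    """Return an Overpass QL query for power features inside one region.

    `iso_code` is an ISO 3166-2 code such as "US-VT".
    """
    selectors = "\n".join(
        f'  nwr["power"="{value}"](area.region);' for value in POWER_VALUES
    )
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'area["ISO3166-2"="{iso_code}"]->.region;\n'
        f"(\n{selectors}\n);\n"
        # Emit matched elements, then recurse down to fetch way/relation nodes.
        "out body;\n>;\nout skel qt;"
    )


def fetch(iso_code: str, url: str = OVERPASS_URL) -> dict:
    """POST the query to the Overpass interpreter and decode the JSON reply."""
    payload = urllib.parse.urlencode({"data": build_query(iso_code)}).encode()
    with urllib.request.urlopen(url, data=payload) as response:
        return json.load(response)
```

Calling `build_query("US-VT")` only builds the query text, so it can be inspected without a running server; `fetch` needs a reachable Overpass endpoint.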
If this is right
- All 48 single-state and six multi-state models are released as open data for any researcher to use.
- The progressive relaxation strategy allows the same pipeline to produce usable solutions even when input data contain moderate errors.
- Dispatch costs and losses produced by the models fall in the same range as observed wholesale-market outcomes.
- The approach works at both single-state and full-interconnection scales, including the 21,697-bus Eastern Interconnection.
- Every step relies exclusively on publicly available sources, removing dependence on proprietary grid data.
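The progressive relaxation idea in the list above can be sketched as a ladder of constraint settings tried from strictest to loosest. The level names, the two knobs (thermal-limit scaling and voltage band), and their values are hypothetical; the paper's actual levels are not stated on this page. The solver is passed in as a callable so the sketch does not assume any particular OPF library.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class RelaxationLevel:
    """One rung of the ladder; both knobs are illustrative assumptions."""
    name: str
    thermal_limit_scale: float  # multiplier applied to branch MVA ratings
    voltage_band_pu: float      # allowed deviation from 1.0 p.u.


# Hypothetical ladder, strictest first.
LADDER = (
    RelaxationLevel("strict", 1.0, 0.05),
    RelaxationLevel("moderate", 1.5, 0.10),
    RelaxationLevel("loose", 3.0, 0.20),
)

# A solver returns a solution dict on convergence and None otherwise.
Solver = Callable[[dict, RelaxationLevel], Optional[dict]]


def solve_with_relaxation(
    model: dict, solver: Solver, ladder: Sequence[RelaxationLevel] = LADDER
) -> Tuple[Optional[str], Optional[dict]]:
    """Try each level in order and return the first one that converges."""
    for level in ladder:
        solution = solver(model, level)
        if solution is not None:
            return level.name, solution
    return None, None  # no level converged
```

Recording the level at which each model first converges is what makes a statistic such as "converged at the strictest level" reportable.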
Where Pith is reading between the lines
- The released models could serve as standard test cases for comparing new optimal-power-flow algorithms or renewable-integration studies.
- The same extraction and estimation steps could be applied to other countries that publish open mapping and census data.
- Periodic re-runs of the pipeline would let the models track infrastructure changes visible in OpenStreetMap over time.
Load-bearing premise
Electrical parameters estimated via voltage-class lookup tables calibrated on EIA plant data, combined with population-based demand allocation and voltage inference rules, produce sufficiently accurate models for OPF convergence and realistic cost and loss outputs.
What would settle it
An independent comparison of one or more pipeline-generated state models against a known real transmission network would show whether the solved OPF costs, losses, and line flows fall within a few percent of actual recorded values.
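A comparison of this kind reduces to relative errors between modeled and recorded quantities. The sketch below assumes both are available as dictionaries keyed by quantity name; the function names and the 5 percent default tolerance are illustrative choices, not values from the paper.

```python
def relative_errors(modeled: dict, observed: dict) -> dict:
    """Return |modeled - observed| / |observed| for every observed quantity."""
    errors = {}
    for key, reference in observed.items():
        if reference == 0:
            raise ValueError(f"reference value for {key!r} is zero")
        errors[key] = abs(modeled[key] - reference) / abs(reference)
    return errors


def within_tolerance(modeled: dict, observed: dict, tol: float = 0.05) -> bool:
    """True when every quantity is within `tol` (a fraction) of its reference."""
    return all(err <= tol for err in relative_errors(modeled, observed).values())
```

The same helper applies to system totals (cost, losses) and to individual line flows, where a per-line breakdown would expose local errors that medians hide.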
Original abstract
Access to realistic transmission grid models is essential for power systems research, yet detailed network data in the United States remains restricted under critical-infrastructure regulations. We present a pipeline that constructs complete, OPF-solvable transmission network models entirely from publicly available data. The five-stage pipeline (1) extracts power infrastructure from OpenStreetMap via a local Overpass API instance, (2) reconstructs bus-branch topology through voltage inference, line merging, and transformer detection, (3) estimates electrical parameters using voltage-class lookup tables calibrated with U.S. Energy Information Administration (EIA) plant-level data, (4) allocates hourly demand from EIA-930 to individual buses using US Census population as a spatial proxy, and (5) solves both DC and AC optimal power flow using PowerModels.jl with a progressive relaxation strategy that automatically loosens constraints on imprecise models. We validate the pipeline on all 48 contiguous US states and six multi-state regions, including the full Western (5,076 buses) and Eastern (21,697 buses) Interconnections. Of the 48 single-state models, 42 (88%) converge at the strictest relaxation level for AC-OPF at peak hour and 44 (92%) off-peak. Dispatch costs (median $22/MWh) and system losses (median 1.0%) are consistent with real wholesale-market outcomes. The pipeline relies exclusively on open data sources, enabling reproducible grid analysis without proprietary data. All 54 models (48 single-state and 6 multi-state) are publicly released at https://github.com/microsoft/GridSFM.
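Stage 4 of the abstract, population-weighted demand allocation, can be illustrated with a small sketch. It assumes each census unit is assigned to its nearest bus by great-circle distance and that regional demand is split in proportion to the assigned population. The nearest-bus rule and all names here are assumptions; the paper may use a different spatial join.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def allocate_demand(total_mw: float, buses: dict, tracts: list) -> dict:
    """Split `total_mw` across buses in proportion to nearby population.

    buses:  {bus_id: (lat, lon)}
    tracts: [(lat, lon, population), ...]
    """
    pop_by_bus = {bus_id: 0.0 for bus_id in buses}
    for lat, lon, population in tracts:
        # Assign the whole tract to its nearest bus (illustrative assumption).
        nearest = min(buses, key=lambda b: haversine_km(lat, lon, *buses[b]))
        pop_by_bus[nearest] += population
    total_pop = sum(pop_by_bus.values())
    if total_pop <= 0:
        raise ValueError("no population assigned to any bus")
    return {b: total_mw * p / total_pop for b, p in pop_by_bus.items()}
```

By construction the bus loads sum to the regional total, so the allocation changes where demand sits but not how much there is. That is why aggregate statistics alone cannot validate it.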
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a five-stage pipeline to build complete, OPF-solvable transmission grid models for the 48 contiguous US states and six multi-state regions entirely from open data (OpenStreetMap via Overpass, EIA-930, and US Census). The stages extract infrastructure, reconstruct bus-branch topology via voltage inference and line merging, estimate parameters with voltage-class lookup tables calibrated on EIA plant data, allocate hourly demand via population proxies, and solve DC/AC OPF in PowerModels.jl using progressive relaxation. Validation reports 88-92% convergence at the strictest AC-OPF relaxation level across single-state models, with median dispatch costs of $22/MWh and losses of 1.0% consistent with real markets; all 54 models are released publicly on GitHub.
Significance. If the models prove sufficiently accurate, the work would enable large-scale, fully reproducible power-systems research without proprietary data, covering full interconnections of up to 21,697 buses. The exclusive use of open sources, public model release, and integration with PowerModels.jl are clear strengths that support broader adoption. The scale and automation are notable, but the validation is indirect (convergence plus median statistics), which limits immediate significance for studies requiring high-fidelity local flows or voltages until direct fidelity metrics are added.
major comments (2)
- [Validation results (48 single-state models)] Validation of the 48 single-state models: the reported 88% (peak) and 92% (off-peak) convergence at the strictest relaxation level, together with median cost/loss statistics, provides only indirect support for the central claim of sufficient accuracy for OPF studies. Convergence can occur even when topology, line parameters, or bus-level loads are systematically biased, especially given the progressive relaxation that loosens constraints on imprecise models; no direct numerical comparison of estimated parameters or loads against independent real-grid measurements is described.
- [Pipeline stage 3] Stage 3 (electrical parameter estimation): the voltage-class lookup tables are calibrated solely on EIA plant-level data, yet the manuscript provides neither the explicit calibration factors nor any sensitivity analysis or error bounds on how these approximations propagate into OPF solutions and convergence rates. This is load-bearing for the accuracy claim because the tables directly determine impedances and admittances used in all subsequent OPF solves.
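The reason the lookup tables are load-bearing is visible in the per-unit conversion that turns per-kilometre values into branch parameters. The sketch below uses the standard base impedance Z_base = V^2 / S_base (V in kV, S_base in MVA, result in ohms). The table layout, field names, and the numbers in the usage example are placeholders for illustration, not the paper's calibrated values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineClass:
    """Per-kilometre parameters for one voltage class (hypothetical layout)."""
    r_ohm_per_km: float
    x_ohm_per_km: float
    b_us_per_km: float  # shunt susceptance in microsiemens per km


def branch_per_unit(
    voltage_kv: int, length_km: float, table: dict, base_mva: float = 100.0
) -> tuple:
    """Return (r, x, b) in per unit for a line of the given class and length."""
    line_class = table[voltage_kv]
    z_base = voltage_kv ** 2 / base_mva  # ohms
    r_pu = line_class.r_ohm_per_km * length_km / z_base
    x_pu = line_class.x_ohm_per_km * length_km / z_base
    # Admittance scales inversely to impedance, so multiply by z_base.
    b_pu = line_class.b_us_per_km * 1e-6 * length_km * z_base
    return r_pu, x_pu, b_pu
```

For example, with a placeholder table `{230: LineClass(0.05, 0.5, 3.0)}`, a 100 km line at 230 kV has a base impedance of 529 ohms. Any error in the table entry scales linearly into every branch of that voltage class, which is the propagation the referee asks the authors to bound.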
minor comments (2)
- [Abstract and results] The abstract and results would benefit from an explicit statement of the limitations of population-based demand allocation and voltage inference rules, including any known failure modes for rural or high-voltage-only areas.
- [Abstract] The GitHub repository link should be accompanied by a permanent archive (e.g., Zenodo DOI) to ensure long-term reproducibility of the released models.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We respond to each major comment below and have revised the manuscript to improve documentation and expand discussion of limitations.
- Referee: Validation of the 48 single-state models: the reported 88% (peak) and 92% (off-peak) convergence at the strictest relaxation level, together with median cost/loss statistics, provides only indirect support for the central claim of sufficient accuracy for OPF studies. Convergence can occur even when topology, line parameters, or bus-level loads are systematically biased, especially given the progressive relaxation that loosens constraints on imprecise models; no direct numerical comparison of estimated parameters or loads against independent real-grid measurements is described.
  Authors: We agree that convergence under progressive relaxation provides only indirect evidence and does not rule out systematic biases. Direct numerical comparisons to real-grid measurements are not possible, as detailed US transmission data are restricted. In the revised manuscript we have added aggregate validation against public EIA statistics (total capacity, generation mix) and an explicit limitations subsection discussing potential biases in topology and load allocation. These changes strengthen the presentation while preserving the paper's focus on open-data reproducibility.
  Revision: partial
- Referee: Stage 3 (electrical parameter estimation): the voltage-class lookup tables are calibrated solely on EIA plant-level data, yet the manuscript provides neither the explicit calibration factors nor any sensitivity analysis or error bounds on how these approximations propagate into OPF solutions and convergence rates. This is load-bearing for the accuracy claim because the tables directly determine impedances and admittances used in all subsequent OPF solves.
  Authors: We thank the referee for noting this gap. The revised manuscript now includes the full voltage-class lookup tables, the calibration procedure using EIA plant data, and a sensitivity study that perturbs impedances within literature-derived ranges. The study reports resulting changes in AC-OPF convergence rates and objective values, confirming that errors remain bounded and consistent with the observed 88-92% success rates.
  Revision: yes
- Left unresolved: direct numerical comparison of estimated parameters or loads against independent real-grid measurements, because such data are not publicly available.
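The sensitivity study described in the authors' response can be pictured as a small Monte Carlo loop: perturb branch impedances within a relative range, re-solve, and count convergences. This sketch assumes branches are dictionaries with `r` and `x` entries and that the solver is a callable returning a truthy value on convergence. The uniform distribution and the shared scaling of `r` and `x` are illustrative choices.

```python
import random
from typing import Callable, List


def perturb_branches(branches: List[dict], rel_range: float, seed: int) -> List[dict]:
    """Scale each branch's r and x by a factor drawn from 1 +/- rel_range."""
    rng = random.Random(seed)  # seeded for reproducible trials
    perturbed = []
    for branch in branches:
        factor = 1.0 + rng.uniform(-rel_range, rel_range)
        # Copy the branch so the input model is left untouched.
        perturbed.append({**branch, "r": branch["r"] * factor, "x": branch["x"] * factor})
    return perturbed


def convergence_rate(
    branches: List[dict],
    solve: Callable[[List[dict]], bool],
    rel_range: float,
    trials: int,
    seed: int = 0,
) -> float:
    """Fraction of perturbed models for which `solve` reports convergence."""
    converged = 0
    for trial in range(trials):
        converged += bool(solve(perturb_branches(branches, rel_range, seed + trial)))
    return converged / trials
```

Scaling `r` and `x` by the same factor keeps each line's X/R ratio fixed; perturbing them independently would be a stricter test of the lookup tables.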
Circularity Check
No circularity: constructive pipeline from external open data with computed outputs
The paper describes a five-stage pipeline that extracts infrastructure from OSM, reconstructs topology, estimates parameters from voltage-class tables calibrated on EIA plant data, allocates demand via census proxies, and solves OPF. All inputs are external public datasets. The reported convergence statistics, median dispatch costs, and losses are direct outputs of the PowerModels.jl solver applied to the constructed models; they are not used to define, fit, or calibrate any stage of the pipeline. No self-citations, self-definitional equations, fitted-input predictions, or ansatzes appear in the derivation chain. The approach is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- voltage-class lookup table calibration factors
axioms (2)
- domain assumption: Population density is a valid spatial proxy for allocating EIA-930 hourly demand to individual buses.
- domain assumption: Voltage inference, line merging, and transformer detection rules from OSM tags produce a correct bus-branch topology.
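The second assumption rests on reading OSM tags reliably. OSM records line voltage in the `voltage` tag as volts, with multiple circuits separated by semicolons. The sketch below parses that tag and flags a substation as needing a transformer when the lines meeting there span at least two voltage levels. The detection rule and function names are assumptions for illustration; the paper's actual rules are not given on this page.

```python
import math
from typing import Iterable, List, Optional


def parse_voltage_kv(tag: Optional[str]) -> List[float]:
    """Parse an OSM `voltage` tag into distinct kV levels, highest first."""
    if not tag:
        return []
    levels = set()
    for part in tag.split(";"):
        try:
            volts = float(part.strip())
        except ValueError:
            continue  # skip free-text values such as "medium"
        if math.isfinite(volts) and volts > 0:
            levels.add(volts / 1000.0)
    return sorted(levels, reverse=True)


def needs_transformer(line_voltage_tags: Iterable[Optional[str]]) -> bool:
    """True when lines meeting at one substation carry two or more levels."""
    levels = set()
    for tag in line_voltage_tags:
        levels.update(parse_voltage_kv(tag))
    return len(levels) >= 2
```

Lines with a missing or unparseable tag contribute no level here, which is exactly where a voltage inference rule would have to step in.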