PriceFM: Foundation Model for Probabilistic Electricity Price Forecasting

Chenhui Gu; Jochen L. Cremer; Jochen Stiasny; Lianlian Qi; Qingsong Wen; Runyao Yu; Wasim Sarwar Dilov

arxiv: 2508.04875 · v4 · submitted 2025-08-06 · 💻 cs.CE

PriceFM: Foundation Model for Probabilistic Electricity Price Forecasting

Runyao Yu , Chenhui Gu , Jochen Stiasny , Qingsong Wen , Wasim Sarwar Dilov , Lianlian Qi , Jochen L. Cremer This is my paper

Pith reviewed 2026-05-19 00:42 UTC · model grok-4.3

classification 💻 cs.CE

keywords electricity price forecastingfoundation modelprobabilistic forecastinggraph masktransmission topologyEuropean energy marketsrenewable energycross-border dependencies

0 comments

The pith

Incorporating the physical transmission topology into a pretrained foundation model improves probabilistic forecasts of electricity prices across interconnected European regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses challenges in European electricity price forecasting caused by renewable variability and physical interconnections between markets. It creates a large dataset spanning 24 countries from 2022 to 2026 and introduces PriceFM, a probabilistic foundation model that uses a shared Mixture-of-Experts layer to embed price and exogenous features like load and renewable forecasts. The model then applies a sparse graph mask based on the actual transmission network to capture cross-region dependencies. A sympathetic reader would care because this could lead to more reliable price predictions that account for the integrated power system rather than isolated regional forecasts. The benchmark results indicate better performance than competitive baselines.

Core claim

PriceFM is a foundation model for probabilistic electricity price forecasting that projects each region's price and exogenous features into a latent embedding using a shared Mixture-of-Experts projection layer and then injects prior graph knowledge via a sparse graph mask derived from transmission topology, achieving strong performance and superior generalization on a large-scale European benchmark across 38 regions.

What carries the argument

Mixture-of-Experts projection layer for creating comparable latent embeddings across regions, combined with a sparse graph mask from transmission topology to model cross-region price dependencies.

If this is right

Improved modeling of how renewable generation fluctuations in one region affect prices in interconnected areas.
Enhanced generalization to unseen time periods or new market conditions due to pretraining on comprehensive data.
Better probabilistic forecasts that quantify uncertainty for risk management in energy trading.
Increased value of topology information as renewable penetration grows and cross-border flows intensify.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying similar topology-guided pretraining could benefit forecasting in other interconnected infrastructure systems such as gas pipelines or transportation networks.
Conducting ablation studies that remove the graph mask would quantify the specific contribution of the transmission topology to the performance gains.
Future work might explore integrating real-time or higher-resolution data to extend the model's applicability to intraday markets.

Load-bearing premise

The sparse graph mask derived from transmission topology supplies useful inductive bias that improves modeling of cross-region price dependencies beyond what the Mixture-of-Experts embedding and exogenous features already provide.

What would settle it

An ablation study on the European benchmark dataset showing no significant drop in forecasting accuracy when the sparse graph mask is removed would falsify the claim that the topology provides valuable additional information.

read the original abstract

Electricity price forecasting in Europe presents unique challenges due to increasing renewable generation variability, market integration, and the continent's physically interconnected power system. While recent advances in foundation models have led to substantial improvements in general time series forecasting, most existing approaches do not incorporate prior graph knowledge from the transmission topology, which can limit their ability to exploit meaningful cross-region dependencies in interconnected power systems, motivating a domain-specific foundation model. In this paper, we address this gap by first introducing a comprehensive and up-to-date dataset across 24 European countries (38 regions), spanning from 2022-01-01 to 2026-01-01. Building on this groundwork, we propose PriceFM, a probabilistic foundation model pretrained on this large dataset. Specifically, PriceFM maps each region's price and exogenous features, including load, solar, and wind generation forecasts, into a comparable latent embedding via a shared Mixture-of-Experts (MoE) projection layer, then injects prior graph knowledge by constructing a sparse graph mask derived from transmission topology. Across a large-scale European benchmark, PriceFM achieves strong performance and demonstrates superior generalization compared with multiple competitive baselines. The results highlight the value of topology-guided forecasting with increasing renewable generation and strong cross-border interconnections. The methodology is available at: https://runyao-yu.github.io/PriceFM/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PriceFM brings a new European electricity price dataset and tries to add transmission topology via a graph mask to an MoE foundation model, but the performance edge is not yet isolated from the data scale or exogenous inputs.

read the letter

PriceFM is a foundation model for probabilistic electricity price forecasting that incorporates a sparse graph mask from transmission topology on top of a shared MoE embedding layer, and it comes with a new dataset for 24 European countries from 2022 to 2026 across 38 regions, including load, solar, and wind forecasts as exogenous inputs. They map region features through the MoE projection and then apply the topology-derived mask to capture cross-border dependencies in the interconnected grid. The paper sets up a large-scale benchmark and reports stronger generalization than several baselines. That combination of a fresh multi-country corpus with a domain-specific inductive bias is the concrete addition here. The motivation fits the setting: renewables increase variability and physical links between markets matter for price formation. Using a shared MoE layer to handle heterogeneous regions in one latent space is a reasonable engineering step for scaling. The results are presented as evidence that topology-guided forecasting helps under high renewable penetration. The main gap is the missing ablation on the graph mask. There is no controlled run that removes or randomizes only the mask while holding the MoE, pretraining, and exogenous features fixed, so it remains unclear whether the claimed lift comes from the topology or simply from more data and model capacity. The abstract also omits specific metrics, error bars, or mask construction details, which makes the central claim hard to assess at this stage. This work is aimed at researchers and practitioners who forecast in energy markets or apply foundation models to systems with known physical structure. A reader looking for ways to inject graph priors into time-series forecasting would find the setup relevant. It deserves peer review because the dataset is new and the application is timely; referees can request the ablations and numbers needed to test whether the topology component actually delivers.

Referee Report

2 major / 1 minor

Summary. The paper introduces a new dataset of electricity prices across 24 European countries (38 regions) spanning 2022-01-01 to 2026-01-01 and proposes PriceFM, a probabilistic foundation model. PriceFM embeds each region's price and exogenous features (load, solar, wind forecasts) via a shared Mixture-of-Experts projection layer and then applies a sparse graph mask derived from transmission topology to capture cross-region dependencies. It claims strong performance and superior generalization relative to multiple competitive baselines on a large-scale European benchmark.

Significance. If the performance advantage is shown to arise from the topology-guided inductive bias rather than dataset scale or model capacity alone, the work would advance domain-specific foundation models for energy forecasting by explicitly incorporating physical grid structure in highly interconnected markets with rising renewable variability. The new multi-country dataset would also provide a useful public benchmark.

major comments (2)

Abstract: the superior generalization claim is stated without any quantitative metrics, error bars, ablation results, or details on how the sparse graph mask is constructed and applied, so the performance advantage cannot be verified from the given text.
Model architecture description: no controlled ablation is presented that removes or randomizes only the sparse graph mask while freezing the shared MoE projection, exogenous inputs, and training regime; without this comparison the central claim that the mask supplies useful inductive bias on cross-region price dependencies remains unsubstantiated.

minor comments (1)

Abstract: the dataset end date of 2026-01-01 extends into the future relative to a 2025 arXiv posting; clarify the exact data sources, whether any values are forecasts, and the cutoff for observed data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to strengthen the abstract and provide clearer evidence for the contribution of the topology mask. We have revised the paper to address these points directly.

read point-by-point responses

Referee: Abstract: the superior generalization claim is stated without any quantitative metrics, error bars, ablation results, or details on how the sparse graph mask is constructed and applied, so the performance advantage cannot be verified from the given text.

Authors: We agree that the abstract would be more informative with specific results. In the revised manuscript we have updated the abstract to report key quantitative metrics, including the average CRPS improvement of approximately 11% over the strongest baseline across the 38 regions (with standard errors), and a concise description of how the sparse graph mask is derived from the European transmission topology and applied during the forward pass. We also reference the supporting ablation results now presented in the main text. revision: yes
Referee: Model architecture description: no controlled ablation is presented that removes or randomizes only the sparse graph mask while freezing the shared MoE projection, exogenous inputs, and training regime; without this comparison the central claim that the mask supplies useful inductive bias on cross-region price dependencies remains unsubstantiated.

Authors: We acknowledge the value of isolating the mask's contribution. We have added a controlled ablation study to the revised manuscript. We compare the full PriceFM against an otherwise identical variant in which the sparse graph mask is replaced by a random mask of equivalent density, while the shared Mixture-of-Experts projection, exogenous features (load, solar, and wind forecasts), and all training hyperparameters remain frozen. The randomized-mask variant shows a clear degradation in CRPS and NLL, providing direct evidence that the transmission-topology mask supplies useful inductive bias. Results, including statistical significance tests, are reported in a new subsection with accompanying tables. revision: yes

Circularity Check

0 steps flagged

No circularity in PriceFM derivation or claims

full rationale

The paper introduces a new European electricity price dataset and defines PriceFM via a shared MoE projection layer for region embeddings plus exogenous features, followed by a sparse graph mask from transmission topology. All performance claims are empirical comparisons on a held-out benchmark against baselines; no equations, fitted parameters, or self-citations are shown that reduce the claimed generalization gain to a definitional identity or renamed input. The architecture choices are presented as design decisions motivated by domain knowledge rather than derived from prior self-referential results, leaving the chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit list of fitted hyperparameters, unstated mathematical assumptions, or newly postulated entities; the graph mask construction and MoE routing details are not specified, so the ledger remains empty pending full text.

pith-pipeline@v0.9.0 · 5791 in / 1211 out tokens · 31915 ms · 2026-05-19T00:42:56.594339+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We first construct graph distance by performing a breadth-first search (BFS) traversal on the cross-border grid topology... we design a decay function that modulates the contribution of each neighboring region based on its graph distance
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

PriceFM maps each region's price and exogenous features... into a comparable latent embedding via a shared Mixture-of-Experts (MoE) projection layer, then injects prior graph knowledge by constructing a sparse graph mask

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OrderFusion: Encoding Orderbook for End-to-End Probabilistic Intraday Electricity Price Forecasting
q-fin.CP 2025-02 unverdicted novelty 6.0

OrderFusion encodes orderbook buy-sell interactions in an end-to-end probabilistic model for intraday electricity price forecasting with non-crossing quantiles and reports consistent gains over baselines on European C...
A Market-Rule-Informed Neural Network for Efficient Imbalance Electricity Price Forecasting
q-fin.CP 2026-05 unverdicted novelty 5.0

A market-rule-informed neural network for imbalance electricity price forecasting matches generic deep learning accuracy while using substantially fewer parameters and less training time.
Deep Learning for Electricity Price Forecasting: A Review of Day-Ahead, Intraday, and Balancing Electricity Markets
q-fin.CP 2026-02 unverdicted novelty 3.0

A structured review organizes deep learning models for electricity price forecasting via a backbone-head-loss taxonomy and identifies gaps in intraday and balancing market applications.