Recognition: 2 theorem links
GRAFT: Grid-Aware Load Forecasting with Multi-Source Textual Alignment and Fusion
Pith reviewed 2026-05-16 21:41 UTC · model grok-4.3
The pith
GRAFT aligns daily news, social media and policy texts with half-hour electricity loads through cross-attention to improve grid forecasting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRAFT strictly aligns daily-aggregated news, social media and policy texts with half-hour load, realizes text-guided fusion to specific time positions via cross-attention during both training and rolling forecasting, and provides a plug-and-play external-memory interface, achieving significant gains over baselines and state-of-the-art performance on a released 2019-2021 benchmark for five Australian states at multiple time scales.
What carries the argument
Cross-attention fusion that maps daily text embeddings onto specific half-hour time slots inside the load forecasting sequence.
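The carrying mechanism can be sketched as plain cross-attention in which half-hour load states query daily text embeddings. A minimal numpy sketch follows; the function name `fuse_text`, the shapes, and the use of ordinary softmax (the paper itself quotes an α-EntMax read-out) are illustrative assumptions, not GRAFT's released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_text(load_h, text_e, Wq, Wk, Wv):
    """Cross-attention: half-hour load states query daily text embeddings."""
    Q = load_h @ Wq          # (T, d) queries, one per half-hour slot
    K = text_e @ Wk          # (N, d) keys, one per daily text embedding
    V = text_e @ Wv          # (N, d) values
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (T, N) slot-to-text weights
    return load_h + A @ V    # residual fusion of text into each time slot

T, N, d = 48, 3, 16          # 48 half-hour slots, 3 text sources, model width
load_h = rng.normal(size=(T, d))
text_e = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
fused = fuse_text(load_h, text_e, Wq, Wk, Wv)
```

Each row of the attention matrix tells a given half-hour slot how much to draw from each text source, which is what makes the "temporal localization" read-out possible.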
If this is right
- GRAFT significantly outperforms strong baselines and reaches or surpasses state-of-the-art results across multiple regions and forecasting horizons.
- The model remains robust in event-driven scenarios.
- Attention read-out enables temporal localization and source-level interpretation of text-to-load effects.
- The plug-and-play external-memory interface supports different information sources in real-world deployment.
Where Pith is reading between the lines
- Textual signals could help anticipate policy-driven demand shifts earlier than numerical indicators alone.
- The same alignment and fusion approach may extend to other time-series domains such as traffic or water-demand forecasting that also receive sudden textual inputs.
- Finer-grained timestamped texts, if available, could further tighten short-horizon accuracy without changing the core architecture.
Load-bearing premise
Daily-aggregated textual data can be reliably aligned to half-hour load intervals so that cross-attention extracts causally relevant signals rather than spurious correlations.
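Operationally, the premise amounts to broadcasting each day's aggregated texts to that day's 48 half-hour load slots. A minimal sketch of that alignment step, with a hypothetical function name and data layout:

```python
from datetime import date, datetime, timedelta

def align_daily_texts(daily_texts, start_day, n_days):
    """Broadcast each day's aggregated texts to its 48 half-hour load slots."""
    aligned = []
    for d in range(n_days):
        day = start_day + timedelta(days=d)
        texts = daily_texts.get(day, [])        # empty list if no text that day
        for slot in range(48):                  # 48 half-hour intervals per day
            ts = datetime.combine(day, datetime.min.time()) + timedelta(minutes=30 * slot)
            aligned.append((ts, texts))
    return aligned

pairs = align_daily_texts(
    {date(2019, 1, 1): ["policy: new tariff"], date(2019, 1, 2): ["news: heatwave"]},
    date(2019, 1, 1), 2)
```

Note what the broadcast cannot do: every slot of a day sees the same text, so any sub-daily timing signal must be inferred by the attention weights rather than supplied by the data.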
What would settle it
Removing the textual alignment and cross-attention components produces no improvement or a performance drop on the released Australian benchmark across the tested regions and horizons.
Original abstract
Electric load is simultaneously affected across multiple time scales by exogenous factors such as weather and calendar rhythms, sudden events, and policies. Therefore, this paper proposes GRAFT (GRid-Aware Forecasting with Text), which modifies and improves STanHOP to better support grid-aware forecasting and multi-source textual interventions. Specifically, GRAFT strictly aligns daily-aggregated news, social media, and policy texts with half-hour load, and realizes text-guided fusion to specific time positions via cross-attention during both training and rolling forecasting. In addition, GRAFT provides a plug-and-play external-memory interface to accommodate different information sources in real-world deployment. We construct and release a unified aligned benchmark covering 2019--2021 for five Australian states (half-hour load, daily-aligned weather/calendar variables, and three categories of external texts), and conduct systematic, reproducible evaluations at three scales -- hourly, daily, and monthly -- under a unified protocol for comparison across regions, external sources, and time scales. Experimental results show that GRAFT significantly outperforms strong baselines and reaches or surpasses the state of the art across multiple regions and forecasting horizons. Moreover, the model is robust in event-driven scenarios and enables temporal localization and source-level interpretation of text-to-load effects through attention read-out. We release the benchmark, preprocessing scripts, and forecasting results to facilitate standardized empirical evaluation and reproducibility in power grid load forecasting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GRAFT, an extension of STanHOP for grid-aware electric load forecasting. It strictly aligns daily-aggregated multi-source textual data (news, social media, policy) to half-hour load intervals and uses cross-attention for text-guided fusion during both training and rolling forecasts. The work releases a unified benchmark for five Australian states (2019-2021) covering half-hour load, weather/calendar variables, and texts, then evaluates at hourly, daily, and monthly scales under a unified protocol, claiming significant outperformance over strong baselines and reaching or surpassing SOTA, plus robustness in event-driven scenarios and interpretability via attention read-outs.
Significance. If the empirical claims hold under rigorous validation, the contribution lies in a reproducible, plug-and-play framework for fusing textual exogenous signals into load forecasting at multiple scales. The public release of the aligned benchmark, preprocessing scripts, and results is a clear strength that supports standardized evaluation in the field.
major comments (2)
- [Experimental results] The headline performance claim (reaching or surpassing SOTA across regions and horizons) depends on cross-attention extracting causally relevant signals rather than spurious correlations from daily text aligned to half-hour loads. With only three years (2019-2021) of data for five states, calendar/event patterns in the texts are likely to co-vary with load in dataset-specific ways; this assumption is load-bearing and requires explicit controls such as ablation on text sources, out-of-distribution testing, or temporal hold-out beyond the current corpus (see Experimental results and Methods sections).
- [Methods] The description of the cross-attention fusion and plug-and-play memory interface during rolling forecasts lacks sufficient implementation detail (e.g., exact alignment procedure, attention masking for future text, and how external memory is updated) to allow independent replication of the claimed gains; this directly affects verifiability of the multi-source fusion mechanism.
minor comments (2)
- [Abstract] Clarify in the abstract and §1 which exact baselines (e.g., specific STanHOP variants or other SOTA models) are used for the 'significantly outperforms' claim and report effect sizes or statistical significance tests.
- [Evaluation protocol] The unified protocol for multi-scale evaluation is a positive feature, but the paper should explicitly state how forecast horizons are defined at daily and monthly scales to avoid ambiguity in cross-region comparisons.
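One concrete form the requested significance testing could take is a Diebold–Mariano comparison (reference [35]) of squared forecast errors between GRAFT and a baseline. The sketch below assumes one-step-ahead forecasts (h = 1) and squared-error loss; it is illustrative, not the paper's protocol.

```python
import numpy as np

def diebold_mariano(e1, e2, h=1):
    """DM statistic for equal predictive accuracy under squared-error loss.

    e1, e2: forecast-error series from two competing models.
    Positive values indicate model 2 is more accurate than model 1.
    """
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2   # loss differential
    n = d.size
    dbar = d.mean()
    # long-run variance with h-1 autocovariance lags (Newey-West style)
    lrv = ((d - dbar) ** 2).mean()
    for lag in range(1, h):
        lrv += 2 * ((d[lag:] - dbar) * (d[:-lag] - dbar)).mean()
    return dbar / np.sqrt(lrv / n)

rng = np.random.default_rng(0)
errors_baseline = rng.normal(0, 2, 500)   # noisier model
errors_graft = rng.normal(0, 1, 500)      # more accurate model
dm = diebold_mariano(errors_baseline, errors_graft)
```

A |DM| above roughly 1.96 rejects equal accuracy at the 5% level under the statistic's asymptotic normality.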
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below, providing the strongest honest defense of the manuscript while agreeing to revisions that improve clarity and rigor where needed.
Point-by-point responses
-
Referee: [Experimental results] The headline performance claim (reaching or surpassing SOTA across regions and horizons) depends on cross-attention extracting causally relevant signals rather than spurious correlations from daily text aligned to half-hour loads. With only three years (2019-2021) of data for five states, calendar/event patterns in the texts are likely to co-vary with load in dataset-specific ways; this assumption is load-bearing and requires explicit controls such as ablation on text sources, out-of-distribution testing, or temporal hold-out beyond the current corpus (see Experimental results and Methods sections).
Authors: We appreciate the referee's emphasis on validating that performance gains stem from causally relevant textual signals. The manuscript already reports source-level ablations (news-only, social-media-only, policy-only, and full multi-source) demonstrating that combining sources yields consistent gains over any single source across regions and horizons. The evaluation uses a rolling-forecast protocol with strict temporal separation between training and test windows within the 2019-2021 corpus, which functions as an internal temporal hold-out. To further address the concern about dataset-specific co-variation, we will add an explicit out-of-distribution experiment in the revised Experimental results section: we will hold out specific high-impact event windows (e.g., major policy announcements and extreme weather periods identified in the texts) and quantify performance drop when those textual signals are removed or masked. This addition will be supported by the already-released benchmark and preprocessing scripts, allowing readers to extend the analysis. revision: yes
-
Referee: [Methods] The description of the cross-attention fusion and plug-and-play memory interface during rolling forecasts lacks sufficient implementation detail (e.g., exact alignment procedure, attention masking for future text, and how external memory is updated) to allow independent replication of the claimed gains; this directly affects verifiability of the multi-source fusion mechanism.
Authors: We agree that the current Methods description is insufficient for full independent replication. In the revised manuscript we will add a dedicated subsection titled 'Cross-Attention Fusion and External Memory Interface' that specifies: (1) the exact alignment procedure, which maps each daily-aggregated text document to all half-hour load intervals of the corresponding calendar day via timestamp matching; (2) the attention masking rule used at inference time, which applies a causal mask so that the model only attends to text available up to the forecast start time (preventing any future-text leakage); and (3) the update protocol for the plug-and-play external memory, including how new text embeddings are appended and how the memory is refreshed at each rolling step without retraining the core model. We will also include pseudocode and an expanded diagram of the memory interface. revision: yes
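The three commitments above (timestamp alignment, causal masking of future text, append-only memory updates at each rolling step) fit together in a few lines. This is a sketch with hypothetical class and method names, not the authors' released code.

```python
from datetime import date

class ExternalTextMemory:
    """Append-only text memory with a causal read-out for rolling forecasts."""

    def __init__(self):
        self._entries = []                      # (day, embedding) pairs

    def append(self, day, embedding):
        # called once per rolling step as new daily texts arrive,
        # without retraining the core model
        self._entries.append((day, embedding))

    def visible(self, forecast_start):
        # causal mask: only texts from days strictly before the forecast
        # start are readable, so no future-text leakage is possible
        return [e for d, e in self._entries if d < forecast_start]

mem = ExternalTextMemory()
mem.append(date(2021, 6, 1), [0.1, 0.2])
mem.append(date(2021, 6, 2), [0.3, 0.4])
mem.append(date(2021, 6, 3), [0.5, 0.6])   # same day as forecast start: masked
```

Making the mask strict (`<` rather than `<=`) is the conservative choice when a day's text aggregate is only complete at end of day.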
Circularity Check
Empirical performance claims rest on held-out evaluation; minor self-citation to base model not load-bearing
Full rationale
The paper's derivation consists of constructing an aligned benchmark dataset (2019-2021 Australian load + daily texts), modifying the STanHOP architecture with cross-attention fusion and external memory, then reporting MSE/MAE improvements on rolling forecasts across hourly/daily/monthly horizons. These results are measured against external baselines on held-out periods and do not reduce to any fitted parameter being renamed as a prediction or to a self-referential definition. The single citation to STanHOP is used only to describe the starting architecture; the claimed gains are independently verified by the new experiments and benchmark release. No uniqueness theorem, ansatz smuggling, or renaming of known results occurs.
Axiom & Free-Parameter Ledger
free parameters (1)
- cross-attention hyperparameters
axioms (2)
- domain assumption Textual data from news, social media, and policy documents contains information causally relevant to future electricity demand beyond what numerical weather and calendar variables already provide.
- domain assumption Daily aggregation of text can be meaningfully aligned to half-hour load intervals without losing critical timing information.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel — unclear
unclear: relation between the paper passage and the cited Recognition theorem.
GRAFT strictly aligns daily-aggregated news, social media, and policy texts with half-hour load, and realizes text-guided fusion to specific time positions via cross-attention during both training and rolling forecasting... Y_text_r,t = [e(news) W_news ; ... ] ... Z_text = α-EntMax(β Q K^⊤) V
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective — unclear
unclear: relation between the paper passage and the cited Recognition theorem.
Two-stage GSH... TimeGSH then SeriesGSH... PlugMemory(R, Y) = LN(R + GSH(R, Y))
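The quoted read-out Z_text = α-EntMax(β Q K^⊤) V reduces, for α = 2, to sparsemax attention (reference [30]). A minimal numpy sketch of that special case, with hypothetical function names; the paper's general α-entmax is not implemented here.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): the alpha = 2 case of alpha-entmax."""
    z = np.asarray(z, dtype=float)
    zs = np.sort(z)[::-1]                  # sort scores descending
    css = np.cumsum(zs)
    k = np.arange(1, z.size + 1)
    support = 1 + k * zs > css             # coordinates kept in the support
    k_star = k[support][-1]
    tau = (css[k_star - 1] - 1) / k_star   # threshold shifting mass to zero
    return np.maximum(z - tau, 0.0)

def sparse_attention(Q, K, V, beta=1.0):
    """Z = sparsemax(beta * Q K^T) V, applied row by row."""
    scores = beta * Q @ K.T
    A = np.apply_along_axis(sparsemax, 1, scores)
    return A @ V, A
```

Unlike softmax, sparsemax assigns exactly zero weight to weakly scoring text sources, which is what makes the source-level attention read-out interpretable.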
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Probabilistic electric load forecasting: A tutorial review
Hong T, Fan S. Probabilistic electric load forecasting: A tutorial review. Int J Forecast 2016;32:914–938. https://doi.org/10.1016/j.ijforecast.2015.11.011
-
[2]
A hybrid model based on data preprocessing for electrical power forecasting
Xiao L, Wang J, Yang X, Xiao L. A hybrid model based on data preprocessing for electrical power forecasting. Int J Electr Power Energy Syst 2015;64:311–327. https://doi.org/10.1016/j.ijepes.2014.07.029
-
[3]
Short-term load forecasting for the holidays using fuzzy linear regression method
Song K-B, Baek Y-S, Hong D-H, Jang G. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans Power Syst 2005;20(1):96–101. https://doi.org/10.1109/TPWRS.2004.835632
-
[4]
Review of power system load forecasting and its development directions (in Chinese)
Kang C, Xia Q, Zhang B. Review of power system load forecasting and its development directions (in Chinese). Autom Electr Power Syst 2004;28(17):1–11. DOI 10.3321/j.issn:1000-1026.2004.17.001. https://qikan.cqvip.com/Qikan/Article/Detail?id=10507343
-
[5]
On the enrichment of time series with textual data for forecasting agricultural commodity prices
Reis Filho IJ, Marcacini RM, Rezende SO. On the enrichment of time series with textual data for forecasting agricultural commodity prices. MethodsX 2022;9:101758. https://doi.org/10.1016/j.mex.2022.101758
-
[6]
Text-based crude oil price forecasting: A deep learning approach
Li X, Shang W, Wang S. Text-based crude oil price forecasting: A deep learning approach. Int J Forecast 2019;35:1548–1560. https://doi.org/10.1016/j.ijforecast.2018.10.004
-
[7]
Beyond trend and periodicity: Guiding time series forecasting with textual cues
Xu Z, Bian Y, Zhong J, Wen X, Xu Q. Beyond trend and periodicity: Guiding time series forecasting with textual cues. arXiv preprint arXiv:2405.13522; 2024. https://doi.org/10.48550/arXiv.2405.13522
-
[8]
From news to forecast: Integrating event analysis in LLM-based time series forecasting with reflection
Wang X, Feng M, Qiu J, Gu J, Zhao J. From news to forecast: Integrating event analysis in LLM-based time series forecasting with reflection. In: Adv Neural Inf Process Syst 37; 2024. https://doi.org/10.52202/079017-1853
-
[9]
Dual-Forecaster: A multimodal time series model integrating descriptive and predictive texts
Wu W, Zhang G, Tan Z, Wang Y, Qi H. Dual-Forecaster: A multimodal time series model integrating descriptive and predictive texts. arXiv preprint arXiv:2505.01135; 2025. https://doi.org/10.48550/arXiv.2505.01135
-
[10]
Context-aware probabilistic modeling with LLM for multimodal time series forecasting
Yao Y, Li J, Dai X, Zhang M, Gong X, Wang F-Y, Lv Y. Context-aware probabilistic modeling with LLM for multimodal time series forecasting. arXiv preprint arXiv:2505.10774; 2025. https://doi.org/10.48550/arXiv.2505.10774
-
[11]
Deep learning for time series forecasting: A survey
Torres JF, Hadjout D, Sebaa A, Martínez-Álvarez F, Troncoso A. Deep learning for time series forecasting: A survey. Big Data 2021;9(1):3–21. https://doi.org/10.1089/big.2020.0159
-
[12]
Deep learning for time series forecasting: A survey
Kong X, Chen Z, Liu W, Ning K, Zhang L, Marier SM, Liu Y, Chen Y, Xia F. Deep learning for time series forecasting: A survey. Int J Mach Learn Cybern 2025;16:5079–5112. https://doi.org/10.1007/s13042-025-02560-w
-
[13]
Deep learning framework to forecast electricity demand
Bedi J, Toshniwal D. Deep learning framework to forecast electricity demand. Appl Energy 2019;238:1312–1326. https://doi.org/10.1016/j.apenergy.2019.01.113
-
[14]
A multi-task learning method for multi-energy load forecasting based on synthesis correlation analysis and load participation factor
Tan M, Liao C, Chen J, Cao Y, Wang R, Su Y. A multi-task learning method for multi-energy load forecasting based on synthesis correlation analysis and load participation factor. Appl Energy 2023;343:121177. https://doi.org/10.1016/j.apenergy.2023.121177
-
[15]
Multi-energy load forecasting via hierarchical multi-task learning and spatiotemporal attention
Song C, Yang H, Cai J, Yang P, Bao H, Xu K, Meng X-B. Multi-energy load forecasting via hierarchical multi-task learning and spatiotemporal attention. Appl Energy 2024;373:123788. https://doi.org/10.1016/j.apenergy.2024.123788
-
[16]
A novel bidirectional mechanism based on time series model for wind power forecasting
Zhao Y, Ye L, Li Z, Song X, Lang Y, Su J. A novel bidirectional mechanism based on time series model for wind power forecasting. Appl Energy 2016;177:793–803. https://doi.org/10.1016/j.apenergy.2016.03.096
-
[17]
STanHop: Sparse tandem Hopfield model for memory-enhanced time series prediction
Wu D, Hu JYC, Li W, Chen BY, Liu H. STanHop: Sparse tandem Hopfield model for memory-enhanced time series prediction. arXiv preprint arXiv:2312.17346; 2023. https://arxiv.org/abs/2312.17346
-
[18]
Sentence-BERT: Sentence embeddings using Siamese BERT-networks
Reimers N, Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In: Proc 2019 Conf Empir Methods Nat Lang Process (EMNLP); 2019, p. 3982–3992. https://doi.org/10.18653/v1/D19-1410
-
[19]
NEM regional boundaries map [PDF]
Australian Energy Market Operator (AEMO). NEM regional boundaries map [PDF]. https://www.aemo.com.au/-/media/files/electricity/nem/planning_and_forecasting/maps/nem-regional-boundaries-map-web.pdf?la=en (accessed 23 Nov 2025)
-
[20]
Aggregated price and demand data – National Electricity Market (NEM) [dataset]
Australian Energy Market Operator (AEMO). Aggregated price and demand data – National Electricity Market (NEM) [dataset]. 2025. https://www.aemo.com.au/energy-systems/electricity/national-electricity-market-nem/data-nem/aggregated-data (accessed 23 Nov 2025)
-
[21]
Operational management of low demand in South Australia
Australian Energy Market Operator (AEMO). Operational management of low demand in South Australia. Melbourne: AEMO; 2020. https://www.aemo.com.au/-/media/files/electricity/nem/security_and_reliability/congestion-information/2020/operational-management-of-low-demand-in-south-australia.pdf (accessed 23 Nov 2025)
-
[22]
News releases — Australian Energy Regulator
Australian Energy Regulator. News releases — Australian Energy Regulator. 2025. https://www.aer.gov.au/news/articles/news-releases (accessed 17 Oct 2025)
-
[23]
Market notices for the National Electricity Market (NEM) [online]
Australian Energy Market Operator (AEMO). Market notices for the National Electricity Market (NEM) [online]. 2025. https://www.aemo.com.au/market-notices (accessed 23 Nov 2025)
-
[24]
Discussions on renewable energy and load forecasting
Reddit Energy Forum. Discussions on renewable energy and load forecasting. 2025. https://www.reddit.com/r/energy/ (accessed 17 Oct 2025)
-
[25]
Overview of energy market rules and policy updates
Australian Energy Market Commission. Overview of energy market rules and policy updates. 2025. https://www.aemc.gov.au/ (accessed 17 Oct 2025)
-
[26]
Energy strategies and frameworks — Australian Government
Department of Climate Change, Energy, the Environment and Water. Energy strategies and frameworks — Australian Government. 2025. https://www.dcceew.gov.au/energy/strategies-and-frameworks (accessed 17 Oct 2025)
-
[27]
Hopfield Networks is All You Need
Ramsauer H, Schäfl B, Lehner J, Seidl P, Widrich M, Adler T, et al. Hopfield networks is all you need. arXiv preprint arXiv:2008.02217; 2020. https://arxiv.org/abs/2008.02217
-
[28]
On sparse modern Hopfield model
Hu JYC, Yang D, Wu D, Xu C, Chen B-Y, Liu H. On sparse modern Hopfield model. In: Adv Neural Inf Process Syst 36; 2023. arXiv preprint arXiv:2309.12673; 2023. https://doi.org/10.48550/arXiv.2309.12673
-
[29]
Possible generalization of Boltzmann–Gibbs statistics
Tsallis C. Possible generalization of Boltzmann–Gibbs statistics. J Stat Phys 1988;52:479–487. https://doi.org/10.1007/BF01016429
-
[30]
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
Martins AFT, Astudillo RF. From softmax to sparsemax: A sparse model of attention and multi-label classification. In: Proc 33rd Int Conf Mach Learn (ICML); 2016, p. 1614–1623. arXiv:1602.02068
-
[31]
Sparse sequence-to-sequence models
Peters B, Niculae V, Martins AFT. Sparse sequence-to-sequence models. In: Proc 57th Annu Meet Assoc Comput Linguist (ACL); 2019, p. 1504–. https://doi.org/10.18653/v1/P19-1152
-
[33]
A sparse quantized Hopfield network for online continual associative memory
Alonso N, Brea J, Rajendran B. A sparse quantized Hopfield network for online continual associative memory. Nat Commun 2024;15:3427. https://doi.org/10.1038/s41467-024-46976-4
-
[34]
Another look at measures of forecast accuracy
Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. Int J Forecast 2006;22:679–688. https://doi.org/10.1016/j.ijforecast.2006.03.001
-
[35]
Comparing predictive accuracy
Diebold FX, Mariano RS. Comparing predictive accuracy. J Bus Econ Stat 1995;13:253–263. https://doi.org/10.1080/07350015.1995.10524599
-
[36]
Informer: Beyond efficient transformer for long sequence time-series forecasting
Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, Zhang W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc AAAI Conf Artif Intell 2021;35(12):11106–11115. https://doi.org/10.1609/aaai.v35i12.17325
-
[37]
TS2Vec: Towards universal representation of time series
Yue Z, Wang Y, Duan J, Yang T, Huang C, Tong Y, Xu B. TS2Vec: Towards universal representation of time series. Proc AAAI Conf Artif Intell 2022;36(8):8980–8987. https://doi.org/10.1609/aaai.v36i8.20881
-
[38]
PromptCast: A new prompt-based learning paradigm for time series forecasting
Xue H, Salim FD. PromptCast: A new prompt-based learning paradigm for time series forecasting. IEEE Trans Knowl Data Eng 2024;36(11):12523–12536. https://doi.org/10.1109/TKDE.2023.3342137