pith. machine review for the scientific record.

arxiv: 2512.14400 · v2 · submitted 2025-12-16 · 💻 cs.LG

Recognition: 2 Lean theorem links

GRAFT: Grid-Aware Load Forecasting with Multi-Source Textual Alignment and Fusion

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 21:41 UTC · model grok-4.3

classification 💻 cs.LG
keywords load forecasting · textual alignment · cross-attention · multi-source fusion · grid-aware forecasting · electricity demand · benchmark dataset · Australian power grid

The pith

GRAFT aligns daily news, social media and policy texts with half-hour electricity loads through cross-attention to improve grid forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GRAFT as a modification of an existing forecasting model (STanHop) that incorporates multi-source textual data to capture the effects of sudden events and policies on electricity demand. It enforces strict alignment between daily-aggregated texts and half-hour load intervals, then applies cross-attention to inject text signals at the correct time positions during both training and rolling prediction. A new benchmark dataset covering five Australian states from 2019 to 2021 is released, with aligned load, weather, calendar and text sources. Experiments show the approach outperforms strong baselines and reaches state-of-the-art results across hourly, daily and monthly horizons. The model also supports interpretation by reading out which texts influence which load periods.

Core claim

GRAFT strictly aligns daily-aggregated news, social media and policy texts with half-hour load, realizes text-guided fusion at specific time positions via cross-attention during both training and rolling forecasting, and provides a plug-and-play external-memory interface. On a released 2019–2021 benchmark covering five Australian states, it achieves significant gains over baselines and state-of-the-art performance at multiple time scales.

What carries the argument

Cross-attention fusion that maps daily text embeddings onto specific half-hour time slots inside the load forecasting sequence.
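
To make that concrete, here is a minimal PyTorch sketch of text-to-slot cross-attention. Everything in it is an illustrative assumption rather than the paper's implementation: the module name, the shapes, and the 384-dimensional text embeddings (the output size of a common SBERT variant).

    import torch
    import torch.nn as nn

    class TextLoadCrossAttention(nn.Module):
        """Sketch: half-hour load tokens (queries) attend to daily text
        embeddings (keys/values), so each time slot can pull in the texts
        relevant to it. Hypothetical shapes: load_seq is (B, T, d_model),
        text_mem is (B, N, d_text)."""

        def __init__(self, d_model: int = 128, d_text: int = 384, n_heads: int = 4):
            super().__init__()
            self.proj_text = nn.Linear(d_text, d_model)  # text dim -> backbone dim
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, load_seq, text_mem, text_pad_mask=None):
            kv = self.proj_text(text_mem)
            fused, attn_w = self.attn(load_seq, kv, kv, key_padding_mask=text_pad_mask)
            # Residual connection keeps the numeric backbone's signal intact;
            # attn_w is the read-out a time-source attribution heatmap needs.
            return self.norm(load_seq + fused), attn_w

Returning the attention weights alongside the fused sequence is what makes the interpretation claim testable: the weights say which text, from which day, influenced which half-hour slot.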

If this is right

  • GRAFT significantly outperforms strong baselines and reaches or surpasses state-of-the-art results across multiple regions and forecasting horizons.
  • The model remains robust in event-driven scenarios.
  • Attention read-out enables temporal localization and source-level interpretation of text-to-load effects.
  • The plug-and-play external-memory interface supports different information sources in real-world deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Textual signals could help anticipate policy-driven demand shifts earlier than numerical indicators alone.
  • The same alignment and fusion approach may extend to other time-series domains such as traffic or water-demand forecasting that also receive sudden textual inputs.
  • Finer-grained timestamped texts, if available, could further tighten short-horizon accuracy without changing the core architecture.

Load-bearing premise

Daily-aggregated textual data can be reliably aligned to half-hour load intervals so that cross-attention extracts causally relevant signals rather than spurious correlations.
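
At bottom this is an index-mapping claim. A minimal sketch under the assumption of exactly 48 half-hour intervals per calendar day; the paper's actual timestamp-matching procedure is not reproduced here.

    import numpy as np

    STEPS_PER_DAY = 48  # half-hour resolution

    def text_day_for_slot(num_days: int) -> np.ndarray:
        """For each half-hour index t, return the calendar day whose
        daily-aggregated texts it is aligned to. Assumed rule: all 48 slots
        of day d share day d's text embedding."""
        slots = np.arange(num_days * STEPS_PER_DAY)
        return slots // STEPS_PER_DAY

    # Example: slot 95, the last half-hour of the second day, maps to day 1.
    assert text_day_for_slot(2)[95] == 1

The map itself is coarse; the premise is that learned attention on top of it recovers sub-daily timing rather than mere calendar co-variation.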

What would settle it

An ablation on the released Australian benchmark: if removing the textual alignment and cross-attention components causes no performance drop across the tested regions and horizons, the text pathway is not doing the work the claim assigns it.

Figures

Figures reproduced from arXiv: 2512.14400 by Fangzhou Lin, Guoshun He, Jinsong Tao, Zhe Huang, Zhenyu Guo.

Figure 1: Regional boundaries, major interconnectors, and representative generation mix in the National Electricity Market (NEM) (schematic map from AEMO [19]).

Figure 2: Annual electrical load surfaces in 2019 for NEM states.

Figure 4: Representative external text events (2019–2021) from news, social media, and policy sources.

Figure 3: Word cloud visualizations of the three text corpora across NEM states.

Figure 5: Overall architecture of the proposed GRAFT model. The central block is the numerical backbone STanHop (Memory Patterns, Memory Plugin, Coarse-Graining, TimeGSH, SeriesGSH) [17]. Three textual external sources (News / Reddit / Policy) are encoded as memory patterns, gated and fused through the Source Gating and Fusion modules, and then injected into the backbone to form TimeGSH and SeriesGSH representations.

Figure 6: Forward-computation flow of GRAFT, illustrating how historical load sequences, textual external sources and multi-scale representations are processed in the encoding, retrieval and decoding stages. The design builds on the STanHop backbone [17] and the theory of sparse modern Hopfield networks [27, 28].

Figure 7: Time–source attribution heatmaps.

Figure 8: Comparison of five-state load curves under different information-source configurations in very short-term windows (VSTLF, W = 16).

Figure 9: Comparison of 24-hour load curves of five states under different information-source configurations in short-term windows (STLF, W = 48).

Figure 10: Three-dimensional waterfall plots and two-dimensional projections of 60-day loads in five states under medium-term windows (MTLF, W = 2880), illustrating the overall performance of different information-source configurations over long sequences.

Figure 11: Cross-model comparison: RMSE, MAE, and MAPE.
Original abstract

Electric load is simultaneously affected across multiple time scales by exogenous factors such as weather and calendar rhythms, sudden events, and policies. Therefore, this paper proposes GRAFT (GRid-Aware Forecasting with Text), which modifies and improves STanHOP to better support grid-aware forecasting and multi-source textual interventions. Specifically, GRAFT strictly aligns daily-aggregated news, social media, and policy texts with half-hour load, and realizes text-guided fusion to specific time positions via cross-attention during both training and rolling forecasting. In addition, GRAFT provides a plug-and-play external-memory interface to accommodate different information sources in real-world deployment. We construct and release a unified aligned benchmark covering 2019--2021 for five Australian states (half-hour load, daily-aligned weather/calendar variables, and three categories of external texts), and conduct systematic, reproducible evaluations at three scales -- hourly, daily, and monthly -- under a unified protocol for comparison across regions, external sources, and time scales. Experimental results show that GRAFT significantly outperforms strong baselines and reaches or surpasses the state of the art across multiple regions and forecasting horizons. Moreover, the model is robust in event-driven scenarios and enables temporal localization and source-level interpretation of text-to-load effects through attention read-out. We release the benchmark, preprocessing scripts, and forecasting results to facilitate standardized empirical evaluation and reproducibility in power grid load forecasting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes GRAFT, an extension of STanHOP for grid-aware electric load forecasting. It strictly aligns daily-aggregated multi-source textual data (news, social media, policy) to half-hour load intervals and uses cross-attention for text-guided fusion during both training and rolling forecasts. The work releases a unified benchmark for five Australian states (2019-2021) covering half-hour load, weather/calendar variables, and texts, then evaluates at hourly, daily, and monthly scales under a unified protocol, claiming significant outperformance over strong baselines and reaching or surpassing SOTA, plus robustness in event-driven scenarios and interpretability via attention read-outs.

Significance. If the empirical claims hold under rigorous validation, the contribution lies in a reproducible, plug-and-play framework for fusing textual exogenous signals into load forecasting at multiple scales. The public release of the aligned benchmark, preprocessing scripts, and results is a clear strength that supports standardized evaluation in the field.

major comments (2)
  1. [Experimental results] The headline performance claim (reaching or surpassing SOTA across regions and horizons) depends on cross-attention extracting causally relevant signals rather than spurious correlations from daily text aligned to half-hour loads. With only three years (2019-2021) of data for five states, calendar/event patterns in the texts are likely to co-vary with load in dataset-specific ways; this assumption is load-bearing and requires explicit controls such as ablation on text sources, out-of-distribution testing, or temporal hold-out beyond the current corpus (see Experimental results and Methods sections).
  2. [Methods] The description of the cross-attention fusion and plug-and-play memory interface during rolling forecasts lacks sufficient implementation detail (e.g., exact alignment procedure, attention masking for future text, and how external memory is updated) to allow independent replication of the claimed gains; this directly affects verifiability of the multi-source fusion mechanism.
minor comments (2)
  1. [Abstract] Clarify in the abstract and §1 which exact baselines (e.g., specific STanHOP variants or other SOTA models) are used for the 'significantly outperforms' claim and report effect sizes or statistical significance tests.
  2. [Evaluation protocol] The unified protocol for multi-scale evaluation is a positive feature, but the paper should explicitly state how forecast horizons are defined at daily and monthly scales to avoid ambiguity in cross-region comparisons.
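
A sketch of the control the first major comment asks for: leave one text source out at a time and score each run against a no-text baseline, using a skill score of the form the paper reports (one minus an RMSE ratio). The forecast_fn callable and its excluded parameter are hypothetical interfaces, not the paper's API.

    import numpy as np

    def rmse(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    def skill(rmse_model, rmse_baseline):
        # Skill = 1 - RMSE_model / RMSE_baseline; positive beats the baseline.
        return 1.0 - rmse_model / rmse_baseline

    def source_ablation(forecast_fn, y_true, sources=("news", "reddit", "policy")):
        """Leave-one-source-out ablation against a no-text baseline."""
        base = rmse(y_true, forecast_fn(excluded=set(sources)))  # all text removed
        report = {"all_sources": skill(rmse(y_true, forecast_fn(excluded=set())), base)}
        for s in sources:
            report[f"without_{s}"] = skill(rmse(y_true, forecast_fn(excluded={s})), base)
        return report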

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below, providing the strongest honest defense of the manuscript while agreeing to revisions that improve clarity and rigor where needed.

Point-by-point responses
  1. Referee: [Experimental results] The headline performance claim (reaching or surpassing SOTA across regions and horizons) depends on cross-attention extracting causally relevant signals rather than spurious correlations from daily text aligned to half-hour loads. With only three years (2019-2021) of data for five states, calendar/event patterns in the texts are likely to co-vary with load in dataset-specific ways; this assumption is load-bearing and requires explicit controls such as ablation on text sources, out-of-distribution testing, or temporal hold-out beyond the current corpus (see Experimental results and Methods sections).

    Authors: We appreciate the referee's emphasis on validating that performance gains stem from causally relevant textual signals. The manuscript already reports source-level ablations (news-only, social-media-only, policy-only, and full multi-source) demonstrating that combining sources yields consistent gains over any single source across regions and horizons. The evaluation uses a rolling-forecast protocol with strict temporal separation between training and test windows within the 2019-2021 corpus, which functions as an internal temporal hold-out. To further address the concern about dataset-specific co-variation, we will add an explicit out-of-distribution experiment in the revised Experimental results section: we will hold out specific high-impact event windows (e.g., major policy announcements and extreme weather periods identified in the texts) and quantify performance drop when those textual signals are removed or masked. This addition will be supported by the already-released benchmark and preprocessing scripts, allowing readers to extend the analysis. revision: yes

  2. Referee: [Methods] The description of the cross-attention fusion and plug-and-play memory interface during rolling forecasts lacks sufficient implementation detail (e.g., exact alignment procedure, attention masking for future text, and how external memory is updated) to allow independent replication of the claimed gains; this directly affects verifiability of the multi-source fusion mechanism.

    Authors: We agree that the current Methods description is insufficient for full independent replication. In the revised manuscript we will add a dedicated subsection titled 'Cross-Attention Fusion and External Memory Interface' that specifies: (1) the exact alignment procedure, which maps each daily-aggregated text document to all half-hour load intervals of the corresponding calendar day via timestamp matching; (2) the attention masking rule used at inference time, which applies a causal mask so that the model only attends to text available up to the forecast start time (preventing any future-text leakage); and (3) the update protocol for the plug-and-play external memory, including how new text embeddings are appended and how the memory is refreshed at each rolling step without retraining the core model. We will also include pseudocode and an expanded diagram of the memory interface. revision: yes
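
Read literally, points (2) and (3) of this response reduce to a day-level causal mask plus an append-only memory bank. A sketch under those assumptions; names, shapes, and the per-day text granularity are illustrative, not the promised revision.

    import torch

    def causal_text_mask(num_slots: int, num_days: int, forecast_start_day: int,
                         steps_per_day: int = 48) -> torch.Tensor:
        """Boolean mask (True = blocked) over (query slot, text day). Assumed
        rule: a slot may attend to texts of its own day during training, but
        slots inside the forecast window only see texts from days before the
        forecast start, so no future text leaks into a rolling forecast."""
        slot_day = torch.arange(num_slots) // steps_per_day          # (T,)
        cutoff = torch.clamp(slot_day, max=forecast_start_day - 1)   # no future text
        return torch.arange(num_days)[None, :] > cutoff[:, None]    # (T, N)

    class RollingTextMemory:
        """Append-only external memory: new daily text embeddings are added at
        each rolling step without retraining the core model."""

        def __init__(self):
            self._bank = []  # (day index, embedding) pairs

        def append(self, day: int, embedding: torch.Tensor) -> None:
            self._bank.append((day, embedding))

        def visible(self, up_to_day: int) -> torch.Tensor:
            rows = [e for d, e in self._bank if d <= up_to_day]
            return torch.stack(rows) if rows else torch.empty(0)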

Circularity Check

0 steps flagged

Empirical performance claims rest on held-out evaluation; minor self-citation to base model not load-bearing

full rationale

The paper's derivation consists of constructing an aligned benchmark dataset (2019-2021 Australian load + daily texts), modifying the STanHOP architecture with cross-attention fusion and external memory, then reporting MSE/MAE improvements on rolling forecasts across hourly/daily/monthly horizons. These results are measured against external baselines on held-out periods and do not reduce to any fitted parameter being renamed as a prediction or to a self-referential definition. The single citation to STanHOP is used only to describe the starting architecture; the claimed gains are independently verified by the new experiments and benchmark release. No uniqueness theorem, ansatz smuggling, or renaming of known results occurs.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The model rests on standard transformer attention mechanisms and the assumption that textual data contains predictive information for load; no new physical entities are postulated and the only free parameters are typical neural-network hyperparameters.

free parameters (1)
  • cross-attention hyperparameters
    Dimensions and number of heads in the cross-attention layers are chosen during model design and affect fusion quality.
axioms (2)
  • domain assumption Textual data from news, social media, and policy documents contains information causally relevant to future electricity demand beyond what numerical weather and calendar variables already provide.
    Invoked when claiming that text-guided fusion improves forecasting accuracy.
  • domain assumption Daily aggregation of text can be meaningfully aligned to half-hour load intervals without losing critical timing information.
    Required for the strict alignment step described in the abstract.
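
For concreteness, the ledger's single free-parameter entry unpacks into a handful of design-time choices. A hypothetical configuration block; the defaults are illustrative, since the review does not state the paper's values.

    from dataclasses import dataclass

    @dataclass
    class FusionConfig:
        # Illustrative defaults, not the paper's reported settings.
        d_model: int = 128   # backbone hidden size
        d_text: int = 384    # text embedding size (a common SBERT dimension)
        n_heads: int = 4     # number of cross-attention heads
        n_sources: int = 3   # news, social media, policy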

pith-pipeline@v0.9.0 · 5560 in / 1499 out tokens · 30115 ms · 2026-05-16T21:41:00.077547+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 2 internal anchors

  1. [1]

    Probabilistic electric load forecasting: A tutorial review

    Hong T, Fan S. Probabilistic electric load forecasting: A tutorial review. Int J Forecast 2016;32:914–938. https://doi.org/10.1016/j.ijforecast.2015.11.011

  2. [2]

    A hybrid model based on data preprocessing for electrical power forecasting

    Xiao L, Wang J, Yang X, Xiao L. A hybrid model based on data preprocessing for electrical power forecasting. Int J Electr Power Energy Syst 2015;64:311–327. https://doi.org/10.1016/j.ijepes.2014.07.029

  3. [3]

    Short-term load forecasting for the holidays using fuzzy linear regression method

    Song K-B, Baek Y-S, Hong D-H, Jang G. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans Power Syst 2005;20(1):96–101. https://doi.org/10.1109/TPWRS.2004.835632

  4. [4]

    Review of power system load forecasting and its development directions (in Chinese)

    Kang C, Xia Q, Zhang B. Review of power system load forecasting and its development directions (in Chinese). Autom Electr Power Syst 2004;28(17):1–11. DOI 10.3321/j.issn:1000-1026.2004.17.001. https://qikan.cqvip.com/Qikan/Article/Detail?id=10507343

  5. [5]

    On the enrichment of time series with textual data for forecasting agricultural commodity prices

    Reis Filho IJ, Marcacini RM, Rezende SO. On the enrichment of time series with textual data for forecasting agricultural commodity prices. MethodsX 2022;9:101758. https://doi.org/10.1016/j.mex.2022.101758

  6. [6]

    Text-based crude oil price forecasting: A deep learning approach

    Li X, Shang W, Wang S. Text-based crude oil price forecasting: A deep learning approach. Int J Forecast 2019;35:1548–1560. https://doi.org/10.1016/j.ijforecast.2018.10.004

  7. [7]

    Beyond trend and periodicity: Guiding time series forecasting with textual cues

    Xu Z, Bian Y, Zhong J, Wen X, Xu Q. Beyond trend and periodicity: Guiding time series forecasting with textual cues. arXiv preprint arXiv:2405.13522; 2024. https://doi.org/10.48550/arXiv.2405.13522

  8. [8]

    From news to forecast: Integrating event analysis in LLM-based time series forecasting with reflection

    Wang X, Feng M, Qiu J, Gu J, Zhao J. From news to forecast: Integrating event analysis in LLM-based time series forecasting with reflection. In: Adv Neural Inf Process Syst 37; 2024. https://doi.org/10.52202/079017-1853

  9. [9]

    Dual-Forecaster: A multimodal time series model integrating descriptive and predictive texts

    Wu W, Zhang G, Tan Z, Wang Y, Qi H. Dual-Forecaster: A multimodal time series model integrating descriptive and predictive texts. arXiv preprint arXiv:2505.01135; 2025. https://doi.org/10.48550/arXiv.2505.01135

  10. [10]

    Context-aware probabilistic modeling with LLM for multimodal time series forecasting

    Yao Y, Li J, Dai X, Zhang M, Gong X, Wang F-Y, Lv Y. Context-aware probabilistic modeling with LLM for multimodal time series forecasting. arXiv preprint arXiv:2505.10774; 2025. https://doi.org/10.48550/arXiv.2505.10774

  11. [11]

    Deep learning for time series forecasting: A survey

    Torres JF, Hadjout D, Sebaa A, Martínez-Álvarez F, Troncoso A. Deep learning for time series forecasting: A survey. Big Data 2021;9(1):3–21. https://doi.org/10.1089/big.2020.0159

  12. [12]

    Deep learning for time series forecasting: A survey

    Kong X, Chen Z, Liu W, Ning K, Zhang L, Marier SM, Liu Y, Chen Y, Xia F. Deep learning for time series forecasting: A survey. Int J Mach Learn Cybern 2025;16:5079–5112. https://doi.org/10.1007/s13042-025-02560-w

  13. [13]

    Deep learning framework to forecast electricity demand

    Bedi J, Toshniwal D. Deep learning framework to forecast electricity demand. Appl Energy 2019;238:1312–1326. https://doi.org/10.1016/j.apenergy.2019.01.113

  14. [14]

    A multi-task learning method for multi-energy load forecasting based on synthesis correlation analysis and load participation factor

    Tan M, Liao C, Chen J, Cao Y, Wang R, Su Y. A multi-task learning method for multi-energy load forecasting based on synthesis correlation analysis and load participation factor. Appl Energy 2023;343:121177. https://doi.org/10.1016/j.apenergy.2023.121177

  15. [15]

    Multi-energy load forecasting via hierarchical multi-task learning and spatiotemporal attention

    Song C, Yang H, Cai J, Yang P, Bao H, Xu K, Meng X-B. Multi-energy load forecasting via hierarchical multi-task learning and spatiotemporal attention. Appl Energy 2024;373:123788. https://doi.org/10.1016/j.apenergy.2024.123788

  16. [16]

    A novel bidirectional mechanism based on time series model for wind power forecasting

    Zhao Y, Ye L, Li Z, Song X, Lang Y, Su J. A novel bidirectional mechanism based on time series model for wind power forecasting. Appl Energy 2016;177:793–803. https://doi.org/10.1016/j.apenergy.2016.03.096

  17. [17]

    STanHop: Sparse tandem Hopfield model for memory-enhanced time series prediction

    Wu D, Hu JYC, Li W, Chen BY, Liu H. STanHop: Sparse tandem Hopfield model for memory-enhanced time series prediction. arXiv preprint arXiv:2312.17346; 2023. https://arxiv.org/abs/2312.17346

  18. [18]

    Sentence-BERT: Sentence embeddings using Siamese BERT-networks

    Reimers N, Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In: Proc 2019 Conf Empir Methods Nat Lang Process (EMNLP); 2019, p. 3982–3992. https://doi.org/10.18653/v1/D19-1410

  19. [19]

    NEM regional boundaries map

    Australian Energy Market Operator (AEMO). NEM regional boundaries map [PDF]. https://www.aemo.com.au/-/media/files/electricity/nem/planning_and_forecasting/maps/nem-regional-boundaries-map-web.pdf?la=en (accessed 23 Nov 2025)

  20. [20]

    Aggregated price and demand data – National Electricity Market (NEM) [dataset]

    Australian Energy Market Operator (AEMO). Aggregated price and demand data – National Electricity Market (NEM) [dataset]. 2025. https://www.aemo.com.au/energy-systems/electricity/national-electricity-market-nem/data-nem/aggregated-data (accessed 23 Nov 2025)

  21. [21]

    Operational management of low demand in South Australia

    Australian Energy Market Operator (AEMO). Operational management of low demand in South Australia. Melbourne: AEMO; 2020. https://www.aemo.com.au/-/media/files/electricity/nem/security_and_reliability/congestion-information/2020/operational-management-of-low-demand-in-south-australia.pdf (accessed 23 Nov 2025)

  22. [22]

    News releases — Australian Energy Regulator

    Australian Energy Regulator. News releases — Australian Energy Regulator. 2025. https://www.aer.gov.au/news/articles/news-releases (accessed 17 Oct 2025)

  23. [23]

    Market notices for the National Electricity Market (NEM) [online]

    Australian Energy Market Operator (AEMO). Market notices for the National Electricity Market (NEM) [online]. 2025. https://www.aemo.com.au/market-notices (accessed 23 Nov 2025)

  24. [24]

    Discussions on renewable energy and load forecasting

    Reddit Energy Forum. Discussions on renewable energy and load forecasting. 2025. https://www.reddit.com/r/energy/ (accessed 17 Oct 2025)

  25. [25]

    Overview of energy market rules and policy updates

    Australian Energy Market Commission. Overview of energy market rules and policy updates. 2025. https://www.aemc.gov.au/ (accessed 17 Oct 2025)

  26. [26]

    Energy strategies and frameworks — Australian Government

    Department of Climate Change, Energy, the Environment and Water. Energy strategies and frameworks — Australian Government. 2025. https://www.dcceew.gov.au/energy/strategies-and-frameworks (accessed 17 Oct 2025)

  27. [27]

    Hopfield Networks is All You Need

    Ramsauer H, Schäfl B, Lehner J, Seidl P, Widrich M, Adler T, et al. Hopfield networks is all you need. arXiv preprint arXiv:2008.02217; 2020. https://arxiv.org/abs/2008.02217

  28. [28]

    On sparse modern Hopfield model

    Hu JYC, Yang D, Wu D, Xu C, Chen B-Y, Liu H. On sparse modern Hopfield model. In: Adv Neural Inf Process Syst 36; 2023. arXiv preprint arXiv:2309.12673; 2023. https://doi.org/10.48550/arXiv.2309.12673

  29. [29]

    Possible generalization of Boltzmann–Gibbs statistics

    Tsallis C. Possible generalization of Boltzmann–Gibbs statistics. J Stat Phys 1988;52:479–487. https://doi.org/10.1007/BF01016429

  30. [30]

    From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

    Martins AFT, Astudillo RF. From softmax to sparsemax: A sparse model of attention and multi-label classification. In: Proc 33rd Int Conf Mach Learn (ICML); 2016, p. 1614–1623. arXiv:1602.02068

  31. [31]

    Sparse sequence-to-sequence models

    Peters B, Niculae V, Martins AFT. Sparse sequence-to-sequence models. In: Proc 57th Annu Meet Assoc Comput Linguist (ACL); 2019, p. 1504–1519. https://doi.org/10.18653/v1/P19-1152

  33. [33]

    A sparse quantized Hopfield network for online continual associative memory

    Alonso N, Brea J, Rajendran B. A sparse quantized Hopfield network for online continual associative memory. Nat Commun 2024;15:3427. https://doi.org/10.1038/s41467-024-46976-4

  34. [34]

    Another look at measures of forecast accuracy

    Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. Int J Forecast 2006;22:679–688. https://doi.org/10.1016/j.ijforecast.2006.03.001

  35. [35]

    Comparing predictive accuracy

    Diebold FX, Mariano RS. Comparing predictive accuracy. J Bus Econ Stat 1995;13:253–263. https://doi.org/10.1080/07350015.1995.10524599

  36. [36]

    Informer: Beyond efficient transformer for long sequence time-series forecasting

    Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, Zhang W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc AAAI Conf Artif Intell 2021;35(12):11106–11115. https://doi.org/10.1609/aaai.v35i12.17325

  37. [37]

    TS2Vec: Towards universal representation of time series

    Yue Z, Wang Y, Duan J, Yang T, Huang C, Tong Y, Xu B. TS2Vec: Towards universal representation of time series. Proc AAAI Conf Artif Intell 2022;36(8):8980–8987. https://doi.org/10.1609/aaai.v36i8.20881

  38. [38]

    PromptCast: A new prompt-based learning paradigm for time series forecasting

    Xue H, Salim FD. PromptCast: A new prompt-based learning paradigm for time series forecasting. IEEE Trans Knowl Data Eng 2024;36(11):12523–12536. https://doi.org/10.1109/TKDE.2023.3342137