FTimeXer: Frequency-aware Time-series Transformer with Exogenous variables for Robust Carbon Footprint Forecasting

Hui Ma; Jinhai Sa; Qingchang Ma; Qingzhong Li; Yue Hu; Zhou Long

arxiv: 2604.02347 · v1 · submitted 2026-02-16 · 💻 cs.LG

Qingzhong Li, Yue Hu, Zhou Long, Qingchang Ma, Hui Ma, Jinhai Sa

Pith reviewed 2026-05-15 22:15 UTC · model grok-4.3

classification 💻 cs.LG

keywords time series forecasting · transformer · carbon footprint · frequency analysis · exogenous variables · FFT · non-stationary data · masking regularization

The pith

FTimeXer uses an FFT frequency branch and stochastic exogenous masking to improve carbon intensity forecasts on non-stationary grid data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FTimeXer as a Transformer model that adds a frequency-aware branch to capture multi-scale periodic patterns in power grid carbon intensity. It pairs this with a gated fusion mechanism and a training scheme that randomly masks exogenous inputs while enforcing consistency to limit reliance on noisy or missing side data. The authors test the model on three real-world datasets and report gains over existing baselines. These gains matter because carbon footprint accounting for products depends on timely, stable forecasts of grid emissions, which feed into decarbonization choices. If the method holds, it would supply more dependable inputs for environmental planning under irregular data conditions.

Core claim

FTimeXer features an FFT-driven frequency branch combined with gated time-frequency fusion, allowing it to capture multi-scale periodicity effectively. It also employs stochastic exogenous masking in conjunction with consistency regularization, which helps reduce spurious correlations and enhance stability. Experiments conducted on three real-world datasets show consistent improvements over strong baselines.
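As a rough illustration of the two ideas named in the core claim, the sketch below takes a frequency view of a series with a discrete Fourier transform and blends a time-domain feature with a frequency-domain feature through a sigmoid gate. It is not the authors' implementation: the gate form `g = sigmoid(w_t*h_t + w_f*h_f + b)`, every function name, and the toy 24-step series are assumptions made for illustration, and a naive DFT stands in for an FFT library call so the example has no dependencies.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the one-sided DFT (naive O(n^2) version, not a true FFT)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s) / n)
    return mags


def dominant_period(x):
    """Period, in samples, of the strongest non-DC frequency bin."""
    mags = dft_magnitudes(x)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return len(x) / k


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(h_time, h_freq, w_time, w_freq, bias):
    """Elementwise gate g in (0, 1): fused = g * h_time + (1 - g) * h_freq.

    The gate parameterisation is an assumption; the page does not give one.
    """
    fused = []
    for ht, hf, wt, wf, b in zip(h_time, h_freq, w_time, w_freq, bias):
        g = sigmoid(wt * ht + wf * hf + b)
        fused.append(g * ht + (1.0 - g) * hf)
    return fused


# Toy series with a 24-sample cycle (for example, hourly data with a daily pattern).
series = [math.sin(2 * math.pi * t / 24) for t in range(96)]
period = dominant_period(series)

# With zero weights and bias the gate sits at 0.5, so the output is a plain average.
fused = gated_fusion([1.0], [3.0], [0.0], [0.0], [0.0])
```

A learned gate of this kind lets the model lean on the frequency branch where periodic structure is strong and fall back to the time branch elsewhere; whether FTimeXer's gate behaves this way would need to be checked against the full paper.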

What carries the argument

FFT-driven frequency branch with gated time-frequency fusion, plus stochastic exogenous masking paired with consistency regularization.
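One way to picture stochastic exogenous masking paired with consistency regularization is sketched below. Everything specific here is an assumption rather than the paper's recipe: whole exogenous channels are zero-filled with probability `p`, the consistency term is a squared difference between the full and masked forward passes, `lam` weights that term, and `toy_forecast` is a hypothetical stand-in for the actual model.

```python
import random


def mask_exogenous(exo, p, rng):
    """Zero out each exogenous channel independently with probability p."""
    return [[0.0] * len(ch) if rng.random() < p else list(ch) for ch in exo]


def toy_forecast(endo, exo, w_endo, w_exo):
    """Hypothetical stand-in model: a weighted sum of the last observed values."""
    return w_endo * endo[-1] + sum(w * ch[-1] for w, ch in zip(w_exo, exo))


def training_loss(endo, exo, target, w_endo, w_exo, p, lam, rng):
    """Forecast error on the full input plus a penalty on the gap
    between the full-input and masked-input predictions."""
    pred_full = toy_forecast(endo, exo, w_endo, w_exo)
    pred_masked = toy_forecast(endo, mask_exogenous(exo, p, rng), w_endo, w_exo)
    forecast_loss = (pred_full - target) ** 2
    consistency = (pred_full - pred_masked) ** 2
    return forecast_loss + lam * consistency


rng = random.Random(0)
endo = [0.40, 0.42, 0.45]                     # target series history
exo = [[1.0, 1.1, 1.2], [0.3, 0.2, 0.1]]      # two exogenous channels
loss_no_mask = training_loss(endo, exo, 0.5, 1.0, [0.1, 0.2], p=0.0, lam=0.5, rng=rng)
loss_all_masked = training_loss(endo, exo, 0.5, 1.0, [0.1, 0.2], p=1.0, lam=0.5, rng=rng)
```

In this toy setup the consistency term is zero when nothing is masked and grows with how much the prediction depends on the exogenous channels, which is the intuition behind using it to discourage over-reliance on side inputs.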

If this is right

Forecasts of grid carbon intensity become more reliable when periodic components vary across scales.
Training remains stable even when exogenous variables arrive with gaps or timing offsets.
Product carbon footprint calculations can use fresher emission factors without large error spikes.
Decarbonization decisions gain a more consistent data foundation across different regions or seasons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same frequency-plus-masking design could apply to other forecasting tasks that mix periodic signals with incomplete side information, such as energy demand or renewable output.
Consistency regularization might serve as a lightweight way to regularize any Transformer that ingests exogenous channels prone to missing entries.
If the gated fusion proves general, it could be ported to other hybrid time-frequency architectures without redesigning the entire backbone.

Load-bearing premise

The performance gains come from the frequency branch and masking scheme rather than from unstated tuning choices or dataset-specific traits.

What would settle it

Retraining FTimeXer and the baselines on a new grid dataset that exhibits different dominant frequencies or higher rates of exogenous misalignment, then checking whether the reported accuracy edge disappears.
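A stress test along these lines needs a way to inject misalignment and gaps into exogenous inputs. The helpers below are a minimal sketch of one option; the names `delay_series` and `drop_entries`, the pad-with-first-value choice, and the `temperature` example variable are all hypothetical and do not come from the paper.

```python
import random


def delay_series(series, lag):
    """Shift a series later by `lag` steps, repeating the first value as padding."""
    if not series:
        return []
    lag = max(0, min(lag, len(series)))
    return [series[0]] * lag + list(series[: len(series) - lag])


def drop_entries(series, rate, rng):
    """Mark each entry as missing (None) independently with probability `rate`."""
    return [None if rng.random() < rate else v for v in series]


rng = random.Random(42)
temperature = [10.0, 11.0, 12.5, 13.0, 12.0, 11.5]  # hypothetical exogenous channel
misaligned = delay_series(temperature, 2)
gappy = drop_entries(temperature, 0.3, rng)
```

Sweeping `lag` and `rate` over a grid and re-evaluating every model at each setting would show whether any accuracy edge survives as the exogenous inputs degrade.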

Figures

Figures reproduced from arXiv: 2604.02347 by Hui Ma, Jinhai Sa, Qingchang Ma, Qingzhong Li, Yue Hu, Zhou Long.

**Figure 1.** Structure of FTimeXer model. [...] interactions, we typically perform an aggregation operation on Z^(x)_0 along the temporal dimension T. In practice, common aggregation methods include global average pooling and a lightweight attention pooling layer. Finally, we obtain a variable token for each exogenous variable, denoted as v_j ∈ R^d, where j indexes the different exogenous variables. The tokens of all exogenou…
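The aggregation step quoted in the Figure 1 text can be sketched as follows: a T x d embedding for one exogenous variable is reduced over the temporal axis to a single d-dimensional variable token. Both poolers are illustrative assumptions; in particular, the dot-product scoring with a `query` vector is one common form of lightweight attention pooling, not necessarily the one the authors use.

```python
import math


def mean_pool(z):
    """Global average pooling over the temporal axis of a T x d embedding."""
    t = len(z)
    return [sum(row[j] for row in z) / t for j in range(len(z[0]))]


def attention_pool(z, query):
    """Softmax-weighted average over time; scores are dot products with `query`."""
    scores = [sum(q * v for q, v in zip(query, row)) for row in z]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * row[j] for w, row in zip(weights, z)) / total
            for j in range(len(z[0]))]


# Toy embedding with T = 3 time steps and d = 2 features.
z = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
v_mean = mean_pool(z)
v_attn = attention_pool(z, [0.0, 0.0])  # a zero query gives uniform weights
```

With a zero query the attention pooler reduces to the mean pooler, which makes the average a natural baseline to compare a learned query against.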

**Figure 2.** Visualization results on Magnolia, California CT2, and Ne... [PITH_FULL_IMAGE:figures/full_fig_p007_2.png]

Original abstract

Accurate and up-to-date forecasting of the power grid's carbon footprint is crucial for effective product carbon footprint (PCF) accounting and informed decarbonization decisions. However, the carbon intensity of the grid exhibits high non-stationarity, and existing methods often struggle to effectively leverage periodic and oscillatory patterns. Furthermore, these methods tend to perform poorly when confronted with irregular exogenous inputs, such as missing data or misalignment. To tackle these challenges, we propose FTimeXer, a frequency-aware time-series Transformer designed with a robust training scheme that accommodates exogenous factors. FTimeXer features a Fast Fourier Transform (FFT)-driven frequency branch combined with gated time-frequency fusion, allowing it to capture multi-scale periodicity effectively. It also employs stochastic exogenous masking in conjunction with consistency regularization, which helps reduce spurious correlations and enhance stability. Experiments conducted on three real-world datasets show consistent improvements over strong baselines. As a result, these enhancements lead to more reliable forecasts of grid carbon factors, which are essential for effective PCF accounting and informed decision-making regarding decarbonization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FTimeXer adds an FFT frequency branch with gated fusion and stochastic exogenous masking to time-series Transformers for carbon intensity forecasting, but the abstract gives no ablations or controls to show those pieces actually drive the reported gains.

The main takeaway is that this paper introduces FTimeXer, a Transformer that routes an FFT-based frequency branch through gated fusion to pick up multi-scale periodicity in grid carbon data, then adds stochastic masking on exogenous inputs plus consistency regularization to limit spurious correlations from missing or misaligned features. The target problem is real: carbon intensity is non-stationary and exogenous signals are often irregular, so a method that explicitly handles both could matter for product carbon accounting work. The architecture choices line up with the stated difficulties and are presented as a concrete combination not directly copied from prior time-series Transformers.

What the paper does well is keep the focus narrow and practical rather than claiming broad new theory. The design is straightforward to describe and the application to decarbonization planning is clear.

The soft spot is exactly the one in the stress-test note. The abstract only says the model shows consistent improvements on three real-world datasets over strong baselines, with no numbers, no ablation tables, no statement that baselines got the same tuning budget, and no checks on whether removing the frequency branch or the masking regularizer erases the edge. Without those controls it is impossible to attribute any lift to the new components rather than implementation details or dataset quirks.

This is the kind of paper that belongs in an energy-forecasting or applied time-series reading group if the full experiments turn out to be solid. A practitioner looking for a ready-to-try model for carbon-factor prediction might get value from the ideas once the claims are backed by ablations. I would send it to peer review because the problem is useful, the proposal is specific, and the gaps are fixable with standard additions rather than fundamental flaws.

Referee Report

2 major / 2 minor

Summary. The paper proposes FTimeXer, a frequency-aware time-series Transformer for forecasting power grid carbon intensity. It introduces an FFT-driven frequency branch with gated time-frequency fusion to capture multi-scale periodicity and employs stochastic exogenous masking combined with consistency regularization to mitigate spurious correlations from irregular exogenous inputs. The central empirical claim is that this architecture yields consistent improvements over strong baselines on three real-world datasets, enabling more reliable carbon footprint forecasts for PCF accounting and decarbonization decisions.

Significance. If the reported gains are shown to arise specifically from the frequency branch and masking regularizer rather than tuning artifacts, the work could offer a practical advance in handling non-stationary time series with exogenous variables in energy applications. The approach targets a concrete sustainability use case where robustness to missing or misaligned inputs matters.

major comments (2)

[Experiments] Experiments section: the claim of 'consistent improvements over strong baselines' on three datasets is presented without ablation results that isolate the FFT frequency branch or the stochastic masking + consistency regularization (e.g., performance drop when either component is removed). This is load-bearing for the architectural contribution, as gains could arise from hyperparameter search effort or dataset-specific factors rather than the proposed mechanisms.
[Abstract] Abstract and Experiments: no information is given on the exact metrics (MAE, RMSE, etc.), baseline implementations, hyperparameter tuning budgets for baselines versus FTimeXer, or statistical significance of the deltas. Without these controls, the robustness claims under non-stationarity shifts cannot be evaluated.

minor comments (2)

[Abstract] The abstract would be clearer if it named the three datasets and reported the magnitude of improvements (e.g., average percentage reduction in error).
Notation for the gated fusion and consistency loss terms should be defined explicitly when first introduced to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and will revise the paper accordingly to strengthen the empirical validation.

Referee: [Experiments] Experiments section: the claim of 'consistent improvements over strong baselines' on three datasets is presented without ablation results that isolate the FFT frequency branch or the stochastic masking + consistency regularization (e.g., performance drop when either component is removed). This is load-bearing for the architectural contribution, as gains could arise from hyperparameter search effort or dataset-specific factors rather than the proposed mechanisms.

Authors: We agree that ablation studies are necessary to isolate the contributions of the FFT frequency branch and the stochastic exogenous masking with consistency regularization. In the revised manuscript, we will add these ablations across all three datasets, reporting the performance drops when each component is removed. This will demonstrate that the gains arise from the proposed mechanisms.
revision: yes

Referee: [Abstract] Abstract and Experiments: no information is given on the exact metrics (MAE, RMSE, etc.), baseline implementations, hyperparameter tuning budgets for baselines versus FTimeXer, or statistical significance of the deltas. Without these controls, the robustness claims under non-stationarity shifts cannot be evaluated.

Authors: We will update the abstract to specify the primary evaluation metrics (MAE and RMSE) and expand the Experiments section to detail baseline implementations (using official codebases where available), the hyperparameter tuning protocol with equivalent search budgets for all models, and statistical significance tests (e.g., paired t-tests) on the observed improvements. These revisions will allow proper assessment of the robustness claims.
revision: yes
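For readers who want the promised quantities spelled out, here is a minimal sketch of MAE, RMSE, and the paired t statistic computed on per-window errors from two models. The function names and the toy numbers are illustrative assumptions and are not results from the paper; turning the statistic into a p-value needs a Student t distribution, for which a library routine such as SciPy's `scipy.stats.ttest_rel` could be used instead.

```python
import math
import statistics


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def paired_t_statistic(errors_a, errors_b):
    """t statistic for the mean paired difference (errors_a - errors_b).

    Uses n - 1 degrees of freedom; raises ZeroDivisionError if every
    difference is identical, since the sample standard deviation is then zero.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Toy values only; these are not numbers reported in the paper.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
example_mae = mae(y_true, y_pred)
example_rmse = rmse(y_true, y_pred)

baseline_errors = [2.0, 3.0, 4.0]   # per-window errors of a baseline
model_errors = [1.0, 1.0, 1.0]      # per-window errors of the proposed model
t_stat = paired_t_statistic(baseline_errors, model_errors)
```

Pairing the errors window by window matters because both models are evaluated on the same test windows, so the differences, not the raw errors, are the quantity being tested.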

Circularity Check

0 steps flagged

No circularity: architectural proposal rests on empirical validation without self-referential derivations

full rationale

The paper introduces FTimeXer as a Transformer variant incorporating an FFT-driven frequency branch with gated fusion, plus stochastic exogenous masking and consistency regularization. These are presented as design choices whose value is assessed via experiments on three real-world datasets showing improvements over baselines. No equations, parameter-fitting steps, or derivation chains appear in the text that reduce any claimed prediction or result back to the inputs by construction. No self-citation load-bearing premises, uniqueness theorems, or ansatzes smuggled via prior work are invoked. The central claims therefore remain independent of the circularity patterns enumerated in the analysis criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Abstract-only review prevents exhaustive extraction; the design implicitly assumes standard time-series properties and that frequency content plus masking will improve robustness.

axioms (2)

domain assumption: Grid carbon intensity exhibits multi-scale periodicity that FFT can usefully isolate.
Invoked to justify the frequency branch design.
domain assumption: Stochastic masking plus consistency regularization reduces spurious correlations from exogenous inputs.
Core justification for the training scheme.

invented entities (1)

FTimeXer architecture (no independent evidence)
purpose: Frequency-aware Transformer with robust exogenous handling for carbon forecasting.
New model introduced in the paper; no independent evidence supplied beyond the claimed experimental gains.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: FTimeXer features an FFT-driven frequency branch combined with gated time-frequency fusion, allowing it to capture multi-scale periodicity effectively.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: It also employs stochastic exogenous masking in conjunction with consistency regularization

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

[1] H. Lee, K. Calvin, D. Dasgupta, G. Krinner, A. Mukherji, P. Thorne, C. Trisos, J. Romero, P. Aldunce, K. Barrett et al., “Climate change 2023: Synthesis report. Contribution of Working Groups I, II and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change,” 2023.

[2] European Union, “Regulation (EU) 2023/956 of the European Parliament and of the Council of 10 May 2023 establishing a carbon border adjustment mechanism,” Official Journal of the European Union, 2023, OJ L 130, 16.5.2023. Accessed: 2026-01-29.

[3] G. Protocol, “GHG Protocol Scope 2 Guidance,” An amendment to the GHG Protocol Corporate Standard, 2015.

[4] D. Maji, R. K. Sitaraman, and P. Shenoy, “DACF: Day-ahead carbon intensity forecasting of power grids using machine learning,” in Proceedings of the Thirteenth ACM International Conference on Future Energy Systems, 2022, pp. 188–192.

[5] R. Ma, L. Zhang, X. Chao, S. Zheng, B. Xia, and Y. Zhao, “Application of a combined prediction method based on temporal decomposition and convolutional neural networks for the prediction of consumption in polysilicon reduction furnaces,” Processes, vol. 10, no. 7, p. 1311, 2022.

[6] B. Xi, G. Xiong, K. A. Kozin, C. He, T. S. Tamir, Y. Song, X. Liu, and Z. Shen, “Modeling and random search optimization for the polysilicon CVD reactor,” Results in Control and Optimization, vol. 13, p. 100320, 2023.

[7] X. Wang, B. Chen, Y. Xiao, S. Liao, X. Ye, and J. Bai, “Optimized scheduling model considering the demand response and sequential requirements of polysilicon production,” Energies, vol. 17, no. 23, p. 6048, 2024.

[8] W. Ye, S. Deng, Q. Zou, and N. Gui, “Frequency adaptive normalization for non-stationary time series forecasting,” in Advances in Neural Information Processing Systems, 2024.

[9] Y. Wang, H. Wu, J. Dong, G. Qin, H. Zhang, Y. Liu, Y. Qiu, J. Wang, and M. Long, “TimeXer: Empowering transformers for time series forecasting with exogenous variables,” in Advances in Neural Information Processing Systems, 2024.

[10] K. G. Olivares, C. Challu, G. Marcjasz, R. Weron, and A. Dubrawski, “Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx,” International Journal of Forecasting, 2023.

[11] Y. Wei, J. Peng, T. He, C. Xu, J. Zhang, S. Pan, and S. Chen, “Compatible transformer for irregularly sampled multivariate time series,” arXiv preprint, 2023.

[12] B. Lim, S. Ö. Arik, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021.

[13] K. G. Olivares, C. Challu, G. Marcjasz, R. Weron, and A. Dubrawski, “Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx,” International Journal of Forecasting, vol. 39, no. 2, pp. 884–900, 2023.

[14] S. Tipirneni and C. K. Reddy, “Self-supervised transformer for sparse and irregularly sampled multivariate clinical time-series,” ACM Transactions on Knowledge Discovery from Data, vol. 16, no. 6, 2022.

[15] Y. Tashiro, J. Song, Y. Song, and S. Ermon, “CSDI: Conditional score-based diffusion models for probabilistic time series imputation,” in Advances in Neural Information Processing Systems, 2021.

[16] W. Du, D. Cote, and Y. Liu, “SAITS: Self-attention-based imputation for time series,” Expert Systems with Applications, vol. 219, p. 119619, 2023.

[17] Q. Huang et al., “Exploiting language power for time series forecasting with exogenous variables,” in Proceedings of the ACM Web Conference, 2025.

[18] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 12, 2021, pp. 11106–11115.

[19] G. Woo, C. Liu, D. Sahoo, A. Kumar, and S. Hoi, “ETSformer: Exponential smoothing transformers for time-series forecasting,” in International Conference on Learning Representations, 2023.

[20] H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” in Advances in Neural Information Processing Systems, 2021.

[21] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,” in International Conference on Learning Representations, 2023.

[22] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “iTransformer: Inverted transformers are effective for time series forecasting,” in International Conference on Learning Representations, 2024.

[23] K. Yi, Q. Zhang, W. Fan, L. Cao, S. Wang, G. Long, L. Hu, H. He, Q. Wen, and H. Xiong, “A survey on deep learning based time series analysis with frequency transformation,” 2025.

[24] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in International Conference on Machine Learning, 2022.

[25] D. Cao, Y. Wang, J. Duan, C. Zhang, X. Zhu, C. Huang, Y. Tong, B. Xu, J. Bai, J. Tong, and Q. Zhang, “Spectral temporal graph neural network for multivariate time-series forecasting,” in Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.

[26] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar, “Fourier neural operator for parametric partial differential equations,” in 9th International Conference on Learning Representations (ICLR 2021). OpenReview.net, 2021.

[27] J. Dong, H. Wu, H. Zhang, L. Zhang, J. Wang, and M. Long, “SimMTM: A simple pre-training framework for masked time-series modeling,” in Advances in Neural Information Processing Systems, 2023.

[28] Y. Shin, S. Yoon, H. Song, D. Park, B. Kim, J.-G. Lee, and B. S. Lee, “Context consistency regularization for label sparsity in time series,” in Proceedings of the 40th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 202. PMLR, 2023, pp. 31579–31595.
