{"paper":{"title":"A decoder-only foundation model for time-series forecasting","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A pretrained decoder-only model achieves zero-shot time-series forecasting accuracy close to supervised state-of-the-art on public datasets.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CL","authors_text":"Abhimanyu Das, Rajat Sen, Weihao Kong, Yichen Zhou","submitted_at":"2023-10-14T17:01:37Z","abstract_excerpt":"Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus, and can work well across different forecasting history lengths, prediction lengths and temporal granularities."},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"our model ... 
whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That pretraining on the chosen large time-series corpus produces representations that generalize to unseen datasets and varying temporal granularities without any fine-tuning or dataset-specific adaptation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A pretrained decoder-only patched transformer achieves near state-of-the-art zero-shot forecasting performance across diverse time series datasets and settings.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A pretrained decoder-only model achieves zero-shot time-series forecasting accuracy close to supervised state-of-the-art on public datasets.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"10432633a53889c67bddf3d7ba4be692d7d259a10dd2e0708d31c6f361e56127"},"source":{"id":"2310.10688","kind":"arxiv","version":4},"verdict":{"id":"a11669d4-0631-4026-827e-a3a9dde075c0","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T18:02:54.453677Z","strongest_claim":"our model ... 
whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset.","one_line_summary":"A pretrained decoder-only patched transformer achieves near state-of-the-art zero-shot forecasting performance across diverse time series datasets and settings.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That pretraining on the chosen large time-series corpus produces representations that generalize to unseen datasets and varying temporal granularities without any fine-tuning or dataset-specific adaptation.","pith_extraction_headline":"A pretrained decoder-only model achieves zero-shot time-series forecasting accuracy close to supervised state-of-the-art on public datasets."},"references":{"count":23,"sample":[{"doi":"","year":null,"title":"On the benefits of maximum likelihood estimation for regression and forecasting","work_id":"a1639cb9-9c45-4bbb-9fc4-a097a401eeb7","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Conditional time series forecasting with convolutional neural networks","work_id":"7f4b6e11-19a1-4dc6-b780-879e5a12fe12","ref_index":2,"cited_arxiv_id":"1703.04691","is_internal_anchor":true},{"doi":"","year":null,"title":"Tsmixer: An all-mlp architecture for time series forecasting","work_id":"54394ebd-5fcb-41f1-a7f5-9d0a980d589c","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"[COO+23] Cristian Challu, Kin G. Olivares, Boris N. Oreshkin, Federico Garza, Max Mergenthaler, and Artur Dubrawski. NHITS: Neural Hierarchical Interpolation for Time Series forecasting. 
In The Associ","work_id":"c851a0e6-3482-457d-b72a-65e82a9a81bb","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Llm4ts: Two-stage fine-tuning for time-series forecasting with pre-trained llms","work_id":"b6995864-2702-40f5-8597-58a3119807ef","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":23,"snapshot_sha256":"eec9f8030303f29cee0c17c1753bb68be8846bc45c61685defb4c3fa6952f8ac","internal_anchors":7},"formal_canon":{"evidence_count":3,"snapshot_sha256":"a718f7ccf92f77ac4bbf7b962f3352106e146d48b5ee9484b04cc7d5ac4acebe"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}