TimeRouter: Efficient and Adaptive Routing of Time-Series Foundation Models

Anderson Schneider; Dongjin Song; Kanghui Ning; Kashif Rasul; Yuriy Nevmyvaka; Yushan Jiang

arxiv: 2606.11625 · v1 · pith:LRGUKERSnew · submitted 2026-06-10 · 💻 cs.LG

Kanghui Ning, Yushan Jiang, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Dongjin Song

Pith reviewed 2026-06-27 10:17 UTC · model grok-4.3

classification 💻 cs.LG

keywords time-series forecasting · foundation models · model routing · ensemble methods · adaptive selection · lightweight inference · TSFM

The pith

TimeRouter routes among time-series foundation models with a learned head, selective gate, and ensemble fallback to reach 0.6765 LB MASE on GIFT-EVAL without LLM calls at inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Time-series foundation models show different strengths across forecasting regimes, so no single model wins on every task. Existing agentic systems often hand the choice to an LLM controller, which adds heavy inference cost. TimeRouter replaces that controller with a small learned routing head that scores experts, a selective gate that decides whether to use one expert or fall back to an ensemble, and an ensemble option when uncertainty is high. On the GIFT-EVAL leaderboard the method records the lowest MASE yet reported. Ablations inside the work indicate that both the composition of the model pool and the presence of the gate contribute to the gains.

Core claim

TimeRouter combines a learned routing head, a selective gate, and an ensemble fallback to adaptively select among a pool of pretrained time-series foundation models. The routing head is trained discriminatively on input features to predict which expert will perform best; the gate then either routes to the top expert or triggers the ensemble when no expert is sufficiently confident. This design captures empirical complementarity across the pool at far lower cost than LLM-based selection and produces an LB MASE of 0.6765 on GIFT-EVAL.
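
The routing decision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two toy experts, the score dictionary, the `HORIZON` constant, and the threshold `tau` are all hypothetical, and the unweighted-mean fallback is an assumption (the source only says that an ensemble is used).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecast = List[float]
Expert = Callable[[Sequence[float]], Forecast]

HORIZON = 3  # hypothetical forecast length for the toy experts


def last_value(context: Sequence[float]) -> Forecast:
    """Toy expert: repeat the last observed value."""
    return [context[-1]] * HORIZON


def context_mean(context: Sequence[float]) -> Forecast:
    """Toy expert: repeat the mean of the context window."""
    return [sum(context) / len(context)] * HORIZON


def route(
    context: Sequence[float],
    experts: Dict[str, Expert],
    scores: Dict[str, float],
    tau: float = 0.6,  # assumed confidence threshold for the gate
) -> Tuple[Forecast, str]:
    """Commit to the top-scoring expert if its score clears tau, else ensemble."""
    best = max(scores, key=scores.get)
    if scores[best] >= tau:
        return experts[best](context), best
    # Fallback (assumption): unweighted mean over every expert's forecast.
    forecasts = [expert(context) for expert in experts.values()]
    mean = [sum(step) / len(forecasts) for step in zip(*forecasts)]
    return mean, "ensemble"


experts = {"last_value": last_value, "context_mean": context_mean}
context = [1.0, 2.0, 3.0, 6.0]

confident = route(context, experts, {"last_value": 0.9, "context_mean": 0.1})
uncertain = route(context, experts, {"last_value": 0.55, "context_mean": 0.45})
print(confident)  # single-expert path
print(uncertain)  # ensemble-fallback path
```

In this sketch the scores are passed in directly; in the system described here they would come from the learned routing head.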

What carries the argument

The learned routing head that maps input features to expert scores, together with the selective gate that thresholds those scores to choose single-model routing versus ensemble fallback.

If this is right

Pool composition directly affects how much the router can improve over any single model or static ensemble.
Selective gating outperforms both always-route-to-one and always-ensemble strategies on the evaluated tasks.
A modular routing layer can be inserted into future agentic time-series systems without changing the underlying foundation models.
Ablation results indicate that both pool composition and the selective gate contribute to the reported gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lightweight routing pattern could be tested on pools of non-time-series foundation models where inference cost is also a concern.
If the router generalizes across datasets outside GIFT-EVAL, it would reduce the need for per-task model selection in deployed forecasting pipelines.
Curating pools that deliberately maximize complementarity may become a design goal once routing overhead is shown to be low.

Load-bearing premise

The empirical complementarity across the pool of pretrained TSFMs can be captured reliably by a lightweight discriminative router without requiring LLM-based selection at inference time.

What would settle it

An ablation that removes the learned routing head and replaces it with uniform random expert selection while keeping the same pool and gate; if the resulting MASE on GIFT-EVAL equals or beats the reported 0.6765, the claim that the router learns useful complementarity would be falsified.
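
A rough sketch of how such a comparison could be scored is given below. It is illustrative only: the per-series MASE values, the expert names, and the "learned" choices are made up, the gate is omitted for brevity, and averaging MASE with a plain mean is an assumption rather than the GIFT-EVAL aggregation rule. The `mase` helper uses the standard definition (forecast MAE divided by the in-sample MAE of a seasonal-naive forecast with period `m`).

```python
import random
from typing import Dict, List, Sequence


def mase(y_true: Sequence[float], y_pred: Sequence[float],
         y_insample: Sequence[float], m: int = 1) -> float:
    """Forecast MAE scaled by the in-sample seasonal-naive MAE (period m)."""
    scale = sum(
        abs(y_insample[t] - y_insample[t - m]) for t in range(m, len(y_insample))
    ) / (len(y_insample) - m)
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
    return mae / scale


def mean_mase(per_series: List[Dict[str, float]], choices: List[str]) -> float:
    """Average the MASE of whichever expert was chosen for each series."""
    return sum(row[c] for row, c in zip(per_series, choices)) / len(per_series)


# Hypothetical per-series MASE for two experts "A" and "B".
per_series = [
    {"A": 0.5, "B": 1.0},
    {"A": 1.2, "B": 0.6},
    {"A": 0.8, "B": 0.9},
]

# Stand-in for the learned head's picks (made up for illustration).
learned_choices = ["A", "B", "A"]
learned = mean_mase(per_series, learned_choices)

# Ablation: uniform random expert selection, averaged over many draws.
rng = random.Random(0)
draws = 1000
random_avg = sum(
    mean_mase(per_series, [rng.choice(sorted(row)) for row in per_series])
    for _ in range(draws)
) / draws

print(f"learned picks: {learned:.4f}  random picks: {random_avg:.4f}")
```

If the random-selection average matched or beat the learned picks on real data, the router would not be adding value beyond the pool and gate.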

Figures

Figures reproduced from arXiv: 2606.11625 by Anderson Schneider, Dongjin Song, Kanghui Ning, Kashif Rasul, Yuriy Nevmyvaka, Yushan Jiang.

**Figure 1.** Overview of TimeRouter. Given an input context, the router produces routing scores over a pool of time-series foundation models (TSFMs). A selective gate determines whether to commit to the top expert or defer to an ensemble fallback when confidence is low, enabling adaptive expert selection across heterogeneous forecasting regimes. each binary classifier is trained by minimising the expected binary cros…

**Figure 2.** Gate ablation on TimeRouter, stratified by GIFT-EVAL term (LB MASE, lower is better). Same head and fallback in both variants.

| Head | LB MASE | ∆ (bp) |
| --- | --- | --- |
| XGBoost (deployed) | 0.6765 | reference |
| LightGBM | 0.6762 | −3 |
| Random Forest | 0.6776 | +11 |
| MLP | 0.6787 | +22 |
| Logistic Regression | 0.6836 | +71 |
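
The ∆ (bp) values listed with Figure 2 can be reproduced with a few lines of arithmetic. The sketch assumes that one basis point equals 0.0001 MASE and that deltas are taken against the deployed XGBoost head; both are inferred from the numbers themselves, not stated in the text.

```python
# LB MASE per routing head, as listed with Figure 2.
lb_mase = {
    "XGBoost (deployed)": 0.6765,
    "LightGBM": 0.6762,
    "Random Forest": 0.6776,
    "MLP": 0.6787,
    "Logistic Regression": 0.6836,
}

BP = 1e-4  # assumption: 1 basis point = 0.0001 MASE
baseline = lb_mase["XGBoost (deployed)"]

# Delta against the deployed head, rounded to whole basis points.
delta_bp = {head: round((value - baseline) / BP) for head, value in lb_mase.items()}

for head, delta in delta_bp.items():
    print(f"{head:22s} {lb_mase[head]:.4f} {delta:+d} bp")
```

Under these assumptions the computed deltas agree with the listed ones, and the whole spread across heads is under 75 bp.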

Original abstract

Time-series foundation models (TSFMs) are increasingly explored as predictive experts within emerging agentic time-series systems. However, TSFMs exhibit heterogeneous inductive biases, and no single model consistently dominates across forecasting regimes, making expert selection a critical challenge. Existing systems often delegate this decision to LLM-based controllers, incurring substantial inference overhead. We present TimeRouter, an efficient routing framework that leverages empirical complementarity across a pool of pretrained TSFMs through lightweight discriminative routing, selective gating, and ensemble fallback. Concretely, TimeRouter combines a learned routing head, a selective gate, and an ensemble fallback, enabling adaptive expert selection without invoking an LLM at inference time. TimeRouter achieves state-of-the-art performance on the GIFT-EVAL leaderboard, with an LB MASE of 0.6765. Beyond benchmark performance, our ablation studies provide empirical insights into TSFM routing design, highlighting the importance of pool composition and selective gating. Taken together, these results position TimeRouter as a modular and lightweight routing layer for future agentic time-series systems built upon foundation-model pools. Our code is available at https://github.com/UConn-DSIS/TimeRouter.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TimeRouter shows a working lightweight router plus gate plus fallback that hits SOTA MASE 0.6765 on GIFT-EVAL and ships the code.

TimeRouter gets state-of-the-art performance on the GIFT-EVAL leaderboard by using a learned routing head, a selective gate, and an ensemble fallback to choose among pretrained time-series foundation models. The main point is that this setup avoids calling an LLM at inference time while still exploiting the fact that different models are good at different regimes.

The new element is the three-component design applied specifically to pools of TSFMs. It is not a theoretical advance but a practical modular layer for agentic time-series systems. The ablations they ran on pool composition and the importance of selective gating are useful for understanding the design choices.

What the paper does well is release the code at the GitHub link. That makes the empirical claim checkable. They also position it clearly as a drop-in component rather than claiming to solve the general routing problem.

The soft spots are around the evaluation. The abstract gives the MASE number but does not describe the baselines used for comparison or whether statistical tests were run. Without the full paper's details on data splits and held-out performance, it is hard to know how robust the 0.6765 is. The weakest assumption is that a lightweight router can reliably pick the right expert based on the input without needing more complex selection. Since they provide code and a leaderboard entry, this can be tested directly.

This paper is for people working on time-series foundation models and agentic pipelines that need low-latency expert selection. A reader in that niche would get a concrete implementation and some design insights from the ablations.

It deserves a serious referee because the central result is an empirical one that can be falsified with the released materials, and the subfield benefits from documented routing methods even if they are engineering-focused.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces TimeRouter, a routing framework for pools of pretrained time-series foundation models (TSFMs). It combines a learned routing head, a selective gate, and an ensemble fallback to enable adaptive expert selection at inference time without invoking an LLM. The central empirical claim is state-of-the-art performance on the GIFT-EVAL leaderboard (LB MASE of 0.6765), with ablation studies highlighting the roles of pool composition and selective gating; code is released.

Significance. If the reported result is reproducible and properly controlled, the work supplies a lightweight, modular routing layer that exploits empirical complementarity among TSFMs while avoiding LLM inference overhead. The explicit release of code and the ablation-based insights into design choices constitute concrete strengths for an empirical contribution in this area.

major comments (1)

[Abstract] The headline claim of SOTA performance (LB MASE = 0.6765) is presented without any information on the set of baselines, the statistical significance of the improvement, the train/validation/test splits, or confirmation that the reported MASE is computed on held-out data. These details are load-bearing for any empirical SOTA assertion and cannot be assessed from the supplied description.

minor comments (1)

The abstract states that ablations highlight the importance of pool composition and selective gating but does not enumerate the exact ablation configurations or report quantitative deltas for each component.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the abstract. The concern is valid, and we will revise the abstract in the next version to provide the requested context on baselines, evaluation protocol, and held-out data while preserving brevity.

Referee: [Abstract] The headline claim of SOTA performance (LB MASE = 0.6765) is presented without any information on the set of baselines, the statistical significance of the improvement, the train/validation/test splits, or confirmation that the reported MASE is computed on held-out data. These details are load-bearing for any empirical SOTA assertion and cannot be assessed from the supplied description.

Authors: We agree that the abstract, as currently written, does not supply these supporting details. The full manuscript (Section 4) describes the GIFT-EVAL leaderboard comparison against the full set of submitted TSFMs and routing baselines, confirms that MASE is computed on the official held-out test splits, and reports the exact train/validation/test partitioning used for any learned components. Statistical significance is not currently quantified in the paper. We will revise the abstract to explicitly note the leaderboard setting, held-out evaluation, and pool of baselines. We will also add a brief statement on statistical significance if space allows, or move the claim to a more qualified phrasing.

revision: yes
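
One way the significance question in this exchange could be examined is a paired bootstrap over per-task scores. The sketch below is a generic technique, not something the paper reports: the per-task MASE values for the router and the baseline are invented for illustration, and `paired_bootstrap_ci` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple


def paired_bootstrap_ci(
    a: Sequence[float],
    b: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI for the mean paired difference a[i] - b[i]."""
    rng = random.Random(seed)
    diffs: List[float] = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Resample task-level differences with replacement and record each mean.
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-task MASE values (not taken from the paper).
router_mase = [0.61, 0.70, 0.66, 0.72, 0.64, 0.69]
baseline_mase = [0.63, 0.74, 0.67, 0.75, 0.66, 0.70]

lo, hi = paired_bootstrap_ci(router_mase, baseline_mase)
print(f"95% CI for mean MASE difference: [{lo:.4f}, {hi:.4f}]")
```

An interval that lies entirely below zero would suggest the router's improvement is not an artifact of which tasks happened to be sampled; an interval straddling zero would call for the more qualified phrasing the authors mention.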

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical routing system (learned head + selective gate + ensemble) evaluated on the external GIFT-EVAL leaderboard. No derivation chain, equations, fitted predictions, or first-principles results are present that could reduce to inputs by construction. Ablations are described as empirical, code is released, and the SOTA claim (MASE 0.6765) is benchmark performance rather than a self-referential quantity. No load-bearing self-citations or ansatzes are invoked in the supplied text.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that TSFMs possess complementary inductive biases that a small learned router can exploit; the routing head itself introduces fitted parameters whose values are not reported.

free parameters (1)

parameters of the learned routing head
The routing head is trained on data to produce routing decisions; its weights constitute free parameters fitted to the training distribution.
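
To make "fitted parameters" concrete, the sketch below trains one binary classifier per expert and uses the resulting probabilities as routing scores. It is a stand-in under stated assumptions: plain logistic regression fitted by gradient descent on the binary cross-entropy replaces the deployed XGBoost head, the one-dimensional features and the "best expert" labels are invented, and `train_binary_head` and `score` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Head = Tuple[List[float], float]  # (weights, bias): the fitted free parameters


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_binary_head(X: Sequence[Sequence[float]], y: Sequence[int],
                      lr: float = 0.5, epochs: int = 500) -> Head:
    """Fit logistic regression by batch gradient descent on binary cross-entropy."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(head: Head, x: Sequence[float]) -> float:
    """Probability that this head's expert is the best choice for input x."""
    w, b = head
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented training rows: one feature, and which expert did best on each row.
X = [[-2.0], [-1.0], [1.0], [2.0]]
best_expert = ["A", "A", "B", "B"]

# One binary head per expert: label 1 where that expert was best, else 0.
heads: Dict[str, Head] = {
    name: train_binary_head(X, [1 if lbl == name else 0 for lbl in best_expert])
    for name in ("A", "B")
}

scores = {name: score(head, [1.5]) for name, head in heads.items()}
print(scores)  # expert "B" should receive the higher score for a positive feature
```

Every weight and bias in `heads` is fitted to the training rows, which is what the ledger counts as the free parameters of the routing head.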

axioms (2)

domain assumption: TSFMs exhibit heterogeneous inductive biases, with no single model dominating across regimes
Explicitly stated in the abstract as the motivation for routing.
domain assumption: Empirical complementarity across the chosen pool of pretrained TSFMs can be leveraged by lightweight discriminative routing
Required for the selective gate and ensemble fallback to improve over single-model or LLM routing baselines.

Reference graph

Works this paper leans on

16 extracted references · 2 linked inside Pith

[1] Aksu, T., Woo, G., Liu, J., Liu, X., Liu, C., Savarese, S., Xiong, C., and Sahoo, D. Gift-eval: A benchmark for general time series forecasting model evaluation. arXiv preprint arXiv:2410.10393,

[2] Ansari, A. F., Stella, L., Turkmen, C., Zhang, X., Mercado, P., Shen, H., Shchur, O., Rangapuram, S. S., Pineda Arango, S., Kapoor, S., et al. Chronos: Learning the language of time series. arXiv preprint arXiv:2403.07815,

[3] Ansari, A. F. et al. Chronos-2: From univariate to universal forecasting. arXiv preprint arXiv:2510.15821,

[4] Auer, A., Podest, P., Klotz, D., Böck, S., Klambauer, G., and Hochreiter, S. Tirex: Zero-shot forecasting across long and short horizons with enhanced in-context learning. arXiv preprint arXiv:2505.23719,

[5] Cao, D., Gee, M., Liu, J., Wang, H., Yang, W., Wang, R., and Liu, Y. Conversational time series foundation models: Towards explainable and effective forecasting. arXiv preprint arXiv:2512.16022,

[6] Das, S. S. S., Goyal, P., Parmar, M., Song, Y., Le, L. T., Miculicich, L., Yoon, J., Zhang, R., Palangi, H., and Pfister, T. Synapse: Adaptive arbitration of complementary expertise in time series foundational models. arXiv preprint arXiv:2511.05460,

[7] Ekambaram, V., Jati, A., Nguyen, N., Sinthong, P., and Kalagnanam, J. Tiny time mixers (ttms): Fast pre-trained models for enhanced zero/few-shot forecasting of multivariate time series. arXiv preprint arXiv:2401.03955,

[8] Garza, A. and Rosillo, R. Timecopilot. arXiv preprint arXiv:2509.00616,

[9] Graf, L. et al. Flowstate: A sampling-rate-invariant ssm-based time-series foundation model. arXiv preprint arXiv:2508.05287,

[10] Liu, C. et al. Moirai 2.0: When less is more for time series forecasting. arXiv preprint arXiv:2511.11698, 2025a. Liu, X. et al. Moirai-moe: Empowering time series foundation models with sparse mixture of experts. In International Conference on Machine Learning (ICML),

[11] arXiv:2410.10469. Liu, Y. et al. Sundial: A native flexible decoder transformer for time series. In International Conference on Machine Learning (ICML, Oral), 2025b. arXiv:2502.00816. Mozannar, H. and Sontag, D. Consistent estimators for learning to defer to an expert. In International Conference on Machine Learning (ICML),

[12] URL https://arxiv.org/abs/2310.08278. Salesforce AI Research. Moiraiagent: An agentic framework for context-aware time-series forecasting. Salesforce AI Research blog post, https://www.salesforce.com/blog/moiraiagent/,

[13] Shi, H.-N., Huang, T.-J., Han, L., Zhan, D.-C., and Ye, H.-J. One-embedding-fits-all: Efficient zero-shot time series forecasting by a model zoo. arXiv preprint arXiv:2509.04208,

[14] Wen, Y., Gifford, W. M., Reddy, C., Nguyen, L. M., Kalagnanam, J., and Julius, A. A. Revisiting the generic transformer: Deconstructing a strong baseline for time series foundation models. arXiv preprint arXiv:2602.06909,

[15] Yu, A., Maddix, D. C., Ansari, A. F., Mahoney, M. W., et al. Understanding the implicit biases of design choices for time series foundation models. arXiv preprint arXiv:2510.19236,

[16] The per-FM CV scores are computed from context-tail validation windows for both training and test inputs and used as routing features as well as ensemble weights. Feature map ϕ. The feature vector for each (series, cutoff) row concatenates four blocks; total dimension d = 165 + 35K (d = 305 for the four-FM pool). (i) Context-window statistics (31 dims): 18 time...
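
The dimension formula in this excerpt can be checked directly, and the use of per-FM CV scores as ensemble weights can be illustrated. Two assumptions are made here: K is read as the number of foundation models in the pool (consistent with d = 305 for four models), and the weights are taken as normalized inverse CV errors, which the excerpt does not specify. The model names and scores are invented.

```python
from typing import Dict


def feature_dim(num_models: int) -> int:
    """Total feature dimension d = 165 + 35K, with K assumed to be the pool size."""
    return 165 + 35 * num_models


def inverse_error_weights(cv_scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed weighting: lower CV error gives a larger, normalized weight."""
    inverse = {name: 1.0 / err for name, err in cv_scores.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


print(feature_dim(4))  # four-FM pool

# Hypothetical per-FM CV errors from context-tail validation windows.
weights = inverse_error_weights({"fm_a": 0.5, "fm_b": 1.0})
print(weights)
```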

