Learning to Defer in Non-Stationary Time Series via Switching State-Space Models

Axel Carlier; Lai Xing Ng; Letian Yu; Wei Tsang Ooi; Yannis Montreuil

arxiv: 2601.22538 · v2 · pith:QIZQ57UGnew · submitted 2026-01-30 · 💻 cs.LG · stat.AP


Pith reviewed 2026-05-21 13:56 UTC · model grok-4.3

classification 💻 cs.LG stat.AP

keywords: learning to defer · non-stationary time series · switching state-space models · online learning · regret bounds · expert deferral · streaming decision making


The pith

A factorized switching state-space model updates beliefs about unqueried experts from the always-observed internal residual, enabling bounded-regret deferral decisions in non-stationary streaming time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces L2D-SLDS, an online one-stage learning-to-defer framework for non-stationary time series in which both the data distribution and expert availability change over time. It models the residuals of the internal predictor and of all candidate experts inside a shared switching linear-Gaussian state-space structure. As a result, the always-observed internal residual continuously refines beliefs about every expert that has not been queried. A learner-aware query score then trades off the immediate deferral cost against information gain about the latent states and the expected one-step improvement of the internal learner. The authors prove an oracle inequality against a time-varying learn-and-defer comparator. The inequality decomposes total regret into three terms: a query-bonus budget, an SLDS predictive-cost error, and the internal learner's dynamic regret over intervals. The method is evaluated on synthetic data and on the Melbourne, Jena, and 24-expert Delhi benchmarks. There it is competitive with or improves on contextual and non-stationary bandit baselines while deferring on fewer than two percent of real-data rounds.
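The generative structure can be made concrete with a minimal toy simulation. It assumes two regimes, scalar AR(1) dynamics for the shared global factor and for each idiosyncratic state, and fixed factor loadings. The function name `simulate_residuals` and every parameter value are illustrative assumptions, not the paper's specification.

```python
import random

def simulate_residuals(T, n_experts, seed=0):
    """Simulate residuals from a toy factorized switching linear-Gaussian model.

    Index 0 is the internal predictor; indices 1..n_experts are experts.
    All parameter values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[0.95, 0.05], [0.10, 0.90]]  # regime transition matrix (assumed)
    phi_g = [0.9, 0.5]                # global-factor AR coefficient per regime
    sigma_g = [0.3, 1.0]              # global-factor noise scale per regime
    n = n_experts + 1
    # Loading of each series on the shared factor; internal predictor fixed at 1.
    loadings = [1.0] + [rng.uniform(0.5, 1.5) for _ in range(n_experts)]
    phi_u, sigma_u, sigma_obs = 0.8, 0.2, 0.1

    z, g = 0, 0.0
    u = [0.0] * n
    regimes, residuals = [], []
    for _ in range(T):
        # 1) the discrete regime follows a two-state Markov chain
        z = 0 if rng.random() < P[z][0] else 1
        # 2) the shared global factor evolves with regime-dependent dynamics
        g = phi_g[z] * g + rng.gauss(0.0, sigma_g[z])
        # 3) each series has its own idiosyncratic AR(1) state
        u = [phi_u * uk + rng.gauss(0.0, sigma_u) for uk in u]
        # 4) residual = loading * shared factor + idiosyncratic state + noise
        e = [loadings[k] * g + u[k] + rng.gauss(0.0, sigma_obs) for k in range(n)]
        regimes.append(z)
        residuals.append(e)
    return regimes, residuals
```

For example, `regimes, res = simulate_residuals(500, 3)` returns 500 regime labels and 500 residual vectors of length 4, with the internal predictor first. Every series loads on the same factor, so the internal residual is positively correlated with each expert's residual. That correlation is the property the belief updates exploit.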

Core claim

By placing all residuals inside a factorized switching linear-Gaussian state-space model with a discrete regime, a shared global factor, and per-expert idiosyncratic states, the framework obtains continuous Bayesian updates about every unqueried expert from the internal residual alone. The resulting learner-aware query policy satisfies an oracle inequality against a time-varying learn-and-defer comparator. The regret bound decomposes additively into a query-bonus budget, the SLDS predictive-cost error E_SLDS, and the internal learner's interval dynamic regret.
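Schematically, the bound has the shape below. The notation is assumed, since the summary does not fix it: Reg_T, B_query and R^dyn_I are placeholder names. The paper's exact statement (its constants, how the intervals I are chosen, which expectations are taken) may differ.

```latex
\mathrm{Reg}_T
  \;=\; \sum_{t=1}^{T} \ell_t(\text{L2D-SLDS}) \;-\; \sum_{t=1}^{T} \ell_t(\text{comparator}_t)
  \;\le\;
  \underbrace{B_{\mathrm{query}}}_{\text{query-bonus budget}}
  \;+\; \underbrace{\mathcal{E}_{\mathrm{SLDS}}}_{\text{SLDS predictive-cost error}}
  \;+\; \underbrace{\mathcal{R}^{\mathrm{dyn}}_{\mathcal{I}}}_{\text{learner's interval dynamic regret}}
```

Read this way, each term can be reduced independently. The first shrinks with a smaller bonus-driven query budget, the second with a better-specified residual model, and the third with a base learner that has a stronger dynamic-regret guarantee.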

What carries the argument

factorized switching linear-Gaussian state-space model over all potential residuals (discrete regime, shared global factor, and per-expert idiosyncratic states) that transmits information from the always-observed internal residual to latent beliefs about every unqueried expert
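The mechanism can be illustrated in its simplest case: one regime and one conditioning step, assuming unit loading for the internal residual and independent Gaussian priors. `update_unqueried_expert` and its arguments are hypothetical names. The shared factor is the only channel through which observing the internal residual moves the belief about expert k.

```python
def update_unqueried_expert(y0, m_g, v_g, v_u0, r, b_k, m_uk, v_uk):
    """Gaussian belief update for an unqueried expert's residual (toy, one regime).

    Assumed model (illustrative):
      y0  = g + u0 + eps,  eps ~ N(0, r)   (internal residual, always observed)
      e_k = b_k * g + u_k                  (expert k residual, not observed)
    with independent priors g ~ N(m_g, v_g), u0 ~ N(0, v_u0), u_k ~ N(m_uk, v_uk).
    Returns the posterior mean and variance of e_k given y0.
    """
    prior_mean_y0 = m_g
    var_y0 = v_g + v_u0 + r
    prior_mean_ek = b_k * m_g + m_uk
    prior_var_ek = b_k ** 2 * v_g + v_uk
    cov_ek_y0 = b_k * v_g          # only the shared factor g links e_k to y0
    gain = cov_ek_y0 / var_y0      # regression coefficient of e_k on y0
    post_mean = prior_mean_ek + gain * (y0 - prior_mean_y0)
    post_var = prior_var_ek - gain * cov_ek_y0
    return post_mean, post_var
```

Take `y0=2.0, m_g=0, v_g=1, v_u0=0.5, r=0.5, b_k=1, m_uk=0, v_uk=0.5`. The expert's predicted residual moves from 0 to 1.0 and its variance drops from 1.5 to 1.0, all without querying the expert. With `b_k=0` the belief is unchanged. In a full SLDS filter, this conditioning would be one part of a per-regime Kalman update, with the factor and idiosyncratic states also propagated through time.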

If this is right

Regret decomposes cleanly into query cost, model mismatch, and the internal learner's own dynamic regret. Any improvement in either the state-space predictor or the base learner therefore tightens the corresponding term of the overall bound.
Continuous belief updating from the internal residual reduces the need to query experts purely for information, which is consistent with deferral rates below 2 percent on real data.
The same decomposition applies when expert availability changes over time: the discrete regime and the shared factor absorb global shifts, while the idiosyncratic states track individual experts.
The query score explicitly balances immediate cost against latent-state information gain and one-step learner improvement, so the policy can favor querying while the internal model is still learning.
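Such a trade-off can be sketched as an additive score with weights `w_info` and `w_learn`, using a Gaussian entropy reduction as the information-gain term. The functional form, the names, and the weights are all hypothetical. The paper's exact query score is not given in this summary, and the referee asks for it to be spelled out.

```python
import math

def gaussian_info_gain(prior_var, post_var):
    """Entropy reduction (nats) of a scalar Gaussian belief: 0.5 * ln(prior/post)."""
    return 0.5 * math.log(prior_var / post_var)

def query_score(exp_loss_internal, exp_loss_expert, query_cost,
                prior_var, post_var, learner_gain, w_info=1.0, w_learn=1.0):
    """Hypothetical learner-aware query score; query the expert when it is > 0.

    immediate: expected loss saved by deferring this round, minus the query cost
    info:      information gained about the latent state by observing the expert
    learner_gain: assumed one-step improvement of the internal learner
    """
    immediate = exp_loss_internal - exp_loss_expert - query_cost
    info = gaussian_info_gain(prior_var, post_var)
    return immediate + w_info * info + w_learn * learner_gain
```

Consider expected losses of 1.0 (internal) and 0.8 (expert), a query cost of 0.5, a variance drop from 2.0 to 1.0, and no learner gain. The score is about +0.05 with `w_info=1.0` (query) but about -0.13 with `w_info=0.5` (do not query). That sensitivity is why the weights count as free parameters.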

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same shared-factor structure could be used in other partial-observation sequential problems, such as adaptive sensor scheduling, where one always-on measurement updates beliefs about many dormant sensors.
Replacing the linear-Gaussian dynamics with a nonparametric or deep state-space model would test whether the regret decomposition survives when the model class is enlarged.
The interval-dynamic-regret term suggests the framework could be paired with any online learner that already possesses a dynamic-regret guarantee, turning the deferral layer into a modular wrapper.

Load-bearing premise

The factorized switching linear-Gaussian state-space model correctly captures the joint evolution of the internal residual and all potential expert residuals through a shared global factor and per-expert idiosyncratic states.

What would settle it

Run the method on a new non-stationary series in which the residuals of the internal predictor and the experts demonstrably fail to share a low-dimensional global factor. The modeling assumption is refuted for that setting if the measured E_SLDS term stays large as data accumulate and dominates the empirical regret, rather than the query budget or the learner's dynamic regret doing so.
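One cheap proxy for the "shared low-dimensional factor" premise, usable before fitting any SLDS, is the share of pooled residual variance captured by the leading principal component. This heuristic and the name `top_factor_share` are assumptions for illustration. The proxy ignores regime switching and temporal dynamics, and it is not the paper's E_SLDS.

```python
def top_factor_share(residuals, iters=200):
    """Share of total residual variance captured by the leading principal component.

    residuals: list of T rows, each a list of n residuals (internal + experts).
    A share near 1 is consistent with one dominant shared factor; a share near
    1/n suggests no common component. Heuristic diagnostic only.
    """
    T, n = len(residuals), len(residuals[0])
    means = [sum(row[j] for row in residuals) / T for j in range(n)]
    # Sample covariance matrix of the residual vectors.
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in residuals) / (T - 1)
            for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):  # power iteration toward the leading eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient approximates the leading eigenvalue.
    lead = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n))
    return lead / sum(cov[i][i] for i in range(n))
```

A share near 1, as for series that all load on one common factor, is consistent with the premise. A share near 1/n for n series with no common component points toward the failure case described above.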

Original abstract

Learning-to-defer (L2D) routes each decision to a system's own predictor or to an external expert. Streaming time-series settings break the offline-L2D assumptions: the data are non-stationary, expert availability shifts over time, and the internal predictor is trained online. We propose L2D-SLDS, a one-stage online L2D framework based on a factorized switching linear-Gaussian state-space model over all potential residuals: a discrete regime, a shared global factor, and per-expert idiosyncratic states. The always-observed internal residual continuously updates beliefs about every unqueried expert through the shared factor, and a learner-aware query score balances immediate cost against latent-state information gain and one-step learner improvement. We prove an oracle inequality against a time-varying learn-and-defer comparator, decomposing regret into a query-bonus budget, an SLDS predictive-cost-error term E_SLDS, and the internal learner's interval dynamic regret. On synthetic, Melbourne, Jena, and 24-expert Delhi benchmarks, L2D-SLDS is competitive with or improves on contextual- and non-stationary-bandit baselines while deferring on <2% of real-data rounds.
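For the discrete-regime part of the model, a single forward-filtering step looks like the sketch below. It assumes a known transition matrix and a Gaussian predictive density for the internal residual under each regime. The function name and its arguments are hypothetical. In a full switching linear-Gaussian model, the per-regime predictive densities would come from per-regime Kalman filters. The exact posterior there is a mixture whose size grows over time, so practical filters collapse it; this sketch leaves that step out.

```python
import math

def regime_filter_step(belief, P, y, pred_mean, pred_var):
    """One forward-filter step for the discrete regime z_t.

    belief[i]   = p(z_{t-1} = i | data up to t-1)
    P[i][j]     = p(z_t = j | z_{t-1} = i)
    pred_mean[j], pred_var[j]: assumed Gaussian predictive mean and variance
                  of the observed internal residual y under regime j.
    Returns p(z_t = j | data up to t) for each regime j.
    """
    n = len(belief)
    # Predict: push the previous belief through the transition matrix.
    prior = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Likelihood of the observed internal residual under each regime.
    lik = [math.exp(-0.5 * (y - pred_mean[j]) ** 2 / pred_var[j])
           / math.sqrt(2.0 * math.pi * pred_var[j]) for j in range(n)]
    # Update and normalize.
    unnorm = [prior[j] * lik[j] for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Take a uniform prior, `P=[[0.9, 0.1], [0.1, 0.9]]`, zero predictive means, and predictive variances 0.25 and 4.0. An internal residual of 3.0 then shifts almost all mass to the high-variance regime, while a residual of 0.0 gives the posterior [0.8, 0.2].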

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, circularity audit, and axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a workable online deferral method for drifting time series by tracking residuals with a factorized switching state-space model, but the regret bound rests on that model capturing the right cross-expert correlations.


The main takeaway is that this work supplies a one-stage online learning-to-defer scheme for non-stationary series. It models the internal predictor's residual and all expert residuals together in a factorized switching linear-Gaussian state-space model with one shared global factor plus per-expert states. Because the internal residual is always seen, the shared factor lets the system keep updating its beliefs about experts it has not queried yet. The query decision then weighs immediate cost against both information gain on the latent states and one-step improvement to the learner itself. They also derive an oracle inequality that splits regret into a query-budget term, an SLDS predictive error term, and the internal learner's interval dynamic regret against a time-varying comparator.

Referee Report

1 major / 2 minor

Summary. The paper proposes L2D-SLDS, an online one-stage learning-to-defer framework for non-stationary time series. It models all potential residuals via a factorized switching linear-Gaussian state-space model (discrete regime, shared global factor, per-expert idiosyncratic states) so that the always-observed internal residual updates beliefs about unqueried experts. A learner-aware query score balances cost against information gain and learner improvement. The central theoretical claim is an oracle inequality against a time-varying learn-and-defer comparator that decomposes regret into a query-bonus budget, an SLDS predictive-cost-error term E_SLDS, and the internal learner's interval dynamic regret. Experiments on synthetic data plus Melbourne, Jena, and 24-expert Delhi benchmarks report competitive performance with deferral rates below 2%.

Significance. If the oracle inequality is non-vacuous and E_SLDS can be controlled, the work supplies a principled online deferral method with regret guarantees that explicitly accounts for non-stationarity and partial expert observation. The regret decomposition and the use of a shared global factor to propagate information from the internal residual are concrete strengths; the low deferral rates on real benchmarks further suggest practical utility over contextual-bandit baselines.

major comments (1)

[Abstract / Theoretical Analysis] The oracle inequality decomposes total regret into query-bonus budget + E_SLDS + internal learner's interval dynamic regret. Control of the E_SLDS term (and hence usefulness of the bound) rests on the claim that the factorized switching linear-Gaussian SSM correctly encodes all cross-expert residual correlations via one shared global factor plus per-expert idiosyncratic states. The manuscript should state the precise conditions under which this low-rank structure is sufficient for posterior updates on unqueried experts to remain accurate; without such a statement or a counter-example analysis, it is unclear whether E_SLDS vanishes with infinite data when the true correlation structure deviates from the assumed form.

minor comments (2)

[Method] The description of how SLDS parameters are learned online and the exact functional form of the query score (including the information-gain term) should be expanded so that the algorithm is fully reproducible from the text.
[Experiments] Report the precise values of the query-score hyperparameters used on each benchmark and any sensitivity analysis.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of the work's significance and for the constructive major comment on the theoretical analysis. We address the point below.


Referee: [Abstract / Theoretical Analysis] The oracle inequality decomposes total regret into query-bonus budget + E_SLDS + internal learner's interval dynamic regret. Control of the E_SLDS term (and hence usefulness of the bound) rests on the claim that the factorized switching linear-Gaussian SSM correctly encodes all cross-expert residual correlations via one shared global factor plus per-expert idiosyncratic states. The manuscript should state the precise conditions under which this low-rank structure is sufficient for posterior updates on unqueried experts to remain accurate; without such a statement or a counter-example analysis, it is unclear whether E_SLDS vanishes with infinite data when the true correlation structure deviates from the assumed form.

Authors: We thank the referee for this precise observation. The factorized SLDS posits a shared global factor to capture common non-stationary trends and cross-expert residual correlations, with per-expert idiosyncratic states handling individual deviations; this is the modeling choice that enables the internal residual to update beliefs about unqueried experts. The oracle inequality is derived with respect to a comparator that is optimal within the assumed model class, and E_SLDS is the excess predictive cost incurred by using the SLDS posterior rather than the true residual distribution. Under the assumption that the true residuals are generated from (or well-approximated by) this factorized structure, standard consistency results for linear-Gaussian filtering imply that the posteriors concentrate and E_SLDS vanishes with infinite data. In the revision we will add an explicit statement of these modeling assumptions immediately preceding the oracle inequality, together with a short remark that the bound is model-dependent and that E_SLDS need not vanish under gross misspecification of the correlation rank. We believe this clarification addresses the request for precise conditions without a separate counter-example section, as the practical utility of the shared-factor mechanism remains even under moderate misspecification.
revision: yes
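A toy calculation illustrates the point raised in the major comment and conceded in the rebuttal. Suppose the true cross-series correlation has rank two but the model assumes one shared factor. Then the posterior-mean predictor for an unqueried expert carries an excess error that does not shrink with more data. The setup, the unit loadings, and the function name are assumptions chosen for illustration. The squared-error excess here is only an analogue of E_SLDS, not its definition in the paper.

```python
import random

def excess_mse_under_misspecification(T, seed=0):
    """Excess squared error of a one-factor posterior-mean predictor when the
    truth has two factors (toy, static Gaussians).

    Truth (illustrative): y0 = f1 + f2 + eps, e_k = f1 - f2, with f1, f2 ~ N(0, 1)
    and eps ~ N(0, 0.25), so cov(e_k, y0) = 0 and the best predictor of e_k from
    y0 is 0. The assumed one-factor model puts y0 and e_k on one factor g with
    unit loadings and var(g) = 2, so its gain is 2 / 2.25.
    """
    rng = random.Random(seed)
    gain_model = 2.0 / 2.25
    sq_model = sq_oracle = 0.0
    for _ in range(T):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y0 = f1 + f2 + rng.gauss(0, 0.5)  # std 0.5, i.e. variance 0.25
        e_k = f1 - f2                     # expert residual, never observed
        sq_model += (e_k - gain_model * y0) ** 2
        sq_oracle += e_k ** 2             # oracle prediction is 0
    return (sq_model - sq_oracle) / T
```

The excess settles near (2/2.25)^2 × 2.25 = 4/2.25 ≈ 1.78, whether T is 2,000 or 50,000. The error comes from the wrong cross-covariance (assumed 2, true 0), not from estimation noise.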

Circularity Check

0 steps flagged

No significant circularity detected in the oracle inequality or SLDS-based regret decomposition


The paper states an oracle inequality that decomposes total regret into a query-bonus budget term, the SLDS predictive-cost-error term E_SLDS, and the internal learner's interval dynamic regret. This is a standard-style regret bound derived under the modeling assumption of a factorized switching linear-Gaussian state-space model. The decomposition follows from the usual telescoping and bounding arguments, not from reducing any term to a fitted quantity by construction or from redefining the target via the inputs. No self-definitional steps, fitted-input-called-prediction patterns, or load-bearing self-citations are exhibited in the provided claims or abstract. The SLDS factorization is an explicit modeling choice whose validity is external to the bound itself, and the bound is presented as holding conditionally on that model, without circular renaming or ansatz smuggling. The derivation is therefore not circular: whether the bound is informative depends on an external modeling assumption, not on quantities defined from the bound's own output.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the modeling assumption that residuals can be factorized into a discrete regime, one shared global factor, and per-expert idiosyncratic states, plus the existence of a well-defined time-varying learn-and-defer comparator for the regret analysis.

free parameters (1)

query score hyperparameters
The learner-aware query score that balances immediate cost, latent-state information gain, and one-step learner improvement necessarily contains tunable weights or thresholds not fixed by the model equations.

axioms (1)

domain assumption: The joint distribution of internal and expert residuals admits a factorized switching linear-Gaussian state-space representation.
Invoked to justify continuous belief updates about unqueried experts from the always-observed internal residual.

pith-pipeline@v0.9.0 · 5766 in / 1449 out tokens · 32207 ms · 2026-05-21T13:56:11.608908+00:00
