A Horizon-Aware Decision-Support Framework for Demand Forecasting Model Selection in Resilient Production Planning

Adolfo González; Víctor Parada

arxiv: 2602.13939 · v6 · pith:7C7X5U7B · submitted 2026-02-15 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-15 21:51 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords demand forecasting · model selection · forecast horizon · intermittent demand · error projection · adaptive selector · supply chain planning

The pith

Projecting test-horizon error metrics forward to the operational horizon improves model selection for multi-step demand forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Metric Degradation by Forecast Horizon (MDFH) procedure, which turns conventional static model evaluation into a scheme that accounts for how error rankings shift as forecasts extend further into the future. In environments with intermittent demand, high variability, and multi-step planning needs, rankings obtained over the test period often misalign with the horizons actually used for inventory and procurement decisions. On this basis the work derives a parsimonious horizon-adjusted metric and an adaptive selector that combines multiple signals when a single metric proves insufficient. A sympathetic reader would care because better-aligned model assignments can reduce cumulative forecast error in supply-chain settings where decisions rest on forecasts that extend beyond the test window.

Core claim

The Metric Degradation by Forecast Horizon procedure projects out-of-sample error metrics from the test horizon to a future operational horizon under structural stability conditions, thereby turning static evaluation into horizon-aware selection. RMSSEh emerges as the most parsimonious operational form of this projection, while the Adaptive Hybrid Selector for Intermittency and Variability serves as an extension that handles cases of metric conflict, intermittency, variability, and bias. Empirical comparisons on the Walmart, M3, M4, and M5 collections with multiple partitions and twelve-step horizons indicate that the resulting selectors remain competitive across demand structures and gain,

What carries the argument

The Metric Degradation by Forecast Horizon procedure, which projects out-of-sample error metrics from a test horizon onto a future operational horizon under assumed structural stability conditions.
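The summary does not write out the procedure's equations. As a minimal sketch of the idea, assume the structural-stability condition is expressed as a power law in the horizon: each eligible metric (MAE, RMSE, RMSSE) observed over a test horizon of h steps is projected to an operational horizon of H steps by the factor (H/h)^α. The power-law form, the default α = 0.5 (which mirrors the square-root growth of the per-step forecast-error standard deviation of a random walk), and the function names are illustrative assumptions, not the authors' MDFH definition.

```python
from typing import Dict


def project_metric(test_value: float, h_test: int, h_oper: int, alpha: float = 0.5) -> float:
    """Project an error metric observed over h_test steps to h_oper steps.

    Hypothetical power-law degradation: value * (h_oper / h_test) ** alpha.
    """
    if h_test <= 0 or h_oper <= 0:
        raise ValueError("horizons must be positive")
    if test_value < 0:
        raise ValueError("error metrics are non-negative")
    return test_value * (h_oper / h_test) ** alpha


def project_all(metrics: Dict[str, float], h_test: int, h_oper: int,
                alpha: float = 0.5) -> Dict[str, float]:
    """Apply the same hypothetical degradation factor to every eligible metric."""
    return {name: project_metric(value, h_test, h_oper, alpha) for name, value in metrics.items()}
```

With α = 0.5, a metric measured over a 3-step test window is doubled when projected to a 12-step plan, because (12/3)^0.5 = 2.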

If this is right

RMSSEh supplies a simple horizon-adjusted ranking that replaces static test-horizon selection without added complexity.
The adaptive selector improves robustness when single metrics conflict because of intermittency or variability in demand patterns.
Model selection in multi-SKU environments shifts from a static choice to a structure-sensitive assignment aligned with operational planning horizons.
Forecast accuracy in procurement and inventory tasks rises when selectors incorporate projected degradation rather than test-period performance alone.
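One way to make a per-SKU assignment structure-sensitive is sketched below as a hypothetical variant, not as the paper's RMSSEh: estimate a separate degradation exponent for each candidate model from its per-step test errors (the least-squares slope of log error on log step), extrapolate each model's mean test error to the operational horizon, and pick the model with the lowest projected error for that SKU. The log-log fit and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List


def fit_growth_exponent(step_errors: List[float]) -> float:
    """Least-squares slope of log(error) on log(step) over steps 1..h."""
    if len(step_errors) < 2 or any(e <= 0 for e in step_errors):
        raise ValueError("need at least two positive per-step errors")
    xs = [math.log(k) for k in range(1, len(step_errors) + 1)]
    ys = [math.log(e) for e in step_errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def projected_error(step_errors: List[float], h_oper: int) -> float:
    """Extrapolate the mean test error to h_oper steps using the model's own exponent."""
    h_test = len(step_errors)
    alpha = fit_growth_exponent(step_errors)
    return (sum(step_errors) / h_test) * (h_oper / h_test) ** alpha


def select_model(per_model_step_errors: Dict[str, List[float]], h_oper: int) -> str:
    """Pick the candidate with the lowest projected error at the operational horizon."""
    scores = {m: projected_error(e, h_oper) for m, e in per_model_step_errors.items()}
    return min(scores, key=lambda m: (scores[m], m))
```

For example, with per-step errors A = [1.0, 1.0, 1.0] and B = [0.3, 0.6, 0.9], model B wins on the 3-step window (mean 0.6 against 1.0), but its error grows roughly linearly with the step, so projecting to 12 steps gives about 2.4 and the selector switches to the flat-error model A.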

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the projection holds under shifting conditions, periodic reapplication of the selector could maintain alignment as demand distributions evolve in live supply chains.
The same projection logic may extend to other time-series domains such as energy load or financial returns where planning horizons exceed short test windows.
A practical test would apply the adaptive selector to streaming demand data and check whether robustness persists when new structural breaks appear.

Load-bearing premise

The projection of error metrics from the test horizon to the operational horizon rests on structural stability conditions that are assumed rather than verified across the evaluated datasets or real-world shifts.

What would settle it

Measuring actual error directly on a future operational horizon and comparing it with the values the procedure projects from the test horizon would settle it: the central claim is falsified if projected and realized errors diverge substantially on held-out series.
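The falsification check described above can be run as a simple diagnostic. A minimal sketch, assuming projected and realized errors are already available per held-out series ID: compute the relative gap between realized and projected error, flag series whose gap exceeds a tolerance, and report the average signed gap as a bias summary. The 25% tolerance and the function names are hypothetical choices, not values from the paper.

```python
from typing import Dict, List


def projection_gaps(projected: Dict[str, float], realized: Dict[str, float]) -> Dict[str, float]:
    """Relative gap (realized - projected) / projected for series present in both inputs."""
    gaps = {}
    for sid in sorted(projected.keys() & realized.keys()):
        p = projected[sid]
        if p <= 0:
            raise ValueError(f"projected error for {sid} must be positive")
        gaps[sid] = (realized[sid] - p) / p
    return gaps


def flag_diverging(projected: Dict[str, float], realized: Dict[str, float],
                   tolerance: float = 0.25) -> List[str]:
    """Series IDs whose realized error departs from the projection by more than `tolerance`."""
    return [sid for sid, g in projection_gaps(projected, realized).items() if abs(g) > tolerance]


def mean_gap(projected: Dict[str, float], realized: Dict[str, float]) -> float:
    """Average signed gap; a positive value means projections tend to understate future error."""
    gaps = projection_gaps(projected, realized)
    if not gaps:
        raise ValueError("no overlapping series")
    return sum(gaps.values()) / len(gaps)
```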

Original abstract

Demand forecasting is a critical input for resilient production planning, inventory replenishment, procurement, and capacity decisions under demand intermittency, high variability, and operational uncertainty. In these contexts, selecting forecasting models solely on the basis of fixed test-horizon performance may lead to decisions misaligned with the future planning horizons in which forecasts are used. This study proposes the Metric Degradation by Forecast Horizon (MDFH) procedure as a horizon-aware decision-support framework for selecting demand forecasting models. MDFH projects eligible out-of-sample error metrics, specifically MAE, RMSE, and RMSSE, from an observed test horizon toward future operational horizons under explicit structural-stability conditions. Based on this layer, RMSSEh is derived as a parsimonious horizon-aware selector, while the Adaptive Hybrid Selector for Intermittency and Variability (AHSIV) is proposed as an adaptive extension for structurally heterogeneous demand series. ERA, a multivariate ranking-aggregation selector, is included as a comparator. The empirical evaluation uses the Walmart, M3, M4, and M5 datasets, three training-testing partitions, 22 forecasting models, and 12-step future horizons. Results show that RMSSEh and AHSIV provide more consistent downstream volumetric alignment than ERA when assessed through ex post Global Relative Accuracy.
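The abstract does not spell out AHSIV's decision rules. The sketch below swaps in a well-known stand-in to show the general shape of an intermittency-aware hybrid selector: each series is classified with the Syntetos-Boylan scheme (average inter-demand interval ADI against 1.32, squared coefficient of variation CV² of non-zero demand sizes against 0.49); smooth series are assigned by a single horizon-adjusted metric, and all other series by the mean rank across several metrics, which loosely resembles a ranking-aggregation selector such as ERA. The routing rule, the metric names, and the functions are hypothetical.

```python
import statistics
from typing import Dict, List

# Syntetos-Boylan cut-offs for demand classification
ADI_CUTOFF = 1.32
CV2_CUTOFF = 0.49


def classify_demand(series: List[float]) -> str:
    """Label a demand series as smooth, erratic, intermittent, or lumpy."""
    nonzero = [y for y in series if y > 0]
    if not nonzero:
        return "no-demand"
    # ADI as periods per non-zero demand occurrence (one common convention)
    adi = len(series) / len(nonzero)
    # Squared coefficient of variation of the non-zero demand sizes
    cv2 = (statistics.pstdev(nonzero) / statistics.fmean(nonzero)) ** 2
    if adi < ADI_CUTOFF:
        return "smooth" if cv2 < CV2_CUTOFF else "erratic"
    return "intermittent" if cv2 < CV2_CUTOFF else "lumpy"


def mean_rank(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average rank (1 = best) of each model across metrics; lower metric values are better."""
    models = sorted(next(iter(scores.values())))
    totals = {m: 0.0 for m in models}
    for per_model in scores.values():
        # Ties within a metric are broken alphabetically by model name
        ordered = sorted(models, key=lambda m: (per_model[m], m))
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return {m: total / len(scores) for m, total in totals.items()}


def hybrid_select(series: List[float], scores: Dict[str, Dict[str, float]],
                  primary: str = "RMSSEh") -> str:
    """Hypothetical routing: one metric for smooth series, rank aggregation otherwise."""
    if classify_demand(series) == "smooth":
        best = scores[primary]
        return min(best, key=lambda m: (best[m], m))
    ranks = mean_rank(scores)
    return min(ranks, key=lambda m: (ranks[m], m))
```

For a smooth series the selector returns the winner on the primary metric; for a lumpy series such as [0, 0, 10, 0, 0, 1, 0, 0, 20] (ADI = 3, CV² ≈ 0.56) it falls back to the mean rank, so a model that is second on the primary metric but first on the other two can be chosen. All metrics are assumed to be oriented so that lower is better (for bias, use its absolute value).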

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

MDFH gives a concrete way to adjust model selection for longer horizons in demand forecasting, but the stability assumption for the projection step is not directly checked in the experiments.

The paper's main new piece is the MDFH procedure that takes error metrics from a test horizon and projects them forward to an operational horizon under structural stability. It then presents RMSSEh as the simplest version of that idea and AHSIV as an adaptive hybrid that tries to handle intermittency, variability, and metric conflicts at once. The evaluations run on the usual Walmart, M3, M4, and M5 collections with several train-test partitions and 12-step horizons, and the new selectors come out competitive against ERA while AHSIV shows some extra stability in messier series. That is useful for anyone who actually has to assign models to thousands of SKUs for multi-period inventory planning, where static test-set rankings often break down. The setup is straightforward and the datasets are standard, so the comparisons are easy to follow.

The weak point is exactly the projection step. The results show that RMSSEh and AHSIV perform well, but there are no diagnostics that compare the projected errors to what actually happens on longer horizons, no sensitivity checks across regime shifts, and no stability tests between the partitions. Without those, it is hard to tell whether the horizon adjustment is doing real work or just riding on the assumption. The abstract also skips error bars and detailed equation breakdowns, which leaves the practical size of the gains unclear.

This is the kind of paper that matters most to applied forecasting groups in supply-chain operations rather than to the core machine-learning community. It has a clear enough proposal and enough empirical grounding to go to referees, though any review will likely ask for explicit checks on the stability premise before the method can be recommended for production use.
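A between-partition stability check of the kind the letter asks for could look like the sketch below: for one series, compare the ordering of candidate models under two train-test partitions with Spearman's rank correlation, using average ranks for ties. Values near 1 indicate that the ranking carries over between partitions; values near 0 or below suggest the ordering is not stable enough to project. The function names are hypothetical, and this is one possible diagnostic, not one reported in the paper.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend the block while the next value ties with the current one
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a: List[float], b: List[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        raise ValueError("a constant ranking has no rank correlation")
    return cov / (va * vb) ** 0.5


def partition_rank_stability(errors_p1: Dict[str, float], errors_p2: Dict[str, float]) -> float:
    """Rank agreement of the models scored under both partitions."""
    models = sorted(errors_p1.keys() & errors_p2.keys())
    return spearman([errors_p1[m] for m in models], [errors_p2[m] for m in models])
```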

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Metric Degradation by Forecast Horizon (MDFH) procedure to project out-of-sample error metrics from test horizons to future operational horizons under assumed structural stability conditions. From this, it derives RMSSEh as the parsimonious realization and proposes the Adaptive Hybrid Selector for Intermittency and Variability (AHSIV) for cases involving intermittency, variability, and metric conflicts. The empirical study evaluates these selectors against ERA on the Walmart, M3, M4, and M5 datasets across multiple train-test partitions and 12-step horizons, concluding that MDFH enables coherent horizon-aware design, with RMSSEh and AHSIV performing competitively and AHSIV offering robustness in complex settings.

Significance. If the results hold, the work offers a valuable contribution to demand forecasting by shifting model selection from static test-horizon metrics to horizon-aware projections aligned with operational planning. This is particularly relevant for supply chain applications with intermittent demand. The use of multiple public datasets and partitions enhances the potential for reproducibility and generalizability.

major comments (2)

[Empirical evaluation] The comparison of RMSSEh and AHSIV on Walmart, M3, M4, and M5 datasets using multiple train-test partitions and 12-step horizons provides no diagnostics validating the structural stability assumption central to MDFH projections, such as projected versus realized errors on extended horizons or sensitivity to regime shifts. This omission undermines the claim that MDFH supplies a coherent basis for horizon-aware selector design.
[MDFH procedure description] The paper describes RMSSEh as the most parsimonious operational realization of MDFH, but without explicit equations or implementation details, it is unclear whether the horizon adjustment is parameter-free or reduces to a data-dependent scaling, raising questions about circularity in the metric definition.

minor comments (1)

[Abstract] The final sentence of the abstract states a general recommendation on treating model selection as a horizon-aware problem; this could be moved to the conclusion to avoid repetition with the empirical claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the paper.

Referee: [Empirical evaluation] The comparison of RMSSEh and AHSIV on Walmart, M3, M4, and M5 datasets using multiple train-test partitions and 12-step horizons provides no diagnostics validating the structural stability assumption central to MDFH projections, such as projected versus realized errors on extended horizons or sensitivity to regime shifts. This omission undermines the claim that MDFH supplies a coherent basis for horizon-aware selector design.

Authors: We agree that explicit validation of the structural stability assumption is important for supporting the MDFH projections. In the revised manuscript, we will add new diagnostics that compare MDFH-projected error metrics against realized errors on extended horizons where the datasets permit (e.g., by holding out additional periods in M4 and M5), along with sensitivity analyses examining performance under different train-test partitions to assess robustness to potential regime shifts. These additions will directly address the concern and provide empirical grounding for the horizon-aware claims. revision: yes
Referee: [MDFH procedure description] The paper describes RMSSEh as the most parsimonious operational realization of MDFH, but without explicit equations or implementation details, it is unclear whether the horizon adjustment is parameter-free or reduces to a data-dependent scaling, raising questions about circularity in the metric definition.

Authors: The derivation of RMSSEh from MDFH is presented in the manuscript as a direct, parameter-free scaling of the test-horizon RMSSE using a degradation factor determined solely by the difference between test and operational horizons under the stability assumption; no additional parameters are fitted to the target data, avoiding circularity. To eliminate any ambiguity, we will revise the relevant section to include the explicit mathematical equations for the RMSSEh adjustment and a clear implementation algorithm, making the procedure fully transparent. revision: yes
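For reference, RMSSE as defined for the M5 competition divides the mean squared forecast error by the mean squared one-step naive error on the training series and takes the square root. The sketch below implements that definition; the helper names are hypothetical. It also bears on the circularity question: if the horizon adjustment is a positive factor that depends only on the test and operational horizons, it multiplies every candidate's RMSSE for a given series by the same constant, so the within-series ranking of models is the same as under the static test-horizon RMSSE.

```python
import math
from typing import Dict, List, Sequence


def rmsse(train: Sequence[float], actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared scaled error, scaled by the in-sample one-step naive MSE."""
    if len(train) < 2:
        raise ValueError("training series needs at least two points")
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    naive_mse = sum((train[t] - train[t - 1]) ** 2 for t in range(1, len(train))) / (len(train) - 1)
    if naive_mse == 0:
        raise ValueError("constant training series: scale is undefined")
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / naive_mse)


def rank_models(train: Sequence[float], actual: Sequence[float],
                forecasts: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate models for one series from lowest to highest RMSSE."""
    scores = {name: rmsse(train, actual, f) for name, f in forecasts.items()}
    return sorted(scores, key=lambda name: (scores[name], name))
```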

Circularity Check

0 steps flagged

No circularity: MDFH projection and RMSSEh derivation remain independent of target outputs

The paper defines MDFH as a projection of test-horizon error metrics to operational horizons under explicitly stated structural stability conditions, then presents RMSSEh as its parsimonious operational form and AHSIV as an adaptive extension. No equations or definitions in the abstract reduce the projection step to a fitted scaling parameter on the target data itself, nor does any step rename a known result or import uniqueness via self-citation. Empirical comparisons on Walmart/M3/M4/M5 with multiple partitions supply external benchmarks, keeping the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the untested structural stability assumption required for error projection; no free parameters or invented entities are named in the abstract, but the projection step implicitly introduces a stability condition that functions as an ad hoc domain assumption.

axioms (1)

domain assumption: Structural stability conditions allow projection of out-of-sample error metrics from the test horizon to the operational horizon
Invoked to convert static evaluation into horizon-aware selection; location implied in MDFH definition

pith-pipeline@v0.9.0 · 5589 in / 1225 out tokens · 21134 ms · 2026-05-15T21:51:59.511153+00:00 · methodology
