High-dimensional point forecast combinations for emergency department demand

Borame Lee Dickens; Esther Li Wen Choo; John Abishgenadan; Jue Tao Lim; Kelvin Bryan Tan; Kenwin Maung; Peihong Guo; Pei Ma; Wen Ye Loh

arxiv: 2501.11315 · v1 · submitted 2025-01-20 · 📊 stat.AP · q-bio.QM· stat.ML

High-dimensional point forecast combinations for emergency department demand

Peihong Guo , Wen Ye Loh , Kenwin Maung , Esther Li Wen Choo , Borame Lee Dickens , Kelvin Bryan Tan , John Abishgenadan , Pei Ma

show 1 more author

Jue Tao Lim

This is my paper

Pith reviewed 2026-05-23 05:34 UTC · model grok-4.3

classification 📊 stat.AP q-bio.QMstat.ML

keywords emergency department admissionsforecast combinationstime series forecastingcause-specific predictionshigh-dimensional covariatesmodel uncertaintyhealthcare demand

0 comments

The pith

Forecast combinations outperform individual models for emergency department admissions in more than half of scenarios across causes and horizons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether combining predictions from many different forecasting models can reduce uncertainty when projecting emergency department admissions broken down by specific causes. It applies the approach to national time series that include lagged weather, air quality, and admission data for 16 causes. Simple averaging of the individual forecasts produces accuracies from roughly 4 to 24 percent and beats single models with statistical significance in over 50 percent of the cause-horizon combinations examined. Adding large numbers of covariates or summing the cause-specific forecasts to obtain total admissions yields only modest further gains. The work shows how combinations can serve as a practical hedge when disease dynamics differ enough that no one model fits all cases equally well.

Core claim

High-dimensional forecast combination schemes applied to cause-specific ED admission series with extensive lagged meteorological, pollutant, and admission covariates achieve forecast accuracies of 3.81 to 23.54 percent and outperform the individual component models in a statistically significant manner in more than 50 percent of scenarios across all admission categories and forecast horizons.

What carries the argument

High-dimensional forecast combination schemes that aggregate outputs from numerous individual models built with varied lag structures and combination rules.

If this is right

Cause-specific forecasts supply finer-grained guidance for allocating staff and beds than total-admission forecasts alone.
Forecast combinations reduce the risk that a single poorly suited model will produce large errors when disease patterns shift.
Aggregating cause-specific predictions offers a modest but measurable improvement for all-cause ED demand estimates.
High-dimensional covariate sets can be incorporated without requiring model-by-model selection when the goal is overall accuracy.
The same combination framework can be applied to other admission categories or longer horizons where model uncertainty is high.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same averaging approach could be tested on hospital-level rather than national data to check whether local heterogeneity changes the size of the gains.
Real-time updating of the combination weights with each new observation might further stabilize forecasts during sudden surges.
Extending the method to joint prediction of admissions and length-of-stay could link demand forecasts more directly to capacity planning.

Load-bearing premise

The chosen collection of base models, lag choices, and combination methods is representative enough of possible approaches that the performance comparison reflects the true advantage of combinations rather than an artifact of the selected set.

What would settle it

Re-running the full set of comparisons on the same data but with an expanded or entirely different library of base models and finding that the share of statistically significant wins for combinations falls below 50 percent.

read the original abstract

Current work on forecasting emergency department (ED) admissions focuses on disease aggregates or singular disease types. However, given differences in the dynamics of individual diseases, it is unlikely that any single forecasting model would accurately account for each disease and for all time, leading to significant forecast model uncertainty. Yet, forecasting models for ED admissions to-date do not explore the utility of forecast combinations to improve forecast accuracy and stability. It is also unknown whether improvements in forecast accuracy can be yield from (1) incorporating a large number of environmental and anthropogenic covariates or (2) forecasting total ED causes by aggregating cause-specific ED forecasts. To address this gap, we propose high-dimensional forecast combination schemes to combine a large number of forecasting individual models for forecasting cause-specific ED admissions over multiple causes and forecast horizons. We use time series data of ED admissions with an extensive set of explanatory lagged variables at the national level, including meteorological/ambient air pollutant variables and ED admissions of all 16 causes studied. We show that the simple forecast combinations yield forecast accuracies of around 3.81%-23.54% across causes. Furthermore, forecast combinations outperform individual forecasting models, in more than 50% of scenarios (across all ED admission categories and horizons) in a statistically significant manner. Inclusion of high-dimensional covariates and aggregating cause-specific forecasts to provide all-cause ED forecasts provided modest improvements in forecast accuracy. Forecasting cause-specific ED admissions can provide fine-scale forward guidance on resource optimization and pandemic preparedness and forecast combinations can be used to hedge against model uncertainty when forecasting across a wide range of admission categories.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Combinations improve ED cause-specific forecasts in over half the cases tested, but the significance claims are likely inflated by uncorrected multiple testing.

read the letter

The paper evaluates forecast combinations on 16 cause-specific ED admission series using a large set of lagged meteorological and air-pollutant covariates. Simple combinations beat the individual models in more than 50% of the cause-horizon pairs, with reported accuracy gains between roughly 4% and 24%. Aggregating the cause-specific forecasts to total ED volume gives only modest extra lift. That is the concrete result on offer: an empirical check of whether combination methods help in a setting where disease dynamics differ and model uncertainty matters for hospital operations.

Referee Report

2 major / 1 minor

Summary. The paper introduces high-dimensional forecast combination methods to predict cause-specific emergency department admissions for 16 causes using numerous individual time series models, extensive lagged covariates including meteorological and air quality variables, and national-level data. It reports that these combinations achieve accuracies in the range of 3.81% to 23.54%, outperform the individual models in more than 50% of scenarios across categories and horizons in a statistically significant way, and that incorporating high-dimensional covariates and aggregating forecasts for all-cause ED demand offers modest accuracy gains.

Significance. Should the outperformance be confirmed as robust to multiple-testing corrections and with full methodological transparency, this work would offer valuable evidence for the use of forecast combinations to mitigate model uncertainty in healthcare demand forecasting. The approach of cause-specific modeling and the broad covariate inclusion represent strengths that could inform resource planning and preparedness efforts if the statistical claims are solidified.

major comments (2)

[Abstract] The central claim that forecast combinations 'outperform individual forecasting models, in more than 50% of scenarios (across all ED admission categories and horizons) in a statistically significant manner' requires clarification on the handling of multiple comparisons. Given 16 causes and multiple horizons, the number of tests is substantial, yet no multiplicity adjustment is referenced. This is load-bearing for the claim of reliable, generalizable improvement.
[Abstract] Accuracy figures (3.81%-23.54%) and outperformance statements are given without error bars, standard deviations, or explicit definitions of the accuracy metric and the individual model baselines. The abstract also lacks detail on how statistical significance was assessed across the many tests performed.

minor comments (1)

Consider adding a table or section summarizing the individual models, combination schemes, lag structures, and covariate selection process to improve reproducibility and allow readers to assess the representativeness of the chosen setup.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the abstract accordingly to improve transparency while preserving the manuscript's core claims.

read point-by-point responses

Referee: [Abstract] The central claim that forecast combinations 'outperform individual forecasting models, in more than 50% of scenarios (across all ED admission categories and horizons) in a statistically significant manner' requires clarification on the handling of multiple comparisons. Given 16 causes and multiple horizons, the number of tests is substantial, yet no multiplicity adjustment is referenced. This is load-bearing for the claim of reliable, generalizable improvement.

Authors: We agree that the abstract should clarify the multiple-testing context. Significance for each scenario was assessed independently via Diebold-Mariano tests (detailed in Section 4.3 of the manuscript). The reported figure is the proportion of scenarios showing statistically significant outperformance rather than a claim of universal superiority. We did not apply a multiplicity correction in the original analysis because the primary interest lies in the overall frequency of improvement across heterogeneous causes and horizons. In the revision we will (i) state the total number of tests performed, (ii) note the absence of adjustment, and (iii) add a supplementary sensitivity table showing the proportion that remains significant after Bonferroni correction. This will allow readers to evaluate robustness directly. revision: yes
Referee: [Abstract] Accuracy figures (3.81%-23.54%) and outperformance statements are given without error bars, standard deviations, or explicit definitions of the accuracy metric and the individual model baselines. The abstract also lacks detail on how statistical significance was assessed across the many tests performed.

Authors: We accept that the abstract is too terse on these points. The accuracy metric is mean absolute percentage error (MAPE), defined in Equation (3). The individual-model baselines comprise the full set of 22 univariate and multivariate time-series specifications enumerated in Table 1. Standard errors and 95% confidence intervals for all MAPE values appear in Tables 3–5 and Figures 2–4. Statistical significance was evaluated with Diebold-Mariano tests whose implementation is described in Section 4.3. In the revised abstract we will insert a parenthetical definition of MAPE, reference the baseline models, and indicate that significance testing follows the procedure in Section 4.3. Space constraints preclude error bars in the abstract itself, but the main text already supplies them. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results rest on held-out empirical evaluation

full rationale

The paper reports empirical forecast accuracies and statistical comparisons of combination methods versus individual models across causes and horizons. No equations, fitted parameters, or self-citations are shown that reduce the outperformance claim to a definitional identity or input by construction. The central result (combinations outperform in >50% of scenarios) is presented as arising from time-series fitting and evaluation on national ED admission data with covariates, which is an independent empirical test rather than a self-referential reduction. Multiple-testing concerns affect statistical validity but do not constitute circularity under the specified patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard time-series modeling assumptions and the choice of combination weights; no new entities are postulated.

free parameters (2)

combination weights or scheme
Weights or selection rule for combining the large number of individual models must be chosen or fitted.
lag structure and covariate selection
Choice of which lagged environmental and admission variables to include is a modeling decision.

axioms (2)

domain assumption The observed time series can be adequately modeled by standard autoregressive or regression methods once lags are included.
Invoked implicitly by the use of lagged explanatory variables for forecasting.
domain assumption Covariates such as meteorological and pollutant variables are exogenous and available at forecast time.
Required for the high-dimensional covariate set to be usable in out-of-sample forecasting.

pith-pipeline@v0.9.0 · 5845 in / 1282 out tokens · 69669 ms · 2026-05-23T05:34:23.851802+00:00 · methodology

High-dimensional point forecast combinations for emergency department demand

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)