A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls

Federico Vaggi; Konstantin Mishchenko; Mallory Montgomery

arxiv: 1906.10586 · v1 · pith:5CJBX5QCnew · submitted 2019-06-25 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-25 16:03 UTC · model grok-4.3

keywords hierarchical forecasting · time series reconciliation · maximum likelihood estimation · self-supervised learning · synthetic controls · counterfactual forecasting

The pith

A new loss function incorporates hierarchical reconciliation directly into maximum likelihood training for time series forecasts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a loss function that can be added to any maximum likelihood objective when data has a hierarchical structure. This produces forecasts that automatically respect the hierarchy and generates confidence intervals that include uncertainty from imperfect reconciliation. The method is evaluated on a counterfactual forecasting task using synthetic data generated by a non-linear model with covariates and ground truth access. It shows large gains over the prior state-of-the-art approach that forecasts independently and reconciles afterward.

Core claim

Incorporating the proposed loss into maximum likelihood estimation yields reconciled hierarchical forecasts together with confidence intervals that correctly widen to reflect uncertainty arising from imperfect reconciliation.

What carries the argument

The new loss function added to the maximum likelihood objective that enforces hierarchical consistency during training.
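
As a rough illustration of what adding a hierarchy term to a likelihood can look like, the sketch below combines a Gaussian negative log-likelihood with a quadratic penalty on the gap between the predicted aggregate and the sum of the predicted bottom-level series. This page does not reproduce the paper's loss, so the penalty form, the weight `lam`, and all function names are assumptions chosen for illustration, not the authors' formulation.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean Gaussian negative log-likelihood of y under N(mu, sigma^2)."""
    return float(np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                         + 0.5 * ((y - mu) / sigma) ** 2))

def coherence_penalty(mu_total, mu_bottom):
    """Mean squared gap between the predicted total and the summed bottom-level predictions."""
    return float(np.mean((mu_total - mu_bottom.sum(axis=1)) ** 2))

def penalized_loss(y, mu, sigma, lam=1.0):
    """Hypothetical objective: NLL + lam * coherence penalty.

    Column 0 is assumed to hold the aggregate series, the remaining
    columns the bottom-level series (an illustrative layout).
    """
    return gaussian_nll(y, mu, sigma) + lam * coherence_penalty(mu[:, 0], mu[:, 1:])

# Toy data: two time steps, one total and two bottom-level series.
y = np.array([[3.0, 1.0, 2.0],
              [5.0, 2.0, 3.0]])
sigma = np.ones_like(y)

coherent = y.copy()        # predicted total equals the sum of the parts
incoherent = y.copy()
incoherent[:, 0] += 1.0    # predicted total is off by one at every step

loss_coherent = penalized_loss(y, coherent, sigma)
loss_incoherent = penalized_loss(y, incoherent, sigma, lam=2.0)
```

With `lam=0` the objective reduces to the plain likelihood, so the penalty weight controls how strongly incoherent predictions are discouraged during training.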

If this is right

Forecasts respect the hierarchy without requiring a separate post-processing reconciliation step.
Confidence intervals widen automatically to account for uncertainty introduced by reconciliation.
The loss can be plugged into any existing maximum likelihood model that uses hierarchical data.
Performance gains are observed on synthetic counterfactual tasks relative to independent forecasting plus reconciliation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same loss could be tested on economic or sales hierarchies where bottom-level series must sum to top-level aggregates.
Groupwise synthetic control applications may benefit from the built-in uncertainty quantification when estimating counterfactuals.
The approach might reduce the need for two-stage pipelines in any domain that requires coherent multi-level predictions.
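
The sum constraint mentioned in the first item can be written with a summing matrix, a standard device in the hierarchical forecasting literature. A minimal sketch, assuming a hypothetical two-level hierarchy with one total and three bottom-level series; the names `S`, `aggregate`, and `is_coherent` are illustrative and do not come from the paper.

```python
import numpy as np

# Summing matrix for an assumed hierarchy: row 0 is the total,
# rows 1-3 are the three bottom-level series.
S = np.vstack([np.ones((1, 3)), np.eye(3)])

def aggregate(bottom):
    """Map bottom-level values to the full vector (total first, then bottom level)."""
    return S @ bottom

def is_coherent(y_all, tol=1e-8):
    """True if the first entry equals the sum of the remaining entries."""
    return bool(abs(y_all[0] - y_all[1:].sum()) <= tol)

bottom = np.array([2.0, 3.0, 5.0])
y_all = aggregate(bottom)                       # total is 2 + 3 + 5
y_broken = np.array([11.0, 2.0, 3.0, 5.0])      # total disagrees with the parts
```

Any vector of the form `S @ bottom` is coherent by construction, which is why reconciliation methods are usually phrased as mapping forecasts back into the column space of `S`.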

Load-bearing premise

Synthetic data generated from a non-linear model with contemporaneous covariates and known ground truth is representative of the reconciliation errors and model misspecification found in real hierarchical forecasting problems.

What would settle it

Direct comparison of forecast accuracy and interval coverage on a real-world hierarchical dataset where ground truth future values are observed after the fact.
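
The two quantities named above, forecast accuracy and interval coverage, are straightforward to compute once the future values are observed. The sketch below uses mean squared error and empirical coverage; the toy numbers and the fixed interval half-width are assumptions for illustration only.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and forecast values."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def interval_coverage(y_true, lower, upper):
    """Fraction of observed values that fall inside [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Toy example: four observed values and their point forecasts.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 4.0])
half_width = 0.75  # assumed symmetric interval half-width

error = mse(y_true, y_pred)
coverage = interval_coverage(y_true, y_pred - half_width, y_pred + half_width)
```

A nominal 95% interval is well calibrated when its empirical coverage on held-out data is close to 0.95; coverage far below that suggests intervals that are too narrow.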

Figures

Figures reproduced from arXiv: 1906.10586 by Federico Vaggi, Konstantin Mishchenko, Mallory Montgomery.

**Figure 1.** Formally, we take a kernel $k(\cdot, \cdot)$ from Celerite and sample $X_j^t \sim \mathcal{GP}\big(0, k(X_j^t, X_j^{t'})\big)$, $j = 1, \dots, m$, and $y_i^t \overset{\text{def}}{=} \big(X_1^t \theta_1 + (X_1^t \cdot t)\phi_1, \dots, X_m^t \theta_m + (X_m^t \cdot t)\phi_m + \varepsilon\big)^\top$. We then use a multi-layer perceptron (one hidden layer with 100 units and ReLU activations) to forecast Y using X as an input. The neural network gets the first 1000 time steps as a training set, and we then report the MSE (…
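
The generative recipe in the Figure 1 caption can be sketched roughly as follows. This is not the authors' code: a plain NumPy exponential kernel stands in for the Celerite kernel, the noise is added to every column as an assumption, and the coefficients, noise level, series length, and function names are all hypothetical.

```python
import numpy as np

def exp_kernel(t, length_scale=0.2):
    """Exponential kernel k(t, t') = exp(-|t - t'| / length_scale), a stand-in for Celerite."""
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-d / length_scale)

def sample_covariates(t, m, rng, jitter=1e-8):
    """Draw m independent zero-mean GP paths on the grid t; returns shape (len(t), m)."""
    K = exp_kernel(t) + jitter * np.eye(len(t))
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal((len(t), m))

def make_targets(X, t, theta, phi, noise_sd, rng):
    """Column j holds X_j * theta_j + (X_j * t) * phi_j plus Gaussian noise."""
    signal = X * theta + (X * t[:, None]) * phi
    return signal + noise_sd * rng.standard_normal(X.shape)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)            # assumed time grid
X = sample_covariates(t, m=3, rng=rng)
theta = np.array([1.0, -0.5, 2.0])         # assumed level coefficients
phi = np.array([0.3, 0.0, -1.0])           # assumed trend coefficients
Y = make_targets(X, t, theta, phi, noise_sd=0.1, rng=rng)
```

The `(X * t) * phi` term makes each covariate's effect drift over time, which is what gives a forecaster trained on the early part of the series something to extrapolate.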

Original abstract

When forecasting time series with a hierarchical structure, the existing state of the art is to forecast each time series independently, and, in a post-treatment step, to reconcile the time series in a way that respects the hierarchy (Hyndman et al., 2011; Wickramasuriya et al., 2018). We propose a new loss function that can be incorporated into any maximum likelihood objective with hierarchical data, resulting in reconciled estimates with confidence intervals that correctly account for additional uncertainty due to imperfect reconciliation. We evaluate our method using a non-linear model and synthetic data on a counterfactual forecasting problem, where we have access to the ground truth and contemporaneous covariates, and show that we largely improve over the existing state-of-the-art method.
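
For context, the two-stage approach the abstract contrasts with can be illustrated by an ordinary least squares projection of independent base forecasts onto the coherent subspace, the textbook form commonly associated with Hyndman et al. (2011). The sketch shows the simplest instance of post-hoc reconciliation under an assumed three-series hierarchy; it is not the exact baseline configuration used in the paper, and the forecast values are made up.

```python
import numpy as np

# Assumed hierarchy: total = A + B. Rows of S: total, A, B.
S = np.vstack([np.ones((1, 2)), np.eye(2)])

def ols_reconcile(y_hat, S):
    """Project base forecasts onto the column space of S: S (S^T S)^{-1} S^T y_hat."""
    beta, *_ = np.linalg.lstsq(S, y_hat, rcond=None)
    return S @ beta

# Independent base forecasts that violate the hierarchy (4 + 5 != 10).
y_hat = np.array([10.0, 4.0, 5.0])
y_tilde = ols_reconcile(y_hat, S)

# Forecasts that already add up are left unchanged by the projection.
y_ok = ols_reconcile(np.array([9.0, 4.0, 5.0]), S)
```

Because the projection is applied after the models are fitted, the individual forecasters never see the hierarchy during training, which is the gap a training-time loss term aims to close.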

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims to introduce a new loss function that can be incorporated into any maximum likelihood objective with hierarchical data, resulting in reconciled estimates with confidence intervals that correctly account for additional uncertainty due to imperfect reconciliation. It evaluates the approach using a non-linear model and synthetic data on a counterfactual forecasting problem with access to ground truth and contemporaneous covariates, claiming large improvements over existing state-of-the-art reconciliation methods.

Significance. If the central claims hold, the work would be significant for hierarchical forecasting by embedding reconciliation directly into the estimation process rather than relying on post-hoc adjustments, potentially yielding better-calibrated uncertainty estimates. The self-supervised formulation and application to groupwise synthetic controls are strengths. The evaluation design using synthetic data with known ground truth is a positive feature that enables direct measurement of errors.

major comments (2)

[Abstract] Abstract: the claim that the loss produces 'correct confidence intervals' and 'largely improves' over SOTA is asserted without any derivation, explicit form of the loss, or quantitative results, making it impossible to assess whether the math or data support the central claim.
[Evaluation] Evaluation: the synthetic data is generated from a non-linear model with contemporaneous covariates and known ground truth. This setup does not address the challenges of model misspecification, unknown hierarchy violations, or covariate noise that typically arise in real hierarchical series, weakening the support for the claim of correctly accounting for reconciliation uncertainty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We appreciate the positive remarks on the significance of embedding reconciliation into training and the use of synthetic data with ground truth. We respond to each major comment below.

Referee: [Abstract] Abstract: the claim that the loss produces 'correct confidence intervals' and 'largely improves' over SOTA is asserted without any derivation, explicit form of the loss, or quantitative results, making it impossible to assess whether the math or data support the central claim.

Authors: The abstract is intended as a concise summary; the explicit form of the self-supervised loss, its derivation, and integration into the maximum likelihood objective appear in Section 3. Quantitative results, including error metrics and comparisons to post-hoc reconciliation methods, are reported with tables in Section 5. To improve accessibility, we will revise the abstract to reference these sections and include a brief statement of the observed improvements. revision: yes

Referee: [Evaluation] Evaluation: the synthetic data is generated from a non-linear model with contemporaneous covariates and known ground truth. This setup does not address the challenges of model misspecification, unknown hierarchy violations, or covariate noise that typically arise in real hierarchical series, weakening the support for the claim of correctly accounting for reconciliation uncertainty.

Authors: The synthetic design deliberately provides ground truth to enable direct quantification of reconciliation-induced uncertainty, which is a core contribution. We acknowledge that this controlled setting does not capture model misspecification, hierarchy violations, or covariate noise. We will add an explicit limitations paragraph in the evaluation section discussing these gaps and their implications for the uncertainty claims. revision: partial

Circularity Check

0 steps flagged

No circularity: new loss term and synthetic evaluation are independent of fitted inputs

The paper defines a new loss function that augments any MLE objective for hierarchical series and evaluates the resulting reconciled forecasts plus uncertainty on synthetic data generated from a non-linear model with known ground truth and covariates. No derivation step equates a claimed prediction or CI property to a quantity defined by the same fitted parameters; the SOTA comparison uses externally generated data rather than a fitted-input-called-prediction pattern. Self-citations are absent from the load-bearing claims.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete equations or modeling choices, so no specific free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5653 in / 997 out tokens · 35456 ms · 2026-05-25T16:03:27.370300+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[2] Abadie, A., Diamond, A., and Hainmueller, J. Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493-505, 2010.

[3] Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., and Scott, S. L. Inferring causal impact using Bayesian structural time-series models. Annals of Applied Statistics, 9:247-274, 2015.

[4] Doudchenko, N. and Imbens, G. W. Balancing, regression, difference-in-differences and synthetic control methods: A synthesis. Technical report, National Bureau of Economic Research, 2016.

[5] Foreman-Mackey, D., Agol, E., Ambikasaran, S., and Angus, R. Fast and scalable Gaussian process modeling with applications to astronomical time series. The Astronomical Journal, 154(6):220, 2017.

[6] Geweke, J. Contemporary Bayesian Econometrics and Statistics. Wiley, 2005.

[7] Hyndman, R. J., Ahmed, R. A., Athanasopoulos, G., and Shang, H. L. Optimal combination forecasts for hierarchical time series. Computational Statistics & Data Analysis, 55(9):2579-2589, 2011.

[8] Lewandowski, D., Kurowicka, D., and Joe, H. Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9):1989-2001, 2009.

[9] Mishchenko, K. and Richtárik, P. A stochastic penalty model for convex and nonconvex optimization with big constraints. arXiv preprint arXiv:1810.13387, 2018.

[10] Rasmus, A., Berglund, M., Honkala, M., Valpola, H., and Raiko, T. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pp. 3546-3554, 2015.

[11] Scott, S. L. and Varian, H. R. Predicting the present with Bayesian structural time series. Available at SSRN 2304426, 2013.

[12] Wickramasuriya, S. L., Athanasopoulos, G., and Hyndman, R. J. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, pp. 1-16, 2018. doi:10.1080/01621459.2018.1448825. URL https://doi.org/10.1080/01621459.2018.1448825

[13] Xu, Y. Generalized synthetic control method: Causal inference with interactive fixed effects models. Political Analysis, 25:57-76, 2017. doi:10.1017/pan.2016.2