Using a generative model for out-of-sample testing of two-stage stochastic programs

Ashutosh Shukla; Erhan Kutanoglu; John J. Hasenbein

arxiv: 2604.22221 · v1 · submitted 2026-04-24 · 🧮 math.OC


Pith reviewed 2026-05-08 11:24 UTC · model grok-4.3


keywords: NORTA · stochastic programming · scenario generation · out-of-sample testing · power grid resilience · flood planning · generative models · two-stage optimization


The pith

The NORTA generative model produces synthetic scenarios that, according to the paper, enable reliable out-of-sample validation of two-stage stochastic programs from limited data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to solve the problem of having too few representative scenarios for stochastic programming models, which often rely on costly simulations or measurements. It proposes fitting the NORTA model to a small set of high-fidelity scenarios and then generating many more synthetic ones that keep the original marginal distributions and correlations intact. In the Texas power-grid flood-resilience example, 16 real scenarios yield 800 synthetic ones whose statistical properties and resulting first-stage decisions perform as the original model predicts. A reader should care because this offers a practical way to check solution robustness when gathering extra real-world data is impractical.

Core claim

When the NORTA model is fitted to only 16 high-fidelity flood scenarios, the 800 synthetic scenarios it generates preserve the necessary statistical properties; consequently, the out-of-sample performance of the first-stage decisions obtained from the stochastic program closely matches what the original model expects.

What carries the argument

The Normal-to-Anything (NORTA) generative model, which maps correlated normal variables into arbitrary marginal distributions while preserving the target correlation structure, thereby creating synthetic scenarios without new physics-based simulations.
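The mechanism described above can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the function name `norta_sample`, the toy data, and the use of empirical quantiles for the marginals are all assumptions made here. It also simplifies the correlation step by estimating the normal-space correlation from normal scores of the ranks (a Gaussian-copula shortcut), whereas full NORTA solves for the normal correlation that reproduces a target correlation after the marginal transform.

```python
import numpy as np
from scipy import stats


def norta_sample(data, m, seed=0):
    """Draw m synthetic rows from a simplified NORTA-style model.

    data: (n, d) array of observed scenarios (hypothetical input).
    Assumes continuous data without constant columns.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape

    # Map each column to normal scores through its ranks (values in 1..n).
    ranks = stats.rankdata(data, axis=0)
    z_obs = stats.norm.ppf(ranks / (n + 1))

    # Correlation in normal space; a tiny ridge keeps it positive definite
    # while the rescaling keeps the diagonal equal to one.
    eps = 1e-6
    corr = (np.corrcoef(z_obs, rowvar=False) + eps * np.eye(d)) / (1.0 + eps)
    chol = np.linalg.cholesky(corr)

    # Correlated standard normals -> uniforms -> empirical marginals.
    z = rng.standard_normal((m, d)) @ chol.T
    u = stats.norm.cdf(z)
    return np.column_stack(
        [np.quantile(data[:, j], u[:, j]) for j in range(d)]
    )


# Toy stand-in for 16 scenarios over 3 correlated quantities.
rng = np.random.default_rng(1)
base = rng.standard_normal((16, 1))
observed = np.hstack([
    base + 0.1 * rng.standard_normal((16, 1)),
    2.0 * base + 0.1 * rng.standard_normal((16, 1)),
    rng.standard_normal((16, 1)),
])
synthetic = norta_sample(observed, m=800)
```

With empirical-quantile marginals, every synthetic value lies between the observed minimum and maximum of its column, so this variant cannot produce extremes beyond the 16 observations. A parametric marginal would extrapolate, but its tail would then rest on the same small sample.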

Load-bearing premise

That a NORTA model fitted to only 16 high-fidelity scenarios produces synthetic scenarios statistically close enough to the unknown true distribution to give trustworthy out-of-sample validation.

What would settle it

Collect a fresh set of real flood scenarios not used in fitting the NORTA model and check whether the out-of-sample performance metrics of the first-stage decisions diverge substantially from the values predicted by the original stochastic program.

Original abstract

Stochastic programming models for decision-making under uncertainty often suffer from scenario scarcity, where obtaining representative samples of uncertain parameters requires expensive simulations or measurements. This work presents a framework that leverages the Normal-to-Anything (NORTA) generative model to enhance the reliability of two-stage stochastic programming solutions through comprehensive out-of-sample testing when scenario data is limited. The NORTA model efficiently generates synthetic scenarios that preserve both marginal distributions and correlation structures from limited available data, offering a computationally tractable alternative to expensive physics-based simulations. We demonstrate the approach through a case study on power grid resilience planning against flood events in Texas, where we use 16 high-fidelity flood scenarios to generate 800 additional synthetic scenarios for validation. The results show that NORTA-generated scenarios accurately capture essential statistical properties, with the out-of-sample performance of first-stage decisions closely matching expectations from the original stochastic programming model. This framework enables decision-makers to assess the robustness of their solutions when obtaining additional real-world data is prohibitively expensive. The approach bridges machine learning and operations research by providing a practical solution to scenario generation challenges in stochastic programming.
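To make the out-of-sample test concrete, here is a toy sketch on a one-dimensional two-stage problem (order a quantity now, pay a shortage penalty later). Everything in it is an assumption for illustration: the cost parameters `c` and `q`, the gamma-distributed demand, and the helper names. The `out_sample` array stands in for the synthetic scenarios that the paper would obtain from NORTA.

```python
import numpy as np


def total_cost(x, demand, c=1.0, q=4.0):
    """First-stage cost c*x plus recourse cost q*shortage, per scenario."""
    return c * x + q * np.maximum(demand - x, 0.0)


def solve_saa(demand, c=1.0, q=4.0):
    """Minimize the sample-average cost over the given scenarios.

    The objective is convex and piecewise linear in x with breakpoints
    at the scenario values, so for q > c > 0 it is enough to enumerate them.
    """
    candidates = np.unique(demand)
    costs = [total_cost(x, demand, c, q).mean() for x in candidates]
    best = int(np.argmin(costs))
    return candidates[best], costs[best]


rng = np.random.default_rng(0)
in_sample = rng.gamma(shape=2.0, scale=50.0, size=16)    # scenarios used to optimize
out_sample = rng.gamma(shape=2.0, scale=50.0, size=800)  # placeholder for synthetic scenarios

x_star, in_cost = solve_saa(in_sample)

# Fix the first-stage decision and re-evaluate it on the larger set.
out_costs = total_cost(x_star, out_sample)
out_mean = out_costs.mean()
half_width = 1.96 * out_costs.std(ddof=1) / np.sqrt(out_costs.size)

print(f"in-sample objective:  {in_cost:.2f}")
print(f"out-of-sample mean:   {out_mean:.2f} +/- {half_width:.2f}")
```

Reporting the out-of-sample mean with an interval, next to the in-sample objective, makes a phrase like "closely matching" something a reader can check.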

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

NORTA applied to generate 800 synthetics from 16 flood scenarios for out-of-sample SP validation in a Texas grid case, but the small n makes tail and correlation fidelity uncertain.


The core of this paper is a straightforward workflow: fit NORTA to a small set of high-fidelity scenarios, generate a large batch of synthetics that keep the observed marginals and correlations, then use those synthetics to test the out-of-sample behavior of first-stage decisions from a two-stage stochastic program. The authors demonstrate it on power-grid flood resilience planning in Texas, moving from 16 real scenarios to 800 generated ones. That combination and the specific application are new enough to be worth noting, even if NORTA itself is established. It gives practitioners a concrete way to run more validation when extra physics-based runs are too costly, which is a real bottleneck in energy and resilience work. The abstract claims the synthetics preserve the needed statistical properties and that the first-stage solutions perform as expected on them, so the framework at least looks usable on paper. Credit is due for laying out the steps clearly and tying them to an actual planning problem rather than a toy example.

The main soft spot is the sample size. Fitting correlations and marginals, especially in the upper tails for flood events, from only 16 points leaves a lot of estimation variance. Any reported match between in-sample and synthetic out-of-sample objectives could largely reflect how faithfully NORTA reproduces the empirical distribution rather than how close it is to the unknown true one. Without quantitative metrics, error bars, or checks against held-out real data, it is hard to judge whether the validation is independent or circular. The stress-test concern about inaccurate tail approximation lands.

This is aimed at operations-research people who already work with stochastic programs and need a practical scenario-generation trick when data is scarce. A reader in energy modeling or resilience planning could extract the workflow and try it on their own problem.

It is not a deep theoretical advance, but the concrete demonstration makes it worth sending to a serious referee, who should check the fitting details, the reported matches, and whether the small-sample issue is addressed. I would send it for review rather than desk-reject.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a framework that applies the Normal-to-Anything (NORTA) generative model to produce 800 synthetic scenarios from only 16 high-fidelity flood scenarios, enabling out-of-sample testing of first-stage decisions in a two-stage stochastic program for Texas power-grid resilience planning. It claims that the generated scenarios preserve the marginal distributions and correlation structure of the limited data and that the out-of-sample objective values closely match those expected from the original stochastic program, thereby providing a practical validation method when additional real-world data are prohibitively expensive.

Significance. If the NORTA approximation to the unknown true distribution is demonstrably accurate, the approach would supply a computationally inexpensive route to robustness assessment for stochastic programs in data-scarce settings such as disaster-resilient infrastructure planning. The explicit linkage of a standard generative-modeling tool to two-stage stochastic programming is a constructive contribution at the ML-OR interface.

major comments (3)

[Abstract and Case Study] Abstract and Case Study section: the assertion that 'NORTA-generated scenarios accurately capture essential statistical properties' is unsupported by any quantitative metrics (e.g., Kolmogorov-Smirnov statistics, Wasserstein distances, or element-wise errors on the sample correlation matrix). With only 16 scenarios available for parameter estimation, the fitted marginals and correlations are subject to substantial sampling variability, particularly in the upper tails relevant to flood events; without these diagnostics the claim that out-of-sample performance 'closely matches expectations' cannot be assessed.
[Methodology] NORTA fitting procedure (Methodology section): the same 16 scenarios are used both to estimate the NORTA correlation matrix and marginal parameters and to formulate the stochastic program itself. This creates a circularity risk: synthetic out-of-sample performance may simply reflect fidelity to the empirical distribution encoded in the fitted NORTA model rather than providing an independent check against the true (unknown) distribution.
[Results] Results on the Texas grid instance: no sensitivity analysis, bootstrap resampling, or held-out validation of the NORTA parameters is reported. Consequently, it is impossible to determine whether the reported alignment between in-sample and synthetic out-of-sample objectives is robust to the high variance inherent in estimating a correlation matrix and the associated marginal transformations from only 16 scenarios.
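The diagnostics named in the first major comment are cheap to compute. The sketch below is illustrative only: the function name `fidelity_report` and the toy arrays are assumptions, and the Wasserstein distance is computed per marginal because `scipy.stats.wasserstein_distance` is one-dimensional.

```python
import numpy as np
from scipy import stats


def fidelity_report(real, synth):
    """Compare two scenario sets column by column and on correlations.

    real:  (n, d) array of original scenarios (hypothetical input)
    synth: (m, d) array of generated scenarios (hypothetical input)
    """
    d = real.shape[1]
    # Two-sample Kolmogorov-Smirnov statistic for each marginal.
    ks = np.array(
        [stats.ks_2samp(real[:, j], synth[:, j]).statistic for j in range(d)]
    )
    # One-dimensional Wasserstein distance for each marginal.
    wass = np.array(
        [stats.wasserstein_distance(real[:, j], synth[:, j]) for j in range(d)]
    )
    # Largest element-wise gap between the two sample correlation matrices.
    corr_gap = np.abs(
        np.corrcoef(real, rowvar=False) - np.corrcoef(synth, rowvar=False)
    )
    return {"ks": ks, "wasserstein": wass, "max_corr_err": corr_gap.max()}


# Toy stand-ins for 16 original and 800 generated scenarios.
rng = np.random.default_rng(0)
real = rng.standard_normal((16, 3))
synth = rng.standard_normal((800, 3))
report = fidelity_report(real, synth)
```

These numbers measure agreement between the synthetic set and the 16 observed scenarios. They say nothing about distance to the unknown true distribution, which is the referee's underlying concern.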

minor comments (2)

[Abstract] The abstract states that the out-of-sample performance 'closely matches' model expectations but supplies no numerical values, tables, or figures that would allow a reader to judge the magnitude of any discrepancy.
[Notation] Notation for the two-stage stochastic program (first-stage decisions, recourse variables, and scenario probabilities) should be introduced once and used consistently when describing both the original model and the NORTA-augmented validation.
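For orientation, one conventional way to write such notation is given below. It is the textbook form of a two-stage stochastic linear program with a finite scenario set, followed by the out-of-sample estimate for a fixed first-stage solution. All symbols are assumptions introduced here and may differ from the paper's own: $x$ is the first-stage decision, $y$ the recourse decision, $\xi_s$ the $s$-th of $S$ scenarios with probability $p_s$, and $\tilde{\xi}_j$ the $j$-th of $N$ synthetic scenarios.

```latex
\begin{align*}
z^{*} &= \min_{x \in X} \; c^{\top} x + \sum_{s=1}^{S} p_s \, Q(x, \xi_s), \\
Q(x, \xi_s) &= \min_{y \ge 0} \left\{ q_s^{\top} y \;:\; W_s y = h_s - T_s x \right\}, \\
\hat{z}_{\mathrm{out}} &= c^{\top} x^{*} + \frac{1}{N} \sum_{j=1}^{N} Q(x^{*}, \tilde{\xi}_j).
\end{align*}
```

In the case study described on this page, $S = 16$ and $N = 800$. The claim under review is that $\hat{z}_{\mathrm{out}}$ is close to $z^{*}$, where $x^{*}$ is a minimizer of the first problem.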

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, agreeing where revisions are needed to strengthen the presentation and providing clarifications on the methodological intent.


Referee: [Abstract and Case Study] Abstract and Case Study section: the assertion that 'NORTA-generated scenarios accurately capture essential statistical properties' is unsupported by any quantitative metrics (e.g., Kolmogorov-Smirnov statistics, Wasserstein distances, or element-wise errors on the sample correlation matrix). With only 16 scenarios available for parameter estimation, the fitted marginals and correlations are subject to substantial sampling variability, particularly in the upper tails relevant to flood events; without these diagnostics the claim that out-of-sample performance 'closely matches expectations' cannot be assessed.

Authors: We agree that the current manuscript does not include quantitative diagnostics to support the claims regarding statistical fidelity. In the revised version we will add Kolmogorov-Smirnov statistics comparing marginal distributions, Wasserstein distances between the original and generated scenario sets, and maximum absolute errors on the estimated correlation matrix. We will also include a brief discussion of the sampling variability inherent in estimating parameters from only 16 scenarios, with particular attention to tail behavior for flood events. These additions will allow readers to evaluate the strength of the out-of-sample performance claims.
Revision: yes
Referee: [Methodology] NORTA fitting procedure (Methodology section): the same 16 scenarios are used both to estimate the NORTA correlation matrix and marginal parameters and to formulate the stochastic program itself. This creates a circularity risk: synthetic out-of-sample performance may simply reflect fidelity to the empirical distribution encoded in the fitted NORTA model rather than providing an independent check against the true (unknown) distribution.

Authors: The referee correctly identifies that the NORTA parameters are estimated from the same 16 scenarios used to build the stochastic program. The out-of-sample exercise is therefore a test of first-stage decisions against additional draws from the fitted NORTA model rather than against an independent realization of the unknown true distribution. We will revise the methodology and discussion sections to state this limitation explicitly and to clarify that the procedure provides a computationally inexpensive way to probe robustness under a parametric approximation to the empirical distribution when further real data cannot be obtained. No change to the core experimental design is planned, as the approach is intended for precisely this data-scarce regime.
Revision: partial
Referee: [Results] Results on the Texas grid instance: no sensitivity analysis, bootstrap resampling, or held-out validation of the NORTA parameters is reported. Consequently, it is impossible to determine whether the reported alignment between in-sample and synthetic out-of-sample objectives is robust to the high variance inherent in estimating a correlation matrix and the associated marginal transformations from only 16 scenarios.

Authors: We acknowledge that the absence of sensitivity or bootstrap analysis leaves the stability of the reported alignment open to question. In the revision we will add a bootstrap resampling study of the NORTA correlation matrix and marginal parameters (resampling the 16 scenarios with replacement) and report the resulting variability in the out-of-sample objective values. This will quantify the impact of estimation uncertainty on the observed in-sample versus synthetic out-of-sample agreement.
Revision: yes
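A bootstrap of the kind promised in the last response might look like the sketch below. It covers only the correlation-matrix part; the function name `bootstrap_corr`, the number of replicates, and the toy data are assumptions. Propagating each resample through NORTA fitting and the out-of-sample evaluation would follow the same loop.

```python
import numpy as np


def bootstrap_corr(data, n_boot=500, seed=0):
    """Percentile intervals for each correlation entry.

    Resamples the rows of data (n, d) with replacement, recomputes the
    sample correlation matrix, and returns element-wise 2.5% and 97.5%
    percentiles. Assumes continuous data, so degenerate resamples with a
    constant column are practically impossible.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    reps = np.empty((n_boot, d, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        reps[b] = np.corrcoef(data[idx], rowvar=False)
    lo, hi = np.percentile(reps, [2.5, 97.5], axis=0)
    return lo, hi


# Toy stand-in for 16 scenarios over 3 quantities, two of them correlated.
rng = np.random.default_rng(2)
base = rng.standard_normal((16, 1))
scenarios = np.hstack([
    base + 0.5 * rng.standard_normal((16, 1)),
    base + 0.5 * rng.standard_normal((16, 1)),
    rng.standard_normal((16, 1)),
])
lo, hi = bootstrap_corr(scenarios)
width = hi - lo  # wide off-diagonal intervals signal unstable estimates
```

With only 16 rows, the off-diagonal intervals are typically wide, which is the estimation variance the referee asks the authors to quantify.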

Circularity Check

1 step flagged

NORTA out-of-sample validation reduces to reproduction of fitted input statistics by construction

specific steps

fitted input called prediction [Abstract]
"we use 16 high-fidelity flood scenarios to generate 800 additional synthetic scenarios for validation. The results show that NORTA-generated scenarios accurately capture essential statistical properties, with the out-of-sample performance of first-stage decisions closely matching expectations from the original stochastic programming model."

NORTA is explicitly fitted to the 16 scenarios to preserve their marginal distributions and correlation structures. The 800 generated scenarios are therefore designed, by the model's construction, to reproduce the statistics of the input data. Reporting that out-of-sample performance on these synthetics closely matches the original model is close to tautological and does not constitute independent validation against the true (unknown) distribution.

full rationale

The paper fits NORTA to the same 16 scenarios used for the stochastic program, then generates synthetics and reports that performance on them matches the original model. This match is expected because NORTA is constructed to match the fitted marginals and correlations of its training data; the 'validation' therefore tests fidelity to the empirical distribution rather than providing an independent check against the unknown true distribution. The central claim of trustworthy out-of-sample testing thus reduces to the generative model's design properties.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the ability of NORTA to faithfully reproduce joint distributions from a small sample and on the assumption that the initial 16 scenarios are representative of the true uncertainty; no new entities are postulated.

free parameters (1)

NORTA correlation and marginal parameters
Parameters are fitted to the 16 flood scenarios to match observed marginal distributions and correlations; these fitted values directly determine the synthetic scenarios used for validation.

axioms (1)

domain assumption: NORTA can generate scenarios whose statistical properties are sufficiently close to the true distribution for reliable out-of-sample testing
Invoked when claiming that the 800 synthetic scenarios provide trustworthy validation of first-stage decisions.



Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] A. B. Birchfield, T. Xu, K. M. Gegner, K. S. Shetye, and T. J. Overbye. Grid structural characteristics as validation criteria for synthetic networks. IEEE Transactions on Power Systems, 32(4):3258–3265, 2017.

[2] John R. Birge and François Louveaux. Introduction to Stochastic Programming. Springer-Verlag, 1997.

[3] M. C. Cario and B. L. Nelson. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical report, Northwestern University, 1997.

[4] S. Ghosh and S. G. Henderson. Behavior of the NORTA method for correlated random vector generation as the dimension increases. ACM Trans. Model. Comput. Simul., 13(3):276–294, 2003.

[5] B. Glahn, A. Taylor, N. Kurkowski, and W. A. Shaffer. The role of the SLOSH model in National Weather Service storm surge forecasting. National Weather Digest, pages 1–12.

[6] M. Movahednia, A. Kargarian, C. E. Ozdemir, and S. C. Hagen. Power grid resilience enhancement via protecting electrical substations against flood hazards: A stochastic framework. IEEE Transactions on Industrial Informatics, 18(3):2132–2143, 2022.

[7] Mohadese Movahednia, Reza Mahroo, and Amin Kargarian. Transmission-distribution coordination for enhancing grid resiliency against flood hazards. IEEE Transactions on Power Systems, pages 1–11, 2023.

[8] A. Shukla, J. J. Hasenbein, and E. Kutanoglu. A Scenario-based Optimization Approach for Electric Grid Substation Hardening Against Storm Surge Flooding. In IIE Annual Conference Proceedings, pages 1004–1009, 2021.

[9] Ashutosh Shukla. Models for power grid resilience to flooding: optimal budgeting, coordination, and scenario generation. PhD thesis, University of Texas at Austin, 2024.

[10] Ashutosh Shukla, John Hasenbein, and Erhan Kutanoglu. Flood scenario generation using the NORTA model. In 2024 Winter Simulation Conference (WSC), pages 3358–3367, 2024.

[11] L. Souto, J. Yip, W. Y. Wu, B. Austgen, E. Kutanoglu, J. J. Hasenbein, Z. L. Yang, C. W. King, and S. Santoso. Power system resilience to floods: Modeling, impact assessment, and mid-term mitigation strategies. International Journal of Electrical Power & Energy Systems, 135:107545, 2022.

[12] B. C. Zachry, W. J. Booth, J. R. Rhome, and T. M. Sharon. A national view of storm surge risk and inundation. Weather, Climate, and Society, 7(2):109–117, 2015.

A Sample generation using the NORTA model

The following algorithm is used to generate synthetic samples for our case study.

Algorithm 1 NORTA Sampling
1: Input: Cholesky factor M, number of samples m
2: ...
