ICNN-enhanced 2SP: Leveraging input convex neural networks for solving two-stage stochastic programming
Pith reviewed 2026-05-22 16:25 UTC · model grok-4.3
The pith
Input convex neural networks turn recourse approximations in two-stage stochastic programming into exact linear programs, eliminating integer variables.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By using input convex neural networks instead of ordinary neural networks to model the recourse function, the two-stage stochastic program can be embedded exactly into a linear programming formulation. The architectural convexity removes the need for integer variables required by mixed-integer programming embeddings. On benchmark problems the approach delivers solution times up to 100 times faster than MIP baselines while producing first-stage solutions of comparable or better quality.
What carries the argument
Input Convex Neural Networks (ICNNs), whose architecture enforces convexity so that their output can be expressed exactly as a set of linear inequalities inside the overall optimization model.
If this is right
- Solution times decrease sharply as the number of scenarios or decision variables grows.
- Training time for the ICNN surrogate remains close to that of a standard neural network.
- Validation accuracy of the ICNN matches that of ordinary networks on the tested problems.
- The largest speedups appear on the most computationally demanding benchmark instances.
Where Pith is reading between the lines
- The same convexity-enforcing idea could be tested on other convex stochastic or robust optimization models that currently rely on non-convex surrogates.
- Combining the ICNN embedding with Benders decomposition or column generation might further scale the method to problems with very large uncertainty sets.
- Real-time applications such as dynamic pricing or energy dispatch under demand uncertainty could become solvable at higher resolution.
Load-bearing premise
The recourse functions in the target problems are convex enough that an ICNN can be trained to approximate them with error low enough to preserve good first-stage decisions.
What would settle it
On a collection of instances whose true recourse functions are known to be convex, compare the first-stage objective values obtained from the ICNN-linear program against those from an exact MIP solver; a consistent gap larger than the reported validation error would falsify the performance claim.
Figures
read the original abstract
Two-stage stochastic programming (2SP) offers a basic framework for modelling decision-making under uncertainty, yet scalability remains a challenge due to the computational complexity of recourse function evaluation. Existing learning-based methods like Neural Two-Stage Stochastic Programming (Neur2SP) employ neural networks (NNs) as recourse function surrogates but rely on computationally intensive mixed-integer programming (MIP) formulations. We propose ICNN-enhanced 2SP, a method that leverages Input Convex Neural Networks (ICNNs) to exploit linear programming (LP) representability in convex 2SP problems. By architecturally enforcing convexity and enabling exact inference through LP, our approach eliminates the need for integer variables inherent to the conventional MIP-based formulation while retaining an exact embedding of the ICNN surrogate within the 2SP framework. This results in a more computationally efficient alternative, and we show that good solution quality can be maintained. Comprehensive experiments reveal that ICNNs incur only marginally longer training times while achieving validation accuracy on par with their standard NN counterparts. Across benchmark problems, ICNN-enhanced 2SP often exhibits considerably faster solution times than the MIP-based formulations while preserving solution quality, with these advantages becoming significantly more pronounced as problem scale increases. For the most challenging instances, the method achieves speedups of up to 100$\times$ and solution quality superior to MIP-based formulations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ICNN-enhanced 2SP, a method that replaces standard neural network surrogates for recourse functions in two-stage stochastic programming with Input Convex Neural Networks. By architecturally enforcing convexity, the ICNN surrogate can be exactly embedded as linear constraints in an LP formulation of the first-stage problem, eliminating the integer variables required by MIP-based embeddings used in prior work such as Neur2SP. Experiments on benchmark problems report that ICNN training accuracy is comparable to standard NNs, solution quality is preserved or improved, and solution times are substantially faster (up to 100x on large instances) as problem scale increases.
Significance. If the central performance claims are substantiated, the approach would offer a scalable alternative for convex 2SP by converting a hard MIP into a faster LP while retaining decision quality. The work correctly exploits known ICNN properties for convexity and LP representability, and the reported speedups on challenging instances suggest practical impact for large-scale stochastic optimization.
major comments (2)
- [Experiments] Experimental section: the claim that 'good solution quality can be maintained' and is 'superior to MIP-based formulations' on the most challenging instances requires explicit post-hoc evaluation of the true expected recourse cost (using the exact solver on the first-stage decisions obtained from the ICNN-LP) together with statistical significance tests or confidence intervals across replications. Surrogate validation accuracy alone does not guarantee that approximation error in regions visited by the optimizer leaves first-stage optimality intact.
- [Method] Formulation section: the method assumes the recourse functions are sufficiently convex for an ICNN to achieve low approximation error in the relevant domain; the paper should state this assumption explicitly, provide a diagnostic (e.g., convexity check or residual analysis on validation points near optimal first-stage solutions), and discuss the risk that violation of the assumption could degrade true objective values even when surrogate metrics appear acceptable.
minor comments (2)
- [Abstract] Abstract and experiments: replace the qualitative phrase 'on par with' by reporting concrete metrics (e.g., mean squared error or relative error on the validation set) and reference the corresponding table or figure.
- [Formulation] Notation: define the precise mapping from ICNN weights/biases to the linear inequalities that embed the surrogate inside the first-stage LP; a small illustrative example would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will incorporate to strengthen the manuscript.
read point-by-point responses
-
Referee: [Experiments] Experimental section: the claim that 'good solution quality can be maintained' and is 'superior to MIP-based formulations' on the most challenging instances requires explicit post-hoc evaluation of the true expected recourse cost (using the exact solver on the first-stage decisions obtained from the ICNN-LP) together with statistical significance tests or confidence intervals across replications. Surrogate validation accuracy alone does not guarantee that approximation error in regions visited by the optimizer leaves first-stage optimality intact.
Authors: We agree that surrogate validation accuracy alone is insufficient to fully substantiate claims of maintained or superior solution quality. In the revised manuscript we will add explicit post-hoc evaluations of the true expected recourse cost by applying the exact solver to the first-stage decisions obtained from the ICNN-LP formulation. We will also report statistical significance tests or confidence intervals computed across multiple independent replications. These additions will directly address whether approximation error in optimizer-visited regions preserves first-stage optimality and will allow a more rigorous comparison with MIP-based baselines. revision: yes
-
Referee: [Method] Formulation section: the method assumes the recourse functions are sufficiently convex for an ICNN to achieve low approximation error in the relevant domain; the paper should state this assumption explicitly, provide a diagnostic (e.g., convexity check or residual analysis on validation points near optimal first-stage solutions), and discuss the risk that violation of the assumption could degrade true objective values even when surrogate metrics appear acceptable.
Authors: We acknowledge the value of making the convexity assumption fully explicit. Although the manuscript already frames the method for convex 2SP problems, we will add a dedicated statement of the assumption in the formulation section. We will further include a diagnostic (residual analysis on validation points near candidate optimal first-stage solutions) and a discussion of the risks if the assumption is violated, including potential degradation of true objective values. These changes will clarify the scope and limitations of the approach. revision: yes
Circularity Check
No significant circularity detected in derivation chain
full rationale
The paper proposes embedding ICNN surrogates into the first-stage LP of 2SP problems by exploiting the known convexity-preserving architecture of ICNNs, which permits an exact LP representation without integer variables. This step rests on standard properties of ICNNs established in prior independent literature rather than any self-definition, fitted parameter renamed as prediction, or self-citation chain that reduces the claimed speedup or solution quality to the inputs by construction. Empirical speedups and maintained solution quality are reported from benchmark experiments and are not asserted tautologically; the derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- ICNN weights and biases
axioms (1)
- domain assumption Recourse function is convex in the first-stage decision for the problems considered
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
By architecturally enforcing convexity and enabling exact inference through LP, our approach eliminates the need for integer variables... ICNNs are architecturally constrained to ensure convexity with respect to their inputs, enabling exact inference through LP formulations
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leancostAlphaLog_high_calibrated_iff unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Q(x, ξs) is the pointwise maximum of affine functions... Thus Q(x) is convex in x
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
Relaxation-Informed Training of Neural Network Surrogate Models
Regularizers that penalize big-M constants, unstable neurons, and per-sample LP relaxation gaps during neural network training reduce MILP solve times by up to four orders of magnitude while preserving surrogate accuracy.
-
SOC-ICNN: From Polyhedral to Conic Geometry for Learning Convex Surrogate Functions
SOC-ICNN generalizes ReLU-based ICNNs to SOCP, strictly expanding the class of representable convex functions while preserving similar forward-pass complexity.
Reference graph
Works this paper leans on
-
[1]
W. Römisch, S. Vigerske, Recent Progress in Two-stage Mixed-integer Stochastic Programming with Applications to Power Production Planning, in: P. M. Pardalos, S. Rebennack, M. V. F. 23 Pereira, N. A. Iliadis (Eds.), Handbook of Power Systems I, Springer, Berlin, Heidelberg, 2010, pp. 177–208. doi:10.1007/978-3-642-02493-1_8
-
[2]
C. S. Khor, A. Elkamel, K. Ponnambalam, P. L. Douglas, Two-stage stochastic programming with fixed recourse via scenario planning with economic and operational risk management for petroleum refinery planning under uncertainty, Chemical Engineering and Processing: Process Intensification 47 (2008) 1744–1764. doi:10.1016/j.cep.2007.09.016
-
[3]
A.Shapiro, T.Homem-deMello, Asimulation-basedapproachtotwo-stagestochasticprogram- ming with recourse, Mathematical Programming 81 (1998) 301–325. doi:10.1007/BF01580086
-
[4]
H. Heitsch, W. Römisch, Scenario Reduction Algorithms in Stochastic Programming, Com- putational Optimization and Applications 24 (2003) 187–206. doi:10.1023/A:1021805924152
-
[5]
H. D. Sherali, B. M. Fraticelli, A modification of Benders’ decomposition algorithm for discrete subproblems: An approach for stochastic programs with integer recourse, Journal of Global Optimization 22 (2002) 319–342. doi:10.1023/A:1013827731218
-
[6]
C. C. Carøe, R. Schultz, Dual decomposition in stochastic integer programming, Operations Research Letters 24 (1999) 37–45. doi:10.1016/S0167-6377(98)00050-9
-
[7]
S. Küçükyavuz, S. Sen, An Introduction to Two-Stage Stochastic Mixed-Integer Programming, in: Leading Developments from INFORMS Communities, INFORMS TutORials in Operations Research, INFORMS, 2017, pp. 1–27. doi:10.1287/educ.2017.0171
-
[8]
J. Varga, E. Karlsson, G. R. Raidl, E. Rönnberg, F. Lindsten, T. Rodemann, Speeding Up Logic-Based Benders Decomposition by Strengthening Cuts with Graph Neural Networks, in: G. Nicosia, V. Ojha, E. La Malfa, G. La Malfa, P. M. Pardalos, R. Umeton (Eds.), Machine Learning, Optimization, and Data Science, Springer Nature Switzerland, Cham, 2024, pp. 24–38....
-
[9]
H. Dai, Y. Xue, Z. Syed, D. Schuurmans, B. Dai, Neural Stochastic Dual Dynamic Program- ming, 2021. doi:10.48550/arXiv.2112.00874
-
[10]
H. Bae, J. Lee, W. C. Kim, Y. Lee, Deep Value Function Networks for Large-Scale Multi- stage Stochastic Programs, in: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR, 2023, pp. 11267–11287. URL:https://proceedings.mlr. press/v206/bae23a.html
work page 2023
- [11]
-
[12]
T. Anh-Nguyen, J. Huchette, C. Tjandraatmadja, Learning Generalized Linear Programming Value Functions, in: A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tom- czak, C. Zhang (Eds.), Advances in Neural Information Processing Systems, volume 37, Cur- ran Associates, Inc., 2024, pp. 135437–135463. URL:https://dl.acm.org/doi/abs/10.5555/ 3737916.3742221
-
[13]
Y. Bengio, E. Frejinger, A. Lodi, R. Patel, S. Sankaranarayanan, A Learning-Based Al- gorithm to Quickly Compute Good Primal Solutions for Stochastic Integer Programs, in: E. Hebrard, N. Musliu (Eds.), Integration of Constraint Programming, Artificial Intelli- gence, and Operations Research, Springer International Publishing, Cham, 2020, pp. 99–111. doi:1...
-
[14]
Y. Wu, W. Song, Z. Cao, J. Zhang, Learning Scenario Representation for Solving Two-stage Stochastic Integer Programs, in: Proceedings of the 10th International Conference on Learning Representations, 2021. URL:https://openreview.net/forum?id=06Wy2BtxXrz
work page 2021
-
[15]
Y. Wu, Y. Zhang, Z. Liang, J. Cheng, HGCN2SP: Hierarchical Graph Convolutional Network for Two-Stage Stochastic Programming, in: Proceedings of the 41st International Conference on Machine Learning, 2024, pp. 53999 – 54014. doi:10.5555/3692070.3694285
- [16]
- [17]
-
[18]
F. Stranieri, E. Fadda, F. Stella, Combining deep reinforcement learning and multi-stage stochastic programming to address the supply chain inventory management problem, Interna- tionalJournalofProductionEconomics268(2024)109099.doi:10.1016/j.ijpe.2023.109099
-
[19]
J. Dumouchelle, R. Patel, E. B. Khalil, M. Bodur, Neur2SP: Neural Two-Stage Stochastic Programming, 2022. doi:10.48550/arXiv.2205.12006
-
[20]
M. Fischetti, J. Jo, Deep neural networks and mixed integer linear optimization, Constraints 23 (2018) 296–309. doi:10.1007/s10601-018-9285-6
-
[21]
J. Kronqvist, B. Li, J. Rolfes, S. Zhao, Alternating Mixed-Integer Programming and Neu- ral Network Training for Approximating Stochastic Two-Stage Problems, in: G. Nicosia, V. Ojha, E. La Malfa, G. La Malfa, P. M. Pardalos, R. Umeton (Eds.), Machine Learn- ing, Optimization, and Data Science, Springer Nature Switzerland, Cham, 2024, pp. 124–139. doi:10.1...
-
[22]
Y. Liu, F. Oliveira, Simulator-based surrogate optimisation employing adaptive uncertainty- aware sampling, Computers & Chemical Engineering 201 (2025) 109243. doi:10.1016/j. compchemeng.2025.109243
work page doi:10.1016/j 2025
-
[23]
B. Amos, L. Xu, J. Z. Kolter, Input Convex Neural Networks, in: Proceedings of the 34th International Conference on Machine Learning, PMLR, 2017, pp. 146–155. URL:https:// proceedings.mlr.press/v70/amos17b.html
work page 2017
-
[24]
L. Duchesne, Q. Louveaux, L. Wehenkel, Supervised learning of convex piecewise linear ap- proximations of optimization problems, in: Proceedings of the 29th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2021, pp. 493–498. doi:10.14428/esann/2021.ES2021-74
-
[25]
S. Agrawal, R. Jia, Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management, in: Proceedings of the 2019 ACM Conference on Economics and Computation, EC ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 743–744. doi:10.1145/3328526.3329565
-
[26]
H. Zhong, X. Yan, Z. Tan, Real-Time Distributed Economic Dispatch Adapted to General Convex Cost Functions: A Secant Approximation-Based Method, IEEE Transactions on Smart Grid 12 (2021) 2089–2101. doi:10.1109/TSG.2020.3049054
-
[27]
E. Luxenberg, S. Boyd, Portfolio construction with Gaussian mixture returns and exponential utility via convex optimization, Optimization and Engineering 25 (2024) 555–574. doi:10. 1007/s11081-023-09814-y
work page 2024
-
[28]
Y. Chen, Y. Shi, B. Zhang, Data-Driven Optimal Voltage Regulation Using Input Convex Neural Networks, Electric Power Systems Research 189 (2020) 106741. doi:10.1016/j.epsr. 2020.106741
-
[29]
A. Rosemberg, M. Tanneau, B. Fanzeres, J. Garcia, P. Van Hentenryck, Learning Optimal Power Flow value functions with input-convex neural networks, Electric Power Systems Re- search 235 (2024) 110643. doi:10.1016/j.epsr.2024.110643
-
[30]
S. Yang, B. W. Bequette, Optimization-based control using input convex neural networks, Computers & Chemical Engineering 144 (2021) 107143. doi:10.1016/j.compchemeng.2020. 107143
-
[31]
F. Bünning, A. Schalbetter, A. Aboudonia, M. H. d. Badyn, P. Heer, J. Lygeros, Input Convex Neural Networks for Building MPC, in: Proceedings of the 3rd Conference on Learning for Dynamics and Control, PMLR, 2021, pp. 251–262. URL:https://proceedings.mlr.press/ v144/bunning21a.html. 26
work page 2021
-
[32]
K. Jiang, C. Hu, F. Yan, Path-following control of autonomous ground vehicles based on input convex neural networks, Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 236 (2022) 2806–2816. doi:10.1177/09544070221114690
-
[33]
W. Wang, H. Zhang, Y. Wang, Y. Tian, Z. Wu, Fast Explicit Machine Learning-Based Model Predictive Control of Nonlinear Processes Using Input Convex Neural Networks, Industrial & Engineering Chemistry Research 63 (2024) 17279–17293. doi:10.1021/acs.iecr.4c02257
-
[34]
S. P. Boyd, L. Vandenberghe, Convex optimization, Cambridge university press, 2004. doi:10. 1017/CBO9780511804441
work page 2004
-
[35]
J. R. Birge, F. Louveaux, Introduction to Stochastic Programming, Springer Series in Op- erations Research and Financial Engineering, Springer, New York, NY, 2011. doi:10.1007/ 978-1-4614-0237-4
work page 2011
-
[36]
H. J. M. Peters, P. P. Wakker, Convex functions on non-convex domains, Economics Letters 22 (1986) 251–255. doi:10.1016/0165-1765(86)90242-9
-
[37]
A. Hassanzadeh, T. K. Ralphs, On the Value Function of a Mixed Integer Linear Optimization Problem and an Algorithm for its Construction, 2014. URL:https://optimization-online. org/?p=12986
work page 2014
- [38]
-
[39]
Gurobi Optimization, LLC, Gurobi Optimizer Reference Manual, 2024. URL:https://www. gurobi.com
work page 2024
-
[40]
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Te- jani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. ...
-
[41]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, Édouard Duchesnay, Scikit-learn: Machine learning in python, Journal of Machine Learning Research 12 (2011) 2825–2830. URL:http://jmlr.org/papers/v12/pedregosa11a. html. 27
work page 2011
-
[42]
G. Cornuejols, R. Sridharan, J. M. Thizy, A comparison of heuristics and relaxations for the capacitated plant location problem, European Journal of Operational Research 50 (1991) 280–297. doi:10.1016/0377-2217(91)90261-S
-
[43]
L. Ntaimo, S. Sen, The Million-Variable “March” for Stochastic Combinatorial Optimization, Journal of Global Optimization 32 (2005) 385–400. doi:10.1007/s10898-004-5910-6
-
[44]
R. Schultz, L. Stougie, M. H. van der Vlerk, Solving stochastic programs with integer recourse by enumeration: A framework using Gröbner basis, Mathematical Programming 83 (1998) 229–252. doi:10.1007/BF02680560
-
[45]
J. Kronqvist, R. Misener, C. Tsay, P-split formulations: a class of intermediate formulations between big-M and convex hull for disjunctive constraints, Mathematical Programming (2025). doi:10.1007/s10107-025-02232-1
- [46]
-
[47]
URL:https://docs.gurobi.com/projects/optimizer/en/current/reference/ parameters.html
Gurobi Optimization, LLC, Parameter Reference - Gurobi Optimizer Reference Man- ual, 2025. URL:https://docs.gurobi.com/projects/optimizer/en/current/reference/ parameters.html. 28
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.