pith. sign in

arxiv: 2406.05697 · v3 · submitted 2024-06-09 · 🧮 math.OC

Decision-Focused Surrogate Modeling for Mixed-Integer Linear Optimization

Pith reviewed 2026-05-24 00:18 UTC · model grok-4.3

classification 🧮 math.OC
keywords mixed-integer linear programmingsurrogate modelingdecision-focused learningoptimization proxieslinear programmingreal-time decision makingdata-driven optimization
0
0 comments X

The pith

Decision-focused surrogate linear programs trained on MILP data predict near-optimal solutions while incorporating all original constraints for feasibility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Mixed-integer linear programs power many real-time decision systems but often cannot be solved fast enough for frequent updates. The paper constructs surrogate linear programs that solve much quicker by training them decision-focused on data generated from the original MILPs so that they return the same or close optimal solutions. The surrogates embed every linear constraint from the MILP, raising the chance that outputs stay feasible. Two case studies show the method needs little data and beats neural-network proxies on prediction accuracy for optimal solutions.

Core claim

We develop a data-driven approach for constructing surrogate optimization models in the form of linear programs (LPs) that can be solved much more efficiently than the corresponding MILPs. We train these surrogate LPs in a decision-focused manner such that for different model inputs, they achieve the same or close to the same optimal solutions as the original MILPs. One key advantage of the proposed method is that it allows the incorporation of all the original MILP's linear constraints, which significantly increases the likelihood of obtaining feasible predicted solutions.

What carries the argument

Decision-focused training of surrogate LPs that embed all original MILP linear constraints to match optimal solutions.

If this is right

  • Surrogate LPs solve much faster than MILPs, enabling real-time updates in online systems.
  • Inclusion of all original linear constraints raises the rate of feasible predicted solutions.
  • The method requires relatively little training data yet achieves high accuracy.
  • It outperforms neural-network-based optimization proxies on the tested cases.
  • The approach applies directly to time-critical decision-making problems that rely on MILPs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar decision-focused surrogates could be built for other optimization classes such as nonlinear programs.
  • The surrogates might serve as fast warm-starts or heuristics inside exact MILP solvers.
  • Periodic retraining on new data could let the surrogates track shifting problem parameters.
  • The data efficiency points to possible use in domains where generating large MILP datasets is costly.

Load-bearing premise

Training surrogate LPs decision-focused on MILP-generated data produces close optimal solutions for unseen inputs while the included constraints guarantee feasible outputs.

What would settle it

Solve the surrogate LP on a hold-out test set of inputs and compare its solutions to the true MILP optima; large objective gaps or infeasible points would disprove the claim.

Figures

Figures reproduced from arXiv: 2406.05697 by Qi Zhang, Rishabh Gupta, Shivi Dixit.

Figure 1
Figure 1. Figure 1: Illustrative example in which for the given cost vector [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: DFSOM performance compared to NN-based optimization proxies in the hybrid vehicle control problem. there are many solutions in terms of the continuous variables that are optimal or close to optimal. Moreover, as the cost parameter αt has a much larger value than βt, the contribution of the continuous variables to the total cost outweighs that of the discrete variables. As a result, from a cost standpoint, … view at source ↗
Figure 3
Figure 3. Figure 3: DFSOM performance for varying number of constraints learned ( [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Computational performance of DFSOMs evaluated across 1,000 random instances in the hybrid [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: DFSOM performance compared to NN-based optimization proxies in the production scheduling problem. errors with respect to the binary variables and optimality gaps for smaller training datasets, which indicates overfitting [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: DFSOM performance for varying number of constraints learned ( [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Computational performance of DFSOMs evaluated across 1,000 random instances in the production [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Computational performance of DFSOMs evaluated across 30 hard instances in the production scheduling case compared to solving the MILPs to the same objective function values achieved by the corresponding DFSOMs. 5 Conclusions Motivated by the need to solve difficult MILPs in real-time settings, we developed a data-driven approach for constructing efficient surrogate LPs that can replace the MILPs. When gene… view at source ↗
Figure 9
Figure 9. Figure 9: Comparison of feasible regions and optimal solutions of ( [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Bipartite graph representation of the MILP ( [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: The predicted decision variables that are not integer-feasible are used in a projection problem to [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: DFSOM performance compared to NN-based optimization proxy for [PITH_FULL_IMAGE:figures/full_fig_p019_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: DFSOM performance compared to NN-based optimization proxy for [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗
read the original abstract

Mixed-integer optimization is at the core of many online decision-making systems that demand frequent updates of decisions in real time. However, due to their combinatorial nature, mixed-integer linear programs (MILPs) can be difficult to solve, rendering them often unsuitable for time-critical online applications. To address this challenge, we develop a data-driven approach for constructing surrogate optimization models in the form of linear programs (LPs) that can be solved much more efficiently than the corresponding MILPs. We train these surrogate LPs in a decision-focused manner such that for different model inputs, they achieve the same or close to the same optimal solutions as the original MILPs. One key advantage of the proposed method is that it allows the incorporation of all the original MILP's linear constraints, which significantly increases the likelihood of obtaining feasible predicted solutions. Results from two computational case studies indicate that this decision-focused surrogate modeling approach is highly data-efficient and provides very accurate predictions of the optimal solutions. In these examples, it outperforms more commonly used neural-network-based optimization proxies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes training surrogate linear programs (LPs) for mixed-integer linear programs (MILPs) in a decision-focused manner so that the surrogates produce the same or close optimal solutions as the original MILPs for varying inputs. All original linear constraints are retained in the surrogates to guarantee feasibility of the predicted solutions. Two computational case studies are used to show that the approach is highly data-efficient, yields accurate predictions, and outperforms neural-network-based optimization proxies.

Significance. If the empirical results hold, the method offers a practical route to real-time optimization in settings where MILPs are too slow, with the built-in feasibility guarantee and decision-focused objective providing advantages over standard neural proxies. The data-efficiency claim, if substantiated, would be particularly valuable for applications with limited training data.

minor comments (2)
  1. [Abstract] Abstract: the claim of 'highly data-efficient' and 'very accurate predictions' is stated without any numerical metrics, number of training samples, or feasibility rates; adding a brief quantitative summary would strengthen the abstract.
  2. The description of the training procedure and data-generation process for the two case studies should include explicit statements of the loss function, optimizer, and how test instances were sampled relative to training instances to allow replication.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's core contribution is an empirical, data-driven construction of surrogate LPs trained decision-focused on MILP-generated instances, with feasibility enforced by retaining all original linear constraints. Performance claims rest on two external computational case studies measuring accuracy and data efficiency against NN proxies on unseen inputs. No derivation, equation, or self-citation reduces the reported outcomes to a definitional tautology or fitted input renamed as prediction; the method is validated externally rather than by internal self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard linear programming theory and data-driven training without introducing new mathematical entities or fitted constants beyond typical machine-learning hyperparameters.

axioms (2)
  • standard math Linear programs can be solved efficiently compared with MILPs
    Invoked to justify use of LP surrogates for real-time applications.
  • domain assumption Training data generated from the original MILP is representative of future inputs
    Required for the surrogate to generalize to new model inputs.

pith-pipeline@v0.9.0 · 5710 in / 1163 out tokens · 25323 ms · 2026-05-24T00:18:06.860101+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1]

    Zico Kolter

    Brandon Amos and J. Zico Kolter. Optnet: Differentiable optimization as a layer in neural networks. 3 2017

  2. [2]

    A learning-based algorithm to quickly compute good primal solutions for stochastic integer programs

    Yoshua Bengio, Emma Frejinger, Andrea Lodi, Rahul Patel, and Sriram Sankaranarayanan. A learning-based algorithm to quickly compute good primal solutions for stochastic integer programs. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 17th International Conference, CPAIOR 2020, Vienna, Austria, September 21--24,...

  3. [3]

    Machine learning for combinatorial optimization: a methodological tour d’horizon

    Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290 0 (2): 0 405--421, 2021

  4. [4]

    Online mixed-integer optimization in milliseconds

    Dimitris Bertsimas and Bartolomeo Stellato. Online mixed-integer optimization in milliseconds. INFORMS Journal on Computing, 34 0 (4): 0 2229--2248, 2022

  5. [5]

    Advances in surrogate based modeling, feasibility analysis, and optimization: A review

    Atharv Bhosekar and Marianthi Ierapetritou. Advances in surrogate based modeling, feasibility analysis, and optimization: A review . Computers and Chemical Engineering, 108: 0 250--267, 2018

  6. [6]

    Lee, Vijay Kumar, George J

    Steven Chen, Kelsey Saulnier, Nikolay Atanasov, Daniel D. Lee, Vijay Kumar, George J. Pappas, and Manfred Morari. Approximating Explicit Model Predictive Control Using Constrained Neural Networks . In Proceedings of the American Control Conference, pages 1520--1527. AACC, 2018

  7. [7]

    Machine learning for cutting planes in integer programming: A survey

    Arnaud Deza and Elias B Khalil. Machine learning for cutting planes in integer programming: A survey. arXiv preprint arXiv:2302.09166, 2023

  8. [8]

    Task-based End-to-End Model Learning in Stochastic Optimization

    Priya L Donti, Brandon Amos, and J Zico Kolter. Task-based End-to-End Model Learning in Stochastic Optimization . In Advances in Neural Information Processing Systems 30, 2017

  9. [9]

    Jump: A modeling language for mathematical optimization

    Iain Dunning, Joey Huchette, and Miles Lubin. Jump: A modeling language for mathematical optimization. SIAM Review, 59 0 (2): 0 295--320, 2017

  10. [10]

    Predict, then Optimize

    Adam N. Elmachtoub and Paul Grigas. Smart "Predict, then Optimize" . Management Science, 68 0 (1): 0 9--26, 2022

  11. [11]

    MIPaaL: Mixed integer program as a layer

    Aaron Ferber, Bryan Wilder, Bistra Dilkina, and Milind Tambe. MIPaaL: Mixed integer program as a layer . In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, pages 1504--1511, 2020

  12. [12]

    Surco: Learning linear surrogates for combinatorial nonlinear optimization problems

    Aaron M Ferber, Taoan Huang, Daochen Zha, Martin Schubert, Benoit Steiner, Bistra Dilkina, and Yuandong Tian. Surco: Learning linear surrogates for combinatorial nonlinear optimization problems. In International Conference on Machine Learning, pages 10034--10052. PMLR, 2023

  13. [13]

    Ferdinando Fioretto, Terrence W. K. Mak, and Pascal van Hentenryck. Predicting AC optimal power flows: Combining deep learning and Lagrangian dual methods . In Proceedings of the AAAI Conference on Artificial Intelligence, pages 630--637, 2020

  14. [14]

    Efficient learning of decision-making models: A penalty block coordinate descent algorithm for data-driven inverse optimization

    Rishabh Gupta and Qi Zhang. Efficient learning of decision-making models: A penalty block coordinate descent algorithm for data-driven inverse optimization. Computers & Chemical Engineering, page 108123, 2022

  15. [15]

    Data-driven decision-focused surrogate modeling

    Rishabh Gupta and Qi Zhang. Data-driven decision-focused surrogate modeling. AIChE Journal, page e18338, 2024

  16. [16]

    Learning to select cuts for efficient mixed-integer programming

    Zeren Huang, Kerong Wang, Furui Liu, Hui-Ling Zhen, Weinan Zhang, Mingxuan Yuan, Jianye Hao, Yong Yu, and Jun Wang. Learning to select cuts for efficient mixed-integer programming. Pattern Recognition, 123: 0 108353, 2022

  17. [17]

    Warm-starting constraint generation for mixed-integer optimization: A machine learning approach

    Asunci \'o n Jim \'e nez-Cordero, Juan Miguel Morales, and Salvador Pineda. Warm-starting constraint generation for mixed-integer optimization: A machine learning approach. Knowledge-Based Systems, 253: 0 109570, 2022

  18. [18]

    Efficient Representation and Approximation of Model Predictive Control Laws via Deep Learning

    Benjamin Karg and Sergio Lucia. Efficient Representation and Approximation of Model Predictive Control Laws via Deep Learning . IEEE Transactions on Cybernetics, 50 0 (9): 0 3866--3878, 2020

  19. [19]

    Learning to branch in mixed integer programming

    Elias Khalil, Pierre Le Bodic, Le Song, George Nemhauser, and Bistra Dilkina. Learning to branch in mixed integer programming. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016

  20. [20]

    Fast Approximations for Job Shop Scheduling: A Lagrangian Dual Deep Learning Method

    James Kotary, Ferdinando Fioretto, and Pascal Van Hentenryck . Fast Approximations for Job Shop Scheduling: A Lagrangian Dual Deep Learning Method . In Proceedings of the AAAI Conference on Artificial Intelligence, 2022

  21. [21]

    Rawlings, and Stephen J

    Pratyush Kumar, James B. Rawlings, and Stephen J. Wright. Industrial, large-scale model predictive control with structured neural networks . Computers and Chemical Engineering, 150: 0 107291, 2021

  22. [22]

    On learning and branching: a survey

    Andrea Lodi and Giulia Zarpellon. On learning and branching: a survey. Top, 25 0 (2): 0 207--236, 2017

  23. [23]

    Stuckey, and Tias Guns

    Jayanta Mandi, Emir Demirovi \' c , Peter J. Stuckey, and Tias Guns. Smart predict-and-optimize for hard combinatorial optimization problems . In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, pages 1603--1610, 2020

  24. [24]

    Chemical Production Scheduling: Mixed-Integer Programming Models and Methods

    Christos T Maravelias. Chemical Production Scheduling: Mixed-Integer Programming Models and Methods. Cambridge University Press, 2021

  25. [25]

    Cutting planes in integer and mixed integer programming

    Hugues Marchand, Alexander Martin, Robert Weismantel, and Laurence Wolsey. Cutting planes in integer and mixed integer programming. Discrete Applied Mathematics, 123 0 (1-3): 0 397--446, 2002

  26. [26]

    DeepOPF: Deep neural network for DC optimal power flow

    Xiang Pan, Tianyu Zhao, and Minghua Chen. DeepOPF: Deep neural network for DC optimal power flow . In Proceedings of the 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). IEEE, 2019

  27. [27]

    Paulson and Ali Mesbah

    Joel A. Paulson and Ali Mesbah. Approximate Closed-Loop Robust Model Predictive Control with Guaranteed Stability and Constraint Satisfaction . IEEE Control Systems Letters, 4 0 (3): 0 719--724, 2020

  28. [28]

    Theory of linear and integer programming

    Alexander Schrijver. Theory of linear and integer programming. John Wiley & Sons, 1998

  29. [29]

    Learning primal heuristics for mixed integer programs

    Yunzhuang Shen, Yuan Sun, Andrew Eberhard, and Xiaodong Li. Learning primal heuristics for mixed integer programs. In 2021 international joint conference on neural networks (ijcnn), pages 1--8. IEEE, 2021

  30. [30]

    Sidiropoulos

    Haoran Sun, Xiangyi Chen, Qingjiang Shi, Mingyi Hong, Xiao Fu, and Nicholas D. Sidiropoulos. Learning to Optimize: Training Deep Neural Networks for Interference Management . IEEE Transactions on Signal Processing, 66 0 (20): 0 5438--5453, 2018

  31. [31]

    A simple effective heuristic for embedded mixed-integer quadratic programming

    Reza Takapoui, Nicholas Moehle, Stephen Boyd, and Alberto Bemporad. A simple effective heuristic for embedded mixed-integer quadratic programming. 9 2015

  32. [32]

    Reinforcement learning for integer programming: Learning to cut

    Yunhao Tang, Shipra Agrawal, and Yuri Faenza. Reinforcement learning for integer programming: Learning to cut. In International conference on machine learning, pages 9367--9376. PMLR, 2020

  33. [33]

    Combining Deep Learning and Optimization for Preventive Security-Constrained DC Optimal Power Flow

    Alexandre Velloso and Pascal Van Hentenryck . Combining Deep Learning and Optimization for Preventive Security-Constrained DC Optimal Power Flow . IEEE Transactions on Power Systems, 36 0 (4): 0 3618--3628, 2021

  34. [34]

    Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization

    Bryan Wilder, Bistra Dilkina, and Milind Tambe. Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization . In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelli...

  35. [35]

    Zamzam and Kyri Baker

    Ahmed S. Zamzam and Kyri Baker. Learning optimal solutions for extremely fast ac optimal power flow . Proceedings of the 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), 2020