MIPaaL: Mixed Integer Program as a Layer

Aaron Ferber; Bistra Dilkina; Bryan Wilder; Milind Tambe

arxiv: 1907.05912 · v2 · pith:7LW5NAOEnew · submitted 2019-07-12 · 💻 cs.LG · cs.AI

MIPaaL: Mixed Integer Program as a Layer

Aaron Ferber , Bryan Wilder , Bistra Dilkina , Milind Tambe This is my paper

Pith reviewed 2026-05-24 22:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords decision-focused learningmixed integer programmingdifferentiable optimizationcutting planesend-to-end learningcombinatorial optimizationmachine learning for optimization

0 comments

The pith

Differentiating through mixed integer programs allows predictive models to be trained directly on downstream decision quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates how to embed mixed integer linear programs as differentiable layers inside machine learning pipelines. This extends decision-focused learning from linear programs to the wider class of MIPs that include discrete variables and arbitrary linear constraints. The key step is to differentiate through an exact cutting-planes solver that iteratively tightens a continuous relaxation until an integral solution appears. On several real-world domains the resulting end-to-end models produce better decisions than either a two-stage predict-then-optimize pipeline or a decision-focused model trained only on the LP relaxation.

Core claim

By employing a cutting planes solution approach, which is an exact algorithm that iteratively adds constraints to a continuous relaxation of the problem until an integral solution is found, it is possible to differentiate through a MIP. This enables decision-focused learning for the broad class of problems encoded as mixed integer linear programs, supporting arbitrary linear constraints over discrete and continuous variables, and yields higher-quality decisions than treating prediction and prescription separately or than applying decision-focused learning only to the LP relaxation.

What carries the argument

Cutting-planes solution approach that iteratively adds constraints to the continuous relaxation until integrality is reached, thereby permitting gradient flow back through the MIP solver.

If this is right

MIPs encoding discrete choices can now participate directly in end-to-end gradient training.
Predictive models can be optimized for the quality of the MIP solutions they induce rather than for prediction accuracy alone.
The approach applies to any problem expressible with arbitrary linear constraints on mixed discrete and continuous variables.
On evaluated domains the method outperforms both separate prediction-plus-optimization and decision-focused learning on the LP relaxation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The technique may allow decision-focused training on larger combinatorial problems once more efficient cutting-plane implementations are used inside the layer.
Similar differentiation could be attempted with other exact MIP solvers such as branch-and-bound if their solution paths can be made differentiable.
The method opens a route to jointly learning prediction models and the structure of the MIP constraints themselves when both are uncertain.

Load-bearing premise

Gradients computed through the iterative cutting-planes process remain stable and informative enough to train the upstream predictive model effectively.

What would settle it

If training with the MIP layer produces worse final decisions than the LP-relaxation baseline on the same real-world instances, the claimed advantage of the differentiable MIP approach would not hold.

Figures

Figures reproduced from arXiv: 1907.05912 by Aaron Ferber, Bistra Dilkina, Bryan Wilder, Milind Tambe.

read the original abstract

Machine learning components commonly appear in larger decision-making pipelines; however, the model training process typically focuses only on a loss that measures accuracy between predicted values and ground truth values. Decision-focused learning explicitly integrates the downstream decision problem when training the predictive model, in order to optimize the quality of decisions induced by the predictions. It has been successfully applied to several limited combinatorial problem classes, such as those that can be expressed as linear programs (LP), and submodular optimization. However, these previous applications have uniformly focused on problems from specific classes with simple constraints. Here, we enable decision-focused learning for the broad class of problems that can be encoded as a Mixed Integer Linear Program (MIP), hence supporting arbitrary linear constraints over discrete and continuous variables. We show how to differentiate through a MIP by employing a cutting planes solution approach, which is an exact algorithm that iteratively adds constraints to a continuous relaxation of the problem until an integral solution is found. We evaluate our new end-to-end approach on several real world domains and show that it outperforms the standard two phase approaches that treat prediction and prescription separately, as well as a baseline approach of simply applying decision-focused learning to the LP relaxation of the MIP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper makes general MIPs differentiable for decision-focused learning by backpropagating through cutting-plane iterations, extending the approach beyond LPs.

read the letter

The main thing to know is that this paper extends decision-focused learning to general mixed integer programs by differentiating through a cutting-planes solver. That step lets the method handle arbitrary linear constraints over discrete and continuous variables, which prior work on LPs and submodular problems did not cover. They backpropagate through the sequence of relaxed LPs solved as cuts are added until an integral solution is reached, then show on real-world domains that the end-to-end approach beats both separate prediction-plus-optimization and decision-focused learning applied only to the LP relaxation. The experiments on actual applications are the clearest strength, and the citation pattern stays focused on the relevant decision-focused learning literature without padding. The soft spot is the differentiability step itself. The sequence of cuts depends on the current fractional vertex, which depends on the upstream predictions, so the active constraint set can shift discontinuously. If the method simply differentiates through the final active set while treating it as fixed, the gradients may not reflect those jumps and could end up uninformative or unstable during training. The abstract gives no derivation or stability analysis, so that part needs direct checking in the full text. This paper is for researchers working on learning for combinatorial optimization who need to handle MIPs rather than just continuous relaxations. A reader already following the decision-focused learning line would get concrete value from seeing the extension work in practice. It deserves a serious referee because the core claim addresses a real limitation in the existing literature even if the gradient details require more evidence.

Referee Report

1 major / 1 minor

Summary. The paper presents MIPaaL, a technique to embed mixed-integer linear programs as differentiable layers within neural networks for decision-focused learning. It differentiates through an exact cutting-planes solver by back-propagating through the sequence of LP relaxations solved during cut generation, and reports that the resulting end-to-end model outperforms both separate prediction-plus-optimization pipelines and a decision-focused LP-relaxation baseline on several real-world tasks.

Significance. If the differentiation is rigorously defined and the gradients remain informative, the work would meaningfully extend decision-focused learning beyond LPs and submodular problems to the much larger class of MIPs. The empirical claim of outperformance over both two-stage and LP-relaxation baselines, if reproducible, would constitute a practical contribution.

major comments (1)

[Abstract (method description)] The central technical claim—that gradients can be obtained by back-propagation through the iterative cutting-plane process—does not address the dependence of cut generation on the current fractional vertex, which itself depends on the upstream predictions. Because the sequence of active cuts can change discontinuously, the mapping from predictions to the final integral solution is piecewise and the pieces can switch abruptly; no argument is supplied that the resulting subgradient remains well-defined or useful for training across these switches.

minor comments (1)

[Abstract] The abstract refers to evaluation on 'several real world domains' without naming the domains or providing any quantitative results, making it impossible to assess the strength of the empirical support from the provided text alone.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting an important subtlety in the differentiability argument. We address the concern directly below and will incorporate clarifications into a revised manuscript.

read point-by-point responses

Referee: [Abstract (method description)] The central technical claim—that gradients can be obtained by back-propagation through the iterative cutting-plane process—does not address the dependence of cut generation on the current fractional vertex, which itself depends on the upstream predictions. Because the sequence of active cuts can change discontinuously, the mapping from predictions to the final integral solution is piecewise and the pieces can switch abruptly; no argument is supplied that the resulting subgradient remains well-defined or useful for training across these switches.

Authors: We agree that cut selection depends on the current fractional vertex and that this dependence can cause the active-cut sequence to change discontinuously with the predictions. Our differentiation procedure unrolls the cutting-plane loop and applies the standard LP differentiation operator (via the KKT conditions of each successive relaxation) to the sequence of LPs that were actually solved for a given prediction. For any fixed sequence of cuts the end-to-end map is piecewise linear and therefore admits a well-defined subgradient almost everywhere. At the (measure-zero) points where the sequence jumps we simply retain the subgradient corresponding to the cuts that were active in the forward pass; this is the natural extension of the subgradient used by other decision-focused methods that also encounter combinatorial choices (e.g., active-set changes in quadratic programs). While a fully rigorous proof that these subgradients remain informative for every possible MIP is beyond the scope of the paper, the empirical results on multiple real-world domains demonstrate that the resulting gradients are sufficiently informative to outperform both two-stage and LP-relaxation baselines. We will add an expanded discussion of this subgradient construction, its relation to existing work on non-smooth optimization, and the practical evidence of its utility in the revised manuscript. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation introduces independent differentiation technique

full rationale

The paper's core contribution is a novel method to differentiate through MIPs by backpropagating through the cutting-planes algorithm's sequence of LP relaxations until integrality. This is presented as an extension of prior decision-focused learning work to the MIP setting, with the technical steps (iterative constraint addition and gradient computation) described directly rather than defined in terms of the target predictions or fitted parameters. No load-bearing step reduces by construction to a self-citation, ansatz, or renaming; the evaluations compare against baselines including LP relaxation, confirming the claim rests on the new algorithmic construction rather than circular reduction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no free parameters, axioms, or invented entities are identifiable; the contribution is a differentiation technique rather than new modeling assumptions.

pith-pipeline@v0.9.0 · 5744 in / 1183 out tokens · 27363 ms · 2026-05-24T22:11:18.243956+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 1 internal anchor

[1]

A supervised machine learning approach to variable branching in branch-and-bound

Alejandro Marcos Alvarez, Quentin Louveaux, and Louis Wehenkel. A supervised machine learning approach to variable branching in branch-and-bound. In ECML, 2014

work page 2014
[2]

Amos and J

B. Amos and J. Z. Kolter. Optnet: Differentiable optimization as a layer in neural networks. In ICML, 2017

work page 2017
[3]

Gomory cuts revisited

Egon Balas, Sebastian Ceria, Gérard Cornuéjols, and N Natraj. Gomory cuts revisited. Opera- tions Research Letters, 19(1):1–9, 1996

work page 1996
[4]

Learning to branch

Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In ICML, 2018

work page 2018
[5]

Diver- sity constraints in public housing allocation

Nawal Benabbou, Mithun Chakraborty, Xuan-Vinh Ho, Jakub Sliwinski, and Yair Zick. Diver- sity constraints in public housing allocation. In AAMAS, 2018

work page 2018
[6]

Using a ﬁnancial training criterion rather than a prediction criterion

Yoshua Bengio. Using a ﬁnancial training criterion rather than a prediction criterion. Interna- tional Journal of Neural Systems, 8(04):433–443, 1997

work page 1997
[7]

Machine learning for combinatorial optimization: a methodological tour d’horizon

Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. arXiv preprint arXiv:1811.06128, 2018

work page arXiv 2018
[8]

Portfolio construction through mixed-integer programming at grantham, mayo, van otterloo and company.Interfaces, 29(1):49– 66, 1999

Dimitris Bertsimas, Christopher Darnell, and Robert Soucy. Portfolio construction through mixed-integer programming at grantham, mayo, van otterloo and company.Interfaces, 29(1):49– 66, 1999

work page 1999
[9]

Cutting planes from extended lp formulations

Merve Bodur, Sanjeeb Dash, and Oktay Günlük. Cutting planes from extended lp formulations. Mathematical Programming, 161(1-2):159–192, 2017

work page 2017
[10]

Optimizing intensive care unit discharge decisions with patient readmissions

Carri W Chan, Vivek F Farias, Nicholas Bambos, and Gabriel J Escobar. Optimizing intensive care unit discharge decisions with patient readmissions. Operations research, 60(6):1323–1341, 2012

work page 2012
[11]

An anatomy of trading strategies

Jennifer Conrad and Gautam Kaul. An anatomy of trading strategies. The Review of Financial Studies, 11(3):489–519, 1998

work page 1998
[12]

Dobbs, Oktay Günlük, Tomasz J

Sanjeeb Dash, Neil B. Dobbs, Oktay Günlük, Tomasz J. Nowicki, and Grzegorz M. ´Swirszcz. Lattice-free sets, multi-branch split disjunctions, and mixed-integer programming.Mathematical Programming, 145(1):483–508, 2014

work page 2014
[13]

End-to-end differentiable physics for learning and control

Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J Zico Kolter. End-to-end differentiable physics for learning and control. In Advances in Neural Information Processing Systems, pages 7178–7189, 2018

work page 2018
[14]

Effect of the accuracy of price forecasting on proﬁt in a price based unit commitment

Erik Delarue, Pieterjan Van Den Bosch, and William D’haeseleer. Effect of the accuracy of price forecasting on proﬁt in a price based unit commitment. Electric power systems research, 80(10):1306–1313, 2010

work page 2010
[15]

Position-indexed formulations for kidney exchange

John P Dickerson, David F Manlove, Benjamin Plaut, Tuomas Sandholm, and James Trim- ble. Position-indexed formulations for kidney exchange. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 25–42. ACM, 2016

work page 2016
[16]

predict, then optimize

Adam N Elmachtoub and Paul Grigas. Smart" predict, then optimize". arXiv preprint arXiv:1710.08005, 2017

work page arXiv 2017
[17]

An algorithm for the mixed integer problem

Ralph Gomory. An algorithm for the mixed integer problem. Technical report, RAND CORP SANTA MONICA CA, 1960

work page 1960
[18]

Roy, and Xiang Yan

Yi hao Kao, Benjamin V . Roy, and Xiang Yan. Directed regression. In Y . Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems, pages 889–897. Curran Associates, Inc., 2009

work page 2009
[19]

Learning to search in branch and bound algorithms

He He, Hal Daume III, and Jason M Eisner. Learning to search in branch and bound algorithms. In Advances in Neural Information Processing Systems, pages 3293–3301, 2014. 10

work page 2014
[20]

Batch normalization: Accelerating deep network training by reducing internal covariate shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37 , ICML’15, pages 448–456. JMLR.org, 2015

work page 2015
[21]

Minima of functions of several variables with inequalities as side constraints

William Karush. Minima of functions of several variables with inequalities as side constraints. M. Sc. Dissertation. Dept. of Mathematics, Univ. of Chicago, 1939

work page 1939
[22]

A fast and high quality multilevel scheme for partitioning irregular graphs

George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on scientiﬁc Computing, 20(1):359–392, 1998

work page 1998
[23]

Learning combinatorial optimization algorithms over graphs

Elias B Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6348–6358, 2017

work page 2017
[24]

Learning to run heuristics in tree search

Elias B Khalil, Bistra Dilkina, George L Nemhauser, Shabbir Ahmed, and Yufen Shao. Learning to run heuristics in tree search. In IJCAI, pages 659–666, 2017

work page 2017
[25]

Learning to branch in mixed integer programming

Elias B Khalil, Pierre Le Bodic, Le Song, George L Nemhauser, and Bistra Dilkina. Learning to branch in mixed integer programming. In Proceedings of the 30th AAAI Conference on Artiﬁcial Intelligence, 2016

work page 2016
[26]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[27]

Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019

Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019

work page 2019
[28]

What game are we playing? end-to-end learning in normal and extensive form games

Chun Kai Ling, Fei Fang, and J Zico Kolter. What game are we playing? end-to-end learning in normal and extensive form games. IJCAI, 2018

work page 2018
[29]

On learning and branching: a survey

Andrea Lodi and Giulia Zarpellon. On learning and branching: a survey. Top, 25(2):207–236, 2017

work page 2017
[30]

Rectiﬁer nonlinearities improve neural network acoustic models

Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectiﬁer nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, 2013

work page 2013
[31]

Portfolio selection

Harry Markowitz. Portfolio selection. The journal of ﬁnance, 7(1):77–91, 1952

work page 1952
[32]

Optimal residential load control with price prediction in real-time electricity pricing environments

Amir-Hamed Mohsenian-Rad and Alberto Leon-Garcia. Optimal residential load control with price prediction in real-time electricity pricing environments. IEEE Trans. Smart Grid, 1(2):120–133, 2010

work page 2010
[33]

Reinforcement learning for solving the vehicle routing problem

Mohammadreza Nazari, Afshin Oroojlooy, Lawrence Snyder, and Martin Takác. Reinforcement learning for solving the vehicle routing problem. In Advances in Neural Information Processing Systems, pages 9839–9849, 2018

work page 2018
[34]

Integer programming: The global impact

George L Nemhauser. Integer programming: The global impact. ISyE DOS Optimization Seminar presentation at Georgia Tech http://hdl.handle.net/1853/49829, 2013

work page 2013
[35]

Data analysis and optimization for (citi) bike sharing

Eoin O’Mahony and David B Shmoys. Data analysis and optimization for (citi) bike sharing. In Proceedings of the 29th AAAI Conference on Artiﬁcial Intelligence, 2015

work page 2015
[36]

Automatic differentiation in PyTorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017

work page 2017
[37]

Decision-focused learning of adversary behavior in security games

Andrew Perrault, Bryan Wilder, Eric Ewing, Aditya Mate, Bistra Dilkina, and Milind Tambe. Decision-focused learning of adversary behavior in security games. arXiv preprint arXiv:1903.00958, 2019

work page arXiv 1903
[38]

Various end-of-day data

Quandl. Various end-of-day data. https://www.quandl.com/, 2019. 11

work page 2019
[39]

Guiding combinatorial optimization with uct

Ashish Sabharwal, Horst Samulowitz, and Chandra Reddy. Guiding combinatorial optimization with uct. InInternational Conference on Integration of Artiﬁcial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, pages 356–361. Springer, 2012

work page 2012
[40]

Ipknot: fast and accurate prediction of rna secondary structures with pseudoknots using integer programming

Kengo Sato, Yuki Kato, Michiaki Hamada, Tatsuya Akutsu, and Kiyoshi Asai. Ipknot: fast and accurate prediction of rna secondary structures with pseudoknots using integer programming. Bioinformatics, 27(13):i85–i93, 2011

work page 2011
[41]

Collective classiﬁcation in network data

Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi- Rad. Collective classiﬁcation in network data. AI magazine, 29(3):93–93, 2008

work page 2008
[42]

Dropout: a simple way to prevent neural networks from overﬁtting.The Journal of Machine Learning Research, 15(1):1929–1958, 2014

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overﬁtting.The Journal of Machine Learning Research, 15(1):1929–1958, 2014

work page 1929
[43]

Nurse scheduling using integer linear programming and constraint programming

Lorraine Trilling, Alain Guinet, and Dominiue Le Magny. Nurse scheduling using integer linear programming and constraint programming. IFAC Proceedings Volumes, 39(3):671–676, 2006

work page 2006
[44]

Pointer networks

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700, 2015

work page 2015
[45]

Satnet: Bridging deep learning and logical reasoning using a differentiable satisﬁability solver

Po-Wei Wang, Priya Donti, Bryan Wilder, and Zico Kolter. Satnet: Bridging deep learning and logical reasoning using a differentiable satisﬁability solver. In International Conference on Machine Learning, pages 6545–6554, 2019

work page 2019
[46]

Scheduling nursing personnel according to nursing preference: A mathe- matical programming approach

D Michael Warner. Scheduling nursing personnel according to nursing preference: A mathe- matical programming approach. Operations Research, 24(5):842–856, 1976

work page 1976
[47]

Melding the data-decisions pipeline: Decision- focused learning for combinatorial optimization

Bryan Wilder, Bistra Dilkina, and Milind Tambe. Melding the data-decisions pipeline: Decision- focused learning for combinatorial optimization. In Proceedings of the 33rd AAAI Conference on Artiﬁcial Intelligence, 2019

work page 2019
[48]

Integer and combinatorial optimization

Laurence A Wolsey and George L Nemhauser. Integer and combinatorial optimization. John Wiley & Sons, 2014

work page 2014
[49]

Deep reinforcement learning for green security game with online information

Lantao Yu, Yi Wu, Rohit Singh, Lucas Joppa, and Fei Fang. Deep reinforcement learning for green security game with online information. In Proceedings of the 33rd AAAI Conference on Artiﬁcial Intelligence, 2019. A Portfolio Optimization Speciﬁcs A.1 Data speciﬁcation We simulate our approach on this problem setting we use historical price and volume data d...

work page 2019

[1] [1]

A supervised machine learning approach to variable branching in branch-and-bound

Alejandro Marcos Alvarez, Quentin Louveaux, and Louis Wehenkel. A supervised machine learning approach to variable branching in branch-and-bound. In ECML, 2014

work page 2014

[2] [2]

Amos and J

B. Amos and J. Z. Kolter. Optnet: Differentiable optimization as a layer in neural networks. In ICML, 2017

work page 2017

[3] [3]

Gomory cuts revisited

Egon Balas, Sebastian Ceria, Gérard Cornuéjols, and N Natraj. Gomory cuts revisited. Opera- tions Research Letters, 19(1):1–9, 1996

work page 1996

[4] [4]

Learning to branch

Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, and Ellen Vitercik. Learning to branch. In ICML, 2018

work page 2018

[5] [5]

Diver- sity constraints in public housing allocation

Nawal Benabbou, Mithun Chakraborty, Xuan-Vinh Ho, Jakub Sliwinski, and Yair Zick. Diver- sity constraints in public housing allocation. In AAMAS, 2018

work page 2018

[6] [6]

Using a ﬁnancial training criterion rather than a prediction criterion

Yoshua Bengio. Using a ﬁnancial training criterion rather than a prediction criterion. Interna- tional Journal of Neural Systems, 8(04):433–443, 1997

work page 1997

[7] [7]

Machine learning for combinatorial optimization: a methodological tour d’horizon

Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. arXiv preprint arXiv:1811.06128, 2018

work page arXiv 2018

[8] [8]

Portfolio construction through mixed-integer programming at grantham, mayo, van otterloo and company.Interfaces, 29(1):49– 66, 1999

Dimitris Bertsimas, Christopher Darnell, and Robert Soucy. Portfolio construction through mixed-integer programming at grantham, mayo, van otterloo and company.Interfaces, 29(1):49– 66, 1999

work page 1999

[9] [9]

Cutting planes from extended lp formulations

Merve Bodur, Sanjeeb Dash, and Oktay Günlük. Cutting planes from extended lp formulations. Mathematical Programming, 161(1-2):159–192, 2017

work page 2017

[10] [10]

Optimizing intensive care unit discharge decisions with patient readmissions

Carri W Chan, Vivek F Farias, Nicholas Bambos, and Gabriel J Escobar. Optimizing intensive care unit discharge decisions with patient readmissions. Operations research, 60(6):1323–1341, 2012

work page 2012

[11] [11]

An anatomy of trading strategies

Jennifer Conrad and Gautam Kaul. An anatomy of trading strategies. The Review of Financial Studies, 11(3):489–519, 1998

work page 1998

[12] [12]

Dobbs, Oktay Günlük, Tomasz J

Sanjeeb Dash, Neil B. Dobbs, Oktay Günlük, Tomasz J. Nowicki, and Grzegorz M. ´Swirszcz. Lattice-free sets, multi-branch split disjunctions, and mixed-integer programming.Mathematical Programming, 145(1):483–508, 2014

work page 2014

[13] [13]

End-to-end differentiable physics for learning and control

Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J Zico Kolter. End-to-end differentiable physics for learning and control. In Advances in Neural Information Processing Systems, pages 7178–7189, 2018

work page 2018

[14] [14]

Effect of the accuracy of price forecasting on proﬁt in a price based unit commitment

Erik Delarue, Pieterjan Van Den Bosch, and William D’haeseleer. Effect of the accuracy of price forecasting on proﬁt in a price based unit commitment. Electric power systems research, 80(10):1306–1313, 2010

work page 2010

[15] [15]

Position-indexed formulations for kidney exchange

John P Dickerson, David F Manlove, Benjamin Plaut, Tuomas Sandholm, and James Trim- ble. Position-indexed formulations for kidney exchange. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 25–42. ACM, 2016

work page 2016

[16] [16]

predict, then optimize

Adam N Elmachtoub and Paul Grigas. Smart" predict, then optimize". arXiv preprint arXiv:1710.08005, 2017

work page arXiv 2017

[17] [17]

An algorithm for the mixed integer problem

Ralph Gomory. An algorithm for the mixed integer problem. Technical report, RAND CORP SANTA MONICA CA, 1960

work page 1960

[18] [18]

Roy, and Xiang Yan

Yi hao Kao, Benjamin V . Roy, and Xiang Yan. Directed regression. In Y . Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems, pages 889–897. Curran Associates, Inc., 2009

work page 2009

[19] [19]

Learning to search in branch and bound algorithms

He He, Hal Daume III, and Jason M Eisner. Learning to search in branch and bound algorithms. In Advances in Neural Information Processing Systems, pages 3293–3301, 2014. 10

work page 2014

[20] [20]

Batch normalization: Accelerating deep network training by reducing internal covariate shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37 , ICML’15, pages 448–456. JMLR.org, 2015

work page 2015

[21] [21]

Minima of functions of several variables with inequalities as side constraints

William Karush. Minima of functions of several variables with inequalities as side constraints. M. Sc. Dissertation. Dept. of Mathematics, Univ. of Chicago, 1939

work page 1939

[22] [22]

A fast and high quality multilevel scheme for partitioning irregular graphs

George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on scientiﬁc Computing, 20(1):359–392, 1998

work page 1998

[23] [23]

Learning combinatorial optimization algorithms over graphs

Elias B Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6348–6358, 2017

work page 2017

[24] [24]

Learning to run heuristics in tree search

Elias B Khalil, Bistra Dilkina, George L Nemhauser, Shabbir Ahmed, and Yufen Shao. Learning to run heuristics in tree search. In IJCAI, pages 659–666, 2017

work page 2017

[25] [25]

Learning to branch in mixed integer programming

Elias B Khalil, Pierre Le Bodic, Le Song, George L Nemhauser, and Bistra Dilkina. Learning to branch in mixed integer programming. In Proceedings of the 30th AAAI Conference on Artiﬁcial Intelligence, 2016

work page 2016

[26] [26]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[27] [27]

Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019

Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019

work page 2019

[28] [28]

What game are we playing? end-to-end learning in normal and extensive form games

Chun Kai Ling, Fei Fang, and J Zico Kolter. What game are we playing? end-to-end learning in normal and extensive form games. IJCAI, 2018

work page 2018

[29] [29]

On learning and branching: a survey

Andrea Lodi and Giulia Zarpellon. On learning and branching: a survey. Top, 25(2):207–236, 2017

work page 2017

[30] [30]

Rectiﬁer nonlinearities improve neural network acoustic models

Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectiﬁer nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, 2013

work page 2013

[31] [31]

Portfolio selection

Harry Markowitz. Portfolio selection. The journal of ﬁnance, 7(1):77–91, 1952

work page 1952

[32] [32]

Optimal residential load control with price prediction in real-time electricity pricing environments

Amir-Hamed Mohsenian-Rad and Alberto Leon-Garcia. Optimal residential load control with price prediction in real-time electricity pricing environments. IEEE Trans. Smart Grid, 1(2):120–133, 2010

work page 2010

[33] [33]

Reinforcement learning for solving the vehicle routing problem

Mohammadreza Nazari, Afshin Oroojlooy, Lawrence Snyder, and Martin Takác. Reinforcement learning for solving the vehicle routing problem. In Advances in Neural Information Processing Systems, pages 9839–9849, 2018

work page 2018

[34] [34]

Integer programming: The global impact

George L Nemhauser. Integer programming: The global impact. ISyE DOS Optimization Seminar presentation at Georgia Tech http://hdl.handle.net/1853/49829, 2013

work page 2013

[35] [35]

Data analysis and optimization for (citi) bike sharing

Eoin O’Mahony and David B Shmoys. Data analysis and optimization for (citi) bike sharing. In Proceedings of the 29th AAAI Conference on Artiﬁcial Intelligence, 2015

work page 2015

[36] [36]

Automatic differentiation in PyTorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017

work page 2017

[37] [37]

Decision-focused learning of adversary behavior in security games

Andrew Perrault, Bryan Wilder, Eric Ewing, Aditya Mate, Bistra Dilkina, and Milind Tambe. Decision-focused learning of adversary behavior in security games. arXiv preprint arXiv:1903.00958, 2019

work page arXiv 1903

[38] [38]

Various end-of-day data

Quandl. Various end-of-day data. https://www.quandl.com/, 2019. 11

work page 2019

[39] [39]

Guiding combinatorial optimization with uct

Ashish Sabharwal, Horst Samulowitz, and Chandra Reddy. Guiding combinatorial optimization with uct. InInternational Conference on Integration of Artiﬁcial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, pages 356–361. Springer, 2012

work page 2012

[40] [40]

Ipknot: fast and accurate prediction of rna secondary structures with pseudoknots using integer programming

Kengo Sato, Yuki Kato, Michiaki Hamada, Tatsuya Akutsu, and Kiyoshi Asai. Ipknot: fast and accurate prediction of rna secondary structures with pseudoknots using integer programming. Bioinformatics, 27(13):i85–i93, 2011

work page 2011

[41] [41]

Collective classiﬁcation in network data

Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi- Rad. Collective classiﬁcation in network data. AI magazine, 29(3):93–93, 2008

work page 2008

[42] [42]

Dropout: a simple way to prevent neural networks from overﬁtting.The Journal of Machine Learning Research, 15(1):1929–1958, 2014

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overﬁtting.The Journal of Machine Learning Research, 15(1):1929–1958, 2014

work page 1929

[43] [43]

Nurse scheduling using integer linear programming and constraint programming

Lorraine Trilling, Alain Guinet, and Dominiue Le Magny. Nurse scheduling using integer linear programming and constraint programming. IFAC Proceedings Volumes, 39(3):671–676, 2006

work page 2006

[44] [44]

Pointer networks

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700, 2015

work page 2015

[45] [45]

Satnet: Bridging deep learning and logical reasoning using a differentiable satisﬁability solver

Po-Wei Wang, Priya Donti, Bryan Wilder, and Zico Kolter. Satnet: Bridging deep learning and logical reasoning using a differentiable satisﬁability solver. In International Conference on Machine Learning, pages 6545–6554, 2019

work page 2019

[46] [46]

Scheduling nursing personnel according to nursing preference: A mathe- matical programming approach

D Michael Warner. Scheduling nursing personnel according to nursing preference: A mathe- matical programming approach. Operations Research, 24(5):842–856, 1976

work page 1976

[47] [47]

Melding the data-decisions pipeline: Decision- focused learning for combinatorial optimization

Bryan Wilder, Bistra Dilkina, and Milind Tambe. Melding the data-decisions pipeline: Decision- focused learning for combinatorial optimization. In Proceedings of the 33rd AAAI Conference on Artiﬁcial Intelligence, 2019

work page 2019

[48] [48]

Integer and combinatorial optimization

Laurence A Wolsey and George L Nemhauser. Integer and combinatorial optimization. John Wiley & Sons, 2014

work page 2014

[49] [49]

Deep reinforcement learning for green security game with online information

Lantao Yu, Yi Wu, Rohit Singh, Lucas Joppa, and Fei Fang. Deep reinforcement learning for green security game with online information. In Proceedings of the 33rd AAAI Conference on Artiﬁcial Intelligence, 2019. A Portfolio Optimization Speciﬁcs A.1 Data speciﬁcation We simulate our approach on this problem setting we use historical price and volume data d...

work page 2019