RL-SPH: Learning to Achieve Feasible Solutions for Integer Linear Programs
Pith reviewed 2026-05-23 16:44 UTC · model grok-4.3
The pith
A reinforcement learning policy independently generates feasible solutions for integer linear programs, including non-binary variables.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RL-SPH is a reinforcement learning-based start primal heuristic that independently generates feasible solutions for integer linear programs, including those with non-binary integer variables. It is trained to build solutions step by step and achieves a 100 percent feasibility rate along with an average 28.6 times lower primal gap and 2.6 times lower primal integral than prior start primal heuristics.
What carries the argument
The reinforcement learning policy that selects variables and assigns values to construct feasible ILP solutions without external solvers.
If this is right
- Branch-and-bound ILP solvers can initialize with higher-quality feasible points produced directly by the learned policy.
- Learning-based heuristics become usable for ILP classes that include integer variables beyond 0-1.
- The 100 percent feasibility removes the need for fallback mechanisms that existing end-to-end methods require.
- Lower primal gap and integral values imply faster overall convergence when the heuristic output seeds a full solver.
Where Pith is reading between the lines
- The same policy structure might be retrained on domain-specific ILP families to create specialized warm-starters without changing the underlying solver.
- If the policy scales to larger instances, it could reduce reliance on multiple classical heuristics run in parallel.
- Combining the policy with graph-based ILP encoders could improve generalization across problem sizes and structures.
Load-bearing premise
A policy trained on selected ILP instances will independently produce feasible solutions and generalize to unseen problems with non-binary variables while preserving the reported performance margins.
What would settle it
Test RL-SPH on a fresh collection of ILP instances containing non-binary integer variables and measure whether the feasibility rate falls below 100 percent or the reported primal gap and integral advantages disappear.
Figures
read the original abstract
Primal heuristics play a crucial role in quickly finding feasible solutions for NP-hard integer linear programming (ILP). Although $\textit{end-to-end learning}$-based primal heuristics (E2EPH) have recently been proposed, they are typically unable to independently generate feasible solutions. To address this challenge, we propose RL-SPH, a novel reinforcement learning-based start primal heuristic capable of independently generating feasible solutions, even for ILP involving non-binary integers. Empirically, RL-SPH rapidly obtains high-quality feasible solutions with a 100% feasibility rate, achieving on average a 28.6$\times$ lower primal gap and a 2.6$\times$ lower primal integral compared to existing start primal heuristics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes RL-SPH, a reinforcement learning-based start primal heuristic for integer linear programs (ILPs). Unlike prior end-to-end learning primal heuristics, RL-SPH is designed to independently generate feasible solutions, including on ILPs with non-binary integer variables. The central empirical claim is that RL-SPH achieves a 100% feasibility rate while delivering on average a 28.6× lower primal gap and 2.6× lower primal integral than existing start primal heuristics.
Significance. If the reported performance margins and generalization properties are substantiated with reproducible experimental protocols, the work would address a recognized limitation of learning-based primal heuristics (inability to guarantee feasibility without post-processing) and could influence the design of hybrid solvers that combine RL policies with traditional ILP methods. The absence of free parameters or invented entities in the core formulation is a positive feature.
major comments (2)
- [§4] §4 (Experiments): The abstract states 100% feasibility, 28.6× primal-gap reduction, and 2.6× primal-integral reduction, yet the manuscript supplies no description of the test-set construction, the distribution of variable types (binary vs. non-binary integers), the number of out-of-distribution instances, or any statistical tests/error bars. Without these details the headline generalization claim cannot be evaluated.
- [§3, §4] §3 (Method) and §4: The claim that the trained RL policy produces feasible solutions independently on unseen problems with non-binary integers is load-bearing for the contribution, but no ablation or result table isolates performance on non-binary instances or reports feasibility rates broken down by variable type.
minor comments (2)
- [§2] Notation for the primal gap and primal integral should be defined explicitly in §2 before being used in the abstract and results.
- [§4] Figure captions and table headers should state the exact benchmark library and instance counts rather than generic phrases such as 'selected ILP instances'.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major comments below and will make revisions to enhance the clarity and completeness of the experimental section.
read point-by-point responses
-
Referee: [§4] §4 (Experiments): The abstract states 100% feasibility, 28.6× primal-gap reduction, and 2.6× primal-integral reduction, yet the manuscript supplies no description of the test-set construction, the distribution of variable types (binary vs. non-binary integers), the number of out-of-distribution instances, or any statistical tests/error bars. Without these details the headline generalization claim cannot be evaluated.
Authors: We agree that providing more details on the experimental setup is important for evaluating the generalization claims. In the revised manuscript, we will expand §4 to include a full description of the test-set construction, including how instances were generated or selected, the distribution of binary versus non-binary integer variables, the number of out-of-distribution instances, and statistical measures such as error bars and any relevant statistical tests. This will allow readers to better assess the reported performance margins. revision: yes
-
Referee: [§3, §4] §3 (Method) and §4: The claim that the trained RL policy produces feasible solutions independently on unseen problems with non-binary integers is load-bearing for the contribution, but no ablation or result table isolates performance on non-binary instances or reports feasibility rates broken down by variable type.
Authors: The core contribution of RL-SPH is its ability to handle non-binary integers independently, as stated in the abstract and method. While the current manuscript reports overall 100% feasibility, we acknowledge the value of an explicit breakdown. We will add an ablation study or table in the revised §4 that reports feasibility rates, primal gaps, and primal integrals separately for instances with only binary variables and those with non-binary integer variables to directly support the claim. revision: yes
Circularity Check
No circularity: empirical performance metrics are independent measurements
full rationale
The paper presents RL-SPH as an RL-trained policy for generating feasible ILP solutions and reports 100% feasibility, 28.6× lower primal gap, and 2.6× lower primal integral as direct empirical comparisons to baselines. These quantities are measured outcomes on (presumably held-out) instances rather than quantities defined by construction from the training data, fitted parameters, or self-citations. No equations, uniqueness theorems, or ansatzes are shown that reduce the reported results to the inputs. The central claims rest on experimental validation and are therefore self-contained.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Albert, R. and Barabási, A.-L. Statistical mechanics of complex networks. Reviews of modern physics, 74(1):47, 2002
work page 2002
-
[2]
M., Louveaux, Q., and Wehenkel, L
Alvarez, A. M., Louveaux, Q., and Wehenkel, L. A machine learning-based approximation of strong branching. INFORMS Journal on Computing, 29(1):185–195, 2017
work page 2017
- [3]
-
[4]
Neural Combinatorial Optimization with Reinforcement Learning
Bello, I., Pham, H., Le, Q. V ., Norouzi, M., and Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[5]
Machine learning for combinatorial optimization: a methodological tour d’horizon
Bengio, Y ., Lodi, A., and Prouvost, A. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290(2):405–421, 2021
work page 2021
-
[6]
Primal heuristics for mixed integer programs
Berthold, T. Primal heuristics for mixed integer programs. PhD thesis, Zuse Institute Berlin (ZIB), 2006
work page 2006
-
[7]
Berthold, T. and Hendel, G. Learning to scale mixed-integer programs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 3661–3668, 2021
work page 2021
-
[8]
Bertsimas, D. and Tsitsiklis, J. N. Introduction to linear optimization , volume 6. Athena Scientific Belmont, MA, 1997
work page 1997
-
[9]
The scip optimization suite 8.0
Bestuzheva, K., Besançon, M., Chen, W.-K., Chmiela, A., Donkiewicz, T., Van Doornmalen, J., Eifler, L., Gaul, O., Gamrath, G., Gleixner, A., et al. The scip optimization suite 8.0. arXiv preprint arXiv:2112.08872, 2021
-
[10]
Cantürk, F., Varol, T., Aydo˘gan, R., and Özener, O. Ö. Scalable primal heuristics using graph neural networks for combinatorial optimization. Journal of Artificial Intelligence Research, 80: 327–376, 2024
work page 2024
-
[11]
Floudas, C. A. Nonlinear and mixed-integer optimization: fundamentals and applications . Oxford University Press, 1995
work page 1995
-
[12]
Exact combinatorial optimization with graph convolutional neural networks
Gasse, M., Chételat, D., Ferroni, N., Charlin, L., and Lodi, A. Exact combinatorial optimization with graph convolutional neural networks. Advances in neural information processing systems, 32, 2019
work page 2019
-
[13]
Gasse, M., Bowly, S., Cappart, Q., Charfreitag, J., Charlin, L., Chételat, D., Chmiela, A., Dumouchelle, J., Gleixner, A., Kazachkov, A. M., et al. The machine learning for combinatorial optimization competition (ml4co): Results and insights. In NeurIPS 2021 competitions and demonstrations track, pp. 220–231. PMLR, 2022
work page 2021
-
[14]
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pp. 1263–1272. JMLR.org, 2017
work page 2017
-
[15]
On embeddings for numerical features in tabular deep learning
Gorishniy, Y ., Rubachev, I., and Babenko, A. On embeddings for numerical features in tabular deep learning. Advances in Neural Information Processing Systems, 35:24991–25004, 2022
work page 2022
-
[16]
Guieu, O. and Chinneck, J. W. Analyzing infeasible mixed-integer and integer linear programs. INFORMS Journal on Computing, 11(1):63–77, 1999
work page 1999
-
[17]
Hybrid models for learning to branch
Gupta, P., Gasse, M., Khalil, E., Mudigonda, P., Lodi, A., and Bengio, Y . Hybrid models for learning to branch. Advances in neural information processing systems, 33:18087–18097, 2020
work page 2020
-
[18]
Gurobi optimizer reference manual
Gurobi Optimization, L. Gurobi optimizer reference manual. https://docs.gurobi.com/ projects/optimizer, 2025
work page 2025
-
[19]
A gnn- guided predict-and-search framework for mixed-integer linear programming
Han, Q., Yang, L., Chen, Q., Zhou, X., Zhang, D., Wang, A., Sun, R., and Luo, X. A gnn- guided predict-and-search framework for mixed-integer linear programming. In The Eleventh International Conference on Learning Representations, 2023
work page 2023
-
[20]
Learning to search in branch and bound algorithms
He, H., Daumé, H., and Eisner, J. Learning to search in branch and bound algorithms. Advances in neural information processing systems, 27, 2014
work page 2014
-
[21]
Hillier, F. S. and Lieberman, G. J. Introduction to operations research. McGraw-Hill, 2015. 10
work page 2015
-
[22]
M., Tian, Y ., Dilkina, B., and Steiner, B
Huang, T., Ferber, A. M., Tian, Y ., Dilkina, B., and Steiner, B. Searching large neighborhoods for integer linear programs with contrastive learning. In International conference on machine learning, pp. 13869–13890. PMLR, 2023
work page 2023
-
[23]
M., Zharmagambetov, A., Tian, Y ., and Dilkina, B
Huang, T., Ferber, A. M., Zharmagambetov, A., Tian, Y ., and Dilkina, B. Contrastive predict- and-search for mixed integer linear programs. In Forty-first International Conference on Machine Learning, 2024
work page 2024
-
[24]
Hubbs, C. D., Li, C., Sahinidis, N. V ., Grossmann, I. E., and Wassick, J. M. A deep reinforcement learning approach for chemical production scheduling. Computers & Chemical Engineering, 141:106982, 2020
work page 2020
-
[25]
Hutter, F., Hoos, H. H., and Leyton-Brown, K. Sequential model-based optimization for general algorithm configuration. In Proceedings of the 5th international conference on Learning and Intelligent Optimization, pp. 507–523, 2011
work page 2011
-
[26]
A new polynomial-time algorithm for linear programming
Karmarkar, N. A new polynomial-time algorithm for linear programming. In Proceedings of the sixteenth annual ACM symposium on Theory of computing, pp. 302–311, 1984
work page 1984
-
[27]
Learning to branch in mixed integer programming
Khalil, E., Le Bodic, P., Song, L., Nemhauser, G., and Dilkina, B. Learning to branch in mixed integer programming. In Proceedings of the AAAI conference on artificial intelligence, volume 30, 2016
work page 2016
-
[28]
Kohavi, R. et al. Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. In Kdd, volume 96, pp. 202–207, 1996
work page 1996
-
[29]
Pytorch implementations of reinforcement learning algorithms.https://github
Kostrikov, I. Pytorch implementations of reinforcement learning algorithms.https://github. com/ikostrikov/pytorch-a2c-ppo-acktr-gail , 2018
work page 2018
-
[30]
Kruber, M., Lübbecke, M. E., and Parmentier, A. Learning when to use a decomposition. In International conference on AI and OR techniques in constraint programming for combinatorial optimization problems, pp. 202–210. Springer, 2017
work page 2017
-
[31]
Kweon, O., Kim, B.-I., Lee, G., Im, H., Chung, C. Y ., and Lim, O. K. Parcel delivery network optimization problem considering multiple hubs and consolidation of small-sized parcels. Computers & Industrial Engineering, 191:110113, 2024
work page 2024
-
[32]
Labassi, A. G., Chételat, D., and Lodi, A. Learning to compare nodes in branch and bound with graph neural networks. Advances in neural information processing systems, 35:32000–32010, 2022
work page 2022
-
[33]
Towards a universal test suite for combinatorial auction algorithms
Leyton-Brown, K., Pearson, M., and Shoham, Y . Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the 2nd ACM conference on Electronic commerce, pp. 66–76, 2000
work page 2000
-
[34]
Deeper insights into graph convolutional networks for semi- supervised learning
Li, Q., Han, Z., and Wu, X.-M. Deeper insights into graph convolutional networks for semi- supervised learning. In Proceedings of the AAAI conference on artificial intelligence, volume 32(1), 2018
work page 2018
-
[35]
Lin, T., Wang, Y ., Liu, X., and Qiu, X. A survey of transformers. AI open, 3:111–132, 2022
work page 2022
-
[36]
Liu, H., Wang, J., Geng, Z., Li, X., Zong, Y ., Zhu, F., Hao, J., and Wu, F. Apollo-milp: An alter- nating prediction-correction neural solving framework for mixed-integer linear programming. arXiv preprint arXiv:2503.01129, 2025
-
[37]
Mixed-integer linear optimization via learning-based two-layer large neighborhood search
Liu, W., Wang, A., Yang, W., and Shi, Q. Mixed-integer linear optimization via learning-based two-layer large neighborhood search. arXiv preprint arXiv:2412.08206, 2024
-
[38]
Reinforcement learning for combina- torial optimization: A survey
Mazyavkina, N., Sviridov, S., Ivanov, S., and Burnaev, E. Reinforcement learning for combina- torial optimization: A survey. Computers & Operations Research, 134:105400, 2021
work page 2021
-
[39]
Transformer for graphs: An overview from architecture perspective
Min, E., Chen, R., Bian, Y ., Xu, T., Zhao, K., Huang, W., Zhao, P., Huang, J., Ananiadou, S., and Rong, Y . Transformer for graphs: An overview from architecture perspective. arXiv preprint arXiv:2202.08455, 2022
-
[40]
P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K
Mnih, V ., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Balcan, M. F. and Weinberger, K. Q. (eds.),Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pp. 1928–1937, New York, ...
work page 1928
-
[41]
Solving mixed integer programs using neural networks
Nair, V ., Bartunov, S., Gimeno, F., V on Glehn, I., Lichocki, P., Lobov, I., O’Donoghue, B., Sonnerat, N., Tjandraatmadja, C., Wang, P., et al. Solving mixed integer programs using neural networks. arXiv preprint arXiv:2012.13349, 2020
-
[42]
Pace, R. K. and Barry, R. Sparse spatial autoregressions. Statistics & Probability Letters, 33(3): 291–297, 1997
work page 1997
-
[43]
Papageorgiou, D. J., Nemhauser, G. L., Sokol, J., Cheon, M.-S., and Keha, A. B. Mirplib–a library of maritime inventory routing problem instances: Survey, core model, and benchmark results. European Journal of Operational Research, 235(2):350–366, 2014
work page 2014
-
[44]
Paulus, M. and Krause, A. Learning to dive in branch and bound. Advances in Neural Information Processing Systems, 36:34260–34277, 2023
work page 2023
-
[45]
B., Zarpellon, G., Krause, A., Charlin, L., and Maddison, C
Paulus, M. B., Zarpellon, G., Krause, A., Charlin, L., and Maddison, C. Learning to cut by looking ahead: Cutting plane selection via imitation learning. In International conference on machine learning, pp. 17584–17600. PMLR, 2022
work page 2022
-
[46]
Smart feasibility pump: Reinforcement learning for (mixed) integer programming
Qi, M., Wang, M., and Shen, Z.-J. Smart feasibility pump: Reinforcement learning for (mixed) integer programming. Reinforcement Learning for Real Life Workshop, International Conference on Machine Learning (ICML), 2021
work page 2021
-
[47]
Ren, H. and Gao, W. A milp model for integrated plan and evaluation of distributed energy systems. Applied energy, 87(3):1001–1014, 2010
work page 2010
-
[48]
Self-supervised graph transformer on large-scale molecular data
Rong, Y ., Bian, Y ., Xu, T., Xie, W., Wei, Y ., Huang, W., and Huang, J. Self-supervised graph transformer on large-scale molecular data. Advances in neural information processing systems, 33:12559–12571, 2020
work page 2020
-
[49]
Using constraint programming and local search methods to solve vehicle routing problems
Shaw, P. Using constraint programming and local search methods to solve vehicle routing problems. In International conference on principles and practice of constraint programming, pp. 417–431. Springer, 1998
work page 1998
-
[50]
Learning primal heuristics for mixed integer programs
Shen, Y ., Sun, Y ., Eberhard, A., and Li, X. Learning primal heuristics for mixed integer programs. In 2021 international joint conference on neural networks (ijcnn), pp. 1–8. IEEE, 2021
work page 2021
-
[51]
Shoja, S. and Axehill, D. Exact complexity certification of start heuristics in branch-and-bound methods for mixed-integer linear programming. In 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 2292–2299. IEEE, 2023
work page 2023
-
[52]
Son, J., Kim, M., Kim, H., and Park, J. Meta-sage: Scale meta-learning scheduled adaptation with guided exploration for mitigating scale shift on combinatorial optimization. InInternational Conference on Machine Learning, pp. 32194–32210. PMLR, 2023
work page 2023
-
[53]
A general large neighborhood search framework for solving integer linear programs
Song, J., Yue, Y ., Dilkina, B., et al. A general large neighborhood search framework for solving integer linear programs. Advances in Neural Information Processing Systems, 33:20012–20023, 2020
work page 2020
-
[54]
Learning a large neighborhood search algorithm for mixed integer programs
Sonnerat, N., Wang, P., Ktena, I., Bartunov, S., and Nair, V . Learning a large neighborhood search algorithm for mixed integer programs. arXiv preprint arXiv:2107.10201, 2021
-
[55]
Reinforcement learning for integer programming: Learning to cut
Tang, Y ., Agrawal, S., and Faenza, Y . Reinforcement learning for integer programming: Learning to cut. In International conference on machine learning , pp. 9367–9376. PMLR, 2020
work page 2020
-
[56]
Tomlin, J. A. On scaling linear programming problems. Springer, 1975
work page 1975
- [57]
-
[58]
Vanderbei, R. J. Linear programming: foundations and extensions. Journal of the Operational Research Society, 49(1):94–94, 1998
work page 1998
-
[59]
Vaswani, A. Attention is all you need. Advances in Neural Information Processing Systems, 2017
work page 2017
-
[60]
Wu, X., Chen, Z., Wang, W. W., and Jadbabaie, A. A non-asymptotic analysis of oversmoothing in graph neural networks. In The Eleventh International Conference on Learning Representa- tions, 2023
work page 2023
-
[61]
Learning large neighborhood search policy for integer programming
Wu, Y ., Song, W., Cao, Z., and Zhang, J. Learning large neighborhood search policy for integer programming. Advances in Neural Information Processing Systems, 34:30075–30087, 2021. 12
work page 2021
-
[62]
Wu, Z., Jain, P., Wright, M., Mirhoseini, A., Gonzalez, J. E., and Stoica, I. Representing long- range context for graph neural networks with global attention. Advances in Neural Information Processing Systems, 34:13266–13279, 2021
work page 2021
-
[63]
Ying, C., Cai, T., Luo, S., Zheng, S., Ke, G., He, D., Shen, Y ., and Liu, T.-Y . Do transformers really perform badly for graph representation? Advances in neural information processing systems, 34:28877–28888, 2021
work page 2021
-
[64]
Confidence threshold neural diving
Yoon, T. Confidence threshold neural diving. NeurIPS 2021 ML4CO competition track, 2022
work page 2021
-
[65]
Graph-bert: Only attention is needed for learning graph representations
Zhang, J., Zhang, H., Xia, C., and Sun, L. Graph-bert: Only attention is needed for learning graph representations. arXiv preprint arXiv:2001.05140, 2020
-
[66]
Mapdp: Cooperative multi-agent reinforcement learning to solve pickup and delivery problems
Zong, Z., Zheng, M., Li, Y ., and Jin, D. Mapdp: Cooperative multi-agent reinforcement learning to solve pickup and delivery problems. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pp. 9980–9988, 2022
work page 2022
-
[67]
Zuo, Y ., Tharmarasa, R., Jassemi-Zargani, R., Kashyap, N., Thiyagalingam, J., and Kirubarajan, T. T. Milp formulation for aircraft path planning in persistent surveillance. IEEE Transactions on Aerospace and Electronic Systems, 56(5):3796–3811, 2020. 13 A Proof of Proposition 1 In this section, we prove Proposition 1: Suppose xt /∈ F , Rt,const > 0, and ...
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.