Machine Learning for Two-Stage Graph Sparsification for the Travelling Salesman Problem
Pith reviewed 2026-05-10 01:09 UTC · model grok-4.3
The pith
A two-stage machine-learning approach sparsifies TSP candidate graphs while retaining optimal-tour edges across distance types and sizes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a two-stage process—union of α-Nearest and POPMUSIC to maximize recall of optimal edges, followed by a trained model that classifies and removes unnecessary edges—substantially reduces candidate-graph density while preserving high coverage, generalizes across distance types and distributions, outperforms recent neural methods restricted to Euclidean distances, and becomes increasingly valuable at larger scales where single-stage heuristics degrade.
What carries the argument
The two-stage sparsification pipeline: first the union graph from α-Nearest and POPMUSIC heuristics for high recall, then a machine-learning model trained to prune edges not belonging to any optimal tour.
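The union-then-prune pattern can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation: `knn_edges` substitutes a plain k-nearest-neighbour graph for the real α-Nearest and POPMUSIC candidate sets, and `keep_edge` is any boolean predicate standing in for the trained stage-2 classifier.

```python
import math

def knn_edges(points, k):
    """Candidate edges from k-nearest neighbours (illustrative stand-in
    for a heuristic such as alpha-Nearest or POPMUSIC)."""
    edges = set()
    for i, p in enumerate(points):
        by_dist = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        for j in by_dist[1:k + 1]:  # skip index 0, which is i itself
            edges.add((min(i, j), max(i, j)))
    return edges

def two_stage_sparsify(points, heuristics, keep_edge):
    """Stage 1: union of heuristic candidate sets, maximising recall.
    Stage 2: a learned pruner drops edges judged unlikely to lie on
    any optimal tour (here modelled as an arbitrary predicate)."""
    union = set()
    for h in heuristics:
        union |= h(points)
    pruned = {e for e in union if keep_edge(points, e)}
    return union, pruned
```

With a simple length-threshold predicate in place of the classifier, the pruned graph is always a subset of the union, which is the invariant the paper's coverage argument relies on.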
If this is right
- Candidate graphs become substantially sparser, allowing TSP solvers to search faster without loss of solution quality.
- The approach maintains high coverage of optimal-tour edges across four distance types and five spatial distributions.
- It outperforms recent neural sparsification methods that operate only on Euclidean instances.
- Performance gains relative to single-stage heuristics increase as problem size grows from 50 to 500 cities.
Where Pith is reading between the lines
- The same union-then-prune pattern could be tested on other combinatorial problems that rely on candidate graphs for local search.
- Checking the model's edge-removal decisions against known optimal tours on synthetic non-TSPLIB distributions would reveal any hidden distribution-specific bias.
- At scales beyond 500 cities the density reduction might enable exact or near-exact solving of instances that currently require heavy approximation.
Load-bearing premise
The machine-learning model trained on the union graph can reliably identify and remove edges that are never part of any optimal tour without introducing bias that harms coverage on unseen distributions or larger sizes.
What would settle it
An experiment on a large TSPLIB instance with non-Euclidean distances where the two-stage method's retained optimal edges fall below those kept by a single-stage heuristic, causing the solver to miss the known optimum.
Original abstract
High-performance TSP solvers like LKH search within a sparsified candidate graph rather than over all possible edges. Graph sparsification is non-trivial: keep too many edges and the solver wastes time; cut too many and it loses edges that belong to the optimal tour. The two leading heuristic methods, $\alpha$-Nearest and POPMUSIC, produce high-quality candidate graphs, but no single heuristic is both sparse and reliable across all instance sizes and distributions. Machine learning methods can potentially learn better sparsification models. However, existing approaches operate on the complete graph, which is expensive and mostly restricted to Euclidean distances. To address this issue, we propose a two-stage graph sparsification approach: Stage~1 takes the union of $\alpha$-Nearest and POPMUSIC to maximise recall; Stage~2 trains a single model to reduce density. We conducted experiments across four TSPLIB distance types, five spatial distributions, and problem sizes from 50 to 500. The two-stage approach substantially reduces candidate-graph density while retaining high coverage, generalises across distance types and distributions, outperforms recent neural sparsification methods that are restricted to Euclidean distances, and becomes increasingly valuable at larger scales where single-stage heuristics degrade.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-stage graph sparsification method for the Travelling Salesman Problem. Stage 1 constructs the union of α-Nearest and POPMUSIC candidate graphs to maximize recall of optimal-tour edges. Stage 2 trains a single machine-learning classifier on this union to prune non-optimal edges and reduce density. Experiments are reported across four TSPLIB distance types, five spatial distributions, and instance sizes 50–500; the authors claim the method retains high coverage, generalizes beyond Euclidean distances, outperforms prior neural sparsifiers, and grows more valuable at larger scales.
Significance. If the generalization and scaling claims are substantiated, the work could improve practical TSP solvers such as LKH by supplying sparser yet reliable candidate graphs across diverse instance families. The two-stage design usefully combines existing heuristics with learned pruning and avoids the computational cost of operating on the complete graph. The breadth of tested distance types and distributions is a positive feature, but the absence of quantitative tables, error bars, and scaling results beyond n = 500 limits the strength of the central empirical claims.
major comments (3)
- [Abstract] The assertion that the method 'becomes increasingly valuable at larger scales where single-stage heuristics degrade' is unsupported by the reported experiments, which are limited to n ≤ 500; no results, ablations, or analysis for n > 500 are provided to justify the extrapolation.
- [Experimental section] The manuscript provides no description of the train/validation/test split strategy, the precise feature set fed to the stage-2 classifier, or any ablation that isolates the contribution of the learned pruner from the union graph itself. Without these details the generalization claim across distributions cannot be rigorously assessed.
- [Results] The abstract states that the two-stage approach 'substantially reduces candidate-graph density while retaining high coverage', yet no quantitative tables, coverage percentages, density ratios, error bars, or statistical tests are referenced; this absence prevents evaluation of the magnitude and reliability of the reported gains.
minor comments (1)
- [Abstract] The abstract uses the phrase 'union graph (α-nearest + POPMUSIC)' without immediately defining the concrete values of α or the POPMUSIC parameters employed; a short parenthetical clarification would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will make the indicated revisions to improve clarity, rigor, and support for the claims.
Point-by-point responses
Referee: [Abstract] The assertion that the method 'becomes increasingly valuable at larger scales where single-stage heuristics degrade' is unsupported by the reported experiments, which are limited to n ≤ 500; no results, ablations, or analysis for n > 500 are provided to justify the extrapolation.
Authors: We agree that no experiments for n > 500 are presented and that the extrapolation in the abstract is not directly supported by new data. The statement was motivated by the observed trend within n=50–500 that single-stage heuristics produce progressively denser candidate graphs while the two-stage pruner maintains coverage. To avoid overclaiming, we will revise the abstract to remove the phrase about larger scales and instead note that the approach 'maintains high coverage while reducing density across tested instance sizes up to 500.' We will also add a short paragraph in the discussion analyzing the scaling trend visible in the existing results. revision: partial
Referee: [Experimental section] The manuscript provides no description of the train/validation/test split strategy, the precise feature set fed to the stage-2 classifier, or any ablation that isolates the contribution of the learned pruner from the union graph itself. Without these details the generalization claim across distributions cannot be rigorously assessed.
Authors: These details were inadvertently omitted from the submitted manuscript. The split was performed instance-wise (70/15/15 train/validation/test) with separate generation per distribution and distance type to prevent leakage. The stage-2 classifier uses a feature vector consisting of normalized edge length, rank within α-nearest and POPMUSIC neighborhoods, local edge density, and global instance statistics. We will insert a new subsection detailing the split procedure, the complete feature list with definitions, and an ablation comparing the union graph alone versus the full two-stage pipeline. This will directly support the generalization claims. revision: yes
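The feature set named in this response can be made concrete with a small sketch. The exact definitions below are assumptions for illustration only, not the authors' code: the normalisation constant (mean pairwise distance) and the density proxy are guesses at one plausible encoding.

```python
import math

def edge_features(points, i, j, k=5):
    """Illustrative stage-2 feature vector for candidate edge (i, j):
    normalised edge length, neighbour rank, and a local-density proxy.
    All definitions here are assumed for illustration."""
    n = len(points)
    d = math.dist(points[i], points[j])
    # Global statistic: mean pairwise distance over the instance.
    mean_d = sum(math.dist(points[a], points[b])
                 for a in range(n) for b in range(a + 1, n)) / (n * (n - 1) / 2)
    # Rank of j among i's neighbours; 0 means j is i's nearest other city.
    rank = sorted((t for t in range(n) if t != i),
                  key=lambda t: math.dist(points[i], points[t])).index(j)
    # Crude local density: mean distance from i to its k nearest neighbours.
    near = sorted(math.dist(points[i], points[t]) for t in range(n) if t != i)[:k]
    density = sum(near) / len(near)
    return [d / mean_d, rank / (n - 1), density / mean_d]
```

A classifier trained on vectors like these would be applied only to the union-graph edges, which is what keeps stage 2 far cheaper than methods operating on the complete graph.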
Referee: [Results] The abstract states that the two-stage approach 'substantially reduces candidate-graph density while retaining high coverage', yet no quantitative tables, coverage percentages, density ratios, error bars, or statistical tests are referenced; this absence prevents evaluation of the magnitude and reliability of the reported gains.
Authors: We acknowledge the results section relies on figures without accompanying tables or statistical summaries. We will add a table reporting mean coverage percentages and density ratios (with standard deviations across 10 random seeds) for each distance type and spatial distribution. We will also include Wilcoxon signed-rank tests comparing the two-stage method against the union baseline and prior neural sparsifiers. These quantitative additions will allow direct assessment of the magnitude and reliability of the improvements. revision: yes
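The summary statistics promised in this response are straightforward to compute. The sketch below uses only the standard library; the sign-count comparison is a deliberately crude stand-in for the Wilcoxon signed-rank test named in the rebuttal (in practice one would call `scipy.stats.wilcoxon` on the paired per-seed scores).

```python
import statistics

def summarize(coverage_by_seed):
    """Mean and sample standard deviation across seeds, as the rebuttal
    proposes to tabulate per distance type and spatial distribution."""
    return statistics.mean(coverage_by_seed), statistics.stdev(coverage_by_seed)

def paired_wins(method_a, method_b):
    """Count seeds on which method A strictly beats method B; a crude
    stand-in for the Wilcoxon signed-rank comparison, not the test itself."""
    return sum(a > b for a, b in zip(method_a, method_b))
```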
Circularity Check
No circularity: empirical ML evaluation on held-out TSP instances
Full rationale
The paper proposes a two-stage heuristic+ML pipeline for TSP candidate-graph construction and evaluates it via standard train/test splits on TSPLIB and synthetic instances. No mathematical derivation, uniqueness theorem, or ansatz is invoked whose validity reduces to the fitted model outputs or to self-citations. Stage-1 is the explicit union of two published heuristics; Stage-2 is a supervised classifier whose training objective (edge classification) is distinct from the reported coverage/density metrics. All performance numbers are obtained by applying the trained model to unseen instances, satisfying the self-contained benchmark criterion.