Empowering Targeted Neighborhood Search via Hyper Tour for Large-Scale TSP

Chongyang Tao; Shuai Ma; Tongkai Lu

arxiv: 2510.20169 · v3 · submitted 2025-10-23 · 💻 cs.LG

Empowering Targeted Neighborhood Search via Hyper Tour for Large-Scale TSP

Tongkai Lu , Shuai Ma , Chongyang Tao This is my paper

Pith reviewed 2026-05-18 04:14 UTC · model grok-4.3

classification 💻 cs.LG

keywords large-scale TSPneural combinatorial optimizationhyper tourneighborhood searchclusteringtraveling salesman problemsparse heatmap

0 comments

The pith

A hyper tour on clustered supernodes guides targeted neighborhood search to solve large-scale TSP closer to optimality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Hyper Tour Guided Neighborhood Search to handle large Traveling Salesman Problems that overwhelm existing neural methods. It first clusters cities via a sparse heatmap graph, treats clusters as supernodes, and builds a hyper tour connecting those supernodes. The hyper tour then directs initial solution construction and limits later local search to only the edges it deems relevant. This shrinks the enormous search space while still capturing essential global connections. A reader would care because memory limits and weak global guidance have kept neural solvers from scaling beyond modest sizes, and this hierarchical guide offers a concrete way to close that gap.

Core claim

The central claim is that abstracting a TSP instance into clusters via a sparse heatmap graph, forming supernodes, and generating a hyper tour among them lets the method both initialize good starting tours and restrict neighborhood search to hyper-tour-relevant edges. By focusing optimization on this reduced set of connections the approach navigates large search spaces efficiently and produces solutions with smaller gaps to optimality than prior neural combinatorial methods, especially on bigger instances.

What carries the argument

The hyper tour, a cycle on supernodes derived from clustering the sparse heatmap graph, that supplies global guidance for initialization and restricts subsequent neighborhood search to its relevant edges.

If this is right

Memory usage drops because full global heatmaps or access matrices are no longer required.
The search space shrinks by focusing only on edges tied to the hyper tour.
Global guidance improves for navigating the vast space of large instances.
Solution quality rises and the gap to optimality shrinks on both synthetic and real-world large-scale cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hierarchical abstractions such as hyper tours may help other flat neural solvers overcome scaling barriers in combinatorial tasks.
The clustering-first strategy aligns with classical operations-research decompositions and could transfer to vehicle-routing variants.
If the hyper tour reliably encodes long-range structure, hybrid systems that feed its edges into traditional local-search heuristics might yield further gains.

Load-bearing premise

The sparse heatmap and the hyper tour built from it must retain enough information about the true optimal connections so that limiting the search to hyper-tour edges does not discard better routes.

What would settle it

On a large instance whose optimal tour contains many inter-cluster edges absent from the generated hyper tour, the method would produce no further improvement or a larger optimality gap than an unrestricted baseline search.

Figures

Figures reproduced from arXiv: 2510.20169 by Chongyang Tao, Shuai Ma, Tongkai Lu.

**Figure 1.** Figure 1: Overview of our HyperNS framework. Blue edges indicate normal connections, brown edges denote worth-deletion edges that link supernodes. The red edge marks the currently selected worth-deletion edge and its neighboring edges are shown in green. The area enclosed by the red dashed box in the upper-right corner illustrates the targeted neighborhood, which comprises the selected worth-deletion edge and its ne… view at source ↗

**Figure 2.** Figure 2: Illustration of targeted neighborhood search procedure. (a) A TSP tour. (b) Select edge (A, C) along with A and C’s 3-nearest neighbors: {T, G, F, B, H, N} and identify all edges involving them as Edel. (c) Delete all edges in Edel. (d) Retain only vertices {U, M, V, D} while adding two new edges, (U, M) and (V, D) to form a reduced subproblem with no more than 12 vertices. (e) Solve the subproblem via LK … view at source ↗

**Figure 3.** Figure 3: Parameter analysis. In each subplot, the left and right y-axes show tour length and total running time, respectively [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

read the original abstract

Traveling Salesman Problem (TSP) is a classic NP-hard problem that has garnered significant attention from both academia and industry. While neural-based methods have shown promise for solving TSPs, they still face challenges in scaling to larger instances, particularly in memory constraints associated with global heatmaps, edge weights, or access matrices, as well as in generating high-quality initial solutions and insufficient global guidance for efficiently navigating vast search spaces. To address these challenges, we propose a Hyper Tour Guided Neighborhood Search (HyperNS) method for large-scale TSP instances. Inspired by the ``clustering first, route second" strategy, our approach initially divides the TSP instance into clusters using a sparse heatmap graph and abstracts them as supernodes, followed by the generation of a hyper tour to guide both the initialization and optimization processes. This method reduces the search space by focusing on edges relevant to the hyper tour, leading to more efficient and effective optimization. Experimental results on both synthetic and real-world datasets demonstrate that our approach outperforms existing neural-based methods, particularly in handling larger-scale instances, offering a significant reduction in the gap to the optimal solution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The hyper-tour restriction idea for large TSP is a sensible attempt to cut memory and guide search, but the abstract gives no numbers or checks so the gains remain unproven.

read the letter

The paper introduces a pipeline that clusters points via a sparse heatmap, abstracts clusters as supernodes, builds a hyper tour on them, and then restricts neighborhood search to edges tied to that hyper tour. This is positioned as a way to handle memory limits and add global guidance when scaling neural TSP solvers past current sizes. The combination looks new relative to the neural TSP papers it cites, even if clustering-first and guided local search are older ideas separately. It does a clean job naming the practical bottlenecks: global heatmaps that blow up memory, weak initial solutions, and search spaces that are too big to navigate efficiently. The description of shrinking the edge set while trying to keep the hyper tour relevant is straightforward and easy to follow. The soft spots are mostly about missing evidence. The abstract claims outperformance on synthetic and real data plus a reduced gap to optimal, yet supplies no numbers, no baselines, no error bars, and no description of how the hyper tour is built or how the restriction is enforced in code. Without those details it is impossible to tell whether the reported gains are real or how large they are. The bigger technical worry is whether the clustering step and resulting hyper tour actually keep the long-range edges that the true optimum needs. If the sparse heatmap cuts across optimal connections, the restricted search can only find solutions inside an incomplete subspace, and the paper gives no argument or diagnostic to show this does not happen at scale. This work is aimed at people building learning-based solvers for routing and similar combinatorial tasks who already know the neural TSP literature and want concrete scaling tricks. A reader focused on industrial-size instances would get value from the high-level approach even before seeing the experiments. It deserves a serious referee because the claims are testable and the method is specific enough to evaluate once the full results and implementation details are in front of someone. I would send it to peer review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Hyper Tour Guided Neighborhood Search (HyperNS) for large-scale TSP. It first clusters the instance into supernodes via a sparse heatmap graph, abstracts them to generate a hyper tour, and then uses the hyper tour both to initialize solutions and to restrict neighborhood search to hyper-tour-relevant edges. The central claim is that this reduces the search space while still yielding solutions closer to optimality than prior neural methods, with supporting experiments on synthetic and real-world datasets.

Significance. If the empirical claims hold after addressing the structural-preservation issue, the work would offer a concrete mechanism for scaling neural TSP solvers beyond current memory and search-space limits by combining clustering with guided local search. It directly targets the gap between global heatmap methods and purely local heuristics.

major comments (2)

[Method (hyper-tour construction and search restriction)] The central claim that restricting neighborhood search to hyper-tour-relevant edges still reaches near-optimal solutions rests on the unverified assumption that sparse-heatmap clustering and the resulting hyper tour preserve the long-range edges needed for the global optimum. No coverage analysis, edge-inclusion statistics, or ablation on cluster-boundary misalignment is provided to substantiate this (see skeptic note on global-structure preservation).
[Abstract and Experiments section] The abstract and experimental claims assert outperformance and gap reduction on large instances, yet the manuscript supplies no quantitative tables, baseline descriptions, instance sizes, or statistical details in the visible summary. Without these, the reader cannot assess whether the reported gains are load-bearing or merely presentation artifacts.

minor comments (2)

[Method] Define precisely how the sparse heatmap is constructed (threshold, model, training data) and how supernodes are formed from clusters.
[Neighborhood Search] Clarify the exact rule for selecting 'hyper-tour-relevant edges' during neighborhood search and whether this rule is parameter-free.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped clarify key aspects of our presentation. We address each major comment below and have revised the manuscript to strengthen the supporting evidence and clarity of results.

read point-by-point responses

Referee: [Method (hyper-tour construction and search restriction)] The central claim that restricting neighborhood search to hyper-tour-relevant edges still reaches near-optimal solutions rests on the unverified assumption that sparse-heatmap clustering and the resulting hyper tour preserve the long-range edges needed for the global optimum. No coverage analysis, edge-inclusion statistics, or ablation on cluster-boundary misalignment is provided to substantiate this (see skeptic note on global-structure preservation).

Authors: We agree that explicit verification of structural preservation strengthens the central claim. While the original experiments demonstrate that HyperNS yields lower optimality gaps than prior neural methods on large instances (suggesting retention of necessary global connections via the sparse heatmap), we did not provide dedicated coverage or ablation analysis in the initial submission. In the revised manuscript we have added a new subsection with edge-inclusion statistics, coverage rates of optimal-tour edges within the hyper-tour neighborhood, and an ablation examining performance under cluster-boundary misalignment. These additions directly substantiate that critical long-range edges are retained. revision: yes
Referee: [Abstract and Experiments section] The abstract and experimental claims assert outperformance and gap reduction on large instances, yet the manuscript supplies no quantitative tables, baseline descriptions, instance sizes, or statistical details in the visible summary. Without these, the reader cannot assess whether the reported gains are load-bearing or merely presentation artifacts.

Authors: The full manuscript contains detailed quantitative tables, baseline descriptions (including neural methods such as AM and POMO as well as classical heuristics), instance sizes up to 20,000 cities, and statistical measures (means and standard deviations over repeated runs) in the Experiments section. The abstract follows the conventional high-level format. To improve accessibility we have revised the abstract to incorporate key quantitative highlights and added explicit cross-references to the tables and figures. revision: partial

Circularity Check

0 steps flagged

No circularity: method and claims rest on independent experimental evaluation

full rationale

The described HyperNS approach—clustering via sparse heatmap graph, abstraction to supernodes, hyper-tour construction to restrict neighborhood edges, and guidance for initialization plus local search—is presented as a heuristic strategy inspired by 'clustering first, route second'. Performance is asserted via direct comparison on synthetic and real-world datasets against prior neural methods, with no equations, fitted parameters, or self-citations shown to reduce the central claims to tautologies or inputs by construction. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method relies on the unstated assumption that a sparse heatmap can be built reliably and that the hyper tour over clusters meaningfully constrains the optimal tour; no explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption A sparse heatmap graph derived from the TSP instance can be used to form clusters whose supernodes preserve sufficient global connectivity information.
Invoked when the paper states it divides the instance into clusters using a sparse heatmap graph.

pith-pipeline@v0.9.0 · 5724 in / 1206 out tokens · 24810 ms · 2026-05-18T04:14:07.310079+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

J., Applegate, D

10 Submission and Formatting Instructions for ICML 2026 Cook, W. J., Applegate, D. L., Bixby, R. E., and Chvatal, V . The traveling salesman problem: a computational study. Princeton university press,

work page 2026
[2]

arXiv preprint arXiv:1906.01227 , year=

Joshi, C. K., Laurent, T., and Bresson, X. An efficient graph convolutional network technique for the travelling salesman problem.arXiv preprint arXiv:1906.01227,

work page arXiv 1906
[3]

Neural combinatorial optimization with heavy decoder: Toward large scale generalization

11 Submission and Formatting Instructions for ICML 2026 Luo, F., Lin, X., Liu, F., Zhang, Q., and Wang, Z. Neural combinatorial optimization with heavy decoder: Toward large scale generalization. InNeurIPS,

work page 2026
[4]

Young, N. E. Greedy set-cover algorithms (1974-1979, chv´atal, johnson, lov ´asz, stein).Encyclopedia of algo- rithms, pp. 379–381,

work page 1974
[5]

Improving graph matching with positional reconstruction encoder-decoder network

Zhou, Y ., Jia, R., Lin, H., Quan, H., Zhao, Y ., and Lyu, X. Improving graph matching with positional reconstruction encoder-decoder network. 2023b. 12 Submission and Formatting Instructions for ICML 2026 A. Supplemental Materials A.1. Details of Related Work Traditional TSP solvers typically fall into three categories: exact methods (Cook et al., 2011),...

work page 2026
[6]

The Pointer Network (Vinyals et al.,

that generate tours from scratch with neural models and improvement-based methods (Joshi et al., 2019; Xiao et al., 2024; Li et al., 2023; Min et al., 2024; Chen & Tian, 2019; Wu et al., 2021a; Lu et al., 2019; Hudson et al., 2022; Yao et al., 2022; Ma et al., 2023; Ye et al., 2023; Wang et al., 2025; Li et al., 2025b) that first generated a tour and then...

work page 2019
[7]

first introduced an encoder-decoder framework for supervised incremental solution generation. Later advancements incorporated reinforcement learning (Bello et al., 2017; Deudon et al., 2018; Kool et al., 2019; Khalil et al., 2017), diffusion models (Li et al., 2024), refined decoders (Grinsztajn et al., 2023; Chalumeau et al., 2023; Luo et al., 2024), uti...

work page 2017
[8]

However, these approaches are limited to TSP instances with up to 1,000 vertices and struggle with larger instances

and the nearest neighbors (Lyu et al., 2024), trained on diverse instance distributions (Jacobs et al., 2021; Zhang et al., 2022; Jiang et al., 2022; Bi et al., 2022), explored meta-learning (Zhou et al., 2023a; Son et al., 2023), joint probability estimation (Li et al., 2025b) and curriculum learning (Wang et al., 2024). However, these approaches are lim...

work page 2024
[9]

to optimize the solution. Another type of search-based methods adopts neural models (mainly reinforcement learning) to guide the local search, including 2-opt (Chen & Tian, 2019; Wu et al., 2021a; Lu et al., 2019; Hudson et al., 2022; Yao et al., 2022), k-opt (Ma et al., 2023), beam search (Choo et al.,

work page 2019
[10]

The lack of optimal labels for large-scale instances makes training neural network models challenging, which restricts these methods to handling instances with up to 1,000 vertices

and Ant Colony Optimization (Ye et al., 2023). The lack of optimal labels for large-scale instances makes training neural network models challenging, which restricts these methods to handling instances with up to 1,000 vertices. A.2. Att-GCRN with Geometric Information. The main structure of our Att-GCRN-GI is consistent with Att-GCRN, with the only diffe...

work page 2023
[11]

The edge distance di,j is embedded as a H 2 dimensional feature vector

V0 =A 1 · ˆV (10) where A1 ∈R H×2. The edge distance di,j is embedded as a H 2 dimensional feature vector. Then it defines an indicator function of a TSP edge δK−N N i,j with the value one if nodes i and j are K-nearest neighbors, value two for self-connections, and value zero otherwise. Thus the edge input featuree i,j is: e0 i,j =A 2di,j +b 2||A3δk−N N ...

work page 1992
[12]

Consequently, the overall time complexity isO(c(n+p 2))

14 Submission and Formatting Instructions for ICML 2026 requires traversing all edges of each subgraph, which also has a time complexity of O(cp2). Consequently, the overall time complexity isO(c(n+p 2)). For hyper tour generation, the process of obtaining connected subgraphs and deleting bridges has a complexity ofO((k+1)n), given that the number of edge...

work page 2026

[1] [1]

J., Applegate, D

10 Submission and Formatting Instructions for ICML 2026 Cook, W. J., Applegate, D. L., Bixby, R. E., and Chvatal, V . The traveling salesman problem: a computational study. Princeton university press,

work page 2026

[2] [2]

arXiv preprint arXiv:1906.01227 , year=

Joshi, C. K., Laurent, T., and Bresson, X. An efficient graph convolutional network technique for the travelling salesman problem.arXiv preprint arXiv:1906.01227,

work page arXiv 1906

[3] [3]

Neural combinatorial optimization with heavy decoder: Toward large scale generalization

11 Submission and Formatting Instructions for ICML 2026 Luo, F., Lin, X., Liu, F., Zhang, Q., and Wang, Z. Neural combinatorial optimization with heavy decoder: Toward large scale generalization. InNeurIPS,

work page 2026

[4] [4]

Young, N. E. Greedy set-cover algorithms (1974-1979, chv´atal, johnson, lov ´asz, stein).Encyclopedia of algo- rithms, pp. 379–381,

work page 1974

[5] [5]

Improving graph matching with positional reconstruction encoder-decoder network

Zhou, Y ., Jia, R., Lin, H., Quan, H., Zhao, Y ., and Lyu, X. Improving graph matching with positional reconstruction encoder-decoder network. 2023b. 12 Submission and Formatting Instructions for ICML 2026 A. Supplemental Materials A.1. Details of Related Work Traditional TSP solvers typically fall into three categories: exact methods (Cook et al., 2011),...

work page 2026

[6] [6]

The Pointer Network (Vinyals et al.,

that generate tours from scratch with neural models and improvement-based methods (Joshi et al., 2019; Xiao et al., 2024; Li et al., 2023; Min et al., 2024; Chen & Tian, 2019; Wu et al., 2021a; Lu et al., 2019; Hudson et al., 2022; Yao et al., 2022; Ma et al., 2023; Ye et al., 2023; Wang et al., 2025; Li et al., 2025b) that first generated a tour and then...

work page 2019

[7] [7]

first introduced an encoder-decoder framework for supervised incremental solution generation. Later advancements incorporated reinforcement learning (Bello et al., 2017; Deudon et al., 2018; Kool et al., 2019; Khalil et al., 2017), diffusion models (Li et al., 2024), refined decoders (Grinsztajn et al., 2023; Chalumeau et al., 2023; Luo et al., 2024), uti...

work page 2017

[8] [8]

However, these approaches are limited to TSP instances with up to 1,000 vertices and struggle with larger instances

and the nearest neighbors (Lyu et al., 2024), trained on diverse instance distributions (Jacobs et al., 2021; Zhang et al., 2022; Jiang et al., 2022; Bi et al., 2022), explored meta-learning (Zhou et al., 2023a; Son et al., 2023), joint probability estimation (Li et al., 2025b) and curriculum learning (Wang et al., 2024). However, these approaches are lim...

work page 2024

[9] [9]

to optimize the solution. Another type of search-based methods adopts neural models (mainly reinforcement learning) to guide the local search, including 2-opt (Chen & Tian, 2019; Wu et al., 2021a; Lu et al., 2019; Hudson et al., 2022; Yao et al., 2022), k-opt (Ma et al., 2023), beam search (Choo et al.,

work page 2019

[10] [10]

The lack of optimal labels for large-scale instances makes training neural network models challenging, which restricts these methods to handling instances with up to 1,000 vertices

and Ant Colony Optimization (Ye et al., 2023). The lack of optimal labels for large-scale instances makes training neural network models challenging, which restricts these methods to handling instances with up to 1,000 vertices. A.2. Att-GCRN with Geometric Information. The main structure of our Att-GCRN-GI is consistent with Att-GCRN, with the only diffe...

work page 2023

[11] [11]

The edge distance di,j is embedded as a H 2 dimensional feature vector

V0 =A 1 · ˆV (10) where A1 ∈R H×2. The edge distance di,j is embedded as a H 2 dimensional feature vector. Then it defines an indicator function of a TSP edge δK−N N i,j with the value one if nodes i and j are K-nearest neighbors, value two for self-connections, and value zero otherwise. Thus the edge input featuree i,j is: e0 i,j =A 2di,j +b 2||A3δk−N N ...

work page 1992

[12] [12]

Consequently, the overall time complexity isO(c(n+p 2))

14 Submission and Formatting Instructions for ICML 2026 requires traversing all edges of each subgraph, which also has a time complexity of O(cp2). Consequently, the overall time complexity isO(c(n+p 2)). For hyper tour generation, the process of obtaining connected subgraphs and deleting bridges has a complexity ofO((k+1)n), given that the number of edge...

work page 2026