hub

Attention, learn to solve routing problems!

· 2018 · stat.ML · arXiv 1803.08475

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

open full Pith review browse 14 citing papers arXiv PDF

abstract

The recently presented idea to learn heuristics for combinatorial optimization problems is promising as it can save costly development. However, to push this idea towards practical implementation, we need better models and better ways of training. We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network and we show how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which we find is more efficient than using a value function. We significantly improve over recent learned heuristics for the Travelling Salesman Problem (TSP), getting close to optimal results for problems up to 100 nodes. With the same hyperparameters, we learn strong heuristics for two variants of the Vehicle Routing Problem (VRP), the Orienteering Problem (OP) and (a stochastic variant of) the Prize Collecting TSP (PCTSP), outperforming a wide range of baselines and getting results close to highly optimized and specialized algorithms.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem

cs.LG · 2026-04-06 · unverdicted · novelty 7.0

VaP-CSMV uses a cross-semantic encoder and multi-view decoder to unify DRL solving of HFVRP variants, outperforming prior neural solvers while matching heuristics at much lower inference time and generalizing zero-shot to unseen scales.

Learning Altruistic Collaboration in Heterogeneous Multi-Team Systems

cs.RO · 2026-05-20 · unverdicted · novelty 6.0

A graph neural network learns to approximate altruistic robot transfers across heterogeneous teams using Hamilton's rule, achieving near-optimal allocation in simulated firefighting scenarios.

CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem

quant-ph · 2026-05-13 · unverdicted · novelty 6.0

Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.

Rethinking Efficiency in Neural Combinatorial Optimization: Batched Preference Optimization with Mamba

cs.LG · 2026-02-24 · unverdicted · novelty 6.0

ECO uses supervised warm-up plus iterative batched DPO on a Mamba backbone to reach top neural performance on TSP and CVRP while lowering memory growth and raising throughput.

Learning-Optimized Qubit Mapping and Reuse to Minimize Inter-Core Communication in Modular Quantum Architectures

quant-ph · 2025-06-11 · unverdicted · novelty 6.0

QARMA applies transformer-augmented reinforcement learning to qubit allocation and reuse in modular quantum systems, reporting up to 86% average reduction in inter-core communications versus optimized Qiskit baselines.

Unrealized Expectations: Comparing AI Methods vs Classical Algorithms for Maximum Independent Set

cs.LG · 2025-02-05 · unverdicted · novelty 6.0

Classical solver KaMIS outperforms leading AI methods for Maximum Independent Set on random graphs, with some AI approaches no better than simple greedy heuristics and a new serialization analysis revealing similar reasoning.

Attention-Based Deep Reinforcement Learning for Qubit Allocation in Modular Quantum Architectures

quant-ph · 2024-06-17 · unverdicted · novelty 6.0

An attention-based DRL agent with Transformer encoder and GNN learns heuristics for qubit-to-core allocation in multi-core quantum systems to minimize state transfers and online compilation time.

Machine Learning for Two-Stage Graph Sparsification for the Travelling Salesman Problem

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

A two-stage ML sparsifier for TSP candidate graphs combines alpha-Nearest and POPMUSIC for high recall then trains a model to cut density while preserving coverage across distance types and instance sizes up to 500.

Fine-tuning Large Language Model for Automated Algorithm Design

cs.LG · 2025-07-13 · unverdicted · novelty 5.0

Fine-tuned LLMs with DAR sampling and DPO outperform off-the-shelf versions on algorithm design tasks and generalize to related settings.

RouteFormer: A Transformer-Based Routing Framework for Autonomous Vehicles

cs.RO · 2025-04-07 · unverdicted · novelty 5.0

RouteFormer is a transformer-RL hybrid for single-agent graph routing that reports 10% and 7% shorter distances than Concorde and LKH-3 on mission-like graphs by incorporating constraints the solvers ignore.

ARMATA: Auto-Regressive Multi-Agent Task Assignment

cs.MA · 2026-05-05 · unverdicted · novelty 5.0

ARMATA is a new end-to-end autoregressive model with multi-stage decoding that unifies allocation and routing for multi-agent systems and reports up to 20% better solutions than OR-Tools, CPLEX, and LKH-3 in seconds instead of hours.

NCO4CVRP: Neural Combinatorial Optimization for the Capacitated Vehicle Routing Problem

cs.LG · 2026-04-17 · unverdicted · novelty 3.0

Adding simulated annealing to random reconstruction and beam search to POMO in neural CVRP solvers reduces optimality gaps on standard benchmarks.

Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers

math.OC · 2026-04-13 · unverdicted · novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.

Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming

stat.ML · 2026-05-18

citing papers explorer

Showing 14 of 14 citing papers.

Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem cs.LG · 2026-04-06 · unverdicted · none · ref 9
VaP-CSMV uses a cross-semantic encoder and multi-view decoder to unify DRL solving of HFVRP variants, outperforming prior neural solvers while matching heuristics at much lower inference time and generalizing zero-shot to unseen scales.
Learning Altruistic Collaboration in Heterogeneous Multi-Team Systems cs.RO · 2026-05-20 · unverdicted · none · ref 21 · internal anchor
A graph neural network learns to approximate altruistic robot transfers across heterogeneous teams using Hamilton's rule, achieving near-optimal allocation in simulated firefighting scenarios.
CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem quant-ph · 2026-05-13 · unverdicted · none · ref 17 · internal anchor
Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.
Rethinking Efficiency in Neural Combinatorial Optimization: Batched Preference Optimization with Mamba cs.LG · 2026-02-24 · unverdicted · none · ref 17 · internal anchor
ECO uses supervised warm-up plus iterative batched DPO on a Mamba backbone to reach top neural performance on TSP and CVRP while lowering memory growth and raising throughput.
Learning-Optimized Qubit Mapping and Reuse to Minimize Inter-Core Communication in Modular Quantum Architectures quant-ph · 2025-06-11 · unverdicted · none · ref 23 · internal anchor
QARMA applies transformer-augmented reinforcement learning to qubit allocation and reuse in modular quantum systems, reporting up to 86% average reduction in inter-core communications versus optimized Qiskit baselines.
Unrealized Expectations: Comparing AI Methods vs Classical Algorithms for Maximum Independent Set cs.LG · 2025-02-05 · unverdicted · none · ref 8 · internal anchor
Classical solver KaMIS outperforms leading AI methods for Maximum Independent Set on random graphs, with some AI approaches no better than simple greedy heuristics and a new serialization analysis revealing similar reasoning.
Attention-Based Deep Reinforcement Learning for Qubit Allocation in Modular Quantum Architectures quant-ph · 2024-06-17 · unverdicted · none · ref 29 · internal anchor
An attention-based DRL agent with Transformer encoder and GNN learns heuristics for qubit-to-core allocation in multi-core quantum systems to minimize state transfers and online compilation time.
Machine Learning for Two-Stage Graph Sparsification for the Travelling Salesman Problem cs.LG · 2026-04-22 · unverdicted · none · ref 22
A two-stage ML sparsifier for TSP candidate graphs combines alpha-Nearest and POPMUSIC for high recall then trains a model to cut density while preserving coverage across distance types and instance sizes up to 500.
Fine-tuning Large Language Model for Automated Algorithm Design cs.LG · 2025-07-13 · unverdicted · none · ref 14 · internal anchor
Fine-tuned LLMs with DAR sampling and DPO outperform off-the-shelf versions on algorithm design tasks and generalize to related settings.
RouteFormer: A Transformer-Based Routing Framework for Autonomous Vehicles cs.RO · 2025-04-07 · unverdicted · none · ref 36 · internal anchor
RouteFormer is a transformer-RL hybrid for single-agent graph routing that reports 10% and 7% shorter distances than Concorde and LKH-3 on mission-like graphs by incorporating constraints the solvers ignore.
ARMATA: Auto-Regressive Multi-Agent Task Assignment cs.MA · 2026-05-05 · unverdicted · none · ref 24
ARMATA is a new end-to-end autoregressive model with multi-stage decoding that unifies allocation and routing for multi-agent systems and reports up to 20% better solutions than OR-Tools, CPLEX, and LKH-3 in seconds instead of hours.
NCO4CVRP: Neural Combinatorial Optimization for the Capacitated Vehicle Routing Problem cs.LG · 2026-04-17 · unverdicted · none · ref 23
Adding simulated annealing to random reconstruction and beam search to POMO in neural CVRP solvers reduces optimality gaps on standard benchmarks.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers math.OC · 2026-04-13 · unverdicted · none · ref 78
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming stat.ML · 2026-05-18 · unreviewed · ref 69 · internal anchor

Attention, learn to solve routing problems!

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer