TriSearch is an RL framework that optimizes triangulations of polytopes using bistellar flips with a circuit-supported subtriangulation action representation, generalizing zero-shot to larger instances and outperforming prior samplers in 3D and 4D.
hub
Attention, Learn to Solve Routing Problems!
22 Pith papers cite this work. Polarity classification is still indexing.
abstract
The recently presented idea to learn heuristics for combinatorial optimization problems is promising as it can save costly development. However, to push this idea towards practical implementation, we need better models and better ways of training. We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network and we show how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which we find is more efficient than using a value function. We significantly improve over recent learned heuristics for the Travelling Salesman Problem (TSP), getting close to optimal results for problems up to 100 nodes. With the same hyperparameters, we learn strong heuristics for two variants of the Vehicle Routing Problem (VRP), the Orienteering Problem (OP) and (a stochastic variant of) the Prize Collecting TSP (PCTSP), outperforming a wide range of baselines and getting results close to highly optimized and specialized algorithms.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Stochastic gradient ascent with averaging learns Lagrangian multipliers for MILP at the minimax rate Θ(s/√N) and faster Θ(s/N) for warm-start, closing the gap between upper and lower bounds.
VaP-CSMV uses a cross-semantic encoder and multi-view decoder to unify DRL solving of HFVRP variants, outperforming prior neural solvers while matching heuristics at much lower inference time and generalizing zero-shot to unseen scales.
NCP trains a neural network to predict certificate-level dual prices for CO problems, enabling structured primal recovery with a local second-order error guarantee when consistency holds.
A structure-aware RL fairness attack with joint item and gender selection policies is introduced and shown effective on four recommender models across two datasets.
VaFM encodes constraint-specific VRP images via CNN into patch embeddings fused with graph nodes, using an auxiliary task to handle pixel imbalance, and reports better performance than prior methods on 16 VRP variants.
GRPO matches POMO solution quality within 2% on TSP/CVRP while avoiding REINFORCE training collapse on TSP-100 without needing a rollout baseline.
MViewRouter internalizes D4 geometric equivariance for routing via Multi-view Alternating Attention and Collective Policy Gradient Aggregation, yielding competitive solutions and strong generalization on TSP/CVRP benchmarks.
A graph neural network learns to approximate altruistic robot transfers across heterogeneous teams using Hamilton's rule, achieving near-optimal allocation in simulated firefighting scenarios.
Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.
ECO uses supervised warm-up plus iterative batched DPO on a Mamba backbone to reach top neural performance on TSP and CVRP while lowering memory growth and raising throughput.
QARMA applies transformer-augmented reinforcement learning to qubit allocation and reuse in modular quantum systems, reporting up to 86% average reduction in inter-core communications versus optimized Qiskit baselines.
Classical solver KaMIS outperforms leading AI methods for Maximum Independent Set on random graphs, with some AI approaches no better than simple greedy heuristics and a new serialization analysis revealing similar reasoning.
An attention-based DRL agent with Transformer encoder and GNN learns heuristics for qubit-to-core allocation in multi-core quantum systems to minimize state transfers and online compilation time.
N(CO)^2 applies reinforcement learning with chance constraints to solve stochastic orienteering problems, generalizing across instances with performance competitive to MILP.
Fine-tuned LLMs with DAR sampling and DPO outperform off-the-shelf versions on algorithm design tasks and generalize to related settings.
RouteFormer is a transformer-RL hybrid for single-agent graph routing that reports 10% and 7% shorter distances than Concorde and LKH-3 on mission-like graphs by incorporating constraints the solvers ignore.
ARMATA is a new end-to-end autoregressive model with multi-stage decoding that unifies allocation and routing for multi-agent systems and reports up to 20% better solutions than OR-Tools, CPLEX, and LKH-3 in seconds instead of hours.
A Transformer model trained via supervised learning on ILP solutions for a new nursing care taxi dispatch VRP variant reduces operating time by up to 8% on small instances while keeping constraint violations low.
Adding simulated annealing to random reconstruction and beam search to POMO in neural CVRP solvers reduces optimality gaps on standard benchmarks.
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
citing papers explorer
-
Baseline-Free Policy Optimization for Neural Combinatorial Optimization
GRPO matches POMO solution quality within 2% on TSP/CVRP while avoiding REINFORCE training collapse on TSP-100 without needing a rollout baseline.
-
Optimizing Nursing Care Taxi Dispatch Leveraging Integer Linear Programming Solvers and Machine Learning
A Transformer model trained via supervised learning on ILP solutions for a new nursing care taxi dispatch VRP variant reduces operating time by up to 8% on small instances while keeping constraint violations low.