Graph reinforcement learning for combinatorial optimization: A survey and unifying perspective
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: CONDITIONAL (2)
Citing papers
- Baseline-Free Policy Optimization for Neural Combinatorial Optimization
  GRPO matches POMO solution quality within 2% on TSP/CVRP while avoiding REINFORCE training collapse on TSP-100 without needing a rollout baseline.
- Smart Transportation Without Neurons -- Fair Metro Network Expansion with Tabular Reinforcement Learning
  Tabular RL on a Non-Markovian Rewards Decision Process formulation matches deep RL performance on real metro expansion in Xi'an and Amsterdam while cutting episodes by 18x and carbon emissions by 12x on average.