Preference optimization for combinatorial optimization problems
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Representative citing papers:
GRPO matches POMO solution quality within 2% on TSP/CVRP while avoiding REINFORCE training collapse on TSP-100 without needing a rollout baseline.
citing papers explorer
- Learning to Place Guards by Reinforcement: A Geo-Free Neural Policy for the Vertex-Guard Art Gallery Problem
A reinforcement learning policy for the vertex-guard art gallery problem captures enough geometric information in its encoder for a simple classifier to achieve high coverage feasibility out of distribution.
- Baseline-Free Policy Optimization for Neural Combinatorial Optimization
GRPO matches POMO solution quality within 2% on TSP/CVRP while avoiding REINFORCE training collapse on TSP-100 without needing a rollout baseline.