D.; Ermon, S.; and Finn, C
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2025 (2)
Representative citing papers
- One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms
OSPO trains optimal order dispatch policies for homogeneous AV fleets using only one-step group rewards, outperforming GRPO on a real ride-hailing dataset.
- The Ratchet Effect in Silico: How Interaction Drives Cumulative Intelligence in Large Language Models