CaRL: Learning Scalable Planning Policies with Simple Rewards
5 Pith papers cite this work.
Citing papers explorer
- Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving
  BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based-plus-learned planner.
- Fail2Drive: Benchmarking Closed-Loop Driving Generalization
  Fail2Drive is the first paired-route benchmark for closed-loop generalization in CARLA, showing an average 22.8% success-rate drop on shifted scenarios and revealing failure modes such as ignoring visible LiDAR objects.
- MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving
  MAPLE performs closed-loop multi-agent training of VLA driving models entirely in latent space, using supervised fine-tuning followed by RL with safety, progress, and diversity rewards, reaching SOTA on Bench2Drive.
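  The reward structure the MAPLE summary mentions can be sketched as a weighted sum of safety, progress, and diversity terms. The weights, term definitions, and entropy proxy below are illustrative assumptions, not the paper's actual formulation:

```python
def composite_reward(collision: bool, progress_m: float,
                     action_entropy: float,
                     w_safety: float = 1.0,
                     w_progress: float = 0.1,
                     w_diversity: float = 0.01) -> float:
    """Weighted sum of a safety penalty, route progress (meters),
    and a diversity bonus (proxied here by policy action entropy).
    All weights and term choices are hypothetical."""
    safety = -1.0 if collision else 0.0
    return w_safety * safety + w_progress * progress_m + w_diversity * action_entropy

# A collision-free step advancing 2 m with action entropy 1.5
r = composite_reward(collision=False, progress_m=2.0, action_entropy=1.5)
```

  In practice the weighting is the hard part: a safety penalty that dominates too strongly yields overly conservative driving, while too-weak a penalty lets progress rewards encourage collisions.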
- Learning Dexterous Grasping from Sparse Taxonomy Guidance
  GRIT learns dexterous grasping from sparse taxonomy guidance, achieving 87.9% success and better generalization to novel objects via a two-stage prediction-plus-policy approach.
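  A two-stage prediction-plus-policy pipeline of the kind the GRIT summary describes can be sketched as: stage 1 predicts a coarse grasp class from a sparse taxonomy, stage 2 conditions a control policy on that prediction. The taxonomy labels, object feature, and aperture rule below are illustrative assumptions:

```python
from typing import Callable

TAXONOMY = ["power", "precision", "lateral"]  # assumed sparse grasp taxonomy

def predict_grasp_class(object_width_cm: float) -> str:
    """Stage 1: coarse grasp-type prediction from a single object feature
    (a real system would use a learned classifier over richer inputs)."""
    return "precision" if object_width_cm < 3.0 else "power"

def make_policy(grasp_class: str) -> Callable[[float], float]:
    """Stage 2: a grasp-class-conditioned policy mapping object width
    to a target grip aperture, with class-dependent clearance (cm)."""
    margin = 0.5 if grasp_class == "precision" else 1.5
    return lambda width: width + margin

cls = predict_grasp_class(2.0)       # small object -> "precision"
aperture = make_policy(cls)(2.0)     # class-conditioned aperture target
```

  The design point is that the policy only ever sees a coarse class label, so sparse taxonomy supervision can steer behavior without dense per-grasp annotation.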
- Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction
  Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.
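  The closed-loop vs open-loop distinction running through these summaries reduces to what the model is conditioned on during a rollout. A minimal sketch with a toy 1-D "model" and dynamics (both illustrative assumptions, not any paper's setup):

```python
def rollout(model, x0: float, steps: int, closed_loop: bool, log):
    """Roll a 1-D state forward. Closed-loop feeds the model its own
    previous output; open-loop always conditions on the ground-truth log
    (teacher forcing), so prediction errors never feed back."""
    x, traj = x0, []
    for t in range(steps):
        inp = x if closed_loop else log[t]  # the key difference
        x = model(inp)                      # next-state prediction
        traj.append(x)
    return traj

model = lambda x: 1.1 * x          # toy model with a 10% compounding bias
log = [1.0, 1.0, 1.0]              # ground-truth states
open_traj = rollout(model, 1.0, 3, closed_loop=False, log=log)
closed_traj = rollout(model, 1.0, 3, closed_loop=True, log=log)
# Closed-loop rollouts compound the model's own drift (1.1, 1.21, 1.331...);
# open-loop rollouts stay anchored to the log and hide that drift.
```

  This is why open-loop metrics can look strong while closed-loop success rates collapse, the gap both Fail2Drive and the reactive-simulation result above quantify.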