CaRL: Learning Scalable Planning Policies with Simple Rewards
8 Pith papers cite this work.
fields: cs.RO (8)
years: 2026 (8)
roles: background (1)
polarities: background (1)
citing papers
- Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving
  BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based plus learned planner.
- Fail2Drive: Benchmarking Closed-Loop Driving Generalization
  Fail2Drive is the first paired-route benchmark for closed-loop generalization in CARLA, showing an average 22.8% success-rate drop on shifted scenarios and revealing failure modes such as ignoring visible LiDAR objects.
- Scaling Self-Play for End-to-End Driving
  Self-play DAgger training in a batched pixel renderer produces end-to-end driving policies that reach competitive performance on HUGSIM and NAVSIM-v2 after real-world adaptation and improve with more self-play compute.
- MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving
  MAPLE proposes latent multi-agent rollouts with supervised fine-tuning followed by reinforcement learning using safety, progress, interaction, and diversity rewards to enable scalable closed-loop training for end-to-end autonomous driving.
- Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction
  Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.
- DriveSafer: End-to-End Autonomous Driving with Safety Guidance
  DriveSafer reduces catastrophic failures (PDMS=0) by 48% and drivable-area compliance failures by over 65% versus DiffusionDrive on the NAVSIM benchmark by combining training-time safety constraints with inference-time guidance.
- PlanRL: A Trajectory Planning Architecture for Reinforcement Learning-based Driving Experts
  PlanRL uses RL to generate parameters for polynomial trajectories in a Frenet frame with kinematic checks, achieving 5-11% better driving scores on CARLA benchmarks than control-based RL.
- Learning Dexterous Grasping from Sparse Taxonomy Guidance