CaRL: Learning Scalable Planning Policies with Simple Rewards
5 Pith papers cite this work.
Citing papers explorer
- Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving
  BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based-plus-learned planner.
- Fail2Drive: Benchmarking Closed-Loop Driving Generalization
  Fail2Drive is the first paired-route benchmark for closed-loop generalization in CARLA, showing an average 22.8% success-rate drop on shifted scenarios and revealing failure modes such as ignoring visible LiDAR objects.
- MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving
  MAPLE performs closed-loop multi-agent training of VLA driving models entirely in latent space, using supervised fine-tuning followed by RL with safety, progress, and diversity rewards, reaching SOTA on Bench2Drive.
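  The reward structure the MAPLE summary mentions can be sketched as a weighted sum of safety, progress, and diversity terms. The weights, term definitions, and entropy proxy below are illustrative assumptions, not the paper's actual formulation:

```python
def composite_reward(collision: bool, progress_m: float,
                     action_entropy: float,
                     w_safety: float = 1.0,
                     w_progress: float = 0.1,
                     w_diversity: float = 0.01) -> float:
    """Weighted sum of a safety penalty, route progress (meters),
    and a diversity bonus (proxied here by policy action entropy).
    All weights and term choices are hypothetical."""
    safety = -1.0 if collision else 0.0
    return w_safety * safety + w_progress * progress_m + w_diversity * action_entropy

# A collision-free step advancing 2 m with action entropy 1.5
r = composite_reward(collision=False, progress_m=2.0, action_entropy=1.5)
```

  In practice the weighting is the hard part: a safety penalty that dominates too strongly yields overly conservative driving, while too-weak a penalty lets progress rewards encourage collisions.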
- Learning Dexterous Grasping from Sparse Taxonomy Guidance
  GRIT learns dexterous grasping from sparse taxonomy guidance, achieving 87.9% success and better generalization to novel objects via a two-stage prediction-plus-policy approach.
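  A two-stage prediction-plus-policy pipeline of the kind the GRIT summary describes can be sketched as: stage 1 predicts a coarse grasp class from a sparse taxonomy, stage 2 conditions a control policy on that prediction. The taxonomy labels, object feature, and aperture rule below are illustrative assumptions:

```python
from typing import Callable

TAXONOMY = ["power", "precision", "lateral"]  # assumed sparse grasp taxonomy

def predict_grasp_class(object_width_cm: float) -> str:
    """Stage 1: coarse grasp-type prediction from a single object feature
    (a real system would use a learned classifier over richer inputs)."""
    return "precision" if object_width_cm < 3.0 else "power"

def make_policy(grasp_class: str) -> Callable[[float], float]:
    """Stage 2: a grasp-class-conditioned policy mapping object width
    to a target grip aperture, with class-dependent clearance (cm)."""
    margin = 0.5 if grasp_class == "precision" else 1.5
    return lambda width: width + margin

cls = predict_grasp_class(2.0)       # small object -> "precision"
aperture = make_policy(cls)(2.0)     # class-conditioned aperture target
```

  The design point is that the policy only ever sees a coarse class label, so sparse taxonomy supervision can steer behavior without dense per-grasp annotation.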
- Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction
  Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.
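  The closed-loop vs open-loop distinction running through these summaries reduces to what the model is conditioned on during a rollout. A minimal sketch with a toy 1-D "model" and dynamics (both illustrative assumptions, not any paper's setup):

```python
def rollout(model, x0: float, steps: int, closed_loop: bool, log):
    """Roll a 1-D state forward. Closed-loop feeds the model its own
    previous output; open-loop always conditions on the ground-truth log
    (teacher forcing), so prediction errors never feed back."""
    x, traj = x0, []
    for t in range(steps):
        inp = x if closed_loop else log[t]  # the key difference
        x = model(inp)                      # next-state prediction
        traj.append(x)
    return traj

model = lambda x: 1.1 * x          # toy model with a 10% compounding bias
log = [1.0, 1.0, 1.0]              # ground-truth states
open_traj = rollout(model, 1.0, 3, closed_loop=False, log=log)
closed_traj = rollout(model, 1.0, 3, closed_loop=True, log=log)
# Closed-loop rollouts compound the model's own drift (1.1, 1.21, 1.331...);
# open-loop rollouts stay anchored to the log and hide that drift.
```

  This is why open-loop metrics can look strong while closed-loop success rates collapse, the gap both Fail2Drive and the reactive-simulation result above quantify.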