Self-play DAgger training in a batched pixel renderer produces end-to-end driving policies that reach competitive performance on HUGSIM and NAVSIM-v2 after real-world adaptation and improve with more self-play compute.
ZTRS: Zero-imitation end-to-end autonomous driving with trajectory scoring
10 Pith papers cite this work.
representative citing papers
DriveJudge combines VLM reasoning with rule functions on a new 33,577-sample human-annotated dataset, outperforming EPDMS by 21.23 AUC on quality classification and DriveCritic by 6.5% on trajectory preference.
TOAD applies test-time Cross-Entropy Method optimization to refine trajectories using the planner's scorer as a reward function, improving end-to-end autonomous driving performance without retraining.
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
GSDrive combines IL priors with RL feedback by probing multi-mode futures inside a 3D Gaussian Splatting simulator to supply dense rewards for closed-loop driving policy improvement on nuScenes.
The primary open-loop/closed-loop (OL-CL) gap in end-to-end autonomous driving arises from an objective mismatch that leaves models structurally unable to model reactive behaviors; a test-time adaptation method can mitigate this gap.
SimScale synthesizes unseen driving states from real logs via neural rendering and reactive environments, generates pseudo-expert trajectories, and shows that co-training on real plus simulated data improves planning robustness and generalization on real benchmarks, with gains scaling by simulation
PriorEye augments end-to-end driving models with a dual-memory architecture that stores and gates geospatial visual priors to improve performance and robustness to sensor corruption on NAVSIM-v2.
A training-free fusion layer enables stale VLM selections to improve a real-time planner's trajectory scoring for urban sidewalk navigation, yielding a 30% ADE reduction in challenging scenarios.
RAD-2 (Scaling Reinforcement Learning in a Generator-Discriminator Framework) uses a diffusion generator and an RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.