iPad: Iterative Proposal-centric End-to-End Autonomous Driving
End-to-end (E2E) autonomous driving systems offer a promising alternative to traditional modular pipelines by reducing information loss and error accumulation, with significant potential to enhance both mobility and safety. However, most existing E2E approaches directly generate plans based on dense bird's-eye view (BEV) grid features, leading to inefficiency and limited planning awareness. To address these limitations, we propose iterative Proposal-centric autonomous driving (iPad), a novel framework that places proposals - a set of candidate future plans - at the center of feature extraction and auxiliary tasks. Central to iPad is ProFormer, a BEV encoder that iteratively refines proposals and their associated features through proposal-anchored attention, effectively fusing multi-view image data. Additionally, we introduce two lightweight, proposal-centric auxiliary tasks - mapping and prediction - that improve planning quality with minimal computational overhead. Extensive experiments on the NAVSIM and CARLA Bench2Drive benchmarks demonstrate that iPad achieves state-of-the-art performance while being significantly more efficient than prior leading methods.
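The iterative, proposal-anchored refinement described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' ProFormer: the abstract gives no architectural details, so everything below is an assumption made for illustration. In particular, the single feature grid `feat` stands in for the fused multi-view image features, the fixed sampling `offsets`, the nearest-neighbour lookup, the residual query update, and the linear head `w_delta` are all hypothetical choices, and the names `sample_features` and `refine_proposals` are invented here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sample_features(feat, points):
    # Nearest-neighbour lookup of feat (H, W, C) at points (..., 2),
    # given as (row, col) grid coordinates; out-of-range points are clamped.
    h, w, _ = feat.shape
    rows = np.clip(np.rint(points[..., 0]).astype(int), 0, h - 1)
    cols = np.clip(np.rint(points[..., 1]).astype(int), 0, w - 1)
    return feat[rows, cols]


def refine_proposals(feat, proposals, queries, offsets, w_delta, num_iters=3):
    # feat:      (H, W, C) feature grid (assumed stand-in for image features)
    # proposals: (N, T, 2) candidate plans, T waypoints each, in grid coords
    # queries:   (N, T, C) one feature vector per waypoint
    # offsets:   (K, 2)    hypothetical sampling offsets around each waypoint
    # w_delta:   (C, 2)    hypothetical linear head predicting a waypoint shift
    scale = np.sqrt(queries.shape[-1])
    for _ in range(num_iters):
        # Anchor the sampling locations on the current proposals: (N, T, K, 2).
        anchors = proposals[:, :, None, :] + offsets[None, None, :, :]
        keys = sample_features(feat, anchors)  # (N, T, K, C)
        # Scaled dot-product attention over the K samples of each waypoint.
        scores = np.einsum("ntc,ntkc->ntk", queries, keys) / scale
        attn = softmax(scores, axis=-1)
        context = np.einsum("ntk,ntkc->ntc", attn, keys)
        # Residual feature update, then a bounded shift of each waypoint.
        queries = queries + context
        proposals = proposals + np.tanh(queries @ w_delta)
    return proposals, queries


rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32, 16))
proposals = rng.uniform(4, 28, size=(8, 6, 2))
queries = np.zeros((8, 6, 16))
offsets = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
w_delta = rng.normal(scale=0.1, size=(16, 2))

new_proposals, new_queries = refine_proposals(feat, proposals, queries, offsets, w_delta)
print(new_proposals.shape, new_queries.shape)  # (8, 6, 2) (8, 6, 16)
```

The point of the sketch is the cost structure: each iteration touches only N × T × K sampled locations rather than every cell of a dense grid, which is one plausible reading of why a proposal-centric design could be cheaper than planning from dense BEV features.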
Forward citations
Cited by 19 Pith papers
-
Driving risk emerges from the required two-dimensional joint evasive acceleration
Evasive acceleration quantifies driving risk as the minimum 2D constant relative acceleration needed to avoid collision and outperforms time-to-collision on warning timing, discrimination, and information retention ac...
-
LWDrive: Layer-Wise World-Model-Guided Vision-Language Model Planning for Autonomous Driving
LWDrive uses future-frame supervision on VLMs to create world-model features that a multi-layer Foresight Cascade Planner refines into final trajectories, reporting 92.0 on NAVSIM and 89.6 on NAVSIM-v2.
-
FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning
FlowR2A learns reward-conditioned action distributions via a flow-matching decoder, unifying dense reward supervision with dynamic proposal generation for multimodal driving planning.
-
World Engine: Towards the Era of Post-Training for Autonomous Driving
World Engine generates realistic safety-critical driving variations from logs for reinforcement post-training, reducing benchmark failures more than data scaling and showing collision reductions plus on-road gains in ...
-
Test-Time Trajectory Optimization for Autonomous Driving
TOAD applies test-time Cross-Entropy Method optimization to refine trajectories using the planner's scorer as a reward function, improving end-to-end autonomous driving performance without retraining.
-
D$^3$-MoE: Dual Disentangled Diffusion Mixture-of-Experts for Style-Controllable End-to-End Autonomous Driving
D³-MoE disentangles style and physical axes with diffusion and self-supervised MoE experts to produce style-controllable trajectories, reporting SOTA 88.2 PDMS on NAVSIM.
-
IDOL: Inverse-Dynamics-Guided Future Prediction for End-to-End Autonomous Driving
IDOL uses inverse dynamics on adjacent predicted latent futures to extract planning-relevant motion deltas, then optimizes trajectories with a closed-loop refinement step, reporting SOTA results on NAVSIM v1 and v2.
-
NTR: Neural Token Reconstruction for Scene Token Bottleneck in End-to-End Driving
NTR adds a self-distillation masked latent reconstruction objective that uses only scene tokens to reconstruct masked patch features, improving visual representation quality and planning performance in end-to-end auto...
-
ChainFlow-VLA: Causal Flow Planning with Vision-Language Models
ChainFlow-VLA unifies autoregressive causal trajectory modes with VLM-conditioned diffusion refinement to reach 94.85 on NAVSIM v1, matching human performance.
-
CLOVER: Closed-Loop Value Estimation and Ranking for End-to-End Autonomous Driving Planning
CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.
-
The DAWN of World-Action Interactive Models
DAWN couples a world predictor with a world-conditioned action denoiser in latent space so that each refines the other recursively, yielding strong planning and safety results on autonomous driving benchmarks.
-
ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution
ProDrive couples a query-centric planner with a BEV world model for end-to-end ego-environment co-evolution, enabling future-outcome assessment that improves safety and efficiency over reactive baselines on NAVSIM v1.
-
AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving
A cascaded end-to-end driving model conditions longitudinal planning on the lateral path via anchor-based regression and path-conditioned 1D displacement prediction, achieving a SOTA driving score of 89.07 and 73.18% su...
-
SimScale: Learning to Drive via Real-World Simulation at Scale
SimScale synthesizes unseen driving states from real logs via neural rendering and reactive environments, generates pseudo-expert trajectories, and shows that co-training on real plus simulated data improves planning ...
-
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
PRIX presents an efficient camera-only planner with a novel CaRT module that matches larger multimodal models on NAVSIM and nuScenes while reducing model size and inference time.
-
LWDrive: Layer-Wise World-Model-Guided Vision-Language Model Planning for Autonomous Driving
LWDrive refines coarse VLM trajectories via future-frame supervision and a multi-layer Foresight Cascade Planner, reporting scores of 92.0 on NAVSIM and 89.6 on NAVSIM-v2.
-
Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning
Discrete-WAM unifies world modeling and policy learning for autonomous driving by representing observations, states, decisions, and actions as tokens in one space and using hierarchical token editing for planning.
-
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
RAD-2 uses a diffusion generator and RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.
-
CLEAR: Cognition and Latent Evaluation for Adaptive Routing in End-to-End Autonomous Driving
CLEAR achieves state-of-the-art PDMS of 93.7 on NAVSIM v1 by combining single-step VAE latent drift with Qwen 3.5-guided adaptive scheduling and trajectory scoring for end-to-end driving.