
arxiv: 2606.31106 · v1 · pith:4I6SWSZ2new · submitted 2026-06-30 · 💻 cs.RO · cs.AI · cs.LG

What Probing Reveals about Autonomous Driving: Linking Internal Prediction Errors to Ego Planning

Pith reviewed 2026-07-01 05:46 UTC · model grok-4.3

classification 💻 cs.RO cs.AI cs.LG
keywords autonomous driving, policy probing, internal representations, prediction, planning, imitation learning, reinforcement learning, causal intervention

The pith

Autonomous driving policies with strong closed-loop performance often lack timely internal predictions of surrounding vehicles during near-collision events, which limits their ability to generate safe ego plans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether driving policies from imitation learning and reinforcement learning truly develop internal capabilities for predicting surrounding vehicle motions and planning safe trajectories, or whether they depend on surface-level heuristics that succeed in routine cases. It tracks these signals using linear probing and targeted perturbations across different scales of data and training to see when the representations appear or remain absent. Despite high performance in closed-loop simulators, the policies frequently fail to form accurate and timely predictions in critical near-collision scenarios. Causal interventions that correct mistaken predictions then produce safer ego trajectories. This shows that aggregate simulator scores conceal gaps in the predictive information the policies actually use for planning.

Core claim

Despite good closed-loop performance, policies often fail to form timely surrounding-vehicle predictions during near-collision events, revealing a limitation in the predictive signals available for ego planning. Causal intervention shows that correcting mistaken predictions improves ego planning toward safer trajectories.

What carries the argument

Linear probing and targeted perturbations applied to internal layers of imitation and reinforcement learning driving policies to detect and manipulate prediction and planning representations.
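
As a minimal sketch of what a linear probe does, the Python below trains a logistic-regression readout on frozen activations and scores it on held-out samples. Everything here is illustrative rather than the paper's code: the activations are synthetic Gaussian vectors, the binary label (for example, whether a surrounding vehicle will turn) is assumed to be encoded along a made-up direction, and `train_linear_probe` and `probe_accuracy` are hypothetical helper names.

```python
import math
import random

def sigmoid(z):
    # Clamp to keep math.exp from overflowing on extreme logits.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_linear_probe(feats, labels, epochs=300, lr=0.5):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_accuracy(w, b, feats, labels):
    """Fraction of samples whose thresholded probe output matches the label."""
    hits = 0
    for x, y in zip(feats, labels):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        hits += int(pred == y)
    return hits / len(labels)

# Synthetic stand-in for frozen policy activations (assumption, not real data).
rng = random.Random(0)
acts = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(600)]
# Hypothetical ground truth: the label is linearly encoded along h[0] - h[2].
labels = [1 if h[0] - h[2] > 0 else 0 for h in acts]

w, b = train_linear_probe(acts[:400], labels[:400])
held_out_acc = probe_accuracy(w, b, acts[400:], labels[400:])
print(f"held-out probe accuracy: {held_out_acc:.3f}")
```

A high held-out score only shows that the label is linearly decodable from the activations. Whether the policy actually uses that signal is a separate question, which is why the paper pairs probing with perturbations.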

If this is right

  • Closed-loop simulator scores alone do not confirm the presence of internal prediction or planning capabilities in driving policies.
  • Increasing dataset size or training duration does not guarantee stronger internal prediction and planning signals beyond improved heuristics.
  • Policies from both behavior cloning and reinforcement learning exhibit similar failures to generate timely surrounding-vehicle predictions in critical scenarios.
  • Correcting internal prediction errors produces measurable improvements in the safety of generated ego trajectories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Probing methods of this type could be extended to diagnose internal reasoning in other end-to-end learned control systems beyond driving.
  • Training procedures might incorporate explicit auxiliary objectives to encourage accurate internal predictions rather than relying solely on final trajectory performance.
  • The results suggest that scaling data and compute may reach a plateau for certain safety-critical capabilities unless predictive representations are directly addressed.

Load-bearing premise

Linear probing reliably identifies whether internal prediction and planning representations exist in the policies, and targeted perturbations can alter those signals without creating unrelated side effects.
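
One way this premise can fail is sketched below in Python, under stated assumptions: a toy label is fully determined by two hypothetical hidden units but encoded as their agreement (an XOR-style code), so a linear probe on the raw units reads chance while a probe given the product feature decodes it perfectly. The four activation patterns and the helpers `fit_logistic` and `accuracy` are invented for illustration and are not taken from the paper.

```python
import math

def fit_logistic(feats, labels, epochs=500, lr=0.5):
    """Batch gradient descent for a logistic-regression probe."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, feats, labels):
    """Share of patterns classified correctly with a zero-logit threshold."""
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
             for x in feats]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

# Two hypothetical hidden units taking values in {-1, +1}.
patterns = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, 0, 0]  # label is 1 exactly when the two units agree

linear_feats = [list(p) for p in patterns]
product_feats = [[a, b, a * b] for a, b in patterns]  # adds a non-linear feature

linear_acc = accuracy(*fit_logistic(linear_feats, labels), linear_feats, labels)
product_acc = accuracy(*fit_logistic(product_feats, labels), product_feats, labels)
print(f"linear probe: {linear_acc:.2f}, probe with product feature: {product_acc:.2f}")
```

On these four patterns the linear probe's gradient is zero at initialization, so it never leaves chance, and no linear boundary could classify more than three of the four patterns. The information is present in both cases; only its linear accessibility differs.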

What would settle it

An experiment that applies the same interventions to the probed prediction signals during near-collision events but measures no resulting change in ego planning outputs or trajectory safety metrics.

Figures

Figures reproduced from arXiv: 2606.31106 by Eugene Vinitsky, Hyeonchang Jeon, Kyungbeom Kim, Kyung-Joong Kim.

Figure 1: Recovery of planning with correction of prediction. The ego AV continues straight, while the other vehicle drives in parallel on the ego AV’s left side. Gradient arrows simplify the probing results and show the decoded future motion direction over time. (Original): After the 10-step prediction, the decoded ego plan curves away from its straight ground-truth trajectory, while the other vehicle’s future is a…
Figure 2: Power law relationships of BC and RL models: We evaluate the WOSAC, collision metrics (Off-Road and Veh-Coll), and goal progress rate. r is the correlation coefficient. To investigate the emergence of planning and prediction ability, we train the BC and RL models with three different random seeds while gradually increasing the dataset size, and evaluate them in the GPUDrive simulator [Kazemkhani et al., 20…
Figure 3: Performance metrics with surrounding-vehicle linear probing. Top: F1 Macro score difference between linear probes on trained model representations and the raw-input baseline (LP − Raw) across future steps fs ∈ {10, 20, 30, 40}, where fs denotes the prediction horizon in timesteps. Bottom: trajectory-type accuracy difference at fs = 10 for Straight, Turn, Reverse, and Uncategorized cases. Higher values are …
Figure 4: Correlation between linear-probe and future ego–other distance. Vertical: probability gap (LP – raw); horizontal: future ego–other distance. Orange: regression line; r: correlation. … difference in predicted probability for the true label between the LP and the raw-input probe, while the horizontal axis shows the future distance 10 steps ahead. Points are colored by relative distance change, current−future c…
Figure 5: Perturbed simulation results: We test the IL model by randomly removing the surrounding vehicles with a ratio p. We adjust the ratio by multiplying by 1/ratio for vehicle collision. If the driving model does not spuriously depend on the presence of surrounding vehicles, removing them should not degrade performance and may even improve goal-reaching performance by reducing interaction constraints. Moreover, st…
Figure 6: Near-collision Analysis. Each heatmap shows the normalized difference between the predicted probability of surrounding-vehicle probing in the pre-collision window w and that of each spatial grid over the full episode, computed as (w − a_grid)/a, at 10 to 40 steps before collision. The heatmaps use an ego-centric 8 × 8 grid covering [−50, 50] m along both axes, so each grid cell corresponds to a 12.5 m × 12…
Figure 7: Intervention experiment for adaptiveness and recovery (10 to 40 future timesteps): Examples from BC, RL, and IL models, and each pair of columns compares the original probing result with the probing result after intervention. (a) Adaptiveness: we perturb the surrounding-vehicle prediction so that it overlaps with the ego’s predicted path, and test whether the ego plan changes to avoid the induced conflict.…
Figure 8: Behavior Cloning model architecture: Overall architecture of the behavior cloning model. Ego vehicle, other vehicles, and road features are first embedded and fused via early-fusion attention. The fused representations are refined via self- and cross-attention modules and finally modeled with a Gaussian Mixture Model (GMM). In our behavior cloning framework, the model is conditioned on three types of input…
Figure 9: Distributions of 4 trajectory types on training and validation set. Straight refers to trajectories in which changes in ∆y and ∆yaw remain below predefined thresholds. Turn denotes the presence of a contiguous interval with notable variations in ∆y and ∆yaw. Reverse captures cases where ∆x is negative for at least half of the trajectory. Uncategorized encompasses all remaining trajectories that do not s…
Figure 10: Power law relationships for WOSAC metrics: We evaluate the realism meta score, kinematic score, interactive score, map-based score, and minADE for 1,000 scenes in the validation set. C.3 Additional results of Power-law Relationship [Plot: Goal Success Ratio and Offroad Rat… versus Number of Scenes]
Figure 12: Power law relationships for simulation results by cases (BC): Closed-loop simulation results for BC with data scaling. Missing points correspond to zero-valued metrics.
Figure 13: Full performance metrics with surrounding-linear probing (BC): Light colors are raw-input probing and darker colors are BC probing (earlier layer and late layer) with error bars of standard deviation across different seeds. For convenience, we refer to linear probing in the earlier layer of the BC model as early LP and in the later layer as late LP. The top row is the F1 score across future step {10, 20, 3…
Figure 14: Full performance metrics with surrounding-linear probing (RL): Light colors are raw-input probing and darker colors are RL probing with error bars of standard deviation across different seeds. For convenience, we refer to linear probing as LP. The top row is the F1 score across future steps {10, 20, 30, 40}, and the bottom row is accuracy with labeled cases (Future step = 10). Note that since there are no…
Figure 15: Type accuracy of other future steps (20, 30, 40) (Other) (BC): The row is type (Normal, Turn, Straight, and Reverse) and the column is future step (20, 30, 40).
Figure 17: Performance metrics with ego AV linear probing (BC): Light colors are raw-input probing and darker colors are BC probing (earlier layer and late layer) with error bars of standard deviation across different seeds. The top row is the F1 score across future steps {10, 20, 30, 40} and the bottom row is accuracy with labeled cases (Future step = 10). E.3 Type Accuracy of Future Steps The figure 19 summarizes …
Figure 18: Performance metrics with ego AV linear probing (RL): Light colors are raw-input probing and darker colors are RL probing (earlier layer and late layer) with error bars of standard deviation across different seeds. The top row is the F1 score across future steps {10, 20, 30, 40} and the bottom row is accuracy with labeled cases (Future step = 10). … significant for the Straight and Reverse cases, and perform…
Figure 19: Type accuracy of other future steps (20, 30, 40) (Ego) (BC): The row is type (Normal, Turn, Straight, and Reverse) and the column is future step (20, 30, 40).
Figure 21: Comparison of ego-linear probing in single AV driving and all vehicles driving with rendered examples: Visualization of ego-linear probing when only a single AV drives (Top) and when all vehicles, including the AV, drive (Bottom). The expert trajectory is shown as a dark green line. The ego AV is blue; it turns yellow when off-road and red on a vehicle collision. To better understand the degradation of single AV driving…
Figure 22: Near-collision Analysis. (Off-road) Each heatmap shows the normalized difference between the predicted probability in the pre-collision window w and that of each spatial grid over the full episode, computed as (w − a_grid)/a, at 10 to 40 steps before collision. The red dot marks the ego vehicle position. Dark gray cells indicate grids that were filtered out due to insufficient collision samples. Top: RL, Bott…
Figure 23: Failure cases of intervention: The visualization examples for failure cases of changing labels. (a)-(d): BC, (e)-(h): RL, (i)-(j): IL
Figure 24: Additional Examples for Adaptiveness cases: Visualization examples for adaptiveness cases. Change Lane: ego linear probing changes the lane to avoid collision. Slowing Speed: when we perturb the trajectory, the model slows down. (a)-(d): BC, (e)-(h): RL, (i)-(l): IL
Figure 25: Additional Examples for Recovery cases: The visualization examples for recovery cases. Correct Recovery: When we change the label to align with the correct expert trajectory of the surrounding vehicle. Incorrect Recovery: When we set the label to a random label. (a)-(b): BC, (c)-(d): RL, (e): IL
Original abstract

Large-scale datasets and fast simulators have enabled improvements in driving policies that appear safe and robust, yet strong performance in nominal scenarios can still mask flawed reasoning and unsafe heuristics. Summary scores from closed-loop simulators do not give significant insight into the policy, making it difficult to determine whether they truly predict the motion of surrounding vehicles, how the ego vehicle generates future plans, or whether they merely rely on brittle heuristics that happen to succeed in nominal scenarios. To better understand the limits and weaknesses of driving policies, we focus on probing for forms of prediction, i.e., where surrounding vehicles will move next, and planning, i.e., understanding how to generate safe trajectories. We focus on these two capabilities because they reflect behaviors expected of effective driving policies, and use their presence or absence to assess policy quality across data-driven behavior cloning and simulation-driven reinforcement learning policies. To evaluate the presence of these capabilities, we investigate them as a function of scale, asking whether the closed-loop gains from larger datasets and longer simulation training reflect stronger prediction and planning or merely better behavioral heuristics. We use linear probing and targeted perturbations in both imitation learning and reinforcement learning models to track when these internal signals emerge, plateau, or fail. Despite good closed-loop performance, policies often fail to form timely surrounding-vehicle predictions during near-collision events, revealing a limitation in the predictive signals available for ego planning. Finally, causal intervention shows that correcting mistaken predictions improves ego planning toward safer trajectories.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper investigates internal representations in autonomous driving policies (imitation learning and reinforcement learning) by applying linear probing to detect prediction of surrounding vehicle motion and planning of ego trajectories, along with targeted perturbations for causal analysis. It claims that despite strong closed-loop performance, policies frequently fail to form timely surrounding-vehicle predictions during near-collision events, limiting the predictive signals available for ego planning; causal correction of mistaken predictions is shown to improve planning toward safer trajectories. The study examines these capabilities as a function of scale (dataset size and training length) to distinguish genuine predictive/planning improvements from behavioral heuristics.

Significance. If the probing and intervention results hold under rigorous validation, the work provides a mechanistic explanation for policy failures in edge cases that aggregate simulator scores obscure, directly linking prediction errors to planning deficiencies. This could guide improvements in policy architectures or training objectives. The empirical approach using external interventions (rather than self-referential derivations) is a positive feature, but its value depends on the validity of linear probes for complex representations.

major comments (2)
  1. [Probing and intervention methodology (as described in abstract)] The claim that policies 'often fail to form timely surrounding-vehicle predictions during near-collision events' (abstract) rests on linear probes reliably indicating absence of the relevant internal signals. If prediction information is encoded non-linearly (common in deep policies), linear probes can report absence even when the signal exists and is used downstream; this directly undermines the central conclusion about limitations in predictive signals for ego planning.
  2. [Causal intervention experiments (as described in abstract)] The causal claim that 'correcting mistaken predictions improves ego planning toward safer trajectories' requires that targeted perturbations isolate the intended prediction signals without confounding artifacts or off-target effects on the network. The abstract supplies no details on model architectures, exact perturbation methods, statistical controls, or validation that interventions are specific and causal, which is load-bearing for the intervention results.
minor comments (1)
  1. [Abstract] The abstract would benefit from explicit definitions of key terms such as 'timely' predictions and 'near-collision events' to allow readers to assess the scope of the reported failures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments focus on the validity of linear probing and the specificity of causal interventions; both are central to our claims. We respond point by point below and indicate where revisions will be made.

Point-by-point responses
  1. Referee: [Probing and intervention methodology (as described in abstract)] The claim that policies 'often fail to form timely surrounding-vehicle predictions during near-collision events' (abstract) rests on linear probes reliably indicating absence of the relevant internal signals. If prediction information is encoded non-linearly (common in deep policies), linear probes can report absence even when the signal exists and is used downstream; this directly undermines the central conclusion about limitations in predictive signals for ego planning.

    Authors: We agree that linear probes can only detect linearly decodable information and therefore cannot rule out non-linear encodings. Our methodology follows the standard practice in mechanistic interpretability of using linear probes as a lower-bound test for the presence of accessible internal signals. The observed correlation between probe failure in near-collision regimes and subsequent planning deficiencies, together with the intervention results, supports the interpretation that predictive signals are limited. Nevertheless, the referee's point is valid: the manuscript language should not equate absence of a linear signal with complete absence of any representation. In revision we will (i) qualify all claims to refer specifically to linearly extractable prediction signals and (ii) add an explicit limitations paragraph discussing non-linear encodings and possible future non-linear probe experiments. revision: partial

  2. Referee: [Causal intervention experiments (as described in abstract)] The causal claim that 'correcting mistaken predictions improves ego planning toward safer trajectories' requires that targeted perturbations isolate the intended prediction signals without confounding artifacts or off-target effects on the network. The abstract supplies no details on model architectures, exact perturbation methods, statistical controls, or validation that interventions are specific and causal, which is load-bearing for the intervention results.

    Authors: The full manuscript contains the requested details: architectures for both imitation-learning and RL policies, the perturbation procedure (activation editing along probe-derived directions), control experiments that measure off-target effects, and statistical reporting of intervention outcomes. The abstract, however, is deliberately concise and therefore omits these elements. We will revise the abstract to include a short clause describing the intervention approach and will ensure the methods and results sections contain explicit validation of specificity (e.g., null interventions on unrelated directions). These changes will make the causal evidence more transparent without altering the experimental design. revision: yes
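
The specificity check named in the second response (editing along a probe-derived direction, with a null intervention on an unrelated direction) can be sketched as follows. This is a toy illustration in Python under stated assumptions, not the authors' procedure: the probe weights, the activation vector `h`, the candidate control vector, and the helper names `edit_activation` and `orthogonal_control` are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def edit_activation(h, direction, alpha):
    """Shift activation h by alpha along a unit-length direction."""
    return [hi + alpha * di for hi, di in zip(h, direction)]

def orthogonal_control(direction, candidate):
    """Project candidate off the probe direction to get a null-intervention axis."""
    coeff = dot(candidate, direction)
    return unit([c - coeff * d for c, d in zip(candidate, direction)])

# Hypothetical values; in practice these come from a trained probe and a real layer.
probe_dir = unit([0.8, -0.6, 0.0])
h = [0.2, 0.5, -0.1]
control_dir = orthogonal_control(probe_dir, [0.3, 0.1, 1.0])

alpha = 1.0
base_logit = dot(probe_dir, h)
edited_logit = dot(probe_dir, edit_activation(h, probe_dir, alpha))
control_logit = dot(probe_dir, edit_activation(h, control_dir, alpha))

print(f"probe-direction shift: {edited_logit - base_logit:+.3f}")
print(f"control-direction shift: {control_logit - base_logit:+.3f}")
```

The sketch only verifies that the two edits differ in their effect on the probe readout. In an actual experiment both edited activations would be passed through the remaining layers, and an intervention would count as specific only if the probe-direction edit changes the decoded ego plan while the same-magnitude control edit does not.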

Circularity Check

0 steps flagged

Empirical probing study with no derivation chain

full rationale

The paper is an empirical investigation that applies linear probing and targeted perturbations to existing imitation and RL driving policies, then measures outcomes via closed-loop simulation and causal interventions. No equations, fitted parameters, or self-referential definitions are introduced whose outputs are later presented as independent predictions. All load-bearing claims rest on external experimental measurements rather than any internal reduction to the paper's own inputs or prior self-citations. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not mention or rely on any free parameters, axioms, or invented entities; the work is an empirical investigation using standard machine-learning interpretability tools.



Reference graph

Works this paper leans on

58 extracted references · 8 canonical work pages · 4 internal anchors

  1. [1]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    nuScenes: A multimodal dataset for autonomous driving , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  2. [2]

    2023 IEEE international conference on robotics and automation (ICRA) , pages=

    Trafficgen: Learning to generate diverse and realistic traffic scenarios , author=. 2023 IEEE international conference on robotics and automation (ICRA) , pages=. 2023 , organization=

  3. [3]

    2021 , booktitle=

    Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting , author=. 2021 , booktitle=

  4. [4]

    Advances in Neural Information Processing Systems , volume=

    Model-based imitation learning for urban driving , author=. Advances in Neural Information Processing Systems , volume=

  5. [5]

    Reinforcement Learning Conference , year=

    Human-compatible driving agents through data-regularized self-play reinforcement learning , author=. Reinforcement Learning Conference , year=

  6. [6]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  7. [7]

    European Conference on Computer Vision , pages=

    Beyond the Data Imbalance: Employing the Heterogeneous Datasets for Vehicle Maneuver Prediction , author=. European Conference on Computer Vision , pages=. 2024 , organization=

  8. [8]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Argoverse: 3d tracking and forecasting with rich maps , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  9. [9]

    Advances in Neural Information Processing Systems , volume=

    Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research , author=. Advances in Neural Information Processing Systems , volume=

  10. [10]

    Advances in Neural Information Processing Systems , volume=

    Revisiting neural scaling laws in language and vision , author=. Advances in Neural Information Processing Systems , volume=

  11. [11]

    Scaling Laws for Autoregressive Generative Modeling

    Scaling laws for autoregressive generative modeling , author=. arXiv preprint arXiv:2010.14701 , year=

  12. [12]

    2025 , url=

    Saman Kazemkhani and Aarav Pandya and Daphne Cornelisse and Brennan Shacklett and Eugene Vinitsky , booktitle=. 2025 , url=

  13. [13]

    Advances in Neural Information Processing Systems , volume=

    Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world , author=. Advances in Neural Information Processing Systems , volume=

  14. [14]

    Advances in Neural Information Processing Systems , volume=

    Motion transformer with global intention localization and local movement refinement , author=. Advances in Neural Information Processing Systems , volume=

  15. [15]

    2023 IEEE International Conference on Robotics and Automation (ICRA) , pages=

    Wayformer: Motion forecasting via simple & efficient attention networks , author=. 2023 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2023 , organization=

  16. [16]

    Conference on Robot Learning , pages=

    MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction , author=. Conference on Robot Learning , pages=. 2020 , organization=

  17. [17]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    Motionlm: Multi-agent motion forecasting as language modeling , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  18. [18]

    European conference on computer vision , pages=

    Drivelm: Driving with graph visual question answering , author=. European conference on computer vision , pages=. 2024 , organization=

  19. [19]

    Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

    Drama: Joint risk localization and captioning in driving , author=. Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

  20. [20]

    2025 , booktitle=

    WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving , author=. 2025 , booktitle=

  21. [21]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Is ego status all you need for open-loop end-to-end autonomous driving? , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  22. [22]

    Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes

    Rethinking the open-loop evaluation of end-to-end autonomous driving in nuscenes , author=. arXiv preprint arXiv:2305.10430 , year=

  23. [23]

    2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=

    CausalAgents: A robustness benchmark for motion forecasting , author=. 2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2024 , organization=

  24. [24]

    The Thirteenth International Conference on Learning Representations , year=

    Interpreting Emergent Planning in Model-Free Reinforcement Learning , author=. The Thirteenth International Conference on Learning Representations , year=

  25. [25]

    International conference on machine learning , pages=

    An investigation of model-free planning , author=. International conference on machine learning , pages=. 2019 , organization=

  26. [26]

    International conference on machine learning , pages=

    Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav) , author=. International conference on machine learning , pages=. 2018 , organization=

  27. [27]

    Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: Human language technologies , pages=

    Linguistic regularities in continuous space word representations , author=. Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: Human language technologies , pages=

  28. [28]

    arXiv preprint arXiv:2412.02689 , year=

    Preliminary Investigation into Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving , author=. arXiv preprint arXiv:2412.02689 , year=

  29. [29]

    Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

    Data Scaling Laws for End-to-End Autonomous Driving , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

  30. [30]

    Baniodeh, K

    Scaling Laws of Motion Forecasting and Planning--A Technical Report , author=. arXiv preprint arXiv:2506.08228 , year=

  31. [31]

    Advances in neural information processing systems , volume=

    Visual autoregressive modeling: Scalable image generation via next-scale prediction , author=. Advances in neural information processing systems , volume=

  32. [32]

    2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=

    Roboagent: Generalization and efficiency in robot manipulation via semantic augmentations and action chunking , author=. 2024 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2024 , organization=

  33. [33]

    Understanding intermediate layers using linear classifier probes

    Understanding intermediate layers using linear classifier probes , author=. arXiv preprint arXiv:1610.01644 , year=

  34. [34]

    Advances in neural information processing systems , volume=

    Generative adversarial imitation learning , author=. Advances in neural information processing systems , volume=

  35. [35]

    Proceedings of the fourteenth international conference on artificial intelligence and statistics , pages=

    A reduction of imitation learning and structured prediction to no-regret online learning , author=. Proceedings of the fourteenth international conference on artificial intelligence and statistics , pages=. 2011 , organization=

  36. [36]

    IEEE Transactions on Robotics , volume=

    Interactive autonomous navigation with internal state inference and interactivity estimation , author=. IEEE Transactions on Robotics , volume=. 2024 , publisher=

  37. [37]

    2021 IEEE International Conference on Robotics and Automation (ICRA) , pages=

    Reinforcement learning for autonomous driving with latent state inference and spatial-temporal relationships , author=. 2021 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2021 , organization=

  38. [38]

    Scene Transformer: A unified architecture for predicting future trajectories of multiple agents , author=

  39. [39]

    European Conference on Computer Vision , pages=

    Reason2drive: Towards interpretable and chain-based reasoning for autonomous driving , author=. European Conference on Computer Vision , pages=. 2024 , organization=

  40. [40]

    2024 , booktitle=

    Trajeglish: Traffic Modeling as Next-Token Prediction , author=. 2024 , booktitle=

  41. [41]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    On exposing the challenging long tail in future prediction of traffic actors , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  42. [42]

    Shared cross-modal trajectory prediction for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  43. [43]

    Pedestrian and ego-vehicle trajectory prediction from monocular camera. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  44. [44]

    Peri, Neehar and Luiten, Jonathon and Li, Mengtian and O. Forecasting From LiDAR via Future Object Detection. 2022.

  45. [45]

    Chen, Dian and Krähenbühl, Philipp. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

  46. [46]

    ScePT: Scene-consistent, policy-based trajectory predictions for planning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  47. [47]

    Hu, Yihan and Yang, Jiazhi and Chen, Li and Li, Keyu and Sima, Chonghao and Zhu, Xizhou and Chai, Siqi and Du, Senyao and Lin, Tianwei and Wang, Wenhai and Lu, Lewei and Jia, Xiaosong and Liu, Qiang and Dai, Jifeng and Qiao, Yu and Li, Hongyang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2...

  48. [48]

    Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  49. [49]

    JFP: Joint future prediction with interactive multi-agent modeling for autonomous driving. Conference on Robot Learning, 2023.

  50. [50]

    LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation. arXiv preprint arXiv:2504.11521.

  51. [51]

    Action-based representation learning for autonomous driving. Conference on Robot Learning, 2021.

  52. [52]

    The Waymo Open Sim Agents Challenge. Advances in Neural Information Processing Systems.

  53. [53]

    Daphne Cornelisse and Spencer Cheng and Pragnay Mandavilli and Julian Hunt and Kevin Joseph and Wa. 2025.

  54. [54]

    Words in motion: Extracting interpretable control vectors for motion transformers. International Conference on Learning Representations.

  55. [55]

    Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread.

  56. [56]

    SMART: Scalable multi-agent real-time motion generation via next-token prediction. Advances in Neural Information Processing Systems.

  57. [57]

    Cornelisse, A

    Building reliable sim driving agents by scaling self-play. arXiv preprint arXiv:2502.14706.

  58. [58]

    Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.