Transformer-Based Autonomous Driving Models and Deployment-Oriented Compression: A Survey
Pith reviewed 2026-05-24 09:32 UTC · model grok-4.3
The pith
Compression strategies for Transformer autonomous driving models must be integrated into system design rather than applied afterward.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rather than treating compression as an isolated post-processing step, the survey highlights it as a system-level design consideration that directly affects deployability, robustness, and safety of Transformer-based autonomous driving models.
What carries the argument
Deployment-oriented perspective that examines how efficiency constraints reshape model design choices across task roles and sensing configurations.
If this is right
- Model architectures will be selected and modified with upfront awareness of which compression methods preserve performance on specific driving tasks.
- Safety and robustness testing will need to evaluate compressed versions on target hardware rather than full-precision models alone.
- Future system designs will prioritize efficient attention mechanisms and low-rank approximations during initial development.
- Evaluation benchmarks will incorporate metrics for latency, memory, and energy under realistic vehicle constraints.
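The last point, reporting latency alongside accuracy, can be illustrated with a small timing harness. This is a minimal sketch and not something taken from the survey: the names `measure_latency` and `dummy_model` are hypothetical, the workload is a stand-in for a real forward pass, and wall-clock timing on a development machine is only a rough proxy for measurements on target vehicle hardware. Memory and energy would need platform-specific tooling that is not shown here.

```python
import statistics
import time


def measure_latency(fn, warmup=3, runs=20):
    """Return latency statistics in milliseconds for a zero-argument callable."""
    for _ in range(warmup):  # warm-up calls are timed by nobody and discarded
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style 99th percentile; with few runs this is close to the maximum.
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
    }


def dummy_model():
    # Hypothetical stand-in for a forward pass; not a real driving model.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(measure_latency(dummy_model))
```

Reporting a tail percentile next to the mean is a deliberate choice in this sketch, since a real-time driving stack is usually constrained by worst-case rather than average latency.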
Where Pith is reading between the lines
- Hardware platforms for vehicles may need accelerators tuned specifically to the compressed attention patterns common in these models.
- The same system-level view could be tested on non-Transformer architectures to see if the deployability benefits hold more generally.
- Regulatory requirements for autonomous vehicles might eventually demand documented compression strategies as part of safety certification.
Load-bearing premise
The survey assumes that the representative models and compression strategies selected from the literature are sufficiently complete and unbiased to support general statements about task-dependent applicability and design trade-offs.
What would settle it
A systematic review that adds many previously omitted models and shows compression applicability patterns that contradict the surveyed task-dependent conclusions would falsify the general claims.
Original abstract
Transformer-based models are becoming a central paradigm in autonomous driving because they can capture long-range spatial dependencies, multi-agent interactions, and multimodal context across perception, prediction, and planning. At the same time, their deployment in real vehicles remains difficult because high-capacity attention-based architectures impose substantial latency, memory, and energy overhead. This survey reviews representative Transformer-based autonomous driving models and organizes them by task role, sensing configuration, and architectural design. More importantly, it examines these models from a deployment-oriented perspective and analyzes how efficiency constraints reshape model design choices in practice. We further review compression and acceleration strategies relevant to Transformer-based driving systems, including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, and discuss their benefits, limitations, and task-dependent applicability. Rather than treating compression as an isolated post-processing step, we highlight it as a system-level design consideration that directly affects deployability, robustness, and safety. Finally, we identify open challenges and future research directions toward standardized, safety-aware, and hardware-conscious evaluation of efficient autonomous driving systems.
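Of the compression strategies the abstract lists, quantization is the simplest to show in a few lines. The following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python, offered only as an illustration of the general technique. It is not the scheme used by any particular paper in the survey, and the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> integer codes in [-127, 127] plus a scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The largest magnitude maps to 127; an all-zero tensor falls back to scale 1.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Round-to-nearest keeps the per-value error within half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The small weight `0.003` collapses to code `0` in this example. This kind of information loss is what the survey's system-level argument is concerned with: whether it matters depends on the task and has to be checked on the compressed model itself.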
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This survey reviews Transformer-based models for autonomous driving, organizing them by task role (perception, prediction, planning), sensing configuration, and architectural design. It analyzes compression and acceleration techniques including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, with discussion of their benefits, limitations, and task-dependent applicability. The central thesis is that compression should be treated as a system-level design consideration affecting deployability, robustness, and safety rather than a post-processing step, and the paper concludes by identifying open challenges for standardized, safety-aware evaluation.
Significance. If the reviewed models and methods are representative, the survey would usefully synthesize an emerging intersection of Transformers and efficient AD systems, providing researchers with a deployment-oriented lens that connects architectural choices to real-vehicle constraints. The explicit framing of compression as integral to safety and robustness could influence future work on hardware-conscious AD pipelines.
major comments (2)
- [Introduction] The manuscript states that it reviews 'representative' Transformer-based AD models and compression strategies but contains no description of the literature search protocol, databases, keywords, inclusion/exclusion criteria, date range, or total paper count (Introduction and §2). This absence is load-bearing for the claims of task-dependent applicability and the system-level safety perspective, because omitted counterexamples (e.g., cases where compression degrades safety metrics) could invalidate the highlighted patterns.
- [Compression Strategies] §4 (compression review) asserts task-dependent trade-offs and limitations without citing a systematic selection process or quantitative meta-analysis of the reviewed works. The general statements on robustness and safety therefore rest on an unverified sample; a concrete test would be to report how many papers were screened versus included and whether any safety-critical negative results were excluded.
minor comments (2)
- Figure captions and table headers could more explicitly link back to the system-level design claim (e.g., by annotating which compression methods are shown to affect safety metrics).
- A small number of citations appear to be from preprints without noting their archival status; adding DOIs or arXiv identifiers would improve traceability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our survey. We agree that greater transparency regarding the literature selection process will strengthen the paper and support the claims of representativeness and task-dependent applicability. We address each major comment below.
- Referee: [Introduction] The manuscript states that it reviews 'representative' Transformer-based AD models and compression strategies but contains no description of the literature search protocol, databases, keywords, inclusion/exclusion criteria, date range, or total paper count (Introduction and §2). This absence is load-bearing for the claims of task-dependent applicability and the system-level safety perspective, because omitted counterexamples (e.g., cases where compression degrades safety metrics) could invalidate the highlighted patterns.
  Authors: We agree that the manuscript would benefit from an explicit description of the literature search process. Although the survey is intended as a representative rather than exhaustive systematic review, the lack of this information does limit assessment of scope and potential omissions. In the revised version we will add a dedicated subsection to §2 that specifies the databases searched, keywords and queries employed, inclusion/exclusion criteria, date range, and approximate counts of papers screened versus included. This addition will directly support the claims of representativeness and allow readers to evaluate the risk of omitted counterexamples.
  revision: yes
- Referee: [Compression Strategies] §4 (compression review) asserts task-dependent trade-offs and limitations without citing a systematic selection process or quantitative meta-analysis of the reviewed works. The general statements on robustness and safety therefore rest on an unverified sample; a concrete test would be to report how many papers were screened versus included and whether any safety-critical negative results were excluded.
  Authors: We agree that §4 would be strengthened by greater transparency on paper selection. While the review is narrative rather than a quantitative meta-analysis, we will revise the section to describe the selection criteria for the compression strategies and papers discussed, report screened versus included counts where records permit, and note any safety-critical negative results that were considered. These changes will provide clearer grounding for the statements on task-dependent trade-offs, robustness, and safety.
  revision: yes
Circularity Check
No circularity: literature survey with no derivations or predictions
The paper is a survey that reviews and organizes existing Transformer-based autonomous driving models and compression methods from the literature. It presents no equations, no fitted parameters, no predictions, and no derivation chain. The central claim is a perspective on treating compression as a system-level factor, supported by synthesis of reviewed works rather than any self-referential reduction. There is no load-bearing self-citation, ansatz smuggling, or renaming of results. The selection of representative models is acknowledged as a potential limitation in the reader's take, but that is a completeness issue, not circularity. This matches the default expectation for non-circular survey papers.