Transformer-Based Autonomous Driving Models and Deployment-Oriented Compression: A Survey

Juan Zhong; Xi Chen; Yuhang Shi; Zukang Xu

arXiv: 2304.10891 · v3 · pith:SKLXIQYDnew · submitted 2023-04-21 · cs.LG · cs.AI · cs.CV · cs.RO · cs.SY · eess.SY


Pith reviewed 2026-05-24 09:32 UTC · model grok-4.3

classification cs.LG cs.AI cs.CV cs.RO cs.SY eess.SY

keywords autonomous driving, transformer models, model compression, quantization, pruning, knowledge distillation, deployment, safety


The pith

Compression strategies for Transformer autonomous driving models must be integrated into system design rather than applied afterward.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews representative Transformer-based models for autonomous driving tasks including perception, prediction, and planning. It organizes the models by task role, sensing configuration, and architectural design while examining their computational demands. The central argument is that high-capacity attention architectures create latency, memory, and energy barriers to real-vehicle use, making compression techniques such as quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention necessary from the design stage. A sympathetic reader would care because treating compression as a system-level factor directly shapes deployability, robustness, and safety outcomes instead of leaving them as afterthoughts. The paper concludes by identifying open challenges for standardized, safety-aware, and hardware-conscious evaluation.
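To make the first technique in that list concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. It is an illustration only, not the survey's method: the function names `quantize_int8` and `dequantize` and the example weights are hypothetical, and real deployments typically use per-channel scales and calibration data.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.50, -1.27, 0.03, 0.89, -0.42]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The point of the sketch is the trade the survey describes: storage drops from 32 bits to 8 bits per weight, at the cost of a bounded rounding error whose effect on a given driving task still has to be measured.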

Core claim

Rather than treating compression as an isolated post-processing step, the survey highlights it as a system-level design consideration that directly affects deployability, robustness, and safety of Transformer-based autonomous driving models.

What carries the argument

Deployment-oriented perspective that examines how efficiency constraints reshape model design choices across task roles and sensing configurations.

If this is right

Model architectures will be selected and modified with upfront awareness of which compression methods preserve performance on specific driving tasks.
Safety and robustness testing will need to evaluate compressed versions on target hardware rather than full-precision models alone.
Future system designs will prioritize efficient attention mechanisms and low-rank approximations during initial development.
Evaluation benchmarks will incorporate metrics for latency, memory, and energy under realistic vehicle constraints.
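The evaluation implied by these points can be sketched as a small harness that reports latency percentiles for a full-precision model and a compressed one, together with how far their outputs drift apart. Everything here is a hypothetical stand-in: the two "models" are dot products, the compressed copy simply rounds weights to a 1/64 grid, and a real study would run the actual networks on the target hardware. Memory and energy need platform-specific tooling and are not shown.

```python
import statistics
import time

# Hypothetical stand-in weights; a real study would load the actual models.
WEIGHTS = [0.731, -0.212, 0.058, 0.944, -0.667]
# Crude "compressed" copy: every weight rounded to a 1/64 grid.
COMPRESSED_WEIGHTS = [round(w * 64) / 64 for w in WEIGHTS]


def full_model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


def compressed_model(x):
    return sum(w * v for w, v in zip(COMPRESSED_WEIGHTS, x))


def latency_ms(fn, x, warmup=3, runs=50):
    """Return median and tail latency of fn(x) in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn(x)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    tail_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {"p50_ms": statistics.median(samples), "p99_ms": samples[tail_index]}


def max_output_drift(inputs):
    """Largest absolute output difference between the two models."""
    return max(abs(full_model(x) - compressed_model(x)) for x in inputs)


inputs = [[1.0, 0.5, -0.25, 2.0, 1.5], [0.0, 1.0, 1.0, -1.0, 0.5]]
report = {
    "full": latency_ms(full_model, inputs[0]),
    "compressed": latency_ms(compressed_model, inputs[0]),
    "max_drift": max_output_drift(inputs),
}
print(report)
```

Reporting the tail latency alongside the median matters for driving workloads, where a rare slow frame can be more consequential than the average one.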

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hardware platforms for vehicles may need accelerators tuned specifically to the compressed attention patterns common in these models.
The same system-level view could be tested on non-Transformer architectures to see if the deployability benefits hold more generally.
Regulatory requirements for autonomous vehicles might eventually demand documented compression strategies as part of safety certification.

Load-bearing premise

The survey assumes that the representative models and compression strategies selected from the literature are sufficiently complete and unbiased to support general statements about task-dependent applicability and design trade-offs.

What would settle it

A systematic review that adds many previously omitted models and shows compression applicability patterns that contradict the surveyed task-dependent conclusions would falsify the general claims.

Figures

Figures reproduced from arXiv: 2304.10891 by Juan Zhong, Xi Chen, Yuhang Shi, Zukang Xu.

**Figure 2.** A timeline diagram illustrating the history and key milestones of attention mechanisms and Transformer architectures research.

**Figure 3.** The architecture of ViT, the left panel shows the …

**Figure 4.** Transformer inputs and outputs in different encoder and decoder structures: (a) Object query from 2D image features; (b) …

**Figure 5.** Layers in ResNet and Swin-Transformer: (a) The ResNet basic unit, known as the "bottleneck," comprises 1x1 and 3x3 …

**Figure 6.** A table lists primary operators for deploying an example Transformer model onto the portable hardware. Parameter Category: …

**Figure 7.** Transformer 4D Encoder Structure: BEVFormer encoder structure "encoder layer", the same as Swin-Transformer, BEVFormer …

Original abstract

Transformer-based models are becoming a central paradigm in autonomous driving because they can capture long-range spatial dependencies, multi-agent interactions, and multimodal context across perception, prediction, and planning. At the same time, their deployment in real vehicles remains difficult because high-capacity attention-based architectures impose substantial latency, memory, and energy overhead. This survey reviews representative Transformer-based autonomous driving models and organizes them by task role, sensing configuration, and architectural design. More importantly, it examines these models from a deployment-oriented perspective and analyzes how efficiency constraints reshape model design choices in practice. We further review compression and acceleration strategies relevant to Transformer-based driving systems, including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, and discuss their benefits, limitations, and task-dependent applicability. Rather than treating compression as an isolated post-processing step, we highlight it as a system-level design consideration that directly affects deployability, robustness, and safety. Finally, we identify open challenges and future research directions toward standardized, safety-aware, and hardware-conscious evaluation of efficient autonomous driving systems.
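The memory overhead attributed to attention in the abstract can be illustrated with back-of-the-envelope arithmetic. The sketch below compares the size of a fully materialized attention score matrix with a windowed variant in which each token attends to a fixed number of neighbours. The grid size, head count, window size, and function names are assumptions chosen for illustration, not figures from the survey, and fused attention kernels can avoid storing the full matrix at all.

```python
def attention_score_bytes(num_tokens, num_heads, bytes_per_value=2):
    """Bytes for a materialized N x N score matrix across all heads (fp16 by default)."""
    return num_heads * num_tokens * num_tokens * bytes_per_value


def windowed_score_bytes(num_tokens, num_heads, window, bytes_per_value=2):
    """Bytes when each token attends to at most `window` other tokens."""
    return num_heads * num_tokens * min(window, num_tokens) * bytes_per_value


tokens = 200 * 200  # hypothetical 200 x 200 bird's-eye-view grid
heads = 8           # hypothetical head count

full = attention_score_bytes(tokens, heads)
local = windowed_score_bytes(tokens, heads, window=49)  # e.g. a 7 x 7 window

print(f"full attention:     {full / 1e9:.1f} GB")   # 25.6 GB
print(f"windowed attention: {local / 1e6:.2f} MB")  # 31.36 MB
print(f"reduction factor:   {full / local:.0f}x")   # 816x
```

Under these assumptions the quadratic term dominates: the full score matrix grows with the square of the token count, while the windowed variant grows linearly, which is why efficient attention appears next to quantization and pruning in the list of deployment techniques.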

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This survey reviews Transformer-based models for autonomous driving, organizing them by task role (perception, prediction, planning), sensing configuration, and architectural design. It analyzes compression and acceleration techniques including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, with discussion of their benefits, limitations, and task-dependent applicability. The central thesis is that compression should be treated as a system-level design consideration affecting deployability, robustness, and safety rather than a post-processing step, and the paper concludes by identifying open challenges for standardized, safety-aware evaluation.

Significance. If the reviewed models and methods are representative, the survey would usefully synthesize an emerging intersection of Transformers and efficient AD systems, providing researchers with a deployment-oriented lens that connects architectural choices to real-vehicle constraints. The explicit framing of compression as integral to safety and robustness could influence future work on hardware-conscious AD pipelines.

major comments (2)

[Introduction] The manuscript states that it reviews 'representative' Transformer-based AD models and compression strategies but contains no description of the literature search protocol, databases, keywords, inclusion/exclusion criteria, date range, or total paper count (Introduction and §2). This absence is load-bearing for the claims of task-dependent applicability and the system-level safety perspective, because omitted counterexamples (e.g., cases where compression degrades safety metrics) could invalidate the highlighted patterns.
[Compression Strategies] §4 (compression review) asserts task-dependent trade-offs and limitations without citing a systematic selection process or quantitative meta-analysis of the reviewed works. The general statements on robustness and safety therefore rest on an unverified sample; a concrete test would be to report how many papers were screened versus included and whether any safety-critical negative results were excluded.

minor comments (2)

Figure captions and table headers could more explicitly link back to the system-level design claim (e.g., by annotating which compression methods are shown to affect safety metrics).
A small number of citations appear to be from preprints without noting their archival status; adding DOIs or arXiv identifiers would improve traceability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our survey. We agree that greater transparency regarding the literature selection process will strengthen the paper and support the claims of representativeness and task-dependent applicability. We address each major comment below.

Point-by-point responses

Referee: [Introduction] The manuscript states that it reviews 'representative' Transformer-based AD models and compression strategies but contains no description of the literature search protocol, databases, keywords, inclusion/exclusion criteria, date range, or total paper count (Introduction and §2). This absence is load-bearing for the claims of task-dependent applicability and the system-level safety perspective, because omitted counterexamples (e.g., cases where compression degrades safety metrics) could invalidate the highlighted patterns.

Authors: We agree that the manuscript would benefit from an explicit description of the literature search process. Although the survey is intended as a representative rather than exhaustive systematic review, the lack of this information does limit assessment of scope and potential omissions. In the revised version we will add a dedicated subsection to §2 that specifies the databases searched, keywords and queries employed, inclusion/exclusion criteria, date range, and approximate counts of papers screened versus included. This addition will directly support the claims of representativeness and allow readers to evaluate the risk of omitted counterexamples.

revision: yes

Referee: [Compression Strategies] §4 (compression review) asserts task-dependent trade-offs and limitations without citing a systematic selection process or quantitative meta-analysis of the reviewed works. The general statements on robustness and safety therefore rest on an unverified sample; a concrete test would be to report how many papers were screened versus included and whether any safety-critical negative results were excluded.

Authors: We agree that §4 would be strengthened by greater transparency on paper selection. While the review is narrative rather than a quantitative meta-analysis, we will revise the section to describe the selection criteria for the compression strategies and papers discussed, report screened versus included counts where records permit, and note any safety-critical negative results that were considered. These changes will provide clearer grounding for the statements on task-dependent trade-offs, robustness, and safety.

revision: yes

Circularity Check

0 steps flagged

No circularity: literature survey with no derivations or predictions

Full rationale

The paper is a survey that reviews and organizes existing Transformer-based autonomous driving models and compression methods from the literature. It presents no equations, no fitted parameters, no predictions, and no derivation chain. The central claim is a perspective on treating compression as a system-level factor, supported by a synthesis of reviewed works rather than by any self-referential reduction. There is no load-bearing self-citation, ansatz smuggling, or renaming of results. The selection of representative models is acknowledged as a potential limitation in the reader's take, but that is a completeness issue, not circularity. This matches the default expectation for non-circular survey papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey with no new mathematical derivations, fitted parameters, or postulated entities.

