pith. machine review for the scientific record

arxiv: 2605.08911 · v1 · submitted 2026-05-09 · 💻 cs.CV

Recognition: 2 Lean theorem links

Unified Modeling of Lane and Lane Topology for Driving Scene Reasoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:58 UTC · model grok-4.3

classification 💻 cs.CV
keywords lane detection · lane topology · driving scene reasoning · unified modeling · autonomous vehicles · OpenLane-V2

The pith

Modeling lanes and topology as connected predecessor and successor lanes enables direct perception of both from raw images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that treats lane topology as direct connections to predecessor and successor lanes rather than deriving it after separate detection. This unified representation is processed together with lane positions inside one perception pipeline that starts from the original image features, whereas existing methods follow a detect-then-reason sequence that can accumulate errors between the two stages. The authors evaluate the approach on the OpenLane-V2 benchmark, built from Argoverse2 and nuScenes data, and report higher topology scores than prior separate-stage methods.

Core claim

We propose an innovative method called UniTopo, which represents the topological relationships between lanes as connected lanes, encompassing predecessor lanes, successor lanes, and their interconnections. This unified representation of lanes and lane topology allows us to simultaneously obtain both the positions and topological information of lanes within a shared perception pipeline, establishing a new paradigm for directly perceiving lane topology from original image features.

What carries the argument

Unified representation of lanes and their topology as predecessor and successor connections processed inside a single perception pipeline starting from raw image features.
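
Read concretely, the design the review describes (two groups of queries for piecewise lanes and connected lanes, a shared decoder over BEV features, and a shared head, per the passage quoted in the Lean-theorem section below) could look like the sketch that follows. This is a minimal sketch under those assumptions, not the authors' code: the class name, dimensions, and the endpoint-matching comment are invented.

```python
import torch
import torch.nn as nn

class UnifiedLaneTopoHead(nn.Module):
    """Two query groups, one decoder, one head: geometry and topology together."""

    def __init__(self, num_lanes=200, num_points=11, dim=256, heads=8, layers=6):
        super().__init__()
        # Query group 1: piecewise lanes. Query group 2: connected lanes
        # (predecessor/successor segments), sharing the same embedding space.
        self.lane_queries = nn.Embedding(num_lanes, dim)
        self.conn_queries = nn.Embedding(num_lanes, dim)
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.shared_decoder = nn.TransformerDecoder(layer, layers)
        # One shared head regresses ordered 3D points for every query.
        self.point_head = nn.Linear(dim, num_points * 3)

    def forward(self, bev_feats):
        # bev_feats: (B, H*W, dim) flattened bird's-eye-view features.
        B = bev_feats.size(0)
        queries = torch.cat(
            [self.lane_queries.weight, self.conn_queries.weight], dim=0
        ).unsqueeze(0).expand(B, -1, -1)
        feats = self.shared_decoder(queries, bev_feats)
        lane_feats, conn_feats = feats.chunk(2, dim=1)
        lane_points = self.point_head(lane_feats)   # lane positions
        conn_points = self.point_head(conn_feats)   # connected-lane geometry
        # Topology then follows directly: a connected-lane prediction whose
        # endpoints coincide with two piecewise lanes links them as
        # predecessor/successor, with no post-hoc reasoning stage.
        return lane_points, conn_points
```

The point the sketch makes is structural: both outputs come from the same decoder pass over the same image-derived features, so there is no detection stage for a topology stage to inherit errors from.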

If this is right

  • Lane positions and topology are obtained simultaneously without post-processing from detections.
  • The method reports TOP_ll scores of 30.1 percent and 31.8 percent on the two OpenLane-V2 subsets.
  • These scores exceed the prior best method by 6.0 percent and 8.6 percent respectively.
  • A direct-perception paradigm replaces the previous reasoning-by-detection workflow.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Error accumulation between detection and topology stages may be reduced because both are learned jointly.
  • Similar unification of detection and relational reasoning could apply to other scene elements such as traffic lights.
  • End-to-end driving stacks might absorb this single-stage lane module without modular hand-offs.

Load-bearing premise

Representing lane topology through predecessor and successor connections is enough to capture the needed relationships without separate detection steps or major loss of information.

What would settle it

On the OpenLane-V2 test sets, a pipeline that first detects lanes and then computes topology separately would need to match or exceed the reported TOP_ll scores of 30.1 percent and 31.8 percent for the unified model to lose its claimed advantage.
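
For intuition about the shape of that comparison: TOP_ll is an mAP-style score over lane-graph edges. The sketch below is a deliberately simplified edge-level scorer that assumes lane instances have already been matched one-to-one to ground truth; it is an illustrative stand-in, not the OpenLane-V2 metric.

```python
import numpy as np

def edge_f1(pred_adj: np.ndarray, gt_adj: np.ndarray, thresh: float = 0.5) -> float:
    """pred_adj: (N, N) predicted connection probabilities between matched lanes.
    gt_adj: (N, N) binary ground-truth predecessor/successor adjacency."""
    pred = pred_adj > thresh
    gt = gt_adj.astype(bool)
    tp = float(np.logical_and(pred, gt).sum())
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)
```

Under this simplified view, a detect-then-reason baseline and the unified model are scored on exactly the same predicted adjacency, so any advantage must come from where the adjacency is produced, not how it is measured.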

Figures

Figures reproduced from arXiv: 2605.08911 by Beipeng Mu, Bo Liu, Han Li, Si Liu, Yuhang Wang, Yulu Gao.

Figure 1. Motivation of UniTopo. Unlike the topological relationships between lanes and traffic elements, the connections among piecewise lanes can be observed in images by identifying where the lanes connect. These areas are highlighted by the green rectangle (e.g., regions containing junction points near stop lines), and all topological relations are explicitly visualized using dashed arrows. …
Figure 2. (a) Previous methods follow a reasoning-by-detection paradigm, where …
Figure 3. The overall architecture of our UniTopo. Within the image feature extractor, the multi-view images are processed through a backbone network (e.g., ResNet-50 [54]) and a neck network (e.g., FPN [55]) to extract features. These image features are then transformed into BEV features via a BEV encoder following BEVFormer [56], where the BEV representation serves as an intermediate feature rather than an input m…
Figure 4. Illustration of the Topology-Aware Attention Module. First, the correlation between piecewise lanes and connected lanes is measured based on geometric distance. Then, utilizing a cross-attention mechanism, the lane topology information contained in the connection queries is transferred to the lane features. (A hedged sketch of this mechanism follows the figure list.)
Figure 5. Qualitative results across different scenarios on OpenLane-V2 subset A. Compared to TopoNet [7], our UniTopo improves lane-to-lane topology accuracy. …
Figure 6. Qualitative results of long-tail scenarios on OpenLane-V2 subset B. Our UniTopo accurately detects the (a) left-side curved lane and (b) roundabout, and predicts their topological relationships with other lanes.
Figure 7. Failure case on OpenLane-V2 subset B. Due to nighttime scenes with low visibility, our method fails to detect distant lanes ahead, along with the corresponding topological relationships.
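
The Topology-Aware Attention step described under Figure 4 can be sketched as follows. This is a hypothetical implementation, assuming an additive attention bias derived from endpoint distance; the module and argument names are invented for illustration, and only the two steps in the caption (geometric-distance correlation, then cross-attention from connection queries into lane features) come from the paper.

```python
import torch
import torch.nn as nn

class TopologyAwareAttention(nn.Module):
    """Writes connected-lane (topology) information into piecewise-lane features."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, lane_feats, conn_feats, lane_ends, conn_starts):
        # lane_ends: (B, N_lane, 3) endpoints of piecewise lanes.
        # conn_starts: (B, N_conn, 3) start points of connected lanes.
        # Step 1: geometric correlation -- lanes should attend to connections
        # that begin near where the lane ends.
        dist = torch.cdist(lane_ends, conn_starts)      # (B, N_lane, N_conn)
        bias = -dist                                    # closer => higher score
        heads = self.cross_attn.num_heads
        mask = bias.repeat_interleave(heads, dim=0)     # (B*heads, N_lane, N_conn)
        # Step 2: cross-attention transfers topology info from connection
        # queries (keys/values) into lane features (queries).
        out, _ = self.cross_attn(lane_feats, conn_feats, conn_feats, attn_mask=mask)
        return lane_feats + out                         # residual update
```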
Original abstract

Autonomous vehicles need to perceive not only physical elements in the driving scene, such as lane lines and traffic lights, but also logical elements like lane centerlines and their topology. Existing lane topology reasoning methods typically follow a reasoning-by-detection paradigm, where lane topological relationships are primarily derived from lane detection results. In this paper, we propose an innovative method called Unified Modeling of Lane and Lane Topology (UniTopo), which represents the topological relationships between lanes as connected lanes, encompassing predecessor lanes, successor lanes, and their interconnections. This unified representation of lanes and lane topology allows us to simultaneously obtain both the positions and topological information of lanes within a shared perception pipeline, establishing a new paradigm for directly perceiving lane topology from original image features. We validate our method on the driving scene reasoning benchmark OpenLane-V2, which consists of two subsets, built based on Argoverse2 and nuScenes, respectively. Our method achieves TOP_ll of 30.1% and 31.8% on the two subsets, significantly surpassing the existing state-of-the-art method T^2SG by 6.0% and 8.6%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes UniTopo, a unified modeling method for lanes and lane topology in driving scenes. It represents topological relationships as connected predecessor/successor lanes (and their interconnections) to enable simultaneous perception of lane geometry and topology directly from raw image features within a single pipeline, departing from the conventional reasoning-by-detection paradigm. On the OpenLane-V2 benchmark (Argoverse2 and nuScenes subsets), the method reports TOP_ll scores of 30.1% and 31.8%, outperforming prior SOTA T^2SG by 6.0% and 8.6%.

Significance. If the unified representation truly supports direct topology perception without implicit detection stages or feature bottlenecks, the work could shift the field toward more integrated perception pipelines for autonomous driving, with potential gains in efficiency and reduced error propagation. The reported benchmark improvements are concrete and would be a meaningful advance if backed by full architectural details, loss formulations, and ablations.

major comments (1)
  1. Abstract: The load-bearing claim that representing topology as predecessor/successor connections 'allows us to simultaneously obtain both the positions and topological information of lanes within a shared perception pipeline' and establishes 'a new paradigm for directly perceiving lane topology from original image features' requires explicit confirmation that no separate lane detection head or intermediate instance representation is used. Without equations, network diagrams, or loss terms showing how connections are regressed/classified from image features alone, it remains unclear whether the joint objective avoids trading geometric accuracy for topological accuracy or reintroduces implicit detection.
minor comments (1)
  1. The abstract references the TOP_ll metric and OpenLane-V2 subsets but provides no definition of the metric, no comparison table, and no mention of other standard metrics (e.g., lane detection mAP or topology-specific scores) used in the benchmark.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed review and constructive comments. We address the major comment below and clarify the unified end-to-end design of UniTopo while revising the manuscript for greater explicitness.

Point-by-point responses
  1. Referee: Abstract: The load-bearing claim that representing topology as predecessor/successor connections 'allows us to simultaneously obtain both the positions and topological information of lanes within a shared perception pipeline' and establishes 'a new paradigm for directly perceiving lane topology from original image features' requires explicit confirmation that no separate lane detection head or intermediate instance representation is used. Without equations, network diagrams, or loss terms showing how connections are regressed/classified from image features alone, it remains unclear whether the joint objective avoids trading geometric accuracy for topological accuracy or reintroduces implicit detection.

    Authors: We thank the referee for this observation. UniTopo is designed as a single end-to-end network: a shared image backbone extracts features that feed directly into a unified prediction head. This head simultaneously regresses lane geometry (as ordered points) and classifies predecessor/successor connections between lane instances, with no separate detection head, no post-processing instance grouping, and no intermediate lane representations. Topology is not derived after detection but is an explicit output of the same feature embeddings via a connectivity classification branch. The full architecture (including the network diagram in Figure 3), the joint loss (geometric regression plus topology cross-entropy, Equation 4), and the direct regression of connections from image features are detailed in Section 3. Ablation studies confirm that joint optimization improves rather than trades off geometric and topological accuracy. To make the abstract claim self-contained, we have revised it to explicitly state the absence of a separate detection stage and added a short clarifying sentence referencing the unified head. revision: yes
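
A minimal sketch of the joint objective this response describes, assuming an L1 geometric term and a binary cross-entropy topology term; the paper's Equation 4 and its loss weights are not reproduced in the material above, so both are placeholders here.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_points, gt_points, conn_logits, gt_conn,
               w_geo: float = 1.0, w_topo: float = 1.0) -> torch.Tensor:
    # Geometric term: regression over ordered lane points.
    geo = F.l1_loss(pred_points, gt_points)
    # Topology term: cross-entropy over pairwise connection logits.
    topo = F.binary_cross_entropy_with_logits(conn_logits, gt_conn.float())
    # One objective: geometry and topology share gradients through the same
    # feature embeddings, which is what "joint optimization" claims to exploit.
    return w_geo * geo + w_topo * topo
```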

Circularity Check

0 steps flagged

No significant circularity; empirical results on external benchmarks

Full rationale

The paper introduces a unified representation of lanes and topology as predecessor/successor connections to enable direct perception from image features in one pipeline, contrasting it with prior reasoning-by-detection approaches. Validation consists of performance numbers (TOP_ll 30.1% and 31.8%) on the public OpenLane-V2 benchmark subsets, with direct numerical comparison to an external SOTA method T^2SG. No equations, loss terms, fitted parameters, or self-citations appear in the provided text that would reduce the claimed improvements or the new paradigm to a tautology or input fit by construction. The derivation chain therefore remains open to external falsification via benchmark results rather than closing on its own definitions or prior author work.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard computer-vision assumptions about feature learning plus the domain-specific choice to encode topology via predecessor/successor connections; no new physical entities or ad-hoc constants are introduced beyond typical neural-network training.

free parameters (1)
  • neural network weights and hyperparameters
    Learned parameters of the perception model trained on driving data; not enumerated in the abstract.
axioms (2)
  • domain assumption: Lane topology can be faithfully represented as a graph of predecessor and successor connections.
    Invoked when defining the unified representation that enables direct perception.
  • domain assumption: Image features contain sufficient information to infer both geometry and topology jointly.
    Underlying the shift from reasoning-by-detection to unified modeling.
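
To make the first axiom concrete: once direct predecessor/successor pairs are given, any multi-hop relation between lanes follows by traversal, so a connection-level representation determines the full lane graph. A tiny sketch with illustrative names, not code from the paper:

```python
def reachable(successors: dict[int, list[int]], start: int) -> set[int]:
    """successors maps each lane id to its directly connected successor lane ids."""
    seen, stack = set(), [start]
    while stack:
        lane = stack.pop()
        for nxt in successors.get(lane, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Lane 0 splits into lanes 1 and 2; lane 1 continues into lane 3.
print(reachable({0: [1, 2], 1: [3]}, 0))  # {1, 2, 3}
```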

pith-pipeline@v0.9.0 · 5507 in / 1281 out tokens · 25374 ms · 2026-05-12T01:58:51.684404+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem:

    UniTopo defines two groups of queries for piecewise lanes and connected lanes, uses a shared lane decoder to interact with BEV features, and employs a shared lane head to obtain lane positions and the lane-to-lane topology relationships. In addition, we design a Topology-Aware Attention Module (TAM) to incorporate lane connection information into the features of piecewise lanes.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem:

    We propose a method for unified modeling of lane and lane topology that concurrently perceives lanes and their topological structures, establishing a new paradigm distinct from the reasoning-by-detection approach.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 4 internal anchors

  1. [1]

    nuScenes: A multimodal dataset for autonomous driving

    H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11621–11631

  2. [2]

    Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

    B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes et al., “Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting,” arXiv preprint arXiv:2301.00493, 2023

  3. [3]

    Scalability in Perception for Autonomous Driving: Waymo Open Dataset

    P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., “Scalability in Perception for Autonomous Driving: Waymo Open Dataset,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2446–2454

  4. [4]

    Planning-oriented Autonomous Driving

    Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang et al., “Planning-oriented Autonomous Driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17853–17862

  5. [5]

    VAD: Vectorized Scene Representation for Efficient Autonomous Driving

    B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, Q. Zhang, W. Liu, C. Huang, and X. Wang, “VAD: Vectorized Scene Representation for Efficient Autonomous Driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8340–8350

  6. [6]

    OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

    H. Wang, T. Li, Y. Li, L. Chen, C. Sima, Z. Liu, B. Wang, P. Jia, Y. Wang, S. Jiang et al., “OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping,” Advances in Neural Information Processing Systems, vol. 36, pp. 18873–18884, 2024

  7. [7]

    Graph-based Topology Reasoning for Driving Scenes

    T. Li, L. Chen, X. Geng, H. Wang, Y. Li, Z. Liu, S. Jiang, Y. Wang, H. Xu, C. Xu et al., “Graph-based Topology Reasoning for Driving Scenes,” arXiv preprint arXiv:2304.05277, 2023

  8. [8]

    TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning

    D. Wu, J. Chang, F. Jia, Y. Liu, T. Wang, and J. Shen, “TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning,” arXiv preprint arXiv:2310.06753, 2023

  9. [9]

    Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors

    H. Li, Z. Huang, Z. Wang, W. Rong, N. Wang, and S. Liu, “Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors,” arXiv preprint arXiv:2406.03105, 2024

  10. [10]

    TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes

    Y. Fu, W. Liao, X. Liu, Y. Ma, F. Dai, Y. Zhang et al., “TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes,” arXiv preprint arXiv:2405.14747, 2024

  11. [11]

    RoadPainter: Points Are Ideal Navigators for Topology TransformER

    Z. Ma, S. Liang, Y. Wen, W. Lu, and G. Wan, “RoadPainter: Points Are Ideal Navigators for Topology TransformER,” in European Conference on Computer Vision, 2024, pp. 179–195

  12. [12]

    Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer

    F. Rong, W. Peng, M. Lan, Q. Zhang, and L. Zhang, “Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer,” in Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 10075–10084

  13. [13]

    T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving

    C. Lv, M. Qi, L. Liu, and H. Ma, “T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 17197–17206

  14. [14]

    Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps

    K. Z. Luo, X. Weng, Y. Wang, S. Wu, J. Li, K. Q. Weinberger, Y. Wang, and M. Pavone, “Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps,” in International Conference on Robotics and Automation, 2024, pp. 4029–4035

  15. [15]

    LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

    T. Li, P. Jia, B. Wang, L. Chen, K. Jiang, J. Yan, and H. Li, “LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving,” arXiv preprint arXiv:2312.16108, 2023

  16. [16]

    Semi-Supervised Classification with Graph Convolutional Networks

    T. N. Kipf and M. Welling, “Semi-Supervised Classification with Graph Convolutional Networks,” arXiv preprint arXiv:1609.02907, 2016

  17. [17]

    Deformable DETR: Deformable Transformers for End-to-End Object Detection

    X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable Transformers for End-to-End Object Detection,” arXiv preprint arXiv:2010.04159, 2020

  18. [18]

    Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit

    X. Li, J. Li, X. Hu, and J. Yang, “Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 1, pp. 248–258, 2019

  19. [19]

    Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

    L. Tabelini, R. Berriel, T. M. Paixao, C. Badue, A. F. De Souza, and T. Oliveira-Santos, “Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 294–302

  20. [20]

    CLRNet: Cross Layer Refinement Network for Lane Detection

    T. Zheng, Y. Huang, Y. Liu, W. Tang, Z. Yang, D. Cai, and X. He, “CLRNet: Cross Layer Refinement Network for Lane Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 898–907

  21. [21]

    CLRNetV2: A Faster and Stronger Lane Detector

    T. Zheng, Y. Huang, Y. Liu, B. Lin, Z. Yang, D. Cai, and X. He, “CLRNetV2: A Faster and Stronger Lane Detector,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 6, pp. 4271–4284, 2025

  22. [22]

    Dense Hybrid Proposal Modulation for Lane Detection

    Y. Wu, L. Zhao, J. Lu, and H. Yan, “Dense Hybrid Proposal Modulation for Lane Detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 11, pp. 6845–6859, 2023

  23. [23]

    SMFRNet: Complex Scene Lane Detection With Start Point-Guided Multi-Dimensional Feature Refinement

    S. Tan, Y. Zhang, and S. Zhu, “SMFRNet: Complex Scene Lane Detection With Start Point-Guided Multi-Dimensional Feature Refinement,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 12, pp. 13364–13372, 2024

  24. [24]

    VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection

    Y. Zhang, L. Zhu, W. Feng, H. Fu, M. Wang, Q. Li, C. Li, and S. Wang, “VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15681–15690

  25. [25]

    Recursive Video Lane Detection

    D. Jin, D. Kim, and C.-S. Kim, “Recursive Video Lane Detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8473–8482

  26. [26]

    STADet: Streaming Timing-Aware Video Lane Detection

    K. He, J. Xie, X. Dai, K. Chang, F. Chen, and Z. Wang, “STADet: Streaming Timing-Aware Video Lane Detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 9, pp. 8644–8656, 2024

  27. [27]

    LaneTCA: Enhancing Video Lane Detection With Temporal Context Aggregation

    K. Zhou, L. Li, W. Zhou, Y. Wang, H. Feng, and H. Li, “LaneTCA: Enhancing Video Lane Detection With Temporal Context Aggregation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 8574–8585, 2025

  28. [28]

    3D-LaneNet: End-to-End 3D Multiple Lane Detection

    N. Garnett, R. Cohen, T. Pe’er, R. Lahav, and D. Levi, “3D-LaneNet: End-to-End 3D Multiple Lane Detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 2921–2930

  29. [29]

    Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection

    Y. Guo, G. Chen, P. Zhao, W. Zhang, J. Miao, J. Wang, and T. E. Choe, “Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection,” in European Conference on Computer Vision, 2020, pp. 666–681

  30. [30]

    3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation

    N. Efrat, M. Bluvstein, S. Oron, D. Levi, N. Garnett, and B. E. Shlomo, “3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation,” arXiv preprint arXiv:2011.01535, 2020

  31. [31]

    Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints

    R. Liu, D. Chen, T. Liu, Z. Xiong, and Z. Yuan, “Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2022, pp. 1765–1772

  32. [32]

    PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

    L. Chen, C. Sima, Y. Li, Z. Zheng, J. Xu, X. Geng, H. Li, C. He, J. Shi, Y. Qiao et al., “PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark,” in European Conference on Computer Vision, 2022, pp. 550–567

  33. [33]

    Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection

    S. Huang, Z. Shen, Z. Huang, Z.-h. Ding, J. Dai, J. Han, N. Wang, and S. Liu, “Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17451–17460

  34. [34]

    Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression

    S. Huang, Z. Shen, Z. Huang, Y. Liao, J. Han, N. Wang, and S. Liu, “Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 3, pp. 1660–1673, 2025

  35. [35]

    LATR: 3D Lane Detection from Monocular Images with Transformer

    Y. Luo, C. Zheng, X. Yan, T. Kun, C. Zheng, S. Cui, and Z. Li, “LATR: 3D Lane Detection from Monocular Images with Transformer,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 7941–7952

  36. [36]

    Cross-view Semantic Segmentation for Sensing Surroundings

    B. Pan, J. Sun, H. Y. T. Leung, A. Andonian, and B. Zhou, “Cross-view Semantic Segmentation for Sensing Surroundings,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4867–4873, 2020

  37. [37]

    Cross-View Transformers for Real-Time Map-View Semantic Segmentation

    B. Zhou and P. Krähenbühl, “Cross-View Transformers for Real-Time Map-View Semantic Segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13760–13769

  38. [38]

    Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer

    S. Chen, T. Cheng, X. Wang, W. Meng, Q. Zhang, and W. Liu, “Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer,” arXiv preprint arXiv:2206.04584, 2022

  39. [39]

    HDMapNet: An Online HD Map Construction and Evaluation Framework

    Q. Li, Y. Wang, Y. Wang, and H. Zhao, “HDMapNet: An Online HD Map Construction and Evaluation Framework,” in International Conference on Robotics and Automation, 2022, pp. 4628–4634

  40. [40]

    VectorMapNet: End-to-end Vectorized HD Map Learning

    Y. Liu, T. Yuan, Y. Wang, Y. Wang, and H. Zhao, “VectorMapNet: End-to-end Vectorized HD Map Learning,” in International Conference on Machine Learning, 2023, pp. 22352–22369

  41. [41]

    MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

    B. Liao, S. Chen, X. Wang, T. Cheng, Q. Zhang, W. Liu, and C. Huang, “MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction,” arXiv preprint arXiv:2208.14437, 2022

  42. [42]

    MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction

    B. Liao, S. Chen, Y. Zhang, B. Jiang, Q. Zhang, W. Liu, C. Huang, and X. Wang, “MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction,” arXiv preprint arXiv:2308.05736, 2023

  43. [43]

    Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

    Z. Liu, X. Zhang, G. Liu, J. Zhao, and N. Xu, “Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction,” in European Conference on Computer Vision, 2025, pp. 461–477

  44. [44]

    StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction

    T. Yuan, Y. Liu, Y. Wang, Y. Wang, and H. Zhao, “StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 7356–7365

  45. [45]

    Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images

    Y. B. Can, A. Liniger, D. P. Paudel, and L. Van Gool, “Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15661–15670

  46. [46]

    End-to-End Object Detection with Transformers

    N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-End Object Detection with Transformers,” in European Conference on Computer Vision, 2020, pp. 213–229

  47. [47]

    CenterLineDet: CenterLine Graph Detection for Road Lanes with Vehicle-mounted Sensors by Transformer for HD Map Generation

    Z. Xu, Y. Liu, Y. Sun, M. Liu, and L. Wang, “CenterLineDet: CenterLine Graph Detection for Road Lanes with Vehicle-mounted Sensors by Transformer for HD Map Generation,” in International Conference on Robotics and Automation, 2023, pp. 3553–3559

  48. [48]

    Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

    B. Liao, S. Chen, B. Jiang, T. Cheng, Q. Zhang, W. Liu, C. Huang, and X. Wang, “Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction,” in European Conference on Computer Vision, 2025, pp. 334–351

  49. [49]

    Continuity Preserving Online CenterLine Graph Learning

    Y. Han, K. Yu, and Z. Li, “Continuity Preserving Online CenterLine Graph Learning,” in European Conference on Computer Vision, 2024, pp. 342–359

  50. [50]

    RATopo: Improving Lane Topology Reasoning via Redundancy Assignment

    H. Li, S. Huang, L. Xu, Y. Gao, B. Mu, and S. Liu, “RATopo: Improving Lane Topology Reasoning via Redundancy Assignment,” in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 777–786

  51. [51]

    SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation

    H. Ye, M. Qi, Z. Liu, L. Liu, and H. Ma, “SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation,” in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 11170–11178

  52. [52]

    SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation

    C. Lv, M. Qi, X. Li, Z. Yang, and H. Ma, “SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 5, 2024, pp. 4035–4043

  53. [53]

    Attentive Relational Networks for Mapping Images to Scene Graphs

    M. Qi, W. Li, Z. Yang, Y. Wang, and J. Luo, “Attentive Relational Networks for Mapping Images to Scene Graphs,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3957–3966

  54. [54]

    Deep Residual Learning for Image Recognition

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

  55. [55]

    Feature Pyramid Networks for Object Detection

    T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature Pyramid Networks for Object Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125

  56. [56]

    BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

    Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Y. Qiao, and J. Dai, “BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers,” in European Conference on Computer Vision, 2022, pp. 1–18

  57. [57]

    Focal Loss for Dense Object Detection

    T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal Loss for Dense Object Detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2017, pp. 2980–2988

  58. [58]

    Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

    H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 658–666

  59. [59]

    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

    S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2016

  60. [60]

    Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

    Q. Chen, X. Chen, J. Wang, S. Zhang, K. Yao, H. Feng, J. Han, E. Ding, G. Zeng, and J. Wang, “Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6633–6642

  61. [61]

    DETRs with Hybrid Matching

    D. Jia, Y. Yuan, H. He, X. Wu, H. Yu, W. Lin, L. Sun, C. Zhang, and H. Hu, “DETRs with Hybrid Matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19702–19712

  62. [62]

    Decoupled Weight Decay Regularization

    I. Loshchilov, “Decoupled Weight Decay Regularization,” arXiv preprint arXiv:1711.05101, 2017

  63. [63]

    Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation

    Y. Luo, L. Zheng, T. Guan, J. Yu, and Y. Yang, “Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2507–2516

  64. [64]

    Category-Level Adversarial Adaptation for Semantic Segmentation using Purified Features

    Y. Luo, P. Liu, L. Zheng, T. Guan, J. Yu, and Y. Yang, “Category-Level Adversarial Adaptation for Semantic Segmentation using Purified Features,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 3940–3956, 2021

  65. [65]

    Kill Two Birds with One Stone: Domain Generalization for Semantic Segmentation via Network Pruning

    Y. Luo, P. Liu, and Y. Yang, “Kill Two Birds with One Stone: Domain Generalization for Semantic Segmentation via Network Pruning,” International Journal of Computer Vision, vol. 133, no. 1, pp. 335–352, 2025