pith. machine review for the scientific record

arxiv: 2605.08911 · v1 · submitted 2026-05-09 · 💻 cs.CV

Recognition: 2 Lean theorem links

Unified Modeling of Lane and Lane Topology for Driving Scene Reasoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:58 UTC · model grok-4.3

classification 💻 cs.CV
keywords lane detection · lane topology · driving scene reasoning · unified modeling · autonomous vehicles · OpenLane-V2

The pith

Modeling lanes and topology as connected predecessor and successor lanes enables direct perception of both from raw images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that treats lane topology as direct connections to predecessor and successor lanes rather than deriving it after separate detection. This unified representation is processed together with lane positions inside one perception pipeline that starts from the original image features, whereas existing methods follow a detect-then-reason sequence that can accumulate errors between the two stages. The authors evaluate the approach on the OpenLane-V2 benchmark, built from Argoverse2 and nuScenes data, and report higher topology scores than prior separate-stage methods.

Core claim

We propose an innovative method called UniTopo, which represents the topological relationships between lanes as connected lanes, encompassing predecessor lanes, successor lanes, and their interconnections. This unified representation of lanes and lane topology allows us to simultaneously obtain both the positions and topological information of lanes within a shared perception pipeline, establishing a new paradigm for directly perceiving lane topology from original image features.

What carries the argument

Unified representation of lanes and their topology as predecessor and successor connections processed inside a single perception pipeline starting from raw image features.
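
Read concretely, the design the review describes (two groups of queries for piecewise lanes and connected lanes, a shared decoder over BEV features, and a shared head, per the passage quoted in the Lean-theorem section below) could look like the sketch that follows. This is a minimal sketch under those assumptions, not the authors' code: the class name, dimensions, and the endpoint-matching comment are invented.

```python
import torch
import torch.nn as nn

class UnifiedLaneTopoHead(nn.Module):
    """Two query groups, one decoder, one head: geometry and topology together."""

    def __init__(self, num_lanes=200, num_points=11, dim=256, heads=8, layers=6):
        super().__init__()
        # Query group 1: piecewise lanes. Query group 2: connected lanes
        # (predecessor/successor segments), sharing the same embedding space.
        self.lane_queries = nn.Embedding(num_lanes, dim)
        self.conn_queries = nn.Embedding(num_lanes, dim)
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.shared_decoder = nn.TransformerDecoder(layer, layers)
        # One shared head regresses ordered 3D points for every query.
        self.point_head = nn.Linear(dim, num_points * 3)

    def forward(self, bev_feats):
        # bev_feats: (B, H*W, dim) flattened bird's-eye-view features.
        B = bev_feats.size(0)
        queries = torch.cat(
            [self.lane_queries.weight, self.conn_queries.weight], dim=0
        ).unsqueeze(0).expand(B, -1, -1)
        feats = self.shared_decoder(queries, bev_feats)
        lane_feats, conn_feats = feats.chunk(2, dim=1)
        lane_points = self.point_head(lane_feats)   # lane positions
        conn_points = self.point_head(conn_feats)   # connected-lane geometry
        # Topology then follows directly: a connected-lane prediction whose
        # endpoints coincide with two piecewise lanes links them as
        # predecessor/successor, with no post-hoc reasoning stage.
        return lane_points, conn_points
```

The point the sketch makes is structural: both outputs come from the same decoder pass over the same image-derived features, so there is no detection stage for a topology stage to inherit errors from.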

If this is right

  • Lane positions and topology are obtained simultaneously without post-processing from detections.
  • The method reports TOP_ll scores of 30.1 percent and 31.8 percent on the two OpenLane-V2 subsets.
  • These scores exceed the prior best method by 6.0 percent and 8.6 percent respectively.
  • A direct-perception paradigm replaces the previous reasoning-by-detection workflow.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Error accumulation between detection and topology stages may be reduced because both are learned jointly.
  • Similar unification of detection and relational reasoning could apply to other scene elements such as traffic lights.
  • End-to-end driving stacks might absorb this single-stage lane module without modular hand-offs.

Load-bearing premise

Representing lane topology through predecessor and successor connections is enough to capture the needed relationships without separate detection steps or major loss of information.

What would settle it

On the OpenLane-V2 test sets, a pipeline that first detects lanes and then computes topology separately would need to match or exceed the reported TOP_ll scores of 30.1 percent and 31.8 percent for the unified model to lose its claimed advantage.
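
For intuition about the shape of that comparison: TOP_ll is an mAP-style score over lane-graph edges. The sketch below is a deliberately simplified edge-level scorer that assumes lane instances have already been matched one-to-one to ground truth; it is an illustrative stand-in, not the OpenLane-V2 metric.

```python
import numpy as np

def edge_f1(pred_adj: np.ndarray, gt_adj: np.ndarray, thresh: float = 0.5) -> float:
    """pred_adj: (N, N) predicted connection probabilities between matched lanes.
    gt_adj: (N, N) binary ground-truth predecessor/successor adjacency."""
    pred = pred_adj > thresh
    gt = gt_adj.astype(bool)
    tp = float(np.logical_and(pred, gt).sum())
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)
```

Under this simplified view, a detect-then-reason baseline and the unified model are scored on exactly the same predicted adjacency, so any advantage must come from where the adjacency is produced, not how it is measured.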

Figures

Figures reproduced from arXiv: 2605.08911 by Beipeng Mu, Bo Liu, Han Li, Si Liu, Yuhang Wang, Yulu Gao.

Figure 1. Motivation of UniTopo. Unlike the topological relationships between lanes and traffic elements, the connections among piecewise lanes can be observed in images by identifying where the lanes connect. These areas are highlighted by the green rectangle (e.g., regions containing junction points near stop lines), and all topological relations are explicitly visualized using dashed arrows. …
Figure 2. (a) Previous methods follow a reasoning-by-detection paradigm, where …
Figure 3. The overall architecture of our UniTopo. Within the image feature extractor, the multi-view images are processed through a backbone network (e.g., ResNet-50 [54]) and a neck network (e.g., FPN [55]) to extract features. These image features are then transformed into BEV features via a BEV encoder following BEVFormer [56], where the BEV representation serves as an intermediate feature rather than an input m…
Figure 4. Illustration of the Topology-Aware Attention Module. First, the correlation between piecewise lanes and connected lanes is measured based on geometric distance. Then, utilizing a cross-attention mechanism, the lane topology information contained in the connection queries is transferred to the lane features. (A hedged sketch of this mechanism follows the figure list.)
Figure 5. Qualitative results across different scenarios on OpenLane-V2 subset A. Compared to TopoNet [7], our UniTopo improves lane-to-lane topology accuracy. …
Figure 6. Qualitative results of long-tail scenarios on OpenLane-V2 subset B. Our UniTopo accurately detects the (a) left-side curved lane and (b) roundabout, and predicts their topological relationships with other lanes.
Figure 7. Failure case on OpenLane-V2 subset B. Due to nighttime scenes with low visibility, our method fails to detect distant lanes ahead, along with the corresponding topological relationships.
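
The Topology-Aware Attention step described under Figure 4 can be sketched as follows. This is a hypothetical implementation, assuming an additive attention bias derived from endpoint distance; the module and argument names are invented for illustration, and only the two steps in the caption (geometric-distance correlation, then cross-attention from connection queries into lane features) come from the paper.

```python
import torch
import torch.nn as nn

class TopologyAwareAttention(nn.Module):
    """Writes connected-lane (topology) information into piecewise-lane features."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, lane_feats, conn_feats, lane_ends, conn_starts):
        # lane_ends: (B, N_lane, 3) endpoints of piecewise lanes.
        # conn_starts: (B, N_conn, 3) start points of connected lanes.
        # Step 1: geometric correlation -- lanes should attend to connections
        # that begin near where the lane ends.
        dist = torch.cdist(lane_ends, conn_starts)      # (B, N_lane, N_conn)
        bias = -dist                                    # closer => higher score
        heads = self.cross_attn.num_heads
        mask = bias.repeat_interleave(heads, dim=0)     # (B*heads, N_lane, N_conn)
        # Step 2: cross-attention transfers topology info from connection
        # queries (keys/values) into lane features (queries).
        out, _ = self.cross_attn(lane_feats, conn_feats, conn_feats, attn_mask=mask)
        return lane_feats + out                         # residual update
```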
Original abstract

Autonomous vehicles need to perceive not only physical elements in the driving scene, such as lane lines and traffic lights, but also logical elements like lane centerlines and their topology. Existing lane topology reasoning methods typically follow a reasoning-by-detection paradigm, where lane topological relationships are primarily derived from lane detection results. In this paper, we propose an innovative method called Unified Modeling of Lane and Lane Topology (UniTopo), which represents the topological relationships between lanes as connected lanes, encompassing predecessor lanes, successor lanes, and their interconnections. This unified representation of lanes and lane topology allows us to simultaneously obtain both the positions and topological information of lanes within a shared perception pipeline, establishing a new paradigm for directly perceiving lane topology from original image features. We validate our method on the driving scene reasoning benchmark OpenLane-V2, which consists of two subsets, built based on Argoverse2 and nuScenes, respectively. Our method achieves TOP_ll of 30.1% and 31.8% on the two subsets, significantly surpassing the existing state-of-the-art method T^2SG by 6.0% and 8.6%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes UniTopo, a unified modeling method for lanes and lane topology in driving scenes. It represents topological relationships as connected predecessor/successor lanes (and their interconnections) to enable simultaneous perception of lane geometry and topology directly from raw image features within a single pipeline, departing from the conventional reasoning-by-detection paradigm. On the OpenLane-V2 benchmark (Argoverse2 and nuScenes subsets), the method reports TOP_ll scores of 30.1% and 31.8%, outperforming prior SOTA T^2SG by 6.0% and 8.6%.

Significance. If the unified representation truly supports direct topology perception without implicit detection stages or feature bottlenecks, the work could shift the field toward more integrated perception pipelines for autonomous driving, with potential gains in efficiency and reduced error propagation. The reported benchmark improvements are concrete and would be a meaningful advance if backed by full architectural details, loss formulations, and ablations.

major comments (1)
  1. Abstract: The load-bearing claim that representing topology as predecessor/successor connections 'allows us to simultaneously obtain both the positions and topological information of lanes within a shared perception pipeline' and establishes 'a new paradigm for directly perceiving lane topology from original image features' requires explicit confirmation that no separate lane detection head or intermediate instance representation is used. Without equations, network diagrams, or loss terms showing how connections are regressed/classified from image features alone, it remains unclear whether the joint objective avoids trading geometric accuracy for topological accuracy or reintroduces implicit detection.
minor comments (1)
  1. The abstract references the TOP_ll metric and OpenLane-V2 subsets but provides no definition of the metric, no comparison table, and no mention of other standard metrics (e.g., lane detection mAP or topology-specific scores) used in the benchmark.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed review and constructive comments. We address the major comment below and clarify the unified end-to-end design of UniTopo while revising the manuscript for greater explicitness.

Point-by-point responses
  1. Referee: Abstract: The load-bearing claim that representing topology as predecessor/successor connections 'allows us to simultaneously obtain both the positions and topological information of lanes within a shared perception pipeline' and establishes 'a new paradigm for directly perceiving lane topology from original image features' requires explicit confirmation that no separate lane detection head or intermediate instance representation is used. Without equations, network diagrams, or loss terms showing how connections are regressed/classified from image features alone, it remains unclear whether the joint objective avoids trading geometric accuracy for topological accuracy or reintroduces implicit detection.

    Authors: We thank the referee for this observation. UniTopo is designed as a single end-to-end network: a shared image backbone extracts features that feed directly into a unified prediction head. This head simultaneously regresses lane geometry (as ordered points) and classifies predecessor/successor connections between lane instances, with no separate detection head, no post-processing instance grouping, and no intermediate lane representations. Topology is not derived after detection but is an explicit output of the same feature embeddings via a connectivity classification branch. The full architecture (including the network diagram in Figure 3), the joint loss (geometric regression plus topology cross-entropy, Equation 4), and the direct regression of connections from image features are detailed in Section 3. Ablation studies confirm that joint optimization improves rather than trades off geometric and topological accuracy. To make the abstract claim self-contained, we have revised it to explicitly state the absence of a separate detection stage and added a short clarifying sentence referencing the unified head. revision: yes
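
A minimal sketch of the joint objective this response describes, assuming an L1 geometric term and a binary cross-entropy topology term; the paper's Equation 4 and its loss weights are not reproduced in the material above, so both are placeholders here.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_points, gt_points, conn_logits, gt_conn,
               w_geo: float = 1.0, w_topo: float = 1.0) -> torch.Tensor:
    # Geometric term: regression over ordered lane points.
    geo = F.l1_loss(pred_points, gt_points)
    # Topology term: cross-entropy over pairwise connection logits.
    topo = F.binary_cross_entropy_with_logits(conn_logits, gt_conn.float())
    # One objective: geometry and topology share gradients through the same
    # feature embeddings, which is what "joint optimization" claims to exploit.
    return w_geo * geo + w_topo * topo
```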

Circularity Check

0 steps flagged

No significant circularity; empirical results on external benchmarks

Full rationale

The paper introduces a unified representation of lanes and topology as predecessor/successor connections to enable direct perception from image features in one pipeline, contrasting it with prior reasoning-by-detection approaches. Validation consists of performance numbers (TOP_ll 30.1% and 31.8%) on the public OpenLane-V2 benchmark subsets, with direct numerical comparison to an external SOTA method T^2SG. No equations, loss terms, fitted parameters, or self-citations appear in the provided text that would reduce the claimed improvements or the new paradigm to a tautology or input fit by construction. The derivation chain therefore remains open to external falsification via benchmark results rather than closing on its own definitions or prior author work.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard computer-vision assumptions about feature learning plus the domain-specific choice to encode topology via predecessor/successor connections; no new physical entities or ad-hoc constants are introduced beyond typical neural-network training.

free parameters (1)
  • neural network weights and hyperparameters
    Learned parameters of the perception model trained on driving data; not enumerated in the abstract.
axioms (2)
  • domain assumption: Lane topology can be faithfully represented as a graph of predecessor and successor connections.
    Invoked when defining the unified representation that enables direct perception.
  • domain assumption: Image features contain sufficient information to infer both geometry and topology jointly.
    Underlying the shift from reasoning-by-detection to unified modeling.
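
To make the first axiom concrete: once direct predecessor/successor pairs are given, any multi-hop relation between lanes follows by traversal, so a connection-level representation determines the full lane graph. A tiny sketch with illustrative names, not code from the paper:

```python
def reachable(successors: dict[int, list[int]], start: int) -> set[int]:
    """successors maps each lane id to its directly connected successor lane ids."""
    seen, stack = set(), [start]
    while stack:
        lane = stack.pop()
        for nxt in successors.get(lane, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Lane 0 splits into lanes 1 and 2; lane 1 continues into lane 3.
print(reachable({0: [1, 2], 1: [3]}, 0))  # {1, 2, 3}
```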

pith-pipeline@v0.9.0 · 5507 in / 1281 out tokens · 25374 ms · 2026-05-12T01:58:51.684404+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem:

    UniTopo defines two groups of queries for piecewise lanes and connected lanes, uses a shared lane decoder to interact with BEV features, and employs a shared lane head to obtain lane positions and the lane-to-lane topology relationships. In addition, we design a Topology-Aware Attention Module (TAM) to incorporate lane connection information into the features of piecewise lanes.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem:

    We propose a method for unified modeling of lane and lane topology that concurrently perceives lanes and their topological structures, establishing a new paradigm distinct from the reasoning-by-detection approach.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 4 internal anchors

  1. [1]

    nuScenes: A multimodal dataset for autonomous driving

    H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11621–11631

  2. [2]

    Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

    B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes et al., “Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting,” arXiv preprint arXiv:2301.00493, 2023

  3. [3]

    Scalability in Perception for Autonomous Driving: Waymo Open Dataset

    P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., “Scalability in Perception for Autonomous Driving: Waymo Open Dataset,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2446–2454

  4. [4]

    Planning-oriented Autonomous Driving

    Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang et al., “Planning-oriented Autonomous Driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17853–17862

  5. [5]

    VAD: Vectorized Scene Representation for Efficient Autonomous Driving

    B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, Q. Zhang, W. Liu, C. Huang, and X. Wang, “VAD: Vectorized Scene Representation for Efficient Autonomous Driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8340–8350

  6. [6]

    OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

    H. Wang, T. Li, Y. Li, L. Chen, C. Sima, Z. Liu, B. Wang, P. Jia, Y. Wang, S. Jiang et al., “OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping,” Advances in Neural Information Processing Systems, vol. 36, pp. 18873–18884, 2024

  7. [7]

    Graph-based Topology Reasoning for Driving Scenes

    T. Li, L. Chen, X. Geng, H. Wang, Y. Li, Z. Liu, S. Jiang, Y. Wang, H. Xu, C. Xu et al., “Graph-based Topology Reasoning for Driving Scenes,” arXiv preprint arXiv:2304.05277, 2023

  8. [8]

    TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning

    D. Wu, J. Chang, F. Jia, Y. Liu, T. Wang, and J. Shen, “TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning,” arXiv preprint arXiv:2310.06753, 2023

  9. [9]

    Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors

    H. Li, Z. Huang, Z. Wang, W. Rong, N. Wang, and S. Liu, “Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors,” arXiv preprint arXiv:2406.03105, 2024

  10. [10]

    TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes

    Y. Fu, W. Liao, X. Liu, Y. Ma, F. Dai, Y. Zhang et al., “TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes,” arXiv preprint arXiv:2405.14747, 2024

  11. [11]

    RoadPainter: Points Are Ideal Navigators for Topology TransformER

    Z. Ma, S. Liang, Y. Wen, W. Lu, and G. Wan, “RoadPainter: Points Are Ideal Navigators for Topology TransformER,” in European Conference on Computer Vision, 2024, pp. 179–195

  12. [12]

    Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer

    F. Rong, W. Peng, M. Lan, Q. Zhang, and L. Zhang, “Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer,” in Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 10075–10084

  13. [13]

    T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving

    C. Lv, M. Qi, L. Liu, and H. Ma, “T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 17197–17206

  14. [14]

    Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps

    K. Z. Luo, X. Weng, Y. Wang, S. Wu, J. Li, K. Q. Weinberger, Y. Wang, and M. Pavone, “Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps,” in International Conference on Robotics and Automation, 2024, pp. 4029–4035

  15. [15]

    LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

    T. Li, P. Jia, B. Wang, L. Chen, K. Jiang, J. Yan, and H. Li, “LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving,” arXiv preprint arXiv:2312.16108, 2023

  16. [16]

    Semi-Supervised Classification with Graph Convolutional Networks

    T. N. Kipf and M. Welling, “Semi-Supervised Classification with Graph Convolutional Networks,” arXiv preprint arXiv:1609.02907, 2016

  17. [17]

    Deformable DETR: Deformable Transformers for End-to-End Object Detection

    X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable Transformers for End-to-End Object Detection,” arXiv preprint arXiv:2010.04159, 2020

  18. [18]

    Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit

    X. Li, J. Li, X. Hu, and J. Yang, “Line-CNN: End-to-End Traffic Line Detection With Line Proposal Unit,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 1, pp. 248–258, 2019

  19. [19]

    Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

    L. Tabelini, R. Berriel, T. M. Paixao, C. Badue, A. F. De Souza, and T. Oliveira-Santos, “Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 294–302

  20. [20]

    CLRNet: Cross Layer Refinement Network for Lane Detection

    T. Zheng, Y. Huang, Y. Liu, W. Tang, Z. Yang, D. Cai, and X. He, “CLRNet: Cross Layer Refinement Network for Lane Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 898–907

  21. [21]

    CLRNetV2: A Faster and Stronger Lane Detector

    T. Zheng, Y. Huang, Y. Liu, B. Lin, Z. Yang, D. Cai, and X. He, “CLRNetV2: A Faster and Stronger Lane Detector,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 6, pp. 4271–4284, 2025

  22. [22]

    Dense Hybrid Proposal Modulation for Lane Detection

    Y. Wu, L. Zhao, J. Lu, and H. Yan, “Dense Hybrid Proposal Modulation for Lane Detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 11, pp. 6845–6859, 2023

  23. [23]

    SMFRNet: Complex Scene Lane Detection With Start Point-Guided Multi-Dimensional Feature Refinement

    S. Tan, Y. Zhang, and S. Zhu, “SMFRNet: Complex Scene Lane Detection With Start Point-Guided Multi-Dimensional Feature Refinement,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 12, pp. 13364–13372, 2024

  24. [24]

    VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection

    Y. Zhang, L. Zhu, W. Feng, H. Fu, M. Wang, Q. Li, C. Li, and S. Wang, “VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15681–15690

  25. [25]

    Recursive Video Lane Detection

    D. Jin, D. Kim, and C.-S. Kim, “Recursive Video Lane Detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8473–8482

  26. [26]

    STADet: Streaming Timing-Aware Video Lane Detection

    K. He, J. Xie, X. Dai, K. Chang, F. Chen, and Z. Wang, “STADet: Streaming Timing-Aware Video Lane Detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 9, pp. 8644–8656, 2024

  27. [27]

    LaneTCA: Enhancing Video Lane Detection With Temporal Context Aggregation

    K. Zhou, L. Li, W. Zhou, Y. Wang, H. Feng, and H. Li, “LaneTCA: Enhancing Video Lane Detection With Temporal Context Aggregation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 8574–8585, 2025

  28. [28]

    3D-LaneNet: End-to-End 3D Multiple Lane Detection

    N. Garnett, R. Cohen, T. Pe’er, R. Lahav, and D. Levi, “3D-LaneNet: End-to-End 3D Multiple Lane Detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 2921–2930

  29. [29]

    Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection

    Y. Guo, G. Chen, P. Zhao, W. Zhang, J. Miao, J. Wang, and T. E. Choe, “Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection,” in European Conference on Computer Vision, 2020, pp. 666–681

  30. [30]

    3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation

    N. Efrat, M. Bluvstein, S. Oron, D. Levi, N. Garnett, and B. E. Shlomo, “3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation,” arXiv preprint arXiv:2011.01535, 2020

  31. [31]

    Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints

    R. Liu, D. Chen, T. Liu, Z. Xiong, and Z. Yuan, “Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2022, pp. 1765–1772

  32. [32]

    PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

    L. Chen, C. Sima, Y. Li, Z. Zheng, J. Xu, X. Geng, H. Li, C. He, J. Shi, Y. Qiao et al., “PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark,” in European Conference on Computer Vision, 2022, pp. 550–567

  33. [33]

    Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection

    S. Huang, Z. Shen, Z. Huang, Z.-h. Ding, J. Dai, J. Han, N. Wang, and S. Liu, “Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17451–17460

  34. [34]

    Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression

    S. Huang, Z. Shen, Z. Huang, Y. Liao, J. Han, N. Wang, and S. Liu, “Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 3, pp. 1660–1673, 2025

  35. [35]

    LATR: 3D Lane Detection from Monocular Images with Transformer

    Y. Luo, C. Zheng, X. Yan, T. Kun, C. Zheng, S. Cui, and Z. Li, “LATR: 3D Lane Detection from Monocular Images with Transformer,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 7941–7952

  36. [36]

    Cross-view Semantic Segmentation for Sensing Surroundings

    B. Pan, J. Sun, H. Y. T. Leung, A. Andonian, and B. Zhou, “Cross-view Semantic Segmentation for Sensing Surroundings,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4867–4873, 2020

  37. [37]

    Cross-View Transformers for Real-Time Map-View Semantic Segmentation

    B. Zhou and P. Krähenbühl, “Cross-View Transformers for Real-Time Map-View Semantic Segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13760–13769

  38. [38]

    Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer

    S. Chen, T. Cheng, X. Wang, W. Meng, Q. Zhang, and W. Liu, “Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer,” arXiv preprint arXiv:2206.04584, 2022

  39. [39]

    HDMapNet: An Online HD Map Construction and Evaluation Framework

    Q. Li, Y. Wang, Y. Wang, and H. Zhao, “HDMapNet: An Online HD Map Construction and Evaluation Framework,” in International Conference on Robotics and Automation, 2022, pp. 4628–4634

  40. [40]

    VectorMapNet: End-to-end Vectorized HD Map Learning

    Y. Liu, T. Yuan, Y. Wang, Y. Wang, and H. Zhao, “VectorMapNet: End-to-end Vectorized HD Map Learning,” in International Conference on Machine Learning, 2023, pp. 22352–22369

  41. [41]

    MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

    B. Liao, S. Chen, X. Wang, T. Cheng, Q. Zhang, W. Liu, and C. Huang, “MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction,” arXiv preprint arXiv:2208.14437, 2022

  42. [42]

    MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction

    B. Liao, S. Chen, Y. Zhang, B. Jiang, Q. Zhang, W. Liu, C. Huang, and X. Wang, “MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction,” arXiv preprint arXiv:2308.05736, 2023

  43. [43]

    Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

    Z. Liu, X. Zhang, G. Liu, J. Zhao, and N. Xu, “Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction,” in European Conference on Computer Vision, 2025, pp. 461–477

  44. [44]

    StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction

    T. Yuan, Y. Liu, Y. Wang, Y. Wang, and H. Zhao, “StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 7356–7365

  45. [45]

    Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images

    Y. B. Can, A. Liniger, D. P. Paudel, and L. Van Gool, “Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15661–15670

  46. [46]

    End-to-End Object Detection with Transformers

    N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-End Object Detection with Transformers,” in European Conference on Computer Vision, 2020, pp. 213–229

  47. [47]

    CenterLineDet: CenterLine Graph Detection for Road Lanes with Vehicle-mounted Sensors by Transformer for HD Map Generation

    Z. Xu, Y. Liu, Y. Sun, M. Liu, and L. Wang, “CenterLineDet: CenterLine Graph Detection for Road Lanes with Vehicle-mounted Sensors by Transformer for HD Map Generation,” in International Conference on Robotics and Automation, 2023, pp. 3553–3559

  48. [48]

    Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

    B. Liao, S. Chen, B. Jiang, T. Cheng, Q. Zhang, W. Liu, C. Huang, and X. Wang, “Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction,” in European Conference on Computer Vision, 2025, pp. 334–351

  49. [49]

    Continuity Preserving Online CenterLine Graph Learning

    Y. Han, K. Yu, and Z. Li, “Continuity Preserving Online CenterLine Graph Learning,” in European Conference on Computer Vision, 2024, pp. 342–359

  50. [50]

    RATopo: Improving Lane Topology Reasoning via Redundancy Assignment

    H. Li, S. Huang, L. Xu, Y. Gao, B. Mu, and S. Liu, “RATopo: Improving Lane Topology Reasoning via Redundancy Assignment,” in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 777–786

  51. [51]

    SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation

    H. Ye, M. Qi, Z. Liu, L. Liu, and H. Ma, “SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation,” in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 11170–11178

  52. [52]

    SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation

    C. Lv, M. Qi, X. Li, Z. Yang, and H. Ma, “SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 5, 2024, pp. 4035–4043

  53. [53]

    Attentive Relational Networks for Mapping Images to Scene Graphs

    M. Qi, W. Li, Z. Yang, Y. Wang, and J. Luo, “Attentive Relational Networks for Mapping Images to Scene Graphs,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3957–3966

  54. [54]

    Deep Residual Learning for Image Recognition

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

  55. [55]

    Feature Pyramid Networks for Object Detection

    T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature Pyramid Networks for Object Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125

  56. [56]

    BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

    Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Y. Qiao, and J. Dai, “BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers,” in European Conference on Computer Vision, 2022, pp. 1–18

  57. [57]

    Focal Loss for Dense Object Detection

    T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal Loss for Dense Object Detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2017, pp. 2980–2988

  58. [58]

    Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

    H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 658–666

  59. [59]

    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

    S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2016

  60. [60]

    Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

    Q. Chen, X. Chen, J. Wang, S. Zhang, K. Yao, H. Feng, J. Han, E. Ding, G. Zeng, and J. Wang, “Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6633–6642

  61. [61]

    DETRs with Hybrid Matching

    D. Jia, Y. Yuan, H. He, X. Wu, H. Yu, W. Lin, L. Sun, C. Zhang, and H. Hu, “DETRs with Hybrid Matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19702–19712

  62. [62]

    Decoupled Weight Decay Regularization

    I. Loshchilov, “Decoupled Weight Decay Regularization,” arXiv preprint arXiv:1711.05101, 2017

  63. [63]

    Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation

    Y. Luo, L. Zheng, T. Guan, J. Yu, and Y. Yang, “Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2507–2516

  64. [64]

    Category-Level Adversarial Adaptation for Semantic Segmentation using Purified Features

    Y. Luo, P. Liu, L. Zheng, T. Guan, J. Yu, and Y. Yang, “Category-Level Adversarial Adaptation for Semantic Segmentation using Purified Features,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 3940–3956, 2021

  65. [65]

    Kill Two Birds with One Stone: Domain Generalization for Semantic Segmentation via Network Pruning

    Y. Luo, P. Liu, and Y. Yang, “Kill Two Birds with One Stone: Domain Generalization for Semantic Segmentation via Network Pruning,” International Journal of Computer Vision, vol. 133, no. 1, pp. 335–352, 2025