pith. machine review for the scientific record.

arxiv: 2603.01558 · v2 · submitted 2026-03-02 · 💻 cs.CV

Recognition: no theorem link

TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding

Pith reviewed 2026-05-15 18:00 UTC · model grok-4.3

classification 💻 cs.CV
keywords: road topology, 3D centerline extraction, BEV mask prediction, dense offset field, height map, geographic data leakage, long-range benchmark

The pith

TopoMaskV3 adds dense offset and height heads to mask-based road topology so the pipeline runs as a standalone 3D predictor and reaches 28.5 OLS on geographically disjoint long-range tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Road topology methods extract centerlines from sensor data for mapping and planning. Earlier mask approaches stayed in 2D and still needed a separate parametric head to fix grid errors. TopoMaskV3 inserts an offset field that nudges each mask pixel to sub-grid accuracy and a height map that supplies elevation directly from the same dense representation. The authors also release new train-test splits that keep geographic regions completely separate and add a long-range evaluation out to 100 m. On this stricter benchmark the updated mask pipeline sets a new record while showing less overfitting than Bezier alternatives.

Core claim

TopoMaskV3 extends the mask pipeline with two dense heads—one predicting a 2D offset field inside each BEV cell to correct discretization and one predicting a height value per cell for direct 3D centerline recovery—removing the need for any parametric fusion stage. The same work introduces geographically distinct data partitions and a long-range benchmark that together eliminate location-based memorization, after which the mask representation proves more robust than prior Bezier methods and LiDAR fusion yields its largest relative gains at distance.

What carries the argument

Dense offset field and dense height map heads that operate directly on the BEV mask grid to supply sub-pixel corrections and elevation without a separate parametric branch.
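
As an illustration only, the sketch below shows one way a mask, an offset field, and a height map on the same BEV grid could be combined into 3D points. The function name `decode_mask_to_3d`, the offset convention (fractions of a cell added to the cell centre), the axis layout, and all grid parameters are assumptions made for this example; the paper's actual decoding is not described in the text reviewed here.

```python
from typing import List, Tuple

Grid = List[List[float]]  # row-major 2D grid, indexed as grid[row][col]


def decode_mask_to_3d(
    mask: Grid,        # per-cell mask probability
    offset_x: Grid,    # assumed sub-cell shift along x, in cell units (about -0.5..0.5)
    offset_y: Grid,    # assumed sub-cell shift along y, in cell units (about -0.5..0.5)
    height: Grid,      # per-cell height in metres
    cell_size: float,  # metres per BEV cell (hypothetical value chosen by the caller)
    x_min: float,      # metric x of the edge of column 0
    y_min: float,      # metric y of the edge of row 0
    threshold: float = 0.5,
) -> List[Tuple[float, float, float]]:
    """Turn active mask cells into (x, y, z) points with sub-cell refinement."""
    points = []
    for r, row in enumerate(mask):
        for c, prob in enumerate(row):
            if prob < threshold:
                continue  # cell is not part of the centerline mask
            # Cell centre plus the predicted sub-cell correction, scaled to metres.
            x = x_min + (c + 0.5 + offset_x[r][c]) * cell_size
            y = y_min + (r + 0.5 + offset_y[r][c]) * cell_size
            points.append((x, y, height[r][c]))
    return points
```

Without the offset terms every point would sit exactly on a cell centre, which is the discretization error the offset head is meant to reduce; the height lookup is what lifts the 2D mask to 3D.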

If this is right

  • Mask representations exhibit lower geographic overfitting than Bezier curve methods on the new splits.
  • LiDAR fusion improves scores most at long range and shows bigger gains on the original overlapping split.
  • Standalone 3D centerline extraction becomes possible without hybrid fusion stages.
  • Geographically disjoint evaluation becomes the required standard for fair road topology benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Autonomous systems could deploy the model in entirely new cities without retraining on local map data.
  • The same offset-plus-height pattern could be tested on other linear 3D structures such as overhead wires.
  • Removing the need for parametric fusion may lower compute cost enough for real-time onboard use.
  • Future datasets should adopt the geographic-split protocol as default practice.

Load-bearing premise

The offset and height heads produce accurate 3D corrections on their own and the new geographic splits remove every form of location-based memorization.

What would settle it

Retraining the model without the offset or height heads and measuring whether OLS on the disjoint long-range split falls below the previous best methods.

Figures

Figures reproduced from arXiv: 2603.01558 by Alptekin Temizel, Halil Ibrahim Ozturk, Muhammet Esat Kalfaoglu, Ozsel Kilinc.

Figure 1: Quad-Direction Labels Encoding. Each centerline is assigned one of four directional labels: up, down, left, or right, based on majority voting between consecutive points. Ties are resolved using the angle between the start and end points.
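
The voting rule in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the axis convention (+x is "right", +y is "up"), the 90-degree sectors used to turn a vector into a label, and the names `label_from_vector` and `quad_direction_label` are assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]


def label_from_vector(dx: float, dy: float) -> str:
    """Map a 2D vector to one of four labels using 90-degree sectors (assumed)."""
    angle = math.degrees(math.atan2(dy, dx))  # in [-180, 180]
    if -45.0 <= angle < 45.0:
        return "right"
    if 45.0 <= angle < 135.0:
        return "up"
    if -135.0 <= angle < -45.0:
        return "down"
    return "left"


def quad_direction_label(points: List[Point]) -> str:
    """Majority vote over consecutive segments; ties fall back to start-to-end."""
    if len(points) < 2:
        raise ValueError("a centerline needs at least two points")
    votes = Counter(
        label_from_vector(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if (x1, y1) != (x0, y0)  # skip zero-length segments
    )
    ranked = votes.most_common(2)
    if len(ranked) == 1 or (len(ranked) == 2 and ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    # Tie (or no usable segment): use the direction from first to last point.
    (xs, ys), (xe, ye) = points[0], points[-1]
    return label_from_vector(xe - xs, ye - ys)
```

For example, a polyline that runs one segment to the right and then one segment upward produces a tied vote, so the label is decided by the overall start-to-end direction.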
Figure 2: TopoMaskV3 Architecture Overview. The method adopts an instance-query-based design. Bird’s Eye View (BEV) features extracted from multi-camera images are processed by a transformer decoder that predicts: binary masks, quad-direction labels, 2D offsets, and height maps. A quad-direction-aware post-processing step then converts these dense outputs into 3D centerline instances.
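
One plausible reading of the quad-direction-aware post-processing is that the label fixes the order in which decoded points are chained into a centerline. The sketch below illustrates that idea only; the sort rule, the axis convention (+x is "right", +y is "up"), and the name `order_points` are assumptions, and the paper's actual procedure may differ.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]

# Assumed mapping: label -> (coordinate index to sort on, sort descending?)
_SORT_RULES: Dict[str, Tuple[int, bool]] = {
    "right": (0, False),  # increasing x
    "left": (0, True),    # decreasing x
    "up": (1, False),     # increasing y
    "down": (1, True),    # decreasing y
}


def order_points(points: List[Point3D], label: str) -> List[Point3D]:
    """Order unordered 3D points along the axis implied by the direction label."""
    axis, descending = _SORT_RULES[label]
    return sorted(points, key=lambda p: p[axis], reverse=descending)
```

A single-axis sort like this only works when the centerline is monotonic along the chosen axis, which is presumably why a coarse four-way label is enough to disambiguate the traversal direction.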
Figure 3: TopoMaskV3 Decoder Architecture. Each sparse query is decoded by five parallel heads, each predicting a different centerline attribute.
Original abstract

Mask-based paradigms for road topology understanding, such as TopoMaskV2, offer a complementary alternative to query-based methods by generating centerlines via a dense rasterized intermediate representation. However, prior work was limited to 2D predictions and suffered from severe discretization artifacts, necessitating fusion with parametric heads. We introduce TopoMaskV3, which advances this pipeline into a robust, standalone 3D predictor via two novel dense prediction heads: a dense offset field for sub-grid discretization correction within the existing BEV resolution, and a dense height map for direct 3D estimation. Beyond the architecture, we are the first to address geographic data leakage in road topology evaluation by introducing (1) geographically distinct splits to prevent memorization and ensure fair generalization, and (2) a long-range (+/-100 m) benchmark. TopoMaskV3 achieves state-of-the-art 28.5 OLS on this geographically disjoint benchmark, surpassing all prior methods. Our analysis shows that the mask representation is more robust to geographic overfitting than Bezier, while LiDAR fusion is most beneficial at long range and exhibits larger relative gains on the overlapping original split, suggesting overlap-induced memorization effects.
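
To make the notion of geographic leakage concrete, the sketch below measures how many test samples lie close to any training sample. It is a rough diagnostic under stated assumptions, not the paper's split protocol: positions are assumed to be 2D ego locations in one shared metric frame, and the name `leaked_fraction` and the radius are hypothetical.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float]  # (x, y) in metres, shared global frame (assumed)


def leaked_fraction(
    train: List[Position], test: List[Position], radius_m: float
) -> float:
    """Fraction of test positions within radius_m of at least one train position."""
    if not test:
        return 0.0
    leaked = sum(
        1
        for tx, ty in test
        if any(math.hypot(tx - x, ty - y) <= radius_m for x, y in train)
    )
    return leaked / len(test)
```

A geographically disjoint split should drive this fraction to zero for any radius comparable to the perception range. The check is brute force (every test position against every train position) and says nothing about subtler leakage through similar road layouts in different places.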

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces TopoMaskV3, extending mask-based road topology methods to 3D via two new dense prediction heads (offset field for sub-grid correction and height map for direct 3D estimation). It also defines geographically distinct train/test splits and a long-range benchmark to reduce geographic leakage, reporting SOTA performance of 28.5 OLS on the disjoint split while providing analysis of mask robustness versus Bezier curves and LiDAR fusion benefits.

Significance. If the empirical gains are confirmed, the work offers a standalone 3D centerline predictor that avoids parametric fusion and introduces evaluation practices that could reduce memorization risks in geographic datasets; the mask-vs-Bezier robustness findings may guide representation choices in future topology models.

major comments (2)
  1. [Abstract] Abstract and Experiments section: the central claim that the dense offset and height heads produce accurate 3D corrections sufficient for standalone use (without additional parametric fusion) is load-bearing for the 28.5 OLS result, yet the visible text provides no ablation tables, error analysis, or quantitative attribution linking these heads specifically to the reported gain over prior methods.
  2. [Abstract] Abstract: the geographically distinct splits are asserted to eliminate location-based memorization, but no quantitative verification (e.g., performance metrics under stricter feature-matched cross-region testing or comparison of regional pattern similarity) is supplied to rule out residual leakage via shared road topologies or densities, which directly undermines the fairness of the new benchmark and the SOTA claim.
minor comments (2)
  1. Clarify the precise formulation of the OLS metric for 3D predictions and how height/offset errors are incorporated.
  2. Add explicit comparison tables showing results on both the original overlapping split and the new disjoint split to quantify the memorization effect mentioned in the abstract.
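
For context on the first minor comment: the OpenLane-V2 benchmark defines OLS as the average of two detection scores and the square roots of two topology scores. The helper below restates that published formula. Whether TopoMaskV3 is evaluated with exactly this form, and how offset and height errors enter each term, is not stated in the text reviewed here; the function and argument names are illustrative.

```python
import math


def openlane_v2_ols(det_l: float, det_t: float, top_ll: float, top_lt: float) -> float:
    """OLS as defined by OpenLane-V2; all inputs are scores in [0, 1].

    det_l:  centerline (lane) detection score
    det_t:  traffic element detection score
    top_ll: lane-to-lane topology score
    top_lt: lane-to-traffic-element topology score
    """
    return (det_l + det_t + math.sqrt(top_ll) + math.sqrt(top_lt)) / 4.0
```

The square root lifts the topology terms, which tend to be numerically much smaller than the detection terms, so that they contribute on a comparable scale.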

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to provide the requested empirical support.

  1. Referee: [Abstract] Abstract and Experiments section: the central claim that the dense offset and height heads produce accurate 3D corrections sufficient for standalone use (without additional parametric fusion) is load-bearing for the 28.5 OLS result, yet the visible text provides no ablation tables, error analysis, or quantitative attribution linking these heads specifically to the reported gain over prior methods.

    Authors: We agree that the manuscript would benefit from explicit ablations to attribute the performance gains. In the revised version we will add ablation tables in the Experiments section that isolate the contribution of the dense offset field and height map heads, including direct comparisons of OLS scores with and without each head. We will also include error analysis quantifying the reduction in discretization artifacts and 3D estimation accuracy provided by these heads relative to the prior TopoMaskV2 baseline. revision: yes

  2. Referee: [Abstract] Abstract: the geographically distinct splits are asserted to eliminate location-based memorization, but no quantitative verification (e.g., performance metrics under stricter feature-matched cross-region testing or comparison of regional pattern similarity) is supplied to rule out residual leakage via shared road topologies or densities, which directly undermines the fairness of the new benchmark and the SOTA claim.

    Authors: We acknowledge that stronger quantitative verification of reduced leakage would strengthen the benchmark claims. In revision we will add analysis comparing performance on feature-matched cross-region subsets and report similarity metrics for road topologies and densities across the geographic splits. This will provide direct evidence supporting the fairness of the disjoint benchmark and the reported 28.5 OLS SOTA result. revision: yes

Circularity Check

0 steps flagged

No circularity detected; claims rest on empirical evaluation of new heads and splits


The paper advances a mask-based pipeline by adding two dense prediction heads (offset field and height map) and introduces geographically distinct splits plus a long-range benchmark. The 28.5 OLS SOTA claim is presented as the outcome of experimental comparison on these splits. No mathematical derivation chain, equations, or self-referential definitions appear in the provided text. No self-citations are used to justify uniqueness theorems, ansatzes, or load-bearing premises. The central results are externally falsifiable via the reported metrics and splits rather than reducing to fitted parameters or prior self-work by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The new heads are architectural additions whose internal hyperparameters are not detailed here.

pith-pipeline@v0.9.0 · 5531 in / 1062 out tokens · 40621 ms · 2026-05-15T18:00:58.159608+00:00 · methodology

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 3 internal anchors

  1. [1]

    nuscenes: A multimodal dataset for autonomous driving

    Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020. 6

  2. [2]

    Structured bird’s-eye-view traffic scene understanding from onboard images

    Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, and Luc Van Gool. Structured bird’s-eye-view traffic scene understanding from onboard images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15661–15670, 2021. 2, 17

  3. [3]

    Efficient and robust 2d-to-bev representation learning via geometry-guided kernel transformer. arXiv preprint arXiv:2206.04584, 2022

    Shaoyu Chen, Tianheng Cheng, Xinggang Wang, Wenming Meng, Qian Zhang, and Wenyu Liu. Efficient and robust 2d-to-bev representation learning via geometry-guided kernel transformer. arXiv preprint arXiv:2206.04584, 2022. 3

  4. [4]

    Masked-attention mask transformer for universal image segmentation

    Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1290–1299, 2022. 7, 11, 13

  5. [5]

    TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes

    Yanping Fu, Wenbin Liao, Xinyuan Liu, Yike Ma, Feng Dai, Yucheng Zhang, and others. TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes. arXiv preprint arXiv:2405.14747, 2024. 2, 8, 17

  6. [6]

    TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving,

    Yanping Fu, Xinyuan Liu, Tianyu Li, Yike Ma, Yucheng Zhang, and Feng Dai. TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving,

  7. [7]

    arXiv:2505.17771 [cs]. 2

  8. [8]

    Simple-bev: What really matters for multi-sensor bev perception? In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 2759–2765

    Adam W Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, and Katerina Fragkiadaki. Simple-bev: What really matters for multi-sensor bev perception? In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 2759–2765. IEEE, 2023. 3

  9. [9]

    Bevpoolv2: A cutting-edge implementation of bevdet toward deployment. arXiv preprint arXiv:2211.17111, 2022

    Junjie Huang and Guan Huang. Bevpoolv2: A cutting-edge implementation of bevdet toward deployment. arXiv preprint arXiv:2211.17111, 2022. 14

  10. [10]

    BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

    Junjie Huang, Guan Huang, Zheng Zhu, Ye Yun, and Dalong Du. BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View. arXiv preprint arXiv:2112.11790, 2021. 3

  11. [11]

    TopoBDA: Towards Bezier Deformable Attention for Road Topology Understanding. arXiv preprint arXiv:2412.18951, 2024

    Muhammet Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc, and Alptekin Temizel. TopoBDA: Towards Bezier Deformable Attention for Road Topology Understanding. arXiv preprint arXiv:2412.18951, 2024. 2, 3, 4, 5, 6, 7, 8, 11, 12, 14, 15, 17

  12. [12]

    TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem. arXiv preprint arXiv:2409.11325, 2024

    Muhammet Esat Kalfaoglu, Halil Ibrahim Ozturk, Ozsel Kilinc, and Alptekin Temizel. TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem. arXiv preprint arXiv:2409.11325, 2024. 1, 2, 3, 6, 8, 13, 14, 15, 17

  13. [13]

    Dn-detr: Accelerate detr training by introducing query denoising

    Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M Ni, and Lei Zhang. Dn-detr: Accelerate detr training by introducing query denoising. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13619–13627, 2022. 14

  14. [14]

    Mask dino: Towards a unified transformer-based framework for object detection and segmentation

    Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M Ni, and Heung-Yeung Shum. Mask dino: Towards a unified transformer-based framework for object detection and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3041–3050, 2023. 7, 11

  15. [15]

    Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors, 2024

    Han Li, Zehao Huang, Zitian Wang, Wenge Rong, Naiyan Wang, and Si Liu. Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors, 2024. arXiv:2406.03105 [cs]. 2, 17

  16. [16]

    Hdmapnet: An online hd map construction and evaluation framework

    Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. Hdmapnet: An online hd map construction and evaluation framework. In 2022 International Conference on Robotics and Automation (ICRA), pages 4628–4634. IEEE, 2022. 16

  17. [17]

    Graph-based topology reasoning for driving scenes. arXiv preprint arXiv:2304.05277, 2023

    Tianyu Li, Li Chen, Huijie Wang, Yang Li, Jiazhi Yang, Xiangwei Geng, Shengyin Jiang, Yuting Wang, Hang Xu, Chunjing Xu, and others. Graph-based topology reasoning for driving scenes. arXiv preprint arXiv:2304.05277, 2023. 2, 6, 8, 13, 17

  18. [18]

    LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

    Tianyu Li, Peijin Jia, Bangjun Wang, Li Chen, Kun Jiang, Junchi Yan, and Hongyang Li. LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving. In ICLR, 2024. 2

  19. [19]

    Bevdepth: Acquisition of reliable depth for multi-view 3d object detection

    Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, and Zeming Li. Bevdepth: Acquisition of reliable depth for multi-view 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1477–1485, 2023. 3

  20. [20]

    Fast-bev: A fast and strong bird’s-eye view perception baseline. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):8665–8679, 2024

    Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, and others. Fast-bev: A fast and strong bird’s-eye view perception baseline. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):8665–8679, 2024. Publisher: IEEE. 3

  21. [21]

    Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. Publisher: IEEE. 3

  22. [22]

    Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction. arXiv preprint arXiv:2303.08815, 2023

    Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction. arXiv preprint arXiv:2303.08815, 2023. 2

  23. [23]

    Maptr: Structured modeling and learning for online vectorized hd map construction

    Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, and Chang Huang. Maptr: Structured modeling and learning for online vectorized hd map construction. In International Conference on Learning Representations, 2023. 2, 16, 17

  24. [24]

    Maptrv2: An end-to-end framework for online vectorized hd map construction. International Journal of Computer Vision, pages 1–23, 2024

    Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Maptrv2: An end-to-end framework for online vectorized hd map construction. International Journal of Computer Vision, pages 1–23, 2024. Publisher: Springer. 2

  25. [25]

    Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It, 2024

    Adam Lilja, Junsheng Fu, Erik Stenborg, and Lars Hammarstrand. Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It, 2024. arXiv:2312.06420 [cs]. 1, 2, 3, 6, 16

  26. [26]

    DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

    Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, and Lei Zhang. DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR. In International Conference on Learning Representations, 2022. 13, 14

  27. [27]

    Vectormapnet: End-to-end vectorized hd map learning

    Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, and Hang Zhao. Vectormapnet: End-to-end vectorized hd map learning. In International Conference on Machine Learning, pages 22352–22369. PMLR, 2023. 16, 17

  28. [28]

    Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps. arXiv preprint arXiv:2311.04079, 2023

    Katie Z Luo, Xinshuo Weng, Yan Wang, Shuang Wu, Jie Li, Kilian Q Weinberger, Yue Wang, and Marco Pavone. Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps. arXiv preprint arXiv:2311.04079, 2023. 3

  29. [29]

    T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving. arXiv preprint arXiv:2411.18894, 2024

    Changsheng Lv, Mengshi Qi, Liang Liu, and Huadong Ma. T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving. arXiv preprint arXiv:2411.18894, 2024. 2, 17

  30. [30]

    RoadPainter: Points Are Ideal Navigators for Topology transformER. arXiv preprint arXiv:2407.15349, 2024

    Zhongxing Ma, Shuang Liang, Yongkun Wen, Weixin Lu, and Guowei Wan. RoadPainter: Points Are Ideal Navigators for Topology transformER. arXiv preprint arXiv:2407.15349, 2024. 17

  31. [31]

    TorchVision: PyTorch’s Computer Vision library, 2016

    TorchVision maintainers and contributors. TorchVision: PyTorch’s Computer Vision library, 2016. 14

  32. [32]

    Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d

    Jonah Philion and Sanja Fidler. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pages 194–210. Springer, 2020. 3, 12, 14

  33. [33]

    Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping. Advances in Neural Information Processing Systems, 36, 2024

    Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, and others. Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping. Advances in Neural Information Processing Systems, 36, 2024. 6

  34. [34]

    Exploring object-centric temporal modeling for efficient multi-view 3d object detection

    Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, and Xiangyu Zhang. Exploring object-centric temporal modeling for efficient multi-view 3d object detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3621–3631, 2023. 3

  35. [35]

    Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

    Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, and others. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint arXiv:2301.00493, 2023. 6

  36. [36]

    TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning. ICLR, 2024

    Dongming Wu, Jiahao Chang, Fan Jia, Yingfei Liu, Tiancai Wang, and Jianbing Shen. TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning. ICLR, 2024. 2, 8, 14, 17

  37. [37]

    M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation. arXiv preprint arXiv:2204.05088, 2022

    Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, and Jose M Alvarez. M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation. arXiv preprint arXiv:2204.05088, 2022. 3

  38. [38]

    Second: Sparsely embedded convolutional detection. Sensors, 18(10):3337, 2018

    Yan Yan, Yuxing Mao, and Bo Li. Second: Sparsely embedded convolutional detection. Sensors, 18(10):3337, 2018. Publisher: Multidisciplinary Digital Publishing Institute. 12

  39. [39]

    TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior. arXiv preprint arXiv:2411.14751,

    Sen Yang, Minyue Jiang, Ziwei Fan, Xiaolu Xie, Xiao Tan, Yingying Li, Errui Ding, Liang Wang, and Jingdong Wang. TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior. arXiv preprint arXiv:2411.14751,

  40. [40]

    FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models. arXiv preprint arXiv:2507.23325, 2025

    Yiming Yang, Hongbin Lin, Yueru Luo, Suzhong Fu, Chao Zheng, Xinrui Yan, Shuqi Mei, Kun Tang, Shuguang Cui, and Zhen Li. FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models. arXiv preprint arXiv:2507.23325, 2025. 3

  41. [41]

    TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving,

    Yiming Yang, Yueru Luo, Bingkun He, Hongbin Lin, Suzhong Fu, Chao Yan, Kun Tang, Xinrui Yan, Chao Zheng, Shuguang Cui, and Zhen Li. TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving,

  42. [42]

    arXiv:2507.00709 [cs]. 3

  43. [43]

    SMART: Advancing Scalable Map Priors for Driving Topology Reasoning. arXiv preprint arXiv:2502.04329, 2025

    Junjie Ye, David Paz, Hengyuan Zhang, Yuliang Guo, Xinyu Huang, Henrik I Christensen, Yue Wang, and Liu Ren. SMART: Advancing Scalable Map Priors for Driving Topology Reasoning. arXiv preprint arXiv:2502.04329, 2025. 3

  44. [44]

    Streammapnet: Streaming mapping network for vectorized online hd map construction

    Tianyuan Yuan, Yicheng Liu, Yue Wang, Yilun Wang, and Hang Zhao. Streammapnet: Streaming mapping network for vectorized online hd map construction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7356–7365, 2024. 1, 2, 3, 6, 16

  45. [45]

    DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

    Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M Ni, and Heung-Yeung Shum. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605, 2022. 14

  46. [46]

    Cross-view transformers for real-time map-view semantic segmentation

    Brady Zhou and Philipp Krähenbühl. Cross-view transformers for real-time map-view semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13760–13769, 2022. 3