Beyond Chamfer Distance: Granular Order-aware Evaluation Metric For Online Mapping

Adam Lilja; Chouaib Bencheikh Lehocine; Junsheng Fu; Lars Hammarstrand

arxiv: 2605.22578 · v1 · pith:GTH2KDZJnew · submitted 2026-05-21 · 💻 cs.CV

Pith reviewed 2026-05-22 06:57 UTC · model grok-4.3

classification 💻 cs.CV

keywords online mapping · evaluation metric · chamfer distance · polyline · order-aware · autonomous driving · nuScenes · detection quality

The pith

PLD and SOSPA rank online mapping methods differently from mAP and identify detection, rather than geometry, as the main performance bottleneck.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current evaluation relies on mean average precision with thresholded Chamfer distance, which ignores the order of points in polylines and polygons and gives only coarse signals about geometric accuracy. The paper introduces SOSPA as a single-instance measure that respects sequence order while satisfying metric properties, and PLD as a multi-instance framework that uses soft assignment to jointly score detection and localization quality. On nuScenes data, these metrics produce different rankings of leading methods such as MapTRv2, StreamMapNet and MapTracker, and decompose errors to show that detection failures outweigh geometric ones. A reader would care because sharper diagnostics can steer research toward the changes that actually improve map estimation for autonomous driving.
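The order-insensitivity of Chamfer distance can be checked on a toy polyline. The sketch below is illustrative only: it uses one common normalisation of the symmetric Chamfer distance (the average of the two one-way mean nearest-neighbour distances), and `indexwise_distance` is a hypothetical helper added purely for contrast, not a quantity from the paper.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point lists (one common convention)."""
    def one_way(src, dst):
        # Mean distance from each point in src to its nearest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (one_way(a, b) + one_way(b, a))

def indexwise_distance(a, b):
    """Mean distance between points that share the same index (order-sensitive)."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

# A straight polyline and the same points visited in a scrambled order.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
scrambled = [gt[2], gt[0], gt[3], gt[1]]

cd = chamfer_distance(gt, scrambled)
iw = indexwise_distance(gt, scrambled)
print(cd)  # 0.0: Chamfer distance sees two identical point sets
print(iw)  # 1.5: an order-sensitive comparison sees a different curve
```

Because the scrambled prediction contains exactly the ground-truth points, every nearest-neighbour distance is zero, so any threshold on Chamfer distance would accept it as a perfect match.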

Core claim

The central claim is that SOSPA provides an order-aware, axiom-satisfying replacement for Chamfer distance on individual map elements, while PLD replaces hard-threshold mAP with a soft joint measure of detection and geometry; together they yield rankings and error breakdowns on nuScenes that expose detection capability as the dominant current limitation.

What carries the argument

SOSPA (sequence optimal sub-pattern assignment) for order-preserving single-geometry comparison, and PLD (polyline localisation and detection) for soft multi-instance scoring that avoids binary thresholds.
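To make "order-preserving comparison" concrete, here is a minimal sketch of a monotone alignment cost computed with an edit-distance-style dynamic program (in the spirit of Wagner and Fischer [11]). It is not the paper's SOSPA definition, which this page does not reproduce. The function name `order_preserving_cost`, the capped match cost `min(dist, c)**p`, and the skip cost `c**p / 2` are all assumptions chosen for illustration.

```python
import math

def order_preserving_cost(a, b, c=1.5, p=1):
    """Cheapest monotone alignment between point sequences a and b.

    Assumed costs (illustrative, not from the paper):
      - aligning a[i] with b[j] costs min(dist, c) ** p
      - leaving a point unaligned costs c ** p / 2
    Alignments must preserve the order of both sequences.
    """
    n, m = len(a), len(b)
    skip = c ** p / 2
    # dp[i][j] = best cost of aligning the first i points of a with the first j of b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * skip
    for j in range(1, m + 1):
        dp[0][j] = j * skip
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = min(math.dist(a[i - 1], b[j - 1]), c) ** p
            dp[i][j] = min(
                dp[i - 1][j - 1] + match,  # align a[i-1] with b[j-1]
                dp[i - 1][j] + skip,       # leave a[i-1] unaligned
                dp[i][j - 1] + skip,       # leave b[j-1] unaligned
            )
    return dp[n][m]

gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(order_preserving_cost(gt, gt))        # 0.0 for identical sequences
print(order_preserving_cost(gt, gt[::-1]))  # positive: same points, reversed order
```

Unlike a nearest-neighbour set distance, this cost is non-zero when the same points are traversed in the opposite direction, which is the qualitative behaviour the page attributes to an order-aware measure.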

Load-bearing premise

The entire argument assumes that order-aware optimal assignment and soft matching capture geometric and detection quality more faithfully than hard-thresholded Chamfer mAP.

What would settle it

If PLD and traditional mAP produce identical method rankings and error trends when both are run on the same nuScenes predictions from MapTRv2, StreamMapNet and MapTracker, the claimed advantage in granularity would not hold.
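One way to run this check is to compare the two method rankings with a rank correlation. The sketch below computes Kendall's tau over per-method scores; the method names and numbers are placeholders, not results from the paper, and both metrics are assumed to be oriented the same way (if one is an error and the other a score, negate one of them first).

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall rank correlation (tau-a) between two score dicts keyed by method."""
    methods = sorted(scores_a)
    concordant = discordant = 0
    for m1, m2 in combinations(methods, 2):
        # Positive product: both metrics order the pair the same way.
        s = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(methods) * (len(methods) - 1) // 2
    return (concordant - discordant) / n_pairs

# Placeholder scores for three hypothetical methods (not real benchmark numbers).
metric_x = {"method_a": 0.60, "method_b": 0.65, "method_c": 0.70}
metric_y = {"method_a": 0.40, "method_b": 0.55, "method_c": 0.50}

print(kendall_tau(metric_x, metric_x))  # 1.0: identical rankings
print(kendall_tau(metric_x, metric_y))  # below 1.0: the metrics disagree on one pair
```

A tau of exactly 1.0 across methods and element classes would correspond to the "identical rankings" outcome described above; anything lower means the two metrics order at least one pair of methods differently.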

Figures

Figures reproduced from arXiv: 2605.22578 by Adam Lilja, Chouaib Bencheikh Lehocine, Junsheng Fu, Lars Hammarstrand.

**Figure 1.** CD-based mAP reports a perfect score of 1.0 for predictions diagonally shifted by 1.0 m from the ground truth, with misordering in one instance. Our proposed mPLD correctly penalizes such predictions, yielding a score that reflects the geometric degradation. (τ ∈ {1.0, 1.5, 2.0} is used for AP, and c = 1.5, p = 1 for PLD.)

… Additionally, it violates the triangle inequality, rendering it non-metric. The metri…

**Figure 2.** Illustrative examples highlighting the advantages of PLD. All confidences set to …

**Figure 3.** Qualitative example from MapTracker on R100, comparing AP to PLD for pedestrian crossing (blue), divider (red), and boundary (green). (τ ∈ {1.0, 1.5, 2.0} is used for AP, and c = 1.5, p = 1 for PLD. Confidences are shown on the figure.)

… the detection error $\bar{e}_{\text{det.}}$ (mDet.) dominates the localization error $\bar{e}^{(c,p)}_{\text{loc.}}$ (mLoc.) by a factor of two to four. Detection capability, i.e., missed instances and false…

**Figure 4.** Additional qualitative example from StreamMapNet on …

**Figure 5.** Additional qualitative example from StreamMapNet on …

**Figure 6.** Additional qualitative example from StreamMapNet on …

Original abstract

Online map estimation is a crucial component of autonomous driving systems that reduces the reliance on costly high-definition maps. State-of-the-art (SOTA) methods commonly predict map elements as ordered sequences of points that form polylines and polygons. The evaluation of these methods relies predominantly on mean average precision (mAP) based on thresholded Chamfer distance (CD). This framework lacks sensitivity to point ordering and provides limited granularity in assessing geometric quality, making it difficult to distinguish which methods truly excel over others. In this work, we address these limitations on two fronts. For the single-instance similarity measure, we introduce sequence optimal sub-pattern assignment (SOSPA), an order-aware metric that enables fine-grained evaluation of individual geometries while satisfying all metric axioms. For the multi-instance evaluation framework, we propose polyline localisation and detection (PLD), a soft metric that jointly captures detection quality and geometric accuracy, replacing the hard thresholding of mAP with a principled soft assignment. Through evaluations on nuScenes, we demonstrate that PLD effectively ranks SOTA online mapping methods (MapTRv2, StreamMapNet, MapTracker) while providing a decomposed error analysis. This analysis identifies detection capability as the dominant bottleneck in current methods, revealing a performance trend that mAP fails to capture. Code for evaluation using our metrics will be released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims to introduce SOSPA, a sequence optimal sub-pattern assignment metric for order-aware single-instance similarity of polylines/polygons that satisfies all metric axioms, and PLD, a soft-assignment multi-instance metric that jointly evaluates detection quality and geometric accuracy without hard thresholding. On nuScenes, PLD ranks SOTA methods (MapTRv2, StreamMapNet, MapTracker) and decomposes errors to identify detection capability as the dominant bottleneck, a trend not captured by mAP-based evaluation.

Significance. If the metrics are rigorously validated and the decomposition is shown to be robust, the work could improve evaluation practices in online mapping for autonomous driving by providing more granular, order-sensitive, and decomposable alternatives to thresholded Chamfer distance and mAP.

major comments (1)

[PLD definition and decomposed error analysis] The headline result that PLD identifies detection as the dominant bottleneck (and reveals trends invisible to mAP) depends on a decomposed error analysis derived from the joint soft assignment. Because the assignment simultaneously scores localisation and existence, any split into separate 'detection' and 'geometry' terms requires an explicit partitioning of the cost. Please provide the precise formula or procedure used for this decomposition (in the PLD definition section or associated equation) and demonstrate invariance to relative scaling of the geometric versus existence cost terms. Without this, the dominance conclusion risks being an artifact of the chosen weights rather than an intrinsic property of the evaluated methods.

minor comments (1)

[Abstract] The abstract states that SOSPA satisfies all metric axioms but provides no sketch or reference to the proof; a brief indication of which axioms are verified and where the verification appears would aid readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for the constructive feedback. Below we respond to the major comment regarding the PLD decomposed error analysis.

Point-by-point responses

Referee: The headline result that PLD identifies detection as the dominant bottleneck (and reveals trends invisible to mAP) depends on a decomposed error analysis derived from the joint soft assignment. Because the assignment simultaneously scores localisation and existence, any split into separate 'detection' and 'geometry' terms requires an explicit partitioning of the cost. Please provide the precise formula or procedure used for this decomposition (in the PLD definition section or associated equation) and demonstrate invariance to relative scaling of the geometric versus existence cost terms. Without this, the dominance conclusion risks being an artifact of the chosen weights rather than an intrinsic property of the evaluated methods.

Authors: We agree that an explicit description of the decomposition is essential to substantiate the claim. The PLD metric uses a joint soft assignment based on a cost that combines existence probability and geometric similarity via SOSPA. The decomposition separates the total error into detection (existence-related assignment costs for unmatched or falsely matched elements) and geometry (SOSPA costs for correctly assigned elements). We will provide the precise mathematical formulation of this partitioning in the PLD definition section. Additionally, to address the scaling invariance, we will include an analysis showing that the relative dominance of detection error persists under different weightings of the geometric and existence terms (e.g., varying the balance parameter by factors of 0.5x to 2x). This will be added to the experiments section of the revised manuscript.

revision: yes
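For orientation, the kind of partition under discussion already exists in the GOSPA metric of Rahmathullah et al. [9], which the paper lists among its references. The block below states GOSPA with α = 2 as background only. It is not the PLD definition, which this page does not reproduce, and PLD's actual partition may differ. Here X is the ground-truth set, Y the set of estimates, γ an assignment between them, d a base distance cut off at c, and p the order parameter.

```latex
% GOSPA with \alpha = 2 (background from [9]; not the paper's PLD formula).
d_p^{(c,2)}(X, Y) =
\left[
  \min_{\gamma \in \Gamma}
  \left(
    \underbrace{\sum_{(i,j) \in \gamma} d(x_i, y_j)^p}_{\text{localisation}}
    + \underbrace{\frac{c^p}{2} \bigl( |X| - |\gamma| \bigr)}_{\text{missed}}
    + \underbrace{\frac{c^p}{2} \bigl( |Y| - |\gamma| \bigr)}_{\text{false}}
  \right)
\right]^{1/p}
```

In this form the cutoff c sets the price of every unassigned element relative to the localisation cost, so a split of this kind depends on c by construction. That is why a sweep over the weighting, as the authors propose, is the natural robustness check for the dominance claim.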

Circularity Check

0 steps flagged

New metrics defined from first principles with independent empirical evaluation

full rationale

The paper proposes SOSPA as an order-aware single-instance metric satisfying metric axioms and PLD as a soft-assignment multi-instance framework that jointly scores detection and geometry while replacing mAP hard thresholds. These are introduced as novel constructions rather than derived from prior fitted parameters or self-cited equations. The decomposed error analysis and claim that detection is the dominant bottleneck are outputs of applying the metrics to nuScenes evaluations of MapTRv2, StreamMapNet, and MapTracker; they do not reduce by construction to the metric definitions themselves or to any self-citation chain. No self-definitional, fitted-input, or uniqueness-imported steps appear in the derivation. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two assertions that are not verified here: that SOSPA satisfies all metric axioms, and that soft assignment captures performance better than hard thresholds. No free parameters or invented entities are described.

axioms (2)

domain assumption: SOSPA satisfies all metric axioms.
Stated in the abstract as a property of the single-instance measure.

domain assumption: Soft assignment in PLD is a principled replacement for hard thresholding.
The abstract presents PLD as jointly capturing detection and geometry via soft assignment.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1] Q. Li, Y. Wang, Y. Wang, and H. Zhao. HDMapNet: An online HD map construction and evaluation framework. In Proc. 2022 IEEE Int. Conf. Robot. Autom. (ICRA), pages 4628–4634, 2022.

[2] Y. Liu, T. Yuan, Y. Wang, Y. Wang, and H. Zhao. VectorMapNet: End-to-end vectorized HD map learning. In Proc. 40th Int. Conf. Mach. Learn. (ICML), volume 202 of Proc. Mach. Learn. Res., pages 22352–22369. PMLR, 2023.

[3] B. Liao, S. Chen, X. Wang, T. Cheng, Q. Zhang, W. Liu, and C. Huang. MapTR: Structured modeling and learning for online vectorized HD map construction. In Proc. Int. Conf. Learn. Represent. (ICLR), 2023.

[4] B. Liao, S. Chen, Y. Zhang, B. Jiang, Q. Zhang, W. Liu, C. Huang, and X. Wang. MapTRv2: An end-to-end framework for online vectorized HD map construction. Int. J. Comput. Vis., pages 1–23, 2024.

[5] T. Yuan, Y. Liu, Y. Wang, Y. Wang, and H. Zhao. StreamMapNet: Streaming mapping network for vectorized online HD map construction. In Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV). IEEE, 2024.

[6] J. Chen, Y. Wu, J. Tan, H. Ma, and Y. Furukawa. MapTracker: Tracking with strided memory fusion for consistent vector HD mapping. In Proc. Eur. Conf. Comput. Vis. (ECCV), Lecture Notes in Computer Science. Springer, 2024. Oral presentation.

[7] H. Wang, T. Li, Y. Li, L. Chen, C. Sima, Z. Liu, B. Wang, P. Jia, Y. Wang, S. Jiang, F. Wen, H. Xu, P. Luo, J. Yan, W. Zhang, and H. Li. Openlane-v2: A topology reasoning benchmark for unified 3d HD mapping. In Proc. 37th Int. Conf. Neural Inf. Process. Syst. (NeurIPS), pages 827–838, New Orleans, LA, USA, 2023. Curran Associates Inc.

[8] D. Schuhmacher, B.-T. Vo, and B.-N. Vo. A consistent metric for performance evaluation of multi-object filters. IEEE Trans. Signal Process., 56(9):3447–3457, September 2008.

[9] A. S. Rahmathullah, Á. F. García-Fernández, and L. Svensson. Generalized optimal sub-pattern assignment metric. In Proc. 20th Int. Conf. Inf. Fusion (Fusion). IEEE, July 2017.

[10] Y. Xia, Á. F. García-Fernández, J. Karlsson, T. Yuan, K.-C. Chang, and L. Svensson. Probabilistic GOSPA: A metric for performance evaluation of multi-object filters with uncertainties. IEEE Trans. Aerosp. Electron. Syst., 2025.

[11] R. A. Wagner and M. J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168–178, 1974.

[12] M. Maes. On a cyclic string-to-string correction problem. Inf. Process. Lett., 35(2):73–78, 1990.

[13] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 11621–11631, 2020.

[14] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko. End-to-end object detection with transformers. In Proc. Eur. Conf. Comput. Vis. (ECCV), pages 213–229, Cham, Switzerland, 2020. Springer International Publishing.

[15] Á. F. García-Fernández, A. S. Rahmathullah, and L. Svensson. A metric on the space of finite sets of trajectories for evaluation of multi-target tracking algorithms. IEEE Trans. Signal Process., 68:3917–3928, 2020.

[16] J. Gu, Á. F. García-Fernández, R. E. Firth, and L. Svensson. Graph GOSPA metric: A metric to measure the discrepancy between graphs of different sizes. IEEE Trans. Signal Process., 72:4037–4049, 2024.

[17] Y. Li and B. Liu. A normalized Levenshtein distance metric. IEEE Trans. Pattern Anal. Mach. Intell., 29(6):1091–1095, 2007.

[18] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966.

[19] A. Marzal and E. Vidal. Computation of normalized edit distance and applications. IEEE Trans. Pattern Anal. Mach. Intell., 15(9):926–932, 1993.

[20] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Molecular Biology, 48(3):443–453, 1970.

[21] E. Vodolazskiy. Discrete Fréchet distance for closed curves. Comput. Geom., 111:101967, 2023.

[22] A. Marzal and S. Barrachina. Speeding up the computation of the edit distance for cyclic strings. In Proc. 15th Int. Conf. Pattern Recognit. (ICPR), volume 2, pages 891–894, 2000.

[23] A. Lilja, J. Fu, E. Stenborg, and L. Hammarstrand. Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). IEEE, 2024.

Table 4: Effect of sampling distance on mPLD (c = 1.5, p = 1) and CD-mAP benchmarked using StreamMapNet on R60. The runtime is based on ...
