pith. sign in

arxiv: 2606.03590 · v1 · pith:Y3JC5JYGnew · submitted 2026-06-02 · 💻 cs.RO

CANMOT: Class-Aware Noise Modeling for Multi-Object Tracking in Autonomous Driving

Pith reviewed 2026-06-28 09:56 UTC · model grok-4.3

classification 💻 cs.RO
keywords multi-object trackingKalman filterautonomous drivingnoise modelingclass-awareuncertainty calibrationnuScenesidentity switches
0
0 comments X

The pith

Class-specific noise covariances in Kalman filters cut identity switches in 3D multi-object tracking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper challenges the assumption that all traffic participants share identical process and measurement noise characteristics in Kalman filter tracking. It introduces class-specific diagonal covariance matrices that can be aligned to each object's own coordinate frame. Experiments on the nuScenes benchmark show gains in tracking accuracy and large reductions in identity switches compared to standard global-noise baselines. The approach also yields better uncertainty calibration as measured by ANEES and chi-squared tests, although inconsistency persists. These changes require no modification to the underlying filter equations themselves.

Core claim

Class-specific diagonal process and measurement covariance matrices, optionally expressed in the object coordinate frame, improve multi-object tracking performance and substantially reduce identity switches on nuScenes while also improving uncertainty consistency compared with global-noise Kalman filter baselines.

What carries the argument

Class-specific and object-aligned diagonal covariance matrices for process and measurement noise.

If this is right

  • Tracking performance improves over state-of-the-art baselines on nuScenes.
  • Identity switches decrease substantially.
  • Uncertainty estimates become better calibrated according to ANEES and chi-squared violation tests.
  • The underlying Kalman filter framework stays unchanged.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same class-aware diagonal modeling could be tested on other motion models or sensors without redesigning the filter.
  • Remaining inconsistency after the change points to possible value in allowing limited cross-class coupling or learned time variation.
  • If the per-class patterns hold across cities or weather conditions, the matrices could serve as a lightweight prior for new deployments.

Load-bearing premise

Noise characteristics of traffic participants are stationary within each class and can be captured by fixed diagonal covariance matrices.

What would settle it

A controlled experiment on a new dataset that applies non-diagonal or time-varying covariances and finds no tracking gain or worse calibration would falsify the benefit of the fixed class-specific diagonal model.

Figures

Figures reproduced from arXiv: 2606.03590 by Stefan Sch\"utte, Timo Osterburg, Torsten Bertram.

Figure 1
Figure 1. Figure 1: Comparative overview of sample covariance esti [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Qualitative comparison of tracking results of (a) CANMOT and (b) global sample covariance for a sample scene [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
read the original abstract

Kalman filter (KF)-based multi-object tracking (MOT) remains a strong baseline for autonomous driving due to its strong performance, computational efficiency and interpretability. In most practical systems, the process noise and measurement noise covariances are defined globally and shared across object classes, presuming identical uncertainty characteristics across heterogeneous traffic participants. This work revisits this assumption and proposes CANMOT, a class-aware and object-aligned noise modeling framework for KF-based 3D MOT. Class-specific diagonal process and measurement covariance matrices are introduced and optionally expressed in the object coordinate frame to preserve longitudinal-lateral anisotropy. Systematic experiments on the nuScenes benchmark show that class-aware and object-aligned noise modeling improves tracking performance and substantially reduces identity switches compared to state-of-the-art (SotA). In addition, the consistency of the estimated uncertainty is analyzed using the Average Normalized Estimation Error Squared (ANEES) and $\chi^2$-based violation tests. The results reveal severe overconfidence in standard KF-based MOT baselines. While the proposed formulation improves calibration without modifying the underlying filtering framework, it still exhibits substantial inconsistency, highlighting the need for further research in this area. Code is available at https://github.com/rst-tu-dortmund/learned-3d-nms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes CANMOT, a class-aware and object-aligned noise modeling approach for Kalman filter-based 3D multi-object tracking. It replaces globally shared process and measurement noise covariances with per-class diagonal matrices (optionally rotated into the object frame) and reports improved MOTA, fewer identity switches, and better uncertainty calibration on nuScenes compared to prior SotA baselines, while acknowledging remaining chi-squared and ANEES inconsistencies.

Significance. If the central performance claims hold after clarification, the work demonstrates that modest, interpretable changes to noise modeling can yield measurable gains in tracking metrics and calibration without altering the underlying KF framework. The public code release at the cited GitHub repository is a clear strength that supports reproducibility and follow-on work.

major comments (3)
  1. [Abstract, §3] Abstract and §3: The procedure used to obtain the class-specific diagonal covariance entries is not described (learned, hand-tuned, or cross-validated). If these values were selected or fitted using the same nuScenes splits employed for final evaluation, the reported gains risk circularity; an explicit statement of the data split and selection protocol is required to substantiate the claim that class-aware modeling drives the improvement.
  2. [§4.2, Table 2] §4.2 and Table 2: The ablation isolating the contribution of the object-frame alignment versus the per-class values themselves is missing. Without this decomposition it is unclear whether the reported reductions in ID switches are attributable to the proposed framework or simply to the introduction of additional free parameters.
  3. [§4.3] §4.3: The paper notes that CANMOT still exhibits “substantial inconsistency” on ANEES and χ² tests. Because the central claim links the observed metric gains to improved noise modeling, a quantitative analysis of how much of the MOTA/ID-switch improvement survives when the filter is forced to satisfy the consistency tests would strengthen the attribution.
minor comments (2)
  1. [§3.1] Notation for the rotated covariance matrices should be introduced with an explicit equation (e.g., Eq. (X)) rather than only in prose.
  2. [Figure 3] Figure 3 caption should state the exact number of sequences and frames used for the ANEES plots.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address each major comment below with clarifications and proposed changes to the manuscript.

read point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3: The procedure used to obtain the class-specific diagonal covariance entries is not described (learned, hand-tuned, or cross-validated). If these values were selected or fitted using the same nuScenes splits employed for final evaluation, the reported gains risk circularity; an explicit statement of the data split and selection protocol is required to substantiate the claim that class-aware modeling drives the improvement.

    Authors: We appreciate this observation. The class-specific diagonal entries were obtained by optimizing the covariance parameters via cross-validation on a held-out subset of the nuScenes training data, kept separate from the official validation and test splits used for all reported results. This protocol was designed to prevent any data leakage or circularity. We will revise the abstract and §3 to explicitly document the optimization procedure, the validation subset used for tuning, and confirmation that evaluation used only the standard nuScenes splits. revision: yes

  2. Referee: [§4.2, Table 2] §4.2 and Table 2: The ablation isolating the contribution of the object-frame alignment versus the per-class values themselves is missing. Without this decomposition it is unclear whether the reported reductions in ID switches are attributable to the proposed framework or simply to the introduction of additional free parameters.

    Authors: We agree that the requested ablation is necessary to isolate the sources of improvement. We will add a new ablation experiment in §4.2 that compares three configurations: (1) global shared covariances, (2) per-class diagonal covariances without object-frame rotation, and (3) the full CANMOT model with both per-class values and object-frame alignment. The results will be incorporated into an expanded Table 2 to quantify the separate contributions to MOTA and ID-switch reductions. revision: yes

  3. Referee: [§4.3] §4.3: The paper notes that CANMOT still exhibits “substantial inconsistency” on ANEES and χ² tests. Because the central claim links the observed metric gains to improved noise modeling, a quantitative analysis of how much of the MOTA/ID-switch improvement survives when the filter is forced to satisfy the consistency tests would strengthen the attribution.

    Authors: We acknowledge the value of strengthening the causal link. However, enforcing consistency constraints would require post-hoc adjustment of the learned covariances, which would no longer reflect the proposed class-aware modeling and could confound attribution to the framework itself. We will revise §4.3 to include additional discussion and any available quantitative correlations between the partial calibration improvements and the observed tracking gains, while retaining the honest reporting of remaining inconsistencies. This provides a clearer attribution without altering the core experimental setup. revision: partial

Circularity Check

0 steps flagged

No significant circularity; modeling choices evaluated on independent benchmark metrics

full rationale

The paper introduces class-specific diagonal covariance matrices as explicit design choices for KF-based MOT rather than deriving them from first principles or fitting them in a way that tautologically forces the reported tracking metrics. Evaluation relies on standard nuScenes benchmark scores (MOTA, identity switches, ANEES) that are downstream of the modeling decision and not equivalent to the covariance selection by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are described in the text. The central claim remains an empirical comparison on held-out tracking data, with the paper itself noting remaining calibration inconsistencies.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that noise statistics are class-stationary and diagonal (or diagonal after object-frame rotation). The per-class covariance entries themselves are free parameters whose values must be chosen or learned; no independent evidence for their correctness outside the nuScenes evaluation is supplied.

free parameters (2)
  • class-specific process noise diagonal entries
    One set of diagonal values per object class; chosen or fitted to data.
  • class-specific measurement noise diagonal entries
    One set of diagonal values per object class; chosen or fitted to data.
axioms (2)
  • domain assumption Kalman filter linear-Gaussian assumptions hold for each class separately
    Standard KF model is retained; class-specific covariances are the only change.
  • domain assumption Object classes are known and stable during tracking
    Class label is used to select the covariance matrix at each step.

pith-pipeline@v0.9.1-grok · 5761 in / 1517 out tokens · 17007 ms · 2026-06-28T09:56:31.215159+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 9 canonical work pages

  1. [1]

    Nuscenes: A multimodal dataset for autonomous driving,

    H. Caesar et al., “Nuscenes: A multimodal dataset for autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recogni- tion, 2020, pp. 11 621–11 631

  2. [2]

    Learnable online graph represen- tations for 3d multi-object tracking,

    J.-N. Zaech et al., “Learnable online graph represen- tations for 3d multi-object tracking,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 5103–5110, 2022.DOI:10.1109/LRA.2022.3145952

  3. [3]

    Development and analysis of digging and soil removing mechanisms for mole- bot: Bio-inspired mole-like drilling robot,

    X. Weng et al., “3d multi-object tracking: A baseline and new evaluation metrics,” in2020 IEEE/RSJ Inter- national Conference on Intelligent Robots and Systems (IROS), 2020, pp. 10 359–10 366.DOI:10 . 1109 / IROS45743.2020.9341164

  4. [4]

    Chiu et al.,Probabilistic 3d multi-object tracking for autonomous driving, 2020

    H.-k. Chiu et al.,Probabilistic 3d multi-object tracking for autonomous driving, 2020. arXiv:2001.05673 [cs.CV]. [Online]. Available:https://arxiv. org/abs/2001.05673

  5. [5]

    Tuning multi object tracking systems using bayesian optimization,

    T. Fleck and J. M. Zoellner, “Tuning multi object tracking systems using bayesian optimization,” in 2021 IEEE 24th International Conference on Infor- mation Fusion (FUSION), 2021, pp. 1–8.DOI:10. 23919/FUSION49465.2021.9626895

  6. [6]

    Weak in the NEES?: Auto-tuning kalman filters with bayesian optimization,

    Z. Chen et al., “Weak in the NEES?: Auto-tuning kalman filters with bayesian optimization,” in2018 21st International Conference on Information Fusion (FUSION), IEEE, Jul. 2018.DOI:10 . 23919 / icif.2018.8454982

  7. [7]

    RH20T: A comprehensive robotic dataset for learning diverse skills in one-shot

    H.-K. Chiu et al., “Probabilistic 3d multi-object co- operative tracking for autonomous driving via dif- ferentiable multi-sensor kalman filter,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 18 458–18 464.DOI:10 . 1109 / ICRA57147.2024.10610487

  8. [8]

    A multi-sensor fusion system for moving object detection and tracking in urban driv- ing environments,

    H. Cho et al., “A multi-sensor fusion system for moving object detection and tracking in urban driv- ing environments,” in2014 IEEE International Con- ference on Robotics and Automation (ICRA), 2014, pp. 1836–1843.DOI:10 . 1109 / ICRA . 2014 . 6907100

  9. [9]

    Poly-mot: A polyhedral framework for 3d multi-object tracking,

    X. Li et al., “Poly-mot: A polyhedral framework for 3d multi-object tracking,” in2023 IEEE/RSJ Interna- tional Conference on Intelligent Robots and Systems (IROS), 2023, pp. 9391–9398.DOI:10 . 1109 / IROS55552.2023.10341778

  10. [10]

    Liu et al.,Imm-mot: A novel 3d multi-object track- ing framework with interacting multiple model filter,

    X. Liu et al.,Imm-mot: A novel 3d multi-object track- ing framework with interacting multiple model filter,

  11. [11]

    09672 [cs.CV]

    arXiv:2502 . 09672 [cs.CV]. [Online]. Available:https : / / arxiv . org / abs / 2502 . 09672

  12. [12]

    Zak-OTFS for identification of linear time-varying s ystems,

    S. Wei, M. Liang, and F. Meyer, “Bayesian mul- tiobject tracking with neural-enhanced motion and measurement models,”IEEE Transactions on Signal Processing, pp. 1–16, 2026.DOI:10.1109/TSP. 2026.3654026

  13. [13]

    Bimanual robot-assisted dressing: A spherical coordinate-based strategy for tight-fitting garments

    X. Wang et al., “Mctrack: A unified 3d multi-object tracking framework for autonomous driving,” in2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025, pp. 4551–4558. DOI:10.1109/IROS60139.2025.11245874

  14. [14]

    A new approach to linear filtering and prediction problems,

    R. E. Kalman, “A new approach to linear filtering and prediction problems,” 1960

  15. [15]

    Scalability in perception for autonomous driving: Waymo open dataset,

    P. Sun et al., “Scalability in perception for autonomous driving: Waymo open dataset,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2446–2454

  16. [16]

    Center-based 3d object detection and tracking,

    T. Yin, X. Zhou, and P. Krahenbuhl, “Center-based 3d object detection and tracking,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 11 784–11 793

  17. [17]

    Ef- ficient global optimization of expensive black-box functions,

    D. R. Jones, M. Schonlau, and W. J. Welch, “Ef- ficient global optimization of expensive black-box functions,”Journal of Global optimization, vol. 13, no. 4, pp. 455–492, 1998