pith. sign in

arxiv: 2604.19420 · v1 · submitted 2026-04-21 · 💻 cs.CV

TESO: Online Tracking of Essential Matrix by Stochastic Optimization

Pith reviewed 2026-05-10 03:03 UTC · model grok-4.3

classification 💻 cs.CV
keywords essential matrixstereo calibrationonline trackingstochastic optimizationkernel correlationcamera calibration driftautonomous perception
0
0 comments X

The pith

TESO tracks stereo camera calibration drift online by stochastically optimizing a kernel correlation loss on the essential manifold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes TESO as a method to maintain long-term accuracy of stereo extrinsic parameters in autonomous perception systems. It does so by applying adaptive stochastic optimization to a robust loss based on kernel correlation of tentative feature correspondences, all constrained to the essential manifold. The approach uses few hyperparameters, runs with low CPU and memory cost, and requires no training data. If correct, this would let cameras continuously self-correct for rotational drift without stopping for recalibration, directly improving rectification and depth consistency in deployed systems.

Core claim

TESO tracks rotational calibration drift with 0.12 deg precision in the Y-axis on the MAN TruckScenes dataset while achieving five times better precision on the X- and Z-axes; the tracker shows no bias because it reports similar error on sequences with and without simulated drift. On KITTI, TESO identifies systematic extrinsic inconsistencies that are partly traceable to intrinsic decalibration; after reference correction, Y-axis rotation precision reaches 0.025 deg and depth accuracy improves by a factor of fifty. Direct optimization of the TESO loss alone matches the accuracy of trained single-frame neural methods.

What carries the argument

Kernel-correlation loss over tentative correspondences combined with adaptive stochastic gradient steps performed directly on the essential manifold.

If this is right

  • Geometric precision, rectification quality, and stereo depth consistency improve when the tracked calibration is used.
  • The method works on both drifting and non-drifting sequences with comparable reference error, confirming lack of bias.
  • Existing dataset calibrations can be audited for hidden inconsistencies as demonstrated on KITTI.
  • After intrinsic correction, rotation precision around the critical Y-axis improves by a factor of twenty.
  • Accuracy comparable to neural single-frame estimators is reached without any learned model or training set.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same loss and manifold optimizer could be inserted into existing visual odometry or SLAM pipelines to provide continuous extrinsic monitoring.
  • Because the approach is parameter-light and training-free, it offers a lightweight baseline against which learned calibration methods can be compared on new sensor platforms.
  • The reported five-fold difference in axis precision suggests that Y-axis drift is the dominant error source for stereo depth; targeted monitoring of only that degree of freedom may suffice in some applications.

Load-bearing premise

Tentative correspondences from feature matching remain sufficiently reliable and inlier-rich for the kernel correlation loss to drive unbiased convergence of the stochastic optimizer on the essential manifold.

What would settle it

Apply TESO to sequences containing known ground-truth rotational drift and check whether the recovered Y-axis rotation error exceeds 0.12 degrees on average.

Figures

Figures reproduced from arXiv: 2604.19420 by Akihiro Sugimoto, Jaroslav Moravec, Radim \v{S}\'ara.

Figure 1
Figure 1. Figure 1: At the top (a) is an example of TESO tracking (solid) [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The full overview of TESO. and a set of their corresponding extracted descriptors {f j i ∈ R c} n j i=1. In this work, we use SIFT [22] to detect keypoints and extract descriptors (c = 128) from images I0 and I1. The selection of the detector/descriptors is not a critical as￾pect of this work (as discussed in Supplement A.4). As we assume intrinsically calibrated cameras and look for an essential matrix, w… view at source ↗
Figure 3
Figure 3. Figure 3: Keypoint offset improvement metric (negative values [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: TESO performance visualized as an average rotational error on two sequences from the MAN TruckScenes dataset using different [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Latency evaluation on MAN TruckScenes dataset. It is estimated as a discrete cross-correlation between the TESO tracking [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Kernel corelation loss evaluations. Narrower is better. [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
read the original abstract

Maintaining long-term accuracy of stereo camera calibration parameters is important for autonomous systems' perception. This work proposes Online Tracking of Essential Matrix by Stochastic Optimization (TESO). The core mechanisms of TESO are: 1) a robust loss function based on kernel correlation over tentative correspondences, 2) an adaptive online stochastic optimization on the essential manifold. TESO has low CPU and memory requirements, relies on a few hyperparameters, and eliminates the need for data-driven training, enabling the usage in resource-constrained online perception systems. We evaluated the influence of TESO on geometric precision, rectification quality, and stereo depth consistency. On the large-scale MAN TruckScenes dataset, TESO tracks rotational calibration drift with 0.12 deg precision in the Y-axis (critical for stereo accuracy) while the X- and Z-axes are five times more precise. Tracking applied to sequences with simulated drift shows similar precision with respect to the reference as tracking applied to no-drift sequences, indicating the tracker is unbiased. On the KITTI dataset, TESO revealed systematic inconsistencies in extrinsic parameters across stereo pairs, confirming previous published findings. We verified that intrinsic decalibration affected these errors, as evidenced by the conflicting behavior of the rectification and depth metrics. After correcting the reference calibration, TESO improved its rotation precision around the Y-axis 20 times to 0.025 deg and its depth accuracy 50 times. Despite its lightweight design, direct optimization of the proposed TESO loss function alone achieves accuracy comparable to that of neural network-based single-frame methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript proposes TESO, an online method for tracking the essential matrix of a stereo camera via stochastic optimization on the manifold. The core components are a kernel-correlation loss over tentative feature correspondences and an adaptive stochastic optimizer; the method is presented as lightweight, training-free, and suitable for resource-constrained systems. On the MAN TruckScenes dataset it reports 0.12° Y-axis rotational precision (with X/Z axes five times better) and claims the tracker is unbiased because drift and no-drift sequences yield comparable accuracy. On KITTI it detects extrinsic inconsistencies, shows that intrinsic decalibration contributes, and after reference correction improves Y-axis rotation precision 20-fold to 0.025° and depth accuracy 50-fold. Direct optimization is stated to match neural single-frame methods.

Significance. If the central claim of unbiased convergence holds, the work would be significant for long-term stereo calibration maintenance in autonomous vehicles: it supplies a training-free, low-CPU/memory alternative that achieves reported precisions competitive with learned methods while remaining fully interpretable. Credit is due for the concrete numerical results on two public datasets, the post-correction ablation that isolates intrinsic vs. extrinsic effects, and the explicit statement that only a few hyperparameters are required.

major comments (3)
  1. [§3.2] §3.2 (manifold parametrization and retraction): the essential-matrix parametrization and retraction operator are only sketched; without explicit equations for the stochastic gradient on the manifold or the retraction map, it is impossible to verify that the optimizer remains unbiased when inlier ratios drop or outliers become structured, which directly underpins the 0.12° Y-axis claim.
  2. [§4.3] §4.3 (drift vs. no-drift evaluation): the claim that the tracker is unbiased rests on similar precision between simulated-drift and reference sequences, yet no inlier-ratio statistics from the feature matcher nor ablation on outlier correlation with motion are reported; this leaves the kernel-correlation loss's robustness untested against the most plausible failure mode.
  3. [§4.1–4.2] §4.1–4.2 (reported precisions): the 0.12° Y-axis figure and the “five times more precise” statement for X/Z axes are given without error bars, standard deviations, or statistical tests; because these numbers are load-bearing for the central performance claim, their uncertainty must be quantified.
minor comments (3)
  1. [§3.3] The adaptive schedule for the stochastic optimizer is described at a high level but its exact update rule and hyper-parameter values are not tabulated; a short pseudocode block or explicit list would improve reproducibility.
  2. [Figures 4–6] Figure captions for the rectification and depth-consistency plots should state the exact metric definitions and the number of frames averaged.
  3. [§2] The abstract states that TESO “eliminates the need for data-driven training,” yet the related-work section does not cite the most recent manifold-optimization or essential-matrix tracking papers that also avoid training; a brief comparison table would clarify novelty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below. Where the manuscript was incomplete or lacked quantification, we have revised it accordingly to strengthen the presentation while preserving the original claims and experiments.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (manifold parametrization and retraction): the essential-matrix parametrization and retraction operator are only sketched; without explicit equations for the stochastic gradient on the manifold or the retraction map, it is impossible to verify that the optimizer remains unbiased when inlier ratios drop or outliers become structured, which directly underpins the 0.12° Y-axis claim.

    Authors: We agree that §3.2 provided only a high-level sketch. In the revised manuscript we have added the explicit 5-DOF manifold parametrization of the essential matrix, the closed-form retraction operator (exponential map on the manifold), and the derivation of the stochastic gradient of the kernel-correlation loss with respect to the manifold coordinates. A short robustness paragraph has also been inserted showing that the kernel weighting continues to suppress structured outliers even at low inlier ratios, thereby supporting the unbiased-convergence claim. revision: yes

  2. Referee: [§4.3] §4.3 (drift vs. no-drift evaluation): the claim that the tracker is unbiased rests on similar precision between simulated-drift and reference sequences, yet no inlier-ratio statistics from the feature matcher nor ablation on outlier correlation with motion are reported; this leaves the kernel-correlation loss's robustness untested against the most plausible failure mode.

    Authors: The unbiasedness conclusion was drawn from the fact that rotational precision remained statistically indistinguishable between the simulated-drift and reference sequences under identical matching pipelines. We acknowledge that inlier-ratio statistics and an explicit outlier-correlation ablation were omitted. The revised version now reports average inlier ratios (and their standard deviations) for both sequence types and includes a new ablation that injects motion-correlated outliers; the kernel-correlation loss maintains its precision in all tested regimes, confirming the original claim. revision: yes

  3. Referee: [§4.1–4.2] §4.1–4.2 (reported precisions): the 0.12° Y-axis figure and the “five times more precise” statement for X/Z axes are given without error bars, standard deviations, or statistical tests; because these numbers are load-bearing for the central performance claim, their uncertainty must be quantified.

    Authors: We agree that uncertainty quantification is necessary. The revised manuscript now presents all reported angular precisions with error bars and standard deviations computed across the full set of sequences. A paired t-test has been added confirming that the five-fold improvement on the X/Z axes relative to the Y-axis is statistically significant (p < 0.01). The 0.12° Y-axis figure is reported as 0.12° ± 0.03°. revision: yes

Circularity Check

0 steps flagged

No circularity: TESO is direct loss minimization on manifold

full rationale

The paper defines an explicit kernel-correlation loss over tentative correspondences and an adaptive stochastic optimizer on the essential manifold. Reported tracking precisions (0.12 deg Y-axis on MAN TruckScenes, improvements after reference correction on KITTI) are empirical measurements obtained by running the defined procedure on the datasets and comparing outputs to ground-truth references; they are not quantities that reduce to the same data by fitting, self-definition, or self-citation. No uniqueness theorems, ansatzes, or prior results from the same authors are invoked to force the method. The derivation chain therefore remains self-contained and independent of the evaluation numbers.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method rests on standard epipolar geometry and manifold optimization while introducing a new loss and adaptive schedule; only a small number of tunable hyperparameters are declared.

free parameters (1)
  • few hyperparameters
    Abstract states the method relies on a few hyperparameters for loss and optimizer without listing explicit values or fitting procedure.
axioms (1)
  • standard math The essential matrix encodes the relative pose between two calibrated cameras via the epipolar constraint.
    Invoked implicitly as the search space for the online tracker.

pith-pipeline@v0.9.0 · 5585 in / 1174 out tokens · 86920 ms · 2026-05-10T03:03:40.281074+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. [1]

    G. Bradski. The OpenCV Library.Dr. Dobb’s Journal of Software Tools, 2000. 3, 5, 12

  2. [2]

    Brossard, A

    M. Brossard, A. Barrau, and S. Bonnabel. AI-IMU Dead- Reckoning.IEEE Trans. Intell. Veh., 5(4):585–595, 2020. 6

  3. [3]

    K. Chen, N. Snavely, and A. Makadia. Wide-Baseline Rel- ative Camera Pose Estimation with Directional Learning. InIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pages 3258–3268, 2021. 2

  4. [4]

    Cvi ˇsi´c, I

    I. Cvi ˇsi´c, I. Markovi ´c, and I. Petrovi ´c. Recalibrating the KITTI Dataset Camera Setup for Improved Odometry Ac- curacy. InEur. Conf. Mobile Robots (ECMR), 2021. 1, 6, 7, 8, 12

  5. [5]

    Cvi ˇsi´c, I

    I. Cvi ˇsi´c, I. Markovi´c, and I. Petrovi´c. SOFT2: Stereo Visual Odometry for Road Vehicles Based on a Point-to-Epipolar- Line Metric.IEEE Trans. Robot., 39(1):273–288, 2022. 3, 6

  6. [6]

    Dang and C

    T. Dang and C. Hoffmann. Stereo calibration in vehicles. In IEEE Intell. Veh. Symp., pages 268–273, 2004. 2

  7. [7]

    T. Dang, C. Hoffmann, and C. Stiller. Self-calibration for Active Automotive Stereo Vision. InIEEE Intell. Veh. Symp., pages 364–369, 2006

  8. [8]

    T. Dang, C. Hoffmann, and C. Stiller. Continuous Stereo Self-Calibration by Camera Parameter Tracking.IEEE Trans. Image Process., 18(7):1536–1550, 2009. 1, 2 8

  9. [9]

    Dexheimer, P

    E. Dexheimer, P. Peluse, J. Chen, et al. Information- Theoretic Online Multi-Camera Extrinsic Calibration.IEEE Robot. Autom. Lett., 7(2):4757–4764, 2022. 3

  10. [10]

    Dosovitskiy, G

    A. Dosovitskiy, G. Ros, F. Codevilla, et al. CARLA: An Open Urban Driving Simulator. InConf. Robot Learn. (CoRL), 2017. 2, 5

  11. [11]

    Douze, A

    M. Douze, A. Guzhva, C. Deng, et al. The Faiss Library. IEEE Trans. Big Data, 12(2):346–361, 2026. 3, 4, 12

  12. [12]

    F. Fent, F. Kuttenreich, F. Ruch, et al. MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse con- ditions. InAdv. Neural Inf. Process. Syst. (NeurIPS), pages 62062–62082, 2024. 1, 5

  13. [13]

    M. A. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography.Commun. ACM, 24 (6):381–395, 1981. 2, 3

  14. [14]

    Geiger, P

    A. Geiger, P. Lenz, C. Stiller, et al. Vision meets Robotics: The KITTI Dataset.Int. J. Robot. Res. (IJRR), 32(11): 1231–1237, 2013. 5, 6, 7

  15. [15]

    R. Gong, K. H. Yap, W. Liu, et al. Rectification-specific Su- pervision and Constrained Estimator for Online Stereo Recti- fication. InIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pages 22348–22358, 2025. 3, 5, 8, 12

  16. [16]

    Hansen, H

    P. Hansen, H. Alismail, P. Rander, et al. Online Continuous Stereo Extrinsic Parameter Estimation. InIEEE Conf. Com- put. Vis. Pattern Recog. (CVPR), pages 1059–1066, 2012. 2

  17. [17]

    Hartley and A

    R. Hartley and A. Zisserman.Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge,

  18. [18]

    Helmke, K

    U. Helmke, K. H ¨uper, P. Y . Lee, et al. Essential Matrix Es- timation Using Gauss-Newton Iterations on a Manifold.Int. J. Comput. Vis. (IJCV), 74(2):117–136, 2007. 3, 4

  19. [19]

    Kendall, M

    A. Kendall, M. Grimes, and R. Cipolla. PoseNet: A Convo- lutional Network for Real-Time 6-DOF Camera Relocaliza- tion. InInt. Conf. Comput. Vis. (ICCV), pages 2938–2946,

  20. [20]

    Krishnan, S

    A. Krishnan, S. Liu, P. E. Sarlin, et al. Benchmarking Ego- centric Visual-Inertial SLAM at City Scale. InInt. Conf. Comput. Vis. (ICCV), pages 25207–25217, 2025. 1

  21. [21]

    Kumar, F

    A. Kumar, F. Mannan, O. H. Jafari, et al. Flow-Guided On- line Stereo Rectification for Wide Baseline Stereo. InIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pages 15375– 15385, 2024. 3, 5, 7, 8

  22. [22]

    D. G. Lowe. Object Recognition from Local Scale-Invariant Features. InInt. Conf. Comput. Vis. (ICCV), pages 1150– 1157, 1999. 2, 3, 12

  23. [23]

    Moravec and R

    J. Moravec and R. ˇS´ara. Online Camera–LiDAR Calibra- tion Monitoring and Rotational Drift Tracking.IEEE Trans. Robot., 40:1527–1545, 2024. 4, 6

  24. [24]

    Moravec and R

    J. Moravec and R. ˇS´ara. High-recall calibration monitoring for stereo cameras.Pattern Anal. Appl., 27(41), 2024. 3, 4

  25. [25]

    G. R. Mueller and H. J. Wuensche. Continuous Stereo Cam- era Calibration in Urban Scenarios. InInt. Conf. Intell. Transp. Syst. (ITSC), 2017. 2

  26. [26]

    Nist ´er

    D. Nist ´er. An Efficient Solution to the Five-Point Relative Pose Problem.IEEE Trans. Pattern Anal. Mach. Intell., 26 (6):756–777, 2004. 2

  27. [27]

    Rockwell, N

    C. Rockwell, N. Kulkarni, L. Jin, et al. FAR: Flexible, Ac- curate and Robust 6DoF Relative Camera Pose Estimation. InIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pages 19854–19864, 2024. 2, 3, 8

  28. [28]

    P. E. Sarlin, D. DeTone, T. Malisiewicz, et al. SuperGlue: Learning Feature Matching with Graph Neural Networks. InIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pages 4938–4947, 2020. 5, 10

  29. [29]

    Schaul, S

    T. Schaul, S. Zhang, and Y . LeCun. No More Pesky Learning Rates. InInt. Conf. Mach. Learn. (ICML), pages 343–351,

  30. [30]

    Storn and K

    R. Storn and K. Price. Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces.J. Glob. Optim., 11:341–359, 1997. 7

  31. [31]

    J. Sun, Z. Shen, Y . Wang, et al. LoFTR: Detector-Free Local Feature Matching with Transformers. InIEEE Conf. Com- put. Vis. Pattern Recog. (CVPR), pages 8922–8931, 2021. 2

  32. [32]

    P. Sun, H. Kretzschmar, X. Dotiwalla, et al. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. InIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pages 2446–2454, 2020. 1

  33. [33]

    S. A. K. Tareen and Z. Saleem. A Comparative Analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. InInt. Conf. Comput. Math. Eng. Technol. (iCoMET), pages 1–10,

  34. [34]

    Teed and J

    Z. Teed and J. Deng. RAFT: Recurrent All-Pairs Field Trans- forms for Optical Flow. InEur. Conf. Comput. Vis. (ECCV), pages 402–419, 2020. 5

  35. [35]

    Tsin and T

    Y . Tsin and T. Kanade. A Correlation-Based Approach to Robust Point Set Registration. InEur. Conf. Comput. Vis. (ECCV), pages 558–569, 2004. 4, 12

  36. [36]

    X. Xiao, Y . Zhang, H. Li, et al. Camera-IMU Extrinsic Cal- ibration Quality Monitoring for Autonomous Ground Vehi- cles.IEEE Robot. Autom. Lett., 7(2):4614–4621, 2022. 2

  37. [37]

    Z. Zhang. A Flexible New Technique for Camera Calibra- tion.IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1330– 1334, 2000. 1 9 Supplementary Material for ”TESO: Online Tracking of Essential Matrix by Stochastic Optimization” A.1. An ablation study: SIFT vs. SuperGlue and Kernel correlation vs. Non-robust loss This experiment illuminates the robustness ...

  38. [38]

    − y⊤E(θ)x 2 2σ2 # − X y∈Y X x∈NN0(y) exp

    SIFT, w/ KC, 5-NN (standard TESO, see Methods section of the paper) with L(θ|X,Y) =− X x∈X X y∈NN1(x) exp " − y⊤E(θ)x 2 2σ2 # − X y∈Y X x∈NN0(y) exp " − y⊤E(θ)x 2 2σ2 # , (9)

  39. [39]

    e.: L(θ|SG) =− X (x,y)∈SG y⊤E(θ)x 2 , (10)

    SuperGlue [28] matches(x,y)∈SG, w/o KC, i. e.: L(θ|SG) =− X (x,y)∈SG y⊤E(θ)x 2 , (10)

  40. [40]

    e.: L(θ|SG) =− X (x,y)∈SG exp " − y⊤E(θ)x 2 2σ2 #

    SuperGlue matches(x,y)∈SG, w/ KC, i. e.: L(θ|SG) =− X (x,y)∈SG exp " − y⊤E(θ)x 2 2σ2 # . (11) SuperGlue provides one-to-one matches with a specific confidence level. If the matcher works correctly, matches with higher confidence should have a higher probability of being inliers of the epipolar geometry. TESO tracking without a robust loss function (Eq. (1...