VOCA: Visual Odometry with Codec Awareness
Pith reviewed 2026-07-02 19:21 UTC · model grok-4.3
The pith
VOCA improves causal stereo visual odometry on compressed video by incorporating codec information to reduce artifacts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VOCA is a causal stereo visual-odometry method that exploits codec information to improve tracking performance. It achieves state-of-the-art performance on causal VO for relative trajectory error, efficiency, and absolute trajectory error on compressed streams.
What carries the argument
The VOCA pipeline, which feeds codec-derived information into a causal stereo visual-odometry estimator to mitigate compression artifacts during pose tracking.
If this is right
- Real-world spatial world models that receive compressed video can maintain higher tracking accuracy without increasing bandwidth or storage.
- Perception modules inside planning systems can operate directly on hardware-decoded streams rather than requiring raw frames.
- Efficiency gains allow longer operation on resource-constrained platforms that already use video codecs.
Where Pith is reading between the lines
- The same codec signals could be tested on other perception tasks such as depth estimation or object tracking that currently assume uncompressed input.
- Extending the approach beyond stereo to monocular or multi-camera setups would test whether codec awareness generalizes when baseline geometry changes.
- Integration with learned feature detectors might further reduce reliance on hand-crafted handling of compression noise.
Load-bearing premise
That codec information can be extracted and leveraged in a causal stereo setup to measurably mitigate compression artifacts and produce superior trajectory estimates compared to existing methods on standard benchmarks.
What would settle it
Running VOCA and prior causal stereo methods on the same compressed benchmark streams while withholding all codec metadata and measuring whether the performance gap disappears.
Figures
read the original abstract
Camera pose estimation from image streams is a critical component of spatial world models that integrate perception into planning and decision-making. Nearly all Visual Odometry (VO) and Simultaneous Localization and Mapping (V-SLAM) systems have focused on datasets containing raw, uncompressed videos. Many working systems instead use ubiquitous hardware units to efficiently compress and decode video streams, saving orders of magnitude in storage and bandwidth. However, this lossy compression introduces visual artifacts that hinder the performance of traditional tracking systems. We present VOCA, a causal stereo visual-odometry method that exploits codec information to improve tracking performance. We achieve state-of-the-art performance on causal VO for relative trajectory error, efficiency, and absolute trajectory error on compressed streams. This work highlights the potential of leveraging widely available video codec information for vision tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents VOCA, a causal stereo visual-odometry method that exploits codec information to improve tracking performance. It claims state-of-the-art results for relative trajectory error, efficiency, and absolute trajectory error on compressed streams, highlighting the use of widely available video codec side information for vision tasks.
Significance. If the results hold, the work addresses a practical gap by showing how codec side information can mitigate compression artifacts in causal stereo VO, which could improve robustness and efficiency in real-world systems that rely on compressed video rather than raw streams.
major comments (1)
- [Abstract] Abstract: the central claim of achieving SOTA performance on RTE, efficiency, and ATE for causal VO on compressed streams is unsupported, as the text provides no experimental setup, datasets, baselines, quantitative results, or validation details.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the practical relevance of exploiting codec side information in causal stereo visual odometry. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim of achieving SOTA performance on RTE, efficiency, and ATE for causal VO on compressed streams is unsupported, as the text provides no experimental setup, datasets, baselines, quantitative results, or validation details.
Authors: Abstracts are concise summaries and do not contain experimental details by design. The full manuscript provides the requested information: Section 4 describes the experimental setup and datasets (including compressed streams from standard benchmarks), Section 5 details the baselines and quantitative comparisons, and Tables 1-3 report the RTE, efficiency, and ATE results demonstrating state-of-the-art performance for causal VO. These sections validate the abstract claims with specific metrics and ablation studies. revision: no
Circularity Check
No significant circularity
full rationale
The paper describes an empirical method for causal stereo visual odometry that incorporates codec side information to mitigate compression artifacts. No derivation chain, equations, fitted parameters, uniqueness theorems, or ansatzes are presented in the provided text. Claims reduce to experimental results on standard benchmarks (RTE, ATE, efficiency), which are externally falsifiable and not equivalent to inputs by construction. No self-citation load-bearing steps or renamings of known results appear. This is the expected outcome for a systems/implementation paper without mathematical derivations.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
In: 2024 33rd International Conference on Computer Communications and Networks (ICCCN)
Arunruangsirilert, K., Katto, J.: Evaluation of hardware-based video encoders on modern gpus for uhd live-streaming. In: 2024 33rd International Conference on Computer Communications and Networks (ICCCN). pp. 1–9 (2024).https:// doi.org/10.1109/ICCCN61486.2024.10637525
-
[2]
Bahl, S., Mendonca, R., Chen, L., Jain, U., Pathak, D.: Affordances from Human Videos as a Versatile Representation for Robotics. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 01–13. IEEE, Vancou- ver, BC, Canada (Jun 2023).https://doi.org/10.1109/CVPR52729.2023.01324
-
[3]
com- press highlights, lift midtones
Banerjee, P., Shkodrani, S., Moulon, P., Hampali, S., Han, S., Zhang, F., Zhang, L., Fountain, J., Miller, E., Basol, S., Newcombe, R., Wang, R., Engel, J.J., Hodan, T.: HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos. In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 7061–7071 (Jun 2025).https://d...
-
[4]
In: Euro- pean conference on computer vision
Bay, H., Tuytelaars, T., Van Gool, L.: Surf: Speeded up robust features. In: Euro- pean conference on computer vision. pp. 404–417. Springer (2006)
2006
-
[5]
Bross, B., Wang, Y.K., Ye, Y., Liu, S., Chen, J., Sullivan, G.J., Ohm, J.R.: Overview of the versatile video coding (vvc) standard and its applications. IEEE Transactions on Circuits and Systems for Video Technology31(10), 3736–3764 (2021).https://doi.org/10.1109/TCSVT.2021.3101953
-
[6]
The International Journal of Robotics Research35(10), 1157–1163 (2016)
Burri, M., Nikolic, J., Gohl, P., Schneider, T., Rehder, J., Omari, S., Achtelik, M.W., Siegwart, R.: The euroc micro aerial vehicle datasets. The International Journal of Robotics Research35(10), 1157–1163 (2016)
2016
-
[7]
IEEE transactions on robotics37(6), 1874–1890 (2021)
Campos, C., Elvira, R., Rodríguez, J.J.G., Montiel, J.M., Tardós, J.D.: Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE transactions on robotics37(6), 1874–1890 (2021)
2021
-
[8]
Carlone, L., Kim, A., Barfoot, T., Cremers, D., Dellaert, F.: Slam handbook: From localization and mapping to spatial intelligence (2025)
2025
-
[9]
In: Proceedings of the Computer Vision and Pattern Recognition Conference (2025)
Chen, H., Sun, B., Zhang, A., Pollefeys, M., Leutenegger, S.: VidBot: Learning generalizable 3d actions from in-the-wild 2d human videos for zero-shot robotic manipulation. In: Proceedings of the Computer Vision and Pattern Recognition Conference (2025)
2025
-
[10]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Chen, W., Chen, L., Wang, R., Pollefeys, M.: Leap-vo: Long-term effective any point tracking for visual odometry. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19844–19853 (2024)
2024
-
[11]
In: 2026 International Conference on 3D Vision (3DV)
Chi, Y., Sommer, L., Dünkel, O., Muhle, D., Cremers, D., Theobalt, C., Ko- rtylewski, A.: C3po: Canonicalization of 3d pose from partial views with gener- alizable correspondence features. In: 2026 International Conference on 3D Vision (3DV). pp. 587–597. IEEE (2026)
2026
-
[12]
Chng, C.K., Parra, A., Chin, T.J., Latif, Y.: Monocular rotational odometry with incrementalrotationaveragingandloopclosure.In:2020DigitalImageComputing: Techniques and Applications (DICTA). pp. 1–8. IEEE (2020)
2020
-
[13]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Cin, A.P.D., Dikov, G., Ju, J., Ghafoorian, M.: Anymap: Learning a general cam- era model for structure-from-motion with unknown distortion in dynamic scenes. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16674–16684 (2025)
2025
-
[14]
Deng, K., Ti, Z., Xu, J., Yang, J., Xie, J.: Vggt-long: Chunk it, loop it, align it–pushing vggt’s limits on kilometer-scale long rgb sequences. arXiv preprint arXiv:2507.16443 (2025) VOCA: Visual Odometry with Codec Awareness 17
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[15]
In: Proceedings of the IEEE conference on com- puter vision and pattern recognition workshops
DeTone, D., Malisiewicz, T., Rabinovich, A.: Superpoint: Self-supervised interest point detection and description. In: Proceedings of the IEEE conference on com- puter vision and pattern recognition workshops. pp. 224–236 (2018)
2018
-
[16]
IEEE transactions on pattern analysis and machine intelligence40(3), 611–625 (2017)
Engel, J., Koltun, V., Cremers, D.: Direct sparse odometry. IEEE transactions on pattern analysis and machine intelligence40(3), 611–625 (2017)
2017
-
[17]
In: European conference on computer vision
Engel, J., Schöps, T., Cremers, D.: Lsd-slam: Large-scale direct monocular slam. In: European conference on computer vision. pp. 834–849. Springer (2014)
2014
-
[18]
In: 2020 IEEE International Conference on Robotics and Automation (ICRA)
Geneva, P., Eckenhoff, K., Lee, W., Yang, Y., Huang, G.: Openvins: A research platform for visual-inertial estimation. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). pp. 4666–4672. IEEE (2020)
2020
-
[19]
Advances in Neural Information Processing Systems38, 4989–5014 (2026)
Gross, M., Fahmy, A., Niwattananan, D., Muhle, D., Song, R., Cremers, D., Meeß, H.: Ipformer: Visual 3d panoptic scene completion with context-adaptive instance proposals. Advances in Neural Information Processing Systems38, 4989–5014 (2026)
2026
-
[20]
Proceedings of the IEEE109(9), 1435–1462 (2021)
Han, J., Li, B., Mukherjee, D., Chiang, C.H., Grange, A., Chen, C., Su, H., Parker, S., Deng, S., Joshi, U., Chen, Y., Wang, Y., Wilkins, P., Xu, Y., Bankoski, J.: A technical overview of av1. Proceedings of the IEEE109(9), 1435–1462 (2021). https://doi.org/10.1109/JPROC.2021.3058584
-
[21]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Han, K., Muhle, D., Wimbauer, F., Cremers, D.: Boosting self-supervision for single-view scene completion via knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9837– 9847 (2024)
2024
-
[22]
Cam- bridge University Press, Cambridge, 2 edn
Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cam- bridge University Press, Cambridge, 2 edn. (2004).https://doi.org/10.1017/ CBO9780511811685
2004
-
[23]
In: 2024 International Conference on 3D Vision (3DV)
Hayler, A., Wimbauer, F., Muhle, D., Rupprecht, C., Cremers, D.: S4c: Self- supervised semantic scene completion with neural fields. In: 2024 International Conference on 3D Vision (3DV). pp. 409–420. IEEE (2024)
2024
-
[24]
In: European Wireless 2023; 28th European Wireless Conference
Hofer, J., et al.: H.264 Compress-Then-Analyze Transmission in Edge-Assisted Visual SLAM. In: European Wireless 2023; 28th European Wireless Conference. pp. 130–135 (2023)
2023
-
[25]
Hsiao, Y.M., Lee, J.F., Chen, J.S., Chu, Y.S.: Review: H.264 video transmis- sions over wireless networks: Challenges and solutions. Comput. Commun.34(14), 1661–1672 (Sep 2011).https://doi.org/10.1016/j.comcom.2011.03.016
-
[26]
International Telecommunication Union: ITU-T Recommendation H.262: Informa- tion technology – Generic coding of moving pictures and associated audio infor- mation: Video.https://www.itu.int/rec/T- REC- H.262(Jan 2021), accessed: 2026-02-08
2021
-
[27]
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp
Ji, Y., Tan, H., Shi, J., Hao, X., Zhang, Y., Zhang, H., Wang, P., Zhao, M., Mu, Y., An,P.,Xue,X.,Su,Q.,Lyu,H.,Zheng,X.,Liu,J.,Wang,Z.,Zhang,S.:Robobrain: A unified brain model for robotic manipulation from abstract to concrete. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 1724–1734 (2025)
2025
-
[28]
In: 2014 IEEE Conference on Computer Vision and Pattern Recognition
Kantorov, V., Laptev, I.: Efficient feature extraction, encoding, and classification for action recognition. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. pp. 2593–2600 (2014).https://doi.org/10.1109/CVPR.2014.332
-
[29]
Keetha, N., Müller, N., Schönberger, J., Porzi, L., Zhang, Y., Fischer, T., Knapitsch, A., Zauss, D., Weber, E., Antunes, N., et al.: Mapanything: Universal feed-forward metric 3d reconstruction; map-anything. github. io. In: 2026 Interna- tional Conference on 3D Vision (3DV). pp. 499–509. IEEE (2026) 18 N. Hilscher et al
2026
-
[30]
In: 2007 6th IEEE and ACM international symposium on mixed and augmented reality
Klein, G., Murray, D.: Parallel tracking and mapping for small ar workspaces. In: 2007 6th IEEE and ACM international symposium on mixed and augmented reality. pp. 225–234. IEEE (2007)
2007
-
[31]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Lee, S.H., Civera, J.: Rotation-only bundle adjustment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 424– 433 (2021)
2021
-
[32]
arXiv preprint arXiv:2202.09199 (2022)
Leutenegger, S.: Okvis2: Realtime scalable visual-inertial slam with loop closure. arXiv preprint arXiv:2202.09199 (2022)
-
[33]
In: 2011 International conference on computer vision
Leutenegger, S., Chli, M., Siegwart, R.Y.: Brisk: Binary robust invariant scalable keypoints. In: 2011 International conference on computer vision. pp. 2548–2555. Ieee (2011)
2011
-
[34]
Depth Anything 3: Recovering the Visual Space from Any Views
Lin, H., Chen, S., Liew, J., Chen, D.Y., Li, Z., Shi, G., Feng, J., Kang, B.: Depth anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[35]
IEEE Transactions on Circuits and Systems for Video Technology30(11), 3898–3910 (2020)
Lin, L., Yu, S., Zhou, L., Chen, W., Zhao, T., Wang, Z.: Pea265: Perceptual assess- ment of video compression artifacts. IEEE Transactions on Circuits and Systems for Video Technology30(11), 3898–3910 (2020)
2020
-
[36]
Liou, M.: Overview of the p×64 kbit/s video coding standard. Commun. ACM 34(4), 59–63 (Apr 1991).https://doi.org/10.1145/103085.103091
-
[37]
Interna- tional journal of computer vision60(2), 91–110 (2004)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Interna- tional journal of computer vision60(2), 91–110 (2004)
2004
-
[38]
In: Hayes, P.J
Lucas, B.D., Kanade, T.: An iterative image registration technique with an appli- cation to stereo vision. In: Hayes, P.J. (ed.) Proceedings of the 7th International Joint Conference on Artificial Intelligence, IJCAI ’81, Vancouver, BC, Canada, August 24-28, 1981. pp. 674–679. William Kaufmann (1981)
1981
-
[39]
In: 2025 IEEE/RSJ International Conference on Intelli- gent Robots and Systems (IROS)
de Mayo, M., Cremers, D., Pire, T.: The monado slam dataset for egocentric visual-inertial tracking. In: 2025 IEEE/RSJ International Conference on Intelli- gent Robots and Systems (IROS). pp. 13111–13118. IEEE (2025)
2025
-
[40]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Muhle, D., Koestler, L., Demmel, N., Bernard, F., Cremers, D.: The probabilistic normal epipolar constraint for frame-to-frame rotation optimization under uncer- tain feature positions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1819–1828 (2022)
2022
-
[41]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Muhle, D., Koestler, L., Jatavallabhula, K.M., Cremers, D.: Learning correspon- dence uncertainty via differentiable nonlinear least squares. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13102– 13112 (2023)
2023
-
[42]
In: 2013 Picture Coding Symposium (PCS)
Mukherjee, D., Bankoski, J., Grange, A., Han, J., Koleszar, J., Wilkins, P., Xu, Y., Bultje, R.: The latest open-source video codec vp9 - an overview and preliminary results. In: 2013 Picture Coding Symposium (PCS). pp. 390–393 (2013).https: //doi.org/10.1109/PCS.2013.6737765
-
[43]
Transactions on Robotics (T-RO)31(5), 1147–1163 (2015).https://doi.org/10.1109/TRO.2015.2463671
Mur-Artal, R., Montiel, J.M.M., Tardós, J.D.: Orb-slam: A versatile and accurate monocular slam system. IEEE Transactions on Robotics31(5), 1147–1163 (2015). https://doi.org/10.1109/TRO.2015.2463671
-
[44]
IEEE Transactions on Robotics33(5), 1255–1262 (2017).https://doi.org/10.1109/TRO.2017.2705103
Mur-Artal, R., Tardós, J.D.: Orb-slam2: An open-source slam system for monoc- ular, stereo, and rgb-d cameras. IEEE Transactions on Robotics33(5), 1255–1262 (2017).https://doi.org/10.1109/TRO.2017.2705103
-
[45]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Murai, R., Dexheimer, E., Davison, A.J.: Mast3r-slam: Real-time dense slam with 3d reconstruction priors. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16695–16705 (2025) VOCA: Visual Odometry with Codec Awareness 19
2025
-
[46]
ACM Computing Surveys57(12), 1–47 (Jul 2025).https://doi.org/10.1145/3742472,http://dx.doi.org/10
Peroni, L., Gorinsky, S.: An end-to-end pipeline perspective on video streaming in best-effort networks: A survey and tutorial. ACM Computing Surveys57(12), 1–47 (Jul 2025).https://doi.org/10.1145/3742472,http://dx.doi.org/10. 1145/3742472
-
[47]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2025)
Qian, S., Mo, K., Blukis, V., Fouhey, D.F., Fox, D., Goyal, A.: 3D-MVP: 3D Mul- tiview Pretraining for Manipulation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2025)
2025
-
[48]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Reich, C., Hahn, O., Cremers, D., Roth, S., Debnath, B.: A perspective on deep vision performance with standard image and video codecs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5712– 5721 (2024)
2024
-
[49]
In: 2011 International Conference on Computer Vision
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: Orb: An efficient alternative to sift or surf. In: 2011 International Conference on Computer Vision. pp. 2564–2571 (2011).https://doi.org/10.1109/ICCV.2011.6126544
-
[50]
In: 2021 International conference on unmanned aircraft systems (ICUAS)
Rückert,D.,Stamminger,M.:Snake-slam:Efficientglobalvisualinertialslamusing decoupled nonlinear optimization. In: 2021 International conference on unmanned aircraft systems (ICUAS). pp. 219–228. IEEE (2021)
2021
-
[51]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Sandström,E.,Zhang,G.,Tateno,K.,Oechsle,M.,Niemeyer,M.,Zhang,Y.,Patel, M., Van Gool, L., Oswald, M., Tombari, F.: Splat-slam: Globally optimized rgb- only slam with 3d gaussians. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 1680–1691 (2025)
2025
-
[52]
Proceedings of the IEEE83(6), 907–924 (1995).https://doi
Schafer, R., Sikora, T.: Digital video coding standards and their role in video communications. Proceedings of the IEEE83(6), 907–924 (1995).https://doi. org/10.1109/5.387092
-
[53]
In: Proceedings of the IEEE conference on computer vision and pattern recognition
Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4104–4113 (2016)
2016
-
[54]
In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Schubert, D., Goll, T., Demmel, N., Usenko, V., Stückler, J., Cremers, D.: The tum vi benchmark for evaluating visual-inertial odometry. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1680–
2018
-
[55]
IEEE Communications Sur- veys & Tutorials17(1), 469–492 (2015).https://doi.org/10.1109/COMST.2014
Seufert, M., Egger, S., Slanina, M., Zinner, T., Hoßfeld, T., Tran-Gia, P.: A survey on quality of experience of http adaptive streaming. IEEE Communications Sur- veys & Tutorials17(1), 469–492 (2015).https://doi.org/10.1109/COMST.2014. 2360940
-
[56]
Shi, J., Tomasi, C.: Good features to track. In: Conference on Computer Vision and Pattern Recognition, CVPR 1994, 21-23 June, 1994, Seattle, WA, USA. pp. 593–600. IEEE (1994).https://doi.org/10.1109/CVPR.1994.323794,https: //doi.org/10.1109/CVPR.1994.323794
-
[57]
In: 2025 International Conference on 3D Vision (3DV)
Smith, C., Charatan, D., Tewari, A., Sitzmann, V.: Flowmap: High-quality camera poses, intrinsics, and depth via gradient descent. In: 2025 International Conference on 3D Vision (3DV). pp. 389–400. IEEE (2025)
2025
-
[58]
In: Springer handbook of robotics, pp
Stachniss, C., Leonard, J.J., Thrun, S.: Simultaneous localization and mapping. In: Springer handbook of robotics, pp. 1153–1176. Springer (2016)
2016
-
[59]
Sullivan, G.J., Ohm, J.R., Han, W.J., Wiegand, T.: Overview of the high efficiency video coding (hevc) standard. IEEE Transactions on Circuits and Systems for Video Technology22(12), 1649–1668 (2012).https://doi.org/10.1109/TCSVT. 2012.2221191
-
[60]
Advances in neural information processing systems34, 16558–16569 (2021) 20 N
Teed, Z., Deng, J.: Droid-slam: Deep visual slam for monocular, stereo, and rgb- d cameras. Advances in neural information processing systems34, 16558–16569 (2021) 20 N. Hilscher et al
2021
-
[61]
Tomasi, C., Kanade, T.: Detection and tracking of point features. Tech. rep., In- ternational Journal of Computer Vision (1991)
1991
-
[62]
In: 2023 Seventh IEEE International Conference on Robotic Computing (IRC)
Turner, R.N., Banerjee, N.K., Banerjee, S.: Mov-slam: Using motion vectors for real-time single-cpu visual slam. In: 2023 Seventh IEEE International Conference on Robotic Computing (IRC). pp. 51–58. IEEE (2023)
2023
-
[63]
Ungureanu, D., Bogo, F., Galliani, S., Sama, P., Duan, X., Meekhof, C., Stühmer, J., Cashman, T.J., Tekin, B., Schönberger, J.L., Olszta, P., Pollefeys, M.: HoloLens 2 Research Mode as a Tool for Computer Vision Research (Aug 2020).https: //doi.org/10.48550/arXiv.2008.11239,http://arxiv.org/abs/2008.11239
-
[64]
Usenko, V., Demmel, N., Schubert, D., Stückler, J., Cremers, D.: Visual-inertial mapping with non-linear factor recovery. IEEE Robotics Autom. Lett.5(2), 422– 429 (2020).https://doi.org/10.1109/LRA.2019.2961227,https://doi.org/ 10.1109/LRA.2019.2961227
-
[65]
IEEE Robotics and Automation Letters7(2), 1408–1415 (2022)
Von Stumberg, L., Cremers, D.: Dm-vio: Delayed marginalization visual-inertial odometry. IEEE Robotics and Automation Letters7(2), 1408–1415 (2022)
2022
-
[66]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5294–5306 (2025)
2025
-
[67]
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
Wang, Y., Zhou, J., Zhu, H., Chang, W., Zhou, Y., Li, Z., Chen, J., Pang, J., Shen, C., He, T.:π 3: Permutation-equivariant visual geometry learning. arXiv preprint arXiv:2507.13347 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[68]
Wiegand, T., Sullivan, G., Bjontegaard, G., Luthra, A.: Overview of the h.264/avc video coding standard. IEEE Transactions on Circuits and Systems for Video Tech- nology13(7), 560–576 (2003).https://doi.org/10.1109/TCSVT.2003.815165
-
[69]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Wimbauer, F., Chen, W., Muhle, D., Rupprecht, C., Cremers, D.: Anycam: Learn- ing to recover camera poses and intrinsics from casual videos. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16717–16727 (2025)
2025
-
[70]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Wimbauer, F., Yang, N., Rupprecht, C., Cremers, D.: Behind the scenes: Density fields for single view reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9076–9086 (2023)
2023
-
[71]
In: Proceedings of the 31st ACM International Conference on Multimedia
Zhou, S., Jiang, X., Tan, W., He, R., Yan, B.: Mvflow: Deep optical flow estimation of compressed videos with motion vector prior. In: Proceedings of the 31st ACM International Conference on Multimedia. pp. 1964–1974 (2023)
1964
-
[72]
In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zhu, Z., Akkaya, I.B., Waeijen, L., Bondarev, E., Pourtaherian, A., Moreira, O.: MEET: Towards Memory-Efficient Temporal Sparse Deep Neural Networks. In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 29309–29320 (Jun 2025).https://doi.org/10.1109/CVPR52734. 2025.02729,https://ieeexplore.ieee.org/document/11092745
-
[73]
In: 2025 IEEE 4th International Conference on Intelligent Reality (ICIR)
Zouein, J., Javidnia, H., Pitié, F., Kokaram, A.: Leveraging AV1 Motion Vectors for Fast and Dense Feature Matching. In: 2025 IEEE 4th International Conference on Intelligent Reality (ICIR). pp. 1–4.https://doi.org/10.1109/ICIR68135. 2025.11361611
-
[74]
Zouein, J., Vibhoothi, V., Kokaram, A.: AV1 Motion Vector Fidelity and Applica- tionforEfficientOpticalFlow.In:2025PictureCodingSymposium(PCS).pp.1–5. https://doi.org/10.1109/PCS65673.2025.11417638 VOCA: Visual Odometry with Codec Awareness 1 A Metrics In this paper, we utilize two commonly used metrics to evaluate the performance of Visual Odometry algor...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.