StreamCacheVGGT: Streaming Visual Geometry Transformers with Robust Scoring and Hybrid Cache Compression
Pith reviewed 2026-05-10 11:12 UTC · model grok-4.3
The pith
StreamCacheVGGT maintains 3D reconstruction quality on streaming video under a constant memory budget by scoring tokens across transformer layers and merging less important tokens instead of deleting them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By replacing binary eviction with cross-layer consistency-enhanced scoring and hybrid cache compression that merges tokens via nearest-neighbor assignment on the key-vector manifold, the framework preserves geometric salience across long video streams and achieves higher reconstruction accuracy on five benchmarks while enforcing constant memory use.
What carries the argument
Cross-Layer Consistency-Enhanced Scoring (CLCES) that tracks sustained salience across layers combined with Hybrid Cache Compression (HCC) that performs three-tier triage and nearest-neighbor merging on the key-vector manifold.
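The HCC half of this machinery can be made concrete with a small sketch. Everything below is illustrative rather than the paper's implementation: the tier fractions, Euclidean distance in key space, and running-mean merging are assumptions, since the abstract does not specify them.

```python
import numpy as np

def hybrid_cache_compress(keys, values, scores, retain_frac=0.5, merge_frac=0.3):
    """Three-tier triage sketch: retain the top-scoring tokens, merge the
    middle tier into its nearest retained anchor in key space, and evict
    the rest. keys, values: (N, d) arrays; scores: (N,) importance scores."""
    n = len(scores)
    order = np.argsort(scores)[::-1]                  # most important first
    n_retain = max(1, int(n * retain_frac))
    n_merge = int(n * merge_frac)
    retain_idx = order[:n_retain]
    merge_idx = order[n_retain:n_retain + n_merge]
    # Tier 3 (the lowest-scoring tokens) is evicted implicitly.

    new_keys = keys[retain_idx].copy()
    new_values = values[retain_idx].copy()
    counts = np.ones(n_retain)                        # tokens folded per anchor

    for i in merge_idx:
        # Nearest retained anchor by Euclidean distance in key space.
        d = np.linalg.norm(new_keys - keys[i], axis=1)
        a = int(np.argmin(d))
        # Fold the token into the anchor as a running mean.
        counts[a] += 1
        new_keys[a] += (keys[i] - new_keys[a]) / counts[a]
        new_values[a] += (values[i] - new_values[a]) / counts[a]
    return new_keys, new_values
```

With `merge_frac=0` this degenerates to pure eviction, which is the baseline the paper argues against.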
If this is right
- Higher reconstruction accuracy and long-term stability on 7-Scenes, NRGBD, ETH3D, Bonn, and KITTI.
- Strict adherence to constant memory and compute budgets without any training.
- Reduced information loss compared with pure eviction that deletes tokens outright.
- Robust scores derived from order-statistical analysis across the transformer hierarchy.
Where Pith is reading between the lines
- The same merging strategy on the key-vector manifold could be tested on other streaming transformer applications such as video object tracking or scene flow estimation.
- Cross-layer consistency tracking may reduce sensitivity to single-layer activation noise in related vision transformers that process sequential data.
- The three-tier triage could be adapted to trade off accuracy against memory in real-time robotics pipelines that must run indefinitely.
Load-bearing premise
Merging tokens by nearest-neighbor similarity in key space will preserve the geometric information needed for accurate 3D reconstruction without adding new distortions.
What would settle it
A controlled test on long video sequences where reconstruction error rises or geometric fidelity drops when hybrid merging is enabled compared with a larger-memory baseline that simply retains all tokens.
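A toy version of this comparison can be run on synthetic caches. The construction below (identical keys, distinct values, single-query softmax attention) is deliberately favorable to merging and is not the paper's experiment; it only isolates the regime where merging preserves context that outright deletion destroys.

```python
import numpy as np

def attn_out(q, K, V):
    """Single-query softmax attention output over a cache (K, V)."""
    s = K @ q
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

# Cache of 8 tokens sharing one key direction but carrying distinct values:
# token identity in key space is redundant, but value content is not.
K = np.ones((8, 4))                       # identical keys
V = np.arange(32.0).reshape(8, 4)         # distinct values
q = np.ones(4)

full = attn_out(q, K, V)                  # uniform average over all values

# Pure eviction: keep only the first token, delete the other seven.
evicted = attn_out(q, K[:1], V[:1])

# Merging: fold the other seven tokens into the retained anchor by averaging.
K_m, V_m = K[:1], V.mean(axis=0, keepdims=True)
merged = attn_out(q, K_m, V_m)

err_evict = np.linalg.norm(full - evicted)
err_merge = np.linalg.norm(full - merged)
```

Here merging reproduces the full-cache output exactly while eviction does not; the open question the review raises is whether this advantage survives on real key manifolds, where merged tokens are only approximately redundant.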
Original abstract
Reconstructing dense 3D geometry from continuous video streams requires stable inference under a constant memory budget. Existing $O(1)$ frameworks primarily rely on a "pure eviction" paradigm, which suffers from significant information destruction due to binary token deletion and evaluation noise from localized, single-layer scoring. To address these bottlenecks, we propose StreamCacheVGGT, a training-free framework that reimagines cache management through two synergistic modules: Cross-Layer Consistency-Enhanced Scoring (CLCES) and Hybrid Cache Compression (HCC). CLCES mitigates activation noise by tracking token importance trajectories across the Transformer hierarchy, employing order-statistical analysis to identify sustained geometric salience. Leveraging these robust scores, HCC transcends simple eviction by introducing a three-tier triage strategy that merges moderately important tokens into retained anchors via nearest-neighbor assignment on the key-vector manifold. This approach preserves essential geometric context that would otherwise be lost. Extensive evaluations on five benchmarks (7-Scenes, NRGBD, ETH3D, Bonn, and KITTI) demonstrate that StreamCacheVGGT sets a new state-of-the-art, delivering superior reconstruction accuracy and long-term stability while strictly adhering to constant-cost constraints.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes StreamCacheVGGT, a training-free framework for constant-memory streaming 3D geometry reconstruction from video using Visual Geometry Transformers. It introduces Cross-Layer Consistency-Enhanced Scoring (CLCES) to track token importance across layers via order statistics and Hybrid Cache Compression (HCC) with a three-tier triage that merges moderately important tokens via nearest-neighbor assignment on the key-vector manifold, claiming SOTA accuracy and long-term stability on the 7-Scenes, NRGBD, ETH3D, Bonn, and KITTI benchmarks while avoiding binary eviction.
Significance. If the central claims hold, the work would offer a practical advance for online dense reconstruction under fixed memory budgets by replacing pure eviction with context-preserving compression; the training-free design and explicit focus on cross-layer geometric salience are strengths that could influence follow-on systems in robotics and AR.
major comments (2)
- The SOTA claim on five benchmarks is load-bearing for the contribution, yet the abstract (and by extension the evaluation section) provides no quantitative metrics, tables, error bars, or direct comparisons to prior constant-memory baselines, preventing verification of the asserted gains in reconstruction accuracy and stability.
- HCC section (three-tier triage and nearest-neighbor merging on key-vector manifold): the premise that proximity in key space equals preservation of 3D geometric context for moderately important tokens is untested; no ablation, failure-mode analysis, or comparison to pure-eviction baselines shows that merging does not distort camera poses or point clouds beyond what CLCES already filters, directly undermining the 'superior accuracy' result.
minor comments (1)
- Notation for CLCES order-statistical analysis and HCC manifold distance is introduced without explicit equations or pseudocode, making the heuristics hard to reproduce.
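A hedged sketch of what the missing pseudocode might look like, assuming per-layer importance is the total attention a token receives and the order statistic is a per-token quantile across layers (neither choice is confirmed by the abstract):

```python
import numpy as np

def clces_scores(layer_attn, q=0.5):
    """Sketch of cross-layer consistency-enhanced scoring.
    layer_attn: list of (N, N) attention matrices, one per layer.
    Returns one robust importance score per token."""
    # Per-layer importance: total attention each token receives (column sums).
    per_layer = np.stack([A.sum(axis=0) for A in layer_attn])   # (L, N)
    # Order statistic across the layer axis: a token scores highly only if
    # it is important in a sustained fraction of layers, so a single-layer
    # activation spike cannot dominate.
    return np.quantile(per_layer, q, axis=0)                    # (N,)
```

On synthetic attention maps, a token that is important in most layers outranks one with a single-layer spike, which is the robustness property CLCES claims; the actual statistic used in the paper remains unspecified.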
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and outline the revisions we will make to strengthen the presentation and evidence.
Point-by-point responses
- Referee: The SOTA claim on five benchmarks is load-bearing for the contribution, yet the abstract (and by extension the evaluation section) provides no quantitative metrics, tables, error bars, or direct comparisons to prior constant-memory baselines, preventing verification of the asserted gains in reconstruction accuracy and stability.
  Authors: We agree that the abstract omits specific quantitative metrics, which limits immediate verifiability of the SOTA claims. The evaluation section does contain tables reporting reconstruction accuracy, stability metrics, and comparisons against prior constant-memory baselines across the five benchmarks (7-Scenes, NRGBD, ETH3D, Bonn, KITTI), including results aggregated over multiple runs. To address the concern directly, we will revise the abstract to include key numerical results and error bars summarizing the gains. This is a presentation clarification rather than an absence of supporting data. Revision planned: yes.
- Referee: HCC section (three-tier triage and nearest-neighbor merging on key-vector manifold): the premise that proximity in key space equals preservation of 3D geometric context for moderately important tokens is untested; no ablation, failure-mode analysis, or comparison to pure-eviction baselines shows that merging does not distort camera poses or point clouds beyond what CLCES already filters, directly undermining the 'superior accuracy' result.
  Authors: We acknowledge that the manuscript does not provide dedicated ablations isolating the nearest-neighbor merging step on the key-vector manifold, nor explicit failure-mode analysis of potential distortions to camera poses or point clouds. The reported superior accuracy is based on end-to-end benchmark results versus pure-eviction baselines, but we agree this does not fully isolate the contribution of the merging operation. We will add targeted ablation studies and distortion analysis (including pose and point-cloud metrics) in the revised manuscript and supplementary material to directly test the premise. Revision planned: yes.
Circularity Check
No circularity: training-free heuristics with external benchmark validation
full rationale
The paper describes a training-free framework consisting of explicit heuristic modules (CLCES for cross-layer scoring via order statistics and HCC for three-tier merging on the key-vector manifold). No parameters are fitted to the target reconstruction data, no predictions reduce to fitted inputs by construction, and no load-bearing claims rely on self-citations or imported uniqueness theorems. Performance is asserted via direct evaluation on five independent external benchmarks (7-Scenes, NRGBD, ETH3D, Bonn, KITTI), keeping the derivation chain self-contained against those benchmarks rather than tautological.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Token importance trajectories across the Transformer hierarchy reliably indicate sustained geometric salience.
- Domain assumption: Nearest-neighbor assignment on the key-vector manifold merges moderately important tokens without destroying essential geometric information.
Reference graph
Works this paper leans on
-
[1]
Grounding image matching in 3d with mast3r
Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3d with mast3r. pages 71–91, 2024
2024
-
[2]
Dust3r: Geometric 3d vision made easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20697–20709, 2024
2024
-
[3]
Vggt: Visual geometry grounded transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. pages 5294–5306, 2025
2025
-
[4]
Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass
Jianing Yang, Alexander Sax, Kevin J Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. pages 21924–21935, 2025
2025
-
[5]
OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer
Si-Yu Lu, Po-Ting Chen, Hui-Che Hsu, Sin-Ye Jhong, Wen-Huang Cheng, and Yung-Yao Chen. Ovggt: O(1) constant-cost streaming visual geometry transformer. arXiv preprint arXiv:2603.05959, 2026
2026
-
[6]
Streaming 4d visual geometry transformer
Dong Zhuo, Wenzhao Zheng, Jiahe Guo, Yuqi Wu, Jie Zhou, and Jiwen Lu. Streaming 4d visual geometry transformer. arXiv preprint arXiv:2507.11539, 2025
-
[7]
Flashattention: Fast and memory-efficient exact attention with io-awareness
Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness. volume 35, pages 16344–16359, 2022
2022
-
[8]
Feed-forward neural networks
George Bebis and Michael Georgiopoulos. Feed-forward neural networks. IEEE Potentials, 13(4):27–31, 2002
2002
-
[9]
Token merging: Your vit but faster
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, and Judy Hoffman. Token merging: Your vit but faster. 2022
2022
-
[10]
Map-free visual relocalization: Metric pose relative to a single image
Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando, Aron Monszpart, Victor Prisacariu, Daniyar Turmukhambetov, and Eric Brachmann. Map-free visual relocalization: Metric pose relative to a single image. In European Conference on Computer Vision, pages 690–708. Springer, 2022
2022
-
[11]
Llafs: When large language models meet few-shot segmentation
Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, and Jun Liu. Llafs: When large language models meet few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3065–3075, 2024
2024
-
[12]
Deepmvs: Learning multi-view stereopsis
Po-Han Huang, Kevin Matzen, Johannes Kopf, Narendra Ahuja, and Jia-Bin Huang. Deepmvs: Learning multi-view stereopsis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2821–2830, 2018
2018
-
[13]
Skysense-o: Towards open-world remote sensing interpretation with vision-centric visual-language modeling
Qi Zhu, Jiangwei Lao, Deyi Ji, Junwei Luo, Kang Wu, Yingying Zhang, Lixiang Ru, Jian Wang, Jingdong Chen, Ming Yang, et al. Skysense-o: Towards open-world remote sensing interpretation with vision-centric visual-language modeling. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
2025
-
[14]
Ibd: Alleviating hallucinations in large vision-language models via image-biased decoding
Lanyun Zhu, Deyi Ji, Tianrun Chen, Peng Xu, Jieping Ye, and Jun Liu. Ibd: Alleviating hallucinations in large vision-language models via image-biased decoding. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2024
-
[15]
Structural and statistical texture knowledge distillation and learning for segmentation
Deyi Ji, Feng Zhao, Hongtao Lu, Feng Wu, and Jieping Ye. Structural and statistical texture knowledge distillation and learning for segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3639–3656, 2025
2025
-
[16]
Discrete latent perspective learning for segmentation and detection
Deyi Ji, Feng Zhao, Lanyun Zhu, Wenwei Jin, Hongtao Lu, and Jieping Ye. Discrete latent perspective learning for segmentation and detection. In International Conference on Machine Learning, pages 21719–21730, 2024
2024
-
[17]
Not every patch is needed: Towards a more efficient and effective backbone for video-based person re-identification
Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, and Jun Liu. Not every patch is needed: Towards a more efficient and effective backbone for video-based person re-identification. IEEE Transactions on Image Processing, 2025
2025
-
[18]
Fastvggt: Training-free acceleration of visual geometry transformer
You Shen, Zhipeng Zhang, Yansong Qu, Xiawu Zheng, Jiayi Ji, Shengchuan Zhang, and Liujuan Cao. Fastvggt: Training-free acceleration of visual geometry transformer. arXiv preprint arXiv:2509.02560, 2025
2025
-
[19]
Ultra-high resolution segmentation with ultra-rich context: A novel benchmark
Deyi Ji, Feng Zhao, Hongtao Lu, Mingyuan Tao, and Jieping Ye. Ultra-high resolution segmentation with ultra-rich context: A novel benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23621–23630, 2023
2023
-
[20]
Structural and statistical texture knowledge distillation for semantic segmentation
Deyi Ji, Haoran Wang, Mingyuan Tao, Jianqiang Huang, Xian-Sheng Hua, and Hongtao Lu. Structural and statistical texture knowledge distillation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16876–16885, 2022
2022
-
[21]
Learning statistical texture for semantic segmentation
Lanyu Zhu, Deyi Ji, Shiping Zhu, Weihao Gan, Wei Wu, and Junjie Yan. Learning statistical texture for semantic segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
2021
-
[22]
Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction
Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10901–10911, 2021
2021
-
[23]
Pptformer: Pseudo multi-perspective transformer for uav segmentation
Deyi Ji, Wenwei Jin, Hongtao Lu, and Feng Zhao. Pptformer: Pseudo multi-perspective transformer for uav segmentation. International Joint Conference on Artificial Intelligence, pages 893–901, 2024
2024
-
[24]
Megadepth: Learning single-view depth prediction from internet photos
Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2041–2050, 2018
2018
-
[25]
Popen: Preference-based optimization and ensemble for lvlm-based reasoning segmentation
Lanyun Zhu, Tianrun Chen, Qianxiong Xu, Xuanyi Liu, Deyi Ji, Haiyang Wu, De Wen Soh, and Jun Liu. Popen: Preference-based optimization and ensemble for lvlm-based reasoning segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
2025
-
[26]
Retrv-r1: A reasoning-driven mllm framework for universal and efficient multimodal retrieval
Lanyun Zhu, Deyi Ji, Tianrun Chen, Haiyang Wu, and Shiqi Wang. Retrv-r1: A reasoning-driven mllm framework for universal and efficient multimodal retrieval. Neural Information Processing Systems (NeurIPS), 2025
2025
-
[27]
Guided patch-grouping wavelet transformer with spatial congruence for ultra-high resolution segmentation
Deyi Ji, Feng Zhao, and Hongtao Lu. Guided patch-grouping wavelet transformer with spatial congruence for ultra-high resolution segmentation. International Joint Conference on Artificial Intelligence, pages 920–928, 2023
2023
-
[28]
Replay master: Automatic sample selection and effective memory utilization for continual semantic segmentation
Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, De Wen Soh, and Jun Liu. Replay master: Automatic sample selection and effective memory utilization for continual semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
2025
-
[29]
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
Yifan Wang et al. $\pi^3$: Scalable permutation-equivariant visual geometry learning. arXiv preprint arXiv:2507.13347, 2025
2025
-
[30]
Llafs++: Few-shot image segmentation with large language models
Lanyun Zhu, Tianrun Chen, Deyi Ji, Peng Xu, Jieping Ye, and Jun Liu. Llafs++: Few-shot image segmentation with large language models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
2025
-
[31]
Context-aware graph convolution network for target re-identification
Deyi Ji, Haoran Wang, Hanzhe Hu, Weihao Gan, Wei Wu, and Junjie Yan. Context-aware graph convolution network for target re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 1646–1654, 2021
2021
-
[32]
CPCF: A cross-prompt contrastive framework for referring multimodal large language models
Lanyun Zhu, Deyi Ji, Tianrun Chen, Haiyang Wu, De Wen Soh, and Jun Liu. CPCF: A cross-prompt contrastive framework for referring multimodal large language models. In Forty-Second International Conference on Machine Learning, 2025
2025
-
[33]
Learning gabor texture features for fine-grained recognition
Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, and Jun Liu. Learning gabor texture features for fine-grained recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1621–1631, 2023
2023
-
[34]
View-centric multi-object tracking with homographic matching in moving uav
Deyi Ji, Lanyun Zhu, Siqi Gao, Qi Zhu, Yiru Zhao, Peng Xu, Yue Ding, Hongtao Lu, Jieping Ye, Feng Wu, et al. View-centric multi-object tracking with homographic matching in moving uav. IEEE Transactions on Geoscience and Remote Sensing, 2026
2026
-
[35]
3d reconstruction with spatial memory
Hengyi Wang and Lourdes Agapito. 3d reconstruction with spatial memory. pages 78–89, 2025
2025
-
[36]
Continuous 3d perception model with persistent state
Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A Efros, and Angjoo Kanazawa. Continuous 3d perception model with persistent state. pages 10510–10522, 2025
2025
-
[37]
TTT3R: 3D Reconstruction as Test-Time Training
Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, and Anpei Chen. Ttt3r: 3d reconstruction as test-time training. arXiv preprint arXiv:2509.26645, 2025
-
[38]
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
Yuqi Wu, Wenzhao Zheng, Jie Zhou, and Jiwen Lu. Point3r: Streaming 3d reconstruction with explicit spatial pointer memory. arXiv preprint arXiv:2507.02863, 2025
-
[39]
H2o: Heavy-hitter oracle for efficient generative inference of large language models
Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, et al. H2o: Heavy-hitter oracle for efficient generative inference of large language models. Advances in Neural Information Processing Systems, 36:34661–34710, 2023
2023
-
[40]
Ada-kv: Optimizing kv cache eviction by adaptive budget allocation for efficient llm inference
Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S Kevin Zhou. Ada-kv: Optimizing kv cache eviction by adaptive budget allocation for efficient llm inference. arXiv preprint arXiv:2407.11550, 2024
2024
-
[41]
Snapkv: Llm knows what you are looking for before generation
Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, and Deming Chen. Snapkv: Llm knows what you are looking for before generation. Advances in Neural Information Processing Systems, 37:22947–22970, 2024
2024
-
[42]
Dynamicvit: Efficient vision transformers with dynamic token sparsification
Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, and Cho-Jui Hsieh. Dynamicvit: Efficient vision transformers with dynamic token sparsification. Advances in neural information processing systems, 34:13937–13949, 2021
2021
-
[43]
Representation shift: Unifying token compression with flashattention
Joonmyung Choi, Sanghyeok Lee, Byungoh Ko, Eunseo Kim, Jihyung Kil, and Hyunwoo J Kim. Representation shift: Unifying token compression with flashattention. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20456–20466, 2025
2025
-
[44]
Dinov2: Learning robust visual features without supervision
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. 2023
2023
-
[45]
Scannet: Richly-annotated 3d reconstructions of indoor scenes
Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5828–5839, 2017
2017
-
[46]
Evict3R
Jinhui Deng, Zhili Li, Yijin Ma, Xin Yang, and Pengfei Wan. Evict3R. arXiv preprint arXiv:2507.14890, 2025
-
[47]
Infinitevggt: Visual geometry grounded transformer for endless streams
Shuai Yuan, Yantai Yang, Xiaotian Yang, Xupeng Zhang, Zhonghao Zhao, Lingming Zhang, and Zhipeng Zhang. Infinitevggt: Visual geometry grounded transformer for endless streams. arXiv preprint arXiv:2601.02281, 2026
2026
-
[48]
Neural rgb-d surface reconstruction
Dejan Azinović, Ricardo Martin-Brualla, Dan B Goldman, Matthias Nießner, and Justus Thies. Neural rgb-d surface reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6290–6301, 2022
2022
-
[49]
A multi-view stereo benchmark with high-resolution images and multi-camera videos
Thomas Schops, Johannes L Schonberger, Silvano Galliani, Torsten Sattler, Konrad Schindler, Marc Pollefeys, and Andreas Geiger. A multi-view stereo benchmark with high-resolution images and multi-camera videos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3260–3269, 2017
2017
-
[50]
Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals
Emanuele Palazzolo, Jens Behley, Philipp Lottes, Philippe Giguere, and Cyrill Stachniss. Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7855–7862. IEEE, 2019
2019
-
[51]
Vision meets robotics: The kitti dataset
Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The kitti dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013
2013