Lite Any Stereo: Efficient Zero-Shot Stereo Matching
Pith reviewed 2026-05-17 20:28 UTC · model grok-4.3
The pith
An ultra-light stereo matching model matches or exceeds the accuracy of much larger methods on real-world benchmarks while using under 1% of their computation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By designing a compact backbone and hybrid cost aggregation module, and applying a three-stage training strategy on large-scale data, an ultra-light architecture can achieve top accuracy in zero-shot stereo matching across real-world benchmarks while consuming less than 1% of the computational resources of state-of-the-art accurate methods.
What carries the argument
The Lite Any Stereo framework, centered on its compact yet expressive backbone and hybrid cost aggregation module, which together enable efficient processing and effective generalization.
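A passage quoted later on this page gives the composition of the hybrid module as C_agg = G_2D(G_3D(C)). The sketch below only illustrates that ordering: a 3D stage that also aggregates along the disparity axis, followed by a 2D stage that aggregates each disparity slice spatially. The fixed box filters, the function names, and the nested-list volume layout `[D][H][W]` are assumptions standing in for the paper's learned layers, which this page does not describe.

```python
def box_mean(volume, d, y, x, use_disparity_axis):
    """Mean over a 3x3 (2D) or 3x3x3 (3D) neighbourhood, clipped at the borders."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    d_range = range(max(d - 1, 0), min(d + 2, D)) if use_disparity_axis else [d]
    total, count = 0.0, 0
    for dd in d_range:
        for yy in range(max(y - 1, 0), min(y + 2, H)):
            for xx in range(max(x - 1, 0), min(x + 2, W)):
                total += volume[dd][yy][xx]
                count += 1
    return total / count


def aggregate(volume, use_disparity_axis):
    """Apply the box filter at every cell; the output has the same shape."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [[[box_mean(volume, d, y, x, use_disparity_axis) for x in range(W)]
             for y in range(H)] for d in range(D)]


def g_3d(volume):
    # Placeholder for the learned 3D stage: mixes disparity and spatial axes.
    return aggregate(volume, use_disparity_axis=True)


def g_2d(volume):
    # Placeholder for the learned 2D stage: spatial mixing within each slice.
    return aggregate(volume, use_disparity_axis=False)


def hybrid_aggregate(volume):
    """C_agg = G_2D(G_3D(C)): the 3D stage runs first, then the 2D stage."""
    return g_2d(g_3d(volume))
```

In a real network both stages would be learned, and the 2D stage would typically treat disparity as a channel dimension; the point here is only the order of composition.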
If this is right
- Lightweight models become viable for accurate zero-shot stereo depth estimation in practical settings.
- Computational costs for high-performance stereo matching drop dramatically, enabling deployment on resource-limited devices.
- The three-stage training approach demonstrates a scalable way to bridge simulation-to-real gaps in depth estimation.
- Non-prior-based methods can now prioritize efficiency without major accuracy trade-offs.
Where Pith is reading between the lines
- This suggests that similar efficiency-focused designs could apply to related tasks such as optical flow estimation.
- Future work might test if even smaller models or different training scales yield comparable results.
- Adoption could shift industry standards toward lightweight architectures for real-time 3D perception.
Load-bearing premise
The three-stage training strategy on million-scale data bridges the sim-to-real gap for the ultra-light model without relying on hidden tuning specific to the evaluation benchmarks.
What would settle it
Evaluating the model on a new, unseen real-world stereo dataset where it fails to rank highly or loses its accuracy advantage over heavier models would challenge the claim.
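Such a test needs an agreed error measure. A minimal sketch, assuming the two metrics commonly reported for disparity maps: mean end-point error (EPE) and the fraction of pixels whose error exceeds a threshold. The function name, the 3-pixel default threshold, and the use of `None` to mark pixels without ground truth are illustrative choices, not the paper's evaluation protocol.

```python
def disparity_metrics(pred, gt, threshold=3.0):
    """Return (EPE, bad-pixel rate) over pixels that have ground truth.

    pred and gt are flat sequences of disparities in pixels; gt entries
    equal to None are treated as invalid and skipped.
    """
    errors = [abs(p - g) for p, g in zip(pred, gt) if g is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    epe = sum(errors) / len(errors)
    bad_rate = sum(e > threshold for e in errors) / len(errors)
    return epe, bad_rate
```

Running the same function, with the same threshold, over the light model and the heavier baselines on the new dataset would show whether the accuracy ranking survives.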
Original abstract
Recent advances in stereo matching have focused on accuracy, often at the cost of significantly increased model size. Traditionally, the community has regarded efficient models as incapable of zero-shot ability due to their limited capacity. In this paper, we introduce Lite Any Stereo, a stereo depth estimation framework that achieves strong zero-shot generalization while remaining highly efficient. To this end, we design a compact yet expressive backbone to ensure scalability, along with a carefully crafted hybrid cost aggregation module. We further propose a three-stage training strategy on million-scale data to effectively bridge the sim-to-real gap. Together, these components demonstrate that an ultra-light model can deliver strong generalization, ranking 1st across four widely used real-world benchmarks. Remarkably, our model attains accuracy comparable to or exceeding state-of-the-art non-prior-based accurate methods while requiring less than 1% computational cost, setting a new standard for efficient stereo matching.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Lite Any Stereo, an ultra-light stereo depth estimation framework featuring a compact yet expressive backbone and a hybrid cost aggregation module. It employs a three-stage training strategy on million-scale data to bridge the sim-to-real gap, and claims to rank first on four widely used real-world benchmarks while matching or exceeding the accuracy of state-of-the-art non-prior-based methods at less than 1% of their computational cost.
Significance. If the results hold under rigorous validation, the work would demonstrate that deliberately low-capacity architectures can deliver strong zero-shot generalization in stereo matching through architecture design and large-scale training, potentially redefining efficiency-accuracy trade-offs and enabling deployment on resource-limited devices. The explicit focus on million-scale synthetic pretraining for sim-to-real transfer is a notable strength if supported by ablations.
major comments (2)
- [Abstract and §4] Abstract and experimental section: the central claim of 1st-place zero-shot rankings and comparable accuracy to heavier SOTA methods is presented without error bars, ablation details on training-data composition, or explicit comparison tables; this makes it impossible to verify whether benchmark choices or data exclusions affect the reported advantage for the ultra-light model.
- [§3] §3 (three-stage training): the strategy is load-bearing for the sim-to-real claim, yet the description provides no quantitative evidence such as held-out real validation sets or ablations ruling out per-benchmark hyperparameter search or post-training selection; for a low-capacity backbone this is required to establish that the <1% compute advantage is reproducible on truly unseen real data.
minor comments (2)
- [§2] Clarify notation for the hybrid cost aggregation module and ensure all equations are numbered consistently with references in the text.
- [Table in §4] Add a table summarizing compute (FLOPs or runtime) alongside accuracy metrics for all compared methods to support the <1% claim.
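The check behind the last comment is simple arithmetic once per-method costs are tabulated. A minimal sketch, assuming cost is measured in multiply-accumulate operations (MACs) at a fixed input resolution; the function names are illustrative, and the numbers in the usage note are hypothetical placeholders, not values from the paper.

```python
def relative_cost_percent(model_macs, reference_macs):
    """Cost of a model as a percentage of a reference method's cost."""
    if reference_macs <= 0:
        raise ValueError("reference cost must be positive")
    return 100.0 * model_macs / reference_macs


def supports_claim(model_macs, reference_macs, limit_percent=1.0):
    """True when the model uses less than limit_percent of the reference cost."""
    return relative_cost_percent(model_macs, reference_macs) < limit_percent
```

For example, with placeholder costs of 30 GMACs for the light model and 4000 GMACs for a heavy baseline, `relative_cost_percent(30.0, 4000.0)` gives 0.75, which would satisfy the under-1% claim for that pair.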
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight opportunities to strengthen the experimental validation and clarity of our claims. We address each major point below and have incorporated revisions to improve transparency and rigor.
- Referee: [Abstract and §4] Abstract and experimental section: the central claim of 1st-place zero-shot rankings and comparable accuracy to heavier SOTA methods is presented without error bars, ablation details on training-data composition, or explicit comparison tables; this makes it impossible to verify whether benchmark choices or data exclusions affect the reported advantage for the ultra-light model.
  Authors: We agree that additional statistical and compositional details would aid verification. In the revised manuscript we have added error bars computed across multiple independent training runs with different random seeds to all reported rankings and accuracy figures. We have also inserted a dedicated ablation subsection on training-data composition, quantifying the contribution of each synthetic source to zero-shot performance. Section 4 now contains expanded comparison tables that list all competing methods alongside their compute costs, accuracies on the four benchmarks, and explicit notes on any data exclusions or benchmark usage.
  Revision: yes
- Referee: [§3] §3 (three-stage training): the strategy is load-bearing for the sim-to-real claim, yet the description provides no quantitative evidence such as held-out real validation sets or ablations ruling out per-benchmark hyperparameter search or post-training selection; for a low-capacity backbone this is required to establish that the <1% compute advantage is reproducible on truly unseen real data.
  Authors: We acknowledge that quantitative support for the training strategy strengthens the sim-to-real claims. The revised Section 3 now reports results on held-out real validation subsets drawn from the benchmarks, confirming consistent transfer without per-benchmark adaptation. We have added ablations that vary the training stages while keeping hyperparameters fixed across all evaluations, demonstrating that gains arise from the staged curriculum rather than post-training selection or benchmark-specific tuning. These changes support reproducibility of the efficiency advantage on truly unseen real data.
  Revision: yes
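The error bars promised in the first response amount to a mean and a spread over repeated runs. A minimal sketch, assuming one scalar score per random seed and the sample standard deviation as the error bar; the function name and the choice of spread measure are assumptions, since the rebuttal does not say how the bars are computed.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return statistics.mean(scores), statistics.stdev(scores)
```

A ranking is only informative when the gap between two methods is large relative to these per-method spreads.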
Circularity Check
Empirical results on external benchmarks; no load-bearing reduction to self-defined quantities or self-citations
The paper describes an ultra-light backbone, hybrid cost aggregation, and three-stage training on million-scale data, then reports rankings on four real-world benchmarks. No equations or derivations are presented that reduce by construction to fitted parameters or prior self-citations. The zero-shot generalization claim is supported by external benchmark evaluation rather than internal redefinition or renaming of known results. This matches the low circularity expected for an empirical architecture paper whose central claims remain falsifiable on held-out real data.
Axiom & Free-Parameter Ledger
free parameters (1)
- three-stage training hyperparameters
axioms (1)
- domain assumption: Stereo matching can be solved via learned cost-volume aggregation in a CNN backbone
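To make the cost-volume idea in this assumption concrete, the sketch below builds a matching cost for every candidate disparity on a single scanline and picks the cheapest one per pixel. It is a hand-crafted toy: raw intensities and absolute differences stand in for the learned features and learned aggregation the assumption refers to, and all names are illustrative.

```python
def cost_volume(left, right, max_disp):
    """cost[d][x] = |left[x] - right[x - d]| for one rectified scanline.

    Positions where x - d falls outside the row get infinite cost.
    """
    width = len(left)
    return [[abs(left[x] - right[x - d]) if x - d >= 0 else float("inf")
             for x in range(width)]
            for d in range(max_disp + 1)]


def winner_take_all(cost):
    """Per pixel, return the disparity index with the lowest cost."""
    width = len(cost[0])
    return [min(range(len(cost)), key=lambda d: cost[d][x])
            for x in range(width)]
```

Learned methods replace the raw differences with feature correlations and insert an aggregation network between the two steps, which is where a module such as the paper's hybrid aggregation would sit.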
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: hybrid cost aggregation module that jointly leverages 2D and 3D representations... C_agg = G_2D(G_3D(C))
- IndisputableMonolith/Foundation/DimensionForcing.lean, alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: three-stage training strategy on million-scale data... Stage① supervised on 1.8 M synthetic, Stage② self-distillation, Stage③ knowledge distillation on 0.5 M real pairs
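The second quoted passage outlines the three training stages. A minimal sketch that records only what that passage states, as a plain schedule; the class and field names are illustrative, and the data source and size for stage 2 are left unset because the passage does not give them. The passage says "pairs" only for stage 3, so the shared field is named `num_samples`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    objective: str
    data: Optional[str]         # None where the quoted passage is silent
    num_samples: Optional[int]  # None where the quoted passage is silent


STAGES = (
    Stage("stage 1", "supervised", "synthetic", 1_800_000),
    Stage("stage 2", "self-distillation", None, None),
    Stage("stage 3", "knowledge distillation", "real", 500_000),
)


def stated_samples(stages):
    """Sum the sample counts that the passage actually states."""
    return sum(s.num_samples for s in stages if s.num_samples is not None)
```

`stated_samples(STAGES)` comes to 2,300,000, consistent with the "million-scale" description.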
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.