Bidirectional Cross-Modal Prompting for Event-Frame Asymmetric Stereo
Pith reviewed 2026-05-10 11:14 UTC · model grok-4.3
The pith
Bidirectional prompting projects event and frame data into each other's domains to recover cues for accurate stereo matching.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our approach learns finely aligned stereo representations within a target canonical space and integrates complementary representations by projecting each modality into both event and frame domains.
What carries the argument
Bidirectional cross-modal prompting that projects each modality into both event and frame domains to recover and fuse domain-specific cues.
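The abstract does not say how the projections are implemented, so the following is only a minimal sketch of the idea under loud assumptions: the event camera is taken as the left view, the frame camera as the right view, and each projection is a plain linear adapter. The names `make_adapter`, `apply_adapter`, and `bidirectional_project` are hypothetical and do not come from the paper. The sketch shows that projecting in both directions yields one same-domain stereo pair per domain, so matching could in principle run in either.

```python
import numpy as np

def make_adapter(dim_in, dim_out, rng):
    """Hypothetical linear adapter: random weights and a zero bias."""
    weight = rng.normal(scale=dim_in ** -0.5, size=(dim_in, dim_out))
    bias = np.zeros(dim_out)
    return weight, bias

def apply_adapter(features, adapter):
    """Project (N, dim_in) features to (N, dim_out)."""
    weight, bias = adapter
    return features @ weight + bias

def bidirectional_project(left_event, right_frame, event_to_frame, frame_to_event):
    """Return one same-domain (left, right) feature pair per domain."""
    # Event domain: native left events, right frames projected into event space.
    event_pair = (left_event, apply_adapter(right_frame, frame_to_event))
    # Frame domain: left events projected into frame space, native right frames.
    frame_pair = (apply_adapter(left_event, event_to_frame), right_frame)
    return event_pair, frame_pair

# Toy inputs (assumed sizes): 5 pixels, 8 event channels, 16 frame channels.
rng = np.random.default_rng(0)
left_event = rng.normal(size=(5, 8))
right_frame = rng.normal(size=(5, 16))
event_to_frame = make_adapter(8, 16, rng)
frame_to_event = make_adapter(16, 8, rng)

event_pair, frame_pair = bidirectional_project(
    left_event, right_frame, event_to_frame, frame_to_event)
print(event_pair[0].shape, event_pair[1].shape)  # (5, 8) (5, 8)
print(frame_pair[0].shape, frame_pair[1].shape)  # (5, 16) (5, 16)
```

A real system would learn the adapters and operate on spatial feature maps; the random weights here only demonstrate the data flow.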
If this is right
- Stereo matching accuracy increases because complementary texture from frames and timing from events are both retained.
- Representations become more robust to motion blur and illumination changes.
- The same model generalizes better across different camera setups and scene speeds.
- No extra hardware synchronization is required beyond the two modalities already present.
Where Pith is reading between the lines
- The prompting idea could transfer to other sensor pairs that differ in temporal resolution, such as lidar and cameras.
- In robotics, this might allow lighter rigs that still produce reliable depth during high-speed maneuvers.
- A direct test would ablate the reverse projection step and check whether accuracy drops by a measurable margin on the same data.
Load-bearing premise
The gap between event and frame data marginalizes useful cues, and bidirectional prompting can recover those cues without creating new alignment mistakes.
What would settle it
Performance on a held-out dataset of fast-motion, low-light scenes falls to or below that of current state-of-the-art stereo methods, or measured alignment error rises after prompting.
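This page does not name the metrics such a test would use. As a hedged illustration, the sketch below computes two measures that are common in stereo evaluation: mean absolute disparity error (end-point error) and the fraction of pixels whose error exceeds a threshold. The function names, the 1-pixel default threshold, and the convention that a ground-truth value of 0 marks a missing measurement are all assumptions made for the example.

```python
import numpy as np

def end_point_error(pred, target, valid):
    """Mean absolute disparity error over valid pixels."""
    return float(np.abs(pred - target)[valid].mean())

def bad_pixel_rate(pred, target, valid, threshold=1.0):
    """Fraction of valid pixels whose absolute error exceeds the threshold."""
    return float((np.abs(pred - target)[valid] > threshold).mean())

# Toy 2x2 disparity maps; 0 in the ground truth is assumed to mean "no data".
target = np.array([[10.0, 12.0], [8.0, 0.0]])
pred = np.array([[10.5, 15.0], [8.0, 3.0]])
valid = target > 0

epe = end_point_error(pred, target, valid)  # errors 0.5, 3.0, 0.0 over 3 pixels
bad = bad_pixel_rate(pred, target, valid)   # 1 of 3 valid pixels exceeds 1.0
print(round(epe, 4), round(bad, 4))         # 1.1667 0.3333
```

Running both metrics on the same held-out split for the proposed method and for a baseline would give the comparison described above.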
Original abstract
Conventional frame-based cameras capture rich contextual information but suffer from limited temporal resolution and motion blur in dynamic scenes. Event cameras offer an alternative visual representation with higher dynamic range free from such limitations. The complementary characteristics of the two modalities make event-frame asymmetric stereo promising for reliable 3D perception under fast motion and challenging illumination. However, the modality gap often leads to marginalization of domain-specific cues essential for cross-modal stereo matching. In this paper, we introduce Bi-CMPStereo, a novel bidirectional cross-modal prompting framework that fully exploits semantic and structural features from both domains for robust matching. Our approach learns finely aligned stereo representations within a target canonical space and integrates complementary representations by projecting each modality into both event and frame domains. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods in accuracy and generalization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Bi-CMPStereo, a bidirectional cross-modal prompting framework for event-frame asymmetric stereo matching. It claims to address the modality gap by learning finely aligned stereo representations within a target canonical space, integrating complementary semantic and structural features via bidirectional projection of each modality into both event and frame domains, and demonstrates significant outperformance over state-of-the-art methods in accuracy and generalization through extensive experiments.
Significance. If the empirical claims hold with proper validation, the framework could advance reliable 3D perception for dynamic scenes and challenging illumination by better preserving domain-specific cues that standard cross-modal matching tends to marginalize.
major comments (2)
- [Abstract] Abstract: The assertion of significant outperformance over SOTA methods in accuracy and generalization is presented without any quantitative results, baselines, ablation studies, or error analysis, preventing evaluation of the data-to-claim link.
- [Method] Method section: The bidirectional cross-modal prompting is described at a high level without specifying the projection mechanism (e.g., learned adapters, temporal aggregation for sparse events, or cycle-consistency losses), which is load-bearing for the claim that domain-specific cues are recovered without introducing new alignment errors or information loss.
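To make "temporal aggregation for sparse events" concrete, here is a minimal sketch of one widely used representation, a voxel grid that spreads each event's polarity over its two nearest time bins. Nothing on this page indicates that the paper uses this representation; the function name `events_to_voxel_grid` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) grid.

    Each event's polarity is split linearly between its two nearest time bins.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if len(ts) == 0:
        return grid
    # Normalize timestamps to the range [0, num_bins - 1].
    span = max(ts.max() - ts.min(), 1e-9)
    t_norm = (ts - ts.min()) / span * (num_bins - 1)
    lower = np.floor(t_norm).astype(int)
    frac = t_norm - lower
    upper = np.minimum(lower + 1, num_bins - 1)
    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(grid, (lower, ys, xs), ps * (1.0 - frac))
    np.add.at(grid, (upper, ys, xs), ps * frac)
    return grid

# Three toy events on a 2x2 sensor: (x, y, timestamp, polarity).
xs = np.array([0, 1, 1])
ys = np.array([0, 0, 1])
ts = np.array([0.0, 0.25, 1.0])
ps = np.array([1.0, -1.0, 1.0])

grid = events_to_voxel_grid(xs, ys, ts, ps, num_bins=3, height=2, width=2)
print(grid[0, 0, 1], grid[1, 0, 1])  # -0.5 -0.5
```

The middle event falls halfway between bins 0 and 1, so its negative polarity is shared equally between them, and the total polarity in the grid is preserved.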
minor comments (1)
- The 'canonical space' is referenced without a formal definition, diagram, or equation clarifying the projection operators and alignment process.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below. Revisions will be made to address the concerns raised.
- Referee: [Abstract] Abstract: The assertion of significant outperformance over SOTA methods in accuracy and generalization is presented without any quantitative results, baselines, ablation studies, or error analysis, preventing evaluation of the data-to-claim link.
  Authors: We agree that the abstract would benefit from including quantitative highlights to strengthen the link between claims and evidence. In the revised manuscript, we will update the abstract to incorporate key performance metrics demonstrating outperformance (e.g., accuracy and generalization improvements on standard benchmarks). The full details on baselines, ablation studies, and error analysis are provided in the Experiments section, but we will ensure the abstract offers a clearer summary of these results.
  Revision: yes
- Referee: [Method] Method section: The bidirectional cross-modal prompting is described at a high level without specifying the projection mechanism (e.g., learned adapters, temporal aggregation for sparse events, or cycle-consistency losses), which is load-bearing for the claim that domain-specific cues are recovered without introducing new alignment errors or information loss.
  Authors: We appreciate the referee highlighting this point and acknowledge that the current Method section presents the bidirectional cross-modal prompting at a high level. In the revised version, we will expand this section to explicitly specify the projection mechanisms, including details on learned adapters for domain mapping, temporal aggregation approaches for handling sparse event data, and the incorporation of cycle-consistency losses. These additions, along with supporting equations and illustrations, will clarify how domain-specific cues are preserved without introducing alignment errors or information loss.
  Revision: yes
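The cycle-consistency loss mentioned in this exchange can be sketched generically: project features into the other domain, project them back, and penalize the difference from the starting point. The code below is an illustration under stated assumptions, not the authors' formulation; the linear projections, the L1 penalty, and all function names are hypothetical.

```python
import numpy as np

def cycle_consistency_loss(features, forward, backward):
    """Mean absolute difference between features and their round trip."""
    round_trip = backward(forward(features))
    return float(np.abs(round_trip - features).mean())

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))

# Assumed forward projection: an orthogonal matrix, so its transpose inverts it.
q_matrix, _ = np.linalg.qr(rng.normal(size=(4, 4)))

def forward(x):
    return x @ q_matrix

def exact_backward(x):
    # Inverse of the forward map: the round trip recovers the input.
    return x @ q_matrix.T

def lossy_backward(x):
    # Discards everything: the round trip cannot recover the input.
    return np.zeros_like(x)

loss_exact = cycle_consistency_loss(features, forward, exact_backward)
loss_lossy = cycle_consistency_loss(features, forward, lossy_backward)
print(loss_exact < 1e-9, loss_lossy > 0.0)  # True True
```

A loss near zero indicates that the pair of projections preserves information over the round trip, which is the property the referee asks the authors to demonstrate.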
Circularity Check
No circularity: architecture proposal with no derivations or fitted predictions
The paper introduces Bi-CMPStereo as a bidirectional cross-modal prompting framework for event-frame stereo matching. The abstract and available description contain no equations, no parameter fitting presented as prediction, no uniqueness theorems, and no self-citations that bear the central claim. The approach is defined by its architectural choices (canonical space alignment and bidirectional projection), which are independent of any input data or prior results by construction. Empirical outperformance is asserted via experiments rather than derived from the inputs themselves. This is a standard non-circular method paper.