CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration
Pith reviewed 2026-05-10 19:34 UTC · model grok-4.3
The pith
CRFT uses a transformer to learn a consistent recurrent feature flow that aligns cross-modal images more accurately and robustly than existing methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CRFT learns a modality-independent feature flow representation within a transformer-based architecture that jointly performs feature alignment and flow estimation. The coarse stage establishes global correspondences through multi-scale feature correlation, while the fine stage refines local details via hierarchical feature fusion and adaptive spatial reasoning. An iterative discrepancy-guided attention mechanism with a Spatial Geometric Transform recurrently refines the flow field, progressively capturing subtle spatial inconsistencies and enforcing feature-level consistency. This design enables accurate alignment under large affine and scale variations while maintaining structural coherence across modalities.
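To make the coarse-to-fine, recurrent idea concrete, here is a minimal toy sketch. It is not the CRFT network: it assumes point correspondences are already known, replaces the coarse stage with a single mean shift, and replaces the learned refinement with a fixed fraction of the remaining discrepancy. The names `coarse_shift`, `refine_flow`, `max_residual`, and the `rate` and `steps` parameters are hypothetical.

```python
# Toy stand-in for a coarse-to-fine, recurrently refined flow (not the CRFT network).


def coarse_shift(src, dst):
    """Coarse stage stand-in: one global shift, the mean displacement."""
    n = len(src)
    tx = sum(d[0] - s[0] for s, d in zip(src, dst)) / n
    ty = sum(d[1] - s[1] for s, d in zip(src, dst)) / n
    return tx, ty


def refine_flow(src, dst, flow, steps=8, rate=0.5):
    """Fine stage stand-in: each pass adds a fraction of the remaining discrepancy."""
    for _ in range(steps):
        updated = []
        for (sx, sy), (dx, dy), (fx, fy) in zip(src, dst, flow):
            rx = dx - (sx + fx)  # discrepancy left in x after applying the flow
            ry = dy - (sy + fy)  # discrepancy left in y after applying the flow
            updated.append((fx + rate * rx, fy + rate * ry))
        flow = updated
    return flow


def max_residual(src, dst, flow):
    """Largest per-axis misalignment after moving each source point by its flow."""
    return max(
        max(abs(dx - (sx + fx)), abs(dy - (sy + fy)))
        for (sx, sy), (dx, dy), (fx, fy) in zip(src, dst, flow)
    )


# Hypothetical data: the target is the source scaled by 1.2 and shifted.
src = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
dst = [(1.2 * x + 3.0, 1.2 * y - 2.0) for x, y in src]

coarse = [coarse_shift(src, dst)] * len(src)  # same global shift at every point
fine = refine_flow(src, dst, coarse)          # per-point correction on top
print(max_residual(src, dst, coarse), max_residual(src, dst, fine))
```

The global shift cannot absorb the scale change, so a residual remains after the coarse step; the recurrent passes shrink it geometrically. In the paper, the update is produced by discrepancy-guided attention rather than by a fixed rate.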
What carries the argument
The Consistent-Recurrent Feature Flow Transformer, which learns a modality-independent feature flow representation and uses iterative discrepancy-guided attention with a Spatial Geometric Transform (SGT) to enforce consistency across modalities.
Load-bearing premise
A single modality-independent feature flow representation learned in the transformer can jointly handle feature alignment and flow estimation, and the iterative discrepancy-guided attention with the Spatial Geometric Transform can enforce consistency under large affine and scale variations.
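The premise ties a dense flow field to affine and scale changes. As a hedged illustration of that link only, the sketch below computes the dense flow induced by a known affine map, using the standard relation flow(x, y) = A·(x, y) + t − (x, y). How the paper's Spatial Geometric Transform is actually parameterised is not described on this page; `affine_to_flow` and its arguments are hypothetical.

```python
def affine_to_flow(a, t, width, height):
    """Dense flow induced by an affine map p -> A p + t on a pixel grid.

    a is a 2x2 matrix given as ((a11, a12), (a21, a22)), t is (tx, ty).
    Returns flow[y][x] = (u, v), the displacement of pixel (x, y).
    """
    (a11, a12), (a21, a22) = a
    tx, ty = t
    return [
        [
            (a11 * x + a12 * y + tx - x, a21 * x + a22 * y + ty - y)
            for x in range(width)
        ]
        for y in range(height)
    ]


# A pure 2x scale about the origin: displacement grows with distance from (0, 0).
flow = affine_to_flow(((2.0, 0.0), (0.0, 2.0)), (0.0, 0.0), width=3, height=2)
print(flow[1][2])
```

Under a scale change the displacement is not constant across the image, which is why a single global shift is insufficient and why large scale variations stress any flow-based method.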
What would settle it
The claim would be undermined by registration experiments on a new cross-modal dataset, with affine and scale variations larger than those in the original tests, in which CRFT's accuracy and robustness metrics fall below those of competing state-of-the-art methods.
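Such a test needs concrete accuracy and robustness numbers. This page does not say which metrics the paper reports, so the sketch below assumes two common choices from the flow and matching literature: mean endpoint error for accuracy, and the fraction of points within a pixel threshold as a robustness proxy. The function names and the 3-pixel default are hypothetical.

```python
import math


def mean_endpoint_error(pred, gt):
    """Average Euclidean distance between predicted and ground-truth flow vectors."""
    errors = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(errors) / len(errors)


def success_rate(pred, gt, threshold=3.0):
    """Fraction of points whose endpoint error is below the threshold (in pixels)."""
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) < threshold
    )
    return hits / len(pred)


# Hypothetical flows: one point is exact, the other is off by a 3-4-5 triangle.
pred = [(0.0, 0.0), (3.0, 4.0)]
gt = [(0.0, 0.0), (0.0, 0.0)]
print(mean_endpoint_error(pred, gt), success_rate(pred, gt))
```

Running both metrics per method while sweeping the scale or rotation magnitude of the test pairs would show where, if anywhere, CRFT's advantage disappears.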
Original abstract
We present Consistent-Recurrent Feature Flow Transformer (CRFT), a unified coarse-to-fine framework based on feature flow learning for robust cross-modal image registration. CRFT learns a modality-independent feature flow representation within a transformer-based architecture that jointly performs feature alignment and flow estimation. The coarse stage establishes global correspondences through multi-scale feature correlation, while the fine stage refines local details via hierarchical feature fusion and adaptive spatial reasoning. To enhance geometric adaptability, an iterative discrepancy-guided attention mechanism with a Spatial Geometric Transform (SGT) recurrently refines the flow field, progressively capturing subtle spatial inconsistencies and enforcing feature-level consistency. This design enables accurate alignment under large affine and scale variations while maintaining structural coherence across modalities. Extensive experiments on diverse cross-modal datasets demonstrate that CRFT consistently outperforms state-of-the-art registration methods in both accuracy and robustness. Beyond registration, CRFT provides a generalizable paradigm for multimodal spatial correspondence, offering broad applicability to remote sensing, autonomous navigation, and medical imaging. Code and datasets are publicly available at https://github.com/NEU-Liuxuecong/CRFT.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CRFT, a unified coarse-to-fine transformer framework for cross-modal image registration based on learning modality-independent feature flows. The coarse stage uses multi-scale feature correlation for global correspondences, while the fine stage employs hierarchical feature fusion and adaptive spatial reasoning. An iterative discrepancy-guided attention mechanism augmented with a Spatial Geometric Transform (SGT) recurrently refines the flow field to capture spatial inconsistencies and enforce consistency under large affine and scale variations. The central claim is that CRFT consistently outperforms state-of-the-art registration methods in accuracy and robustness across diverse cross-modal datasets, with broader applicability to remote sensing, autonomous navigation, and medical imaging; code and datasets are released publicly.
Significance. If the empirical results are robust, this work could advance cross-modal registration by providing a practical, transformer-based paradigm that jointly addresses feature alignment and flow estimation without modality-specific assumptions. The emphasis on recurrent consistency enforcement and public code release supports reproducibility and potential adoption in applied domains where large deformations are common.
minor comments (3)
- The abstract states that 'extensive experiments... demonstrate that CRFT consistently outperforms' but provides no quantitative metrics, specific datasets, or baseline comparisons; moving at least one key result (e.g., average error reduction on a named dataset) into the abstract would strengthen the summary.
- Notation for the Spatial Geometric Transform (SGT) and the discrepancy-guided attention is introduced without an explicit equation reference in the high-level description; adding a compact equation block early in §3 would improve readability for readers unfamiliar with recurrent flow refinement.
- The claim of a 'modality-independent feature flow representation' is presented as an outcome of joint training; a short ablation isolating the contribution of the recurrent SGT module versus a non-recurrent baseline would help substantiate that this property is not merely an artifact of the training data.
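For readers unfamiliar with recurrent flow refinement, the kind of compact equation block requested in the second comment might take the following generic form. This is a sketch in assumed notation, not the paper's formulation: $\phi_A$ and $\phi_B$ stand for feature maps of the two modalities, $\mathcal{W}$ for a warping operator, $F^{(k)}$ for the flow after $k$ iterations, $g_\theta$ for a learned update module, and $K$ for the number of iterations.

```latex
\begin{aligned}
F^{(0)}   &= F_{\mathrm{coarse}}, \\
D^{(k)}   &= \phi_A - \mathcal{W}\!\left(\phi_B;\, F^{(k)}\right), \\
F^{(k+1)} &= F^{(k)} + g_\theta\!\left(D^{(k)}\right), \qquad k = 0, \dots, K-1 .
\end{aligned}
```

In this reading, the non-recurrent baseline suggested in the third comment corresponds to $K = 1$, which would make the ablation a comparison across values of $K$ with everything else held fixed.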
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript on CRFT and the recommendation for minor revision. We appreciate the recognition of the framework's potential to advance cross-modal registration through recurrent consistency enforcement and its applicability across domains. No specific major comments were provided in the report, so we have no point-by-point revisions to address at this stage.
Circularity Check
No significant circularity in derivation chain
full rationale
The paper describes a transformer-based architecture for cross-modal registration with coarse-to-fine stages, discrepancy-guided attention, and a Spatial Geometric Transform module. No equations, derivations, or first-principles claims are present in the provided text that reduce performance claims to fitted parameters, self-definitions, or self-citation chains. The central claims rest on empirical outperformance across datasets, which is independent of any internal reduction. This is a standard empirical ML architecture paper with no load-bearing theoretical steps that could exhibit circularity.