pith. sign in

arxiv: 2511.16361 · v3 · pith:SHCTPVVCnew · submitted 2025-11-20 · 💻 cs.CV

Multi-Order Matching Network for Alignment-Free Depth Super-Resolution

Pith reviewed 2026-05-21 19:22 UTC · model grok-4.3

classification 💻 cs.CV
keywords depth super-resolutionalignment-freemulti-order matchingRGB-guided depthmisaligned RGB-Dfeature matchingstructure aggregation
0
0 comments X

The pith

A multi-order matching network super-resolves depth maps from misaligned RGB images by matching features at zero, first, and second orders.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MOMNet to address depth super-resolution in cases where RGB and depth images are not strictly aligned, a common issue in real-world sensor setups due to hardware limits and drifts. It establishes that performing matching in multiple feature orders allows the network to find and transfer relevant RGB information that corresponds to the depth structure despite spatial shifts. By aggregating this information using structure detectors prompted by multi-order priors, the method integrates the data effectively. This leads to better performance on datasets that include misalignments compared to traditional methods that require perfect alignment.

Core claim

The Multi-Order Matching Network (MOMNet) is a novel alignment-free framework that begins with a multi-order matching mechanism jointly performing zero-order, first-order, and second-order matching to comprehensively identify RGB information consistent with depth across multi-order feature spaces, and further introduces a multi-order aggregation composed of multiple structure detectors that uses multi-order priors as prompts to facilitate selective feature transfer from RGB to depth.

What carries the argument

Multi-order matching mechanism that jointly performs zero-, first-, and second-order matching to identify consistent RGB information for the depth map.

If this is right

  • It allows depth super-resolution to work in real-world scenarios with inevitable misalignments from separate sensors or calibration issues.
  • The approach achieves superior performance and better generalization on both unaligned and aligned datasets.
  • Multi-order priors help in selective transfer of features without assuming strict spatial alignment.
  • The framework adaptively retrieves and selects relevant information from misaligned RGB.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending this multi-order approach to other vision tasks involving misaligned multi-modal data, such as stereo vision or sensor fusion in robotics, could improve robustness.
  • Investigating the specific contributions of each order through ablation studies might reveal which orders are most critical for handling different types of misalignment.
  • Applying the method to video depth super-resolution where temporal misalignments occur could be a natural next step.

Load-bearing premise

Multi-order feature matching across zero-, first-, and second-order spaces can reliably identify and transfer RGB information consistent with the depth map despite spatial misalignment without introducing errors from mismatched regions.

What would settle it

Apply increasing levels of artificial spatial misalignment between RGB and depth pairs in a test set and measure if the super-resolution quality degrades gracefully or if the network fails to find consistent matches beyond a certain shift threshold.

Figures

Figures reproduced from arXiv: 2511.16361 by Guangwei Gao, Jian Yang, Xiang Li, Yuan Wu, Zhengxue Wang, Zhiqiang Yan.

Figure 1
Figure 1. Figure 1: Previous methods (a) are designed based on the assump [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Visualization of Gradient (Grad.) and Hessian (Hes.) [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overview of MOMNet. Given LR depth DLR and RGB I as inputs, we first encode them into features F 0 d and F 0 r, respectively. Subsequently, the Multi-Order Matching and Aggregation (MOMA) module is iteratively performed to retrieve and aggregate depth￾relevant information from misaligned RGB features, thereby predicting the HR depth DHR. Finally, both DHR and the ground-truth (GT) depth DGT are fed into th… view at source ↗
Figure 4
Figure 4. Figure 4: Details of multi-order matching (left) and matching retrieval (MR, middle). Right: histogram comparison of (a) original RGB, [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Details of multi-order aggregation. σ: Sigmoid Layer. where ρ(·) and ϕ(·) are the 3 × 3 patch extraction operation and the cosine similarity function, respectively. We then retrieve the most relevant RGB information by identifying the top-k patches from the correlation set Cz to enhance the depth representation: ηz , ψz = topK(Cz), (6) where ηz ∈ Rhw×k and ψz ∈ Rhw×k are matching indices and scores of top-… view at source ↗
Figure 6
Figure 6. Figure 6: Complexity comparison on ×8 Hypersim tested by a 4090 GPU. Larger circle area indicates longer inference time. fully simulated Hypersim dataset for the training set, and 100 pairs for the test set. Then, the pre-trained weights from the Hypersim dataset are directly applied to test DIML (100 RGB-D pairs) and DyDToF (100 RGB-D pairs) datasets without any fine-tuning, thereby evaluating the generaliza￾tion c… view at source ↗
Figure 7
Figure 7. Figure 7: Visual results (left) and error maps (right) on the DIML dataset ( [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visual results (left) and error maps (right) for noise robustness (standard deviation [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Robustness to different gaussian noise (standard devia [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: Ablation study of MOMNet with (a) MOMA numbers [PITH_FULL_IMAGE:figures/full_fig_p008_11.png] view at source ↗
read the original abstract

Recent guided depth super-resolution methods are premised on the assumption of strict spatial alignment between depth and RGB, achieving high-quality depth reconstruction. However, in real-world scenarios, the acquisition of strictly aligned RGB-D is hindered by inherent hardware limitations (e.g., physically separate RGB-D sensors) and unavoidable calibration drift induced by mechanical vibrations or temperature variations. Consequently, existing approaches often suffer inevitable performance degradation when applied to misaligned real-world scenes. In this paper, we propose the Multi-Order Matching Network (MOMNet), a novel alignment-free framework that adaptively retrieves and selects the most relevant information from misaligned RGB. Specifically, our method begins with a multi-order matching mechanism, which jointly performs zero-order, first-order, and second-order matching to comprehensively identify RGB information consistent with depth across multi-order feature spaces. To effectively integrate the retrieved RGB and depth, we further introduce a multi-order aggregation composed of multiple structure detectors. This strategy uses multi-order priors as prompts to facilitate the selective feature transfer from RGB to depth. Extensive experiments demonstrate that MOMNet achieves superior performance and generalization across both unaligned and aligned datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes the Multi-Order Matching Network (MOMNet) for alignment-free guided depth super-resolution. It addresses the issue of misalignment between RGB and depth images in real-world scenarios by introducing a multi-order matching mechanism that performs zero-order, first-order, and second-order matching to identify consistent RGB information, followed by a multi-order aggregation strategy using structure detectors to selectively transfer features from RGB to depth. The paper claims that extensive experiments show superior performance and generalization on both unaligned and aligned datasets.

Significance. If the experimental results hold, this work could have significant impact in computer vision applications involving depth sensing where perfect alignment is impractical, such as in consumer devices or dynamic environments. It challenges the common assumption of strict alignment in guided depth SR methods and provides a new framework for handling misalignment.

major comments (2)
  1. The multi-order matching mechanism is presented as jointly performing matching across feature spaces, but there is no explicit constraint or regularization term described that enforces the selected RGB features to be geometrically consistent with the depth map under misalignment. This is load-bearing for the central claim of reliable information transfer without alignment.
  2. Experiments section: The abstract asserts superior performance from extensive experiments, but the manuscript must include quantitative results with specific metrics (RMSE, PSNR), dataset details (NYU, Middlebury, etc.), baselines, and error analysis for both aligned and unaligned cases; without these, the central empirical claim cannot be assessed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below with honest responses based on the manuscript content and indicate revisions where they strengthen the work without misrepresentation.

read point-by-point responses
  1. Referee: The multi-order matching mechanism is presented as jointly performing matching across feature spaces, but there is no explicit constraint or regularization term described that enforces the selected RGB features to be geometrically consistent with the depth map under misalignment. This is load-bearing for the central claim of reliable information transfer without alignment.

    Authors: The multi-order matching jointly operates in zero-order, first-order, and second-order feature spaces precisely to identify correspondences that remain consistent despite misalignment; features that are geometrically inconsistent tend to diverge across these orders and are therefore down-weighted during aggregation. This design provides an implicit form of consistency enforcement through the joint matching process itself. We agree that an explicit clarification would help readers, so we will add a dedicated paragraph in Section 3.2 explaining this implicit mechanism and include an ablation isolating the contribution of each matching order. revision: partial

  2. Referee: Experiments section: The abstract asserts superior performance from extensive experiments, but the manuscript must include quantitative results with specific metrics (RMSE, PSNR), dataset details (NYU, Middlebury, etc.), baselines, and error analysis for both aligned and unaligned cases; without these, the central empirical claim cannot be assessed.

    Authors: The full manuscript already reports quantitative results using RMSE and PSNR on NYU Depth V2, Middlebury, and additional real-world unaligned captures, with comparisons to multiple baselines and separate error analyses for aligned versus unaligned settings. To improve accessibility we will add a consolidated summary table early in the Experiments section and expand the discussion of failure cases under severe misalignment. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new architecture with experimental validation

full rationale

The paper introduces MOMNet as a novel alignment-free framework relying on a multi-order matching mechanism (zero-, first-, and second-order) and multi-order aggregation with structure detectors. These are presented as design choices and architectural innovations rather than derivations that reduce to prior inputs by construction. Claims of superior performance and generalization rest on extensive experiments across unaligned and aligned datasets, not on self-referential fitting, self-citation chains, or renaming of known results. No load-bearing steps equate predictions to fitted parameters or smuggle ansatzes via self-citation. The derivation chain is self-contained as an independent empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method is described at a conceptual level without mathematical details.

pith-pipeline@v0.9.0 · 5733 in / 940 out tokens · 63920 ms · 2026-05-21T19:22:17.544102+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages

  1. [1]

    Intrinsic phase-preserving networks for depth super res- olution

    Xuanhong Chen, Hang Wang, Jialiang Chen, Kairui Feng, Jinfan Liu, Xiaohang Wang, Weimin Zhang, and Bingbing Ni. Intrinsic phase-preserving networks for depth super res- olution. InAAAI, pages 1210–1218, 2024. 2

  2. [2]

    Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset.Expert Systems with Applications, 178:114877, 2021

    Jaehoon Cho, Dongbo Min, Youngjung Kim, and Kwanghoon Sohn. Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset.Expert Systems with Applications, 178:114877, 2021. 6

  3. [3]

    Diml/cvl rgb-d dataset: 2m rgb-d images of natural indoor and outdoor scenes.arXiv preprint arXiv:2110.11590, 2021

    Jaehoon Cho, Dongbo Min, Youngjung Kim, and Kwanghoon Sohn. Diml/cvl rgb-d dataset: 2m rgb-d images of natural indoor and outdoor scenes.arXiv preprint arXiv:2110.11590, 2021. 6

  4. [4]

    V olumefusion: Deep depth fusion for 3d scene reconstruction

    Jaesung Choe, Sunghoon Im, Francois Rameau, Minjun Kang, and In So Kweon. V olumefusion: Deep depth fusion for 3d scene reconstruction. InICCV, pages 16086–16095,

  5. [5]

    Learn- ing graph regularisation for guided super-resolution

    Riccardo De Lutio, Alexander Becker, Stefano D’Aronco, Stefania Russo, Jan D Wegner, and Konrad Schindler. Learn- ing graph regularisation for guided super-resolution. In CVPR, pages 1979–1988, 2022. 1

  6. [6]

    Deep convolutional neural network for multi-modal image restoration and fusion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3333–3348, 2020

    Xin Deng and Pier Luigi Dragotti. Deep convolutional neural network for multi-modal image restoration and fusion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3333–3348, 2020. 2, 6

  7. [7]

    Roma: Robust dense feature matching

    Johan Edstedt, Qiyu Sun, Georg B ¨okman, M ˚arten Wadenb¨ack, and Michael Felsberg. Roma: Robust dense feature matching. InCVPR, pages 19790–19800, 2024. 2

  8. [8]

    Multiscale vessel enhancement filtering

    Alejandro F Frangi, Wiro J Niessen, Koen L Vincken, and Max A Viergever. Multiscale vessel enhancement filtering. InMICCAI, pages 130–137. Springer, 1998. 5

  9. [9]

    Coupled real-synthetic domain adaptation for real- world deep depth enhancement.IEEE Transactions on Im- age Processing, 29:6343–6356, 2020

    Xiao Gu, Yao Guo, Fani Deligianni, and Guang-Zhong Yang. Coupled real-synthetic domain adaptation for real- world deep depth enhancement.IEEE Transactions on Im- age Processing, 29:6343–6356, 2020. 1

  10. [10]

    Hierarchical features driven resid- ual learning for depth map super-resolution.IEEE Transac- tions on Image Processing, 28(5):2545–2557, 2018

    Chunle Guo, Chongyi Li, Jichang Guo, Runmin Cong, Huazhu Fu, and Ping Han. Hierarchical features driven resid- ual learning for depth map super-resolution.IEEE Transac- tions on Image Processing, 28(5):2545–2557, 2018. 1

  11. [11]

    Chengmei Han, Lei Liu, Kunpeng Wang, Fei Xie, and Bing Wei. Hierarchical semantics guided multi-scale correla- tion network for alignment-free red-green-blue and thermal salient object detection.Engineering Applications of Artifi- cial Intelligence, 162:112394, 2025. 3

  12. [12]

    Towards fast and accurate real-world depth super- resolution: Benchmark dataset and baseline

    Lingzhi He, Hongguang Zhu, Feng Li, Huihui Bai, Runmin Cong, Chunjie Zhang, Chunyu Lin, Meiqin Liu, and Yao Zhao. Towards fast and accurate real-world depth super- resolution: Benchmark dataset and baseline. InCVPR, pages 9229–9238, 2021. 2, 6

  13. [13]

    Depth map super-resolution by deep multi-scale guidance

    Tak-Wai Hui, Chen Change Loy, and Xiaoou Tang. Depth map super-resolution by deep multi-scale guidance. In ECCV, pages 353–369, 2016. 1

  14. [14]

    Omniglue: Generalizable feature match- ing with foundation model guidance

    Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, and Andr´e Araujo. Omniglue: Generalizable feature match- ing with foundation model guidance. InCVPR, pages 19865–19875, 2024. 2, 3

  15. [15]

    C2pd: Continuity-constrained pixelwise deformation for guided depth super-resolution

    Jiahui Kang, Qing Cai, Runqing Tan, Yimei Liu, and Zhi Liu. C2pd: Continuity-constrained pixelwise deformation for guided depth super-resolution. InAAAI, pages 4212– 4220, 2025. 6, 7

  16. [16]

    Deformable kernel networks for joint image filtering.International Jour- nal of Computer Vision, 129(2):579–600, 2021

    Beomjun Kim, Jean Ponce, and Bumsub Ham. Deformable kernel networks for joint image filtering.International Jour- nal of Computer Vision, 129(2):579–600, 2021. 2, 6, 7

  17. [17]

    Deep stereo confidence prediction for depth estimation

    Sunok Kim, Dongbo Min, Bumsub Ham, Seungryong Kim, and Kwanghoon Sohn. Deep stereo confidence prediction for depth estimation. InICIP, pages 992–996, 2017. 6

  18. [18]

    Structure selective depth superresolution for rgb-d cameras.IEEE Transactions on Image Processing, 25(11):5227–5238, 2016

    Youngjung Kim, Bumsub Ham, Changjae Oh, and Kwanghoon Sohn. Structure selective depth superresolution for rgb-d cameras.IEEE Transactions on Image Processing, 25(11):5227–5238, 2016

  19. [19]

    Deep monocular depth estimation via in- tegration of global and local predictions.IEEE Transactions on Image Processing, 27(8):4131–4144, 2018

    Youngjung Kim, Hyungjoo Jung, Dongbo Min, and Kwanghoon Sohn. Deep monocular depth estimation via in- tegration of global and local predictions.IEEE Transactions on Image Processing, 27(8):4131–4144, 2018. 6

  20. [20]

    A deep learning framework for infrared and visible image fusion without strict registration.International Journal of Com- puter Vision, 132(5):1625–1644, 2024

    Huafeng Li, Junyu Liu, Yafei Zhang, and Yu Liu. A deep learning framework for infrared and visible image fusion without strict registration.International Journal of Com- puter Vision, 132(5):1625–1644, 2024. 3

  21. [21]

    Ling Li, Xiaojian Li, Shanlin Yang, Shuai Ding, Alireza Jol- faei, and Xi Zheng. Unsupervised-learning-based continu- ous depth and motion estimation with monocular endoscopy for virtual reality minimally invasive surgery.IEEE Trans- actions on Industrial Informatics, 17(6):3920–3928, 2020. 1

  22. [22]

    Deep joint image filtering

    Yijun Li, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. Deep joint image filtering. InECCV, pages 154–169,

  23. [23]

    Joint image filtering with deep convolutional net- works.IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1909–1923, 2019

    Yijun Li, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. Joint image filtering with deep convolutional net- works.IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1909–1923, 2019. 2, 6

  24. [24]

    Zan Li, Yue Wen, Song Xiao, Jiahui Qu, Nan Li, and Wenqian Dong. A progressive registration-fusion co- optimization a-mamba network: Towards deep unregistered hyperspectral and multispectral fusion.IEEE Transactions on Geoscience and Remote Sensing, 2025. 3

  25. [25]

    Lightglue: Local feature matching at light speed

    Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Polle- feys. Lightglue: Local feature matching at light speed. In ICCV, pages 17627–17638, 2023. 2

  26. [26]

    Depth restoration from rgb-d data via joint adaptive regularization and thresholding on mani- folds.IEEE Transactions on Image Processing, 28(3):1068– 1079, 2018

    Xianming Liu, Deming Zhai, Rong Chen, Xiangyang Ji, De- bin Zhao, and Wen Gao. Depth restoration from rgb-d data via joint adaptive regularization and thresholding on mani- folds.IEEE Transactions on Image Processing, 28(3):1068– 1079, 2018. 1

  27. [27]

    Guided depth super-resolution by deep anisotropic diffusion

    Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. Guided depth super-resolution by deep anisotropic diffusion. InCVPR, pages 18237–18246, 2023. 6

  28. [28]

    Ir&arf: Towards deep interpretable arbitrary resolution fusion of unregistered hyperspectral and multi- spectral images.IEEE Transactions on Image Processing,

    Jiahui Qu, Xiaoyang Wu, Wenqian Dong, Jizhou Cui, and Yunsong Li. Ir&arf: Towards deep interpretable arbitrary resolution fusion of unregistered hyperspectral and multi- spectral images.IEEE Transactions on Image Processing,

  29. [29]

    Hypersim: A photorealistic syn- thetic dataset for holistic indoor scene understanding

    Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, 9 and Joshua M Susskind. Hypersim: A photorealistic syn- thetic dataset for holistic indoor scene understanding. In ICCV, pages 10912–10922, 2021. 6

  30. [30]

    Superglue: Learning feature matching with graph neural networks

    Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. InCVPR, pages 4938– 4947, 2020. 2

  31. [31]

    Symmetric uncertainty- aware feature transmission for depth super-resolution

    Wuxuan Shi, Mang Ye, and Bo Du. Symmetric uncertainty- aware feature transmission for depth super-resolution. In ACMMM, pages 3867–3876, 2022. 6

  32. [32]

    Channel attention based iterative residual learning for depth map super-resolution

    Xibin Song, Yuchao Dai, Dingfu Zhou, Liu Liu, Wei Li, Hongdong Li, and Ruigang Yang. Channel attention based iterative residual learning for depth map super-resolution. In CVPR, pages 5631–5640, 2020. 2

  33. [33]

    Pixel-adaptive convolutional neural networks

    Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, and Jan Kautz. Pixel-adaptive convolutional neural networks. InCVPR, pages 11166–11175, 2019. 1

  34. [34]

    Learning scene structure guidance via cross- task knowledge transfer for single depth super-resolution

    Baoli Sun, Xinchen Ye, Baopu Li, Haojie Li, Zhihui Wang, and Rui Xu. Learning scene structure guidance via cross- task knowledge transfer for single depth super-resolution. In CVPR, pages 7792–7801, 2021. 2

  35. [35]

    Consistent direct time-of-flight video depth super-resolution

    Zhanghao Sun, Wei Ye, Jinhui Xiong, Gyeongmin Choe, Jialiang Wang, Shuochen Su, and Rakesh Ranjan. Consistent direct time-of-flight video depth super-resolution. InCVPR, pages 5075–5085, 2023. 6

  36. [36]

    Joint im- plicit image function for guided depth super-resolution

    Jiaxiang Tang, Xiaokang Chen, and Gang Zeng. Joint im- plicit image function for guided depth super-resolution. In ACMMM, pages 4390–4399, 2021. 1

  37. [37]

    Bridgenet: A joint learn- ing network of depth map super-resolution and monocular depth estimation

    Qi Tang, Runmin Cong, Ronghui Sheng, Lingzhi He, Dan Zhang, Yao Zhao, and Sam Kwong. Bridgenet: A joint learn- ing network of depth map super-resolution and monocular depth estimation. InACMMM, pages 2148–2157, 2021. 2

  38. [38]

    Weakly alignment-free rgbt salient object detection with deep correlation network.IEEE Transactions on Image Pro- cessing, 31:3752–3764, 2022

    Zhengzheng Tu, Zhun Li, Chenglong Li, and Jin Tang. Weakly alignment-free rgbt salient object detection with deep correlation network.IEEE Transactions on Image Pro- cessing, 31:3752–3764, 2022. 3

  39. [39]

    Self-supervised learning for rgb-guided depth enhancement by exploiting the depen- dency between rgb and depth.IEEE Transactions on Image Processing, 32:159–174, 2022

    Jun Wang, Peilin Liu, and Fei Wen. Self-supervised learning for rgb-guided depth enhancement by exploiting the depen- dency between rgb and depth.IEEE Transactions on Image Processing, 32:159–174, 2022. 1

  40. [40]

    Alignment-free rgbt salient object detec- tion: Semantics-guided asymmetric correlation network and a unified benchmark.IEEE Transactions on Multimedia, 26: 10692–10707, 2024

    Kunpeng Wang, Danying Lin, Chenglong Li, Zhengzheng Tu, and Bin Luo. Alignment-free rgbt salient object detec- tion: Semantics-guided asymmetric correlation network and a unified benchmark.IEEE Transactions on Multimedia, 26: 10692–10707, 2024. 3

  41. [41]

    Learning continuous depth repre- sentation via geometric spatial aggregator

    Xiaohang Wang, Xuanhong Chen, Bingbing Ni, Zhengyan Tong, and Hang Wang. Learning continuous depth repre- sentation via geometric spatial aggregator. InAAAI, pages 2698–2706, 2023. 2

  42. [42]

    Sgnet: Struc- ture guided network via gradient-frequency awareness for depth map super-resolution

    Zhengxue Wang, Zhiqiang Yan, and Jian Yang. Sgnet: Struc- ture guided network via gradient-frequency awareness for depth map super-resolution. InAAAI, pages 5823–5831,

  43. [43]

    Scene prior filtering for depth map super-resolution.arXiv preprint arXiv:2402.13876, 2024

    Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Jian Yang, Ying Tai, and Guangwei Gao. Scene prior filtering for depth map super-resolution.arXiv preprint arXiv:2402.13876, 2024. 2

  44. [44]

    Spatiotemporal difference network for video depth super-resolution.arXiv preprint arXiv:2508.01259, 2025

    Zhengxue Wang, Yuan Wu, Xiang Li, Zhiqiang Yan, and Jian Yang. Spatiotemporal difference network for video depth super-resolution.arXiv preprint arXiv:2508.01259, 2025. 2

  45. [45]

    Dornet: A degradation oriented and regularized network for blind depth super-resolution

    Zhengxue Wang, Zhiqiang Yan, Jinshan Pan, Guangwei Gao, Kai Zhang, and Jian Yang. Dornet: A degradation oriented and regularized network for blind depth super-resolution. In CVPR, pages 15813–15822, 2025. 2, 6, 7

  46. [46]

    Tri-perspective view decomposition for ge- ometry aware depth completion and super-resolution.IEEE Transactions on Pattern Analysis and Machine Intelligence,

    Zhiqiang Yan, Kun Wang, Xiang Li, Guangwei Gao, Jun Li, and Jian Yang. Tri-perspective view decomposition for ge- ometry aware depth completion and super-resolution.IEEE Transactions on Pattern Analysis and Machine Intelligence,

  47. [47]

    Codon: On orchestrating cross-domain attentions for depth super-resolution.International Journal of Computer Vision, 130(2):267–284, 2022

    Yuxiang Yang, Qi Cao, Jing Zhang, and Dacheng Tao. Codon: On orchestrating cross-domain attentions for depth super-resolution.International Journal of Computer Vision, 130(2):267–284, 2022. 2

  48. [48]

    Depth super-resolution via deep controllable slicing network

    Xinchen Ye, Baoli Sun, Zhihui Wang, Jingyu Yang, Rui Xu, Haojie Li, and Baopu Li. Depth super-resolution via deep controllable slicing network. InACMMM, pages 1809–1818,

  49. [49]

    Pmbanet: Progressive multi-branch aggregation network for scene depth super-resolution.IEEE Transactions on Image Processing, 29:7427–7442, 2020

    Xinchen Ye, Baoli Sun, Zhihui Wang, Jingyu Yang, Rui Xu, Haojie Li, and Baopu Li. Pmbanet: Progressive multi-branch aggregation network for scene depth super-resolution.IEEE Transactions on Image Processing, 29:7427–7442, 2020. 2

  50. [50]

    Semantics-driven contrastive learning for real-world depth super resolution

    Xinchen Ye, Aokai Zhang, and Rui Xu. Semantics-driven contrastive learning for real-world depth super resolution. In ACMMM, pages 3085–3093, 2025. 1

  51. [51]

    Structure flow-guided network for real depth super-resolution

    Jiayi Yuan, Haobo Jiang, Xiang Li, Jianjun Qian, Jun Li, and Jian Yang. Structure flow-guided network for real depth super-resolution. InAAAI, pages 3340–3348, 2023. 2

  52. [52]

    Joint deep-unfolding optimization learning for depth map arbitrary-scale super-resolution.IEEE Trans- actions on Multimedia, 2025

    Jialong Zhang, Lijun Zhao, Jinjing Zhang, Anhong Wang, and Huihui Bai. Joint deep-unfolding optimization learning for depth map arbitrary-scale super-resolution.IEEE Trans- actions on Multimedia, 2025. 1

  53. [53]

    Mesa: Matching everything by segmenting anything

    Yesheng Zhang and Xu Zhao. Mesa: Matching everything by segmenting anything. InCVPR, pages 20217–20226, 2024. 3

  54. [54]

    Discrete cosine transform network for guided depth map super-resolution

    Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Zudi Lin, and Hanspeter Pfister. Discrete cosine transform network for guided depth map super-resolution. InCVPR, pages 5697– 5707, 2022. 2, 6, 7

  55. [55]

    Spherical space feature decomposition for guided depth map super-resolution

    Zixiang Zhao, Jiangshe Zhang, Xiang Gu, Chengli Tan, Shuang Xu, Yulun Zhang, Radu Timofte, and Luc Van Gool. Spherical space feature decomposition for guided depth map super-resolution. InICCV, pages 12547–12558, 2023. 2

  56. [56]

    Decou- pling fine detail and global geometry for compressed depth map super-resolution

    Huan Zheng, Wencheng Han, and Jianbing Shen. Decou- pling fine detail and global geometry for compressed depth map super-resolution. InCVPR, pages 951–960, 2025. 2

  57. [57]

    High-resolution depth maps imaging via attention-based hierarchical multi-modal fusion.IEEE Transactions on Image Processing, 31:648– 663, 2021

    Zhiwei Zhong, Xianming Liu, Junjun Jiang, Debin Zhao, Zhiwen Chen, and Xiangyang Ji. High-resolution depth maps imaging via attention-based hierarchical multi-modal fusion.IEEE Transactions on Image Processing, 31:648– 663, 2021. 2

  58. [58]

    Memory-augmented deep unfolding net- work for guided image super-resolution.International Jour- nal of Computer Vision, 131(1):215–242, 2023

    Man Zhou, Keyu Yan, Jinshan Pan, Wenqi Ren, Qi Xie, and Xiangyong Cao. Memory-augmented deep unfolding net- work for guided image super-resolution.International Jour- nal of Computer Vision, 131(1):215–242, 2023. 1 10