arxiv: 2606.11572 · v1 · pith:YPD4CEERnew · submitted 2026-06-10 · 💻 cs.CV

FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection

Pith reviewed 2026-06-27 10:51 UTC · model grok-4.3

classification 💻 cs.CV
keywords knowledge distillation · infrared object detection · frequency decoupling · cross-modal transfer · multispectral pedestrian detection · KAIST dataset · DINOv2

The pith

Frequency-decoupled distillation improves RGB-to-IR object detection by aligning shared structure while tolerating texture differences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the gap between RGB and infrared features varies by spatial frequency, with low-frequency shape and layout aligning more closely than high-frequency texture and edges. It introduces FreqKD to apply strict MSE supervision only to the low-frequency band and a relaxed, down-weighted log-MSE loss to the high-frequency band. This asymmetric treatment yields 64.1 mAP50 on the KAIST multispectral pedestrian detection benchmark, a 2.4-point gain over a DINOv2 baseline. The same representation also improves results when transferred to the FLIR ADAS dataset, MFNet segmentation, and a ResNet-50 backbone.

Core claim

Spectral analysis of 500 paired RGB-IR samples reveals that high-frequency feature divergence exceeds low-frequency divergence by a factor of 2.4 on average across transformer layers. FreqKD exploits this by enforcing strict mean-squared error alignment on low-frequency components to preserve shared structural information and applying a 0.1-weighted log-MSE loss on high-frequency components to supply edge guidance without forcing alignment of modality-specific texture.

What carries the argument

Frequency-decoupled distillation that splits features into low- and high-frequency bands and applies asymmetric losses (strict MSE on low, relaxed weighted log-MSE on high) according to measured cross-modal consistency.
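
The decoupled loss described above can be sketched in NumPy for a single (C, H, W) feature map. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the cut-off is interpreted as a fraction of the per-axis Nyquist frequency, the two bands are compared after an inverse FFT back to the spatial domain, and "log-MSE" is read as log(1 + MSE). Only the channel-wise normalisation, the 2D FFT, the radial cut-off of 0.50, and the 0.1 weight are taken from the text and the Figure 1 caption.

```python
import numpy as np


def radial_low_pass_mask(h, w, cutoff=0.5):
    """Boolean (h, w) mask, True where the radial frequency is <= cutoff.

    Assumption: radius is normalised so that 1.0 equals the per-axis Nyquist
    frequency (0.5 cycles/sample). The paper may use another convention.
    """
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2) / 0.5
    return radius <= cutoff


def split_bands(feat, cutoff=0.5):
    """Split a (C, H, W) feature map into low- and high-frequency parts."""
    # Channel-wise normalisation over the spatial positions.
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True) + 1e-6
    spectrum = np.fft.fft2((feat - mean) / std, axes=(-2, -1))
    mask = radial_low_pass_mask(feat.shape[1], feat.shape[2], cutoff)
    # The mask is symmetric in frequency, so both inverse transforms are real
    # up to rounding error.
    low = np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real
    high = np.fft.ifft2(spectrum * ~mask, axes=(-2, -1)).real
    return low, high


def freqkd_style_loss(student, teacher, cutoff=0.5, high_weight=0.1):
    """Strict MSE on the low band plus a down-weighted log-MSE on the high band."""
    s_low, s_high = split_bands(student, cutoff)
    t_low, t_high = split_bands(teacher, cutoff)
    low_term = np.mean((s_low - t_low) ** 2)
    # Assumed form of the relaxed loss: log(1 + MSE).
    high_term = np.log1p(np.mean((s_high - t_high) ** 2))
    return low_term + high_weight * high_term
```

Calling `freqkd_style_loss(student, teacher, high_weight=0.0)` leaves only the low-frequency term, which is one simple way to see how much each band contributes to the total.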

If this is right

  • Raises KAIST mAP50 from the DINOv2 baseline to 64.1, a 2.4-point absolute gain.
  • Transfers the learned representation to FLIR ADAS with a 2.1 mAP50 improvement.
  • Improves mean intersection-over-union by 1.85 on MFNet segmentation.
  • Delivers a 1.0 mAP50 gain when the student uses a ResNet-50 backbone instead of the original transformer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same frequency split could be tested on other cross-modal pairs such as RGB-to-depth or RGB-to-event data where texture statistics also differ.
  • The fixed 0.1 weighting might be replaced by a small validation sweep or learned scalar without changing the core decoupling idea.
  • Because the method operates on intermediate features, it could be combined with existing KD techniques that already separate content and style.

Load-bearing premise

The extra divergence measured in high-frequency bands arises mainly from modality-specific characteristics that should be tolerated rather than aligned.

What would settle it

A controlled experiment in which uniform MSE applied across all frequencies on the same backbone and data produces equal or higher mAP50 than the frequency-decoupled version.

Figures

Figures reproduced from arXiv: 2606.11572 by Abdalmalek Aburaddaha, Keval Thaker, Samir A. Rawashdeh, Venkatraman Narayanan.

Figure 1: Overview of FreqKD. Stage 1 (left): a frozen RGB DINOv2 ViT-L teacher and an IR DINOv2 student with rank-64 LoRA adapters process registered RGB–IR pairs; only the LoRA parameters are trainable. Frequency decomposition module (centre): at each matched block l ∈ {7,15,19,21,23}, teacher and student features are channel-wise normalised, transformed by 2D FFT, and split at radial cut-off rc=0.50 into a share…

Figure 2: Qualitative results across the three transfer settings. Columns show ground truth, …
Original abstract

Transfer learning from large-scale RGB foundation models to infrared (IR) imagery through knowledge distillation (KD) remains challenging due to fundamental differences in image formation physics. We investigate the spectral structure of the RGB–IR modality gap and observe that feature divergence is not uniform across spatial frequencies: low-frequency components (shape, layout) show greater cross-modal alignment than high-frequency components (texture, fine edges), which reflect modality-specific characteristics. Based on this analysis, we propose FreqKD, a frequency-decoupled distillation framework that applies asymmetric supervision adapted to each band's cross-modal consistency. The method employs strict mean squared error (MSE) on the low-frequency band to preserve shared structural information and a relaxed log-MSE loss (weighted at 0.1) on the high-frequency band to provide edge guidance while tolerating texture differences. Spectral divergence analysis on 500 paired samples shows that high-frequency divergence exceeds low-frequency divergence by a factor of 2.4x on average across all analysed transformer layers. On KAIST multispectral pedestrian detection, FreqKD achieves 64.1 mAP50, improving 2.4 points over the DINOv2 baseline. The learned representation transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mean intersection-over-union), and architectures (ResNet-50, +1.0 mAP50). Code is available at: https://anonymous.4open.science/r/freq_decoupled_kd-5E5A

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 1 minor

Summary. The paper proposes FreqKD, a frequency-decoupled knowledge distillation method for transferring RGB foundation models (DINOv2) to infrared object detection. Based on spectral analysis of 500 paired RGB-IR samples showing 2.4x higher feature divergence in high-frequency bands than low-frequency bands across transformer layers, it applies strict MSE to low-frequency components and a relaxed 0.1-weighted log-MSE to high-frequency components. This yields 64.1 mAP50 on KAIST multispectral pedestrian detection (+2.4 over baseline), with reported transfers to FLIR ADAS (+2.1 mAP50), MFNet segmentation (+1.85 mIoU), and ResNet-50 backbone (+1.0 mAP50).

Significance. If the frequency-specific asymmetry is shown to be robust and causally responsible for the gains, the method could offer a practical way to handle modality gaps in cross-modal KD without forcing alignment on texture differences. The cross-dataset, cross-task, and cross-architecture transfer results suggest potential generality beyond the KAIST benchmark.

major comments (4)
  1. [Abstract / §3] Abstract and §3 (spectral analysis): the 2.4x high-frequency divergence claim is based on 500 paired samples but provides no description of the frequency decomposition procedure, no per-layer or per-sample variance, and no error bars; this measurement directly motivates the asymmetric loss design and must be reproducible.
  2. [§4] §4 (loss formulation): the 0.1 weighting factor on the high-frequency log-MSE term is presented as fixed without ablation against uniform weighting, alternative factors, or learned weights; because the central claim attributes the 2.4 mAP50 gain to this specific relaxation, the lack of sensitivity analysis makes the result under-determined.
  3. [§5] §5 (experiments): no control experiments test whether enforcing high-frequency alignment (e.g., equal MSE on both bands) actually harms downstream detection, nor whether the observed divergence gap persists in same-modality (RGB-RGB or IR-IR) pairs after controlling for general high-frequency noise.
  4. [§5] §5 (KAIST results): the 64.1 mAP50 figure and +2.4 improvement are reported without standard deviations across runs or statistical significance tests, weakening the claim that the frequency-decoupled losses are responsible for the observed gain.
minor comments (1)
  1. [Abstract] The code link is given as anonymous; a permanent repository with the exact spectral-analysis script would strengthen reproducibility.
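
Major comment 1 asks for per-sample variance and error bars on the 2.4x figure. One way such a measurement could be reported is sketched below in NumPy. Everything here is an assumption made for illustration: the paper does not define its divergence metric in the material quoted above, so the sketch uses the mean squared magnitude of the spectral difference per frequency bin, a cut-off expressed as a fraction of the per-axis Nyquist frequency, and a bootstrap interval on the mean ratio. The names `band_divergence` and `ratio_summary` are hypothetical.

```python
import numpy as np


def band_divergence(rgb_feat, ir_feat, cutoff=0.5):
    """Mean per-bin spectral difference power in the (low, high) bands.

    Inputs are (C, H, W) feature maps from the same layer for one image pair.
    """
    def normalise(x):
        mean = x.mean(axis=(1, 2), keepdims=True)
        std = x.std(axis=(1, 2), keepdims=True) + 1e-6
        return (x - mean) / std

    diff = np.fft.fft2(normalise(rgb_feat) - normalise(ir_feat), axes=(-2, -1))
    power = np.abs(diff) ** 2
    h, w = rgb_feat.shape[1:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Assumed convention: radius 1.0 equals the per-axis Nyquist frequency.
    low_mask = np.sqrt(fy ** 2 + fx ** 2) / 0.5 <= cutoff
    return power[:, low_mask].mean(), power[:, ~low_mask].mean()


def ratio_summary(pairs, cutoff=0.5, n_boot=1000, seed=0):
    """Per-sample high/low ratios with mean, std, and a bootstrap 95% interval.

    Assumes every pair has non-zero low-band divergence.
    """
    ratios = np.array(
        [high / low for low, high in (band_divergence(a, b, cutoff) for a, b in pairs)]
    )
    rng = np.random.default_rng(seed)
    boot_means = np.array(
        [rng.choice(ratios, size=ratios.size).mean() for _ in range(n_boot)]
    )
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return {"mean": ratios.mean(), "std": ratios.std(ddof=1), "ci95": (lo, hi)}
```

Running `ratio_summary` once per transformer layer would give the per-layer breakdown the referee requests; running it on same-modality pairs would give the control raised in major comment 3.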

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the thoughtful comments on our manuscript. We address each of the major comments below and will incorporate revisions to improve clarity, reproducibility, and experimental rigor.

  1. Referee: [Abstract / §3] the 2.4x high-frequency divergence claim is based on 500 paired samples but provides no description of the frequency decomposition procedure, no per-layer or per-sample variance, and no error bars; this measurement directly motivates the asymmetric loss design and must be reproducible.

    Authors: We agree that additional details are necessary for reproducibility. In the revised manuscript, we will provide a full description of the frequency decomposition procedure in §3, including the use of 2D Fourier transforms with specific low-pass and high-pass filters. We will also report per-layer and per-sample variance along with error bars on the 2.4x factor to quantify the consistency of the observation across the 500 samples. revision: yes

  2. Referee: [§4] the 0.1 weighting factor on the high-frequency log-MSE term is presented as fixed without ablation against uniform weighting, alternative factors, or learned weights; because the central claim attributes the 2.4 mAP50 gain to this specific relaxation, the lack of sensitivity analysis makes the result under-determined.

    Authors: We acknowledge this point and will add a comprehensive ablation study in the revised §5. This will include results for different weighting factors (0.01, 0.05, 0.1, 0.5, 1.0) on the high-frequency term, as well as uniform weighting across both bands. We will also explore a learned weight variant if feasible. These experiments will help substantiate the choice of 0.1 and its contribution to the performance gains. revision: yes

  3. Referee: [§5] no control experiments test whether enforcing high-frequency alignment (e.g., equal MSE on both bands) actually harms downstream detection, nor whether the observed divergence gap persists in same-modality (RGB-RGB or IR-IR) pairs after controlling for general high-frequency noise.

    Authors: We agree that such controls would provide stronger evidence for the frequency-specific approach. We commit to adding these experiments in the revision: (1) a variant with equal MSE on high-frequency components to measure any degradation in detection performance, and (2) divergence analysis on same-modality pairs to demonstrate that the observed gap is modality-specific. These will be presented in §5. revision: yes

  4. Referee: [§5] the 64.1 mAP50 figure and +2.4 improvement are reported without standard deviations across runs or statistical significance tests, weakening the claim that the frequency-decoupled losses are responsible for the observed gain.

    Authors: We will address this by conducting multiple runs (at least 5 seeds) and reporting mean performance with standard deviations for the KAIST results. We will also perform and report a statistical significance test (e.g., t-test) to support the significance of the +2.4 mAP50 improvement. revision: yes
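
The multi-seed reporting promised in response 4 could look like the following sketch, which uses only the Python standard library. It computes the mean and sample standard deviation per configuration, plus Welch's t statistic and the Welch-Satterthwaite degrees of freedom. The choice of Welch's variant is an assumption (the response only says "e.g., t-test"), the helper names are hypothetical, and the score lists are toy placeholders, not results from the paper.

```python
import math
from statistics import mean, stdev


def summarise(name, scores):
    """Format mean and sample standard deviation across seeds."""
    return f"{name}: {mean(scores):.2f} ± {stdev(scores):.2f} (n={len(scores)})"


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy placeholder scores for five seeds each; NOT numbers from the paper.
baseline_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
method_scores = [3.0, 4.0, 5.0, 6.0, 7.0]

print(summarise("baseline", baseline_scores))
print(summarise("method", method_scores))
t_stat, dof = welch_t(method_scores, baseline_scores)
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

The p-value then follows from the t distribution with `dof` degrees of freedom, which a statistics package can supply; it is left out here to keep the sketch dependency-free.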

Circularity Check

0 steps flagged

No significant circularity; method defined independently of results

The paper measures spectral divergence empirically on 500 paired samples (2.4x factor) to motivate an asymmetric loss design (strict MSE on low-frequency, log-MSE weighted at 0.1 on high-frequency), then reports downstream mAP gains on public benchmarks. No equation shows the 0.1 weight or performance numbers reducing to the divergence measurement by construction, nor any self-citation chain, uniqueness theorem, or ansatz smuggling that would make the central claim tautological. The derivation chain remains self-contained against external data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on one explicit free parameter (the 0.1 high-frequency weight) and one domain assumption about frequency-specific cross-modal alignment; no new entities are postulated.

free parameters (1)
  • high-frequency loss weight = 0.1
    The factor 0.1 is introduced to relax supervision on the high-frequency band.
axioms (1)
  • domain assumption: Low-frequency components exhibit greater cross-modal alignment than high-frequency components.
    This premise directly motivates the asymmetric loss design and is supported only by the internal spectral analysis on 500 samples.
