arxiv: 2605.17286 · v1 · pith:M6JRDK2Znew · submitted 2026-05-17 · 💻 cs.CV

HyperVision: A Channel-Adaptive Ground-Based Hyperspectral Vision Pre-trained Backbone

Pith reviewed 2026-05-20 14:47 UTC · model grok-4.3

classification 💻 cs.CV
keywords hyperspectral imaging · pre-trained backbone · channel-adaptive embedding · semantic segmentation · object tracking · salient object detection · cross-modal distillation

The pith

HyperVision is presented as the first ground-based hyperspectral pre-trained backbone: it adapts to varying sensor channels and reports state-of-the-art results on semantic segmentation, object tracking, and salient object detection using only head adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that ground-based hyperspectral imaging has lacked a general pre-trained backbone because sensors differ in spectral channels, real labels are scarce and inconsistent, and existing datasets are small and narrow in scene coverage. To solve this, the authors build HyperVision by mapping any input channel count into a shared token space, generating training labels by combining spatial masks from SAM2 with spectral material cues from HyperFree, and distilling semantic knowledge from a large RGB model. Pre-trained on 15 000 images drawn from 26 datasets, the backbone then transfers to new tasks and new sensors without any backbone updates, producing measurable gains on three standard hyperspectral benchmarks.

Core claim

HyperVision supplies the first ground-based hyperspectral backbone through three components: (1) a channel-adaptive dynamic embedding that projects inputs of arbitrary spectral length into one unified token space, (2) multi-source pseudo-labeling that merges SAM2 spatial structures with HyperFree spectral material information, and (3) cross-modal distillation that transfers rich semantics from a pre-trained RGB vision model. After pre-training on 15k images from 26 diverse ground-based datasets, the model yields state-of-the-art accuracy on semantic segmentation, object tracking, and salient object detection when only a task-specific head is trained.
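
Component (3) can be illustrated with a small sketch. The material reviewed here does not state the paper's distillation objective, so the loss below (mean cosine distance between paired tokens) is an assumption, as are the function name and the toy vectors. It only shows the general idea of pulling hyperspectral student features toward RGB teacher features.

```python
import math

def cosine_distillation_loss(student_tokens, teacher_tokens):
    """Mean (1 - cosine similarity) over paired token vectors.

    Assumption: the hyperspectral student and the RGB teacher emit the same
    number of tokens with the same dimension for a spatially aligned image
    pair. The paper's actual loss may differ.
    """
    if len(student_tokens) != len(teacher_tokens):
        raise ValueError("token counts must match")
    total = 0.0
    for s, t in zip(student_tokens, teacher_tokens):
        dot = sum(a * b for a, b in zip(s, t))
        norm_s = math.sqrt(sum(a * a for a in s))
        norm_t = math.sqrt(sum(b * b for b in t))
        total += 1.0 - dot / (norm_s * norm_t + 1e-12)
    return total / len(student_tokens)

# Toy tokens (hypothetical): two tokens of dimension two.
teacher = [[1.0, 0.0], [0.0, 2.0]]
aligned = [[2.0, 0.0], [0.0, 1.0]]      # same directions, different scale
orthogonal = [[0.0, 1.0], [1.0, 0.0]]   # perpendicular to the teacher

loss_aligned = cosine_distillation_loss(aligned, teacher)
loss_orthogonal = cosine_distillation_loss(orthogonal, teacher)
```

A cosine-based loss ignores feature magnitude, which is one plausible reason to prefer it when the two modalities have very different input scales. Whether the authors made that choice is not stated in the text above.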

What carries the argument

Channel-adaptive dynamic embedding mechanism that maps heterogeneous spectral inputs into a unified token space while preserving spectral resolution.
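
A minimal sketch of how such a mechanism might work, under stated assumptions: a small dictionary of weight vectors is stored at a few anchor wavelengths, and the projection weight for each band is interpolated from it according to the band's centre wavelength. The anchor grid, the linear interpolation rule, the division by channel count, and every name below are hypothetical. The paper's own construction (which, per the Figure 3 excerpt, works on patches and uses two dictionaries) is not reproduced here.

```python
def interpolate_weight(wavelength, anchors, dictionary):
    """Linearly interpolate a D-dim weight vector for one band centre."""
    if wavelength <= anchors[0]:
        return list(dictionary[0])
    if wavelength >= anchors[-1]:
        return list(dictionary[-1])
    for i in range(len(anchors) - 1):
        lo, hi = anchors[i], anchors[i + 1]
        if lo <= wavelength <= hi:
            frac = (wavelength - lo) / (hi - lo)
            return [(1 - frac) * a + frac * b
                    for a, b in zip(dictionary[i], dictionary[i + 1])]

def embed_pixel(spectrum, wavelengths, anchors, dictionary):
    """Project a C-band spectrum to a fixed D-dim token, for any C."""
    dim = len(dictionary[0])
    token = [0.0] * dim
    for value, wl in zip(spectrum, wavelengths):
        weight = interpolate_weight(wl, anchors, dictionary)
        for d in range(dim):
            token[d] += value * weight[d]
    # Divide by the band count so sensors with more bands do not
    # produce systematically larger tokens (an illustrative choice).
    return [t / len(spectrum) for t in token]

anchors = [400.0, 700.0, 1000.0]                    # nm, hypothetical grid
dictionary = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # would be learned

# Two hypothetical sensors with 3 and 5 bands map to the same token size.
token_3 = embed_pixel([0.2, 0.4, 0.6], [400.0, 700.0, 1000.0],
                      anchors, dictionary)
token_5 = embed_pixel([0.1, 0.2, 0.3, 0.4, 0.5],
                      [400.0, 550.0, 700.0, 850.0, 1000.0],
                      anchors, dictionary)
```

Because the weights are generated from wavelengths rather than stored per channel index, the same parameters serve any band count, which is the property the channel-adaptive claim depends on.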

If this is right

  • Any new ground-based hyperspectral sensor can be used immediately by feeding its raw band stack through the same backbone without retraining the feature extractor.
  • Labeling effort for future hyperspectral tasks drops sharply because the pre-trained model already carries both spatial layout and material identity cues.
  • Cross-modal distillation from RGB models becomes a standard route for enriching small hyperspectral collections.
  • Real-time hyperspectral applications such as material sorting or vegetation monitoring can adopt the same backbone across different camera hardware.
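
The head-only adaptation that these consequences rely on can be sketched as follows. The backbone here is a fixed toy function standing in for a pre-trained feature extractor, and the head is a linear layer fitted by plain gradient descent. All names, the toy data, and the squared-error objective are assumptions for illustration and do not describe the paper's task heads.

```python
def frozen_backbone(x):
    """Stand-in for a pre-trained feature extractor; it is never updated."""
    return [x[0] + x[1], x[0] - x[1]]

def train_linear_head(samples, targets, lr=0.1, epochs=500):
    """Fit y = w . f(x) + b with squared loss, updating only w and b."""
    # Features are computed once because the backbone has no trainable state.
    feats = [frozen_backbone(x) for x in samples]
    w, b = [0.0, 0.0], 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(feats, targets):
            err = w[0] * f[0] + w[1] * f[1] + b - y
            grad_w[0] += 2 * err * f[0] / n
            grad_w[1] += 2 * err * f[1] / n
            grad_b += 2 * err / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

# Toy task (hypothetical): the target equals the first backbone feature.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [x[0] + x[1] for x in samples]
head_w, head_b = train_linear_head(samples, targets)
```

Since backbone features can be cached, only the small head sees gradient updates, which is what makes this protocol cheap to repeat for each new sensor or task.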

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same channel-adaptive design could be tested on airborne or satellite hyperspectral data to check whether the backbone transfers beyond ground-based scenes.
  • If the pseudo-labeling step is replaced by a small amount of human-verified labels, the performance gap between head-only and full fine-tuning might shrink further.
  • The reported gains on three disparate tasks suggest the backbone has learned a representation that is largely task-agnostic, which could support zero-shot or few-shot hyperspectral recognition.

Load-bearing premise

The pseudo-labels created by fusing SAM2 spatial structures with HyperFree spectral cues are accurate and consistent enough to train a backbone that generalizes across unseen sensors and tasks.
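
To make the premise concrete, here is one simple way spatial masks and per-pixel spectral labels could be fused: every pixel in a mask receives the mask's majority material label. This majority-vote rule, the data layout, and the function name are assumptions chosen for illustration. The paper's fusion procedure is not specified in the text above, and no SAM2 or HyperFree API is called here.

```python
from collections import Counter

def fuse_pseudo_labels(spatial_masks, material_map, ignore_label=-1):
    """Give every pixel in a spatial mask the mask's majority material label.

    spatial_masks: list of sets of (row, col) pixels, one set per segment.
    material_map: 2-D list of per-pixel material ids from a spectral model.
    Pixels covered by no mask keep ignore_label.
    """
    rows, cols = len(material_map), len(material_map[0])
    fused = [[ignore_label] * cols for _ in range(rows)]
    for mask in spatial_masks:
        if not mask:
            continue
        votes = Counter(material_map[r][c] for r, c in mask)
        majority, _ = votes.most_common(1)[0]
        for r, c in mask:
            fused[r][c] = majority
    return fused

# Toy 2x3 scene (hypothetical): one large segment, one single-pixel segment.
material_map = [[0, 0, 1],
                [0, 1, 1]]
masks = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(0, 2)}]
fused = fuse_pseudo_labels(masks, material_map)
```

In the toy scene, the stray material label at pixel (1, 1) is overruled by its segment, and the uncovered pixel (1, 2) stays unlabeled. This is the kind of behavior whose error rate would need to be measured before the premise can be trusted.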

What would settle it

Train the backbone once, then evaluate it on a new ground-based hyperspectral dataset recorded by a sensor whose channel count and wavelength centers lie outside the 26 training datasets; if head-only adaptation fails to match or exceed task-specific baselines, the generalization claim is falsified.

Figures

Figures reproduced from arXiv: 2605.17286 by Diqi Chen, Fengchao Xiong, Guanyiman Fu, Jianfeng Lu, Jingtao Li, Jun Zhou, Yan Xu, Zhuanfeng Li, Zihang Cheng.

Figure 1: Comparison of airborne and ground-based HSI modeling.
Figure 2: Comparison of existing HSI modeling using pre-trained models for downstream
Figure 3: The architecture of HyperVision. … hyperspectral tasks [29, 48], we extend the embedding stage with a two-branch design that processes dynamically assembled layers in parallel. Specifically, X is patchified in step p and split into a key-channel component X_k and an intermediate-cube component X_c. Denoting the dictionary-based weight construction as g(b, β), we use two separate dictionaries β_k and β_c to proces…
Figure 4: Unsupervised representation learning with HyperVision.
Figure 5: Visualization of pseudo-masks generated by SAM2 and HyperFree.
Figure 6: Illustration of the prompt-driven segmentation pipeline.
Figure 7: Visual comparisons of hyperspectral semantic segmentation results.
Figure 8: Visual comparisons of hyperspectral object tracking results.
Figure 9: Visual comparisons of salient object detection results.
Figure 10: t-SNE visualization comparing the feature representations of HyperFree and the

Original abstract

While hyperspectral imaging provides rich spatial-spectral information across hundreds of narrow wavelength bands for precise material identification, ground-based hyperspectral pre-trained backbones remain absent, constrained by varying spectral configurations across sensors, the scarcity and inconsistency of labels, and the limited scale and scene diversity of existing datasets. To address these challenges and enable universal perception, we propose HyperVision, the first ground-based hyperspectral pre-trained backbone. First, to handle varying spectral configurations, HyperVision adopts a channel-adaptive dynamic embedding mechanism to map heterogeneous inputs into a unified token space. Second, to address the scarcity and inconsistency of labels, we introduce a multi-source pseudo-labeling method that fuses semantic representations from both spatial structures generated by SAM2 and fine-grained spectral material information extracted by HyperFree. Third, to compensate for limited dataset scale and enrich scene diversity, a cross-modal knowledge distillation mechanism is utilized to transfer rich semantic representations from a pre-trained RGB vision model to our hyperspectral backbone. Pre-trained on a collection of 15k images from 26 diverse ground-based datasets, HyperVision demonstrates exceptional generalization. Requiring only efficient head-only adaptation without adjusting backbone parameters, it achieves state-of-the-art performance compared to task-specific methods across three downstream tasks under varying sensor configurations, yielding up to a 16.3% relative improvement in hyperspectral semantic segmentation $\mathrm{Acc}_{\mathrm{M}}$, a 2.1% relative gain in object tracking AUC, and a 35.5% reduction in salient object detection MAE. The source code and pre-trained model will be publicly available at https://github.com/lronkitty/HyperVision .
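
The three headline numbers in the abstract are relative changes, so their absolute size depends on the baseline they are measured against. The sketch below shows the arithmetic only. The baseline and new values are invented so that the percentages match the abstract; they are not results from the paper.

```python
def relative_gain(new, baseline):
    """Relative improvement for higher-is-better metrics (accuracy, AUC)."""
    return (new - baseline) / baseline

def relative_reduction(new, baseline):
    """Relative reduction for lower-is-better metrics (MAE)."""
    return (baseline - new) / baseline

# Hypothetical values, chosen only so the percentages match the abstract.
acc_gain = relative_gain(new=58.15, baseline=50.0)         # about 0.163
auc_gain = relative_gain(new=0.7147, baseline=0.70)        # about 0.021
mae_drop = relative_reduction(new=0.0129, baseline=0.02)   # about 0.355
```

With these made-up baselines, a 16.3% relative gain is 8.15 absolute accuracy points, while a 35.5% MAE reduction is a change of only 0.0071. This is why the absolute baseline values matter when reading the claims.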

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces HyperVision as the first ground-based hyperspectral pre-trained backbone. It employs a channel-adaptive dynamic embedding to unify heterogeneous spectral inputs, a multi-source pseudo-labeling strategy that fuses SAM2-derived spatial structures with HyperFree spectral material cues to overcome label scarcity and inconsistency, and cross-modal distillation from RGB models to increase scene diversity. Pre-trained on 15k images drawn from 26 ground-based datasets, the backbone achieves reported state-of-the-art results on three downstream tasks using only head-only adaptation: up to 16.3% relative improvement in hyperspectral semantic segmentation Acc_M, 2.1% relative gain in object tracking AUC, and 35.5% reduction in salient object detection MAE across varying sensor configurations.

Significance. If the performance claims are substantiated by rigorous validation of the pseudo-label quality and experimental controls, the work would constitute a meaningful contribution by establishing the first general-purpose pre-trained model for ground-based hyperspectral perception. The channel-adaptive design and head-only adaptation protocol address practical deployment constraints across heterogeneous sensors, while the public release of code and weights would support reproducibility and follow-on research in material-aware vision tasks.

major comments (2)
  1. [Abstract (multi-source pseudo-labeling paragraph)] The multi-source pseudo-labeling method is load-bearing for the central claim that a generalizable backbone can be trained despite label scarcity. The manuscript provides no quantitative validation of the fused pseudo-labels themselves, such as pixel-wise agreement with held-out human annotations, cross-sensor consistency scores, or error rates on a real-label validation subset. Without these metrics, downstream SOTA gains with head-only adaptation could plausibly arise from dataset curation or the distillation component rather than reliable spectral-spatial supervision.
  2. [Experimental evaluation (implied by abstract performance claims)] The abstract reports concrete relative gains (16.3% Acc_M, 2.1% AUC, 35.5% MAE reduction) on three tasks, yet the provided text contains no details on baseline implementations, statistical significance tests, error bars, dataset splits, or ablation studies isolating the contribution of each component. These omissions make it impossible to confirm that the reported improvements are robust and attributable to the proposed pre-training pipeline rather than implementation specifics.
minor comments (1)
  1. [Abstract] The symbols Acc_M and MAE are used without explicit definition in the abstract; a brief parenthetical clarification or reference to standard definitions would improve readability for readers outside the immediate hyperspectral community.
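
For readers who want the definitions the minor comment asks for, a sketch follows. MAE is the standard mean absolute error between a predicted saliency map and its ground truth. The reading of Acc_M as mean per-class accuracy is an assumption, since the abstract does not define it. The function names and toy inputs are hypothetical.

```python
def mean_absolute_error(pred, gt):
    """Average |prediction - ground truth| over all pixels of a 2-D map."""
    flat_p = [v for row in pred for v in row]
    flat_g = [v for row in gt for v in row]
    return sum(abs(p - g) for p, g in zip(flat_p, flat_g)) / len(flat_p)

def mean_class_accuracy(pred_labels, gt_labels):
    """Average of per-class accuracies (assumed meaning of Acc_M)."""
    per_class = {}
    for p, g in zip(pred_labels, gt_labels):
        correct, total = per_class.get(g, (0, 0))
        per_class[g] = (correct + (p == g), total + 1)
    return sum(c / t for c, t in per_class.values()) / len(per_class)

# Toy inputs (hypothetical).
mae = mean_absolute_error([[0.9, 0.1], [0.4, 0.0]],
                          [[1.0, 0.0], [0.0, 0.0]])
acc_m = mean_class_accuracy([0, 0, 1, 1], [0, 0, 0, 1])
```

In the toy label example, overall pixel accuracy is 3/4 while the per-class average is (2/3 + 1)/2. This illustrates how the two measures diverge when classes are imbalanced.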

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects for strengthening the presentation of our multi-source pseudo-labeling approach and the experimental evaluation. We address each major comment below and will incorporate revisions to provide the requested validations and details, thereby improving the rigor and reproducibility of the work.

  1. Referee: [Abstract (multi-source pseudo-labeling paragraph)] The multi-source pseudo-labeling method is load-bearing for the central claim that a generalizable backbone can be trained despite label scarcity. The manuscript provides no quantitative validation of the fused pseudo-labels themselves, such as pixel-wise agreement with held-out human annotations, cross-sensor consistency scores, or error rates on a real-label validation subset. Without these metrics, downstream SOTA gains with head-only adaptation could plausibly arise from dataset curation or the distillation component rather than reliable spectral-spatial supervision.

    Authors: We acknowledge that direct quantitative validation of the pseudo-label quality would further substantiate the reliability of the multi-source fusion strategy. While the manuscript demonstrates effectiveness through downstream task performance, we agree this leaves room for alternative explanations. In the revised manuscript, we will add a new subsection on pseudo-label validation. This will report: pixel-wise agreement (e.g., IoU and accuracy) against held-out human annotations from multiple sensors; cross-sensor consistency scores for scenes captured under varying spectral configurations; and error rates on a real-label validation subset. We will also include component-wise ablations of the SAM2 spatial and HyperFree spectral contributions to the fused labels, along with qualitative visualizations. These additions will help confirm the supervision quality and isolate its role from dataset curation or distillation. revision: yes

  2. Referee: [Experimental evaluation (implied by abstract performance claims)] The abstract reports concrete relative gains (16.3% Acc_M, 2.1% AUC, 35.5% MAE reduction) on three tasks, yet the provided text contains no details on baseline implementations, statistical significance tests, error bars, dataset splits, or ablation studies isolating the contribution of each component. These omissions make it impossible to confirm that the reported improvements are robust and attributable to the proposed pre-training pipeline rather than implementation specifics.

    Authors: We agree that expanded experimental details are essential to demonstrate robustness and attribute the gains specifically to our pre-training pipeline. In the revised manuscript, we will substantially expand the experimental section to include: full specifications of baseline implementations and adaptations for hyperspectral inputs; results of statistical significance tests (e.g., paired t-tests or Wilcoxon tests with p-values); error bars from multiple runs with different random seeds; explicit descriptions of dataset splits for pre-training and each downstream task; and comprehensive ablation studies isolating the channel-adaptive dynamic embedding, multi-source pseudo-labeling, and cross-modal distillation components. We will also detail the head-only adaptation protocol and ensure all comparisons use consistent settings. These changes will make the evaluation transparent and confirm the improvements are attributable to the proposed methods. revision: yes
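
The multi-seed significance testing promised in the second response could look roughly like the sketch below. It uses an exact sign-flip permutation test on paired per-seed scores. This is a swapped-in nonparametric alternative to the paired t-test or Wilcoxon test named in the response, chosen here because it needs only the standard library. The per-seed scores are invented for illustration.

```python
from itertools import product

def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to have either sign, so enumerate all 2^n sign assignments.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical per-seed scores for two methods over five seeds.
method_a = [0.71, 0.73, 0.72, 0.74, 0.70]
method_b = [0.68, 0.69, 0.70, 0.70, 0.67]
p_value = paired_sign_flip_p_value(method_a, method_b)
```

With five seeds the smallest attainable two-sided p-value from this test is 2/32 = 0.0625, even when one method wins on every seed. A revision that reports significance at the 0.05 level with such a test would therefore need more than five runs per method.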

Circularity Check

0 steps flagged

No circularity in claimed derivation or predictions

The paper presents an empirical engineering contribution: a channel-adaptive embedding, multi-source pseudo-labeling via SAM2+HyperFree fusion, and cross-modal distillation, all trained on a collected 15k-image corpus. These are described as practical responses to sensor variation, label scarcity, and dataset scale rather than as outputs of any first-principles derivation or equation set. No mathematical predictions, fitted parameters renamed as forecasts, or self-citation chains that render the central claims tautological appear in the abstract or described methodology. Downstream metrics (Acc_M, AUC, MAE) are independent of the pre-training procedure itself. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the effectiveness of the three introduced mechanisms and on the quality of the 15k-image collection; no explicit free parameters beyond standard neural-network weights are named, and the pseudo-label fusion is treated as a domain assumption rather than a derived quantity.

axioms (1)
  • domain assumption: SAM2 spatial structures and HyperFree spectral representations can be fused into reliable pseudo-labels despite label scarcity
    Invoked in the description of the multi-source pseudo-labeling method
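
One way this assumption could be checked, in line with the validation requested in the referee report, is to compare pseudo-labels with a small set of human annotations. The sketch below computes pixel accuracy and mean IoU over pixels that carry a human label. The function name, the ignore-label convention, and the toy maps are hypothetical and do not come from the paper.

```python
def pixel_agreement(pseudo, human, ignore_label=-1):
    """Return (pixel accuracy, mean IoU) over pixels with a human label."""
    inter, union = {}, {}
    correct = total = 0
    for p_row, h_row in zip(pseudo, human):
        for p, h in zip(p_row, h_row):
            if h == ignore_label:
                continue  # no human annotation for this pixel
            total += 1
            if p == h:
                correct += 1
                inter[h] = inter.get(h, 0) + 1
                union[h] = union.get(h, 0) + 1
            else:
                # The pixel belongs to the human region of class h and,
                # if it is labeled at all, to the pseudo region of class p.
                union[h] = union.get(h, 0) + 1
                if p != ignore_label:
                    union[p] = union.get(p, 0) + 1
    ious = [inter.get(k, 0) / union[k] for k in union]
    return correct / total, sum(ious) / len(ious)

# Toy 2x3 label maps (hypothetical); -1 marks an unannotated pixel.
pseudo = [[0, 0, 1],
          [0, 0, 1]]
human = [[0, 0, 1],
         [0, 1, -1]]
accuracy, mean_iou = pixel_agreement(pseudo, human)
```

In the toy maps a single disagreeing pixel leaves accuracy at 0.8 but pulls mean IoU down to 0.625, so reporting both gives a fuller picture of pseudo-label quality than accuracy alone.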

pith-pipeline@v0.9.0 · 5856 in / 1429 out tokens · 43433 ms · 2026-05-20T14:47:10.703771+00:00 · methodology

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

  1. [1]

    Sparse Recovery of Hyperspectral Signal from Natural RGB Images

    Boaz Arad and Ohad Ben-Shahar. Sparse Recovery of Hyperspectral Signal from Natural RGB Images. In Proc. Eur. Conf. Comput. Vis. (ECCV), pages 19–34, 2016. ISBN 978-3-319-46478-7

  2. [2]

    Boaz Arad, Radu Timofte, Rony Yahel, Nimrod Morag, Amir Bernat, Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Zhang, Hanspeter Pfister, Luc Van Gool, Shuai Liu, Yongqiang Li, Chaoyu Feng, Lei Lei, Jiaojiao Li, Songcheng Du, Chaoxiong Wu, Yihong Leng, Rui Song, Mingwei Zhang, Chongxing Song, Shuyi Zhao, Zhiqiang Lang, Wei Wei, Lei Zhang, Renwei Di...

  3. [3]

    NTIRE 2022 Spectral Demosaicing Challenge and Data Set

    Boaz Arad, Radu Timofte, Rony Yahel, Nimrod Morag, Amir Bernat, Yaqi Wu, Xun Wu, Zhihao Fan, Chenjie Xia, Feng Zhang, Shuai Liu, Yongqiang Li, Chaoyu Feng, Lei Lei, Mingwei Zhang, Kai Feng, Xun Zhang, Jiaxin Yao, Yongqiang Zhao, Suina Ma, Fan He, Yangyang Dong, Shufang Yu, Difa Qiu, Jinhui Liu, Mengzhao Bi, Beibei Song, WenFang Sun, Jiesi Zheng, Bowen Zha...

  4. [4]

    Labeled Hyperspectral and RGB Images of Several Tree Species

    Ryan Brown and Josh Moser. Labeled Hyperspectral and RGB Images of Several Tree Species. 2021

  5. [5]

    SAM 3: Segment anything with concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...

  6. [6]

    Statistics of real-world hyperspectral images

    Ayan Chakrabarti and Todd Zickler. Statistics of real-world hyperspectral images. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 193–200, 2011

  7. [7]

    Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

    Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proc. Eur. Conf. Comput. Vis. (ECCV), 2018

  8. [8]

    SENSE: Hyperspectral video object tracker via fusing material and motion cues

    Yuzeng Chen, Qiangqiang Yuan, Yuqi Tang, Yi Xiao, Jiang He, and Zhenqi Liu. SENSE: Hyperspectral video object tracker via fusing material and motion cues. Inf. Fusion, 109:102395, 2024. ISSN 1566-2535

  9. [9]

    Colour constancy failures expected in colourful environments

    David H. Foster and Adam Reeves. Colour constancy failures expected in colourful environments. In Proc. R. Soc. B Biol. Sci., volume 289, page 20212483, 2022

  10. [10]

    Visible – Near infrared hyperspectral dataset of healthy and infected apple tree leaves images for the monitoring of apple fire blight

    Belal Gaci, Florent Abdelghafour, Maxime Ryckewaert, Silvia Mas-Garcia, Marine Louargant, Florence Verpont, Yohana Laloum, Aude Moronvalle, Ryad Bendoula, and Jean-Michel Roger. Visible – Near infrared hyperspectral dataset of healthy and infected apple tree leaves images for the monitoring of apple fire blight. Data in Brief, 50:109532, 2023. ISSN 2352-3409

  11. [11]

    Agricultural plant hyperspectral imaging dataset

    A.V. Gaidel, V.V. Podlipnov, Ivliev Nikolay, R.A. Paringer, P.A. Ishkin, S.V. Mashkov, and R.V. Skidanov. Agricultural plant hyperspectral imaging dataset. Comput. Opt., 47, 2023

  12. [12]

    CBFF-Net: A New Framework for Efficient and Accurate Hyperspectral Object Tracking

    Long Gao, Pan Liu, Yan Jiang, Weiying Xie, Jie Lei, Yunsong Li, and Qian Du. CBFF-Net: A New Framework for Efficient and Accurate Hyperspectral Object Tracking. IEEE Trans. Geosci. Remote Sens., 61:1–14, 2023

  13. [13]

    HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving

    Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe, M. Victoria Martínez, and Unai Martinez-Corral. HSI-Drive v2.0: More Data for New Challenges in Scene Understanding for Autonomous Driving. In Proc. IEEE Symp. Ser. Comput. Intell. (SSCI), pages 207–214, 2023

  14. [14]

    A Hyperspectral and RGB Dataset for Building Façade Segmentation

    Nariman Habili, Ernest Kwan, Weihao Li, Christfried Webers, Jeremy Oorloff, Mohammad Ali Armin, and Lars Petersson. A Hyperspectral and RGB Dataset for Building Façade Segmentation. In Proc. Eur. Conf. Comput. Vis. (ECCV) Workshops, pages 258–267, 2023. ISBN 978-3-031-25082-8

  15. [15]

    Hyper-Drive: Visible-Short Wave Infrared Hyperspectral Imaging Datasets for Robots in Unstructured Environments

    Nathaniel Hanson, Benjamin Pyatski, Samuel Hibbard, Charles DiMarzio, and Taşkın Padır. Hyper-Drive: Visible-Short Wave Infrared Hyperspectral Imaging Datasets for Robots in Unstructured Environments. In Proc. Workshop Hyperspectral Imaging Signal Process.: Evolution Remote Sens. (WHISPERS), pages 1–5, 2023

  16. [16]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016

  17. [17]

    Masked Autoencoders Are Scalable Vision Learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked Autoencoders Are Scalable Vision Learners. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 16000–16009, 2022

  18. [18]

    SpectralGPT: Spectral Remote Sensing Foundation Model

    Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, and Jocelyn Chanussot. SpectralGPT: Spectral Remote Sensing Foundation Model. IEEE Trans. Pattern Anal. Mach. Intell., 46(8):5227–5244, 2024

  19. [19]

    Spectral simulation and method design of camouflage textiles for concealment of hyperspectral imaging in UV-VIS-IR against multidimensional combat background

    Anowar Hossain. Spectral simulation and method design of camouflage textiles for concealment of hyperspectral imaging in UV-VIS-IR against multidimensional combat background. J. Text. Inst., 114(2):331–342, 2023

  20. [20]

    Spatial–Spectral Weighted and Regularized Tensor Sparse Correlation Filter for Object Tracking in Hyperspectral Videos

    Zengfu Hou, Wei Li, Jun Zhou, and Ran Tao. Spatial–Spectral Weighted and Regularized Tensor Sparse Correlation Filter for Object Tracking in Hyperspectral Videos. IEEE Trans. Geosci. Remote Sens., 60:1–12, 2022

  21. [21]

    HSICityV2: Urban Scene Understanding via Hyperspectral Images

    Yuxing Huang, Tianqi Ren, Qiu Shen, Ying Fu, and Shaodi You. HSICityV2: Urban Scene Understanding via Hyperspectral Images, 2021

  22. [22]

    Hyperspectral adapter for semantic segmentation with vision foundation models

    Juana Valeria Hurtado, Rohit Mohan, and Abhinav Valada. Hyperspectral adapter for semantic segmentation with vision foundation models. IEEE Robotics and Automation Letters, 11(3):3606–3613, 2026

  23. [23]

    Hyperspectral Image Dataset for Benchmarking on Salient Object Detection

    Nevrez Imamoglu, Yu Oishi, Xiaoqiang Zhang, Guanqun Ding, Yuming Fang, Toru Kouyama, and Ryosuke Nakamura. Hyperspectral Image Dataset for Benchmarking on Salient Object Detection. In Proc. Int. Conf. Quality Multimedia Experience (QoMEX), pages 1–3, 2018

  24. [24]

    HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery

    Jingtao Li, Yingyi Liu, Xinyu Wang, Yunning Peng, Chen Sun, Shaoyu Wang, Zhendong Sun, Tian Ke, Xiao Jiang, Tangwei Lu, Anran Zhao, and Yanfei Zhong. HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 23048–23058, 2025

  25. [25]

    RGB-induced feature modulation network for hyperspectral image super-resolution

    Qiang Li, Maoguo Gong, Yuan Yuan, and Qi Wang. RGB-induced feature modulation network for hyperspectral image super-resolution. IEEE Transactions on Geoscience and Remote Sensing, 61:1–11, 2023. doi: 10.1109/TGRS.2023.3277486

  26. [26]

    SiamBAG: Band Attention Grouping-Based Siamese Object Tracking Network for Hyperspectral Videos

    Wei Li, Zengfu Hou, Jun Zhou, and Ran Tao. SiamBAG: Band Attention Grouping-Based Siamese Object Tracking Network for Hyperspectral Videos. IEEE Trans. Geosci. Remote Sens., 61:1–12, 2023

  27. [27]

    BAE-Net: A Band Attention Aware Ensemble Network for Hyperspectral Object Tracking

    Zhuanfeng Li, Fengchao Xiong, Jun Zhou, Jing Wang, Jianfeng Lu, and Yuntao Qian. BAE-Net: A Band Attention Aware Ensemble Network for Hyperspectral Object Tracking. In Proc. IEEE Int. Conf. Image Process. (ICIP), pages 2106–2110, 2020

  28. [28]

    Material-Guided Siamese Fusion Network for Hyperspectral Object Tracking

    Zhuanfeng Li, Fengchao Xiong, Jianfeng Lu, Jun Zhou, and Yuntao Qian. Material-Guided Siamese Fusion Network for Hyperspectral Object Tracking. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pages 2809–2813, 2022

  29. [29]

    Learning a Deep Ensemble Network With Band Importance for Hyperspectral Object Tracking

    Zhuanfeng Li, Fengchao Xiong, Jun Zhou, Jianfeng Lu, and Yuntao Qian. Learning a Deep Ensemble Network With Band Importance for Hyperspectral Object Tracking. IEEE Trans. Image Process., 32:2901–2914, 2023

  30. [30]

    Spectrum-Driven Mixed-Frequency Network for Hyperspectral Salient Object Detection

    Peifu Liu, Tingfa Xu, Huan Chen, Shiyun Zhou, Haolin Qin, and Jianan Li. Spectrum-Driven Mixed-Frequency Network for Hyperspectral Salient Object Detection. IEEE Trans. Multimedia, 26:5296–5310, 2024

  31. [31]

    Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), pages 10012–10022, 2021

  32. [32]

    SiamHYPER: Learning a hyperspectral object tracker from an rgb-based tracker

    Zhenqi Liu, Xinyu Wang, Yanfei Zhong, Meng Shu, and Chen Sun. SiamHYPER: Learning a hyperspectral object tracker from an rgb-based tracker. IEEE Trans. Image Process., 31:7116–7129, 2022

  33. [33]

    HSI Road: A Hyper Spectral Image Dataset For Road Segmentation

    Jiarou Lu, Huafeng Liu, Yazhou Yao, Shuyin Tao, Zhenming Tang, and Jianfeng Lu. HSI Road: A Hyper Spectral Image Dataset For Road Segmentation. In Proc. IEEE Int. Conf. Multimedia Expo (ICME), pages 1–6, 2020

  34. [34]

    Spatial distributions of local illumination color in natural scenes

    Sérgio M.C. Nascimento, Kinjiro Amano, and David H. Foster. Spatial distributions of local illumination color in natural scenes. Vision Res., 120:39–44, 2016. ISSN 0042-6989

  35. [35]

    Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

    Zhenliang Ni, Xinghao Chen, Yingjie Zhai, Yehui Tang, and Yunhe Wang. Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation. In Proc. Eur. Conf. Comput. Vis. (ECCV), pages 239–255, 2025. ISBN 978-3-031-72943-0

  36. [36]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick ...

  37. [37]

    U2-Net: Going deeper with nested U-structure for salient object detection

    Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, and Martin Jagersand. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognit., 106:107404, 2020. ISSN 0031-3203

  38. [38]

    HSOD-BIT-V2: A Challenging Benchmark for Hyperspectral Salient Object Detection

    Yuhao Qiu, Shuyan Bai, Tingfa Xu, Peifu Liu, Haolin Qin, and Jianan Li. HSOD-BIT-V2: A Challenging Benchmark for Hyperspectral Salient Object Detection. Proc. AAAI Conf. Artif. Intell., 39(6):6630–6638, 2025

  39. [39]

    Learning Transferable Visual Models From Natural Language Supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning Transferable Visual Models From Natural Language Supervision. In Proc. Int. Conf. Mach. Learn. (ICML), pages 8748–8763, 2021

  40. [40]

    SAM 2: Segment Anything in Images and Videos

    Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollar, and Christoph Feichtenhofer. SAM 2: Segment Anything in Images and Videos. In Proc. Int. Conf. Learn. Re...

  41. [41]

    A dataset for evaluating blood detection in hyperspectral images

    Michał Romaszewski, Przemysław Głomb, Arkadiusz Sochan, and Michał Cholewa. A dataset for evaluating blood detection in hyperspectral images. Forensic Sci. Int., 320:110701, 2021. ISSN 0379-0738

  42. [42]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervention (MICCAI), pages 234–241, 2015. ISBN 978-3-319-24574-4

  43. [43]

    MobileNetV2: Inverted Residuals and Linear Bottlenecks

    Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018

  44. [44]

    Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brand...

  45. [45]

    BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

    Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, and Lizhuang Ma. BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 3162–3173, 2024

  46. [46]

    HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios

    Nick Theisen, Robin Bartsch, Dietrich Paulus, and Peer Neubert. HS3-Bench: A Benchmark and Strong Baseline for Hyperspectral Semantic Segmentation in Driving Scenarios. In Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), pages 5895–5901, 2024

  47. [47]

    Measuring the Ripeness of Fruit with Hyperspectral Imaging and Deep Learning

    Leon Amadeus Varga, Jan Makowski, and Andreas Zell. Measuring the Ripeness of Fruit with Hyperspectral Imaging and Deep Learning. In Proc. Int. Joint Conf. Neural Netw. (IJCNN), pages 1–8, 2021

  48. [48]

    A Fast Neighborhood Grouping Method for Hyperspectral Band Selection

    Qi Wang, Qiang Li, and Xuelong Li. A Fast Neighborhood Grouping Method for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens., 59(6):5028–5039, 2021

  49. [49]

    PVT v2: Improved baselines with pyramid vision transformer

    Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. PVT v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media, 8(3):415–424, 2022

  50. [50]

    100 radical innovation breakthroughs for the future

    Philine Warnke, Kerstin Cuhls, Ulrich Schmoch, Lea Daniel, Liviu Andreescu, Bianca Dragomir, Radu Gheorghiu, Catalina Baboschi, Adrian Curaj, Marjukka Parkkinen, and Osmo Kuusi. 100 radical innovation breakthroughs for the future, 2019

  51. [51]

    HyKo: A Spectral Dataset for Scene Understanding

    Christian Winkens, Florian Sattler, Veronika Adams, and Dietrich Paulus. HyKo: A Spectral Dataset for Scene Understanding. In Proc. IEEE Int. Conf. Comput. Vis. (ICCV) Workshops, 2017

  52. [52]

    Material Based Object Tracking in Hyperspectral Videos

    Fengchao Xiong, Jun Zhou, and Yuntao Qian. Material Based Object Tracking in Hyperspectral Videos. IEEE Trans. Image Process., 29:3719–3733, 2020

  53. [53]

    Hyperspectral Object Tracking Challenge

    Fengchao Xiong, Jun Zhou, Hiep Quang Luong, Mina Zahiri, Rafal Muszynski, Wouter Charle, Yanfei Zhong, Pedram Ghamisi, and Jocelyn Chanussot. Hyperspectral Object Tracking Challenge, 2025

  54. [54]

    Generalized Assorted Pixel Camera: Postcapture Control of Resolution, Dynamic Range, and Spectrum

    Fumihito Yasuma, Tomoo Mitsunaga, Daisuke Iso, and Shree K. Nayar. Generalized Assorted Pixel Camera: Postcapture Control of Resolution, Dynamic Range, and Spectrum. IEEE Trans. Image Process., 19(9):2241–2253, 2010

  55. [55]

    Hyperspectral city v1.0 dataset and benchmark

    Shaodi You, Erqi Huang, Shuaizhe Liang, Yongrong Zheng, Yunxiang Li, Fan Wang, Sen Lin, Qiu Shen, Xun Cao, Diming Zhang, et al. Hyperspectral city v1.0 dataset and benchmark. arXiv preprint arXiv:1907.10270, 2019

  56. [56]

    Whu-hi: Uav-borne hyperspectral with high spatial resolution (h2) benchmark datasets and classifier for precise crop identification based on deep convolutional neural network with crf

    Yanfei Zhong, Xin Hu, Chang Luo, Xinyu Wang, Ji Zhao, and Liangpei Zhang. Whu-hi: Uav-borne hyperspectral with high spatial resolution (h2) benchmark datasets and classifier for precise crop identification based on deep convolutional neural network with crf. Remote Sensing of Environment, 250:112012, 2020. ISSN 0034-4257

  57. [57]

    Visual Prompt Multi-Modal Tracking

    Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, and Huchuan Lu. Visual Prompt Multi-Modal Tracking. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 9516–9526, 2023