Hyperspectral Image Classification via Efficient Global Spectral Supertoken Clustering
Pith reviewed 2026-05-07 10:00 UTC · model grok-4.3
The pith
DSCC decouples clustering from classification to produce boundary-aligned predictions from spectral supertokens in hyperspectral images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DSCC is an end-to-end framework that explicitly decouples clustering from classification. It groups spectrally similar and spatially proximate pixels into boundary-preserving spectral supertokens by computing an image-level multi-criteria feature distance, applying locality-aware assignment regularization, and selecting centers via density-isolation. Token-level prediction then uses a soft-label scheme that records class proportions within each supertoken. This design guarantees region-level, boundary-aligned classification outputs while handling mixed compositions and delivering a favorable accuracy-efficiency trade-off.
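To make the assignment step concrete, here is a minimal sketch of pixel-to-center assignment under stated assumptions. The source does not give the exact form of the multi-criteria distance or of the locality-aware regularization. This sketch assumes a weighted sum of spectral and spatial Euclidean distance, plus a hard spatial window. The names `assign_pixels`, `spatial_weight`, and `radius` are hypothetical and are not taken from the DSCC code.

```python
import math

def assign_pixels(features, centers, spatial_weight=0.5, radius=2):
    """Assign each pixel to a center index.

    features: H x W grid of spectral vectors (lists of floats).
    centers:  list of (row, col) center positions.
    Assumed distance: spectral distance + spatial_weight * spatial distance.
    Assumed locality rule: centers farther than `radius` are ignored.
    """
    height, width = len(features), len(features[0])
    labels = [[-1] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            best_idx, best_dist = -1, math.inf
            nearest_idx, nearest_spatial = -1, math.inf
            for k, (cr, cc) in enumerate(centers):
                spatial = math.hypot(r - cr, c - cc)
                if spatial < nearest_spatial:
                    nearest_idx, nearest_spatial = k, spatial
                if spatial > radius:
                    continue  # locality constraint: skip far-away centers
                spectral = math.dist(features[r][c], features[cr][cc])
                dist = spectral + spatial_weight * spatial
                if dist < best_dist:
                    best_idx, best_dist = k, dist
            # Fall back to the spatially nearest center if none is in range.
            labels[r][c] = best_idx if best_idx >= 0 else nearest_idx
    return labels

# Toy 2 x 6 image: columns 0-3 share one spectrum, columns 4-5 another.
low, high = [0.1, 0.1], [0.9, 0.9]
row = [low, low, low, low, high, high]
features = [row, row]
centers = [(0, 1), (0, 4)]
labels = assign_pixels(features, centers, spatial_weight=0.5, radius=3)
print(labels)
```

In this toy case column 3 is spatially closer to the second center but spectrally matches the first, so the combined distance keeps it with the first center. That is the boundary-following behaviour the paragraph above attributes to the supertokens.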
What carries the argument
The spectral supertoken, a cluster of spectrally similar and spatially proximate pixels carrying soft class-proportion labels, which shifts classification from pixel-wise to token-level and thereby enforces regional consistency.
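As an illustration of the soft class-proportion labels, the sketch below computes a proportion vector per supertoken from flattened pixel labels and then broadcasts a token-level decision back to the pixels. This is a hedged reading of the description above, not the authors' implementation. The function names `soft_labels` and `broadcast_prediction` are hypothetical, and the soft labels stand in for classifier outputs in the broadcast step.

```python
from collections import Counter, defaultdict

def soft_labels(token_ids, pixel_labels, num_classes):
    """Return {token_id: [class proportions]} for flattened pixel arrays."""
    counts = defaultdict(Counter)
    for token, label in zip(token_ids, pixel_labels):
        counts[token][label] += 1
    proportions = {}
    for token, counter in counts.items():
        total = sum(counter.values())
        proportions[token] = [counter[k] / total for k in range(num_classes)]
    return proportions

def broadcast_prediction(token_ids, token_probs):
    """Give every pixel the arg-max class of the token it belongs to."""
    def argmax(values):
        return max(range(len(values)), key=lambda k: values[k])
    return [argmax(token_probs[token]) for token in token_ids]

# Eight pixels in two supertokens; token 0 straddles classes 0 and 1.
token_ids = [0, 0, 0, 0, 1, 1, 1, 1]
pixel_labels = [0, 0, 0, 1, 2, 2, 2, 2]
targets = soft_labels(token_ids, pixel_labels, num_classes=3)
print(targets)  # token 0 is 75% class 0 and 25% class 1
print(broadcast_prediction(token_ids, targets))
```

Because every pixel inherits its token's class, the output is regionally consistent by construction. The cost is that the minority pixel in token 0 is necessarily mislabelled, which is why token purity matters.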
Where Pith is reading between the lines
- The token-level soft-label scheme could be adapted to standard semantic segmentation tasks where pixels often straddle class boundaries.
- Density-isolation center selection might generalize to other clustering-based image partitioning problems that suffer from scale variation.
- The explicit decoupling of stages offers a template for improving spatial coherence in video or multi-temporal remote-sensing classification pipelines.
- Real-time remote-sensing systems could incorporate the dual-stage design to reduce per-frame compute while preserving edge accuracy.
Load-bearing premise
The multi-criteria feature distance plus locality-aware assignment regularization will reliably produce boundary-preserving supertokens whose soft-label proportions accurately reflect mixed land-cover content without introducing new classification errors.
What would settle it
Quantitative boundary-alignment evaluation or visual inspection on the WHU-OHS dataset showing that a substantial fraction of supertoken edges cross verified land-cover transitions, which would increase per-pixel classification errors relative to pixel-wise baselines.
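One way to run such a check is sketched below, assuming flattened arrays of supertoken assignments and ground-truth labels are available. It reports the fraction of supertokens that contain more than one class and the best per-pixel accuracy that any single hard label per token could reach. The function `token_purity_report` and its output keys are hypothetical. This is a diagnostic sketch, not the paper's evaluation protocol.

```python
from collections import Counter, defaultdict

def token_purity_report(token_ids, pixel_labels):
    """Summarize how often supertokens cross ground-truth class boundaries."""
    by_token = defaultdict(Counter)
    for token, label in zip(token_ids, pixel_labels):
        by_token[token][label] += 1
    mixed_tokens = sum(1 for counter in by_token.values() if len(counter) > 1)
    # Pixels that agree with their token's majority class.
    majority_pixels = sum(max(counter.values()) for counter in by_token.values())
    return {
        "mixed_token_fraction": mixed_tokens / len(by_token),
        "oracle_pixel_accuracy": majority_pixels / len(pixel_labels),
    }

token_ids = [0, 0, 0, 0, 1, 1, 1, 1]
pixel_labels = [0, 0, 0, 1, 2, 2, 2, 2]
report = token_purity_report(token_ids, pixel_labels)
print(report)
```

A low `oracle_pixel_accuracy` on real data would mean that token-level hard predictions cannot match a pixel-wise baseline regardless of how good the classifier is.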
Original abstract
Hyperspectral image classification demands spatially coherent predictions and precise boundary delineation. Yet prevailing superpixel-based methods face an inherent contradiction: clustering aggregates similar pixels into regions, but the subsequent classifier operates pixel-wise, undermining regional consistency. Consequently, existing approaches do not guarantee region-level, boundary-aligned classification. To address this limitation, we propose the Dual-stage Spectrum-Constrained Clustering-based Classifier (DSCC), an end-to-end framework that explicitly decouples clustering from classification by first grouping spectrally similar and spatially proximate pixels into spectral supertokens and then performing token-level prediction. At its core, DSCC computes an image-level multi-criteria feature distance between pixels and centers, followed by a locality-aware assignment regularization, enabling the generation of boundary-preserving spectral supertokens. A density-isolation-based center selection further yields representative, well-separated centers, reducing redundancy and improving robustness to scale variation. To accommodate mixed land-cover compositions within each token, we introduce a soft-label scheme that encodes class proportions and improves robustness for mixed-class tokens. DSCC attains a CF1 of 0.728 at 197.75 FPS on the WHU-OHS dataset, offering a superior accuracy-efficiency trade-off compared with state-of-the-art methods. Extensive experiments further validate the effectiveness and generality of the proposed dual-stage paradigm for hyperspectral image classification. The source code is available at https://github.com/laprf/DSCC.
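The abstract does not spell out the density-isolation rule. A common way to combine density and isolation is density-peak clustering, where each point is scored by its local density multiplied by its distance to the nearest point of higher density. The sketch below follows that assumption. It is a stand-in technique, not the DSCC rule, and `select_centers`, `num_centers`, and `radius` are hypothetical names. The pairwise distance matrix is for illustration only and would not scale to full images.

```python
import math

def select_centers(points, num_centers, radius):
    """Pick indices of points that are both dense and well separated."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Density: number of other points within `radius`.
    density = [
        sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
        for i in range(n)
    ]
    scores = []
    for i in range(n):
        # Isolation: distance to the nearest strictly denser point.
        # Points with no denser neighbour use their largest distance.
        denser = [dist[i][j] for j in range(n) if density[j] > density[i]]
        isolation = min(denser) if denser else max(dist[i])
        scores.append(density[i] * isolation)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_centers])

# Two compact groups in feature space, each with four corners and a middle.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05),
]
chosen = select_centers(points, num_centers=2, radius=0.12)
print(chosen)
```

Here the middle point of each group has the highest density and no denser neighbour, so the two selected centers fall in different groups rather than both landing in one. Corner points score low because a denser point sits right next to them.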
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Dual-stage Spectrum-Constrained Clustering-based Classifier (DSCC) for hyperspectral image classification. It decouples clustering from classification by first forming boundary-preserving spectral supertokens via an image-level multi-criteria feature distance, locality-aware assignment regularization, and density-isolation center selection, then performing token-level prediction with a soft-label scheme that encodes class proportions for mixed land-cover pixels. The central empirical claim is a CF1 of 0.728 at 197.75 FPS on the WHU-OHS dataset, with a superior accuracy-efficiency trade-off versus state-of-the-art methods, supported by ablation studies on each component and direct comparisons under consistent protocols.
Significance. If the reported performance holds, the work is significant because it resolves the regional-consistency contradiction inherent in prior superpixel pipelines by enforcing an explicit dual-stage separation and soft-label encoding. The efficiency (nearly 200 FPS) and open-source code are practical strengths that could influence real-time remote-sensing applications. The ablations and reproducible implementation provide a solid basis for follow-on research.
minor comments (3)
- [Abstract] The performance figures (CF1 and FPS) are stated without reference to hardware platform or batch size; while the full experimental section supplies these details, a brief qualifier in the abstract would improve immediate readability.
- [§2] The contrast between spectral supertokens and conventional superpixels is conceptually clear, yet a short quantitative comparison (e.g., average token size or boundary F-score) in the related-work discussion would sharpen the novelty statement.
- [§4.2] The density-isolation center selection is described algorithmically, but the sensitivity of the isolation threshold to image resolution is not tabulated; adding a one-line sensitivity plot would strengthen the robustness claim.
Simulated Author's Rebuttal
We thank the referee for the positive review, the recognition of the dual-stage paradigm's resolution of regional-consistency issues, and the recommendation to accept. We appreciate the note on practical strengths and reproducibility.
Circularity Check
No significant circularity; empirical performance claim
The paper proposes an algorithmic framework (DSCC) with multi-criteria distance, locality-aware regularization, density-isolation center selection, and soft-label encoding, then reports empirical results (CF1 0.728 at 197.75 FPS on WHU-OHS) plus ablations and comparisons. No equations, fitted parameters, or predictions are defined in terms of themselves; the central claim is a measured trade-off under consistent protocols rather than a self-referential derivation. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz that would force the result.
Axiom & Free-Parameter Ledger
invented entities (1)
- spectral supertokens: no independent evidence
Source paper: Peifu Liu et al., preprint submitted to Elsevier.