H-SPAM: Hierarchical Superpixel Anything Model
Pith reviewed 2026-05-10 15:36 UTC · model grok-4.3
The pith
H-SPAM builds perfectly nested hierarchical superpixels from fine partitions by merging regions in two phases guided by deep features and object priors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
H-SPAM is a unified framework for accurate, regular, and perfectly nested hierarchical superpixels. It begins from a fine partition and applies a two-phase region merging process guided by deep features and external object priors: the first phase preserves object consistency while the second allows controlled inter-object grouping. The hierarchy can be modulated using visual attention maps or user input to preserve important regions longer. Experiments on standard benchmarks show that H-SPAM strongly outperforms existing hierarchical methods in both accuracy and regularity, while performing on par with most recent state-of-the-art non-hierarchical methods.
What carries the argument
The two-phase region merging process that starts from a fine partition and uses deep features plus object priors to enforce exact nesting while controlling when inter-object merges occur.
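To make the two-phase idea concrete, here is a minimal sketch on a toy region graph. It is not the paper's implementation: the function name `two_phase_merge`, the scalar features (standing in for deep features), the `obj` labels (standing in for external object priors), and the absolute-difference cost are all assumptions chosen for illustration.

```python
def two_phase_merge(features, obj, edges, target):
    """Greedy two-phase merging on a toy region graph (illustrative only).

    features: dict region_id -> scalar feature (stand-in for deep features)
    obj:      dict region_id -> object label (stand-in for object priors)
    edges:    list of (a, b) adjacency pairs
    target:   number of regions at which to stop
    Returns the merge sequence as (kept, absorbed, phase) tuples.
    """
    parent = {r: r for r in features}
    size = {r: 1 for r in features}
    feat = dict(features)

    def find(r):
        # Union-find root lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    merges = []
    n = len(features)
    for phase in (1, 2):
        while n > target:
            best = None
            for a, b in edges:
                ra, rb = find(a), find(b)
                if ra == rb:
                    continue
                # Phase 1 only admits merges inside a single object.
                if phase == 1 and obj[ra] != obj[rb]:
                    continue
                cost = abs(feat[ra] - feat[rb])
                if best is None or cost < best[0]:
                    best = (cost, ra, rb)
            if best is None:
                break  # no admissible merge left in this phase
            _, ra, rb = best
            # Size-weighted mean feature for the merged region.
            total = size[ra] + size[rb]
            feat[ra] = (feat[ra] * size[ra] + feat[rb] * size[rb]) / total
            size[ra] = total
            parent[rb] = ra
            merges.append((ra, rb, phase))
            n -= 1
    return merges


# Four regions in a chain; regions 1 and 2 look alike but belong to different objects.
features = {0: 0.0, 1: 0.1, 2: 0.15, 3: 1.0}
obj = {0: "A", 1: "A", 2: "B", 3: "B"}
edges = [(0, 1), (1, 2), (2, 3)]
merges = two_phase_merge(features, obj, edges, target=1)
```

In this toy case the cheapest edge overall is (1, 2), which crosses the object boundary. The phase-1 constraint skips it, so both objects are completed first and the only cross-object merge happens in phase 2.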
If this is right
- The produced hierarchies can be plugged directly into multi-scale vision tasks without extra nesting enforcement steps.
- Object consistency is maintained longer because the first merge phase prioritizes it before allowing cross-object grouping.
- Attention maps or user input can delay the merging of important regions during coarsening, so those regions survive to coarser levels.
- Accuracy parity with flat state-of-the-art methods removes the usual quality penalty for gaining hierarchy.
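One way to picture the attention-based modulation mentioned in the list above is as a penalty on merge costs in salient areas. The sketch below is a guess at the general mechanism, not the paper's formula: `modulated_cost`, the `strength` parameter, and the candidate values are hypothetical.

```python
def modulated_cost(base_cost, att_a, att_b, strength=4.0):
    """Inflate a merge cost when either region is salient (attention in [0, 1])."""
    return base_cost * (1.0 + strength * max(att_a, att_b))


# Candidate merges: (name, base cost, attention of region a, attention of region b).
candidates = [
    ("face-hair", 0.10, 0.9, 0.8),  # cheap, but inside a salient area
    ("sky-cloud", 0.20, 0.1, 0.0),  # costlier, but in the background
]

plain_order = [c[0] for c in sorted(candidates, key=lambda c: c[1])]
modulated_order = [
    c[0] for c in sorted(candidates, key=lambda c: modulated_cost(c[1], c[2], c[3]))
]
```

Without modulation the salient pair merges first; with it, the background pair merges first and the salient regions stay separate for longer in the hierarchy.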
Where Pith is reading between the lines
- The same merging logic could be tested on video sequences by propagating object priors across frames to create spatio-temporal hierarchies.
- Embedding H-SPAM outputs into end-to-end networks might let models learn to operate on variable-scale partitions rather than fixed grids.
- The regularity gains at coarse levels may improve downstream efficiency in tasks like object tracking or compression where compact region descriptions matter.
Load-bearing premise
The two-phase merging process guided by deep features and object priors will reliably produce coarser levels whose regions are exact unions of finer-level regions, without introducing boundary errors or dropping accuracy.
What would settle it
Apply H-SPAM to a standard superpixel benchmark set and measure whether any coarser hierarchy level shows non-nested boundaries or segmentation accuracy below a matched non-hierarchical method on the same metrics.
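The nesting half of this test is easy to automate. The sketch below assumes two label maps of the same image given as 2D lists of integer labels; `is_nested` is a hypothetical helper, not part of the released code.

```python
def is_nested(fine, coarse):
    """True if every fine region lies inside exactly one coarse region.

    fine, coarse: 2D lists of integer labels with identical shape.
    """
    owner = {}  # fine label -> coarse label where it was first seen
    for row_f, row_c in zip(fine, coarse):
        for f, c in zip(row_f, row_c):
            if owner.setdefault(f, c) != c:
                return False
    return True


fine = [[0, 0, 1, 1],
        [2, 2, 3, 3]]
coarse_ok = [[0, 0, 0, 0],
             [1, 1, 1, 1]]
coarse_bad = [[0, 0, 0, 1],   # fine region 1 is split across coarse regions 0 and 1
              [1, 1, 1, 1]]

ok = is_nested(fine, coarse_ok)
bad = is_nested(fine, coarse_bad)
```

Running this check on every pair of consecutive levels would expose any non-nested boundary; the accuracy half of the test still requires the benchmark's own metrics.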
Original abstract
Superpixels offer a compact image representation by grouping pixels into coherent regions. Recent methods have reached a plateau in terms of segmentation accuracy by generating noisy superpixel shapes. Moreover, most existing approaches produce a single fixed-scale partition that limits their use in vision pipelines that would benefit from multi-scale representations. In this work, we introduce H-SPAM (Hierarchical Superpixel Anything Model), a unified framework for generating accurate, regular, and perfectly nested hierarchical superpixels. Starting from a fine partition, guided by deep features and external object priors, H-SPAM constructs the hierarchy through a two-phase region merging process that first preserves object consistency and then allows controlled inter-object grouping. The hierarchy can also be modulated using visual attention maps or user input to preserve important regions longer in the hierarchy. Experiments on standard benchmarks show that H-SPAM strongly outperforms existing hierarchical methods in both accuracy and regularity, while performing on par with most recent state-of-the-art non-hierarchical methods. Code and pretrained models are available: https://github.com/waldo-j/hspam.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces H-SPAM, a unified framework for generating accurate, regular, and perfectly nested hierarchical superpixels. It starts from a fine partition (informed by SAM-style methods), then applies a two-phase region merging process guided by deep features and external object priors: first preserving intra-object consistency, then allowing controlled inter-object grouping. The hierarchy can be modulated via attention maps or user input. Experiments on standard benchmarks are reported to show strong outperformance over existing hierarchical superpixel methods in both accuracy and regularity, while matching recent non-hierarchical state-of-the-art approaches. Code and pretrained models are released.
Significance. If the central performance claims hold after addressing the isolation of the two-phase contribution, H-SPAM would meaningfully advance hierarchical superpixel generation for multi-scale vision pipelines, addressing the plateau in accuracy and lack of nesting in prior work. The public release of code and models is a clear strength that supports reproducibility. The result would be of interest to the computer vision community working on segmentation and region-based representations.
major comments (2)
- [Experimental section] Experimental section: The manuscript does not report an ablation that applies the identical SAM-derived fine partition and deep feature affinities to a conventional single-stage merging procedure. Without this control, it remains unclear whether the reported gains in accuracy and regularity over prior hierarchical baselines are attributable to the two-phase construction (intra-object then inter-object grouping) or to the modern foundation-model inputs unavailable to the cited earlier methods. This directly affects the load-bearing status of the headline algorithmic claim.
- [§5] §5 (results on nesting and boundary fidelity): The claim of 'perfectly nested hierarchies' without boundary errors at coarser levels is central but lacks quantitative verification (e.g., a metric tracking nesting violations or boundary drift across hierarchy levels). The two-phase description does not specify how inter-object grouping is constrained to guarantee nesting while preserving accuracy.
minor comments (2)
- [Abstract] The abstract and introduction use 'external object priors' without a concise upfront definition or reference to how they are obtained; a single clarifying sentence would improve accessibility.
- Figure captions in the experimental section could more explicitly label which rows/columns correspond to different hierarchy levels and which metrics are being visualized to aid quick interpretation.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help improve the clarity and rigor of our work. We address each major comment point-by-point below, and plan to incorporate revisions to strengthen the manuscript.
- Referee: Experimental section: The manuscript does not report an ablation that applies the identical SAM-derived fine partition and deep feature affinities to a conventional single-stage merging procedure. Without this control, it remains unclear whether the reported gains in accuracy and regularity over prior hierarchical baselines are attributable to the two-phase construction (intra-object then inter-object grouping) or to the modern foundation-model inputs unavailable to the cited earlier methods. This directly affects the load-bearing status of the headline algorithmic claim.
  Authors: We acknowledge that an explicit ablation isolating the effect of the two-phase merging strategy from the use of modern foundation model features is valuable for substantiating our algorithmic contribution. In the revised version, we will add such an ablation study. Specifically, we will compare our two-phase approach against a baseline single-stage merging procedure that uses the exact same SAM-derived fine partition, deep feature affinities, and object priors. This will demonstrate that the performance improvements stem from the hierarchical two-phase design rather than solely from the input features. We believe this addition will address the referee's concern directly.
  Revision: yes
- Referee: §5 (results on nesting and boundary fidelity): The claim of 'perfectly nested hierarchies' without boundary errors at coarser levels is central but lacks quantitative verification (e.g., a metric tracking nesting violations or boundary drift across hierarchy levels). The two-phase description does not specify how inter-object grouping is constrained to guarantee nesting while preserving accuracy.
  Authors: The 'perfectly nested' property is guaranteed by the region merging process, as coarser levels are formed exclusively by union of regions from finer levels without any boundary modifications or splits. To provide quantitative support, we will introduce a nesting violation metric in the revised §5, which counts the number of boundary inconsistencies across levels (expected to be zero by construction). We will also expand the method description to detail the constraints in the inter-object grouping phase: merging is restricted to adjacent regions with similar deep features and guided by attention maps to avoid crossing object boundaries unnecessarily, thus preserving accuracy. This clarification will be added to ensure the guarantee is explicit.
  Revision: yes
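The "zero by construction" argument in the response above can be illustrated with a small sketch. It assumes the hierarchy is stored as an ordered list of `(kept, absorbed)` merges over the finest regions; `labels_after` and `nesting_violations` are hypothetical names, and the counting rule is only one plausible reading of the proposed metric.

```python
def labels_after(n_regions, merges, k):
    """Label of each finest region after applying the first k merges."""
    parent = list(range(n_regions))

    def find(r):
        while parent[r] != r:
            r = parent[r]
        return r

    for kept, absorbed in merges[:k]:
        parent[find(absorbed)] = find(kept)
    return [find(r) for r in range(n_regions)]


def nesting_violations(finer, coarser):
    """Count finer-level regions whose members fall in more than one coarser region."""
    seen = {}
    for f, c in zip(finer, coarser):
        seen.setdefault(f, set()).add(c)
    return sum(1 for cs in seen.values() if len(cs) > 1)


merges = [(0, 1), (2, 3), (0, 2)]  # toy merge sequence over 4 finest regions
levels = [labels_after(4, merges, k) for k in range(len(merges) + 1)]
violations = [
    nesting_violations(levels[i], levels[i + 1]) for i in range(len(merges))
]

# A hand-made, non-nested pair of levels for contrast.
broken = nesting_violations([0, 0, 1, 1], [0, 1, 1, 1])
```

Because every level is obtained by replaying a prefix of the same merge sequence, each coarser region is a union of finer regions and the count stays at zero. A nonzero count can only appear if boundaries are edited after merging, as in the hand-made `broken` case.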
Circularity Check
No circularity: algorithmic construction independent of fitted inputs or self-referential definitions
full rationale
The paper describes H-SPAM as an explicit algorithmic procedure: initialize from a fine partition (e.g., SAM-derived), then apply a two-phase region-merging process guided by deep features and external priors, with optional modulation by attention maps. No equations, parameters, or predictions are shown to be fitted to the target hierarchy and then re-used as outputs. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked to justify the core construction. The hierarchy is produced by the stated merging rules rather than being presupposed in the inputs. Experimental claims rest on benchmark comparisons, not on any definitional equivalence.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE TPAMI 34, 2274–2282 (2012)
- [2] Achanta, R., Süsstrunk, S.: Superpixels and polygons using simple non-iterative clustering. In: CVPR (2017)
- [3] Belém, F., Kochem, F., Patrocínio, Z., Perret, B., Cousty, J., Falcão, A., Guimarães, S.J.F.: Measuring hierarchiness of image segmentations. In: SIBGRAPI (2024)
- [4] Berg, S., Kutra, D., Kroeger, T., Straehle, C.N., Kausler, B.X., Haubold, C., Schiegg, M., Ales, J., Beier, T., Rudy, M., et al.: Ilastik: Interactive machine learning for (bio) image analysis. Nature methods 16(12), 1226–1232 (2019)
- [5] Brasó, G., Ošep, A., Leal-Taixé, L.: Native segmentation vision transformers (2025)
[Fig. 11: Qualitative comparison between hierarchical methods. Panels: Image, SH [31], RISF [11], CRTREES [34], Groundtruth, HHTS [8], SIT-HSS [32], H-SPAM. Number of superpixels from left to ri...]
- [6] Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: CVPR (2021)
- [7] Chang, J., Wei, D., Fisher, J.W.: A video representation using temporal superpixels. In: CVPR (2013)
- [8] Chang, T.V., Seibt, S., von Rymon Lipinski, B.: Hierarchical histogram threshold segmentation-auto-terminating high-detail oversegmentation. In: CVPR (2024)
- [9] Chen, J., Li, Z., Huang, B.: Linear spectral clustering superpixel. IEEE TIP 26, 3317–3330 (2017)
- [10] Chen, Z., Wang, C., Guo, Y.C., Zhang, S.H.: Structnerf: Neural radiance fields for indoor scenes with structural hints. IEEE TPAMI 45(12), 15694–15705 (2023)
- [11] Galvão, F.L., Guimarães, S.J.F., Falcão, A.X.: Image segmentation using dense and sparse hierarchies of superpixels. Pattern Recognition 108, 107532 (2020)
- [12] Giraud, R., Clément, M.: Superpixel segmentation: A long-lasting ill-posed problem. arXiv:2411.06478 (2024)
- [13] Giraud, R., Ta, V.T., Papadakis, N.: Evaluation framework of superpixel methods with a global regularity measure. JEI 26(6) (2017)
- [14] Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and semantically consistent regions. In: ICCV (2009)
- [15] Gould, S., Rodgers, J., Cohen, D., Elidan, G., Koller, D.: Multi-class segmentation with relative location prior. IJCV 80(3), 300–316 (2008)
- [16] Jampani, V., Sun, D., Liu, M.Y., Yang, M.H., Kautz, J.: Superpixel sampling networks. In: ECCV (2018)
- [17] Ke, T.W., Mo, S., Stella, X.Y.: Learning hierarchical image segmentation for recognition and by recognition. In: ICLR (2024)
- [18] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W., Dollár, P., Girshick, R.B.: Segment anything. In: ICCV (2023)
- [19] Li, J., Zhao, X., Wang, J., Wang, C., Wang, M.: Superpixel-informed implicit neural representation for multi-dimensional data. In: ECCV (2024)
- [20] Liang, J., Wei, G.: HieraASGSegNet: hierarchical context fusion for semantic segmentation via adaptive superpixel graph reasoning. IEEE Access 13, 186449–186464 (2025)
- [21] Machairas, V., Faessel, M., Cárdenas-Peña, D., Chabardes, T., Walter, T., Decencière, E.: Waterpixels. IEEE TIP 24(11), 3707–3716 (2015)
- [22] Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: ICCV (2001)
- [23] Mei, J., Chen, L.C., Yuille, A., Xie, C.: SPFormer: Enhancing vision transformer with superpixel representation. TMLR (2025)
- [24] Peng, H., Aviles-Rivero, A.I., Schönlieb, C.B.: Hers superpixels: Deep affinity learning for hierarchical entropy rate segmentation (2021)
- [25] Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: ECCV (2012)
- [26] Stutz, D., Hermans, A., Leibe, B.: Superpixels: An evaluation of the state-of-the-art. CVIU 166, 1–27 (2018)
- [27] Tu, W.C., Liu, M.Y., Jampani, V., Sun, D., Chien, S.Y., Yang, M.H., Kautz, J.: Learning superpixels with segmentation-aware affinity loss. In: CVPR (2018)
- [28] Van den Bergh, M., Boix, X., Roig, G., de Capitani, B., Van Gool, L.: SEEDS: Superpixels extracted via energy-driven sampling. In: ECCV (2012)
- [29] Walther, J., Giraud, R., Clément, M.: Superpixel anything: A general object-based framework for accurate yet regular superpixel segmentation. In: BMVC (2025)
- [30] Wang, Y., Wei, Y., Qian, X., Zhu, L., Yang, Y.: AINet: Association implantation for superpixel segmentation. In: ICCV (2021)
- [31] Wei, X., Yang, Q., Gong, Y., Ahuja, N., Yang, M.H.: Superpixel hierarchy. IEEE TIP 27(10), 4838–4849 (2018)
- [32] Xie, M., Peng, H., Li, P., Zeng, G., Wang, S., Wu, J., Li, P., Yu, P.S.: Hierarchical superpixel segmentation via structural information theory. In: SIAM International Conference on Data Mining (SDM) (2025)
- [33] Xu, S., Wei, S., Ruan, T., Liao, L.: Learning invariant inter-pixel correlations for superpixel generation. In: AAAI (2024)
- [34] Yan, T., Huang, X., Zhao, Q.: Hierarchical superpixel segmentation by parallel crtrees labeling. IEEE TIP 31, 4719–4732 (2022)
- [35] Yang, F., Sun, Q., Jin, H., Zhou, Z.: Superpixel segmentation with fully convolutional networks. In: CVPR (2020)
- [36] Zhao, X., Ding, W., An, Y., Du, Y., Yu, T., Li, M., Tang, M., Wang, J.: Fast segment anything. arXiv:2306.12156 (2023)
- [37] Zhou, P., Kang, X., Ming, A.: Vine spread for superpixel segmentation. IEEE TIP 32, 878–891 (2023)
H-SPAM: Hierarchical Superpixel Anything Model (Supplementary Material). Julien Walther (1,2), Rémi Giraud (1), and Michaël Clément (2). (1) Univ. Bordeaux, CNRS, Bordeaux INP, IMS, UMR 5218, France. (2) Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, France.
1 Impac...