H-SPAM: Hierarchical Superpixel Anything Model

arxiv: 2604.11218 · v1 · submitted 2026-04-13 · 💻 cs.CV


Julien Walther, Rémi Giraud, Michaël Clément

Pith reviewed 2026-05-10 15:36 UTC · model grok-4.3

classification 💻 cs.CV

keywords superpixels, hierarchical segmentation, region merging, image partitioning, deep features, object priors, nested hierarchies, multi-scale representation


The pith

H-SPAM builds perfectly nested hierarchical superpixels from fine partitions by merging regions in two phases guided by deep features and object priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework for building multi-scale image partitions in which every superpixel at a coarser level is an exact union of superpixels from the finer level. Most superpixel methods stop at a single fixed scale and often produce irregular shapes, which limits their use in pipelines that need detail at several resolutions. H-SPAM starts from an over-segmented fine partition and merges regions in two phases: first only within objects, to keep them intact, then across objects in a limited, controlled way. This produces a full hierarchy that stays accurate and regular at every level. The approach also accepts attention maps or user clicks that keep key areas from being merged away until later in the hierarchy.

Core claim

H-SPAM is a unified framework for accurate, regular, and perfectly nested hierarchical superpixels. It begins from a fine partition and applies a two-phase region merging process guided by deep features and external object priors: the first phase preserves object consistency while the second allows controlled inter-object grouping. The hierarchy can be modulated using visual attention maps or user input to preserve important regions longer. Experiments on standard benchmarks show that H-SPAM strongly outperforms existing hierarchical methods in both accuracy and regularity, while performing on par with most recent state-of-the-art non-hierarchical methods.

What carries the argument

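The two-phase region merging process: starting from a fine partition, it builds every coarser level only by fusing adjacent regions, which makes exact nesting hold by construction, while deep features and object priors decide the merge order and control when inter-object merges may occur.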
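To make the two phases concrete, here is a minimal sketch of greedy region merging on a region adjacency graph. It is not the H-SPAM implementation: the merge criterion (Euclidean distance between size-weighted mean features), the greedy best-pair strategy, and the function names `two_phase_merge` and `labels_at` are illustrative assumptions. Phase 1 only considers edges whose two endpoints carry the same prior object label; phase 2 considers every remaining edge. Because each step fuses two existing regions, cutting the merge sequence at any K yields a partition nested inside every finer cut.

```python
import numpy as np


def two_phase_merge(features, edges, objects, two_phase=True):
    """Greedy two-phase region merging on a region adjacency graph (sketch).

    features: (N, D) array-like, one mean deep feature per fine superpixel.
    edges: iterable of (u, v) pairs of adjacent fine superpixels.
    objects: length-N sequence giving the prior object id of each superpixel.
    two_phase: if True, only intra-object merges are allowed until none remain,
        then any adjacent pair may merge; if False, every adjacent pair is
        eligible from the start (a single-stage baseline).

    Returns the merge sequence as (absorbed, kept) region ids.
    """
    feat = {i: np.asarray(f, dtype=float) for i, f in enumerate(features)}
    size = {i: 1 for i in feat}
    obj = dict(enumerate(objects))
    adj = {i: set() for i in feat}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    merges = []
    for intra_only in ([True, False] if two_phase else [False]):
        while True:
            # Find the most similar adjacent pair allowed in this phase.
            best = None
            for u in adj:
                for v in adj[u]:
                    if u < v and (not intra_only or obj[u] == obj[v]):
                        d = float(np.linalg.norm(feat[u] - feat[v]))
                        if best is None or d < best[0]:
                            best = (d, u, v)
            if best is None:
                break  # no eligible edge left: move on to the next phase
            _, u, v = best
            # Region v is absorbed into region u: size-weighted mean feature.
            su, sv = size.pop(u), size.pop(v)
            feat[u] = (su * feat[u] + sv * feat.pop(v)) / (su + sv)
            size[u] = su + sv
            # Rewire v's neighbours to u (this also removes v from adj[u]).
            for w in adj.pop(v):
                adj[w].discard(v)
                if w != u:
                    adj[w].add(u)
                    adj[u].add(w)
            merges.append((v, u))
    return merges


def labels_at(merges, n, k):
    """Region id of each of the n fine superpixels once k regions remain."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for absorbed, kept in merges[: max(n - k, 0)]:
        parent[find(absorbed)] = find(kept)
    return [find(i) for i in range(n)]
```

On a chain of four superpixels with features 0.0, 0.1, 0.3, 1.0 and objects `[0, 0, 1, 1]`, cutting at K = 2 gives `[0, 0, 2, 2]` in the two-phase run (both objects intact), whereas the single-stage run (`two_phase=False`, the kind of control the referee asks for below) gives `[0, 0, 0, 3]`, having already merged superpixel 2 across the object boundary.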

If this is right

The produced hierarchies can be plugged directly into multi-scale vision tasks without extra nesting enforcement steps.
Object consistency is maintained longer because the first merge phase prioritizes it before allowing cross-object grouping.
Attention maps or user clicks let users delay the disappearance of important regions during coarsening.
Accuracy parity with flat state-of-the-art methods removes the usual quality penalty for gaining hierarchy.
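The attention bullet can be illustrated with a hypothetical additive penalty: each candidate merge cost is increased by `w_att` times the mean attention of the two regions, so high-attention regions are merged later. The additive form and the names `attention_cost` and `merge_order` are assumptions; the supplementary panels mention a weight w_att, but the exact formula is not reproduced in the material here.

```python
import numpy as np


def attention_cost(d, att_u, att_v, w_att=0.5):
    """Hypothetical attention-modulated merge cost (additive penalty).

    d: feature distance between two adjacent regions.
    att_u, att_v: mean attention of each region, assumed to lie in [0, 1].
    w_att: strength of the penalty; 0 disables attention entirely.
    """
    return d + w_att * 0.5 * (att_u + att_v)


def merge_order(pairs, distances, attention, w_att=0.5):
    """Return candidate pairs sorted from first to last merge."""
    costs = [
        attention_cost(d, attention[u], attention[v], w_att)
        for (u, v), d in zip(pairs, distances)
    ]
    # Stable sort keeps the input order for equal costs.
    return [pairs[i] for i in np.argsort(costs, kind="stable")]
```

With `attention = [0, 0, 1, 1]`, the pair (2, 3) would merge first on features alone (distance 0.1 versus 0.2), but it is postponed once `w_att = 0.5`.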

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same merging logic could be tested on video sequences by propagating object priors across frames to create spatio-temporal hierarchies.
Embedding H-SPAM outputs into end-to-end networks might let models learn to operate on variable-scale partitions rather than fixed grids.
The regularity gains at coarse levels may improve downstream efficiency in tasks like object tracking or compression where compact region descriptions matter.

Load-bearing premise

The two-phase merging process guided by deep features and object priors will reliably keep every coarser level perfectly nested inside the finer ones without introducing boundary errors or dropping accuracy.

What would settle it

Apply H-SPAM to a standard superpixel benchmark set and measure whether any coarser hierarchy level shows non-nested boundaries or segmentation accuracy below a matched non-hierarchical method on the same metrics.
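The nesting half of this test needs only the label maps: every fine region must fall inside exactly one coarse region. The sketch below, with hypothetical function names, counts the fine regions that violate this; accuracy against a flat baseline would be measured separately with the benchmark's usual metrics. Because nesting is transitive, checking consecutive levels is enough.

```python
import numpy as np


def count_non_nested(fine, coarse):
    """Count fine regions whose pixels fall into more than one coarse region.

    fine, coarse: integer label maps of the same shape. A result of 0 means
    every fine region lies inside a single coarse region (exact nesting).
    """
    fine = np.asarray(fine).ravel()
    coarse = np.asarray(coarse).ravel()
    # Distinct (fine label, coarse label) pairs actually present in the image.
    pairs = np.unique(np.stack([fine, coarse], axis=1), axis=0)
    # A nested fine label appears in exactly one pair.
    _, counts = np.unique(pairs[:, 0], return_counts=True)
    return int(np.sum(counts > 1))


def is_hierarchy(levels):
    """Check a list of label maps ordered from finest to coarsest."""
    return all(count_non_nested(f, c) == 0 for f, c in zip(levels, levels[1:]))
```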

Figures

Figures reproduced from arXiv: 2604.11218 by Julien Walther, Michaël Clément, Rémi Giraud.

**Figure 1.** Hierarchical superpixel segmentation example. The proposed H-SPAM method generates very regular and easily interpretable superpixels that remain aligned with the image objects throughout the hierarchy, compared to HHTS [8].

Hierarchical superpixels. While progress has been made in accuracy and regularity, most existing methods operate at a single scale. Hierarchical decompositions solve this by enforcing nested…

**Figure 2.** Global framework of the H-SPAM method. An object-based superpixel segmentation provides a fine superpixel map, along with a high-level object map and features that are used to create a hierarchy that respects the image objects throughout the merging process. Visual attention can also be used to concentrate or delay merges in specific areas.

… and regularity. This is achieved through a two-phase process th…

**Figure 3.** Object-based superpixel region adjacency graph. (b) Superposition of the prior object map, the initial superpixels, and the edges between the superpixels. Red (–) and green (–) connections indicate, respectively, that the two superpixels belong to different objects or to the same object.

… corresponds to a graph G_s. The construction forms a true hierarchy if, for any two levels i < j, the boundaries represented at leve…

**Figure 4.** Object-based hierarchy creation. Our merging process combines two distinct phases corresponding to the processing of high-level prior objects: the first only allows merges inside objects, while the second merges objects.

Phase 1: Intra-object merges. Let Θ : V → {1, …, M} denote the function assigning each superpixel u ∈ V to its object. During this first phase, only regions that share the same ob…

**Figure 5.** Illustration of the attention modes. With our object-based framework (middle row), the attention can be averaged within objects to provide a cleaner guide for the merging process. The bottom row shows the user-interactive mode, where red/white crosses lead to fewer/more superpixels in an object, offering a complementary alternative to our multi-scale segmentation method. With standard hierarchical cluste…
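Averaging attention within objects, as this caption describes, amounts to replacing each pixel's attention value by the mean over its prior object. A minimal NumPy sketch, assuming a pixel-wise attention map and an integer object map of the same shape (the function name is hypothetical):

```python
import numpy as np


def average_attention_per_object(attention, objects):
    """Replace each pixel's attention by the mean attention of its object."""
    attention = np.asarray(attention, dtype=float)
    objects = np.asarray(objects).ravel()
    # Map object ids to 0..M-1 so they can index bincount results.
    _, inverse = np.unique(objects, return_inverse=True)
    sums = np.bincount(inverse, weights=attention.ravel())
    counts = np.bincount(inverse)
    return (sums / counts)[inverse].reshape(attention.shape)
```

Regions inside one object then share a single attention level, so a noisy attention peak on part of an object no longer makes only that part merge late.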

**Figure 6.** Ablation study: influence of parameters. (a) Importance of the prior mask: using object priors to guide the hierarchy largely improves segmentation accuracy. (b) Influence of w_pos on accuracy and regularity.

3.1 Validation Framework. Datasets. We evaluate on standard superpixel benchmarks that provide precise, manual, semantic-agnostic object annotations: BSD [22], NYUv2 [25] and SBD [14] with resp…

**Figure 7.** Influence of the spatial parameter w_pos for K = 500 superpixels. This parameter controls the shape consistency of the decomposition: a high w_pos forces the smallest shapes to merge. By default, we use w_pos = 5.

Finally, we also evaluate our hierarchy creation method on fine superpixel maps from CDS [33]. These maps obtain accuracy similar to those of SPAM [29] for high K, but are not constrained by object p…

**Figure 8.** Quantitative comparison of H-SPAM to state-of-the-art hierarchical superpixel methods. H-SPAM is the most accurate and also the most regular hierarchical method on the three datasets.

[Plot: Achievable Segmentation Accuracy (ASA) versus number of superpixels (K); legend: SLIC, LSC, SNIC, VSSS, SSN, SEAL, AINET, CDS, SPAM, H-SPAM.]
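The quantitative comparisons report Achievable Segmentation Accuracy (ASA), a standard superpixel metric: each superpixel is assigned to the ground-truth segment it overlaps most, and ASA is the fraction of pixels covered correctly by that assignment, ASA = (1/N) Σ_k max_j |S_k ∩ G_j|. The sketch below follows that usual definition; it is not the authors' evaluation code, and benchmark toolkits may differ in details such as the handling of unlabeled pixels.

```python
import numpy as np


def achievable_segmentation_accuracy(superpixels, groundtruth):
    """ASA = (1/N) * sum over superpixels of the largest overlap with one GT segment."""
    sp = np.asarray(superpixels).ravel()
    gt = np.asarray(groundtruth).ravel()
    # Relabel both maps to 0..n-1 so they can index a contingency table.
    _, sp_idx = np.unique(sp, return_inverse=True)
    _, gt_idx = np.unique(gt, return_inverse=True)
    overlap = np.zeros((sp_idx.max() + 1, gt_idx.max() + 1), dtype=np.int64)
    np.add.at(overlap, (sp_idx, gt_idx), 1)
    # Each superpixel is credited with its best-matching ground-truth segment.
    return float(overlap.max(axis=1).sum() / sp.size)
```

ASA can only stay equal or decrease when two superpixels are merged, since the best overlap of a union never exceeds the sum of the parts' best overlaps; this is why methods are compared at matched K.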

**Figure 9.** Quantitative comparison of H-SPAM to state-of-the-art non-hierarchical superpixel methods on BSD. Although constrained to be hierarchical, H-SPAM is one of the best-performing methods in both accuracy and regularity, preserving the performance of the single-scale object-based approach [29].

… performance of H-SPAM is almost identical to that of SPAM. Overall, H-SPAM remains the most accurate and regular hiera…

**Figure 10.** Qualitative example of H-SPAM at different scales. From left to right: 1250, 500, 150, 50 superpixels. H-SPAM produces perfectly nested, very regular and easily interpretable regions that align well with object boundaries.

4 Conclusion. In this work, we introduced the Hierarchical Superpixel Anything Model (H-SPAM), a unified method that can produce an accurate, easily interpretable and perfectly nested …

**Figure 11.** Qualitative comparison between hierarchical methods. Number of superpixels from left to right: 1250, 800, 500, 150, 50. H-SPAM produces regular and easily interpretable regions that align well with object boundaries compared to other methods.

**Supplementary Figure 1.** Influence of the number of superpixels: lower N_f performs better at small K, while higher N_f becomes better at intermediate K.

[Panel labels on the same page: w/o objects; w/ objects; w/ objects + clicks; object overlay; attention map; w_att = 0.01; w_att = 0.5.]

**Supplementary Figure 2.** Illustration of the attention modes. With our object-based framework (middle row), the attention can be averaged within objects to provide a cleaner guide for the merging process. The bottom row shows the user-interactive mode, where red/white crosses lead to fewer/more superpixels in the object.

**Supplementary Figure 3.** Qualitative comparison between hierarchical methods. Number of superpixels from left to right: 1250, 800, 500, 150, 50. H-SPAM produces regular and easily interpretable regions that align well with object boundaries compared to other methods.

**Supplementary Figure 4.** Qualitative comparison between hierarchical methods. Number of superpixels from left to right: 1250, 800, 500, 150, 50. H-SPAM produces regular and easily interpretable regions that align well with object boundaries compared to other methods.

**Supplementary Figure 5.** Qualitative example of H-SPAM at different scales. From left to right: 1250, 500, 150, 50 superpixels. H-SPAM produces perfectly nested, very regular and easily interpretable regions that align well with object boundaries compared to other methods.

Original abstract

Superpixels offer a compact image representation by grouping pixels into coherent regions. Recent methods have reached a plateau in terms of segmentation accuracy by generating noisy superpixel shapes. Moreover, most existing approaches produce a single fixed-scale partition that limits their use in vision pipelines that would benefit from multi-scale representations. In this work, we introduce H-SPAM (Hierarchical Superpixel Anything Model), a unified framework for generating accurate, regular, and perfectly nested hierarchical superpixels. Starting from a fine partition, guided by deep features and external object priors, H-SPAM constructs the hierarchy through a two-phase region merging process that first preserves object consistency and then allows controlled inter-object grouping. The hierarchy can also be modulated using visual attention maps or user input to preserve important regions longer in the hierarchy. Experiments on standard benchmarks show that H-SPAM strongly outperforms existing hierarchical methods in both accuracy and regularity, while performing on par with most recent state-of-the-art non-hierarchical methods. Code and pretrained models are available: https://github.com/waldo-j/hspam.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

H-SPAM's two-phase merging produces nested hierarchies from fine partitions, but the reported gains probably trace more to SAM-style inputs and deep features than to the merging logic itself.


The core contribution is a two-phase region merging process that starts from a fine partition, first locks in object consistency using external priors, then permits controlled cross-object merges, with optional attention modulation for preserving key areas longer. This yields perfectly nested multi-scale superpixels that the abstract claims are more accurate and regular than prior hierarchical methods while matching recent flat ones. The authors also release code and models, which is useful for anyone who needs hierarchical regions inside a larger pipeline. The approach is a clear extension of single-scale or non-nested merging ideas, and the object-consistency step and attention option are concrete additions not described in the cited prior work. The experiments on standard benchmarks are presented as showing strong outperformance over hierarchical baselines, which would be practically helpful if the numbers hold.

The main soft spot is that the abstract does not isolate whether the two-phase structure is load-bearing. Older hierarchical baselines lacked access to the same modern deep features or SAM-derived starting partitions, so the edge could come from those inputs rather than from the specific intra-then-inter merging sequence. An ablation feeding the identical fine partition and affinities to a conventional single-stage merger would clarify this, but it is not mentioned. Without that control, the novelty of the merging strategy remains plausible but not fully demonstrated.

This paper is for vision researchers who build systems that benefit from multi-scale nested superpixels, such as segmentation or detection pipelines. A reader working on superpixel methods, or needing ready-made hierarchical output with public code, would get direct value. It has enough concrete claims, released artifacts, and engagement with the literature to deserve serious referee time, even if revisions are needed to strengthen the ablation evidence.

Referee Report

2 major / 2 minor

Summary. The paper introduces H-SPAM, a unified framework for generating accurate, regular, and perfectly nested hierarchical superpixels. It starts from a fine partition (informed by SAM-style methods), then applies a two-phase region merging process guided by deep features and external object priors: first preserving intra-object consistency, then allowing controlled inter-object grouping. The hierarchy can be modulated via attention maps or user input. Experiments on standard benchmarks are reported to show strong outperformance over existing hierarchical superpixel methods in both accuracy and regularity, while matching recent non-hierarchical state-of-the-art approaches. Code and pretrained models are released.

Significance. If the central performance claims hold after addressing the isolation of the two-phase contribution, H-SPAM would meaningfully advance hierarchical superpixel generation for multi-scale vision pipelines, addressing the plateau in accuracy and lack of nesting in prior work. The public release of code and models is a clear strength that supports reproducibility. The result would be of interest to the computer vision community working on segmentation and region-based representations.

major comments (2)

[Experimental section] The manuscript does not report an ablation that applies the identical SAM-derived fine partition and deep feature affinities to a conventional single-stage merging procedure. Without this control, it remains unclear whether the reported gains in accuracy and regularity over prior hierarchical baselines are attributable to the two-phase construction (intra-object then inter-object grouping) or to the modern foundation-model inputs unavailable to the cited earlier methods. This directly affects the load-bearing status of the headline algorithmic claim.
[§5, results on nesting and boundary fidelity] The claim of 'perfectly nested hierarchies' without boundary errors at coarser levels is central but lacks quantitative verification (e.g., a metric tracking nesting violations or boundary drift across hierarchy levels). The two-phase description does not specify how inter-object grouping is constrained to guarantee nesting while preserving accuracy.

minor comments (2)

[Abstract] The abstract and introduction use 'external object priors' without a concise upfront definition or reference to how they are obtained; a single clarifying sentence would improve accessibility.
Figure captions in the experimental section could more explicitly label which rows/columns correspond to different hierarchy levels and which metrics are being visualized to aid quick interpretation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help improve the clarity and rigor of our work. We address each major comment point-by-point below, and plan to incorporate revisions to strengthen the manuscript.


Referee: Experimental section: The manuscript does not report an ablation that applies the identical SAM-derived fine partition and deep feature affinities to a conventional single-stage merging procedure. Without this control, it remains unclear whether the reported gains in accuracy and regularity over prior hierarchical baselines are attributable to the two-phase construction (intra-object then inter-object grouping) or to the modern foundation-model inputs unavailable to the cited earlier methods. This directly affects the load-bearing status of the headline algorithmic claim.

Authors: We acknowledge that an explicit ablation isolating the effect of the two-phase merging strategy from the use of modern foundation model features is valuable for substantiating our algorithmic contribution. In the revised version, we will add such an ablation study. Specifically, we will compare our two-phase approach against a baseline single-stage merging procedure that uses the exact same SAM-derived fine partition, deep feature affinities, and object priors. This will demonstrate that the performance improvements stem from the hierarchical two-phase design rather than solely from the input features. We believe this addition will address the referee's concern directly. revision: yes
Referee: §5 (results on nesting and boundary fidelity): The claim of 'perfectly nested hierarchies' without boundary errors at coarser levels is central but lacks quantitative verification (e.g., a metric tracking nesting violations or boundary drift across hierarchy levels). The two-phase description does not specify how inter-object grouping is constrained to guarantee nesting while preserving accuracy.

Authors: The 'perfectly nested' property is guaranteed by the region merging process, as coarser levels are formed exclusively by unions of regions from finer levels, without any boundary modification or split. To provide quantitative support, we will introduce a nesting violation metric in the revised §5 that counts the number of boundary inconsistencies across levels (expected to be zero by construction). We will also expand the method description to detail the constraints in the inter-object grouping phase: merging is restricted to adjacent regions with similar deep features and is guided by attention maps to avoid crossing object boundaries unnecessarily, thus preserving accuracy. This clarification will make the guarantee explicit. revision: yes
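The nesting-violation count proposed in this response can be defined on inter-pixel edges: an edge that separates two regions at a coarse level must also separate two regions at the finer level. The sketch below is one possible definition, not the authors' metric, and its function names are hypothetical. It returns zero whenever the coarse map is a union of fine regions; conversely, a zero count implies nesting provided each fine region is 4-connected.

```python
import numpy as np


def edge_boundaries(labels):
    """Boolean maps of inter-pixel edges that separate two different labels."""
    labels = np.asarray(labels)
    horizontal = labels[:, :-1] != labels[:, 1:]  # between left/right neighbours
    vertical = labels[:-1, :] != labels[1:, :]    # between top/bottom neighbours
    return horizontal, vertical


def nesting_violations(fine, coarse):
    """Number of coarse-level boundary edges that are not fine-level boundaries."""
    fine_h, fine_v = edge_boundaries(fine)
    coarse_h, coarse_v = edge_boundaries(coarse)
    return int(np.sum(coarse_h & ~fine_h) + np.sum(coarse_v & ~fine_v))
```

Counting edges rather than boundary pixels avoids ambiguity about which side of a contour a one-pixel-wide boundary is drawn on.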

Circularity Check

0 steps flagged

No circularity: algorithmic construction independent of fitted inputs or self-referential definitions


The paper describes H-SPAM as an explicit algorithmic procedure: initialize from a fine partition (e.g., SAM-derived), then apply a two-phase region-merging process guided by deep features and external priors, with optional modulation by attention maps. No equations, parameters, or predictions are shown to be fitted to the target hierarchy and then re-used as outputs. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked to justify the core construction. The hierarchy is produced by the stated merging rules rather than being presupposed in the inputs. Experimental claims rest on benchmark comparisons, not on any definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on standard deep feature extractors and region-merging principles from earlier superpixel literature; no new physical entities or unstated mathematical axioms are introduced in the abstract.

pith-pipeline@v0.9.0 · 5482 in / 1057 out tokens · 37498 ms · 2026-05-10T15:36:02.817746+00:00 · methodology


Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE TPAMI 34, 2274–2282 (2012)

[2] Achanta, R., Süsstrunk, S.: Superpixels and polygons using simple non-iterative clustering. In: CVPR (2017)

[3] Belém, F., Kochem, F., Patrocínio, Z., Perret, B., Cousty, J., Falcão, A., Guimarães, S.J.F.: Measuring hierarchiness of image segmentations. In: SIBGRAPI (2024)

[4] Berg, S., Kutra, D., Kroeger, T., Straehle, C.N., Kausler, B.X., Haubold, C., Schiegg, M., Ales, J., Beier, T., Rudy, M., et al.: Ilastik: Interactive machine learning for (bio)image analysis. Nature Methods 16(12), 1226–1232 (2019)

[5] Brasó, G., Ošep, A., Leal-Taixé, L.: Native segmentation vision transformers (2025)

[6] Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: CVPR (2021)

[7] Chang, J., Wei, D., Fisher, J.W.: A video representation using temporal superpixels. In: CVPR (2013)

[8] Chang, T.V., Seibt, S., von Rymon Lipinski, B.: Hierarchical histogram threshold segmentation - auto-terminating high-detail oversegmentation. In: CVPR (2024)

[9] Chen, J., Li, Z., Huang, B.: Linear spectral clustering superpixel. IEEE TIP 26, 3317–3330 (2017)

[10] Chen, Z., Wang, C., Guo, Y.C., Zhang, S.H.: StructNeRF: Neural radiance fields for indoor scenes with structural hints. IEEE TPAMI 45(12), 15694–15705 (2023)

[11] Galvão, F.L., Guimarães, S.J.F., Falcão, A.X.: Image segmentation using dense and sparse hierarchies of superpixels. Pattern Recognition 108, 107532 (2020)

[12] Giraud, R., Clément, M.: Superpixel segmentation: A long-lasting ill-posed problem. arXiv:2411.06478 (2024)

[13] Giraud, R., Ta, V.T., Papadakis, N.: Evaluation framework of superpixel methods with a global regularity measure. JEI 26(6) (2017)

[14] Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and semantically consistent regions. In: ICCV (2009)

[15] Gould, S., Rodgers, J., Cohen, D., Elidan, G., Koller, D.: Multi-class segmentation with relative location prior. IJCV 80(3), 300–316 (2008)

[16] Jampani, V., Sun, D., Liu, M.Y., Yang, M.H., Kautz, J.: Superpixel sampling networks. In: ECCV (2018)

[17] Ke, T.W., Mo, S., Stella, X.Y.: Learning hierarchical image segmentation for recognition and by recognition. In: ICLR (2024)

[18] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W., Dollár, P., Girshick, R.B.: Segment anything. In: ICCV (2023)

[19] Li, J., Zhao, X., Wang, J., Wang, C., Wang, M.: Superpixel-informed implicit neural representation for multi-dimensional data. In: ECCV (2024)

[20] Liang, J., Wei, G.: HieraASGSegNet: hierarchical context fusion for semantic segmentation via adaptive superpixel graph reasoning. IEEE Access 13, 186449–186464 (2025)

[21] Machairas, V., Faessel, M., Cárdenas-Peña, D., Chabardes, T., Walter, T., Decencière, E.: Waterpixels. IEEE TIP 24(11), 3707–3716 (2015)

[22] Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: ICCV (2001)

[23] Mei, J., Chen, L.C., Yuille, A., Xie, C.: SPFormer: Enhancing vision transformer with superpixel representation. TMLR (2025)

[24] Peng, H., Aviles-Rivero, A.I., Schönlieb, C.B.: Hers superpixels: Deep affinity learning for hierarchical entropy rate segmentation (2021)

[25] Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: ECCV (2012)

[26] Stutz, D., Hermans, A., Leibe, B.: Superpixels: An evaluation of the state-of-the-art. CVIU 166, 1–27 (2018)

[27] Tu, W.C., Liu, M.Y., Jampani, V., Sun, D., Chien, S.Y., Yang, M.H., Kautz, J.: Learning superpixels with segmentation-aware affinity loss. In: CVPR (2018)

[28] Van den Bergh, M., Boix, X., Roig, G., de Capitani, B., Van Gool, L.: SEEDS: Superpixels extracted via energy-driven sampling. In: ECCV (2012)

[29] Walther, J., Giraud, R., Clément, M.: Superpixel anything: A general object-based framework for accurate yet regular superpixel segmentation. In: BMVC (2025)

[30] Wang, Y., Wei, Y., Qian, X., Zhu, L., Yang, Y.: AINet: Association implantation for superpixel segmentation. In: ICCV (2021)

[31] Wei, X., Yang, Q., Gong, Y., Ahuja, N., Yang, M.H.: Superpixel hierarchy. IEEE TIP 27(10), 4838–4849 (2018)

[32] Xie, M., Peng, H., Li, P., Zeng, G., Wang, S., Wu, J., Li, P., Yu, P.S.: Hierarchical superpixel segmentation via structural information theory. In: SIAM International Conference on Data Mining (SDM) (2025)

[33] Xu, S., Wei, S., Ruan, T., Liao, L.: Learning invariant inter-pixel correlations for superpixel generation. In: AAAI (2024)

[34] Yan, T., Huang, X., Zhao, Q.: Hierarchical superpixel segmentation by parallel CRTrees labeling. IEEE TIP 31, 4719–4732 (2022)

[35] Yang, F., Sun, Q., Jin, H., Zhou, Z.: Superpixel segmentation with fully convolutional networks. In: CVPR (2020)

[36] Zhao, X., Ding, W., An, Y., Du, Y., Yu, T., Li, M., Tang, M., Wang, J.: Fast segment anything. arXiv:2306.12156 (2023)

[37] Zhou, P., Kang, X., Ming, A.: Vine spread for superpixel segmentation. IEEE TIP 32, 878–891 (2023)

H-SPAM: Hierarchical Superpixel Anything Model, Supplementary Material. Julien Walther (1,2), Rémi Giraud (1), Michaël Clément (2). (1) Univ. Bordeaux, CNRS, Bordeaux INP, IMS, UMR 5218, France. (2) Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, France.

[1] [1]

IEEE TPAMI34, 2274–2282 (2012)

Achanta,R.,Shaji,A.,Smith,K.,Lucchi,A.,Fua,P.,Süsstrunk,S.:SLICsuperpix- els compared to state-of-the-art superpixel methods. IEEE TPAMI34, 2274–2282 (2012)

work page 2012

[2] [2]

In: CVPR (2017)

Achanta, R., Süsstrunk, S.: Superpixels and polygons using simple non-iterative clustering. In: CVPR (2017)

work page 2017

[3] [3]

Belém, F., Kochem, F., Patrocínio, Z., Perret, B., Cousty, J., Falcão, A., Guimarães,S.J.F.:Measuringhierarchinessofimagesegmentations.In:SIBGRAPI (2024)

work page 2024

[4] [4]

Nature methods16(12), 1226–1232 (2019)

Berg, S., Kutra, D., Kroeger, T., Straehle, C.N., Kausler, B.X., Haubold, C., Schiegg, M., Ales, J., Beier, T., Rudy, M., et al.: Ilastik: Interactive machine learn- ing for (bio) image analysis. Nature methods16(12), 1226–1232 (2019)

work page 2019

[5] [5]

Walther et al

Brasó, G., Ošep, A., Leal-Taixé, L.: Native segmentation vision transformers (2025) 14 J. Walther et al. Image SH [31] RISF [11] CRTREES [34] Groundtruth HHTS [8] SIT-HSS [32]H-SP AM Image SH [31] RISF [11] CRTREES [34] Groundtruth HHTS [8] SIT-HSS [32]H-SP AM Fig.11:Qualitative comparison between hierarchical methods.Number of superpixels from left to ri...

work page 2025

[6] [6]

In: CVPR (2021)

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: CVPR (2021)

work page 2021

[7] [7]

In: CVPR (2013)

Chang, J., Wei, D., Fisher, J.W.: A video representation using temporal superpix- els. In: CVPR (2013)

work page 2013

[8] [8]

In: CVPR (2024)

Chang, T.V., Seibt, S., von Rymon Lipinski, B.: Hierarchical histogram threshold segmentation-auto-terminating high-detail oversegmentation. In: CVPR (2024)

work page 2024

[9] [9]

IEEE TIP26, 3317–3330 (2017)

Chen, J., Li, Z., Huang, B.: Linear spectral clustering superpixel. IEEE TIP26, 3317–3330 (2017)

work page 2017

[10] [10]

IEEE TPAMI45(12), 15694–15705 (2023)

Chen, Z., Wang, C., Guo, Y.C., Zhang, S.H.: Structnerf: Neural radiance fields for indoor scenes with structural hints. IEEE TPAMI45(12), 15694–15705 (2023)

work page 2023

[11] [11]

Pattern Recognition108, 107532 (2020)

Galvão, F.L., Guimarães, S.J.F., Falcão, A.X.: Image segmentation using dense and sparse hierarchies of superpixels. Pattern Recognition108, 107532 (2020)

work page 2020

[12] [12]

arXiv:2411.06478 (2024)

Giraud, R., Clément, M.: Superpixel segmentation: A long-lasting ill-posed prob- lem. arXiv:2411.06478 (2024)

work page arXiv 2024

[13] [13]

JEI26(6) (2017)

Giraud, R., Ta, V.T., Papadakis, N.: Evaluation framework of superpixel methods with a global regularity measure. JEI26(6) (2017)

work page 2017

[14] [14]

In: ICCV (2009) H-SPAM: Hierarchical Superpixel Anything Model 15

Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and seman- tically consistent regions. In: ICCV (2009) H-SPAM: Hierarchical Superpixel Anything Model 15

work page 2009

[15] [15]

IJCV80(3), 300–316 (2008)

Gould, S., Rodgers, J., Cohen, D., Elidan, G., Koller, D.: Multi-class segmentation with relative location prior. IJCV80(3), 300–316 (2008)

work page 2008

[16] [16]

In: ECCV (2018)

Jampani, V., Sun, D., Liu, M.Y., Yang, M.H., Kautz, J.: Superpixel sampling networks. In: ECCV (2018)

[17] Ke, T.W., Mo, S., Yu, S.X.: Learning hierarchical image segmentation for recognition and by recognition. In: ICLR (2024)

[18] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W., Dollár, P., Girshick, R.B.: Segment anything. In: ICCV (2023)

[19] Li, J., Zhao, X., Wang, J., Wang, C., Wang, M.: Superpixel-informed implicit neural representation for multi-dimensional data. In: ECCV (2024)

[20] Liang, J., Wei, G.: HieraASGSegNet: hierarchical context fusion for semantic segmentation via adaptive superpixel graph reasoning. IEEE Access 13, 186449–186464 (2025)

[21] Machairas, V., Faessel, M., Cárdenas-Peña, D., Chabardes, T., Walter, T., Decencière, E.: Waterpixels. IEEE TIP 24(11), 3707–3716 (2015)

[22] Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: ICCV (2001)

[23] Mei, J., Chen, L.C., Yuille, A., Xie, C.: SPFormer: Enhancing vision transformer with superpixel representation. TMLR (2025)

[24] Peng, H., Aviles-Rivero, A.I., Schönlieb, C.B.: HERS superpixels: Deep affinity learning for hierarchical entropy rate segmentation (2021)

[25] Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: ECCV (2012)

[26] Stutz, D., Hermans, A., Leibe, B.: Superpixels: An evaluation of the state-of-the-art. CVIU 166, 1–27 (2018)

[27] Tu, W.C., Liu, M.Y., Jampani, V., Sun, D., Chien, S.Y., Yang, M.H., Kautz, J.: Learning superpixels with segmentation-aware affinity loss. In: CVPR (2018)

[28] Van den Bergh, M., Boix, X., Roig, G., de Capitani, B., Van Gool, L.: SEEDS: Superpixels extracted via energy-driven sampling. In: ECCV (2012)

[29] Walther, J., Giraud, R., Clément, M.: Superpixel anything: A general object-based framework for accurate yet regular superpixel segmentation. In: BMVC (2025)

[30] Wang, Y., Wei, Y., Qian, X., Zhu, L., Yang, Y.: AINet: Association implantation for superpixel segmentation. In: ICCV (2021)

[31] Wei, X., Yang, Q., Gong, Y., Ahuja, N., Yang, M.H.: Superpixel hierarchy. IEEE TIP 27(10), 4838–4849 (2018)

[32] Xie, M., Peng, H., Li, P., Zeng, G., Wang, S., Wu, J., Li, P., Yu, P.S.: Hierarchical superpixel segmentation via structural information theory. In: SIAM International Conference on Data Mining (SDM) (2025)

[33] Xu, S., Wei, S., Ruan, T., Liao, L.: Learning invariant inter-pixel correlations for superpixel generation. In: AAAI (2024)

[34] Yan, T., Huang, X., Zhao, Q.: Hierarchical superpixel segmentation by parallel CRTrees labeling. IEEE TIP 31, 4719–4732 (2022)

[35] Yang, F., Sun, Q., Jin, H., Zhou, Z.: Superpixel segmentation with fully convolutional networks. In: CVPR (2020)

[36] Zhao, X., Ding, W., An, Y., Du, Y., Yu, T., Li, M., Tang, M., Wang, J.: Fast segment anything. arXiv:2306.12156 (2023)

[37] Zhou, P., Kang, X., Ming, A.: Vine spread for superpixel segmentation. IEEE TIP 32, 878–891 (2023)

H-SPAM: Hierarchical Superpixel Anything Model
Supplementary Material

Julien Walther1,2, Rémi Giraud1, and Michaël Clément2

1 Univ. Bordeaux, CNRS, Bordeaux INP, IMS, UMR 5218, France
2 Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, France

1 Impac...
