Causal Unsupervised Semantic Segmentation

Byung-Kwan Lee; Junho Kim; Yong Man Ro

arxiv: 2310.07379 · v1 · submitted 2023-10-11 · 💻 cs.CV · cs.AI· cs.LG

Causal Unsupervised Semantic Segmentation

Junho Kim , Byung-Kwan Lee , Yong Man Ro This is my paper

Pith reviewed 2026-05-24 05:51 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.LG

keywords unsupervised semantic segmentationcausal inferencefrontdoor adjustmentconcept clusterbookself-supervised learningpixel-level groupingmediatorclustering granularity

0 comments

The pith

Frontdoor adjustment from causal inference builds a mediator to set clustering levels for unsupervised semantic segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Unsupervised semantic segmentation faces the problem of choosing the correct clustering granularity for concepts when no labels are available. The paper proposes CAUSE, a framework that draws on causal inference by applying frontdoor adjustment to create a two-step unsupervised prediction process. The first step builds a concept clusterbook that acts as a discretized mediator capturing prototypes at multiple levels of detail. This mediator then connects directly to concept-wise self-supervised learning that performs the final pixel-level grouping. Experiments across datasets show this causal mediation yields state-of-the-art results by solving the clustering-level choice without human annotations.

Core claim

The paper claims that bridging an intervention-oriented causal approach, specifically frontdoor adjustment, defines suitable two-step tasks: first constructing a concept clusterbook as a mediator representing possible concept prototypes at different granularities in discretized form, then using that mediator to establish an explicit link to concept-wise self-supervised learning for pixel-level grouping, thereby addressing the clustering-level challenge and reaching state-of-the-art unsupervised semantic segmentation performance.

What carries the argument

The concept clusterbook mediator, built via frontdoor adjustment, which discretizes concept prototypes at varying granularity levels and links them to subsequent pixel grouping.

If this is right

The two-step causal process directly solves the problem of selecting appropriate clustering granularity without labels.
The mediator provides an explicit, interpretable connection between concept prototypes and pixel-level self-supervised grouping.
State-of-the-art unsupervised semantic segmentation performance is obtained across multiple standard datasets.
The framework corroborates the usefulness of intervention-based causal tools for defining unsupervised dense prediction tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same mediator construction could be tested on other unsupervised dense prediction tasks such as instance segmentation or depth estimation.
Discretized clusterbooks may allow post-hoc analysis of which granularity levels contribute most to final segments.
The causal framing suggests possible combinations with other self-supervised pre-training objectives to further stabilize the mediator.

Load-bearing premise

The concept clusterbook produced by frontdoor adjustment correctly functions as a mediator that identifies the right clustering level for segmenting concepts.

What would settle it

If an ablation that removes the frontdoor adjustment step and directly trains the prediction head achieves equal or higher segmentation accuracy on the same benchmarks, the claim that the causal mediator is required would be falsified.

Figures

Figures reproduced from arXiv: 2310.07379 by Byung-Kwan Lee, Junho Kim, Yong Man Ro.

**Figure 2.** Figure 2: Causal diagram of CAUSE. We split USS into two steps to identify relation between pre-trained features T and semantic groups Y using clusterbook M. Specifically, the unsupervised segmentation (T → Y ) is a procedure for deriving semantically clustered groups Y distilled from pre-trained features T. However, the indeterminate U of unsupervised prediction (i.e., what and how to cluster) can lead confoundin… view at source ↗

**Figure 3.** Figure 3: The overall architecture of CAUSE comprises (i): constructing discretized concept cluster [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative comparison of unsupervised semantic segmentation for Cityscapes dataset. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Additional experimental for in-depth analysis and ablation studies of CAUSE-TR. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Additional qualitative results of unsupervised semantic segmentation for Coco-Stuff. Please [PITH_FULL_IMAGE:figures/full_fig_p020_6.png] view at source ↗

**Figure 7.** Figure 7: Qualitative results of unsupervised semantic segmentation for COCO-171, which is [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Additional qualitative results of unsupervised semantic segmentation for Cityscapes. Please [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗

**Figure 9.** Figure 9: Qualitative results of unsupervised semantic segmentation for PASCAL VOC and COCO [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

**Figure 10.** Figure 10: Failure cases of CAUSE and comparison results with other baselines. [PITH_FULL_IMAGE:figures/full_fig_p022_10.png] view at source ↗

**Figure 11.** Figure 11: Retrieval results of the concept with respect to the shared index on clusterBook. We select [PITH_FULL_IMAGE:figures/full_fig_p022_11.png] view at source ↗

read the original abstract

Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. With the advent of self-supervised pre-training, various frameworks utilize the pre-trained features to train prediction heads for unsupervised dense prediction. However, a significant challenge in this unsupervised setup is determining the appropriate level of clustering required for segmenting concepts. To address it, we propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference. Specifically, we bridge intervention-oriented approach (i.e., frontdoor adjustment) to define suitable two-step tasks for unsupervised prediction. The first step involves constructing a concept clusterbook as a mediator, which represents possible concept prototypes at different levels of granularity in a discretized form. Then, the mediator establishes an explicit link to the subsequent concept-wise self-supervised learning for pixel-level grouping. Through extensive experiments and analyses on various datasets, we corroborate the effectiveness of CAUSE and achieve state-of-the-art performance in unsupervised semantic segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CAUSE frames unsupervised segmentation granularity via frontdoor adjustment on a clusterbook mediator, but skips the causal graph, variable definitions, and identifiability checks needed to make the claim hold.

read the letter

The paper's main contribution is applying frontdoor adjustment from causal inference to pick clustering granularity in unsupervised semantic segmentation. It builds a discretized concept clusterbook as mediator, then links that to concept-wise self-supervised learning for pixel grouping, and reports SOTA numbers on standard datasets. The two-step structure is presented as the direct result of the causal bridge. This specific use of frontdoor to define the tasks is not in the prior unsupervised segmentation literature they cite, so the framing itself counts as new. The practical problem it targets—how to choose the right level of clustering without labels—is real, and the mediator idea gives a concrete way to connect the steps. Experiments are described as extensive, which suggests they at least ran the usual benchmarks and saw gains. The soft spot is exactly the one in the stress-test note. The abstract and method sketch give no causal graph, no explicit treatment or outcome variables, and no verification that the clusterbook intercepts all paths or that there is no unmeasured confounding between the steps. Without those, the frontdoor formula is invoked but not shown to apply, so the causal language functions more as justification for the discretization than as a proven mechanism. The weakest assumption is that the clusterbook mediator correctly selects granularity because of the causal criterion rather than because the discretization happens to work. This paper is for computer-vision groups already working on unsupervised dense prediction who want to try a causal-flavored clustering trick. A reader who cares about whether the causal part is load-bearing will need the full derivations and ablations. It deserves peer review because the framing is distinct and the empirical claim is testable, even though the causal justification will require heavy revision.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes CAUSE, a framework for unsupervised semantic segmentation that applies frontdoor adjustment from causal inference. It constructs a 'concept clusterbook' as a mediator representing concept prototypes at varying granularities in discretized form, then uses this to define a subsequent concept-wise self-supervised learning step for pixel-level grouping. The authors claim this principled choice of clustering level yields state-of-the-art performance across datasets.

Significance. If the frontdoor application is shown to be valid and the empirical gains are reproducible, the work could supply a causal criterion for selecting granularity in unsupervised dense prediction, reducing reliance on heuristic clustering choices common in self-supervised segmentation pipelines.

major comments (2)

[framework description / two-step tasks] The method description (abstract and framework overview) invokes frontdoor adjustment via the concept clusterbook mediator without defining the treatment variable (pre-trained features?), outcome variable (pixel grouping or segmentation quality?), or verifying the three frontdoor identifiability conditions: (1) mediator intercepts all directed paths from treatment to outcome, (2) no unmeasured confounding between treatment-mediator and mediator-outcome, and (3) mediator is observed. This is load-bearing for the central claim that the clusterbook 'correctly determines the appropriate level of clustering'.
[method / causal inference bridge] No explicit causal graph, intervention definitions, or identifiability proof is provided to show that the discretization step into the clusterbook satisfies the frontdoor formula rather than being an ad-hoc two-stage procedure. Without this, the 'bridge' from causal inference to the unsupervised tasks remains unverified and the SOTA claim rests on empirical results alone.

minor comments (1)

[abstract] The abstract states 'we bridge intervention-oriented approach (i.e., frontdoor adjustment)' but does not cite the specific frontdoor formula or reference the original Pearl formulation used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the causal framework in our manuscript. We address the major points below and will revise the paper to improve clarity on the frontdoor adjustment application.

read point-by-point responses

Referee: [framework description / two-step tasks] The method description (abstract and framework overview) invokes frontdoor adjustment via the concept clusterbook mediator without defining the treatment variable (pre-trained features?), outcome variable (pixel grouping or segmentation quality?), or verifying the three frontdoor identifiability conditions: (1) mediator intercepts all directed paths from treatment to outcome, (2) no unmeasured confounding between treatment-mediator and mediator-outcome, and (3) mediator is observed. This is load-bearing for the central claim that the clusterbook 'correctly determines the appropriate level of clustering'.

Authors: We acknowledge that the manuscript does not explicitly define the treatment and outcome variables or verify the frontdoor conditions in detail. In the revised version, we will add explicit definitions: the treatment as the pre-trained feature representations, the outcome as the pixel-level grouping quality, and the mediator as the discretized concept clusterbook. We will also discuss satisfaction of the three conditions, noting that the clusterbook intercepts paths by representing concept prototypes at varying granularities, confounding is mitigated by the self-supervised construction, and the mediator is directly observed through discretization. This will better ground the claim on appropriate clustering level. revision: yes
Referee: [method / causal inference bridge] No explicit causal graph, intervention definitions, or identifiability proof is provided to show that the discretization step into the clusterbook satisfies the frontdoor formula rather than being an ad-hoc two-stage procedure. Without this, the 'bridge' from causal inference to the unsupervised tasks remains unverified and the SOTA claim rests on empirical results alone.

Authors: We agree that an explicit causal graph and intervention definitions would strengthen the presentation. The revision will include a causal graph figure showing treatment (pre-trained features), mediator (clusterbook), and outcome (segmentation), along with intervention definitions for granularity selection. We will provide a reasoned explanation of how the discretization satisfies the frontdoor formula via the mediator's properties rather than being ad-hoc. While a complete formal identifiability proof is beyond the scope of this applied contribution, the added discussion will clarify the bridge; empirical results serve as supporting evidence for the framework's utility. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation applies external causal concepts without reduction to inputs

full rationale

The paper's central derivation introduces CAUSE by bridging frontdoor adjustment to motivate a two-step unsupervised segmentation pipeline (concept clusterbook mediator followed by concept-wise SSL). This is a conceptual mapping from causal inference literature rather than a self-contained mathematical reduction. No equations or steps are shown to be equivalent to their inputs by construction, no parameters are fitted then relabeled as predictions, and no load-bearing self-citations or uniqueness theorems from the same authors are invoked. The framework remains self-contained against external causal benchmarks and does not rename known empirical patterns as novel derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Only the abstract is available, so no concrete free parameters, background axioms, or invented entities beyond the high-level description can be audited.

invented entities (1)

concept clusterbook no independent evidence
purpose: mediator representing possible concept prototypes at different levels of granularity in discretized form
Introduced in the abstract as the central mediator constructed via frontdoor adjustment.

pith-pipeline@v0.9.0 · 5696 in / 1148 out tokens · 27784 ms · 2026-05-24T05:51:19.097500+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 4 internal anchors

[1]

Rethinking Atrous Convolution for Semantic Image Segmentation

Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Segdiscover: Visual concept discovery via unsuper- vised semantic segmentation

Haiyang Huang, Zhi Chen, and Cynthia Rudin. Segdiscover: Visual concept discovery via unsuper- vised semantic segmentation. arXiv preprint arXiv:2204.10926,

work page arXiv
[3]

Modern hierarchical, agglomerative clustering algorithms

Daniel Müllner. Modern hierarchical, agglomerative clustering algorithms. arXiv preprint arXiv:1109.2378,

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Representation Learning with Contrastive Predictive Coding

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Discovering object masks with transformers for unsupervised semantic segmentation

Wouter Van Gansbeke, Simon Vandenhende, and Luc Van Gool. Discovering object masks with transformers for unsupervised semantic segmentation. arXiv preprint arXiv:2206.06363,

work page arXiv
[7]

One of the key challenges in unsupervised dense prediction is the need to learn semantic representations for each pixel without the guidance of labeled data

A E XPANSION OF RELATED WORKS Unsupervised Semantic Segmentation. One of the key challenges in unsupervised dense prediction is the need to learn semantic representations for each pixel without the guidance of labeled data. In an early work for unsupervised semantic segmentation (USS), Ji et al. (2019) introduced the IIC framework, which aims to maximize ...

work page 2019
[8]

More recently, the discovery of semantic consistency in pre-trained self-supervised frameworks at the feature attention map (Caron et al.,

or by incorporating saliency information in an end-to-end manner (Van Gansbeke et al., 2021; Ke et al., 2022). More recently, the discovery of semantic consistency in pre-trained self-supervised frameworks at the feature attention map (Caron et al.,

work page 2021
[9]

Hamilton et al

has led to prevalent approaches. Hamilton et al. (2022) introduced a method that leverages pre-trained knowledge and distills this information into the unsupervised segmentation task. Following this, various works (Wen et al., 2022; Yin et al., 2022; Ziegler & Asano,

work page 2022
[10]

have employed self-supervised representations as pseudo segmentation labels (Zadaianchuk et al., 2023; Li et al.,

work page 2023
[11]

or as pre-encoded representations to incorporate ad- ditional prior knowledge (Van Gansbeke et al., 2021; Zadaianchuk et al.,

work page 2021
[12]

Our work aligns with Hamilton et al

into the segmentation frameworks. Our work aligns with Hamilton et al. (2022); Seong et al. (2023) in terms of enhancing segmen- tation features solely with the pre-trained representation. However, we emphasize the presence of indeterminate clustering targets inherent in unsupervised segmentation tasks. Our qualitative and quantitative results have demons...

work page 2022
[13]

The fundamental approach to achieve causal identification involves blocking backdoor paths induced from confounders

have applied causal inference techniques in DNNs to estimate the true causal effects between treatments and outcomes of interest. The fundamental approach to achieve causal identification involves blocking backdoor paths induced from confounders. In several computer vision methods have employed various causal approaches such as backdoor adjustment establi...

work page 2020
[14]

which can identify causal effects without the requirement of observed confounders, but relatively less explored in the context of computer vision tasks (Yang et al., 2021b;a). Inspired by recent developments in discrete representation learning (Van Den Oord et al., 2017; Esser et al., 2021), we proactively build a discretized concept representation and se...

work page 2017
[15]

(2017); Carion et al

B.2 T RANSFORMER -BASED SEGMENTATION HEAD We use a single layer transformer decoder inspired by Vaswani et al. (2017); Carion et al. (2020) to build segmentation head with self-attention (SA), cross-attention (CA), and feed forward network (FFN) with its 2048 inner-dimension by default hyper-parameter (Vaswani et al., 2017), where a single head attention ...

work page 2017
[16]

Before re-sampling, 50% of Ybank is randomly discarded

B.5 C ONCEPT BANK In line 10 of Algorithm 2, the concept bank Ybank follows a specific rule: not all of the segmentation features Yema are collected, but they are instead 50% re-sampled based on the most closest concept indices individually, where the concept bank collects a maximum of 100 features per concept prototype. Before re-sampling, 50% of Ybank i...

work page 2048
[17]

For 17 inference phase, images are resized to320×320 along the minor axis followed by center crops of each validation image

which employ five-crop with crop ratio of 0.5 in full image resolution and resizes the cropped images to 224 × 224 for CAUSE-MLP in training phase. For 17 inference phase, images are resized to320×320 along the minor axis followed by center crops of each validation image. For CAUSE-TR, 320 × 320 image resolution is used to train segmentation head of a sin...

work page 2012
[18]

which employs multiple-crop with multiple ratio. A significant different point is that STEGO, HP, and TransFGU employ additional data-augmentation techniques, including Horizontal Flip, Color-Jittering, Gray-scaling, and Gaussian- Blurring as geometric and photometric transforms, but CAUSE utilizes Horizontal Flip only. C A DDITIONAL EXPERIMENTS Due to pa...

work page 2022
[19]

Additionally, we present qualitative results for object-centric semantic segmentation by providing visualizations for the PASCAL VOC, COCO-81 and COCO-171 in Fig

feature representations. Additionally, we present qualitative results for object-centric semantic segmentation by providing visualizations for the PASCAL VOC, COCO-81 and COCO-171 in Fig. 9 and Fig. 7, respectively. All of these datasets include an additional background class. While the negative relaxation is set to the same value of 0.1, we have adjusted...

work page 2019
[20]

It is significantly challenging to handle fine-grained and complex scenes when dealing with unsupervised semantic segmentation using pre-trained feature representation

D D ISCUSSIONS AND LIMITATIONS Bootstrapping Pre-trained Models. It is significantly challenging to handle fine-grained and complex scenes when dealing with unsupervised semantic segmentation using pre-trained feature representation. Based on the fact that the pre-trained features are designed to capture high-level semantic information, STEGO (Hamilton et...

work page 2022

[1] [1]

Rethinking Atrous Convolution for Semantic Image Segmentation

Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587,

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

Segdiscover: Visual concept discovery via unsuper- vised semantic segmentation

Haiyang Huang, Zhi Chen, and Cynthia Rudin. Segdiscover: Visual concept discovery via unsuper- vised semantic segmentation. arXiv preprint arXiv:2204.10926,

work page arXiv

[3] [3]

Modern hierarchical, agglomerative clustering algorithms

Daniel Müllner. Modern hierarchical, agglomerative clustering algorithms. arXiv preprint arXiv:1109.2378,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Representation Learning with Contrastive Predictive Coding

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748,

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

Discovering object masks with transformers for unsupervised semantic segmentation

Wouter Van Gansbeke, Simon Vandenhende, and Luc Van Gool. Discovering object masks with transformers for unsupervised semantic segmentation. arXiv preprint arXiv:2206.06363,

work page arXiv

[7] [7]

One of the key challenges in unsupervised dense prediction is the need to learn semantic representations for each pixel without the guidance of labeled data

A E XPANSION OF RELATED WORKS Unsupervised Semantic Segmentation. One of the key challenges in unsupervised dense prediction is the need to learn semantic representations for each pixel without the guidance of labeled data. In an early work for unsupervised semantic segmentation (USS), Ji et al. (2019) introduced the IIC framework, which aims to maximize ...

work page 2019

[8] [8]

More recently, the discovery of semantic consistency in pre-trained self-supervised frameworks at the feature attention map (Caron et al.,

or by incorporating saliency information in an end-to-end manner (Van Gansbeke et al., 2021; Ke et al., 2022). More recently, the discovery of semantic consistency in pre-trained self-supervised frameworks at the feature attention map (Caron et al.,

work page 2021

[9] [9]

Hamilton et al

has led to prevalent approaches. Hamilton et al. (2022) introduced a method that leverages pre-trained knowledge and distills this information into the unsupervised segmentation task. Following this, various works (Wen et al., 2022; Yin et al., 2022; Ziegler & Asano,

work page 2022

[10] [10]

have employed self-supervised representations as pseudo segmentation labels (Zadaianchuk et al., 2023; Li et al.,

work page 2023

[11] [11]

or as pre-encoded representations to incorporate ad- ditional prior knowledge (Van Gansbeke et al., 2021; Zadaianchuk et al.,

work page 2021

[12] [12]

Our work aligns with Hamilton et al

into the segmentation frameworks. Our work aligns with Hamilton et al. (2022); Seong et al. (2023) in terms of enhancing segmen- tation features solely with the pre-trained representation. However, we emphasize the presence of indeterminate clustering targets inherent in unsupervised segmentation tasks. Our qualitative and quantitative results have demons...

work page 2022

[13] [13]

The fundamental approach to achieve causal identification involves blocking backdoor paths induced from confounders

have applied causal inference techniques in DNNs to estimate the true causal effects between treatments and outcomes of interest. The fundamental approach to achieve causal identification involves blocking backdoor paths induced from confounders. In several computer vision methods have employed various causal approaches such as backdoor adjustment establi...

work page 2020

[14] [14]

which can identify causal effects without the requirement of observed confounders, but relatively less explored in the context of computer vision tasks (Yang et al., 2021b;a). Inspired by recent developments in discrete representation learning (Van Den Oord et al., 2017; Esser et al., 2021), we proactively build a discretized concept representation and se...

work page 2017

[15] [15]

(2017); Carion et al

B.2 T RANSFORMER -BASED SEGMENTATION HEAD We use a single layer transformer decoder inspired by Vaswani et al. (2017); Carion et al. (2020) to build segmentation head with self-attention (SA), cross-attention (CA), and feed forward network (FFN) with its 2048 inner-dimension by default hyper-parameter (Vaswani et al., 2017), where a single head attention ...

work page 2017

[16] [16]

Before re-sampling, 50% of Ybank is randomly discarded

B.5 C ONCEPT BANK In line 10 of Algorithm 2, the concept bank Ybank follows a specific rule: not all of the segmentation features Yema are collected, but they are instead 50% re-sampled based on the most closest concept indices individually, where the concept bank collects a maximum of 100 features per concept prototype. Before re-sampling, 50% of Ybank i...

work page 2048

[17] [17]

For 17 inference phase, images are resized to320×320 along the minor axis followed by center crops of each validation image

which employ five-crop with crop ratio of 0.5 in full image resolution and resizes the cropped images to 224 × 224 for CAUSE-MLP in training phase. For 17 inference phase, images are resized to320×320 along the minor axis followed by center crops of each validation image. For CAUSE-TR, 320 × 320 image resolution is used to train segmentation head of a sin...

work page 2012

[18] [18]

which employs multiple-crop with multiple ratio. A significant different point is that STEGO, HP, and TransFGU employ additional data-augmentation techniques, including Horizontal Flip, Color-Jittering, Gray-scaling, and Gaussian- Blurring as geometric and photometric transforms, but CAUSE utilizes Horizontal Flip only. C A DDITIONAL EXPERIMENTS Due to pa...

work page 2022

[19] [19]

Additionally, we present qualitative results for object-centric semantic segmentation by providing visualizations for the PASCAL VOC, COCO-81 and COCO-171 in Fig

feature representations. Additionally, we present qualitative results for object-centric semantic segmentation by providing visualizations for the PASCAL VOC, COCO-81 and COCO-171 in Fig. 9 and Fig. 7, respectively. All of these datasets include an additional background class. While the negative relaxation is set to the same value of 0.1, we have adjusted...

work page 2019

[20] [20]

It is significantly challenging to handle fine-grained and complex scenes when dealing with unsupervised semantic segmentation using pre-trained feature representation

D D ISCUSSIONS AND LIMITATIONS Bootstrapping Pre-trained Models. It is significantly challenging to handle fine-grained and complex scenes when dealing with unsupervised semantic segmentation using pre-trained feature representation. Based on the fact that the pre-trained features are designed to capture high-level semantic information, STEGO (Hamilton et...

work page 2022