Dual-Foundation Models for Unsupervised Domain Adaptation

Aruna Balasubramanian; Francois Rameau; Yerin Cheon

arxiv: 2605.03365 · v1 · submitted 2026-05-05 · 💻 cs.CV

Yerin Cheon, Aruna Balasubramanian, Francois Rameau

Pith reviewed 2026-05-08 01:29 UTC · model grok-4.3

classification 💻 cs.CV

keywords unsupervised domain adaptation · semantic segmentation · foundation models · SAM · DINOv3 · superpixel prompting · class prototypes · synthetic-to-real

The pith

Combining SAM with superpixel prompting and DINOv3 for prototypes improves unsupervised domain adaptation for semantic segmentation by addressing limits in pixel coverage and prototype stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that unsupervised domain adaptation for semantic segmentation can be strengthened by drawing on two foundation models instead of relying solely on high-confidence predictions or source-derived prototypes. It uses the Segment Anything Model prompted via superpixels to supervise a wider set of target pixels and DINOv3 to generate stable class prototypes that do not inherit domain bias. This matters to a reader because pixel-wise labeling of real images is costly, while synthetic data is plentiful, yet the domain gap has limited how well models transfer. If successful, the method would let practitioners achieve higher accuracy on real data with less manual effort. The approach is tested through experiments on two common synthetic-to-real benchmarks.

Core claim

We propose a dual-foundation UDA framework that leverages two complementary foundation models. First, we employ the Segment Anything Model (SAM) with superpixel-guided prompting to enable learning from a broader range of target pixels beyond high-confidence predictions. Second, we incorporate DINOv3 to construct stable, domain-invariant class prototypes through its robust representation learning.

What carries the argument

The dual-foundation UDA framework that pairs SAM superpixel-guided prompting for expanded target pixel supervision with DINOv3-derived domain-invariant class prototypes.
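
To make the prototype half of that pairing concrete, here is a minimal sketch of class-mean prototypes in plain Python. It is not the authors' implementation: this page does not describe how the paper builds its prototypes, so the averaging rule, the cosine-similarity assignment, and the names `class_prototypes` and `nearest_prototype` are illustrative assumptions, with short lists standing in for DINOv3 features.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(features, labels):
    """Assumed rule: a prototype is the re-normalized mean of the
    unit-length feature vectors that share a class label."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        unit = l2_normalize(feat)
        if label not in sums:
            sums[label] = [0.0] * len(unit)
            counts[label] = 0
        sums[label] = [s + u for s, u in zip(sums[label], unit)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }


def nearest_prototype(feature, prototypes):
    """Return the class whose prototype has the highest cosine similarity."""
    unit = l2_normalize(feature)
    return max(
        prototypes,
        key=lambda label: sum(u * p for u, p in zip(unit, prototypes[label])),
    )
```

Because the prototypes in this sketch come from a frozen feature extractor rather than from the network being adapted, they stay fixed during training, which is the stability property the paper attributes to its DINOv3 anchors.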

Load-bearing premise

The method rests on the assumption that SAM prompted by superpixels can provide reliable guidance for learning on low-confidence target pixels and that DINOv3 features produce class prototypes that remain unbiased across the source and target domains without additional tuning.

What would settle it

An ablation on the GTA-to-Cityscapes task that disables SAM's superpixel-guided prompting, or replaces the DINOv3 prototypes with source-initialized ones, and loses none of the reported gain would show that these components are not responsible for the improvement.

Figures

Figures reproduced from arXiv: 2605.03365 by Aruna Balasubramanian, Francois Rameau, Yerin Cheon.

**Figure 1.** Overview of our framework. (Blue) Source–Target Distillation: An online self-training scheme reduces the source–target domain gap using EMA-updated teacher predictions. (Yellow) Pseudo-Label Refinement: Superpixel-prompted SAM masks are filtered, and each mask region is assigned a high-confidence, low-entropy pseudo-label (Sec. 3.2 & Sec. 3.3). (Pink) Feature Alignment: Student features are projected into…

**Figure 2.** Comparison of SAM mask generation strategies. (a) Superpixel-based SAM prompting (Sec. 3.3). (b) SAM automatic mask generation. (c) Our superpixel-guided and filtered masks. The proposed method yields more meaningful and structurally coherent masks than the SAM auto mask generator.

**Figure 3.** Comparison of our method with the state-of-the-art baseline. Compared to MIC, our method produces improved segmentation for challenging classes such as traffic sign and terrain. In addition, influenced by SAM-based pseudo-label refinement, fine structures such as bicycle wheels are more completely filled, closely matching the ground truth.

**Figure 4.** Ablation of DINO-based prototype alignment and superpixel-based SAM. (a) Original image. (b) Without SAM and DINO. (c) Only DINO-based prototype alignment. (d) With both SAM and DINO. DINO improves segmentation in hard-to-distinguish regions via feature alignment, while SAM enhances boundary-aware object prediction.

**Figure 5.** SAM mask comparison. Comparison between SAM AutoMask and our superpixel-based SAM with overlap-aware filtering.

**Figure 6.** t-SNE visualization on the Cityscapes val set. Compared with the baseline without SAM and DINO, ours forms more compact target-feature clusters around DINOv3 prototype anchors (black crosses), indicating improved prototype-guided alignment.
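
The pseudo-label refinement step named in the Figure 1 caption (each mask region receives a high-confidence, low-entropy pseudo-label) can be pictured with a short sketch. This is one plausible reading, not the paper's procedure: averaging teacher probabilities over the region, the `max_entropy` cutoff, and the function name are assumptions, and the 0.968 default only echoes the confidence threshold τ that the paper's text mentions for confidence-based pseudo-labeling.

```python
import math


def region_pseudo_label(pixel_probs, region, tau=0.968, max_entropy=0.2):
    """Assign one label to a mask region, or None if the region is rejected.

    pixel_probs: per-pixel class probabilities from the teacher
                 (list of lists, at least two classes).
    region:      indices of the pixels covered by one SAM mask.
    tau, max_entropy: hypothetical acceptance thresholds.
    """
    num_classes = len(pixel_probs[0])
    # Average the teacher's class distribution over the region.
    mean = [
        sum(pixel_probs[i][c] for i in region) / len(region)
        for c in range(num_classes)
    ]
    label = max(range(num_classes), key=lambda c: mean[c])
    # Entropy normalized to [0, 1] so the cutoff does not depend on class count.
    entropy = -sum(p * math.log(p) for p in mean if p > 0) / math.log(num_classes)
    if mean[label] >= tau and entropy <= max_entropy:
        return label
    return None
```

In this sketch a pixel that is individually uncertain can still receive a label when the region it belongs to is confident on average, which is how region-level supervision reaches pixels that per-pixel thresholding would discard.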

Original abstract

Semantic segmentation provides pixel-level scene understanding essential for autonomous driving and fine-grained perception tasks. However, training segmentation models requires costly, labor-intensive annotations on real-world datasets. Unsupervised Domain Adaptation (UDA) addresses this by training models on labeled synthetic data and adapting them to unlabeled real images. While conceptually simple, adaptation is challenging due to the domain gap, i.e., differences in visual appearance and scene structure between synthetic and real data. Prior approaches bridge this gap through pixel-level mixing or feature-level contrastive learning. Yet, these techniques suffer from two major limitations: (1) reliance on high-confidence pseudo-labels restricts learning to a subset of the target domain, and (2) prototype-based contrastive methods initialize class prototypes from source-trained models, yielding biased and unstable anchors during adaptation. To address these issues, we propose a dual-foundation UDA framework that leverages two complementary foundation models. First, we employ the Segment Anything Model (SAM) with superpixel-guided prompting to enable learning from a broader range of target pixels beyond high-confidence predictions. Second, we incorporate DINOv3 to construct stable, domain-invariant class prototypes through its robust representation learning. Our method achieves consistent improvements of +1.3% and +1.4% mIoU over strong UDA baselines on GTA-to-Cityscapes and SYNTHIA-to-Cityscapes, respectively.
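
Since the reported gains are stated in mIoU, a small sketch of that metric may help when reading the numbers. It shows the usual definition (per-class intersection over union, averaged over classes that occur) on flat label lists; the function name is hypothetical, and `ignore_index=255` follows the common Cityscapes labeling convention rather than anything stated on this page.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over classes that appear at least once.

    pred, target: flat sequences of integer class ids of equal length.
    Pixels whose target equals ignore_index are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[t] += 1
            union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

Benchmark evaluations normally accumulate the intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores.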

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper pairs SAM superpixel prompting with DINOv3 prototypes for UDA segmentation and reports modest benchmark gains, but the evidence is thin and the invariance assumption looks untested.

The main takeaway is a concrete dual-foundation recipe that uses SAM with superpixel-guided prompting to pull in more target pixels and DINOv3 to build class prototypes that are meant to be more stable and less biased than source-initialized ones. It reports +1.3% and +1.4% mIoU lifts over strong baselines on the two standard synthetic-to-real driving benchmarks. That pairing is not in the prior work the abstract cites, so the specific combination counts as new even if the broader idea of plugging foundation models into UDA is incremental.

The paper does a clean job naming the two recurring problems in existing UDA methods and showing how each foundation model targets one of them without requiring extra fine-tuning steps on the target domain. The numbers are consistent across the two shifts, which is better than isolated wins.

The soft spots are mostly about missing support. The abstract supplies no ablations, no error bars, no baseline details, and no checks on whether the DINOv3 prototypes actually stay domain-invariant once the domain gap is present. If the embedding space still carries shift for classes like vehicles or road surfaces, the second pillar adds little and the method reduces to the SAM prompting piece alone. Without those diagnostics the gains are hard to trust as robust rather than tied to particular hyperparameter choices.

This is for people already working on unsupervised adaptation for semantic segmentation in driving scenes. A reader in that niche can extract the prompting and prototype construction details and test them on their own data. It deserves a serious referee because the idea is well-motivated, the benchmarks are standard, and the claimed improvements are at least measurable, even though the manuscript will need substantial additions on experiments and analysis before it is ready for publication.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes a dual-foundation model framework for unsupervised domain adaptation (UDA) in semantic segmentation. It employs the Segment Anything Model (SAM) with superpixel-guided prompting to expand pseudo-label learning beyond high-confidence target pixels and incorporates DINOv3 to derive stable, domain-invariant class prototypes for contrastive learning. The approach reports consistent mIoU gains of +1.3% on GTA-to-Cityscapes and +1.4% on SYNTHIA-to-Cityscapes over strong UDA baselines.

Significance. If the empirical gains hold under rigorous validation and the DINOv3 component is shown to deliver genuinely less biased prototypes than source-derived alternatives, the work provides a practical template for leveraging complementary foundation models to mitigate two persistent UDA limitations. The modest but positive improvements indicate incremental utility for driving-scene segmentation, with potential to influence subsequent foundation-model-assisted adaptation research provided ablations confirm complementarity of the two pillars.

major comments (2)

§3.2 (DINOv3 prototype construction): The central claim that DINOv3 yields 'stable, domain-invariant class prototypes' without any target-domain adaptation, fine-tuning, or explicit alignment step is load-bearing for the dual-framework contribution. If residual domain shift persists in the DINOv3 embedding space for classes such as vehicle or pedestrian, the resulting anchors remain biased in the same manner as source-initialized prototypes, reducing the method to the SAM superpixel component alone. The manuscript should supply either quantitative invariance metrics (e.g., prototype drift across domains) or an ablation replacing DINOv3 with source-derived prototypes to substantiate the claim.
Table 1 (quantitative results): The reported +1.3% and +1.4% mIoU improvements are presented without standard deviations, multiple random seeds, or statistical significance tests. In the absence of these, it is impossible to determine whether the gains exceed implementation variance or hyper-parameter sensitivity, weakening the assertion of 'consistent improvements' over strong baselines.
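
The invariance check requested in the first major comment could take a form like the following sketch, which measures per-class drift as one minus the cosine similarity between a prototype computed on source images and one computed on target images. The metric choice, the function names, and the dictionary layout are assumptions made for illustration; the report does not prescribe a specific formula.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prototype_drift(source_protos, target_protos):
    """Per-class drift (1 - cosine similarity) for classes present in both domains.

    source_protos, target_protos: dicts mapping class name -> prototype vector.
    A value near 0 means the class prototype barely moves across domains.
    """
    shared = source_protos.keys() & target_protos.keys()
    return {
        name: 1.0 - cosine_similarity(source_protos[name], target_protos[name])
        for name in sorted(shared)
    }
```

Running the same measurement on source-initialized prototypes would give the comparison the referee asks for: the claim is supported only if the DINOv3 drift values are clearly smaller.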

minor comments (3)

Abstract: The phrase 'strong UDA baselines' should explicitly name the compared methods (e.g., DAFormer, HRDA) so readers can immediately gauge the strength of the reference points.
Figure 1 (framework overview): The diagram would be clearer if arrows explicitly labeled the information flow from SAM superpixel prompts into the segmentation loss and from DINOv3 features into prototype computation.
§4 (experimental protocol): The backbone architecture, training schedule, and hyper-parameter settings for both the segmentation network and the foundation-model components should be stated in a single consolidated table or paragraph for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. The feedback highlights important aspects for strengthening the empirical validation of our dual-foundation approach. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Referee: §3.2 (DINOv3 prototype construction): The central claim that DINOv3 yields 'stable, domain-invariant class prototypes' without any target-domain adaptation, fine-tuning, or explicit alignment step is load-bearing for the dual-framework contribution. If residual domain shift persists in the DINOv3 embedding space for classes such as vehicle or pedestrian, the resulting anchors remain biased in the same manner as source-initialized prototypes, reducing the method to the SAM superpixel component alone. The manuscript should supply either quantitative invariance metrics (e.g., prototype drift across domains) or an ablation replacing DINOv3 with source-derived prototypes to substantiate the claim.

Authors: We agree that explicit validation of DINOv3's domain-invariance is necessary to support the dual-framework contribution. The manuscript motivates DINOv3 by its large-scale pretraining on diverse data, which we expect to yield more stable prototypes than source-only initialization. However, to directly address the concern, the revised version will add both an ablation replacing DINOv3 with source-derived prototypes and quantitative metrics (cosine similarity and drift between source/target embeddings for classes such as vehicle and pedestrian). These additions will clarify the incremental benefit of the DINOv3 component.
Revision: yes
Referee: Table 1 (quantitative results): The reported +1.3% and +1.4% mIoU improvements are presented without standard deviations, multiple random seeds, or statistical significance tests. In the absence of these, it is impossible to determine whether the gains exceed implementation variance or hyper-parameter sensitivity, weakening the assertion of 'consistent improvements' over strong baselines.

Authors: We acknowledge that the current Table 1 reports single-run results without variability measures or significance testing. The experiments were performed with a fixed seed for reproducibility. In the revision we will rerun all methods with at least three random seeds, report mean mIoU together with standard deviations, and add paired statistical significance tests to confirm that the observed gains exceed typical implementation variance.
Revision: yes
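
The multi-seed reporting promised in this response can be sketched with the Python standard library. The helper names are hypothetical, and the paired t statistic is one common choice of paired test, not necessarily the one the authors would use; scores are assumed to be paired by seed.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed mIoU scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(baseline, method):
    """Paired t statistic for per-seed scores of two methods.

    baseline[i] and method[i] must come from the same seed.
    Requires at least two seeds and non-identical differences.
    """
    diffs = [m - b for b, m in zip(baseline, method)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
```

The statistic is compared against a t distribution with n - 1 degrees of freedom. With only three seeds that leaves two degrees of freedom, so the test has little power and more seeds would make the comparison more convincing.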

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on benchmarks

The paper introduces a dual-foundation UDA framework for semantic segmentation that combines SAM with superpixel-guided prompting and DINOv3-derived class prototypes. No equations, derivations, parameter fittings, or self-referential constructions appear in the abstract or described method. The reported gains (+1.3% and +1.4% mIoU on GTA-to-Cityscapes and SYNTHIA-to-Cityscapes) are presented as empirical outcomes rather than results forced by definition or prior self-citations. The premise that DINOv3 yields domain-invariant prototypes is an external modeling assumption, not a tautological reduction within the paper's own logic, leaving the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new mathematical parameters, axioms, or invented entities are introduced in the abstract; the approach relies on off-the-shelf foundation models whose internal assumptions are inherited from prior work.

[1] [1]

org/abs/2407.21311, accessed 3 May 2026

Abedi, A., Wu, Q.M.J., Zhang, N., Pourpanah, F.: Euda: An efficient unsupervised domain adaptation via self-supervised vision transformer (2024),https://arxiv. org/abs/2407.21311, accessed 3 May 2026

work page arXiv 2024

[2] [2]

In: ICML (2017)

Arpit, D., Jastrzębski, S., Ballas, N., Krueger, D., Bengio, E., Kanwal, M.S., Ma- haraj, T., Fischer, A., Courville, A., Bengio, Y., et al.: A closer look at memoriza- tion in deep networks. In: ICML (2017)

work page 2017

[3] [3]

T-PAMI39(12), 2481–2495 (2017)

Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: A deep convolutional encoder-decoder architecture for image segmentation. T-PAMI39(12), 2481–2495 (2017)

work page 2017

[4] [4]

Benigmim, Y., Roy, S., Essid, S., Kalogeiton, V., Lathuilière, S.: Collaborating foundationmodelsfordomaingeneralizedsemanticsegmentation.In:CVPR(2024)

work page 2024

[5] [5]

In: ECCV (2012)

Van den Bergh, M., Boix, X., Roig, G., De Capitani, B., Van Gool, L.: Seeds: Superpixels extracted via energy-driven sampling. In: ECCV (2012)

work page 2012

[6] [6]

NeurIPS (2019)

Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.A.: Mixmatch: A holistic approach to semi-supervised learning. NeurIPS (2019)

work page 2019

[7] [7]

In: WACV (2023)

Brüggemann, D., Sakaridis, C., Truong, P., Van Gool, L.: Refign: Align and refine for adaptation of semantic segmentation to adverse conditions. In: WACV (2023)

work page 2023

[8] [8]

T-PAMI40(4), 834–848 (2017)

Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Se- mantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. T-PAMI40(4), 834–848 (2017)

work page 2017

[9] [9]

In: CVPR (2016)

Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)

work page 2016

[10] [10]

In: CVPR (2023)

Das, A., Xian, Y., Dai, D., Schiele, B.: Weakly-supervised domain adaptive seman- tic segmentation with prototypical contrastive learning. In: CVPR (2023)

work page 2023

[11] [11]

In: CVPR (2024)

Englert, B.B., Piva, F.J., Kerssies, T., De Geus, D., Dubbelman, G.: Exploring the benefits of vision foundation models for unsupervised domain adaptation. In: CVPR (2024)

work page 2024

[12] [12]

In: ICCV (2023)

Fahes, M., Vu, T.H., Bursuc, A., Pérez, P., De Charette, R.: Poda: Prompt-driven zero-shot domain adaptation. In: ICCV (2023)

work page 2023

[13] [13]

In: CVPR (2024)

Fahes, M., Vu, T.H., Bursuc, A., Pérez, P., De Charette, R.: A simple recipe for language-guided domain generalized segmentation. In: CVPR (2024)

work page 2024

[14] [14]

In: CVPR (2019)

Gong, R., Li, W., Chen, Y., Gool, L.V.: Dlow: Domain flow for adaptation and generalization. In: CVPR (2019)

work page 2019

[15] [15]

In: CVPR (2021)

Guo, X., Yang, C., Li, B., Yuan, Y.: Metacorrection: Domain-aware meta loss cor- rection for unsupervised domain adaptation in semantic segmentation. In: CVPR (2021)

work page 2021

[16] [16]

Hoshen, J., Kopelman, R.: Percolation and cluster distribution. i. cluster multiple labeling technique and critical concentration algorithm. Physical Review B14(8), 3438 (1976) 14 Y. Cheon et al

work page 1976

[17] [17]

In: CVPR (2022)

Hoyer, L., Dai, D., Van Gool, L.: Daformer: Improving network architectures and training strategies for domain-adaptive semantic segmentation. In: CVPR (2022)

work page 2022

[18] [18]

In: ECCV (2022)

Hoyer, L., Dai, D., Van Gool, L.: Hrda: Context-aware high-resolution domain- adaptive semantic segmentation. In: ECCV (2022)

work page 2022

[19] Hoyer, L., Dai, D., Wang, H., Van Gool, L.: Mic: Masked image consistency for context-enhanced domain adaptation. In: CVPR (2023)

[20] Jiang, Z., Li, Y., Yang, C., Gao, P., Wang, Y., Tai, Y., Wang, C.: Prototypical contrast adaptation for domain adaptive semantic segmentation. In: ECCV (2022)

[21] Kang, G., Wei, Y., Yang, Y., Zhuang, Y., Hauptmann, A.: Pixel-level cycle association: A new perspective for domain adaptive semantic segmentation. NeurIPS (2020)

[22] Kim, M., Byun, H.: Learning texture invariant representation for domain adaptation of semantic segmentation. In: CVPR (2020)

[23] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: ICCV (2023)

[24] Kweon, H., Kim, J., Yoon, K.J.: Weakly supervised point cloud semantic segmentation via artificial oracle. In: CVPR (2024)

[25] Li, G., Kang, G., Liu, W., Wei, Y., Yang, Y.: Content-consistent matching for domain adaptive semantic segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) ECCV (2020)

[26] Lin, Y., Li, H., Shao, W., Yang, Z., Zhao, J., He, X., Luo, P., Zhang, K.: Samrefiner: Taming segment anything model for universal mask refinement (2025), https://arxiv.org/abs/2502.06756, accessed 3 May 2026

[27] Liu, C., Balaji, B., Hossain, S., Thomas, C., Lai, K.H., Vemulapalli, R., Wong, A., Rambhatla, S.: Langda: Building context-awareness via language for domain adaptive semantic segmentation (2025), https://arxiv.org/abs/2503.12780, accessed 3 May 2026

[28] Liu, X., Wu, J., Lu, T., Zhang, S., Wang, G.: Srpl-sfda: Sam-guided reliable pseudo-labels for source-free domain adaptation in medical image segmentation. Neurocomputing p. 130749 (2025)

[29] Mata, C., Ranasinghe, K., Ryoo, M.S.: Copt: Unsupervised domain adaptive segmentation using domain-agnostic text embeddings. In: ECCV (2024)

[30] McCormac, J., Handa, A., Davison, A., Leutenegger, S.: Semanticfusion: Dense 3d semantic mapping with convolutional neural networks. In: ICRA (2017)

[31] Melas-Kyriazi, L., Manrai, A.K.: Pixmatch: Unsupervised domain adaptation via pixelwise consistency training. In: CVPR (2021)

[32] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: Dinov2: Learning robust visual features without su...

[33] Paul, S., Tsai, Y.H., Schulter, S., Roy-Chowdhury, A.K., Chandraker, M.: Domain adaptive semantic segmentation using weak labels. In: ECCV (2020)

[34] Peng, X., Chen, R., Qiao, F., Kong, L., Liu, Y., Wang, T., Zhu, X., Ma, Y.: Sam-guided unsupervised domain adaptation for 3d segmentation (2023)

[35] Qin, M., Li, W., Zhou, J., Wang, H., Pfister, H.: Langsplat: 3d language gaussian splatting. In: CVPR (2024)

[36] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021)

[37] Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: Ground truth from computer games. In: ECCV. Springer (2016)

[38] Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: CVPR (2016)

[39] Sakaridis, C., Dai, D., Van Gool, L.: Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: ICCV (2021)

[40] Sikdar, A., Kishor, A., Kadam, I., Sundaram, S.: Picazo: Pixel-aligned contrastive learning for zero-shot domain adaptation. In: CVPR Workshops (2025)

[41] Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)

[42] Subhani, M.N., Ali, M.: Learning from scale-invariant examples for domain adaptation in semantic segmentation. In: ECCV (2020)

[43] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS (2017)

[44] Toldo, M., Maracani, A., Michieli, U., Zanuttigh, P.: Unsupervised domain adaptation in semantic segmentation: a review. Technologies 8(2), 35 (2020)

[45] Tranheden, W., Olsson, V., Pinto, J., Svensson, L.: Dacs: Domain adaptation via cross-domain mixed sampling. In: WACV (2021)

[46] Vu, T.H., Jain, H., Bucher, M., Cord, M., Pérez, P.: Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation. In: CVPR (2019)

[47] Wang, Q., Fink, O., Van Gool, L., Dai, D.: Continual test-time domain adaptation. In: CVPR (2022)

[48] Wang, W., Zhou, T., Yu, F., Dai, J., Konukoglu, E., Van Gool, L.: Exploring cross-image pixel contrast for semantic segmentation. In: ICCV (2021)

[49] Wang, Y., Peng, J., Zhang, Z.: Uncertainty-aware pseudo label refinery for domain adaptive semantic segmentation. In: ICCV (2021)

[50] Wang, Z., Yu, M., Wei, Y., Feris, R., Xiong, J., Hwu, W.m., Huang, T.S., Shi, H.: Differential treatment for stuff and things: A simple unsupervised domain adaptation method for semantic segmentation. In: CVPR (2020)

[51] Wu, Y., Xing, M., Zhang, Y., Xie, Y., Qu, Y.: Clip2uda: Making frozen clip reward unsupervised domain adaptation in 3d semantic segmentation. In: ACM Multimedia (2024)

[52] Yan, W., Qian, Y., Zhuang, H., Wang, C., Yang, M.: Sam4udass: When sam meets unsupervised domain adaptive semantic segmentation in intelligent vehicles. Transactions on Intelligent Vehicles 9(2), 3396–3408 (2024). https://doi.org/10.1109/TIV.2023.3344754, accessed 3 May 2026

[53] Yang, J., Peng, X., Wang, K., Zhu, Z., Feng, J., Xie, L., You, Y.: Divide to adapt: Mitigating confirmation bias for domain adaptation of black-box predictors (2022), https://arxiv.org/abs/2205.14467, accessed 3 May 2026

[54] Yang, S., Tian, Z., Jiang, L., Jia, J.: Unified language-driven zero-shot domain adaptation. In: CVPR (2024)

[55] Yang, Y., Soatto, S.: Fda: Fourier domain adaptation for semantic segmentation. In: CVPR (2020)

[56] Zhang, P., Zhang, B., Zhang, T., Chen, D., Wang, Y., Wen, F.: Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation. In: CVPR (2021)

[57] Zhao, X., Mithun, N.C., Rajvanshi, A., Chiu, H.P., Samarasekera, S.: Unsupervised domain adaptation for semantic segmentation with pseudo label self-refinement. In: WACV (2024)