arxiv: 2605.03365 · v1 · submitted 2026-05-05 · 💻 cs.CV

Dual-Foundation Models for Unsupervised Domain Adaptation

Pith reviewed 2026-05-08 01:29 UTC · model grok-4.3

classification 💻 cs.CV
keywords unsupervised domain adaptation · semantic segmentation · foundation models · SAM · DINOv3 · superpixel prompting · class prototypes · synthetic-to-real

The pith

Combining SAM with superpixel prompting and DINOv3 for prototypes improves unsupervised domain adaptation for semantic segmentation by addressing limits in pixel coverage and prototype stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that unsupervised domain adaptation for semantic segmentation can be strengthened by drawing on two foundation models instead of relying solely on high-confidence predictions or source-derived prototypes. It uses the Segment Anything Model prompted via superpixels to supervise a wider set of target pixels and DINOv3 to generate stable class prototypes that do not inherit domain bias. This matters to a reader because pixel-wise labeling of real images is costly, while synthetic data is plentiful, yet the domain gap has limited how well models transfer. If successful, the method would let practitioners achieve higher accuracy on real data with less manual effort. The approach is tested through experiments on two common synthetic-to-real benchmarks.

Core claim

We propose a dual-foundation UDA framework that leverages two complementary foundation models. First, we employ the Segment Anything Model (SAM) with superpixel-guided prompting to enable learning from a broader range of target pixels beyond high-confidence predictions. Second, we incorporate DINOv3 to construct stable, domain-invariant class prototypes through its robust representation learning.
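A minimal sketch of what a class prototype can look like in practice, assuming it is the L2-normalized mean of per-pixel feature vectors for each class. The function names, the `ignore_label` value, and the use of plain Python lists are illustrative assumptions; the paper's actual DINOv3 feature extraction and prototype construction are not described in this summary.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def class_prototypes(features, labels, ignore_label=255):
    """Average the feature vectors of each class, then L2-normalize the mean.

    features: list of per-pixel feature vectors (assumed to come from a
              frozen encoder; the encoder itself is outside this sketch).
    labels:   list of class ids, one per feature vector.
    """
    sums = {}
    counts = defaultdict(int)
    for feat, lab in zip(features, labels):
        if lab == ignore_label:
            continue  # skip unlabeled pixels
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
        sums[lab] = [s + f for s, f in zip(sums[lab], feat)]
        counts[lab] += 1
    return {
        lab: l2_normalize([s / counts[lab] for s in vec])
        for lab, vec in sums.items()
    }


def nearest_prototype(feature, prototypes):
    """Return the class whose prototype has the highest cosine similarity."""
    f = l2_normalize(feature)
    return max(
        prototypes,
        key=lambda lab: sum(a * b for a, b in zip(f, prototypes[lab])),
    )
```

With prototypes held fixed as anchors, student features can be pulled toward the prototype of their (pseudo-)label; the specific alignment loss is left out here because the summary does not state it.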

What carries the argument

The dual-foundation UDA framework that pairs SAM superpixel-guided prompting for expanded target pixel supervision with DINOv3-derived domain-invariant class prototypes.
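One plausible reading of superpixel-guided prompting is that each superpixel contributes a point prompt for the segmentation model. The sketch below turns a superpixel label map into one centroid point per superpixel. The centroid choice, the `(x, y)` ordering, and the `min_pixels` filter are assumptions made for illustration, not the paper's stated procedure, and the call into SAM itself is deliberately left out.

```python
from collections import defaultdict


def superpixel_point_prompts(superpixels, min_pixels=1):
    """Compute one (x, y) centroid per superpixel.

    superpixels: 2D list (H x W) of integer superpixel ids, e.g. the output
                 of any superpixel algorithm.
    min_pixels:  hypothetical filter that drops very small superpixels.

    Note: for a strongly non-convex superpixel the centroid can fall outside
    the region; a real pipeline may need to snap it to the nearest member pixel.
    """
    acc = defaultdict(lambda: [0, 0, 0])  # id -> [sum_x, sum_y, pixel_count]
    for y, row in enumerate(superpixels):
        for x, sp in enumerate(row):
            entry = acc[sp]
            entry[0] += x
            entry[1] += y
            entry[2] += 1
    prompts = {}
    for sp, (sum_x, sum_y, count) in acc.items():
        if count >= min_pixels:
            prompts[sp] = (sum_x / count, sum_y / count)
    return prompts
```

Each returned point would then be passed to the mask generator as a foreground prompt, giving one candidate mask per superpixel before any filtering.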

Load-bearing premise

The method rests on the assumption that SAM prompted by superpixels can provide reliable guidance for learning on low-confidence target pixels and that DINOv3 features produce class prototypes that remain unbiased across the source and target domains without additional tuning.

What would settle it

An ablation that disables SAM's superpixel-guided prompting, or replaces the DINOv3 prototypes with source-initialized ones, and finds no drop in mIoU on the GTA-to-Cityscapes task would show that these components are not responsible for the gains.

Figures

Figures reproduced from arXiv: 2605.03365 by Aruna Balasubramanian, Francois Rameau, Yerin Cheon.

Figure 1. Overview of our framework. (Blue) Source–Target Distillation: An online self-training scheme reduces the source–target domain gap using EMA-updated teacher predictions. (Yellow) Pseudo-Label Refinement: Superpixel-prompted SAM masks are filtered, and each mask region is assigned a high-confidence, low-entropy pseudo-label. (Sec. 3.2 & Sec. 3.3) (Pink) Feature Alignment: Student features are projected into…
Figure 2. Comparison of SAM mask generation strategies. (a) Superpixel-based SAM prompting (Sec. 3.3). (b) SAM automatic mask generation. (c) Our superpixel-guided and filtered masks. The proposed method yields more meaningful and structurally coherent masks than the SAM auto mask generator. … refinement. Our approach generates a compact and structurally coherent set of masks that is better suited for downstream sema…
Figure 3. Comparison of our method with the state-of-the-art baseline. Compared to MIC, our method produces improved segmentation for challenging classes such as traffic sign and terrain. In addition, influenced by SAM-based Pseudo Label refinement, fine structures such as bicycle wheels are more completely filled, closely matching the ground truth.
Figure 4. Ablation of DINO-based prototype alignment and Superpixel-based SAM. (a) Original image. (b) Without SAM and DINO. (c) Only DINO-based prototype alignment. (d) With both SAM and DINO. DINO improves segmentation in hard-to-distinguish regions via feature alignment, while SAM enhances boundary-aware object prediction. … and confidence-based pseudo-labeling with threshold τ = 0.968. In addition, we introduce a …
Figure 5. SAM Mask Comparison. Comparison between SAM AutoMask and our superpixel-based SAM with overlap-aware filtering.
Figure 6. t-SNE visualization on Cityscapes val set. Compared with the baseline without SAM and DINO, ours forms more compact target-feature clusters around DINOv3 prototype anchors (black crosses), indicating improved prototype-guided alignment. … and cleaner supervision to higher-performing baselines. Finally, we observe diminishing absolute performance gains when applying our method to stronger UDA baselines, a …
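The Figure 1 caption says each SAM mask region is assigned a high-confidence, low-entropy pseudo-label. A rough sketch of such a rule follows, assuming the teacher's class probabilities are averaged over the mask and the mask is kept only if the mean passes both tests. The default `conf_thresh` reuses the τ = 0.968 quoted in the Figure 4 text, though whether the paper applies that threshold at mask level is an assumption, and `max_entropy` is a purely hypothetical value.

```python
import math


def mask_pseudo_label(probs, mask, conf_thresh=0.968, max_entropy=0.5):
    """Assign one class to a mask region, or None if the region is rejected.

    probs: list of per-pixel class-probability lists (teacher softmax output).
    mask:  list of booleans, True where the pixel belongs to the mask.
    """
    inside = [p for p, m in zip(probs, mask) if m]
    if not inside:
        return None  # empty mask, nothing to label
    num_classes = len(inside[0])
    # Mean class distribution over all pixels covered by the mask.
    mean = [sum(p[c] for p in inside) / len(inside) for c in range(num_classes)]
    confidence = max(mean)
    entropy = -sum(q * math.log(q) for q in mean if q > 0)
    if confidence >= conf_thresh and entropy <= max_entropy:
        return mean.index(confidence)
    return None
```

Because the label is decided per region rather than per pixel, individually uncertain pixels inside an accepted mask still receive supervision, which is the coverage benefit the page attributes to the SAM branch.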
Original abstract

Semantic segmentation provides pixel-level scene understanding essential for autonomous driving and fine-grained perception tasks. However, training segmentation models requires costly, labor-intensive annotations on real-world datasets. Unsupervised Domain Adaptation (UDA) addresses this by training models on labeled synthetic data and adapting them to unlabeled real images. While conceptually simple, adaptation is challenging due to the domain gap, i.e., differences in visual appearance and scene structure between synthetic and real data. Prior approaches bridge this gap through pixel-level mixing or feature-level contrastive learning. Yet, these techniques suffer from two major limitations: (1) reliance on high-confidence pseudo-labels restricts learning to a subset of the target domain, and (2) prototype-based contrastive methods initialize class prototypes from source-trained models, yielding biased and unstable anchors during adaptation. To address these issues, we propose a dual-foundation UDA framework that leverages two complementary foundation models. First, we employ the Segment Anything Model (SAM) with superpixel-guided prompting to enable learning from a broader range of target pixels beyond high-confidence predictions. Second, we incorporate DINOv3 to construct stable, domain-invariant class prototypes through its robust representation learning. Our method achieves consistent improvements of +1.3% and +1.4% mIoU over strong UDA baselines on GTA-to-Cityscapes and SYNTHIA-to-Cityscapes, respectively.
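For readers unfamiliar with the metric behind the reported gains, here is a small sketch of mean intersection-over-union (mIoU) on flattened label arrays. The `ignore_label` value of 255 and the choice to skip classes absent from both prediction and ground truth are common conventions assumed here, not details taken from the paper; benchmark scores are also normally accumulated over the whole validation set rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred, target: flat lists of class ids of equal length.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # pixels without ground truth do not count
        if p == t:
            intersection[t] += 1  # true positive for class t
            union[t] += 1
        else:
            union[t] += 1  # false negative for the true class
            if 0 <= p < num_classes:
                union[p] += 1  # false positive for the predicted class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2)` gives an IoU of 1/2 for each class and therefore an mIoU of 0.5.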

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes a dual-foundation model framework for unsupervised domain adaptation (UDA) in semantic segmentation. It employs the Segment Anything Model (SAM) with superpixel-guided prompting to expand pseudo-label learning beyond high-confidence target pixels and incorporates DINOv3 to derive stable, domain-invariant class prototypes for contrastive learning. The approach reports consistent mIoU gains of +1.3% on GTA-to-Cityscapes and +1.4% on SYNTHIA-to-Cityscapes over strong UDA baselines.

Significance. If the empirical gains hold under rigorous validation and the DINOv3 component is shown to deliver genuinely less biased prototypes than source-derived alternatives, the work provides a practical template for leveraging complementary foundation models to mitigate two persistent UDA limitations. The modest but positive improvements indicate incremental utility for driving-scene segmentation, with potential to influence subsequent foundation-model-assisted adaptation research provided ablations confirm complementarity of the two pillars.

major comments (2)
  1. §3.2 (DINOv3 prototype construction): The central claim that DINOv3 yields 'stable, domain-invariant class prototypes' without any target-domain adaptation, fine-tuning, or explicit alignment step is load-bearing for the dual-framework contribution. If residual domain shift persists in the DINOv3 embedding space for classes such as vehicle or pedestrian, the resulting anchors remain biased in the same manner as source-initialized prototypes, reducing the method to the SAM superpixel component alone. The manuscript should supply either quantitative invariance metrics (e.g., prototype drift across domains) or an ablation replacing DINOv3 with source-derived prototypes to substantiate the claim.
  2. Table 1 (quantitative results): The reported +1.3% and +1.4% mIoU improvements are presented without standard deviations, multiple random seeds, or statistical significance tests. In the absence of these, it is impossible to determine whether the gains exceed implementation variance or hyper-parameter sensitivity, weakening the assertion of 'consistent improvements' over strong baselines.
minor comments (3)
  1. Abstract: The phrase 'strong UDA baselines' should explicitly name the compared methods (e.g., DAFormer, HRDA) so readers can immediately gauge the strength of the reference points.
  2. Figure 1 (framework overview): The diagram would be clearer if arrows explicitly labeled the information flow from SAM superpixel prompts into the segmentation loss and from DINOv3 features into prototype computation.
  3. §4 (experimental protocol): The backbone architecture, training schedule, and hyper-parameter settings for both the segmentation network and the foundation-model components should be stated in a single consolidated table or paragraph for reproducibility.
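The invariance check requested in major comment 1 could be sketched as a per-class drift score between prototypes computed on each domain. Here drift is defined as one minus cosine similarity, which is one possible choice and an assumption of this sketch. Computing target-domain prototypes also requires target labels, so in practice this would be an analysis on an annotated validation split rather than part of training.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def prototype_drift(source_protos, target_protos):
    """Per-class drift (1 - cosine similarity) for classes present in both domains.

    source_protos, target_protos: dicts mapping class name -> prototype vector.
    A drift near 0 suggests the class embedding is stable across domains.
    """
    shared = sorted(set(source_protos) & set(target_protos))
    return {
        cls: 1.0 - cosine_similarity(source_protos[cls], target_protos[cls])
        for cls in shared
    }
```

Running the same function on prototypes from a source-trained encoder and from the frozen foundation encoder would give the side-by-side comparison the referee asks for.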

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. The feedback highlights important aspects for strengthening the empirical validation of our dual-foundation approach. We address each major comment below and commit to revisions that directly respond to the concerns raised.

  1. Referee: §3.2 (DINOv3 prototype construction): The central claim that DINOv3 yields 'stable, domain-invariant class prototypes' without any target-domain adaptation, fine-tuning, or explicit alignment step is load-bearing for the dual-framework contribution. If residual domain shift persists in the DINOv3 embedding space for classes such as vehicle or pedestrian, the resulting anchors remain biased in the same manner as source-initialized prototypes, reducing the method to the SAM superpixel component alone. The manuscript should supply either quantitative invariance metrics (e.g., prototype drift across domains) or an ablation replacing DINOv3 with source-derived prototypes to substantiate the claim.

    Authors: We agree that explicit validation of DINOv3's domain-invariance is necessary to support the dual-framework contribution. The manuscript motivates DINOv3 by its large-scale pretraining on diverse data, which we expect to yield more stable prototypes than source-only initialization. However, to directly address the concern, the revised version will add both an ablation replacing DINOv3 with source-derived prototypes and quantitative metrics (cosine similarity and drift between source/target embeddings for classes such as vehicle and pedestrian). These additions will clarify the incremental benefit of the DINOv3 component. revision: yes

  2. Referee: Table 1 (quantitative results): The reported +1.3% and +1.4% mIoU improvements are presented without standard deviations, multiple random seeds, or statistical significance tests. In the absence of these, it is impossible to determine whether the gains exceed implementation variance or hyper-parameter sensitivity, weakening the assertion of 'consistent improvements' over strong baselines.

    Authors: We acknowledge that the current Table 1 reports single-run results without variability measures or significance testing. The experiments were performed with a fixed seed for reproducibility. In the revision we will rerun all methods with at least three random seeds, report mean mIoU together with standard deviations, and add paired statistical significance tests to confirm that the observed gains exceed typical implementation variance. revision: yes
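The multi-seed reporting promised in the second response can be sketched as follows: a mean and sample standard deviation per method, plus a paired t statistic over per-seed differences. The function names are illustrative, and the numbers in the usage note are hypothetical placeholders, not results from the paper.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(baseline, method):
    """Paired t statistic for per-seed score differences (method - baseline).

    Both lists must be ordered by the same seeds. The statistic has
    len(baseline) - 1 degrees of freedom.
    """
    diffs = [m - b for b, m in zip(baseline, method)]
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))
```

With hypothetical scores `baseline = [70.0, 71.0, 72.0]` and `method = [71.0, 73.0, 75.0]`, the differences are 1, 2, and 3, so the statistic is 2 / (1 / √3) ≈ 3.46. Three seeds leave only two degrees of freedom, where the two-sided 5% critical value is about 4.30, so a small seed count can make even a visible gain hard to confirm.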

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on benchmarks

The paper introduces a dual-foundation UDA framework for semantic segmentation that combines SAM with superpixel-guided prompting and DINOv3-derived class prototypes. No equations, derivations, parameter fittings, or self-referential constructions appear in the abstract or described method. The reported gains (+1.3% and +1.4% mIoU on GTA-to-Cityscapes and SYNTHIA-to-Cityscapes) are presented as empirical outcomes rather than results forced by definition or prior self-citations. The premise that DINOv3 yields domain-invariant prototypes is an external modeling assumption, not a tautological reduction within the paper's own logic, leaving the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new mathematical parameters, axioms, or invented entities are introduced in the abstract; the approach relies on off-the-shelf foundation models whose internal assumptions are inherited from prior work.


