pith · machine review for the scientific record

arXiv: 2605.00906 · v1 · submitted 2026-04-29 · 💻 cs.CV · cs.AI · cs.LG


Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models


Pith reviewed 2026-05-09 20:14 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords generalized category discovery · domain shifts · feature disentanglement · prompt tuning · vision-language models · mutual information · foundation models

The pith

Adapting foundation models via feature disentanglement enables generalized category discovery under domain shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends generalized category discovery to scenarios where unlabelled data exhibits domain shifts in addition to semantic shifts from known classes. It develops three frameworks that adapt different foundation models by disentangling domain-specific and semantic features. This matters because real-world unlabelled data frequently spans multiple domains, causing standard single-domain methods to underperform. The approaches demonstrate that the shared design principles of disentanglement and prompt tuning transfer across vision and vision-language backbones.

Core claim

Generalized Category Discovery under domain shifts is addressed by three frameworks: HiLo, which adapts self-supervised vision models through multi-level feature extraction and mutual information minimization, combined with PatchMix augmentation and curriculum sampling; HLPrompt, which extends HiLo with semantic-aware spatial prompt tuning; and VLPrompt, which adapts vision-language models using factorized textual prompts and cross-modal consistency regularization. All three yield consistent improvements over strong baselines in experiments on synthetic corruptions and real-world multi-domain shifts.
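Curriculum sampling here means weighting training examples by difficulty as training progresses. The schedule below is a toy sketch, not the paper's: the linear anneal, the `5.0` sharpness constant, and the per-sample difficulty score are all illustrative assumptions.

```python
import numpy as np

def curriculum_weights(difficulty, epoch, total_epochs):
    """Toy curriculum sampler: favour easy (low-difficulty) samples early,
    then anneal toward uniform sampling. The linear schedule and the
    scalar difficulty score are illustrative stand-ins, not the paper's
    exact curriculum."""
    progress = epoch / max(total_epochs - 1, 1)   # 0 at start, 1 at the end
    sharpness = 5.0 * (1.0 - progress)            # decays to 0 (= uniform)
    w = np.exp(-sharpness * difficulty)
    return w / w.sum()                            # normalized sampling weights

difficulty = np.array([0.1, 0.5, 0.9])
early = curriculum_weights(difficulty, epoch=0, total_epochs=10)
late = curriculum_weights(difficulty, epoch=9, total_epochs=10)
print(early[0] > early[2])   # easy samples dominate early
print(np.allclose(late, 1 / 3))  # uniform by the final epoch
```

Such a sampler would feed the contrastive batches in a HiLo-style pipeline; the point is only that the sampling distribution is a function of difficulty and training progress.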

What carries the argument

The HiLo disentanglement process extracts features at multiple levels and minimizes mutual information between them to separate domain noise from semantic content.
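Exact mutual information between feature partitions is intractable, so methods in this family minimize an estimate or bound of it. As a toy stand-in, the sketch below penalizes the cross-covariance between the semantic and domain parts of a representation; zero cross-covariance is a necessary but not sufficient condition for independence, and this is not the estimator the paper uses.

```python
import numpy as np

def cross_cov_penalty(sem, dom):
    """Squared Frobenius norm of the cross-covariance between semantic
    and domain feature blocks. A cheap stand-in for mutual information
    minimization: driving this to zero decorrelates the two blocks."""
    sem = sem - sem.mean(axis=0, keepdims=True)
    dom = dom - dom.mean(axis=0, keepdims=True)
    c = sem.T @ dom / len(sem)        # (d_sem, d_dom) cross-covariance
    return float(np.sum(c ** 2))

rng = np.random.default_rng(0)
sem = rng.normal(size=(256, 4))
independent_dom = rng.normal(size=(256, 4))
entangled_dom = sem + 0.1 * rng.normal(size=(256, 4))
print(cross_cov_penalty(sem, independent_dom)
      < cross_cov_penalty(sem, entangled_dom))  # entangled blocks score higher
```

In training, a penalty of this shape would be added to the GCD objective so the domain branch cannot carry (linearly detectable) semantic signal, and vice versa.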

If this is right

  • Vision and vision-language foundation models can be effectively adapted for GCD tasks involving domain shifts using shared principles.
  • Multi-level feature disentanglement with mutual information minimization reliably improves performance on shifted data.
  • Semantic-aware prompt tuning suppresses domain and background noise to aid discovery.
  • Cross-modal regularization in vision-language models enhances consistency for category discovery.
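On the last point, one common form of cross-modal consistency regularization pushes the vision head's class distribution toward the distribution implied by image-text similarity. The sketch below uses a symmetric (Jensen-Shannon) divergence and illustrative names; the paper's exact loss may differ.

```python
import numpy as np

def softmax(x, t=0.07):
    """Temperature-scaled softmax over the last axis."""
    x = x / t
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def cross_modal_consistency(img_emb, txt_emb, vis_logits):
    """Toy cross-modal consistency term: Jensen-Shannon divergence between
    the vision head's class distribution and the distribution implied by
    normalized image-text similarity. Names, temperature, and the JS
    choice are illustrative assumptions, not the paper's exact loss."""
    img = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    p_txt = softmax(img @ txt.T)         # distribution from image-text similarity
    p_vis = softmax(vis_logits, t=1.0)   # distribution from the vision head
    m = 0.5 * (p_txt + p_vis)
    kl = lambda p, q: np.sum(p * np.log((p + 1e-9) / (q + 1e-9)), axis=-1)
    return float(np.mean(0.5 * kl(p_txt, m) + 0.5 * kl(p_vis, m)))
```

The term is zero exactly when the two modalities agree on every sample, so minimizing it during prompt tuning ties the vision head to the text prototypes.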

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar disentanglement strategies could be applied to other computer vision tasks affected by domain shifts, such as object detection in varying environments.
  • Exploring the combination of these methods with emerging foundation models might yield further gains in robustness.
  • Validating on datasets with more severe or novel domain shifts would test the limits of the approach.

Load-bearing premise

That multi-level feature disentanglement via mutual information minimization and semantic-aware prompt tuning can reliably separate domain noise from semantic content without discarding discriminative information needed for category discovery.

What would settle it

Experiments on a new set of domain-shifted datasets where the proposed methods show no advantage or underperform compared to standard GCD baselines without the disentanglement components would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.00906 by Hongjun Wang, Kai Han, Po Hu.

Figure 1. We study Generalized Category Discovery with domain shifts, where a model must categorize unlabelled instances that may come from both seen and unseen categories, and also from seen and novel domains. In the example, the model is given labels only for the images in green boxes and must categorize all remaining unlabelled images, including those from different domains (top rows) and novel categories (rightm…
Figure 2. Overview of the HiLo framework (Section IV). Samples are drawn through our proposed curriculum sampling approach, considering the difficulty of each sample. Labelled and unlabelled samples are paired and augmented through PatchMix, which we subtly adapt in the embedding space for contrastive learning for GCD. The mixed-up embeddings are then processed by our network with a high-level (for semantic) and low-level…
Figure 3. Overview of HLPrompt and VLPrompt. We summarize both the pure-vision path (HLPrompt; building on HiLo) and the vision-language path (VLPrompt) in this figure. The core HiLo components shared by both paths are detailed in the full figure.
Original abstract

Generalized Category Discovery (GCD) aims to categorize unlabelled instances from both known and unknown classes by transferring knowledge from labelled data of known classes. Existing methods assume all data comes from a single domain, yet real-world unlabelled data often exhibits domain shifts alongside semantic shifts. We study GCD under domain shifts and propose three frameworks that adapt foundation models, ranging from self-supervised vision models to vision-language models. (i) HiLo disentangles domain and semantic features through multi-level feature extraction and mutual information minimization, combined with PatchMix augmentation and curriculum sampling. (ii) HLPrompt extends HiLo with semantic-aware spatial prompt tuning to suppress background and domain noise. (iii) VLPrompt leverages vision-language models via factorized textual prompts and cross-modal consistency regularization. The three methods share core design principles while operating on different foundation backbones, making them suitable for different deployment scenarios. Extensive experiments on synthetic corruptions and real-world multi-domain shifts demonstrate consistent improvements over strong baselines. Project page: https://visual-ai.github.io/hilo/
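The Figure 2 caption notes that PatchMix is adapted in the embedding space: patch tokens from two images are mixed, and the effective mixing coefficient then interpolates the learning targets. A minimal sketch under that reading, with an assumed random-subset mixing rule and illustrative names:

```python
import numpy as np

def patchmix_embeddings(tokens_a, tokens_b, ratio, rng):
    """Embedding-space PatchMix sketch: replace a random subset of image
    A's patch tokens with image B's, returning the mixed tokens and the
    effective coefficient lam (the fraction still from A), which would
    interpolate the contrastive targets. A simplified reading of the
    paper's variant; the signature and mixing rule are illustrative."""
    n = tokens_a.shape[0]                       # number of patch tokens
    k = int(round(ratio * n))                   # tokens to take from B
    idx = rng.choice(n, size=k, replace=False)
    mixed = tokens_a.copy()
    mixed[idx] = tokens_b[idx]
    lam = 1.0 - k / n
    return mixed, lam

rng = np.random.default_rng(0)
a = rng.normal(size=(196, 384))   # e.g. ViT patch tokens for image A
b = rng.normal(size=(196, 384))   # patch tokens for image B
mixed, lam = patchmix_embeddings(a, b, ratio=0.25, rng=rng)
print(mixed.shape, lam)           # same token grid, lam = 0.75
```

A contrastive loss on `mixed` would then weight the targets for A and B by `lam` and `1 - lam`, in the usual mixup style.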

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper studies Generalized Category Discovery (GCD) under domain shifts and introduces three frameworks adapting foundation models: HiLo, which disentangles domain and semantic features using multi-level extraction and mutual information minimization along with PatchMix augmentation and curriculum sampling; HLPrompt, extending HiLo with semantic-aware spatial prompt tuning; and VLPrompt, leveraging vision-language models through factorized textual prompts and cross-modal consistency regularization. The authors claim that these methods, sharing core design principles, achieve consistent improvements over strong baselines in experiments involving synthetic corruptions and real-world multi-domain shifts.

Significance. If the results hold, this work would be significant for extending GCD to realistic scenarios with both semantic and domain shifts. The provision of adaptable frameworks for different backbones (vision to VLM) is a positive aspect, allowing flexibility in deployment. The focus on disentanglement and prompt-based approaches could inspire further research in robust category discovery.

major comments (2)
  1. [Abstract] The claim of 'consistent improvements over strong baselines' is not accompanied by any quantitative metrics, specific baseline comparisons, or ablation studies, which is critical for substantiating the effectiveness of the proposed disentanglement and regularization techniques.
  2. [§3 (HiLo framework)] The mutual information minimization for multi-level feature disentanglement is load-bearing for the claim that domain noise is separated from semantic content without loss of discriminative signals for novel classes. However, no ablations isolate this component's contribution, and no metrics (e.g., mutual information with ground-truth labels) are mentioned to verify preservation of semantic information, particularly when domain shifts may correlate with class boundaries.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify how to better substantiate our claims. We provide point-by-point responses to the major comments below and outline the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim of 'consistent improvements over strong baselines' is not accompanied by any quantitative metrics, specific baseline comparisons, or ablation studies, which is critical for substantiating the effectiveness of the proposed disentanglement and regularization techniques.

    Authors: We agree that the abstract would benefit from greater specificity to support the high-level claim. The current abstract is intentionally concise, with all quantitative results, baseline comparisons, and ablations reserved for Section 4 and the associated tables/figures. In the revised manuscript we will update the abstract to briefly reference the scale of improvements (e.g., consistent gains across synthetic and real multi-domain benchmarks) while directing readers to the detailed experimental evidence. This change preserves abstract length while addressing the concern. revision: yes

  2. Referee: [§3 (HiLo framework)] The mutual information minimization for multi-level feature disentanglement is load-bearing for the claim that domain noise is separated from semantic content without loss of discriminative signals for novel classes. However, no ablations isolate this component's contribution, and no metrics (e.g., mutual information with ground-truth labels) are mentioned to verify preservation of semantic information, particularly when domain shifts may correlate with class boundaries.

    Authors: We concur that an isolated ablation of the mutual information (MI) minimization term and supporting disentanglement metrics would strengthen the paper. The manuscript currently ablates the full HiLo pipeline and auxiliary components (PatchMix, curriculum sampling), but does not isolate MI removal. We will add a targeted ablation that removes only the MI term and reports the resulting drop in GCD accuracy for both known and novel classes. We will also include quantitative verification metrics, such as estimated MI between learned features and domain labels (expected reduction) versus class labels (expected preservation). Finally, we will add a short analysis discussing the case where domain shifts correlate with class boundaries and its implications for the method. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method proposals with experimental validation

Full rationale

The paper proposes three new frameworks (HiLo, HLPrompt, VLPrompt) for GCD under domain shifts, describing their components (multi-level feature extraction with MI minimization, PatchMix, prompt tuning, cross-modal regularization) and supporting performance claims solely via experiments on synthetic corruptions and real multi-domain data. No equations, predictions, or first-principles results are presented that reduce by construction to fitted parameters, self-definitions, or self-citation chains; improvements are reported as observed outcomes, not quantities defined from the inputs. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract supplies no explicit free parameters or invented entities; typical ML loss-balancing weights and prompt hyperparameters are implied but unspecified. The core assumption is that foundation-model features can be disentangled along domain versus semantic axes.

axioms (1)
  • Domain assumption: Foundation model features contain separable domain-style and semantic-content components that can be isolated by mutual information minimization and prompt tuning.
    Invoked as the basis for HiLo and the prompt-based methods.

pith-pipeline@v0.9.0 · 5485 in / 1192 out tokens · 68750 ms · 2026-05-09T20:14:31.121152+00:00 · methodology

