pith · machine review for the scientific record

arXiv: 2605.00906 · v1 · submitted 2026-04-29 · 💻 cs.CV · cs.AI · cs.LG


Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models


Pith reviewed 2026-05-09 20:14 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords generalized category discovery · domain shifts · feature disentanglement · prompt tuning · vision-language models · mutual information · foundation models

The pith

Adapting foundation models via feature disentanglement enables generalized category discovery under domain shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends generalized category discovery to scenarios where unlabelled data exhibits domain shifts in addition to semantic shifts from known classes. It develops three frameworks that adapt different foundation models by disentangling domain-specific and semantic features. This matters because real-world unlabelled data frequently spans multiple domains, causing standard single-domain methods to underperform. The approaches demonstrate that the shared design principles of disentanglement and prompt tuning transfer across vision and vision-language backbones.

Core claim

Generalized Category Discovery under domain shifts is addressed by three frameworks: HiLo, which adapts self-supervised vision models through multi-level feature extraction and mutual information minimization, combined with PatchMix augmentation and curriculum sampling; HLPrompt, which extends HiLo with semantic-aware spatial prompt tuning; and VLPrompt, which adapts vision-language models using factorized textual prompts and cross-modal consistency regularization. All three yield consistent improvements over strong baselines in experiments on synthetic corruptions and real-world multi-domain shifts.
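Curriculum sampling here means weighting training examples by difficulty as training progresses. The schedule below is a toy sketch, not the paper's: the linear anneal, the `5.0` sharpness constant, and the per-sample difficulty score are all illustrative assumptions.

```python
import numpy as np

def curriculum_weights(difficulty, epoch, total_epochs):
    """Toy curriculum sampler: favour easy (low-difficulty) samples early,
    then anneal toward uniform sampling. The linear schedule and the
    scalar difficulty score are illustrative stand-ins, not the paper's
    exact curriculum."""
    progress = epoch / max(total_epochs - 1, 1)   # 0 at start, 1 at the end
    sharpness = 5.0 * (1.0 - progress)            # decays to 0 (= uniform)
    w = np.exp(-sharpness * difficulty)
    return w / w.sum()                            # normalized sampling weights

difficulty = np.array([0.1, 0.5, 0.9])
early = curriculum_weights(difficulty, epoch=0, total_epochs=10)
late = curriculum_weights(difficulty, epoch=9, total_epochs=10)
print(early[0] > early[2])   # easy samples dominate early
print(np.allclose(late, 1 / 3))  # uniform by the final epoch
```

Such a sampler would feed the contrastive batches in a HiLo-style pipeline; the point is only that the sampling distribution is a function of difficulty and training progress.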

What carries the argument

The HiLo disentanglement process extracts features at multiple levels and minimizes mutual information between them to separate domain noise from semantic content.
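Exact mutual information between feature partitions is intractable, so methods in this family minimize an estimate or bound of it. As a toy stand-in, the sketch below penalizes the cross-covariance between the semantic and domain parts of a representation; zero cross-covariance is a necessary but not sufficient condition for independence, and this is not the estimator the paper uses.

```python
import numpy as np

def cross_cov_penalty(sem, dom):
    """Squared Frobenius norm of the cross-covariance between semantic
    and domain feature blocks. A cheap stand-in for mutual information
    minimization: driving this to zero decorrelates the two blocks."""
    sem = sem - sem.mean(axis=0, keepdims=True)
    dom = dom - dom.mean(axis=0, keepdims=True)
    c = sem.T @ dom / len(sem)        # (d_sem, d_dom) cross-covariance
    return float(np.sum(c ** 2))

rng = np.random.default_rng(0)
sem = rng.normal(size=(256, 4))
independent_dom = rng.normal(size=(256, 4))
entangled_dom = sem + 0.1 * rng.normal(size=(256, 4))
print(cross_cov_penalty(sem, independent_dom)
      < cross_cov_penalty(sem, entangled_dom))  # entangled blocks score higher
```

In training, a penalty of this shape would be added to the GCD objective so the domain branch cannot carry (linearly detectable) semantic signal, and vice versa.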

If this is right

  • Vision and vision-language foundation models can be effectively adapted for GCD tasks involving domain shifts using shared principles.
  • Multi-level feature disentanglement with mutual information minimization reliably improves performance on shifted data.
  • Semantic-aware prompt tuning suppresses domain and background noise to aid discovery.
  • Cross-modal regularization in vision-language models enhances consistency for category discovery.
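On the last point, one common form of cross-modal consistency regularization pushes the vision head's class distribution toward the distribution implied by image-text similarity. The sketch below uses a symmetric (Jensen-Shannon) divergence and illustrative names; the paper's exact loss may differ.

```python
import numpy as np

def softmax(x, t=0.07):
    """Temperature-scaled softmax over the last axis."""
    x = x / t
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def cross_modal_consistency(img_emb, txt_emb, vis_logits):
    """Toy cross-modal consistency term: Jensen-Shannon divergence between
    the vision head's class distribution and the distribution implied by
    normalized image-text similarity. Names, temperature, and the JS
    choice are illustrative assumptions, not the paper's exact loss."""
    img = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    p_txt = softmax(img @ txt.T)         # distribution from image-text similarity
    p_vis = softmax(vis_logits, t=1.0)   # distribution from the vision head
    m = 0.5 * (p_txt + p_vis)
    kl = lambda p, q: np.sum(p * np.log((p + 1e-9) / (q + 1e-9)), axis=-1)
    return float(np.mean(0.5 * kl(p_txt, m) + 0.5 * kl(p_vis, m)))
```

The term is zero exactly when the two modalities agree on every sample, so minimizing it during prompt tuning ties the vision head to the text prototypes.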

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar disentanglement strategies could be applied to other computer vision tasks affected by domain shifts, such as object detection in varying environments.
  • Exploring the combination of these methods with emerging foundation models might yield further gains in robustness.
  • Validating on datasets with more severe or novel domain shifts would test the limits of the approach.

Load-bearing premise

That multi-level feature disentanglement via mutual information minimization and semantic-aware prompt tuning can reliably separate domain noise from semantic content without discarding discriminative information needed for category discovery.

What would settle it

Experiments on a new set of domain-shifted datasets where the proposed methods show no advantage or underperform compared to standard GCD baselines without the disentanglement components would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.00906 by Hongjun Wang, Kai Han, Po Hu.

Figure 1. We study Generalized Category Discovery with domain shifts, where a model must categorize unlabelled instances that may come from both seen and unseen categories, and also from seen and novel domains. In the example, the model is given labels only for the images in green boxes and must categorize all remaining unlabelled images, including those from different domains (top rows) and novel categories (rightm…
Figure 2. Overview of the HiLo framework (Section IV). Samples are drawn through our proposed curriculum sampling approach, considering the difficulty of each sample. Labelled and unlabelled samples are paired and augmented through PatchMix, which we subtly adapt in the embedding space for contrastive learning for GCD. The mixed-up embeddings are then processed by our network with a high-level (for semantic) and low-level…
Figure 3. Overview of HLPrompt and VLPrompt. We summarize both the pure-vision path (HLPrompt; building on HiLo) and the vision-language path (VLPrompt) in this figure. The core HiLo components shared by both paths are detailed in the full figure.
Original abstract

Generalized Category Discovery (GCD) aims to categorize unlabelled instances from both known and unknown classes by transferring knowledge from labelled data of known classes. Existing methods assume all data comes from a single domain, yet real-world unlabelled data often exhibits domain shifts alongside semantic shifts. We study GCD under domain shifts and propose three frameworks that adapt foundation models, ranging from self-supervised vision models to vision-language models. (i) HiLo disentangles domain and semantic features through multi-level feature extraction and mutual information minimization, combined with PatchMix augmentation and curriculum sampling. (ii) HLPrompt extends HiLo with semantic-aware spatial prompt tuning to suppress background and domain noise. (iii) VLPrompt leverages vision-language models via factorized textual prompts and cross-modal consistency regularization. The three methods share core design principles while operating on different foundation backbones, making them suitable for different deployment scenarios. Extensive experiments on synthetic corruptions and real-world multi-domain shifts demonstrate consistent improvements over strong baselines. Project page: https://visual-ai.github.io/hilo/
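The Figure 2 caption notes that PatchMix is adapted in the embedding space: patch tokens from two images are mixed, and the effective mixing coefficient then interpolates the learning targets. A minimal sketch under that reading, with an assumed random-subset mixing rule and illustrative names:

```python
import numpy as np

def patchmix_embeddings(tokens_a, tokens_b, ratio, rng):
    """Embedding-space PatchMix sketch: replace a random subset of image
    A's patch tokens with image B's, returning the mixed tokens and the
    effective coefficient lam (the fraction still from A), which would
    interpolate the contrastive targets. A simplified reading of the
    paper's variant; the signature and mixing rule are illustrative."""
    n = tokens_a.shape[0]                       # number of patch tokens
    k = int(round(ratio * n))                   # tokens to take from B
    idx = rng.choice(n, size=k, replace=False)
    mixed = tokens_a.copy()
    mixed[idx] = tokens_b[idx]
    lam = 1.0 - k / n
    return mixed, lam

rng = np.random.default_rng(0)
a = rng.normal(size=(196, 384))   # e.g. ViT patch tokens for image A
b = rng.normal(size=(196, 384))   # patch tokens for image B
mixed, lam = patchmix_embeddings(a, b, ratio=0.25, rng=rng)
print(mixed.shape, lam)           # same token grid, lam = 0.75
```

A contrastive loss on `mixed` would then weight the targets for A and B by `lam` and `1 - lam`, in the usual mixup style.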

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper studies Generalized Category Discovery (GCD) under domain shifts and introduces three frameworks adapting foundation models: HiLo, which disentangles domain and semantic features using multi-level extraction and mutual information minimization along with PatchMix augmentation and curriculum sampling; HLPrompt, extending HiLo with semantic-aware spatial prompt tuning; and VLPrompt, leveraging vision-language models through factorized textual prompts and cross-modal consistency regularization. The authors claim that these methods, sharing core design principles, achieve consistent improvements over strong baselines in experiments involving synthetic corruptions and real-world multi-domain shifts.

Significance. If the results hold, this work would be significant for extending GCD to realistic scenarios with both semantic and domain shifts. The provision of adaptable frameworks for different backbones (vision to VLM) is a positive aspect, allowing flexibility in deployment. The focus on disentanglement and prompt-based approaches could inspire further research in robust category discovery.

major comments (2)
  1. [Abstract] The claim of 'consistent improvements over strong baselines' is not accompanied by any quantitative metrics, specific baseline comparisons, or ablation studies, which is critical for substantiating the effectiveness of the proposed disentanglement and regularization techniques.
  2. [§3 (HiLo framework)] The mutual information minimization for multi-level feature disentanglement is load-bearing for the claim that domain noise is separated from semantic content without loss of discriminative signals for novel classes. However, no ablations isolate this component's contribution, and no metrics (e.g., mutual information with ground-truth labels) are mentioned to verify preservation of semantic information, particularly when domain shifts may correlate with class boundaries.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify how to better substantiate our claims. We provide point-by-point responses to the major comments below and outline the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim of 'consistent improvements over strong baselines' is not accompanied by any quantitative metrics, specific baseline comparisons, or ablation studies, which is critical for substantiating the effectiveness of the proposed disentanglement and regularization techniques.

    Authors: We agree that the abstract would benefit from greater specificity to support the high-level claim. The current abstract is intentionally concise, with all quantitative results, baseline comparisons, and ablations reserved for Section 4 and the associated tables/figures. In the revised manuscript we will update the abstract to briefly reference the scale of improvements (e.g., consistent gains across synthetic and real multi-domain benchmarks) while directing readers to the detailed experimental evidence. This change preserves abstract length while addressing the concern. revision: yes

  2. Referee: [§3 (HiLo framework)] The mutual information minimization for multi-level feature disentanglement is load-bearing for the claim that domain noise is separated from semantic content without loss of discriminative signals for novel classes. However, no ablations isolate this component's contribution, and no metrics (e.g., mutual information with ground-truth labels) are mentioned to verify preservation of semantic information, particularly when domain shifts may correlate with class boundaries.

    Authors: We concur that an isolated ablation of the mutual information (MI) minimization term and supporting disentanglement metrics would strengthen the paper. The manuscript currently ablates the full HiLo pipeline and auxiliary components (PatchMix, curriculum sampling), but does not isolate MI removal. We will add a targeted ablation that removes only the MI term and reports the resulting drop in GCD accuracy for both known and novel classes. We will also include quantitative verification metrics, such as estimated MI between learned features and domain labels (expected reduction) versus class labels (expected preservation). Finally, we will add a short analysis discussing the case where domain shifts correlate with class boundaries and its implications for the method. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method proposals with experimental validation

Full rationale

The paper proposes three new frameworks (HiLo, HLPrompt, VLPrompt) for GCD under domain shifts, describing their components (multi-level feature extraction with MI minimization, PatchMix, prompt tuning, cross-modal regularization) and supporting performance claims solely via experiments on synthetic corruptions and real multi-domain data. No equations, predictions, or first-principles results are presented that reduce by construction to fitted parameters, self-definitions, or self-citation chains; improvements are reported as observed outcomes, not quantities defined from the inputs. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract supplies no explicit free parameters or invented entities; typical ML loss-balancing weights and prompt hyperparameters are implied but unspecified. The core assumption is that foundation-model features can be disentangled along domain versus semantic axes.

axioms (1)
  • Domain assumption: Foundation model features contain separable domain-style and semantic-content components that can be isolated by mutual information minimization and prompt tuning.
    Invoked as the basis for HiLo and the prompt-based methods.

pith-pipeline@v0.9.0 · 5485 in / 1192 out tokens · 68750 ms · 2026-05-09T20:14:31.121152+00:00 · methodology

