Dual-Branch Cross-Projection Debiasing through Diffusion-based Disentanglement

De Cheng; Dongsheng Li; Lingfeng He; Nannan Wang; Xiangqian Zhao; Xinyang Jiang; Zhipeng Xu; Zilong Wang

arxiv: 2606.24161 · v1 · pith:INVXRWYQnew · submitted 2026-06-23 · 💻 cs.CV

Xiangqian Zhao, Xinyang Jiang, Zhipeng Xu, Lingfeng He, Zilong Wang, Dongsheng Li, De Cheng, Nannan Wang

Pith reviewed 2026-06-26 01:05 UTC · model grok-4.3

classification 💻 cs.CV

keywords: debiasing, foundation models, spurious correlations, diffusion models, prompt tuning, worst group accuracy, group unsupervised learning

The pith

Dual-branch cross-projection removes spurious features via diffusion-disentangled concepts without group labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Foundation models trained on biased data latch onto non-causal attributes and fail on minority groups when group labels are absent. The paper first uses diffusion models to disentangle concept representations and mine reliable spurious attributes through a confidence-guided process. It then applies a dual-branch prompt-tuning scheme that isolates target and spurious representations and removes the latter via cross null-space projection. This yields state-of-the-art worst-group accuracy on four benchmarks while updating at most 0.22 percent of model parameters.

Core claim

Confidence-guided Bias Concept Mining (CBCM) extracts semantically aligned spurious attributes from diffusion-disentangled representations without annotations; Dual-branch Cross-projection Debiasing (DCD) then separates target and spurious features into parallel branches and explicitly nulls spurious directions while preserving target semantics.

What carries the argument

Dual-branch Cross-projection Debiasing (DCD) paired with Confidence-guided Bias Concept Mining (CBCM), where diffusion disentanglement supplies pseudo-supervision and cross null-space projection removes spurious information across branches.
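
One plausible reading of "cross null-space projection" can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's implementation: the helper names (`orthonormal_basis`, `project_to_null_space`, `cross_project`), the use of explicit direction vectors per branch, and the toy numbers are all hypothetical. The sketch removes from each branch's feature every component that lies in the span of the other branch's directions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(directions, tol=1e-10):
    """Gram-Schmidt: orthonormal basis spanning the given direction vectors."""
    basis = []
    for d in directions:
        r = list(d)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:  # drop directions already covered by the basis
            basis.append([ri / norm for ri in r])
    return basis

def project_to_null_space(feature, directions):
    """Strip from `feature` every component inside span(directions)."""
    out = list(feature)
    for b in orthonormal_basis(directions):
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out

def cross_project(target_feat, spurious_feat, target_dirs, spurious_dirs):
    """Assumed 'cross' step: each branch is cleaned with the other branch's directions."""
    clean_target = project_to_null_space(target_feat, spurious_dirs)
    clean_spurious = project_to_null_space(spurious_feat, target_dirs)
    return clean_target, clean_spurious

# Toy example: axis 2 plays the role of a mined spurious direction.
target, spurious = cross_project(
    target_feat=[3.0, 4.0, 5.0],
    spurious_feat=[1.0, 0.0, 2.0],
    target_dirs=[[1.0, 0.0, 0.0]],
    spurious_dirs=[[0.0, 0.0, 2.0]],
)
print(target)    # component along axis 2 removed
print(spurious)  # component along axis 0 removed
```

The projection only removes what the supplied directions span, so its usefulness depends entirely on how well the mined directions match the real spurious attribute.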

If this is right

Group-unsupervised debiasing becomes feasible on a foundation model by tuning only a small set of prompt parameters (at most 0.22% of the model in the reported experiments).
Worst-group performance improves without explicit group or attribute labels at training time.
The same pipeline applies across multiple vision benchmarks with consistent gains among unsupervised methods.
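
Worst-group accuracy, the metric these implications hinge on, can be computed as in the sketch below. It assumes the usual convention that a group is a (target label, spurious attribute) pair and that group annotations are available at evaluation time only; the function name and the toy data are illustrative, not taken from the paper.

```python
from collections import defaultdict

def worst_group_accuracy(labels, spurious, preds):
    """Return (worst accuracy, per-group accuracy) over (label, spurious) groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, a, p in zip(labels, spurious, preds):
        group = (y, a)
        total[group] += 1
        correct[group] += int(p == y)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group

# Toy data: a classifier that simply predicts the spurious attribute.
labels   = [0, 0, 0, 0, 1, 1, 1, 1]
spurious = [0, 0, 0, 1, 1, 1, 1, 0]
preds    = list(spurious)

wga, per_group = worst_group_accuracy(labels, spurious, preds)
average = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
print(average, wga, per_group)
```

In this toy case the average accuracy looks acceptable while the two minority groups are always misclassified, which is the failure mode that worst-group accuracy is meant to expose.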

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be tested on language or multimodal models where diffusion-style disentanglement is replaced by other generative priors.
If the mined concepts prove stable across datasets, they might serve as reusable bias detectors for downstream auditing tasks.

Load-bearing premise

Diffusion-disentangled concept representations can identify spurious attributes that match real-world biases without any attribute annotations.

What would settle it

A controlled test on a dataset whose known spurious correlations the diffusion model fails to separate, showing no gain in worst-group accuracy over a single-branch baseline.

Figures

Figures reproduced from arXiv: 2606.24161 by De Cheng, Dongsheng Li, Lingfeng He, Nannan Wang, Xiangqian Zhao, Xinyang Jiang, Zhipeng Xu, Zilong Wang.

**Figure 2.** Overview of the proposed framework. The framework consists of two stages: bias concept mining and dual-branch …

**Figure 3.** Group-wise recall of the pseudo spurious labels

**Figure 4.** Qualitative analysis of concept manipulation and quantized reconstruction in diffusion generation. Middle: original …

**Figure 5.** Effect of the sampling exponent 𝛼 in the weighted sampling strategy across Waterbirds, CelebA, and MetaShift. Results are averaged over three random seeds, and the error bars denote the standard deviation across different runs. … over-emphasized, the model may overfit these samples and lose its ability to generalize well across all groups. Across all three datasets, the best worst-group accuracy is achieved …

**Figure 6.** Additional qualitative results of concept manipulation and quantized reconstruction on CMNIST …

**Figure 7.** Waterbirds GradCAM
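
The sampling exponent α mentioned in the Figure 5 caption can be illustrated with a small sketch. The exact weighting rule is not given in the text available here, so the form below (per-sample weight proportional to the inverse pseudo-group size raised to α) is an assumption chosen for illustration, as are the function name and the toy groups.

```python
from collections import Counter

def sampling_weights(pseudo_groups, alpha):
    """Assumed rule: weight_i is proportional to (size of i's pseudo-group) ** (-alpha)."""
    counts = Counter(pseudo_groups)
    raw = [counts[g] ** (-alpha) for g in pseudo_groups]
    z = sum(raw)
    return [w / z for w in raw]

groups = ["majority"] * 3 + ["minority"]

uniform = sampling_weights(groups, alpha=0.0)   # every sample equally likely
balanced = sampling_weights(groups, alpha=1.0)  # every group equally likely
print(uniform)
print(balanced)
```

Under this assumed rule, α = 0 recovers uniform sampling and α = 1 gives each pseudo-group equal total mass; larger values push further toward small groups, which matches the caption's concern about over-emphasized samples. The weights can be passed to `random.choices(range(len(groups)), weights=balanced, k=batch_size)` to draw a batch.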

Original abstract

Foundation models trained on biased datasets often rely on spurious correlations between target labels and non-causal attributes, resulting in poor generalization on minority groups. Bias mitigation remains challenging due to two fundamental issues. First, when group labels are unavailable, existing group-unsupervised methods typically infer spurious attributes implicitly from model behavior, making it difficult to identify spurious factors that are semantically aligned with real-world biases. Second, even with pseudo spurious supervision, most existing debiasing methods follow a single-branch design that operates within a single shared feature space, where target and spurious attributes are intrinsically entangled. To address the first challenge, we introduce Confidence-guided Bias Concept Mining (CBCM), which leverages diffusion-disentangled, semantically grounded concept representations to identify reliable spurious attributes without attribute annotations. To address the second challenge, we propose Dual-branch Cross-projection Debiasing (DCD), a prompt-tuning framework that separates target and spurious representations into two branches and explicitly removes spurious information through cross null-space projection while preserving target-relevant semantics. Extensive experiments on four benchmark datasets show that our method achieves state-of-the-art worst group accuracy among group-unsupervised approaches, while tuning at most 0.22% of the model parameters. The source code is available in the supplementary materials.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's gains depend on whether diffusion-mined concepts actually capture the target biases without any validation step.

The core new elements are CBCM, which mines spurious concepts from diffusion-disentangled representations using confidence guidance, and DCD, which applies dual-branch prompt tuning with cross null-space projection to remove spurious directions while keeping target semantics. This pairing is presented as addressing both the lack of group labels and the entanglement problem in single-branch debiasing.

The work does a clean job of keeping the tunable parameters tiny (0.22%) and running on four standard benchmarks with a claim of leading worst-group accuracy among unsupervised methods. The two-stage structure is described as independent of the evaluation metrics, which avoids obvious circularity.

The soft spot is exactly the load-bearing premise flagged above. Diffusion models surface many factors, and nothing in the abstract or described setup shows an independent check that the mined concepts line up with the actual spurious attributes on datasets like Waterbirds or CelebA. If that mapping is noisy or off, the cross-projection step has nothing reliable to project out. The paper would be stronger with even a small human validation or a comparison to known bias factors.

This is aimed at researchers working on group-unsupervised fairness in vision foundation models. A reader already following prompt-tuning or diffusion-based disentanglement could extract usable ideas, but the central assumption needs scrutiny.

I would send it to peer review. The components are distinct enough from prior single-branch or implicit methods that referees can evaluate whether the concept alignment holds up in the full experiments and ablations.

Referee Report

2 major / 2 minor

Summary. The paper argues that foundation models trained on biased data rely on spurious correlations, and that mitigating this is difficult when group labels are unavailable. It proposes two components: Confidence-guided Bias Concept Mining (CBCM), which uses diffusion models to produce disentangled concept representations for identifying spurious attributes without annotations, and Dual-branch Cross-projection Debiasing (DCD), a prompt-tuning method that separates target and spurious features into dual branches and applies cross null-space projection to remove spurious information. The authors report state-of-the-art worst-group accuracy among group-unsupervised methods on four benchmarks while tuning at most 0.22% of parameters, with source code provided in the supplementary materials.

Significance. If the central claims hold, the work would be significant for group-unsupervised bias mitigation by offering an explicit mechanism to surface semantically grounded spurious factors via diffusion disentanglement and to enforce separation via dual-branch projection, rather than implicit inference within a shared space. The parameter efficiency and the source code supplied with the submission are additional strengths that would make the approach practically attractive if the gains prove robust.

major comments (2)

[§3] §3 (CBCM): The central claim that diffusion-disentangled concepts yield reliable, semantically aligned spurious attributes without any attribute annotations or validation is load-bearing for attributing the reported worst-group gains to the method. No independent check (human validation of mined concepts, alignment with known bias factors on Waterbirds/CelebA, or ablation replacing mined concepts with random directions) is described that would confirm the mapping holds rather than surfacing unrelated factors such as lighting or style.
[§4] §4 (DCD and experiments): The cross null-space projection in the dual-branch setup is presented as explicitly removing spurious information while preserving target semantics, but the manuscript provides no quantitative verification (e.g., via concept activation vectors or post-projection spurious correlation metrics) that the projection direction identified by CBCM is the correct one; if the mined direction is misaligned, the worst-group improvements reduce to an unverified assumption.
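
The random-direction control mentioned in the first major comment could be set up roughly as follows. This is a sketch under assumptions: features are plain Python lists, a single mined direction is available, and the names `random_unit_vector`, `remove_direction`, and `ablation_pairs` are hypothetical. The downstream step (re-evaluating worst-group accuracy on both variants) is left to the reader's own pipeline.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def remove_direction(feature, unit_dir):
    """Subtract the component of `feature` along a unit-norm direction."""
    c = sum(f * u for f, u in zip(feature, unit_dir))
    return [f - c * u for f, u in zip(feature, unit_dir)]

def ablation_pairs(features, mined_dir, seed=0):
    """Return (mined-projected, random-projected) versions of every feature."""
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    rand_dir = random_unit_vector(len(mined_dir), rng)
    norm = math.sqrt(sum(x * x for x in mined_dir))
    mined_unit = [x / norm for x in mined_dir]
    return [
        (remove_direction(f, mined_unit), remove_direction(f, rand_dir))
        for f in features
    ]

features = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
pairs = ablation_pairs(features, mined_dir=[0.0, 0.0, 1.0])
for mined_version, random_version in pairs:
    print(mined_version, random_version)
```

If the gains come from the mined concept rather than from projection as such, the mined-direction variant should outperform the random-direction variant; averaging over several seeds would make the comparison less noisy.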

minor comments (2)

[Abstract] The abstract and introduction would benefit from a brief statement of the precise datasets used and the definition of 'group-unsupervised' to avoid ambiguity with related work.
[§4] Notation for the null-space projection operator and the two branches should be introduced with a single equation or diagram for clarity before the experimental results.
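
For reference, one standard way to write an orthogonal null-space projector is given below. It is a sketch in assumed notation, not the paper's: the matrices S and T, the features z_t and z_s, and the full-row-rank condition are all introduced here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   S \in \mathbb{R}^{k \times d}: rows span the mined spurious directions
%   T \in \mathbb{R}^{m \times d}: rows span the target directions
%   z_t, z_s \in \mathbb{R}^{d}: target-branch and spurious-branch features
% S and T are assumed to have full row rank; otherwise replace
% S^{\top}(S S^{\top})^{-1} S by S^{+} S (Moore-Penrose pseudo-inverse).
\[
\begin{aligned}
P_S^{\perp} &= I_d - S^{\top} (S S^{\top})^{-1} S,
  & \tilde{z}_t &= P_S^{\perp} z_t
  \;\Longrightarrow\; S \tilde{z}_t = 0, \\
P_T^{\perp} &= I_d - T^{\top} (T T^{\top})^{-1} T,
  & \tilde{z}_s &= P_T^{\perp} z_s
  \;\Longrightarrow\; T \tilde{z}_s = 0.
\end{aligned}
\]
```

The identity S P_S^⊥ = S − S S^⊤ (S S^⊤)^{-1} S = 0 is what guarantees that the projected target feature carries no component along the assumed spurious directions.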

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for stronger validation of the semantic alignment in CBCM and the effectiveness of the projection in DCD. We address each point below and commit to revisions that add the requested checks where feasible.

Referee: [§3] §3 (CBCM): The central claim that diffusion-disentangled concepts yield reliable, semantically aligned spurious attributes without any attribute annotations or validation is load-bearing for attributing the reported worst-group gains to the method. No independent check (human validation of mined concepts, alignment with known bias factors on Waterbirds/CelebA, or ablation replacing mined concepts with random directions) is described that would confirm the mapping holds rather than surfacing unrelated factors such as lighting or style.

Authors: We agree that direct validation of the mined concepts would strengthen the attribution of gains to CBCM. The original manuscript relies on end-to-end worst-group accuracy on benchmarks with established spurious factors (Waterbirds background, CelebA hair color) as indirect support. We will add in revision: (1) an ablation replacing CBCM-mined directions with random vectors, and (2) qualitative examples of mined concepts on Waterbirds/CelebA showing alignment with documented biases. Human validation is inherently subjective and was not performed; we view the random-direction ablation as the most objective check. revision: yes
Referee: [§4] §4 (DCD and experiments): The cross null-space projection in the dual-branch setup is presented as explicitly removing spurious information while preserving target semantics, but the manuscript provides no quantitative verification (e.g., via concept activation vectors or post-projection spurious correlation metrics) that the projection direction identified by CBCM is the correct one; if the mined direction is misaligned, the worst-group improvements reduce to an unverified assumption.

Authors: We acknowledge the absence of direct post-projection metrics in the original submission. The dual-branch design and cross-projection are motivated by the separation of target and spurious branches, with gains measured via worst-group accuracy. In the revision we will add quantitative verification: correlation between the projected features and known spurious attributes (where group labels are available for analysis) and concept activation vector similarity before/after projection. This will be reported on at least two benchmarks. revision: yes
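
The correlation check promised in this response could take a form like the sketch below. It assumes binary spurious labels that are available for analysis only, features stored as plain lists, and a deliberately simple statistic (the largest absolute per-dimension Pearson correlation); the function names and toy data are hypothetical, and a trained linear probe would be a stronger test.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def max_abs_correlation(features, spurious_labels):
    """Largest |r| between any single feature dimension and the spurious label."""
    dims = len(features[0])
    return max(
        abs(pearson([f[j] for f in features], spurious_labels))
        for j in range(dims)
    )

spurious_labels = [1, 1, 0, 0]
before = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]  # dim 0 leaks the label
after = [[0.0, f[1]] for f in before]                      # dim 0 projected out

print(max_abs_correlation(before, spurious_labels))
print(max_abs_correlation(after, spurious_labels))
```

A large drop in this statistic after projection would support the claim that spurious information was removed; reporting the same statistic against the target label would show whether target semantics were preserved.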

Circularity Check

0 steps flagged

No circularity detected in derivation chain

The provided abstract and description introduce CBCM for mining spurious attributes via diffusion-disentangled representations and DCD for cross-projection debiasing as independent methodological contributions. No equations, fitted parameters renamed as predictions, self-citations as load-bearing premises, or ansatzes imported from prior author work are present in the text. The performance claims rest on external benchmark experiments rather than reducing to quantities defined by the method's own inputs. The derivation chain is therefore self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the premise that diffusion models produce semantically grounded disentangled representations usable for bias mining and that cross null-space projection cleanly separates target from spurious signals; both are domain assumptions not derived in the abstract.

axioms (2)

domain assumption: Diffusion models can produce disentangled, semantically grounded concept representations that align with real-world spurious attributes.
Invoked to justify CBCM without attribute annotations.
domain assumption: Target and spurious attributes remain intrinsically entangled in a single shared feature space, necessitating a dual-branch design.
Stated as the motivation for DCD.

Reference graph

Works this paper leans on

62 extracted references · 2 canonical work pages

[1]

Armen Aghajanyan, Sonal Gupta, and Luke Zettlemoyer. 2021. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers). 7319–7328

2021
[3]

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2020. Invariant Risk Minimization. arXiv:1907.02893 [stat.ML] https://arxiv.org/abs/1907.02893

Pith/arXiv arXiv 2020
[4]

Ihab Asaad, Maha Shadaydeh, and Joachim Denzler. 2025. Gradient Extrapolation for Debiased Representation Learning. arXiv:2503.13236 [cs.LG] https://arxiv.org/abs/2503.13236

arXiv 2025
[5]

Saeid Asgari, Aliasghar Khani, Fereshte Khani, Ali Gholami, Linh Tran, Ali Mahdavi-Amiri, and Ghassan Hamarneh. 2022. MaskTune: Mitigating Spurious Correlations by Forcing to Explore. In Advances in Neural Information Processing Systems

2022
[6]

Rwiddhi Chakraborty, Adrian Sletten, and Michael C Kampffmeyer. 2024. Exmap: Leveraging explainability heatmaps for unsupervised group robustness to spurious correlations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12017–12026

2024
[8]

Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. 2022. Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems 35 (2022), 16664–16678

2022
[9]

De Cheng, Haichun Tai, Nannan Wang, Xiangqian Zhao, Jie Li, and Xinbo Gao. 2026. A Multi-Granularity Scene-Aware Graph Convolution Method for Weakly Supervised Person Search. International Journal of Computer Vision 134, 1 (2026), 27

2026
[11]

De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nannan Wang, and Xinbo Gao. 2026. Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization. IEEE Transactions on Pattern Analysis and Machine Intelligence (2026)

2026
[12]

De Cheng, Zhipeng Xu, Xinyang Jiang, Nannan Wang, Dongsheng Li, and Xinbo Gao. 2024. Disentangled prompt representation for domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 23595–23604

2024
[13]

De Cheng, Mingyue Zeng, Zhipeng Xu, Di Xu, Nannan Wang, and Xinbo Gao. Interference-Isolated Elastic Weight Consolidation and Knowledge Calibration for Incremental Object Detection. In Proceedings of the Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=VrXdmCjni4
[15]

Elliot Creager, Jörn-Henrik Jacobsen, and Richard Zemel. 2021. Environment Inference for Invariant Learning. arXiv:2010.07249 [cs.LG] https://arxiv.org/abs/2010.07249

arXiv 2021
[16]

Yihe Deng, Yu Yang, Baharan Mirzasoleiman, and Quanquan Gu. 2023. Robust Learning with Progressive Data Expansion Against Spurious Correlation. arXiv:2306.04949 [cs.LG] https://arxiv.org/abs/2306.04949

arXiv 2023
[17]

Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J Clark, and Mehdi Rezagholizadeh. 2025. KronA: Parameter-efficient tuning with kronecker adapter. In Enhancing LLM Performance: Efficacy, Fine-Tuning, and Inference Techniques. Springer, 49–65

2025
[18]

Yujin Han and Difan Zou. 2024. Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference. arXiv:2404.13815 [cs.LG] https://arxiv.org/abs/2404.13815

arXiv 2024
[19]

Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. 2024. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608 (2024)

Pith/arXiv arXiv 2024
[20]

Haoyu He, Jianfei Cai, Jing Zhang, Dacheng Tao, and Bohan Zhuang. 2023. Sensitivity-aware visual parameter-efficient fine-tuning. In Proceedings of the IEEE/CVF international conference on computer vision. 11825–11835

2023
[21]

Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. 2021. Towards a unified view of parameter-efficient transfer learning. arXiv preprint arXiv:2110.04366 (2021)

arXiv 2021
[22]

Lingfeng He, De Cheng, Huaijie Wang, Xi Yang, Nannan Wang, and Xinbo Gao. 2026. Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning. arXiv:2603.00191 [cs.LG] https://arxiv.org/abs/2603.00191

Pith/arXiv arXiv 2026
[23]

Lingfeng He, De Cheng, Di Xu, Huaijie Wang, and Nannan Wang. 2026. Harnessing textual semantic priors for knowledge transfer and refinement in clip-driven continual learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 21645–21653

2026
[24]

Badr Youbi Idrissi, Martin Arjovsky, Mohammad Pezeshki, and David Lopez-Paz. 2022. Simple data balancing achieves competitive worst-group-accuracy. arXiv:2110.14503 [cs.LG] https://arxiv.org/abs/2110.14503

arXiv 2022
[25]

Pavel Izmailov, Polina Kirichenko, Nate Gruver, and Andrew Gordon Wilson. 2022. On Feature Learning in the Presence of Spurious Correlations. arXiv:2210.11369 [cs.LG] https://arxiv.org/abs/2210.11369

arXiv 2022
[26]

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. 2022. Visual prompt tuning. In European conference on computer vision. Springer, 709–727

2022
[27]

Sangwon Jung, Sanghyuk Chun, and Taesup Moon. 2022. Learning Fair Classifiers with Partially Annotated Group Labels. arXiv:2111.14581 [cs.LG] https://arxiv.org/abs/2111.14581

arXiv 2022
[28]

Polina Kirichenko, Pavel Izmailov, and Andrew Gordon Wilson. 2023. Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations. arXiv:2204.02937 [cs.LG] https://arxiv.org/abs/2204.02937

arXiv 2023
[29]

Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, Online and Punta Cana, Dominica...

2021
[30]

doi:10.18653/v1/2021.emnlp-main.243

work page doi:10.18653/v1/2021.emnlp-main.243 2021
[31]

Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Ass...

work page doi:10.18653/v1/2021.acl-long.353 2021
[32]

Weixin Liang and James Zou. 2022. Metashift: A dataset of datasets for evaluating contextual distribution shifts and training conflicts. arXiv preprint arXiv:2202.06523 (2022)

arXiv 2022
[33]

Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, and Chelsea Finn. 2021. Just Train Twice: Improving Group Robustness without Training Group Information. arXiv:2107.09044 [cs.LG] https://arxiv.org/abs/2107.09044

arXiv 2021
[34]

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision. 3730–3738

2015
[35]

Divyat Mahajan, Mohammad Pezeshki, Charles Arnal, Ioannis Mitliagkas, Kartik Ahuja, and Pascal Vincent. 2025. Compositional Risk Minimization. arXiv:2410.06303 [cs.LG] https://arxiv.org/abs/2410.06303

arXiv 2025
[36]

Junhyun Nam, Hyuntak Cha, Sungsoo Ahn, Jaeho Lee, and Jinwoo Shin. 2020. Learning from Failure: Training Debiased Classifier from Biased Classifier. arXiv:2007.02561 [cs.LG] https://arxiv.org/abs/2007.02561

arXiv 2020
[37]

Junhyun Nam, Jaehyung Kim, Jaeho Lee, and Jinwoo Shin. 2022. Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation. arXiv:2204.02070 [cs.LG] https://arxiv.org/abs/2204.02070

arXiv 2022
[38]

Sungho Park, Jewook Lee, Pilhyeon Lee, Sunhee Hwang, Dohyung Kim, and Hyeran Byun. 2022. Fair Contrastive Learning for Facial Attribute Classification. arXiv:2203.16209 [cs.CV] https://arxiv.org/abs/2203.16209

arXiv 2022
[39]

Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, and David Lopez-Paz. 2024. Discovering environments with XRM. arXiv:2309.16748 [cs.LG] https://arxiv.org/abs/2309.16748

arXiv 2024
[40]

Hoang Phan, Andrew Gordon Wilson, and Qi Lei. 2024. Controllable Prompt Tuning For Balancing Group Distributional Robustness. arXiv:2403.02695 [cs.LG] https://arxiv.org/abs/2403.02695

arXiv 2024
[41]

Shikai Qiu, Andres Potapczynski, Pavel Izmailov, and Andrew Gordon Wilson. Simple and Fast Group Robustness by Automatic Feature Reweighting. arXiv:2306.11074 [cs.LG] https://arxiv.org/abs/2306.11074

arXiv
[43]

Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. [n. d.]. Distributionally Robust Neural Networks. In International Conference on Learning Representations
[45]

Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, and Percy Liang. 2020. Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization. arXiv:1911.08731 [cs.LG] https://arxiv.org/abs/1911.08731

Pith/arXiv arXiv 2020
[46]

Nimit S. Sohoni, Jared A. Dunnmon, Geoffrey Angus, Albert Gu, and Christopher Ré. 2022. No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems. arXiv:2011.12945 [cs.LG] https://arxiv.org/abs/2011.12945

arXiv 2022
[47]

Yi-Lin Sung, Jaemin Cho, and Mohit Bansal. 2022. Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5227–5237.

2022
[48]

Xinyu Tian, Shu Zou, Zhaoyuan Yang, Mengqi He, and Jing Zhang. 2025. Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition. arXiv:2502.15809 [cs.LG] https://arxiv.org/abs/2502.15809

arXiv 2025
[49]

Christos Tsirigotis, Joao Monteiro, Pau Rodriguez, David Vazquez, and Aaron Courville. 2023. Group Robust Classification Without Any Group Information. arXiv:2310.18555 [cs.LG] https://arxiv.org/abs/2310.18555

arXiv 2023
[50]

Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. 2011. The caltech-ucsd birds-200-2011 dataset. (2011)

2011
[52]

Huaijie Wang, De Cheng, Guozhang Li, Zhipeng Xu, Lingfeng He, Jie Li, Nannan Wang, and Xinbo Gao. 2026. StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning. In Proceedings of the Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=VAn2YVMuZC

2026
[53]

Yubin Wang, Xinyang Jiang, De Cheng, Xiangqian Zhao, Zilong Wang, Dongsheng Li, and Cairong Zhao. [n. d.]. Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts. In The Fourteenth International Conference on Learning Representations
[54]

Tao Wen, Zihan Wang, Quan Zhang, and Qi Lei. 2025. Elastic Representation: Mitigating Spurious Correlations for Group Robustness. arXiv:2502.09850 [cs.LG] https://arxiv.org/abs/2502.09850

arXiv 2025
[55]

Jingkai Xu, De Cheng, Xiangqian Zhao, Jungang Yang, Zilong Wang, Xinyang Jiang, Xufang Luo, Lili Chen, Xiaoli Ning, Chengxu Li, et al. 2025. DermINO: Hybrid pretraining for a versatile dermatology foundation model. arXiv preprint arXiv:2508.12190 (2025)

arXiv 2025
[56]

Zhipeng Xu, De Cheng, Xinyang Jiang, Nannan Wang, Dongsheng Li, and Xinbo Gao. 2025. Adversarial domain prompt tuning and generation for single domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18584–18595

2025
[57]

Zhipeng Xu, Zilong Wang, Xinyang Jiang, Dongsheng Li, De Cheng, and Nannan Wang. 2026. Reasoning-Driven Multimodal LLM for Domain Generalization. In The Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=psJiUopUt7

2026
[58]

Tao Yang, Yuwang Wang, Yan Lu, and Nanning Zheng. 2024. Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement. NeurIPS (2024)

2024
[59]

Jie ZHANG, Xiaosong Ma, Song Guo, Peng Li, Wenchao Xu, Xueyang Tang, and Zicong Hong. 2024. Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models. In Forty-first International Conference on Machine Learning. https://openreview.net/forum?id=f8G2KSCSdp

2024
[60]

Michael Zhang and Christopher Ré. 2022. Contrastive Adapters for Foundation Model Group Robustness. arXiv:2207.07180 [cs.LG] https://arxiv.org/abs/2207.07180

arXiv 2022
[61]

Michael Zhang, Nimit S. Sohoni, Hongyang R. Zhang, Chelsea Finn, and Christopher Ré. 2024. Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations. arXiv:2203.01517 [cs.LG] https://arxiv.org/abs/2203.01517

arXiv 2024
[62]

Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, and Ioannis Patras. 2025. AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data. arXiv:2503.05665 [cs.CV] https://arxiv.org/abs/2503.05665

arXiv 2025
[63]

Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)

2017
[65]

Beier Zhu, Jiequan Cui, Hanwang Zhang, and Chi Zhang. 2025. Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness. arXiv:2503.09487 [cs.CV] https://arxiv.org/abs/2503.09487

arXiv 2025

[1] [1]

Armen Aghajanyan, Sonal Gupta, and Luke Zettlemoyer. 2021. Intrinsic dimen- sionality explains the effectiveness of language model fine-tuning. InProceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers). 7319–7328

2021

[2] [3]

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2020. Invariant Risk Minimization. arXiv:1907.02893 [stat.ML] https://arxiv.org/abs/ 1907.02893

Pith/arXiv arXiv 2020

[3] [4]

Ihab Asaad, Maha Shadaydeh, and Joachim Denzler. 2025. Gradient Extrapolation for Debiased Representation Learning. arXiv:2503.13236 [cs.LG] https://arxiv. org/abs/2503.13236

arXiv 2025

[4] [5]

Saeid Asgari, Aliasghar Khani, Fereshte Khani, Ali Gholami, Linh Tran, Ali Mahdavi-Amiri, and Ghassan Hamarneh. 2022. MaskTune: Mitigating Spurious Correlations by Forcing to Explore. InAdvances in Neural Information Processing Systems

2022

[5] [6]

Rwiddhi Chakraborty, Adrian Sletten, and Michael C Kampffmeyer. 2024. Exmap: Leveraging explainability heatmaps for unsupervised group robustness to spuri- ous correlations. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12017–12026

2024

[6] [8]

Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. 2022. Adaptformer: Adapting vision transformers for scalable visual recognition.Advances in Neural Information Processing Systems35 (2022), 16664–16678

2022

[7] [9]

De Cheng, Haichun Tai, Nannan Wang, Xiangqian Zhao, Jie Li, and Xinbo Gao

[8] [10]

A Multi-Granularity Scene-Aware Graph Convolution Method for Weakly Supervised Person Search.International Journal of Computer Vision134, 1 (2026), 27

2026

[9] [11]

De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nannan Wang, and Xinbo Gao. 2026. Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization.IEEE Transactions on Pattern Analysis and Machine Intelligence(2026)

2026

[10] [12]

De Cheng, Zhipeng Xu, Xinyang Jiang, Nannan Wang, Dongsheng Li, and Xinbo Gao. 2024. Disentangled prompt representation for domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 23595–23604

2024

[11] [13]

De Cheng, Mingyue Zeng, Zhipeng Xu, Di Xu, Nannan Wang, and Xinbo Gao. 2026. Interference-Isolated Elastic Weight Consolidation and Knowledge Calibration for Incremental Object Detection. In Proceedings of the Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=VrXdmCjni4

[13] [15]

Elliot Creager, Jörn-Henrik Jacobsen, and Richard Zemel. 2021. Environment Inference for Invariant Learning. arXiv:2010.07249 [cs.LG] https://arxiv.org/abs/2010.07249

arXiv 2021

[14] [16]

Yihe Deng, Yu Yang, Baharan Mirzasoleiman, and Quanquan Gu. 2023. Robust Learning with Progressive Data Expansion Against Spurious Correlation. arXiv:2306.04949 [cs.LG] https://arxiv.org/abs/2306.04949

arXiv 2023

[15] [17]

Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J Clark, and Mehdi Rezagholizadeh. 2025. KronA: Parameter-efficient tuning with kronecker adapter. In Enhancing LLM Performance: Efficacy, Fine-Tuning, and Inference Techniques. Springer, 49–65

2025

[16] [18]

Yujin Han and Difan Zou. 2024. Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference. arXiv:2404.13815 [cs.LG] https://arxiv.org/abs/2404.13815

arXiv 2024

[17] [19]

Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. 2024. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608 (2024)

Pith/arXiv arXiv 2024

[18] [20]

Haoyu He, Jianfei Cai, Jing Zhang, Dacheng Tao, and Bohan Zhuang. 2023. Sensitivity-aware visual parameter-efficient fine-tuning. In Proceedings of the IEEE/CVF international conference on computer vision. 11825–11835

2023

[19] [21]

Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. 2021. Towards a unified view of parameter-efficient transfer learning. arXiv preprint arXiv:2110.04366 (2021)

arXiv 2021

[20] [22]

Lingfeng He, De Cheng, Huaijie Wang, Xi Yang, Nannan Wang, and Xinbo Gao. 2026. Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning. arXiv:2603.00191 [cs.LG] https://arxiv.org/abs/2603.00191

Pith/arXiv arXiv 2026

[21] [23]

Lingfeng He, De Cheng, Di Xu, Huaijie Wang, and Nannan Wang. 2026. Harnessing textual semantic priors for knowledge transfer and refinement in clip-driven continual learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 21645–21653

2026

[22] [24]

Badr Youbi Idrissi, Martin Arjovsky, Mohammad Pezeshki, and David Lopez-Paz. 2022. Simple data balancing achieves competitive worst-group-accuracy. arXiv:2110.14503 [cs.LG] https://arxiv.org/abs/2110.14503

arXiv 2022

[23] [25]

Pavel Izmailov, Polina Kirichenko, Nate Gruver, and Andrew Gordon Wilson. 2022. On Feature Learning in the Presence of Spurious Correlations. arXiv:2210.11369 [cs.LG] https://arxiv.org/abs/2210.11369

arXiv 2022

[24] [26]

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. 2022. Visual prompt tuning. In European conference on computer vision. Springer, 709–727

2022

[25] [27]

Sangwon Jung, Sanghyuk Chun, and Taesup Moon. 2022. Learning Fair Classifiers with Partially Annotated Group Labels. arXiv:2111.14581 [cs.LG] https://arxiv.org/abs/2111.14581

arXiv 2022

[26] [28]

Polina Kirichenko, Pavel Izmailov, and Andrew Gordon Wilson. 2023. Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations. arXiv:2204.02937 [cs.LG] https://arxiv.org/abs/2204.02937

arXiv 2023

[27] [29]

Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, Online and Punta Cana, Dominica... doi:10.18653/v1/2021.emnlp-main.243

work page doi:10.18653/v1/2021.emnlp-main.243 2021

[29] [31]

Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Ass...

work page doi:10.18653/v1/2021.acl-long.353 2021

[30] [32]

Weixin Liang and James Zou. 2022. Metashift: A dataset of datasets for evaluating contextual distribution shifts and training conflicts. arXiv preprint arXiv:2202.06523 (2022)

arXiv 2022

[31] [33]

Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, and Chelsea Finn. 2021. Just Train Twice: Improving Group Robustness without Training Group Information. arXiv:2107.09044 [cs.LG] https://arxiv.org/abs/2107.09044

arXiv 2021

[32] [34]

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision. 3730–3738

2015

[33] [35]

Divyat Mahajan, Mohammad Pezeshki, Charles Arnal, Ioannis Mitliagkas, Kartik Ahuja, and Pascal Vincent. 2025. Compositional Risk Minimization. arXiv:2410.06303 [cs.LG] https://arxiv.org/abs/2410.06303

arXiv 2025

[34] [36]

Junhyun Nam, Hyuntak Cha, Sungsoo Ahn, Jaeho Lee, and Jinwoo Shin. 2020. Learning from Failure: Training Debiased Classifier from Biased Classifier. arXiv:2007.02561 [cs.LG] https://arxiv.org/abs/2007.02561

arXiv 2020

[35] [37]

Junhyun Nam, Jaehyung Kim, Jaeho Lee, and Jinwoo Shin. 2022. Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation. arXiv:2204.02070 [cs.LG] https://arxiv.org/abs/2204.02070

arXiv 2022

[36] [38]

Sungho Park, Jewook Lee, Pilhyeon Lee, Sunhee Hwang, Dohyung Kim, and Hyeran Byun. 2022. Fair Contrastive Learning for Facial Attribute Classification. arXiv:2203.16209 [cs.CV] https://arxiv.org/abs/2203.16209

arXiv 2022

[37] [39]

Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, and David Lopez-Paz. 2024. Discovering environments with XRM. arXiv:2309.16748 [cs.LG] https://arxiv.org/abs/2309.16748

arXiv 2024

[38] [40]

Hoang Phan, Andrew Gordon Wilson, and Qi Lei. 2024. Controllable Prompt Tuning For Balancing Group Distributional Robustness. arXiv:2403.02695 [cs.LG] https://arxiv.org/abs/2403.02695

arXiv 2024

[39] [41]

Shikai Qiu, Andres Potapczynski, Pavel Izmailov, and Andrew Gordon Wilson. Simple and Fast Group Robustness by Automatic Feature Reweighting. arXiv:2306.11074 [cs.LG] https://arxiv.org/abs/2306.11074

arXiv

[41] [43]

Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. [n. d.]. Distributionally Robust Neural Networks. In International Conference on Learning Representations

[42] [45]

Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, and Percy Liang. 2020. Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization. arXiv:1911.08731 [cs.LG] https://arxiv.org/abs/1911.08731

Pith/arXiv arXiv 2020

[43] [46]

Nimit S. Sohoni, Jared A. Dunnmon, Geoffrey Angus, Albert Gu, and Christopher Ré. 2022. No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems. arXiv:2011.12945 [cs.LG] https://arxiv.org/abs/2011.12945

arXiv 2022

[44] [47]

Yi-Lin Sung, Jaemin Cho, and Mohit Bansal. 2022. Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5227–5237

2022

[45] [48]

Xinyu Tian, Shu Zou, Zhaoyuan Yang, Mengqi He, and Jing Zhang. 2025. Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition. arXiv:2502.15809 [cs.LG] https://arxiv.org/abs/2502.15809

arXiv 2025

[46] [49]

Christos Tsirigotis, Joao Monteiro, Pau Rodriguez, David Vazquez, and Aaron Courville. 2023. Group Robust Classification Without Any Group Information. arXiv:2310.18555 [cs.LG] https://arxiv.org/abs/2310.18555

arXiv 2023

[47] [50]

Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. 2011. The Caltech-UCSD Birds-200-2011 dataset. (2011)

2011

[49] [52]

Huaijie Wang, De Cheng, Guozhang Li, Zhipeng Xu, Lingfeng He, Jie Li, Nannan Wang, and Xinbo Gao. 2026. StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning. In Proceedings of the Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=VAn2YVMuZC

2026

[50] [53]

Yubin Wang, Xinyang Jiang, De Cheng, Xiangqian Zhao, Zilong Wang, Dongsheng Li, and Cairong Zhao. [n. d.]. Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts. In The Fourteenth International Conference on Learning Representations

[51] [54]

Tao Wen, Zihan Wang, Quan Zhang, and Qi Lei. 2025. Elastic Representation: Mitigating Spurious Correlations for Group Robustness. arXiv:2502.09850 [cs.LG] https://arxiv.org/abs/2502.09850

arXiv 2025

[52] [55]

Jingkai Xu, De Cheng, Xiangqian Zhao, Jungang Yang, Zilong Wang, Xinyang Jiang, Xufang Luo, Lili Chen, Xiaoli Ning, Chengxu Li, et al. 2025. DermINO: Hybrid pretraining for a versatile dermatology foundation model. arXiv preprint arXiv:2508.12190 (2025)

arXiv 2025

[53] [56]

Zhipeng Xu, De Cheng, Xinyang Jiang, Nannan Wang, Dongsheng Li, and Xinbo Gao. 2025. Adversarial domain prompt tuning and generation for single domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18584–18595

2025

[54] [57]

Zhipeng Xu, Zilong Wang, Xinyang Jiang, Dongsheng Li, De Cheng, and Nannan Wang. 2026. Reasoning-Driven Multimodal LLM for Domain Generalization. In The Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=psJiUopUt7

2026

[55] [58]

Tao Yang, Yuwang Wang, Yan Lu, and Nanning Zheng. 2024. Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement. NeurIPS (2024)

2024

[56] [59]

Jie ZHANG, Xiaosong Ma, Song Guo, Peng Li, Wenchao Xu, Xueyang Tang, and Zicong Hong. 2024. Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models. In Forty-first International Conference on Machine Learning. https://openreview.net/forum?id=f8G2KSCSdp

2024

[57] [60]

Michael Zhang and Christopher Ré. 2022. Contrastive Adapters for Foundation Model Group Robustness. arXiv:2207.07180 [cs.LG] https://arxiv.org/abs/2207.07180

arXiv 2022

[58] [61]

Michael Zhang, Nimit S. Sohoni, Hongyang R. Zhang, Chelsea Finn, and Christopher Ré. 2024. Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations. arXiv:2203.01517 [cs.LG] https://arxiv.org/abs/2203.01517

arXiv 2024

[59] [62]

Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, and Ioannis Patras. 2025. AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data. arXiv:2503.05665 [cs.CV] https://arxiv.org/abs/2503.05665

arXiv 2025

[60] [63]

Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)

2017

[62] [65]

Beier Zhu, Jiequan Cui, Hanwang Zhang, and Chi Zhang. 2025. Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness. arXiv:2503.09487 [cs.CV] https://arxiv.org/abs/2503.09487

arXiv 2025