CouCE: A Unified Causal Framework for Debiased Deep Metric Learning

Huilin Zhu; Kui Jiang; Meiqi Wan; Xin Xu; Xin Yuan; Zhenyang Niu

arxiv: 2606.30365 · v1 · pith:VGJA5P5Znew · submitted 2026-06-29 · 💻 cs.CV


Pith reviewed 2026-06-30 06:18 UTC · model grok-4.3

classification 💻 cs.CV

keywords deep metric learning, causal debiasing, backdoor adjustment, causal intervention, zero-shot generalization, confounders, proxy-based loss, image retrieval


The pith

CouCE debiases deep metric learning by separately neutralizing background spurious correlations and foreground nuisance perturbations with targeted causal interventions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Deep metric learning models capture co-occurring patterns rather than causal similarities, leading to poor zero-shot generalization on new classes. The paper identifies two confounders with distinct causal roles: background elements that create backdoor paths and foreground variations like pose or lighting that add non-semantic noise. Existing approaches tackle only one pathway at a time, but CouCE introduces a single framework that applies orthogonal dictionary adjustment to backgrounds and multi-scale Fourier randomization to foregrounds. These steps integrate into standard proxy-based losses with little added cost and no inference changes. A reader would care because the result is embeddings that focus on semantic causes instead of shortcuts, improving retrieval accuracy across standard benchmarks.
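The orthogonality half of ODBA can be pictured with a minimal sketch. It assumes the background dictionary is given as a list of atom vectors and computes the mean squared cosine similarity between every embedding and every atom; driving this toward zero pushes embeddings away from the background directions. The function names and the `gates` argument are hypothetical: `gates` is only a placeholder for the paper's variance gating, whose exact form is not described on this page, and this is not the authors' implementation.

```python
import math


def _unit(v):
    """L2-normalise a vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def soft_orthogonal_penalty(embeddings, dictionary, gates=None):
    """Mean (gated) squared cosine similarity between each embedding and
    each background dictionary atom. Hypothetical sketch, not ODBA itself."""
    E = [_unit(e) for e in embeddings]
    D = [_unit(d) for d in dictionary]
    if gates is None:
        # Placeholder: the paper's variance gating is not reproduced here.
        gates = [1.0] * len(D)
    total = 0.0
    for e in E:
        for g, d in zip(gates, D):
            cos = sum(a * b for a, b in zip(e, d))
            total += g * cos * cos
    return total / (len(E) * len(D))
```

With non-negative gates the penalty is zero exactly when every embedding is orthogonal to every atom whose gate is positive. A "soft" regularizer adds it to the training loss with a weight instead of enforcing orthogonality as a hard constraint.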

Core claim

The paper claims that explicitly modeling the two structurally distinct confounders, and neutralizing them inside the Counterfactual Causal Embedding framework with Orthogonal Dictionary-Based Backdoor Adjustment (for backgrounds) and Multi-Scale Randomized Causal Intervention (for foregrounds), lets any proxy-based loss produce debiased embeddings that generalize better. The evidence offered is state-of-the-art results on CUB-200-2011, Cars-196, and Stanford Online Products.

What carries the argument

Counterfactual Causal Embedding (CouCE), built from two components: Orthogonal Dictionary-Based Backdoor Adjustment (ODBA), which isolates spurious background patterns in a variance-gated dictionary and disentangles them from the embeddings via soft orthogonal regularization; and Multi-Scale Randomized Causal Intervention (MSRCI), which enforces invariance via multi-scale Fourier amplitude randomization and a symmetric KL constraint.
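MSRCI's foreground intervention builds on the amplitude/phase split of the Fourier transform: phase carries most of the structural content of an image (Oppenheim and Lim; Piotrowski and Campbell, both in the reference list), while amplitude is commonly treated as carrying style-like, low-level statistics in Fourier-based domain generalization. The sketch below perturbs amplitudes and keeps every phase. Two assumptions are labeled: "multi-scale" is interpreted here as drawing one low-frequency band radius per call from `scales`, and amplitudes are rescaled by a random factor in [1 - alpha, 1 + alpha]; neither is confirmed as the paper's scheme. A naive pure-Python DFT keeps the example dependency-free; a real pipeline would apply an FFT library to image tensors.

```python
import cmath
import random


def dft2(x):
    """Naive 2-D DFT of a list of lists (O(H^2 W^2); tiny demo images only)."""
    h, w = len(x), len(x[0])
    return [[sum(x[m][n] * cmath.exp(-2j * cmath.pi * (u * m / h + v * n / w))
                 for m in range(h) for n in range(w))
             for v in range(w)]
            for u in range(h)]


def idft2(spec):
    """Inverse of dft2."""
    h, w = len(spec), len(spec[0])
    return [[sum(spec[u][v] * cmath.exp(2j * cmath.pi * (u * m / h + v * n / w))
                 for u in range(h) for v in range(w)) / (h * w)
             for n in range(w)]
            for m in range(h)]


def randomize_amplitude(img, alpha=0.3, scales=(0.25, 0.5, 1.0), rng=None):
    """Rescale Fourier amplitudes inside a randomly chosen low-frequency band
    while keeping all phases (hypothetical stand-in for the MSRCI step)."""
    rng = random.Random() if rng is None else rng
    h, w = len(img), len(img[0])
    spec = dft2(img)
    # Hypothetical "multi-scale": one band radius drawn per call.
    radius = rng.choice(scales) * max(h, w) / 2
    factors = {}
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            amp, phase = abs(spec[u][v]), cmath.phase(spec[u][v])
            du, dv = min(u, h - u), min(v, w - v)  # wrapped distance from DC
            if (du * du + dv * dv) ** 0.5 <= radius:
                # Share one factor between (u, v) and its conjugate (-u, -v)
                # so the spectrum stays Hermitian and the output stays real.
                key = min((u, v), ((-u) % h, (-v) % w))
                if key not in factors:
                    factors[key] = 1.0 + rng.uniform(-alpha, alpha)
                amp *= factors[key]
            out[u][v] = cmath.rect(amp, phase)
    return [[z.real for z in row] for row in idft2(out)]
```

Sharing one factor between each frequency and its conjugate keeps the perturbed spectrum Hermitian, so the inverse transform is real up to rounding; with `alpha=0.0` the image is returned unchanged.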

If this is right

CouCE integrates directly with any existing proxy-based loss function.
Training adds only modest overhead while inference uses the original architecture unchanged.
The approach yields consistent state-of-the-art retrieval performance on CUB-200-2011, Cars-196, and Stanford Online Products.
Both confounders must be addressed together because their pathways cannot be handled by prior single-target methods.
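A plain reading of "integrates directly with any existing proxy-based loss" is an additive training objective like the one below. This is an assumption for illustration only: L_orth and L_sKL stand for the ODBA orthogonality term and the MSRCI symmetric-KL term, and the weights λ_orth and λ_inv are hypothetical names, not taken from the paper.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{proxy}}
  + \lambda_{\text{orth}}\,\mathcal{L}_{\text{orth}}
  + \lambda_{\text{inv}}\,\mathcal{L}_{\text{sKL}}
```

Both extra terms would act only at training time, which is consistent with the claim that inference uses the unchanged architecture.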

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit separation of background and foreground interventions may suggest similar causal splits could help other vision tasks that suffer from multiple independent shortcuts.
Because the method adds no inference cost, it could be tested in large-scale retrieval systems where deployment constraints matter more than training time.
If the interventions prove robust, they might be combined with other regularization techniques to further reduce dataset size requirements for good generalization.

Load-bearing premise

The two confounders play fundamentally distinct causal roles, so they require separate, simultaneous interventions, and those interventions can neutralize them without discarding semantic information or introducing new biases.

What would settle it

An experiment that removes either the orthogonal regularization or the multi-scale randomization component on the same three datasets and checks whether the remaining single intervention still matches the full method's reported gains over baselines.

Figures

Figures reproduced from arXiv: 2606.30365 by Huilin Zhu, Kui Jiang, Meiqi Wan, Xin Xu, Xin Yuan, Zhenyang Niu.

**Figure 1.** Two confounders in DML. Red boxes mark regions where background context misleads retrieval; solid red circles …

**Figure 2.** Structural Causal Model for DML. (a) Observational …

**Figure 3.** Overview of the CouCE framework. Images pass through Stage 1 to yield feature maps …

**Figure 4.** Ablation results of core continuous hyperparameters on CUB-200-2011 and Cars-196 (ResNet-50). From left to right: …

**Figure 5.** t-SNE visualization of test-set embeddings (ResNet …

**Figure 6.** Top-5 retrieval examples on CUB-200-2011 (Left) …

**Figure 7.** Grad-CAM attention maps on CUB-200-2011 (top) …

**Figure 8.** Motivation for CouCE. Bars represent query simi…

**Figure 9.** Visual verification of background isolation via Grad…

Original abstract

Deep Metric Learning (DML) often struggles with zero-shot generalization because standard objectives inherently capture what co-occurs rather than what causes similarity. Consequently, DML models are vulnerable to shortcut learning driven by two structurally distinct confounders: background spurious correlations (which create backdoor paths via scene context) and foreground nuisance perturbations (which inject non-semantic variations like pose or illumination). Although existing methods have proposed targeted solutions for each pathway individually, none can simultaneously address both due to their fundamentally distinct causal roles. To bridge this gap, we propose the Counterfactual Causal Embedding (CouCE), a unified causal framework that explicitly models and neutralizes both confounders. Specifically, we introduce Orthogonal Dictionary-Based Backdoor Adjustment (ODBA), which isolates spurious background patterns into a variance-gated dictionary and stably disentangles them from the learned embeddings via soft orthogonal regularization. Simultaneously, we propose Multi-Scale Randomized Causal Intervention (MSRCI) to enforce causal invariance against foreground nuisances through multi-scale Fourier amplitude randomization and a symmetric KL invariance constraint. Notably, CouCE seamlessly integrates with any proxy-based loss, incurring modest training overhead without requiring architectural modifications during inference. Extensive experiments on CUB-200-2011, Cars-196, and Stanford Online Products demonstrate that CouCE consistently achieves state-of-the-art performance, providing a principled and robust solution for debiased DML.
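The symmetric KL invariance constraint named in the abstract can be sketched as follows, assuming each view (original and Fourier-perturbed) yields a vector of logits, for example similarities to class proxies. The temperature and the 1/2 averaging factor are assumptions (some formulations simply sum the two directions), and the helper names are hypothetical rather than the paper's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(logits_clean, logits_perturbed, temperature=1.0):
    """0.5 * [KL(p || q) + KL(q || p)] between the two views' softmax outputs."""
    p = softmax(logits_clean, temperature)
    q = softmax(logits_perturbed, temperature)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

Averaging both KL directions means neither view is privileged as the reference distribution, which is the usual motivation for the symmetric form over a one-sided KL.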

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CouCE claims the first unified causal fix for both background backdoor and foreground nuisance confounders in DML, but the interventions read as regularizers whose causal status is unverified from the abstract.


The core new thing is the attempt to treat background spurious correlations and foreground nuisances as structurally distinct causal pathways that need simultaneous, separate interventions inside a single framework. ODBA uses a variance-gated dictionary plus soft orthogonal regularization to isolate backgrounds, while MSRCI applies multi-scale Fourier amplitude randomization plus symmetric KL to push invariance on foreground factors. The authors position this as the first method that can do both at once and still plug into any proxy-based loss with only training overhead.

The practical side is handled cleanly: no architecture changes at inference, and they report consistent gains on the standard DML suites (CUB-200-2011, Cars-196, Stanford Online Products). That combination of scope and compatibility is the part worth noting.

The soft spot is exactly the one the stress-test flags. The abstract gives no equations or derivations showing that ODBA equals backdoor adjustment or that MSRCI equals a do-intervention on the foreground path. Without that mapping, the results could just as easily come from generic disentanglement or stronger regularization rather than causal neutralization. It is also unclear whether the two pathways are independent enough that intervening on one does not leak into the other or discard semantic signal. If the full paper contains the graph, the do-calculus steps, and ablations that rule out semantic loss, the claim strengthens; right now it rests on the assumption.

This is for people already working on debiased metric learning or causal regularization in vision. A reader who wants a concrete recipe that slots into existing losses might extract value even if the causal story needs more proof. I would send it to peer review so the derivations and controls can be checked directly.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce CouCE, a unified causal framework for debiased deep metric learning. It identifies two confounders with distinct causal roles: background spurious correlations addressed by Orthogonal Dictionary-Based Backdoor Adjustment (ODBA) using a variance-gated dictionary and soft orthogonal regularization, and foreground nuisance perturbations handled by Multi-Scale Randomized Causal Intervention (MSRCI) via multi-scale Fourier amplitude randomization and symmetric KL invariance constraint. CouCE integrates with any proxy-based loss with modest overhead and no inference changes, achieving state-of-the-art performance on CUB-200-2011, Cars-196, and Stanford Online Products.

Significance. If the proposed ODBA and MSRCI methods indeed correspond to causal interventions that neutralize the confounders without semantic loss, this work would offer a significant advance in debiased DML by providing a unified framework for multiple confounders. The seamless integration with existing losses is a practical advantage. The paper's strength lies in attempting to ground the method in causal reasoning, though verification of this grounding is needed.

major comments (2)

[Abstract] Abstract (structural distinction of pathways): The assumption that the two confounders occupy structurally distinct pathways that can be neutralized independently by ODBA and MSRCI is central but not supported by a formal causal graph or proof; if the pathways are not independent, the unified framework may not deliver the promised debiasing.
[Abstract] Description of ODBA and MSRCI: There is no derivation showing that the orthogonal regularization and Fourier randomization equal do-calculus interventions (backdoor adjustment + invariance) on the posited graph; they read as heuristic regularizers, and success could be due to generic disentanglement rather than causal neutralization.

minor comments (1)

[Abstract] The abstract mentions 'extensive experiments' but does not specify the metrics or baselines used to claim SOTA performance.
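On CUB-200-2011, Cars-196, and Stanford Online Products, DML papers, including the Proxy Anchor loss listed in the references, conventionally report Recall@K; this page does not say which metrics or values of K CouCE uses. A minimal sketch of Recall@K under cosine similarity, assuming test-set embeddings and class labels are already available (function names are hypothetical):

```python
import math


def _cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recall_at_k(embeddings, labels, k):
    """Fraction of queries whose k nearest neighbours (query excluded,
    ranked by cosine similarity) contain at least one same-label item."""
    n = len(embeddings)
    hits = 0
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: _cosine(embeddings[i], embeddings[j]),
            reverse=True,
        )
        if any(labels[j] == labels[i] for j in ranked[:k]):
            hits += 1
    return hits / n
```

A query counts as correct if any of its K nearest neighbours shares its class, so Recall@K can only stay the same or rise as K grows.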

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the causal foundations of CouCE. We address each major point below by referencing the relevant sections of the full manuscript and indicate planned revisions to improve clarity without altering the core claims.


Referee: [Abstract] Abstract (structural distinction of pathways): The assumption that the two confounders occupy structurally distinct pathways that can be neutralized independently by ODBA and MSRCI is central but not supported by a formal causal graph or proof; if the pathways are not independent, the unified framework may not deliver the promised debiasing.

Authors: Section 3.1 of the manuscript presents the formal causal graph (Figure 2) along with the corresponding structural causal model. Background spurious correlations are modeled as creating backdoor paths through scene-context variables, while foreground nuisance perturbations act directly on object-level features; the two pathways are independent by construction in the SCM, which justifies neutralizing them separately with ODBA and MSRCI. We will revise the abstract to include a concise reference to this graph and the distinct pathways. revision: yes
Referee: [Abstract] Description of ODBA and MSRCI: There is no derivation showing that the orthogonal regularization and Fourier randomization equal do-calculus interventions (backdoor adjustment + invariance) on the posited graph; they read as heuristic regularizers, and success could be due to generic disentanglement rather than causal neutralization.

Authors: Section 4 derives ODBA as an approximation to backdoor adjustment: the variance-gated dictionary identifies and isolates spurious background patterns, after which soft orthogonal regularization blocks the backdoor path in embedding space. MSRCI implements a randomized intervention via multi-scale Fourier amplitude randomization on nuisance factors, with the symmetric KL constraint enforcing the resulting invariance. These steps follow directly from the interventional semantics on the graph in Section 3. The abstract is necessarily concise, but we will add a brief sentence linking the operations to the causal interventions. revision: partial
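For reference, the standard backdoor adjustment from Pearl's *Causality* (in the reference list) with confounder Z is the first equality below. Dictionary-based deconfounding methods in vision, such as the deconfounded image captioning work also listed, typically approximate the sum over Z with K confounder prototypes d_k and their priors, as in the second line. That second line is the generic approximation, not the paper's derivation; whether ODBA's variance-gated dictionary and orthogonality term reduce to it is exactly what the referee asks the authors to show.

```latex
P\big(Y \mid do(X)\big)
  = \sum_{z} P\big(Y \mid X, Z = z\big)\, P(Z = z)
  \approx \sum_{k=1}^{K} P\big(Y \mid X, d_k\big)\, P(d_k)
```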

Circularity Check

0 steps flagged

No circularity detected; derivation chain is self-contained with novel proposed components.

full rationale

The provided abstract and description introduce ODBA and MSRCI as new regularization techniques for addressing two distinct confounders in DML, without any equations, fitted parameters renamed as predictions, or load-bearing self-citations. No step reduces a claimed result to its own inputs by construction, and the methods are presented as independent proposals rather than derived from prior author work in a circular manner. The framework is described as integrable with existing losses, with performance claims based on experiments rather than tautological definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only view shows reliance on standard causal assumptions about distinct confounder roles but introduces no explicit free parameters or new entities beyond the named methods; full paper would be needed to audit any fitted components in ODBA or MSRCI.

axioms (1)

Domain assumption: Background spurious correlations and foreground nuisance perturbations have fundamentally distinct causal roles requiring separate interventions.
Invoked to explain why prior targeted solutions cannot be combined and to motivate the unified framework.

pith-pipeline@v0.9.1-grok · 5783 in / 1156 out tokens · 36595 ms · 2026-06-30T06:18:42.724976+00:00 · methodology


Reference graph

Works this paper leans on

58 extracted references · 4 canonical work pages · 1 internal anchor

[1] Martin Arjovsky, Leon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant Risk Minimization. arXiv preprint arXiv:1907.02893 (2019).

[2] Adrien Bardes, Jean Ponce, and Yann LeCun. 2022. VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. In Proceedings of the International Conference on Learning Representations (ICLR).

[3] Shubhang Bhatnagar and Narendra Ahuja. 2025. Potential Field Based Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 25549–25559.

[4] Kit Mills Bransby, Arian Beqiri, Woo-Jin Cho Kim, Jorge Oliveira, Agisilaos Chartsias, and Alberto Gomez. 2024. BackMix: Mitigating Shortcut Learning in Echocardiography with Minimal Supervision. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 570–579.

[5] Xinlei Chen and Kaiming He. 2021. Exploring Simple Siamese Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 15750–15758.

[6] Xiang Deng and Zhongfei Zhang. 2022. Deep Causal Metric Learning. In Proceedings of the International Conference on Machine Learning (ICML), Vol. 162. PMLR, 4993–5006.

[7] Xiaoyu Du, Zike Wu, Fuli Feng, Xiangnan He, and Jinhui Tang. 2022. Invariant Representation Learning for Multimedia Recommendation. In Proceedings of the 30th ACM International Conference on Multimedia. 619–628.

[8] Takuya Furusawa. 2024. Mean Field Theory in Deep Metric Learning. In Proceedings of the International Conference on Learning Representations (ICLR).

[9] Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. 2020. Shortcut Learning in Deep Neural Networks. Nature Machine Intelligence 2, 11 (2020), 665–673.

[10] Raia Hadsell, Sumit Chopra, and Yann LeCun. 2006. Dimensionality Reduction by Learning an Invariant Mapping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2. 1735–1742.

[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778.

[12] Xilin He, Jingyu Hu, Qinliang Lin, Cheng Luo, Weicheng Xie, Siyang Song, Muhammad Haris Khan, and Linlin Shen. 2024. Towards Combating Frequency Simplicity-biased Learning for Domain Generalization. In Advances in Neural Information Processing Systems (NeurIPS).

[13] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning (ICML). 448–456.

[14] Xiruo Jiang, Yazhou Yao, Xili Dai, Fumin Shen, Liqiang Nie, and Heng Tao Shen. 2024. Anti-Collapse Loss for Deep Metric Learning. IEEE Transactions on Multimedia 26 (2024), 11139–11150.

[15] Mahmut Kaya and Hasan S. Bilge. 2019. Deep Metric Learning: A Survey. Symmetry 11, 9 (2019), 1066.

[16] Sungyeon Kim, Boseung Jung, and Suha Kwak. 2023. HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 19903–19912.

[17] Sungyeon Kim, Dongwon Kim, Minsu Cho, and Suha Kwak. 2020. Proxy Anchor Loss for Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 3235–3244.

[18] Konstantin Kobs and Andreas Hotho. 2022. On Background Bias in Deep Metric Learning. arXiv preprint arXiv:2210.01615 (2022).

[19] Aneesh Komanduri, Yongkai Wu, Feng Chen, and Xintao Wu. 2024. Learning Causally Disentangled Representations via the Principle of Independent Causal Mechanisms. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24). International Joint Conferences on Artificial Intelligence Organization, 4308–4316.

[20] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 2013. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops. 554–561.

[21] Jongin Lim, Sangdoo Yun, Seulki Park, and Jin Young Choi. 2022. Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 212–222.

[22] Lizhao Liu, Shan Huang, Zhuangwei Zhuang, Ran Yang, Mingkui Tan, and Yaowei Wang. 2022. DAS: Densely-Anchored Sampling for Deep Metric Learning. In Proceedings of the European Conference on Computer Vision (ECCV).

[23] Marcin Maciąg and Grzegorz Sarwas. 2026. Adversarial Robustness of Proxy-Based Metric Learning Models. In Proceedings of the 21st International Conference on Computer Vision Theory and Applications (VISAPP). SciTePress, 469–476.

[24] Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, and Charles Blundell. 2021. Representation Learning via Invariant Causal Mechanisms. In Proceedings of the International Conference on Learning Representations (ICLR).

[25] Yair Movshovitz-Attias, Alexander Toshev, Thomas Leung, Sergey Ioffe, and Saurabh Singh. 2017. No Fuss Distance Metric Learning Using Proxies. In Proceedings of the IEEE International Conference on Computer Vision (ICCV). 360–368.

[26] Kevin Musgrave, Serge Belongie, and Ser-Nam Lim. 2020. A Metric Learning Reality Check. In Proceedings of the European Conference on Computer Vision (ECCV). 681–699.

[27] Alan V. Oppenheim and Jae S. Lim. 1981. The Importance of Phase in Signals. Proc. IEEE 69, 5 (1981), 529–541.

[28] Jinhee Park, Jisoo Park, Dagyeong Na, and Junseok Kwon. 2025. Deep Disentangled Metric Learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vol. 39. 19830–19838.

[29] Yash Patel, Giorgos Tolias, and Jiri Matas. 2024. Three Things to Know about Deep Metric Learning. arXiv preprint arXiv:2412.12432 (2024).

[30] Judea Pearl. 2009. Causality. Cambridge University Press.

[31] Wenjie Peng, Quhui Ke, Jinglin Liang, Shuangping Huang, and Tianshui Chen. 2026. Proxy-AN Loss for Deep Metric Learning. Neural Networks 195 (2026), 108254.

[33] Leon N. Piotrowski and Fergus William Campbell. 1982. A Demonstration of the Visual Importance and Flexibility of Spatial-Frequency Amplitude and Phase. Perception 11, 3 (1982), 337–346.

[34] Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, and Rong Jin. 2019. SoftTriple Loss: Deep Metric Learning Without Triplet Sampling. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 6449–6457.

[35] Ruihong Qiu, Sen Wang, Zhi Chen, Hongzhi Yin, and Zi Huang. 2021. CausalRec: Causal Inference for Visual Debiasing in Visually-Aware Recommendation. In Proceedings of the 29th ACM International Conference on Multimedia. 3844–3852.

[36] Li Ren, Chen Chen, Liqiang Wang, and Kien Hua. 2024. Towards Improved Proxy-based Deep Metric Learning via Data-Augmented Domain Adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).

[37] Karsten Roth, Oriol Vinyals, and Zeynep Akata. 2022. Integrating Language Guidance into Vision-based Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 16156–16168.

[38] Karsten Roth, Oriol Vinyals, and Zeynep Akata. 2022. Non-isotropy Regularization for Proxy-based Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7410–7420.

[39] Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. 2021. Toward Causal Representation Learning. Proc. IEEE 109, 5 (2021), 612–634.

[40] Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. FaceNet: A Unified Embedding for Face Recognition and Clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 815–823.

[41] Pedro Silva, Guilherme A. L. Silva, Pablo Coelho, Vander Freitas, Gladston Moreira, David Menotti, and Eduardo Luz. 2025. PD-Loss: Proxy-Decidability for Efficient Metric Learning. arXiv preprint arXiv:2508.17082 (2025).

[42] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, and Silvio Savarese. 2016. Deep Metric Learning via Lifted Structured Feature Embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4004–4012.

[43] Pengzhan Sun, Bo Wu, Xunsong Li, Wen Li, Lixin Duan, and Chuang Gan. 2021. Counterfactual Debiasing Inference for Compositional Action Recognition. In Proceedings of the 29th ACM International Conference on Multimedia. 3220–3228.

[44] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. 2011. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-. California Institute of Technology.

[46] Chengkun Wang, Wenzhao Zheng, Zheng Hua Zhu, Jie Zhou, and Jiwen Lu. 2024. Introspective Deep Metric Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 4 (2024), 1964–1980.

[47] Mengzhu Wang, Jianlong Yuan, Qi Qian, Zhibin Wang, and Hao Li. 2022. Semantic Data Augmentation based Distance Metric Learning for Domain Generalization. In Proceedings of the 30th ACM International Conference on Multimedia. 3214–3223.

[48] Tan Wang, Chang Zhou, Qianru Sun, and Hanwang Zhang. 2021. Causal Attention for Unbiased Visual Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

[49] Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matthew R. Scott. Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5017–5025.

[51] Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, and Philipp Krähenbühl. Sampling Matters in Deep Embedding Learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2840–2848.

[53] Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, and Qi Tian. 2021. A Fourier-Based Framework for Domain Generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 14378–14387.

[54] Xin Xu, Xin Yuan, Zheng Wang, Kai Zhang, and Ruimin Hu. 2022. Rank-in-Rank Loss for Person Re-Identification. ACM Transactions on Multimedia Computing, Communications, and Applications 18, 2s (2022), 1–21.

[55] Bailin Yang, Haoqiang Sun, Frederick W. B. Li, Zheng Chen, Jianlu Cai, and Chao Song. 2023. HSE: Hybrid Species Embedding for Deep Metric Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 11047–11057.

[56] Xu Yang, Hanwang Zhang, and Jianfei Cai. 2023. Deconfounded Image Captioning: A Causal Retrospect. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 11 (2023), 12996–13010.

[57] Xin Yuan, Xin Xu, Xiao Wang, Kai Zhang, Liang Liao, Zheng Wang, and Chia-Wen Lin. 2023. OSAP-Loss: Efficient Optimization of Average Precision via Involving Samples After Positive Ones Towards Remote Sensing Image Retrieval. CAAI Transactions on Intelligence Technology 8, 4 (2023), 1191–1212.

[58] Xin Yuan, Xin Xu, Zheng Wang, Kai Zhang, Wei Liu, and Ruimin Hu. 2023. Searching Parameterized Retrieval & Verification Loss for Re-Identification. IEEE Journal of Selected Topics in Signal Processing 17, 3 (2023), 560–574.

[1] [1]

Martin Arjovsky, Leon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant Risk Minimization.arXiv preprint arXiv:1907.02893(2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019

[2] [2]

Adrien Bardes, Jean Ponce, and Yann LeCun. 2022. VICReg: Variance-Invariance- Covariance Regularization for Self-Supervised Learning. InProceedings of the International Conference on Learning Representations (ICLR)

2022

[3] [3]

Shubhang Bhatnagar and Narendra Ahuja. 2025. Potential Field Based Deep Metric Learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 25549–25559

2025

[4] [4]

Kit Mills Bransby, Arian Beqiri, Woo-Jin Cho Kim, Jorge Oliveira, Agisilaos Chartsias, and Alberto Gomez. 2024. BackMix: Mitigating Shortcut Learning in Echocardiography with Minimal Supervision. InProceedings of the Interna- tional Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 570–579

2024

[5] [5]

Xinlei Chen and Kaiming He. 2021. Exploring Simple Siamese Representation Learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 15750–15758

2021

[6] [6]

Xiang Deng and Zhongfei Zhang. 2022. Deep Causal Metric Learning. InProceed- ings of the International Conference on Machine Learning (ICML), Vol. 162. PMLR, 4993–5006

2022

[7] [7]

Xiaoyu Du, Zike Wu, Fuli Feng, Xiangnan He, and Jinhui Tang. 2022. Invariant Representation Learning for Multimedia Recommendation. InProceedings of the 30th ACM International Conference on Multimedia. 619–628

2022

[8] [8]

Takuya Furusawa. 2024. Mean Field Theory in Deep Metric Learning. InProceed- ings of the International Conference on Learning Representations (ICLR)

2024

[9] [9]

Wichmann

Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Matthew Zeiler, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. 2020. Shortcut Learning in Deep Neural Networks.Nature Machine Intelligence2, 11 (2020), 665–673

2020

[10] [10]

Raia Hadsell, Sumit Chopra, and Yann LeCun. 2006. Dimensionality Reduction by Learning an Invariant Mapping. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2. 1735–1742

2006

[11] [11]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778

2016

[12] [12]

Xilin He, Jingyu Hu, Qinliang Lin, Cheng Luo, Weicheng Xie, Siyang Song, Muhammad Haris Khan, and Linlin Shen. 2024. Towards Combating Frequency Simplicity-biased Learning for Domain Generalization. InAdvances in Neural Information Processing Systems (NeurIPS)

2024

[13] [13]

Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. InProceedings of the International Conference on Machine Learning (ICML). 448–456

2015

[14] [14]

Xiruo Jiang, Yazhou Yao, Xili Dai, Fumin Shen, Liqiang Nie, and Heng Tao Shen. 2024. Anti-Collapse Loss for Deep Metric Learning.IEEE Transactions on Multimedia26 (2024), 11139–11150

2024

[15] [15]

Mahmut Kaya and Hasan S. Bilge. 2019. Deep Metric Learning: A Survey.Sym- metry11, 9 (2019), 1066

2019

[16] [16]

Sungyeon Kim, Boseung Jung, and Suha Kwak. 2023. HIER: Metric Learning Be- yond Class Labels via Hierarchical Regularization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 19903–19912

2023

[17] [17]

Sungyeon Kim, Dongwon Kim, Minsu Cho, and Suha Kwak. 2020. Proxy Anchor Loss for Deep Metric Learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 3235–3244

2020

[18] Konstantin Kobs and Andreas Hotho. 2022. On Background Bias in Deep Metric Learning. arXiv preprint arXiv:2210.01615 (2022).

[19] Aneesh Komanduri, Yongkai Wu, Feng Chen, and Xintao Wu. 2024. Learning Causally Disentangled Representations via the Principle of Independent Causal Mechanisms. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24). International Joint Conferences on Artificial Intelligence Organization, 4308–4316.

[20] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 2013. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops. 554–561.

[21] Jongin Lim, Sangdoo Yun, Seulki Park, and Jin Young Choi. 2022. Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 212–222.

[22] Lizhao Liu, Shan Huang, Zhuangwei Zhuang, Ran Yang, Mingkui Tan, and Yaowei Wang. 2022. DAS: Densely-Anchored Sampling for Deep Metric Learning. In Proceedings of the European Conference on Computer Vision (ECCV).

[23] Marcin Maciąg and Grzegorz Sarwas. 2026. Adversarial Robustness of Proxy-Based Metric Learning Models. In Proceedings of the 21st International Conference on Computer Vision Theory and Applications (VISAPP). SciTePress, 469–476.

[24] Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, and Charles Blundell. 2021. Representation Learning via Invariant Causal Mechanisms. In Proceedings of the International Conference on Learning Representations (ICLR).

[25] Yair Movshovitz-Attias, Alexander Toshev, Thomas Leung, Sergey Ioffe, and Saurabh Singh. 2017. No Fuss Distance Metric Learning Using Proxies. In Proceedings of the IEEE International Conference on Computer Vision (ICCV). 360–368.

[26] Kevin Musgrave, Serge Belongie, and Ser-Nam Lim. 2020. A Metric Learning Reality Check. In Proceedings of the European Conference on Computer Vision (ECCV). 681–699.

[27] Alan V. Oppenheim and James S. Lim. 1981. The Importance of Phase in Signals. Proc. IEEE 69, 5 (1981), 529–541.

[28] Jinhee Park, Jisoo Park, Dagyeong Na, and Junseok Kwon. 2025. Deep Disentangled Metric Learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vol. 39. 19830–19838.

[29] Yash Patel, Giorgos Tolias, and Jiri Matas. 2024. Three Things to Know about Deep Metric Learning. arXiv preprint arXiv:2412.12432 (2024).

[30] Judea Pearl. 2009. Causality. Cambridge University Press.

[31] Wenjie Peng, Quhui Ke, Jinglin Liang, Shuangping Huang, and Tianshui Chen. 2026. Proxy-AN Loss for Deep Metric Learning. Neural Networks 195 (2026), 108254.

[33] Leon N. Piotrowski and Fergus William Campbell. 1982. A Demonstration of the Visual Importance and Flexibility of Spatial-Frequency Amplitude and Phase. Perception 11, 3 (1982), 337–346.

[34] Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, and Rong Jin. 2019. SoftTriple Loss: Deep Metric Learning Without Triplet Sampling. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 6449–6457.

[35] Ruihong Qiu, Sen Wang, Zhi Chen, Hongzhi Yin, and Zi Huang. 2021. CausalRec: Causal Inference for Visual Debiasing in Visually-Aware Recommendation. In Proceedings of the 29th ACM International Conference on Multimedia. 3844–3852.

[36] Li Ren, Chen Chen, Liqiang Wang, and Kien Hua. 2024. Towards Improved Proxy-based Deep Metric Learning via Data-Augmented Domain Adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).

[37] Karsten Roth, Oriol Vinyals, and Zeynep Akata. 2022. Integrating Language Guidance into Vision-based Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 16156–16168.

[38] Karsten Roth, Oriol Vinyals, and Zeynep Akata. 2022. Non-isotropy Regularization for Proxy-based Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 7410–7420.

[39] Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. 2021. Toward Causal Representation Learning. Proc. IEEE 109, 5 (2021), 612–634.

[40] Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. FaceNet: A Unified Embedding for Face Recognition and Clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 815–823.

[41] Pedro Silva, Guilherme A. L. Silva, Pablo Coelho, Vander Freitas, Gladston Moreira, David Menotti, and Eduardo Luz. 2025. PD-Loss: Proxy-Decidability for Efficient Metric Learning. arXiv preprint arXiv:2508.17082 (2025).

[42] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, and Silvio Savarese. 2016. Deep Metric Learning via Lifted Structured Feature Embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4004–4012.

[43] Pengzhan Sun, Bo Wu, Xunsong Li, Wen Li, Lixin Duan, and Chuang Gan. 2021. Counterfactual Debiasing Inference for Compositional Action Recognition. In Proceedings of the 29th ACM International Conference on Multimedia. 3220–3228.

[44] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. 2011. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-. California Institute of Technology.

[46] Chengkun Wang, Wenzhao Zheng, Zheng Hua Zhu, Jie Zhou, and Jiwen Lu. 2024. Introspective Deep Metric Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 4 (2024), 1964–1980.

[47] Mengzhu Wang, Jianlong Yuan, Qi Qian, Zhibin Wang, and Hao Li. 2022. Semantic Data Augmentation based Distance Metric Learning for Domain Generalization. In Proceedings of the 30th ACM International Conference on Multimedia. 3214–3223.

[48] Tan Wang, Chang Zhou, Qianru Sun, and Hanwang Zhang. 2021. Causal Attention for Unbiased Visual Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

[49] Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matthew R. Scott. Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5017–5025.

[51] Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, and Philipp Krähenbühl. Sampling Matters in Deep Embedding Learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2840–2848.

[53] Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, and Qi Tian. 2021. A Fourier-Based Framework for Domain Generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 14378–14387.

[54] Xin Xu, Xin Yuan, Zheng Wang, Kai Zhang, and Ruimin Hu. 2022. Rank-in-Rank Loss for Person Re-Identification. ACM Transactions on Multimedia Computing, Communications, and Applications 18, 2s (2022), 1–21.

[55] Bailin Yang, Haoqiang Sun, Frederick W. B. Li, Zheng Chen, Jianlu Cai, and Chao Song. 2023. HSE: Hybrid Species Embedding for Deep Metric Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 11047–11057.

[56] Xu Yang, Hanwang Zhang, and Jianfei Cai. 2023. Deconfounded Image Captioning: A Causal Retrospect. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 11 (2023), 12996–13010.

[57] Xin Yuan, Xin Xu, Xiao Wang, Kai Zhang, Liang Liao, Zheng Wang, and Chia-Wen Lin. 2023. OSAP-Loss: Efficient Optimization of Average Precision via Involving Samples After Positive Ones Towards Remote Sensing Image Retrieval. CAAI Transactions on Intelligence Technology 8, 4 (2023), 1191–1212.

[58] Xin Yuan, Xin Xu, Zheng Wang, Kai Zhang, Wei Liu, and Ruimin Hu. 2023. Searching Parameterized Retrieval & Verification Loss for Re-Identification. IEEE Journal of Selected Topics in Signal Processing 17, 3 (2023), 560–574.

Supplementary Materia...