eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts
Pith reviewed 2026-05-08 13:16 UTC · model grok-4.3
The pith
eX2L decorrelates confounding features by penalizing Grad-CAM map similarity between primary and confounder classifiers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
eX2L decorrelates confounding features from a classifier's latent representations during training by penalizing the similarity between Grad-CAM activation maps generated by a primary label classifier and those from a concurrently trained confounder classifier. On the Spawrious Many-to-Many Hard Challenge benchmark, this yields an average accuracy of 82.24% and a worst-group accuracy of 66.31%, exceeding the prior state of the art by 5.49% and 10.90%, respectively. The work claims that functional domain invariance can be achieved by explicitly decoupling label and nuisance attributes at the group level.
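For context, average accuracy (AA) is plain test accuracy, while worst-group accuracy (WGA) is the accuracy of the single worst (label, confounder) group. A minimal sketch of both metrics follows; the grouping into label-background combinations follows the Spawrious convention, and the paper's exact grouping is an assumption here.

```python
import numpy as np

def aa_and_wga(preds, labels, groups):
    """Average accuracy and worst-group accuracy.

    `groups` encodes each example's (label, confounder) combination,
    e.g. (dog breed, background) in Spawrious-style benchmarks.
    """
    preds, labels, groups = map(np.asarray, (preds, labels, groups))
    correct = preds == labels
    wga = min(correct[groups == g].mean() for g in np.unique(groups))
    return correct.mean(), wga
```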
What carries the argument
The central mechanism is contrastive penalization of Grad-CAM activation maps from the primary label classifier and the parallel confounder classifier to enforce dissimilarity in how each attends to image regions.
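The reviewed text does not reproduce the paper's exact loss; one plausible form is L = CE(ŷ, y) + CE(ĉ, c) + λ · sim(M_y, M_c), where M_y and M_c are the two Grad-CAM maps. The PyTorch sketch below illustrates that kind of objective only: the shared backbone, cosine similarity as the similarity measure, and the weight `lam` are this review's assumptions, not the authors' published formulation.

```python
import torch
import torch.nn.functional as F

def grad_cam(features, class_score):
    # Differentiable Grad-CAM: channel weights are the spatially pooled
    # gradients of the class score w.r.t. the last conv feature maps;
    # the map is the ReLU of the weighted channel sum, peak-normalized.
    grads = torch.autograd.grad(class_score.sum(), features,
                                create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)   # (B, C, 1, 1)
    cam = F.relu((weights * features).sum(dim=1))    # (B, H, W)
    peak = cam.flatten(1).amax(dim=1).clamp_min(1e-8)
    return cam / peak[:, None, None]

def ex2l_style_loss(backbone, label_head, conf_head, x, y, c, lam=1.0):
    # Shared-backbone variant (an assumption): both heads read the same
    # final conv features, and the penalty is the cosine similarity of
    # their Grad-CAM maps for the true label / confounder classes.
    feats = backbone(x)                              # (B, C, H, W)
    pooled = feats.mean(dim=(2, 3))
    logits_y, logits_c = label_head(pooled), conf_head(pooled)
    score_y = logits_y.gather(1, y[:, None]).squeeze(1)
    score_c = logits_c.gather(1, c[:, None]).squeeze(1)
    cam_y = grad_cam(feats, score_y)
    cam_c = grad_cam(feats, score_c)
    overlap = F.cosine_similarity(cam_y.flatten(1),
                                  cam_c.flatten(1), dim=1).mean()
    return (F.cross_entropy(logits_y, y)
            + F.cross_entropy(logits_c, c)
            + lam * overlap)
```

Because Grad-CAM maps are non-negative, the overlap term lies in [0, 1], and minimizing it pushes the two heads toward disjoint spatial support.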
If this is right
- Classifiers achieve higher average and worst-group accuracy on distribution-shift benchmarks by reducing dependence on spurious features.
- Functional domain invariance follows from explicit group-level decoupling of label and nuisance attributes.
- The framework supplies built-in interpretability because training directly manipulates visual explanation maps.
- The approach outperforms empirical risk minimization and prior methods on the tested many-to-many hard shifts.
Where Pith is reading between the lines
- The same paired-classifier structure could be tested with other explanation techniques to check whether the gains depend on Grad-CAM specifically.
- The explicit separation of attributes may simplify post-training bias audits in deployed vision systems.
- Adding the confounder branch increases training cost, which would need to be weighed against robustness gains in resource-limited settings.
- The method invites experiments on whether the same decorrelation principle transfers to non-image modalities that have their own explanation tools.
Load-bearing premise
Penalizing similarity between Grad-CAM activation maps of the primary label classifier and the confounder classifier will reliably decorrelate confounding features from the latent representations without harming the primary task or introducing new unintended correlations.
What would settle it
If ablating the contrastive penalization term on the Spawrious Many-to-Many Hard Challenge produces no improvement in worst-group accuracy over standard training, or if the confounder classifier does not capture the relevant nuisance attributes, the proposed decorrelation mechanism would be falsified.
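A sketch of that ablation test, where `train_with_penalty` and `evaluate_wga` are hypothetical stand-ins for a training loop built on the loss and metric sketches above; nothing here reproduces the authors' actual protocol.

```python
import numpy as np

# Hypothetical ablation: lam=0.0 is the two-head ERM baseline with the
# explanation-similarity penalty switched off.
for lam in (0.0, 0.1, 0.5, 1.0):
    wgas = [evaluate_wga(train_with_penalty(lam=lam, seed=s))
            for s in range(3)]
    print(f"lam={lam}: WGA {np.mean(wgas):.3f} +/- {np.std(wgas):.3f}")
# If the lam=0.0 row matches the lam>0 rows within error bars, the
# claimed decorrelation mechanism does not explain the reported gains.
```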
Original abstract
Despite extensive research into mitigating distribution shifts, many existing algorithms yield inconsistent performance, often failing to outperform baseline Empirical Risk Minimization (ERM) across diverse scenarios. Furthermore, high algorithmic complexity frequently limits interpretability and offers only an indirect means of addressing spurious correlations. We propose eXplaining to Learn (eX2L): an interpretable, explanation-based framework that decorrelates confounding features from a classifier's latent representations during training. eX2L achieves this by penalizing the similarity between Grad-CAM activation maps generated by a primary label classifier and those from a concurrently trained confounder classifier. On the rigorous Spawrious Many-to-Many Hard Challenge benchmark, eX2L achieves an average accuracy (AA) of 82.24% ± 3.87% and a worst-group accuracy (WGA) of 66.31% ± 8.73%, outperforming the current state-of-the-art (SOTA) by 5.49% and 10.90%, respectively. Beyond its competitive performance, eX2L demonstrates that functional domain invariance can be achieved by explicitly decoupling label and nuisance attributes at the group level.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes eX2L, an interpretable regularization method for distribution shifts that trains a primary label classifier alongside a confounder classifier and adds a penalty on the similarity of their Grad-CAM activation maps. This is claimed to decorrelate confounding features from the primary classifier's latent representations. On the Spawrious Many-to-Many Hard Challenge, eX2L reports average accuracy of 82.24% ± 3.87% and worst-group accuracy of 66.31% ± 8.73%, outperforming prior SOTA by 5.49% and 10.90% respectively.
Significance. If the regularization mechanism reliably decorrelates confounders from latent representations without introducing new correlations, the approach would offer a low-complexity, explanation-driven alternative to existing robustness methods. The reported benchmark gains on a rigorous many-to-many shift task would be practically relevant for computer vision applications where spurious correlations are common.
major comments (2)
- [Abstract] The central claim that penalizing Grad-CAM map similarity 'decorrelates confounding features from a classifier's latent representations' is not supported by the described construction. Grad-CAM produces input-space saliency maps from gradients w.r.t. the final convolutional feature maps; the penalty therefore only encourages the two heads to attend to different spatial regions of the input. Nothing in the formulation constrains the post-convolutional latent embeddings themselves, leaving open the possibility that the primary encoder still encodes confounder information in a form invisible to the Grad-CAM head.
- [Abstract, Results] The reported gains (AA 82.24% ± 3.87%, WGA 66.31% ± 8.73%) are presented without any ablation on the explanation-similarity penalty weight, without verification that the penalty term actually reduces correlation between latent features and confounders, and without implementation details sufficient to reproduce the numbers. These omissions make it impossible to confirm that the claimed decorrelation mechanism is responsible for the observed improvements.
minor comments (2)
- The manuscript should include a clear statement of the exact loss formulation, including how the confounder classifier is trained and how the penalty is weighted relative to the primary task loss.
- Error bars are reported but the number of runs and random seeds are not stated; this information is needed to assess the statistical significance of the 5.49% and 10.90% improvements over SOTA.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below, providing clarifications and committing to revisions that strengthen the presentation of the method and its empirical validation.
Point-by-point responses
Referee: [Abstract] The central claim that penalizing Grad-CAM map similarity 'decorrelates confounding features from a classifier's latent representations' is not supported by the described construction. Grad-CAM produces input-space saliency maps from gradients w.r.t. the final convolutional feature maps; the penalty therefore only encourages the two heads to attend to different spatial regions of the input. Nothing in the formulation constrains the post-convolutional latent embeddings themselves, leaving open the possibility that the primary encoder still encodes confounder information in a form invisible to the Grad-CAM head.
Authors: We appreciate the referee's precise analysis of the mechanism. The penalty operates on Grad-CAM saliency maps derived from gradients with respect to the final convolutional feature maps, which encourages the primary classifier to attend to different spatial regions than the confounder classifier. While this does not impose an explicit constraint on every possible encoding within the latent feature maps, the resulting difference in attention patterns is intended to reduce the primary classifier's reliance on confounding features for its predictions. We acknowledge that the original wording in the abstract overstates the direct effect on latent representations. In the revision we will rephrase the central claim to emphasize that the approach 'encourages the primary classifier to rely on label-relevant spatial features by penalizing overlap in explanation maps with a confounder classifier' and will add a short discussion of the indirect influence on feature usage. revision: partial
Referee: [Abstract, Results] The reported gains (AA 82.24% ± 3.87%, WGA 66.31% ± 8.73%) are presented without any ablation on the explanation-similarity penalty weight, without verification that the penalty term actually reduces correlation between latent features and confounders, and without implementation details sufficient to reproduce the numbers. These omissions make it impossible to confirm that the claimed decorrelation mechanism is responsible for the observed improvements.
Authors: We agree that these supporting analyses are necessary to substantiate the role of the explanation-similarity penalty. In the revised manuscript we will add (i) an ablation study sweeping the penalty weight and reporting its effect on both average and worst-group accuracy, (ii) quantitative verification that the penalty reduces correlation between the primary classifier's latent features and confounder labels (e.g., via linear probing accuracy or estimated mutual information), and (iii) expanded implementation details, including exact hyper-parameters, network architectures, and training schedules, placed in the main text or a dedicated reproducibility section. These additions will allow readers to confirm that the observed gains are attributable to the proposed regularization. revision: yes
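One plausible instantiation of check (ii) is a linear probe on frozen features; the scikit-learn probe below and its majority-class chance baseline are this review's assumptions, not the authors' committed protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def confounder_probe(latents, conf_labels, seed=0):
    """Linear-probe decorrelation check: a probe that predicts the
    confounder from frozen latents no better than the majority-class
    baseline is evidence the penalty removed that information."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        latents, conf_labels, test_size=0.3,
        random_state=seed, stratify=conf_labels)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    chance = np.bincount(y_te).max() / len(y_te)  # majority-class baseline
    return acc, chance
```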
Circularity Check
No circularity: method explicitly defined via penalty on Grad-CAM similarity; performance claims are empirical measurements, not derived quantities.
Full rationale
The paper defines eX2L directly as a regularization term that penalizes similarity between Grad-CAM activation maps of the primary classifier and a concurrent confounder classifier. This construction is stated as the mechanism for decorrelating confounders from latent representations, but the link is presented as a modeling choice rather than a mathematical reduction. No equations show a fitted parameter or self-referential definition that forces the target metric by construction. Reported accuracies on Spawrious are measured outcomes on held-out data, not predictions derived from the same inputs used to fit the model. No self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text as load-bearing steps. The derivation chain is therefore self-contained and non-circular.
Axiom & Free-Parameter Ledger
free parameters (1)
- explanation-similarity penalty weight
axioms (1)
- Domain assumption: Grad-CAM activation maps accurately reflect the decision-relevant features of each classifier.
Reference graph
Works this paper leans on
- [1] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B. Sanity checks for saliency maps. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18), pp. 9525–9536. Curran Associates Inc., 2018.
- [2] Angarano, S., Martini, M., Salvetti, F., Mazzia, V., and Chiaberge, M. Back-to-bones: Rediscovering the role of backbones in domain generalization. Pattern Recognition, 156:110762, 2024. doi:10.1016/j.patcog.2024.110762.
- [3] Arjovsky, M., Bottou, L., Gulrajani, I., and Lopez-Paz, D. Invariant risk minimization. arXiv preprint, 2019. URL https://arxiv.org/abs/1907.02893.
- [4] Dammu, P. P. S. and Shah, C. Detecting spurious correlations via robust visual concepts in real and AI-generated image classification. In XAI in Action: Past, Present, and Future Applications, 2023. URL https://openreview.net/forum?id=ewagDhIy8Y.
- [5] Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.
- [6] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012. URL http://jmlr.org/papers/v13/gretton12a.html.
- [7] Gulrajani, I. and Lopez-Paz, D. In search of lost domain generalization. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=lQdXeXDoWtI.
- [8] Hagos, M. T., Curran, K. M., and Namee, B. M. Identifying spurious correlations and correcting them with an explanation-based learning. arXiv preprint, 2022. URL https://arxiv.org/abs/2211.08285.
- [9] Han, X. and Tsvetkov, Y. Influence tuning: Demoting spurious correlations via instance attribution and instance-driven updates. In Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4398–4409. Association for Computational Linguistics, 2021.
- [10] Kirichenko, P., Izmailov, P., and Wilson, A. G. Last layer re-training is sufficient for robustness to spurious correlations. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=ylpMUNYWpX.
- [11] Koh, P. W., Sagawa, S., Marklund, H., Xie, S. M., Zhang, M., Balsubramani, A., Hu, W., Yasunaga, M., Phillips, R. L., Gao, I., Lee, T., David, E., Stavness, I., Guo, W., Earnshaw, B. A., Haque, I. S., Beery, S. M., Leskovec, J., Kundaje, A. B., Pierson, E., Levine, S., Finn, C., and Liang, P. WILDS: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, 2021.
- [12] Li, H., Pan, S. J., Wang, S., and Kot, A. C. Domain generalization with adversarial feature learning. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5400–5409, 2018. doi:10.1109/CVPR.2018.00566.
- [13] Liu, E. Z., Haghgoo, B., Chen, A. S., Raghunathan, A., Koh, P. W., Sagawa, S., Liang, P., and Finn, C. Just train twice: Improving group robustness without training group information. In Proceedings of the 38th International Conference on Machine Learning, pp. 6781–6792. PMLR, 2021.
- [14] Liu, J., Shen, Z., He, Y., Zhang, X., Xu, R., Yu, H., and Cui, P. Towards out-of-distribution generalization: A survey. arXiv preprint, 2021. URL https://arxiv.org/abs/2108.13624.
- [15] Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B., and Bachem, O. Challenging common assumptions in the unsupervised learning of disentangled representations. In International Conference on Machine Learning, pp. 4114–4124, 2019.
- [16] Long, M., Cao, Z., Wang, J., and Jordan, M. I. Conditional adversarial domain adaptation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18), pp. 1647–1657. Curran Associates Inc., 2018.
- [17] Lynch, A., Dovonon, G. J.-S., Kaddour, J., and Silva, R. Spawrious: A benchmark for fine control of spurious correlation biases. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=0S0oITNTCz.
- [18] McInnes, L., Healy, J., and Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint, 2018. URL https://arxiv.org/abs/1802.03426.
- [19] Milletari, F., Navab, N., and Ahmadi, S.-A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571, 2016. doi:10.1109/3DV.2016.79.
- [20] Ming, Y., Yin, H., and Li, Y. On the impact of spurious correlation for out-of-distribution detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 10051–10059, 2022. doi:10.1609/aaai.v36i9.21244.
- [21] Monga, A., Somou, R., Zhang, S., and Ortega, A. Mitigating spurious correlations in image recognition models using performance-based feature sampling. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=DRv8wcssgs.
- [22] Nam, J., Cha, H., Ahn, S., Lee, J., and Shin, J. Learning from failure: De-biasing classifier from biased classifier. In Advances in Neural Information Processing Systems, volume 33, pp. 20673–20684, 2020.
- [23] Rahman, M. A. and Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In Advances in Visual Computing, pp. 234–244. Springer International Publishing, 2016.
- [24] Sagawa, S., Koh, P. W., Hashimoto, T. B., and Liang, P. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint, 2019. URL https://arxiv.org/abs/1911.08731.
- [25] Sagawa, S., Koh, P. W., Hashimoto, T. B., and Liang, P. Distributionally robust neural networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=ryxGuJrFvS.
- [26] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2):336–359, 2019. doi:10.1007/s11263-019-01228-7.
- [27] Shen, H. and Zhao, Z. Boosting test performance with importance sampling: a subpopulation perspective. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI'25/IAAI'25/EAAI'25), 2025.
- [28] Suhail, P., Goel, V., and Sethi, A. Shortcut learning susceptibility in vision classifiers. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=dvafjL2zXP.
- [29] Sun, B. and Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Computer Vision - ECCV 2016 Workshops, pp. 443–450. Springer International Publishing, 2016. ISBN 978-3-319-49409-8.
- [30] Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle. In IEEE Information Theory Workshop, pp. 1–5, 2015.
- [31] Wang, Z., Bovik, A., Sheikh, H., and Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. doi:10.1109/TIP.2003.819861.
- [32] Wiles, O., Gowal, S., Stimberg, F., Rebuffi, S.-A., Ktena, I., Dvijotham, K. D., and Cemgil, A. T. A fine-grained analysis on distribution shift. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=Dl4LetuLdyK.
- [33] Yang, Y., Zhang, H., Katabi, D., and Ghassemi, M. Change is hard: A closer look at subpopulation shift. In International Conference on Machine Learning, 2023.
- [34] Ye, H., Xie, C., Cai, T., Li, R., Li, Z., and Wang, L. Towards a theoretical framework of out-of-distribution generalization. In Advances in Neural Information Processing Systems, volume 34, pp. 23519–23531. Curran Associates, Inc., 2021.