eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts
Pith reviewed 2026-05-08 13:16 UTC · model grok-4.3
The pith
eX2L decorrelates confounding features by penalizing Grad-CAM map similarity between primary and confounder classifiers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
eX2L decorrelates confounding features from a classifier's latent representations during training by penalizing the similarity between Grad-CAM activation maps generated by a primary label classifier and those from a concurrently trained confounder classifier. On the Spawrious Many-to-Many Hard Challenge benchmark, this yields an average accuracy of 82.24% and a worst-group accuracy of 66.31%, exceeding the prior state of the art by 5.49% and 10.90%, respectively. The work claims that functional domain invariance can be achieved by explicitly decoupling label and nuisance attributes at the group level.
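For context, average accuracy (AA) is plain test accuracy, while worst-group accuracy (WGA) is the accuracy of the single worst (label, confounder) group. A minimal sketch of both metrics follows; the grouping into label-background combinations follows the Spawrious convention, and the paper's exact grouping is an assumption here.

```python
import numpy as np

def aa_and_wga(preds, labels, groups):
    """Average accuracy and worst-group accuracy.

    `groups` encodes each example's (label, confounder) combination,
    e.g. (dog breed, background) in Spawrious-style benchmarks.
    """
    preds, labels, groups = map(np.asarray, (preds, labels, groups))
    correct = preds == labels
    wga = min(correct[groups == g].mean() for g in np.unique(groups))
    return correct.mean(), wga
```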
What carries the argument
The central mechanism is contrastive penalization of Grad-CAM activation maps from the primary label classifier and the parallel confounder classifier to enforce dissimilarity in how each attends to image regions.
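The reviewed text does not reproduce the paper's exact loss; one plausible form is L = CE(ŷ, y) + CE(ĉ, c) + λ · sim(M_y, M_c), where M_y and M_c are the two Grad-CAM maps. The PyTorch sketch below illustrates that kind of objective only: the shared backbone, cosine similarity as the similarity measure, and the weight `lam` are this review's assumptions, not the authors' published formulation.

```python
import torch
import torch.nn.functional as F

def grad_cam(features, class_score):
    # Differentiable Grad-CAM: channel weights are the spatially pooled
    # gradients of the class score w.r.t. the last conv feature maps;
    # the map is the ReLU of the weighted channel sum, peak-normalized.
    grads = torch.autograd.grad(class_score.sum(), features,
                                create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)   # (B, C, 1, 1)
    cam = F.relu((weights * features).sum(dim=1))    # (B, H, W)
    peak = cam.flatten(1).amax(dim=1).clamp_min(1e-8)
    return cam / peak[:, None, None]

def ex2l_style_loss(backbone, label_head, conf_head, x, y, c, lam=1.0):
    # Shared-backbone variant (an assumption): both heads read the same
    # final conv features, and the penalty is the cosine similarity of
    # their Grad-CAM maps for the true label / confounder classes.
    feats = backbone(x)                              # (B, C, H, W)
    pooled = feats.mean(dim=(2, 3))
    logits_y, logits_c = label_head(pooled), conf_head(pooled)
    score_y = logits_y.gather(1, y[:, None]).squeeze(1)
    score_c = logits_c.gather(1, c[:, None]).squeeze(1)
    cam_y = grad_cam(feats, score_y)
    cam_c = grad_cam(feats, score_c)
    overlap = F.cosine_similarity(cam_y.flatten(1),
                                  cam_c.flatten(1), dim=1).mean()
    return (F.cross_entropy(logits_y, y)
            + F.cross_entropy(logits_c, c)
            + lam * overlap)
```

Because Grad-CAM maps are non-negative, the overlap term lies in [0, 1], and minimizing it pushes the two heads toward disjoint spatial support.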
If this is right
- Classifiers achieve higher average and worst-group accuracy on distribution-shift benchmarks by reducing dependence on spurious features.
- Functional domain invariance follows from explicit group-level decoupling of label and nuisance attributes.
- The framework supplies built-in interpretability because training directly manipulates visual explanation maps.
- The approach outperforms empirical risk minimization and prior methods on the tested many-to-many hard shifts.
Where Pith is reading between the lines
- The same paired-classifier structure could be tested with other explanation techniques to check whether the gains depend on Grad-CAM specifically.
- The explicit separation of attributes may simplify post-training bias audits in deployed vision systems.
- Adding the confounder branch increases training cost, which would need to be weighed against robustness gains in resource-limited settings.
- The method invites experiments on whether the same decorrelation principle transfers to non-image modalities that have their own explanation tools.
Load-bearing premise
Penalizing similarity between Grad-CAM activation maps of the primary label classifier and the confounder classifier will reliably decorrelate confounding features from the latent representations without harming the primary task or introducing new unintended correlations.
What would settle it
If ablating the contrastive penalization term on the Spawrious Many-to-Many Hard Challenge produces no improvement in worst-group accuracy over standard training, or if the confounder classifier does not capture the relevant nuisance attributes, the proposed decorrelation mechanism would be falsified.
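A sketch of that ablation test, where `train_with_penalty` and `evaluate_wga` are hypothetical stand-ins for a training loop built on the loss and metric sketches above; nothing here reproduces the authors' actual protocol.

```python
import numpy as np

# Hypothetical ablation: lam=0.0 is the two-head ERM baseline with the
# explanation-similarity penalty switched off.
for lam in (0.0, 0.1, 0.5, 1.0):
    wgas = [evaluate_wga(train_with_penalty(lam=lam, seed=s))
            for s in range(3)]
    print(f"lam={lam}: WGA {np.mean(wgas):.3f} +/- {np.std(wgas):.3f}")
# If the lam=0.0 row matches the lam>0 rows within error bars, the
# claimed decorrelation mechanism does not explain the reported gains.
```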
Original abstract
Despite extensive research into mitigating distribution shifts, many existing algorithms yield inconsistent performance, often failing to outperform baseline Empirical Risk Minimization (ERM) across diverse scenarios. Furthermore, high algorithmic complexity frequently limits interpretability and offers only an indirect means of addressing spurious correlations. We propose eXplaining to Learn (eX2L): an interpretable, explanation-based framework that decorrelates confounding features from a classifier's latent representations during training. eX2L achieves this by penalizing the similarity between Grad-CAM activation maps generated by a primary label classifier and those from a concurrently trained confounder classifier. On the rigorous Spawrious Many-to-Many Hard Challenge benchmark, eX2L achieves an average accuracy (AA) of 82.24% ± 3.87% and a worst-group accuracy (WGA) of 66.31% ± 8.73%, outperforming the current state-of-the-art (SOTA) by 5.49% and 10.90%, respectively. Beyond its competitive performance, eX2L demonstrates that functional domain invariance can be achieved by explicitly decoupling label and nuisance attributes at the group level.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes eX2L, an interpretable regularization method for distribution shifts that trains a primary label classifier alongside a confounder classifier and adds a penalty on the similarity of their Grad-CAM activation maps. This is claimed to decorrelate confounding features from the primary classifier's latent representations. On the Spawrious Many-to-Many Hard Challenge, eX2L reports average accuracy of 82.24% ± 3.87% and worst-group accuracy of 66.31% ± 8.73%, outperforming prior SOTA by 5.49% and 10.90% respectively.
Significance. If the regularization mechanism reliably decorrelates confounders from latent representations without introducing new correlations, the approach would offer a low-complexity, explanation-driven alternative to existing robustness methods. The reported benchmark gains on a rigorous many-to-many shift task would be practically relevant for computer vision applications where spurious correlations are common.
major comments (2)
- [Abstract] The central claim that penalizing Grad-CAM map similarity 'decorrelates confounding features from a classifier's latent representations' is not supported by the described construction. Grad-CAM produces input-space saliency maps from gradients w.r.t. the final convolutional feature maps; the penalty therefore only encourages the two heads to attend to different spatial regions of the input. Nothing in the formulation constrains the post-convolutional latent embeddings themselves, leaving open the possibility that the primary encoder still encodes confounder information in a form invisible to the Grad-CAM head.
- [Abstract, Results] The reported gains (AA 82.24% ± 3.87%, WGA 66.31% ± 8.73%) are presented without any ablation on the explanation-similarity penalty weight, without verification that the penalty term actually reduces correlation between latent features and confounders, and without implementation details sufficient to reproduce the numbers. These omissions make it impossible to confirm that the claimed decorrelation mechanism is responsible for the observed improvements.
minor comments (2)
- The manuscript should include a clear statement of the exact loss formulation, including how the confounder classifier is trained and how the penalty is weighted relative to the primary task loss.
- Error bars are reported but the number of runs and random seeds are not stated; this information is needed to assess the statistical significance of the 5.49% and 10.90% improvements over SOTA.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below, providing clarifications and committing to revisions that strengthen the presentation of the method and its empirical validation.
Point-by-point responses
Referee: [Abstract] The central claim that penalizing Grad-CAM map similarity 'decorrelates confounding features from a classifier's latent representations' is not supported by the described construction. Grad-CAM produces input-space saliency maps from gradients w.r.t. the final convolutional feature maps; the penalty therefore only encourages the two heads to attend to different spatial regions of the input. Nothing in the formulation constrains the post-convolutional latent embeddings themselves, leaving open the possibility that the primary encoder still encodes confounder information in a form invisible to the Grad-CAM head.
Authors: We appreciate the referee's precise analysis of the mechanism. The penalty operates on Grad-CAM saliency maps derived from gradients with respect to the final convolutional feature maps, which encourages the primary classifier to attend to different spatial regions than the confounder classifier. While this does not impose an explicit constraint on every possible encoding within the latent feature maps, the resulting difference in attention patterns is intended to reduce the primary classifier's reliance on confounding features for its predictions. We acknowledge that the original wording in the abstract overstates the direct effect on latent representations. In the revision we will rephrase the central claim to emphasize that the approach 'encourages the primary classifier to rely on label-relevant spatial features by penalizing overlap in explanation maps with a confounder classifier' and will add a short discussion of the indirect influence on feature usage. revision: partial
Referee: [Abstract, Results] The reported gains (AA 82.24% ± 3.87%, WGA 66.31% ± 8.73%) are presented without any ablation on the explanation-similarity penalty weight, without verification that the penalty term actually reduces correlation between latent features and confounders, and without implementation details sufficient to reproduce the numbers. These omissions make it impossible to confirm that the claimed decorrelation mechanism is responsible for the observed improvements.
Authors: We agree that these supporting analyses are necessary to substantiate the role of the explanation-similarity penalty. In the revised manuscript we will add (i) an ablation study sweeping the penalty weight and reporting its effect on both average and worst-group accuracy, (ii) quantitative verification that the penalty reduces correlation between the primary classifier's latent features and confounder labels (e.g., via linear probing accuracy or estimated mutual information), and (iii) expanded implementation details, including exact hyper-parameters, network architectures, and training schedules, placed in the main text or a dedicated reproducibility section. These additions will allow readers to confirm that the observed gains are attributable to the proposed regularization. revision: yes
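One plausible instantiation of check (ii) is a linear probe on frozen features; the scikit-learn probe below and its majority-class chance baseline are this review's assumptions, not the authors' committed protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def confounder_probe(latents, conf_labels, seed=0):
    """Linear-probe decorrelation check: a probe that predicts the
    confounder from frozen latents no better than the majority-class
    baseline is evidence the penalty removed that information."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        latents, conf_labels, test_size=0.3,
        random_state=seed, stratify=conf_labels)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    chance = np.bincount(y_te).max() / len(y_te)  # majority-class baseline
    return acc, chance
```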
Circularity Check
No circularity: method explicitly defined via penalty on Grad-CAM similarity; performance claims are empirical measurements, not derived quantities.
Full rationale
The paper defines eX2L directly as a regularization term that penalizes similarity between Grad-CAM activation maps of the primary classifier and a concurrent confounder classifier. This construction is stated as the mechanism for decorrelating confounders from latent representations, but the link is presented as a modeling choice rather than a mathematical reduction. No equations show a fitted parameter or self-referential definition that forces the target metric by construction. Reported accuracies on Spawrious are measured outcomes on held-out data, not predictions derived from the same inputs used to fit the model. No self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text as load-bearing steps. The derivation chain is therefore self-contained and non-circular.
Axiom & Free-Parameter Ledger
free parameters (1)
- explanation-similarity penalty weight
axioms (1)
- Domain assumption: Grad-CAM activation maps accurately reflect the decision-relevant features of each classifier.
Reference graph
Works this paper leans on
- [1] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B. Sanity checks for saliency maps. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18), pp. 9525–9536. Curran Associates Inc., 2018.
- [2] Angarano, S., Martini, M., Salvetti, F., Mazzia, V., and Chiaberge, M. Back-to-bones: Rediscovering the role of backbones in domain generalization. Pattern Recognition, 156:110762, 2024. doi:10.1016/j.patcog.2024.110762.
- [3] Arjovsky, M., Bottou, L., Gulrajani, I., and Lopez-Paz, D. Invariant risk minimization. arXiv preprint, 2019. URL https://arxiv.org/abs/1907.02893.
- [4] Dammu, P. P. S. and Shah, C. Detecting spurious correlations via robust visual concepts in real and AI-generated image classification. In XAI in Action: Past, Present, and Future Applications, 2023. URL https://openreview.net/forum?id=ewagDhIy8Y.
- [5] Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.
- [6] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012. URL http://jmlr.org/papers/v13/gretton12a.html.
- [7] Gulrajani, I. and Lopez-Paz, D. In search of lost domain generalization. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=lQdXeXDoWtI.
- [8] Hagos, M. T., Curran, K. M., and Namee, B. M. Identifying spurious correlations and correcting them with an explanation-based learning. arXiv preprint, 2022. URL https://arxiv.org/abs/2211.08285.
- [9] Han, X. and Tsvetkov, Y. Influence tuning: Demoting spurious correlations via instance attribution and instance-driven updates. In Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4398–4409. Association for Computational Linguistics, 2021.
- [10] Kirichenko, P., Izmailov, P., and Wilson, A. G. Last layer re-training is sufficient for robustness to spurious correlations. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=ylpMUNYWpX.
- [11] Koh, P. W., Sagawa, S., Marklund, H., Xie, S. M., Zhang, M., Balsubramani, A., Hu, W., Yasunaga, M., Phillips, R. L., Gao, I., Lee, T., David, E., Stavness, I., Guo, W., Earnshaw, B. A., Haque, I. S., Beery, S. M., Leskovec, J., Kundaje, A. B., Pierson, E., Levine, S., Finn, C., and Liang, P. WILDS: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, 2021.
- [12] Li, H., Pan, S. J., Wang, S., and Kot, A. C. Domain generalization with adversarial feature learning. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5400–5409, 2018. doi:10.1109/CVPR.2018.00566.
- [13] Liu, E. Z., Haghgoo, B., Chen, A. S., Raghunathan, A., Koh, P. W., Sagawa, S., Liang, P., and Finn, C. Just train twice: Improving group robustness without training group information. In Proceedings of the 38th International Conference on Machine Learning, pp. 6781–6792. PMLR, 2021.
- [14] Liu, J., Shen, Z., He, Y., Zhang, X., Xu, R., Yu, H., and Cui, P. Towards out-of-distribution generalization: A survey. arXiv preprint, 2021. URL https://arxiv.org/abs/2108.13624.
- [15] Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B., and Bachem, O. Challenging common assumptions in the unsupervised learning of disentangled representations. In International Conference on Machine Learning, pp. 4114–4124, 2019.
- [16] Long, M., Cao, Z., Wang, J., and Jordan, M. I. Conditional adversarial domain adaptation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18), pp. 1647–1657. Curran Associates Inc., 2018.
- [17] Lynch, A., Dovonon, G. J.-S., Kaddour, J., and Silva, R. Spawrious: A benchmark for fine control of spurious correlation biases. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=0S0oITNTCz.
- [18] McInnes, L., Healy, J., and Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint, 2018. URL https://arxiv.org/abs/1802.03426.
- [19] Milletari, F., Navab, N., and Ahmadi, S.-A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571, 2016. doi:10.1109/3DV.2016.79.
- [20] Ming, Y., Yin, H., and Li, Y. On the impact of spurious correlation for out-of-distribution detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 10051–10059, 2022. doi:10.1609/aaai.v36i9.21244.
- [21] Monga, A., Somou, R., Zhang, S., and Ortega, A. Mitigating spurious correlations in image recognition models using performance-based feature sampling. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=DRv8wcssgs.
- [22] Nam, J., Cha, H., Ahn, S., Lee, J., and Shin, J. Learning from failure: De-biasing classifier from biased classifier. In Advances in Neural Information Processing Systems, volume 33, pp. 20673–20684, 2020.
- [23] Rahman, M. A. and Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In Advances in Visual Computing, pp. 234–244. Springer International Publishing, 2016.
- [24] Sagawa, S., Koh, P. W., Hashimoto, T. B., and Liang, P. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint, 2019. URL https://arxiv.org/abs/1911.08731.
- [25] Sagawa, S., Koh, P. W., Hashimoto, T. B., and Liang, P. Distributionally robust neural networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=ryxGuJrFvS.
- [26] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128(2):336–359, 2019. doi:10.1007/s11263-019-01228-7.
- [27] Shen, H. and Zhao, Z. Boosting test performance with importance sampling: a subpopulation perspective. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI'25/IAAI'25/EAAI'25), 2025.
- [28] Suhail, P., Goel, V., and Sethi, A. Shortcut learning susceptibility in vision classifiers. In Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025. URL https://openreview.net/forum?id=dvafjL2zXP.
- [29] Sun, B. and Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Computer Vision - ECCV 2016 Workshops, pp. 443–450. Springer International Publishing, 2016. ISBN 978-3-319-49409-8.
- [30] Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle. In IEEE Information Theory Workshop, pp. 1–5, 2015.
- [31] Wang, Z., Bovik, A., Sheikh, H., and Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. doi:10.1109/TIP.2003.819861.
- [32] Wiles, O., Gowal, S., Stimberg, F., Rebuffi, S.-A., Ktena, I., Dvijotham, K. D., and Cemgil, A. T. A fine-grained analysis on distribution shift. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=Dl4LetuLdyK.
- [33] Yang, Y., Zhang, H., Katabi, D., and Ghassemi, M. Change is hard: A closer look at subpopulation shift. In International Conference on Machine Learning, 2023.
- [34] Ye, H., Xie, C., Cai, T., Li, R., Li, Z., and Wang, L. Towards a theoretical framework of out-of-distribution generalization. In Advances in Neural Information Processing Systems, volume 34, pp. 23519–23531. Curran Associates, Inc., 2021.