Deep Attention Reweighting: Post-Hoc Attention-Based Feature Aggregation in CNNs for Disentangling Core and Spurious Features under Spurious Correlations
Pith reviewed 2026-05-21 05:01 UTC · model grok-4.3
The pith
Replacing global average pooling with attention-based reweighting allows post-hoc retraining to suppress spurious features before they mix with core ones in CNNs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Global Average Pooling layer indiscriminately collapses spatially distinct core and spurious features into one representation, limiting the effectiveness of retraining only the classifier head. Deep Attention Reweighting replaces this pooling with an adaptive weighting of spatial locations across feature maps, enabling selective suppression of spurious features before entanglement. When the new module is retrained jointly with the classification head on a target dataset, it consistently outperforms Deep Feature Reweighting across datasets, metrics, and ablations.
What carries the argument
Deep Attention Reweighting (DAR), a post-hoc attention-based aggregation module that replaces Global Average Pooling and computes adaptive weights for spatial locations in feature maps to suppress spurious signals.
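As a rough illustration of the idea, the sketch below contrasts global average pooling with a softmax-weighted pooling over spatial locations. It is a minimal pure-Python sketch under stated assumptions, not the paper's implementation: the scoring vector `score_w`, the function names, and the toy feature map are hypothetical, and the source does not specify how DAR actually computes its weights.

```python
import math


def global_average_pool(feature_map):
    """Average each channel over all spatial locations.

    feature_map: list of spatial locations, each a list of channel values.
    """
    n = len(feature_map)
    channels = len(feature_map[0])
    return [sum(loc[c] for loc in feature_map) / n for c in range(channels)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(feature_map, score_w):
    """Weighted average over locations with softmax attention weights.

    score_w is a hypothetical learnable vector (an assumption of this sketch):
    each location's score is the dot product of score_w with its features.
    """
    scores = [sum(w * x for w, x in zip(score_w, loc)) for loc in feature_map]
    weights = softmax(scores)
    channels = len(feature_map[0])
    pooled = [
        sum(a * loc[c] for a, loc in zip(weights, feature_map))
        for c in range(channels)
    ]
    return pooled, weights


# Toy map with 4 locations and 2 channels (illustrative data only):
# channel 0 fires on "core" locations, channel 1 on "spurious" ones.
feature_map = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

gap = global_average_pool(feature_map)  # both channels mixed equally
pooled, weights = attention_pool(feature_map, score_w=[2.0, -2.0])
uniform_pooled, uniform_weights = attention_pool(feature_map, score_w=[0.0, 0.0])
```

With an all-zero scoring vector every location receives the same weight, so the attention pooling in this sketch reduces to global average pooling; a non-zero scoring vector shifts weight toward some locations and away from others before the spatial collapse.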
If this is right
- Selective spatial suppression before pooling reduces a model's reliance on spurious correlations more effectively than operating on already-entangled features.
- The performance advantage of DAR over DFR holds across multiple datasets, evaluation metrics, and ablation settings.
- Joint retraining of the aggregation module and head is sufficient to realize the gains without updating the convolutional backbone.
- Attention-based aggregation mitigates the specific limitation introduced by fixed global average pooling under spurious correlations.
Where Pith is reading between the lines
- Similar attention reweighting could be inserted at other aggregation points inside CNNs or in non-CNN vision architectures to limit spurious feature propagation.
- Preventing entanglement at the pooling stage might lower the cost of later interventions and encourage training pipelines that preserve spatial distinctions from the start.
- Applying the same module during initial training rather than only post-hoc could reveal whether early intervention prevents spurious correlations from forming at all.
Load-bearing premise
The entanglement of core and spurious features is fundamentally caused by the Global Average Pooling layer indiscriminately collapsing spatially distinct features.
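To make the premise concrete, the following worked identity may help. It is a sketch under a simplifying assumption that is not taken from the paper: the $N$ spatial locations of the final feature map split cleanly into a core set $C$ and a spurious set $S$, with $f_i$ the feature vector at location $i$. All symbols are illustrative.

```latex
\[
z_{\mathrm{GAP}}
  = \frac{1}{N}\sum_{i=1}^{N} f_i
  = \frac{|C|}{N}\,\bar{f}_C + \frac{|S|}{N}\,\bar{f}_S,
\qquad
\bar{f}_C = \frac{1}{|C|}\sum_{i\in C} f_i,
\quad
\bar{f}_S = \frac{1}{|S|}\sum_{i\in S} f_i .
\]
\[
z_{\mathrm{att}} = \sum_{i=1}^{N} a_i f_i,
\qquad a_i \ge 0,\quad \sum_{i=1}^{N} a_i = 1 .
\]
```

Under GAP the mixing proportions $|C|/N$ and $|S|/N$ are fixed by the spatial layout, and a head applied to $z_{\mathrm{GAP}}$ sees only the sum and cannot change them. Setting $a_i = 1/N$ recovers GAP, while weights concentrated on $C$ shrink the spurious contribution before it reaches the head.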
What would settle it
Measure the attention weights that DAR produces on held-out examples from a dataset with spatially localized spurious cues. If the weights do not systematically down-weight the spurious spatial regions while accuracy on core-only tests improves, the proposed mechanism is not operating as claimed.
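One way such a check could be scored is sketched below. It is a hedged illustration only: the function names, the binary region masks, and the example weights are assumptions made here, not metrics or data from the paper.

```python
def attention_mass(weights, mask):
    """Total attention weight on the locations where mask is 1."""
    return sum(w for w, m in zip(weights, mask) if m)


def mean_attention_ratio(weights, core_mask, spurious_mask):
    """Mean weight per core location divided by mean weight per spurious location.

    A ratio near 1 means the pooling treats both regions alike (as uniform
    averaging does); a ratio well above 1 means spurious regions are
    down-weighted relative to core regions.
    """
    n_core = sum(core_mask)
    n_spurious = sum(spurious_mask)
    if n_core == 0 or n_spurious == 0:
        raise ValueError("both regions must contain at least one location")
    mean_core = attention_mass(weights, core_mask) / n_core
    mean_spurious = attention_mass(weights, spurious_mask) / n_spurious
    if mean_spurious == 0:
        return float("inf")
    return mean_core / mean_spurious


# Hypothetical 4-location example: the first two locations are core,
# the last two are spurious.
core_mask = [1, 1, 0, 0]
spurious_mask = [0, 0, 1, 1]

uniform_weights = [0.25, 0.25, 0.25, 0.25]  # what average pooling implies
learned_weights = [0.4, 0.4, 0.1, 0.1]      # made-up attention weights

uniform_ratio = mean_attention_ratio(uniform_weights, core_mask, spurious_mask)
learned_ratio = mean_attention_ratio(learned_weights, core_mask, spurious_mask)
```

Averaged over held-out examples, a ratio that stays close to the uniform baseline would count against the selective-suppression explanation, whereas a ratio consistently above it would be consistent with that explanation.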
Original abstract
Convolutional Neural Networks (CNNs) often exploit spurious correlations in datasets, learning superficially predictive yet causally irrelevant features, leading to poor generalization and fairness issues. Deep Feature Reweighting (DFR) is a post-hoc technique that reduces a trained model's reliance on spurious correlations by retraining its classification head on a target dataset. However, we show that DFR is fundamentally constrained by operating on entangled features, limiting its ability to amplify the core features while simultaneously suppressing the spurious ones. We trace this entanglement to the ubiquitous Global Average Pooling (GAP) layer, which indiscriminately collapses spatially distinct core and spurious features into a single representation. To address this, we propose Deep Attention Reweighting (DAR), a post-hoc attention-based aggregation module that replaces GAP and is retrained jointly with the classification head. DAR computes an adaptive weighting of spatial locations across feature maps, enabling selective suppression of spurious features before the collapse into entangled features. Across various datasets, metrics, and ablations, DAR consistently outperforms DFR, demonstrating that our attention-based aggregation mitigates GAP-induced entanglement and reduces spurious reliance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Deep Attention Reweighting (DAR), a post-hoc module that replaces Global Average Pooling (GAP) in a frozen CNN backbone. DAR is retrained jointly with the classification head on a target dataset to adaptively weight spatial locations in feature maps, with the goal of selectively suppressing spurious features before they collapse into an entangled representation. The central claim is that this addresses a fundamental limitation of Deep Feature Reweighting (DFR), which operates on already-entangled features, and that DAR yields consistent improvements over DFR across datasets, metrics, and ablations.
Significance. If the mechanistic claim holds, the work offers a lightweight, architecture-compatible improvement to post-hoc debiasing methods for CNNs, with potential benefits for OOD generalization and fairness. The empirical scope (multiple datasets, ablations, and direct comparison to DFR) is a strength; however, the absence of direct evidence that attention maps perform the claimed selective suppression limits the interpretability of the gains.
major comments (2)
- [Abstract, §3] Abstract and §3 (DAR formulation): the claim that DAR 'enables selective suppression of spurious features before the collapse' is load-bearing for the paper's contribution over DFR, yet the experiments provide no inspection of attention maps, no correlation with core/spurious region masks, and no control experiment isolating whether gains arise from selective suppression versus generic spatial reweighting or added capacity.
- [§4] §4 (experimental results): while consistent outperformance versus DFR is reported, the absence of attention-map analysis or quantitative differential weighting metrics means the central explanation (mitigation of GAP-induced entanglement via selective suppression) remains unverified; this must be addressed before the mechanistic interpretation can be accepted.
minor comments (2)
- [§3] Notation for the attention weight computation (likely Eq. (X) in §3) should explicitly state whether the attention module shares parameters with the backbone or is trained from scratch, and whether any regularization is applied to encourage sparsity or selectivity.
- [Figures in §4] Figure captions and axis labels in the ablation plots could be expanded to clarify which metrics correspond to core-feature accuracy versus spurious-feature suppression.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments, which help clarify the need for stronger mechanistic evidence. We address each major point below and have incorporated revisions to include attention map analyses, quantitative metrics, and control experiments.
- Referee: [Abstract, §3] Abstract and §3 (DAR formulation): the claim that DAR 'enables selective suppression of spurious features before the collapse' is load-bearing for the paper's contribution over DFR, yet the experiments provide no inspection of attention maps, no correlation with core/spurious region masks, and no control experiment isolating whether gains arise from selective suppression versus generic spatial reweighting or added capacity.
  Authors: We agree that direct inspection of the attention mechanism is necessary to substantiate the selective suppression claim. In the revised manuscript, we have added visualizations of the learned attention maps on datasets with available core/spurious region annotations (e.g., Waterbirds and CelebA), along with quantitative correlations between attention weights and ground-truth masks. We also include a new control experiment comparing DAR against a non-adaptive spatial reweighting baseline (fixed uniform weights plus added capacity) and a random attention variant. These results show that performance gains are attributable to adaptive, selective weighting rather than generic reweighting or capacity alone, and we have updated the abstract and §3 to reference these findings.
  revision: yes
- Referee: [§4] §4 (experimental results): while consistent outperformance versus DFR is reported, the absence of attention-map analysis or quantitative differential weighting metrics means the central explanation (mitigation of GAP-induced entanglement via selective suppression) remains unverified; this must be addressed before the mechanistic interpretation can be accepted.
  Authors: We acknowledge that the original experiments lacked direct verification of the proposed mechanism. The revised §4 now incorporates attention-map analysis across all evaluated datasets and introduces quantitative differential weighting metrics, specifically the mean attention ratio on core versus spurious regions (computed using available annotations or proxy masks derived from dataset structure). These metrics demonstrate statistically higher weighting on core features under DAR compared to GAP, supporting the mitigation of entanglement. New figures and tables present these results alongside the existing performance comparisons, and we have added a brief discussion of how this evidence strengthens the interpretation over DFR.
  revision: yes
Circularity Check
No circularity: empirical method proposal with no derivation chain reducing to fitted inputs or self-citations by construction.
The paper proposes DAR as a post-hoc attention module replacing GAP, retrained with the classification head, and evaluates it empirically against DFR on datasets. The abstract and provided text contain no equations, no fitted parameters renamed as predictions, no self-citations invoked as uniqueness theorems, and no ansatz smuggled via prior work. The central claim (attention enables selective suppression before collapse) is supported by experimental comparisons rather than any self-referential reduction. This matches the default case of a self-contained empirical contribution with independent content.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Global Average Pooling indiscriminately collapses spatially distinct core and spurious features into entangled representations
invented entities (1)
- Deep Attention Reweighting (DAR) module: no independent evidence