On the Properties of Feature Attribution for Supervised Contrastive Learning
Pith reviewed 2026-05-08 12:10 UTC · model grok-4.3
The pith
Neural networks trained with supervised contrastive learning yield feature attributions that are more faithful, less complex, and more continuous than those from standard contrastive learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Neural networks for image classification trained with supervised contrastive learning produce feature attribution explanations that outperform those from models trained with contrastive learning on the metrics of faithfulness, complexity, and continuity, as shown through direct empirical comparison on image datasets.
What carries the argument
Quantitative metrics of faithfulness, complexity, and continuity applied to feature attributions extracted from networks trained under supervised versus unsupervised contrastive objectives.
Load-bearing premise
The chosen metrics of faithfulness, complexity, and continuity together with the attribution methods used are adequate to judge overall explanation quality without being swayed by differences in model capacity or training details.
What would settle it
Repeating the experiments on the same architectures and datasets but with matched hyperparameters and random seeds across training objectives, then finding that the ranking on faithfulness, complexity, or continuity reverses or disappears.
Original abstract
Most Neural Networks (NNs) for classification are trained using Cross-Entropy (CE) as a loss function. This approach requires the model to have an explicit classification layer. However, there exist alternative approaches, such as Contrastive Learning (CL). Instead of explicitly performing classification, CL has the NN produce an embedding space where projections of similar data are pulled together, while projections of dissimilar data are pushed apart. In the case of Supervised CL (SCL), labels are adopted as similarity criteria, thus creating an embedding space where the projected data points are well-clustered. SCL provides crucial advantages over CE with regard to adversarial robustness and out-of-distribution detection, thus making it a more natural choice in safety-critical scenarios. In the present paper, we empirically show that NNs for image classification trained with SCL present higher-quality feature attribution explanations than CL with regard to faithfulness, complexity, and continuity. These results reinforce previous findings about CL-based approaches when targeting more trustworthy and transparent NNs and can guide practitioners in the selection of training objectives targeting not only accuracy, but also transparency of the models.
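To make the abstract's description of SCL concrete, the following is a minimal sketch of a supervised contrastive loss in plain Python, following the general form of the SupCon objective: each anchor is pulled toward samples that share its label and pushed away from all others. The function name `supcon_loss`, the temperature of 0.1, and the toy embeddings are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive.

    Hypothetical helper for illustration only. Returns 0.0 when no anchor
    has a same-label partner in the batch.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        # Temperature-scaled cosine similarity of anchor i to every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        positives = [a for a in logits if labels[a] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability of the positives.
        losses.append(-sum(logits[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0

# Toy 2-D embeddings: the first two and the last two points lie close together.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supcon_loss(toy_embeddings, [0, 0, 1, 1])  # labels match the geometry
loss_mixed = supcon_loss(toy_embeddings, [0, 1, 0, 1])    # labels contradict it
```

In this sketch the loss is small when same-label points are already neighbours in the embedding space and large when they are not. The unsupervised variant can be expressed with the same function by passing an image identifier as the label, so that only augmented views of the same image count as positives.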
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript empirically compares feature attribution quality for image classification neural networks trained via Supervised Contrastive Learning (SCL) versus standard (unsupervised) Contrastive Learning (CL). It claims that SCL yields higher-quality attributions on three metrics—faithfulness, complexity, and continuity—positioning SCL as preferable for transparent models in addition to its known benefits in adversarial robustness and out-of-distribution detection.
Significance. If the central empirical comparison is placed on a sound footing, the result would usefully extend the literature on how contrastive objectives affect downstream explainability. It offers practitioners concrete guidance when accuracy alone is insufficient and could inform training choices in safety-critical domains. The work correctly situates the question within the broader advantages already established for SCL.
major comments (2)
- [§4] §4 (Experimental Setup): the SCL versus CL comparison does not report or enforce matched final classification accuracy, embedding dimensionality, batch size, temperature schedule, or optimizer hyperparameters. Because attribution metrics can be sensitive to these factors, the observed gaps in faithfulness, complexity, and continuity cannot be attributed to the presence of label supervision in the contrastive loss. This is load-bearing for the central claim.
- [§5] §5 (Results and Tables): no statistical significance tests, confidence intervals, or ablation on architecture capacity are provided for the metric differences. Without these, it is impossible to assess whether the reported superiority of SCL is robust or could be explained by uncontrolled variation in model properties.
minor comments (2)
- [Abstract] Abstract: the datasets, architectures, and exact feature attribution methods (e.g., which saliency technique) should be named explicitly so readers can immediately gauge the scope of the empirical result.
- [§3] §3 (Metrics): the precise definitions or implementations of the faithfulness, complexity, and continuity scores should be restated or referenced to a standard source to avoid ambiguity in replication.
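As an illustration of the kind of definition the last comment asks for, one commonly used complexity score is the Shannon entropy of the normalized absolute attribution values: the lower the entropy, the fewer features carry the attribution mass. Whether the paper uses exactly this definition is not stated here. The function name and the convention of returning 0.0 for an all-zero attribution are assumptions of this sketch.

```python
import math

def attribution_complexity(attribution):
    """Shannon entropy (in nats) of the normalized absolute attributions.

    Hypothetical helper for illustration. `attribution` is a flat list of
    per-feature (for example per-pixel) attribution values.
    """
    magnitudes = [abs(a) for a in attribution]
    total = sum(magnitudes)
    if total == 0:
        # Assumed convention: an all-zero map is treated as zero complexity.
        return 0.0
    probs = [m / total for m in magnitudes]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A map concentrated on one feature versus one spread evenly over four.
concentrated = attribution_complexity([0.0, 0.0, 5.0, 0.0])
uniform = attribution_complexity([1.0, -1.0, 1.0, -1.0])
```

Under this definition the concentrated map scores 0 and the uniform map scores log(4), the maximum for four features, so lower values indicate a more compact explanation.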
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments help clarify the experimental controls needed to strengthen our central claim that supervised contrastive learning produces higher-quality feature attributions than unsupervised contrastive learning. We address each major comment below and will incorporate the suggested changes in the revised manuscript.
Point-by-point responses
- Referee: [§4] §4 (Experimental Setup): the SCL versus CL comparison does not report or enforce matched final classification accuracy, embedding dimensionality, batch size, temperature schedule, or optimizer hyperparameters. Because attribution metrics can be sensitive to these factors, the observed gaps in faithfulness, complexity, and continuity cannot be attributed to the presence of label supervision in the contrastive loss. This is load-bearing for the central claim.
Authors: We appreciate the referee's emphasis on isolating the effect of label supervision. In the original experiments we adopted the standard hyperparameter configurations reported in the SupCon and SimCLR papers for each respective method. To address the concern directly, the revised manuscript will include an explicit table of all training hyperparameters (embedding dimension, batch size, temperature, optimizer, learning-rate schedule, and number of epochs) for both SCL and CL. In addition, we will run controlled experiments in which training duration or learning rate is adjusted so that final test accuracy is matched between the two objectives, and we will report the three attribution metrics under these matched-accuracy conditions. This will allow readers to attribute any remaining differences more confidently to the presence of label supervision. revision: yes
- Referee: [§5] §5 (Results and Tables): no statistical significance tests, confidence intervals, or ablation on architecture capacity are provided for the metric differences. Without these, it is impossible to assess whether the reported superiority of SCL is robust or could be explained by uncontrolled variation in model properties.
Authors: We agree that statistical rigor and capacity ablations are necessary. The revised version will report results averaged over five independent random seeds, together with standard deviations and 95% confidence intervals for every metric. We will also add paired t-tests (or Wilcoxon signed-rank tests where normality assumptions are violated) to assess the statistical significance of the observed differences. Finally, we will include a new ablation subsection that repeats the full evaluation pipeline on ResNet-18, ResNet-34, and ResNet-50 backbones, to test whether the advantages of SCL persist across model capacities. revision: yes
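The seed-paired analysis described in this response can be sketched as follows, using only the Python standard library. It computes the mean per-seed difference, a 95% confidence interval, and the paired t statistic for five seeds. The function name `paired_summary` and the scores are hypothetical placeholders made up for illustration; they are not results from the paper. The critical value 2.776 is the two-sided 95% Student's t quantile for 4 degrees of freedom, so the sketch assumes exactly five paired runs.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 4 degrees of freedom.
T_CRIT_DF4 = 2.776

def paired_summary(scores_a, scores_b, t_crit=T_CRIT_DF4):
    """Paired comparison of two training objectives across matched seeds.

    Hypothetical helper; assumes five paired runs so that t_crit applies.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) != 5:
        raise ValueError("this sketch expects exactly five paired scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if sem == 0:
        t_stat = 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    else:
        t_stat = mean_diff / sem
    half_width = t_crit * sem
    return {
        "mean_diff": mean_diff,
        "ci95": (mean_diff - half_width, mean_diff + half_width),
        "t": t_stat,
        "significant": abs(t_stat) > t_crit,
    }

# Made-up faithfulness scores for five seeds (placeholders, NOT paper results).
scl_scores = [0.62, 0.60, 0.65, 0.61, 0.63]
cl_scores = [0.55, 0.57, 0.58, 0.54, 0.56]
summary = paired_summary(scl_scores, cl_scores)
```

Pairing by seed removes between-seed variance from the comparison, which matters with so few runs. When the differences are clearly non-normal, a Wilcoxon signed-rank test, as the authors mention, would replace the t statistic here.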
Circularity Check
No circularity: purely empirical comparison of attribution metrics
The paper reports experimental results comparing feature attribution faithfulness, complexity, and continuity for image classifiers trained under Supervised Contrastive Learning versus standard Contrastive Learning. No mathematical derivation, first-principles prediction, parameter fitting, or uniqueness theorem is claimed. The central claim rests on direct metric evaluation across trained models rather than any self-referential reduction or load-bearing self-citation. This matches the default expectation for non-circular empirical work.