pith. machine review for the scientific record.

arXiv: 2603.08639 · v2 · submitted 2026-03-09 · 💻 cs.CV · cs.AI

Recognition: 2 theorem links

UNBOX: Unveiling Black-box visual models with Natural-language

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 14:23 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI
keywords black-box interpretability · natural language explanations · activation maximization · vision models · model auditing · diffusion models · large language models

The pith

Black-box vision models can be interpreted by finding natural-language concepts that maximize their class probabilities using only output scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents UNBOX as a way to dissect proprietary visual recognition systems that expose only class probabilities. It recasts activation maximization as a semantic search: large language models propose candidate text descriptors while text-to-image diffusion models generate proxy visuals whose scores from the target model steer the search. The process requires no gradients, parameters, architecture details, or training data. If the approach holds, it makes auditing, bias detection, and failure analysis feasible for real-world API-deployed models where white-box methods cannot apply.

Core claim

UNBOX performs class-wise model dissection under fully data-free, gradient-free constraints by using large language models to generate text descriptors and text-to-image diffusion models to create visual proxies, with output probabilities serving as the sole optimization signal; the resulting descriptors reveal the concepts each class has implicitly learned, the training distribution reflected, and potential bias sources.

What carries the argument

Semantic search that couples LLM-generated descriptors with diffusion-model visual proxies, scored directly against black-box class probabilities to perform activation maximization.
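Read as an algorithm, this machinery is an iterative propose-render-score loop. The sketch below is a minimal illustrative reading, not the paper's implementation: `propose_descriptors`, `render_images`, and `score` are hypothetical stand-ins for the LLM, the text-to-image diffusion model, and the black-box API, and the selection scheme is assumed.

```python
def unbox_search(class_id, propose_descriptors, render_images, score,
                 rounds=3, keep=5):
    """Gradient-free semantic activation maximization, sketched.

    propose_descriptors(history) -> list[str]    # LLM candidate descriptors
    render_images(descriptor)    -> list[object] # diffusion proxy images
    score(image, class_id)       -> float        # black-box class probability
    """
    history = []  # (descriptor, mean score) pairs that steer later proposals
    for _ in range(rounds):
        for desc in propose_descriptors(history):
            images = render_images(desc)
            mean_p = sum(score(im, class_id) for im in images) / len(images)
            history.append((desc, mean_p))
        # keep only the highest-scoring descriptors to condition the next round
        history = sorted(history, key=lambda t: t[1], reverse=True)[:keep]
    return history  # best descriptors found for this class
```

Only scalar output probabilities cross the API boundary, which is what makes the search data-free and gradient-free.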

If this is right

  • The descriptors expose the specific concepts the model has learned for each class.
  • Bias sources and training-distribution artifacts become visible through the recovered concepts.
  • Auditing and failure analysis become possible for models available only as black-box APIs.
  • Performance matches state-of-the-art white-box methods on ImageNet-1K, Waterbirds, and CelebA despite stricter constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Regulators could apply the technique to inspect commercial vision services without requiring internal access.
  • The same probability-driven semantic search could extend to auditing black-box models in other domains such as audio or tabular data.
  • Interactive querying of deployed models becomes feasible by testing user-provided natural-language concepts against output scores.

Load-bearing premise

Pre-trained language and diffusion models can reliably translate black-box output probabilities into the actual visual concepts that drive the model's decisions.

What would settle it

A controlled test in which images that visually match the generated descriptors fail to elicit high scores from the target model, while clearly mismatched images score no lower, would undercut the claim; the opposite pattern would support it.
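That settling test can be operationalized as a simple score-gap check. The function name, `score`, and the image sets below are hypothetical stand-ins, since the abstract does not specify the fidelity protocol:

```python
def fidelity_gap(score, class_id, matched, mismatched):
    """Difference between the black-box model's mean score on images that
    visually match a descriptor and its mean score on mismatched images.
    A gap near zero would count against the descriptor; a large positive
    gap would support it."""
    p_match = [score(im, class_id) for im in matched]
    p_mis = [score(im, class_id) for im in mismatched]
    return sum(p_match) / len(p_match) - sum(p_mis) / len(p_mis)
```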

read the original abstract

Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. Yet modern vision systems are increasingly deployed as proprietary black-box APIs, exposing only output probabilities and hiding architecture, parameters, gradients, and training data. This opacity prevents meaningful auditing, bias detection, and failure analysis. Existing explanation methods assume white- or gray-box access or knowledge of the training distribution, making them unusable in these real-world settings. We introduce UNBOX, a framework for class-wise model dissection under fully data-free, gradient-free, and backpropagation-free constraints. UNBOX leverages Large Language Models and text-to-image diffusion models to recast activation maximization as a purely semantic search driven by output probabilities. The method produces human-interpretable text descriptors that maximally activate each class, revealing the concepts a model has implicitly learned, the training distribution it reflects, and potential sources of bias. We evaluate UNBOX on ImageNet-1K, Waterbirds, and CelebA through semantic fidelity tests, visual-feature correlation analyses and slice-discovery auditing. Despite operating under the strictest black-box constraints, UNBOX performs competitively with state-of-the-art white-box interpretability methods. This demonstrates that meaningful insight into a model's internal reasoning can be recovered without any internal access, enabling more trustworthy and accountable visual recognition systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces UNBOX, a framework that recasts activation maximization for black-box visual classifiers as a semantic search over LLM-generated text descriptors and diffusion-synthesized images, using only scalar output probabilities. It claims to produce human-interpretable concepts for ImageNet-1K, Waterbirds, and CelebA classes, revealing learned concepts, training biases, and distribution shifts, while performing competitively with white-box interpretability methods under fully data-free, gradient-free constraints.

Significance. If the quantitative results hold, UNBOX would represent a meaningful advance in auditing proprietary vision APIs without internal access, enabling bias detection and failure analysis in real-world deployments. However, the absence of metrics, baselines, or ablation details in the abstract undermines confidence in the central claim of competitive performance.

major comments (2)
  1. [Abstract] The claim that UNBOX 'performs competitively with state-of-the-art white-box interpretability methods' on ImageNet-1K, Waterbirds, and CelebA is unsupported by any quantitative metrics, baselines, ablation studies, or statistical comparisons; the claim is load-bearing for the paper's central argument, and the missing evidence prevents assessment of whether the semantic search actually recovers causal features.
  2. [Abstract, methods description] The assumption that LLM-generated descriptors and diffusion images reliably map to the visual features driving the target model's decisions lacks direct validation against white-box methods on the same models and images; this risks surfacing correlated but non-causal concepts (e.g., texture or color statistics that carry no natural-language label), as noted in the skeptic analysis.
minor comments (1)
  1. [Abstract] The abstract references 'semantic fidelity tests, visual-feature correlation analyses and slice-discovery auditing' but provides no details on methodology or results; these should be expanded with concrete evaluation protocols.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important issues in how our claims are presented. We agree that the abstract should better support its assertions with quantitative details and that additional direct validation would strengthen the causal mapping argument. We address each major comment below and will incorporate revisions accordingly.

read point-by-point responses
  1. Referee: [Abstract] The claim that UNBOX 'performs competitively with state-of-the-art white-box interpretability methods' on ImageNet-1K, Waterbirds, and CelebA is unsupported by any quantitative metrics, baselines, ablation studies, or statistical comparisons; the claim is load-bearing for the paper's central argument, and the missing evidence prevents assessment of whether the semantic search actually recovers causal features.

    Authors: We acknowledge that the abstract does not include specific quantitative metrics or explicit baseline comparisons, making the competitive performance claim difficult to evaluate directly from the abstract alone. The full manuscript reports results from semantic fidelity tests, visual-feature correlation analyses, and slice-discovery auditing across the three datasets, which demonstrate alignment with white-box methods. To address this, we will revise the abstract to include key quantitative indicators (such as fidelity scores and correlation values) and a brief reference to the baselines used, ensuring the claim is supported within the abstract itself. revision: yes

  2. Referee: [Abstract, methods description] The assumption that LLM-generated descriptors and diffusion images reliably map to the visual features driving the target model's decisions lacks direct validation against white-box methods on the same models and images; this risks surfacing correlated but non-causal concepts (e.g., texture or color statistics that carry no natural-language label), as noted in the skeptic analysis.

    Authors: We agree this is a substantive concern regarding potential non-causal correlations. The manuscript already includes visual-feature correlation analyses and slice-discovery auditing to link recovered concepts to model decisions, but we recognize that a more explicit side-by-side comparison with white-box methods on identical images would provide stronger evidence against non-causal artifacts. We will add a dedicated validation subsection and accompanying figure that directly compares UNBOX concepts with white-box saliency outputs on the same inputs, quantifying regional overlap to better confirm that the semantic descriptors capture causally relevant features. revision: partial
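The promised regional-overlap quantification could be as simple as an intersection-over-union between binarized masks. This is an illustrative reading of "quantifying regional overlap", not the authors' stated protocol; the function and its inputs are assumptions:

```python
def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks (nested lists of 0/1),
    one way to quantify regional overlap between the image support of a
    recovered concept and a white-box saliency map."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0
```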

Circularity Check

0 steps flagged

No circularity: method relies on external pre-trained models and empirical evaluation

full rationale

The UNBOX framework recasts activation maximization as semantic search over LLM-generated descriptors and diffusion-synthesized images, driven solely by scalar output probabilities from the target black-box model. No equations, parameter fittings, or derivations appear in the provided text that reduce the recovered concepts or performance claims to quantities defined by the target model itself. Evaluations on ImageNet-1K, Waterbirds, and CelebA are presented as independent semantic fidelity and correlation tests against white-box baselines, without self-citation chains, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation. The derivation chain is therefore self-contained against external benchmarks and does not collapse by construction to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method implicitly assumes that LLM and diffusion model outputs can serve as faithful proxies for visual concepts without further justification.

pith-pipeline@v0.9.0 · 5558 in / 995 out tokens · 36521 ms · 2026-05-15T14:23:46.948493+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
