pith. sign in

arxiv: 2604.14477 · v1 · submitted 2026-04-15 · 💻 cs.AI

Seeing Through Circuits: Faithful Mechanistic Interpretability for Vision Transformers

Pith reviewed 2026-05-10 12:28 UTC · model grok-4.3

classification 💻 cs.AI
keywords mechanistic interpretabilityvision transformerscircuit discoverycomputational graphsCLIPtypographic attacksmodel steering
0
0 comments X

The pith

Vision transformers contain recoverable edge-based circuits that explain image classification and allow correction of attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether vision transformers have understandable internal wiring in the form of task-specific graphs made from edges between components. It introduces an automatic method to recover these circuits for particular image classes, for how models like CLIP respond to text overlays, and for redirecting outputs away from errors. A sympathetic reader would care because current vision models act as black boxes; if their routing can be mapped this way, it becomes possible to inspect, debug, and adjust specific computations instead of treating the whole network as opaque. The work shows that such edge circuits can be found and used in practice.

Core claim

We propose Automatic Visual Circuit Discovery (Vi-CD) and demonstrate that it recovers class-specific circuits for classification in vision transformers, circuits that underlie typographic attacks in CLIP, and circuits that can be steered to correct harmful model behavior. These edge-based graphs add transparency by showing how information is routed through the model rather than only which features are encoded.

What carries the argument

Automatic Visual Circuit Discovery (Vi-CD), a method that identifies computational graphs formed by edges connecting components inside vision transformers for specific tasks.

If this is right

  • Class-specific circuits can be used to trace exactly which connections the model relies on when recognizing a given object category.
  • Typographic attack circuits make visible the pathways through which overlaid text influences the output.
  • Steerable circuits provide targeted points for intervening to reduce unwanted or incorrect behaviors without retraining the entire model.
  • Edge-based circuits supply routing details that neuron-only analyses miss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same discovery approach could be tested on other vision tasks such as detection or segmentation to see if similar structures appear.
  • Comparing recovered circuits across different vision transformer variants might reveal whether core routing patterns are shared.
  • These circuits open a route to model editing where only the relevant edges are modified to change behavior on narrow tasks.

Load-bearing premise

The circuits located by the method reflect the model's genuine internal computations rather than patterns created by the search procedure itself.

What would settle it

A test in which ablating or editing the edges of a discovered circuit leaves the model's classification accuracy or attack susceptibility unchanged would show that the circuit does not capture the actual reasoning.

Figures

Figures reproduced from arXiv: 2604.14477 by Bernt Schiele, Jonas Fischer, Nina \.Zukowska, Wolfgang Stammer.

Figure 1
Figure 1. Figure 1: Discovering Visual Mechanistic Circuits in computation graphs. Left: [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Vi-CD: Circuit discovery in vision computation graphs. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Transformer circuitry in a 2- layer toy transformer. Left: Red edges are simplified in Vi-CD for scalability: mul￾tiple attention-head receiver nodes are col￾lapsed into a single attention-input node. Right: Green edges correspond to input the sender node, Yellow edges correspond to MLPs sender nodes, and Purple edges correspond to attention heads sender nodes. We use the ForAug dataset [25], which provide… view at source ↗
Figure 4
Figure 4. Figure 4: Vi-CD finds 10x sparser circuits. We report accuracy of the circuit on the target class (↑ higher is better) as faithfulness and report different sparsity lev￾els for circuits as edges remaining (↓ lower is better) for different circuit extraction methods indicated by colors. We compare linear probe classification performance of (ViT-B)OpenCLIP and of a supervised ViT-B on Imagenet data. 4.2 Benchmarking C… view at source ↗
Figure 5
Figure 5. Figure 5: Vi-CD discovers circuits reflecting semantic similarity of classes. [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Circuit-based steering prevents typographic attacks without harming [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Types of typographic corruptions. Left to right: Bezel, Multiple Small Texts, and Big Text on Image typographic corruptions. C Typographic Attacks: Steering using Faithful Circuits C.1 Overview We study activation steering as a defense against typographic corruptions by explicitly estimating and subtracting corruption-induced directions in representa￾tion space. Steering vectors are derived from faithful c… view at source ↗
Figure 8
Figure 8. Figure 8: Faceted results for the typographic object “Orange” with Big Text [PITH_FULL_IMAGE:figures/full_fig_p023_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: RoCOCO steering as a function of steering strength [PITH_FULL_IMAGE:figures/full_fig_p030_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Overlap of class circuits in CLIP. Each cell shows the Jaccard similarity between the sets of edges present in all runs (frequency = 1.0) for a pair of classes. Classes are ordered by hierarchical clustering. Dog breeds (bottom-right block) share substantially more core edges with each other than with semantically unrelated classes [PITH_FULL_IMAGE:figures/full_fig_p031_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Circuit size and stability per class. Circuit sizes in CLIP and stability across Imagenette[17] classes. We report the average circuit size and the mean pairwise Jaccard similarity between circuits mined from repeated runs [PITH_FULL_IMAGE:figures/full_fig_p031_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Stability of different network components. [PITH_FULL_IMAGE:figures/full_fig_p032_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Circuit stability with #sam￾ples. Mean within-class pairwise Jaccard similarity between circuits mined from re￾peated runs, as a function of the number of datapoints used for circuit mining (log scale). Circuit stability increases consistently with dataset size. Effect of Dataset Size on Circuit Sta￾bility. We investigate how the num￾ber of datapoints used for circuit min￾ing affects the consistency of th… view at source ↗
Figure 14
Figure 14. Figure 14: Zero-shot classification performance using unions of class-specific circuits. Pairwise Classification Circuits. Circuit compositionality for classification. We evaluate circuit-based pairwise classification with CLIP, where logits are computed via dot product against the full ImageNet-1k text embedding matrix, as described in Sec. B. We explicitly mine binary circuits for each class pair using the target … view at source ↗
Figure 15
Figure 15. Figure 15: Circuit class specificity. For each class pair (A, B), edges in each binary circuit run are classified as: appearing in the union of all per-class circuits across runs for class A but not B (A only); appearing in the union for class B but not A (B only); appearing in the union for both classes (both A&B); or appearing in neither class-specific union (only in binary). The y-axis reports the mean edge count… view at source ↗
Figure 16
Figure 16. Figure 16: Ablations of selection criterion. For edge typographic circuits we report accuracy and achieved sparsity for each target class for different selection criteria and backbones. Green dotted line marks 70% sparsity, red dotted line marks 70% accuracy. Circuit edges. Steering along discovered circuit edges (Fig. 17a–b) yields a favor￾able trade-off between safety and utility. At low-to-moderate steering stren… view at source ↗
Figure 17
Figure 17. Figure 17: Ablations of typographic circuits up to a layer. [PITH_FULL_IMAGE:figures/full_fig_p037_17.png] view at source ↗
read the original abstract

Transparency of neural networks' internal reasoning is at the heart of interpretability research, adding to trust, safety, and understanding of these models. The field of mechanistic interpretability has recently focused on studying task-specific computational graphs, defined by connections (edges) between model components. Such edge-based circuits have been defined in the context of large language models, yet vision-based approaches so far only consider neuron-based circuits. These tell which information is encoded, but not how it is routed through the complex wiring of a neural network. In this work, we investigate whether useful mechanistic circuits can be identified through computational graphs in vision transformers. We propose an effective method for Automatic Visual Circuit Discovery (Vi-CD) that recovers class-specific circuits for classification, identifies circuits underlying typographic attacks in CLIP, and discovers circuits that lend themselves for steering to correct harmful model behavior. Overall, we find that insightful and actionable edge-based circuits can be recovered from vision transformers, adding transparency to the internal computations of these models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Vi-CD (Automatic Visual Circuit Discovery), a method to recover edge-based computational circuits in Vision Transformers. It claims to identify class-specific circuits for image classification, circuits underlying typographic attacks in CLIP, and steerable circuits that can correct harmful model behaviors, thereby extending mechanistic interpretability from language models to vision transformers.

Significance. If the recovered circuits prove mechanistically faithful, the work would meaningfully extend edge-based circuit analysis to ViTs, offering potential for greater transparency, safety interventions, and behavioral steering in vision and multimodal models.

major comments (2)
  1. Abstract: the abstract asserts success on classification, attack identification, and steering but supplies no quantitative results, validation metrics, baselines, or method details, leaving central claims unsupported in available text.
  2. The load-bearing claim that Vi-CD recovers causally faithful mechanistic pathways (rather than correlational patterns or discovery-heuristic artifacts) lacks concrete validation via interventions, ablations, or ground-truth comparisons; without these, the transparency benefit cannot be established.
minor comments (1)
  1. The title and abstract use 'faithful' without an explicit operational definition or set of verification criteria tailored to vision transformers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our paper extending edge-based circuit analysis to Vision Transformers. We address each major comment point by point below, with clarifications on our validation approach and revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: Abstract: the abstract asserts success on classification, attack identification, and steering but supplies no quantitative results, validation metrics, baselines, or method details, leaving central claims unsupported in available text.

    Authors: We agree that the abstract would be strengthened by including quantitative highlights. In the revised manuscript, we have updated the abstract to incorporate key metrics, including the fraction of model accuracy retained by recovered circuits (e.g., >90% on ImageNet subsets), steering success rates for typographic attack correction (e.g., 75% reduction in attack efficacy), and brief baseline comparisons to random and activation-based methods. Full methodological details and additional results remain in the main text and supplementary material. revision: yes

  2. Referee: The load-bearing claim that Vi-CD recovers causally faithful mechanistic pathways (rather than correlational patterns or discovery-heuristic artifacts) lacks concrete validation via interventions, ablations, or ground-truth comparisons; without these, the transparency benefit cannot be established.

    Authors: We appreciate this emphasis on causal validation. Our experiments already include intervention-based tests: we ablate and activate discovered edges to measure direct causal effects on model logits and outputs, showing that circuit interventions predictably alter classification decisions and mitigate typographic attacks in CLIP, while non-circuit edges do not. We have added further ablations in the revision, comparing Vi-CD circuits against random edge subsets and alternative heuristics (e.g., activation patching baselines), with results demonstrating superior causal faithfulness via metrics such as logit difference and behavioral change scores. Although ground-truth circuits are unavailable for these complex models, we use controlled proxy tasks and faithfulness quantification to distinguish mechanistic pathways from correlations. revision: partial

Circularity Check

0 steps flagged

No circularity; method applied to external model behaviors without self-referential reduction

full rationale

The paper introduces Vi-CD as a method to recover edge-based circuits in vision transformers and demonstrates its application to class-specific classification circuits, CLIP typographic attack circuits, and steerable circuits for correcting behavior. No derivation chain, equation, or self-citation reduces a claimed prediction or result to a fitted parameter or prior definition by construction. The central results consist of empirical recovery and validation on held-out model behaviors rather than tautological mappings. Any self-citations present are non-load-bearing for the core claims, which rest on the novel application and observed outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; method name Vi-CD and 'edge-based circuits' are introduced but not formalized here.

pith-pipeline@v0.9.0 · 5475 in / 1030 out tokens · 35855 ms · 2026-05-10T12:28:45.527363+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    arXiv preprint arXiv:2602.22968 (2026) 4

    Anani, A., Lorenz, T., Schiele, B., Fritz, M., Fischer, J.: Certified circuits: Stability guarantees for mechanistic circuits. arXiv preprint arXiv:2602.22968 (2026) 4

  2. [2]

    In: Advances in Neural Information Processing Systems (2024) 8

    Arditi, A., Obeso, O., Syed, A., Paleka, D., Panickssery, N., Gurnee, W., Nanda, N.: Refusal in language models is mediated by a single direction. In: Advances in Neural Information Processing Systems (2024) 8

  3. [3]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2025) 7

    Bader, J., Girrbach, L., Alaniz, S., Akata, Z.: SUB: Benchmarking CBM generalization via synthetic attribute substitutions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2025) 7

  4. [4]

    Transactions on Machine Learning Research (2024) 2

    Bereska, L., Gavves, S.: Mechanistic interpretability for AI safety – a review. Transactions on Machine Learning Research (2024) 2

  5. [5]

    In: Advances in Neural Information Processing Systems

    Bhaskar, A., Wettig, A., Friedman, D., Chen, D.: Finding transformer circuits with edge pruning. In: Advances in Neural Information Processing Systems. pp. 18506–18534 (2024) 2, 3

  6. [6]

    Distill (2020)

    Cammarata, N., Carter, S., Goh, G., Olah, C., Petrov, M., Schubert, L., Voss, C., Egan, B., Lim, S.K.: Thread: Circuits. Distill (2020). https: //doi.org/10.23915/distill.000242, 3

  7. [7]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Cherti, M., Beaumont, R., Wightman, R., Wortsman, M., Ilharco, G., Gor- don, C., Schuhmann, C., Schmidt, L., Jitsev, J.: Reproducible scaling laws for contrastive language-image learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2818–2829 (2023) 7, 9, 18, 21

  8. [8]

    In: Advances in Neural Information Processing Systems

    Conmy, A., Mavor-Parker, A., Lynch, A., Heimersheim, S., Garriga-Alonso, A.: Towards automated circuit discovery for mechanistic interpretability. In: Advances in Neural Information Processing Systems. pp. 16318–16352 (2023) 2, 3, 5, 6, 34

  9. [9]

    In: Proceedings of the World Conference on Explainable Artificial Intelligence

    Dorszewski, T., Tětková, L., Jenssen, R., Hansen, L.K., Wickstrøm, K.K.: From colors to classes: Emergence of concepts in vision transformers. In: Proceedings of the World Conference on Explainable Artificial Intelligence. pp. 28–47 (2025) 11

  10. [10]

    In: Proceedings of the International Conference on Learning Representations (2021) 7, 9, 18

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Un- terthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Proceedings of the International Conference on Learning Representations (2021) 7, 9, 18

  11. [11]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

    Dreyer, M., Purelku, E., Vielhaben, J., Samek, W., Lapuschkin, S.: PURE: Turning polysemantic neurons into pure features by identifying relevant circuits. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. pp. 8212–8217 (2024) 2, 3

  12. [12]

    Transformer Circuits Thread (2021) 2 16 N

    Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., et al.: A mathematical framework for transformer circuits. Transformer Circuits Thread (2021) 2 16 N. Żukowska et al

  13. [13]

    Nature Machine Intelligence2(11), 665–673 (2020) 2

    Geirhos, R., Jacobsen, J.H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., Wichmann, F.A.: Shortcut learning in deep neural networks. Nature Machine Intelligence2(11), 665–673 (2020) 2

  14. [14]

    arXiv preprint arXiv:2206.01627 (2022) 3

    Hamblin, C.J., Konkle, T., Alvarez, G.A.: Pruning for interpretable, feature- preserving circuits in CNNs. arXiv preprint arXiv:2206.01627 (2022) 3

  15. [15]

    In: Advances in Neural Information Processing Systems (2023) 2

    Hanna, M., Liu, O., Variengien, A.: How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model. In: Advances in Neural Information Processing Systems (2023) 2

  16. [16]

    arXiv preprint arXiv:2403.17806 (2024) 3, 5, 9, 19

    Hanna, M., Pezzelle, S., Belinkov, Y.: Have faith in faithfulness: Going beyond circuit overlap when finding model mechanisms. arXiv preprint arXiv:2403.17806 (2024) 3, 5, 9, 19

  17. [17]

    Howard, J.: Imagenette: A smaller subset of 10 easily classified classes from ImageNet (2019),https://github.com/fastai/imagenette31, 33

  18. [18]

    In: Proceedings of the International Conference on Learning Representations (2025) 3

    Hsu, A.R., Zhou, G., Cherapanamjeri, Y., Huang, Y., Odisho, A.Y., Carroll, P.R., Yu, B.: Efficient automated circuit discovery in transformers using contextual decomposition. In: Proceedings of the International Conference on Learning Representations (2025) 3

  19. [19]

    In: Proceedings of the International Conference on Learning Representations (2026) 2, 8

    Hufe, L., Venhoff, C., Dreyer, M., Purelku, E., Lapuschkin, S., Samek, W.: Dyslexify: A mechanistic defense against typographic attacks in CLIP. In: Proceedings of the International Conference on Learning Representations (2026) 2, 8

  20. [20]

    InInternational Conference on Learning Representations

    Jafari, F.R., Eberle, O., Khakzar, A., Nanda, N.: RelP: Faithful and efficient circuit discovery in language models via relevance patching. arXiv preprint arXiv:2508.21258 (2025) 3

  21. [21]

    arXiv preprint arXiv:2504.19475 (2025) 9, 19

    Joseph, S., Suresh, P., Hufe, L., Stevinson, E., Graham, R., Vadi, Y., Bzdok, D., Lapuschkin, S., Sharkey, L., Richards, B.A.: Prisma: An open source toolkit for mechanistic interpretability in vision and video. arXiv preprint arXiv:2504.19475 (2025) 9, 19

  22. [22]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Kowal, M., Wildes, R.P., Derpanis, K.G.: Visual concept connectome (VCC): Open world concept discovery and their interlayer connections in deep models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10895–10905 (2024) 3

  23. [23]

    In: Advances in Neural Information Processing Systems

    Lindner, D., Kramár, J., Farquhar, S., Rahtz, M., McGrath, T., Mikulik, V.: Tracr: Compiled transformers as a laboratory for interpretability. In: Advances in Neural Information Processing Systems. pp. 37876–37899 (2023) 2

  24. [24]

    In: Proceedings of the Annual Meeting of the Association for Computational Linguistics

    Mondorf, P., Wold, S., Plank, B.: Circuit compositions: Exploring modular structures in transformer-based language models. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics. pp. 14934– 14955 (2025) 14

  25. [25]

    arXiv preprint arXiv:2503.09399 (2025) 7, 18

    Nauen, T.C., Moser, B., Raue, F., Frolov, S., Dengel, A.: ForAug: Recom- bining foregrounds and backgrounds to improve vision transformer training with bias mitigation. arXiv preprint arXiv:2503.09399 (2025) 7, 18

  26. [26]

    In: Proceedings of the European Conference on Computer Vision

    Park, S., Um, D., Yoon, H., Chun, S., Yun, S.: RoCOCO: Robustness benchmark of MS-COCO to stress-test image-text matching models. In: Proceedings of the European Conference on Computer Vision. pp. 71–91 (2024) 9, 12, 13, 28, 29 Faithful Mechanistic Interpretability for Vision Transformers 17

  27. [27]

    In: Proceedings of the International Conference on Machine Learning

    Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: Proceedings of the International Conference on Machine Learning. pp. 8748–8763 (2021) 9, 18, 21

  28. [28]

    arXiv preprint arXiv:2404.14349 (2024) 2, 3

    Rajaram, A., Chowdhury, N., Torralba, A., Andreas, J., Schwettmann, S.: Automatic discovery of visual circuits. arXiv preprint arXiv:2404.14349 (2024) 2, 3

  29. [29]

    Nature Machine Intelligence2(8), 476–486 (2020) 2

    Schramowski, P., Stammer, W., Teso, S., Brugger, A., Herbert, F., Shao, X., Luigs, H.G., Mahlein, A.K., Kersting, K.: Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nature Machine Intelligence2(8), 476–486 (2020) 2

  30. [30]

    Navigating shortcuts, spurious correlations, and confounders: From origins via detection to mitigation.arXiv preprint arXiv:2412.05152, 2024

    Steinmann, D., Divo, F., Kraus, M., Wüst, A., Struppek, L., Friedrich, F., Kersting, K.: Navigating shortcuts, spurious correlations, and confounders: From origins via detection to mitigation. arXiv preprint arXiv:2412.05152 (2024) 2

  31. [31]

    In: Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

    Syed, A., Rager, C., Conmy, A.: Attribution patching outperforms auto- mated circuit discovery. In: Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP. pp. 407–416 (2024) 3, 9, 19

  32. [32]

    Angular steering: Behavior control via rotation in activation space.arXiv preprint arXiv:2510.26243,

    Vu, H.M., Nguyen, T.M.: Angular steering: Behavior control via rotation in activation space. arXiv preprint arXiv:2510.26243 (2025) 8

  33. [33]

    In: Proceedings of the International Conference on Learning Representations (2023) 2

    Wang, K.R., Variengien, A., Conmy, A., Shlegeris, B., Steinhardt, J.: In- terpretability in the wild: A circuit for indirect object identification in GPT-2 small. In: Proceedings of the International Conference on Learning Representations (2023) 2

  34. [34]

    In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

    Wang, X., Zhao, Z., Larson, M.: Typographic attacks in a multi-image setting. In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). pp. 12594–12604 (2025) 2, 8, 9, 20

  35. [35]

    In: Proceedings of the International Conference on Learning Representations (2025) 3

    Wang, Y., Liu, Y., Shi, Y., Li, C., Pang, A., Yang, S., Yu, J., Ren, K.: Discovering influential neuron path in vision transformers. In: Proceedings of the International Conference on Learning Representations (2025) 3

  36. [36]

    SCAM: A real-world typographic robustness evaluation for multimodal foundation models,

    Westerhoff, J., Purelku, E., Hackstein, J., Pinetzki, L., Hufe, L.: SCAM: A real-world typographic robustness evaluation for multimodal foundation models. arXiv preprint arXiv:2504.04893 (2025) 2 18 N. Żukowska et al. A Mining class Circuits Dataset.Class circuits are mined using the ForAug dataset [25], which provides ImageNet images processed via a segm...