arxiv: 2605.12485 · v2 · pith:GAPQ5F2Gnew · submitted 2026-05-12 · 🧬 q-bio.NC · q-bio.QM

Letting the neural code speak: Automated characterization of monkey visual neurons through human language

Pith reviewed 2026-05-20 21:13 UTC · model grok-4.3

classification 🧬 q-bio.NC q-bio.QM
keywords visual cortex · neural selectivity · semantic descriptions · digital twins · macaque V1 · macaque V4 · generative models · representational similarity

The pith

Semantic descriptions in natural language capture the selectivity of most neurons in macaque V1 and V4.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that natural language can characterize what most neurons in macaque visual areas V1 and V4 respond to, by converting their activity patterns into concise, testable descriptions. A closed-loop system uses digital twins of real neurons to caption preferred and non-preferred images, form semantic hypotheses, synthesize new images, and verify in silico whether those images activate or suppress the neurons as predicted. The approach succeeds in V1 for simple features such as edges and in V4 for complex combinations of form, color, and texture: images generated from V4 activating hypotheses push 96.1 percent of neurons above the 95th percentile of natural-image responses. Readers would care because the method replaces opaque mathematical models with human-readable explanations that can be checked directly.

Core claim

Using digital twins of V1 and V4 neurons, a closed-loop framework translates each neuron's high- and low-activating images into dense captions, generates a semantic hypothesis and synthesized images, and verifies the hypothesis in silico. Descriptions range from oriented edges and spatial frequency in V1 to conjunctions of form, color, and texture in V4. In V4, images generated from activating and suppressing hypotheses drove 96.1% of neurons above the 95th and 97.6% below the 5th percentile of natural-image responses, respectively (vs. ~10% for random images).
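
The percentile figures above can be made concrete with a short sketch. It assumes two hypothetical arrays of twin-predicted responses (one for natural images, one for hypothesis-generated images) and assumes that a neuron counts as "driven" when its mean response to the generated images crosses the threshold; the paper may aggregate differently, so the function name, arguments, and that aggregation rule are illustrative only.

```python
import numpy as np


def fraction_beyond_percentile(natural, synthesized, q=95.0, direction="above"):
    """Fraction of neurons whose mean synthesized-image response crosses a
    per-neuron percentile of the natural-image response distribution.

    natural:     (n_neurons, n_natural_images) responses (hypothetical input)
    synthesized: (n_neurons, n_synth_images) responses (hypothetical input)
    """
    natural = np.asarray(natural, dtype=float)
    synthesized = np.asarray(synthesized, dtype=float)

    # One threshold per neuron, taken from its own natural-image responses.
    thresholds = np.percentile(natural, q, axis=1)

    # Assumption: summarize each neuron's synthesized responses by their mean.
    summary = synthesized.mean(axis=1)

    if direction == "above":
        hits = summary > thresholds
    elif direction == "below":
        hits = summary < thresholds
    else:
        raise ValueError("direction must be 'above' or 'below'")
    return float(hits.mean())
```

Calling it with `q=95.0, direction="above"` corresponds to the activation criterion, and `q=5.0, direction="below"` to the suppression criterion.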

What carries the argument

Closed-loop framework that turns neuron responses into captions, semantic hypotheses, and verified synthesized images via digital twins.
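
As a rough illustration of how such a loop fits together, the sketch below wires the stages for a single neuron. Every callable (`twin`, `caption`, `hypothesize`, `synthesize`) is a hypothetical placeholder supplied by the caller, not an interface from the paper, and the defaults `k=5` and `q=95.0` as well as the mean-response verification rule are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

import numpy as np


@dataclass
class Verdict:
    hypothesis: str
    threshold: float
    synthesized_responses: np.ndarray
    verified: bool


def closed_loop(
    twin: Callable[[Any], float],                 # hypothetical: image -> predicted response
    images: Sequence[Any],                        # natural-image pool
    caption: Callable[[Any], str],                # hypothetical: image -> dense caption
    hypothesize: Callable[[List[str], List[str]], str],  # captions -> hypothesis text
    synthesize: Callable[[str], Sequence[Any]],   # hypothesis text -> new images
    k: int = 5,
    q: float = 95.0,
) -> Verdict:
    # 1. Score the natural-image pool with the digital twin.
    responses = np.array([twin(img) for img in images], dtype=float)
    order = np.argsort(responses)

    # 2. Caption the k highest- and k lowest-activating images.
    top = [caption(images[i]) for i in order[-k:]]
    bottom = [caption(images[i]) for i in order[:k]]

    # 3. Distill the captions into one semantic hypothesis.
    hypothesis = hypothesize(top, bottom)

    # 4. Render the hypothesis into new images and score them in silico.
    synth = np.array([twin(img) for img in synthesize(hypothesis)], dtype=float)

    # 5. Verify against the natural-image percentile (assumed rule: mean response).
    threshold = float(np.percentile(responses, q))
    return Verdict(hypothesis, threshold, synth, bool(synth.mean() > threshold))
```

Passing the stages in as callables keeps the sketch independent of any particular captioning or text-to-image model.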

If this is right

  • Concise semantic descriptions capture selectivity for most neurons across V1 and V4.
  • Activating hypothesis images drive 96.1% of V4 neurons above the 95th percentile of natural-image responses.
  • Suppressing hypothesis images drive 97.6% of V4 neurons below the 5th percentile of natural-image responses.
  • Vision embeddings align more closely with neural activity than language embeddings do.
  • Alignment lost in the text bottleneck is recovered when hypotheses are rendered back into images.
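
The alignment claims in the last two points rest on representational similarity analysis. A minimal sketch of that comparison follows, assuming each representation (neural activity, vision embeddings, language embeddings) is available as a stimuli-by-features matrix over the same stimuli. The correlation-distance dissimilarity and the rank-based comparison are common choices rather than details confirmed by this summary, and the ranking helper does not average ties.

```python
import numpy as np


def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the feature vectors of every pair of stimuli.

    features: (n_stimuli, n_features)
    """
    return 1.0 - np.corrcoef(np.asarray(features, dtype=float))


def _ranks(x):
    # Ordinal ranks; ties are broken by position rather than averaged.
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks


def rsa_score(features_a, features_b):
    """Rank correlation between the upper triangles of two RDMs built
    over the same stimuli (rows must be in the same order)."""
    n = len(features_a)
    if len(features_b) != n:
        raise ValueError("both representations need the same stimuli")
    iu = np.triu_indices(n, k=1)
    a = rdm(features_a)[iu]
    b = rdm(features_b)[iu]
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])
```

Under these assumptions, "vision embeddings align more closely with neural activity" would show up as a higher `rsa_score` for the neural-vision pair than for the neural-language pair.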

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same closed-loop approach could characterize neurons in brain areas beyond V1 and V4.
  • Language-based descriptions might serve as an interface for querying large neural datasets.
  • Automated hypothesis generation could speed up experiments that link neural responses to behavior.
  • The method suggests testing whether semantic descriptions predict neuron responses in new animals or tasks.

Load-bearing premise

The digital twins accurately predict responses of real biological neurons to novel synthesized images never shown during twin training.
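
One conventional way to probe a premise like this, should recordings on held-out stimuli become available, is a per-neuron correlation between recorded and twin-predicted responses. The sketch below assumes two hypothetical arrays of matching shape and is not a procedure taken from the paper.

```python
import numpy as np


def per_neuron_correlation(recorded, predicted):
    """Pearson correlation between recorded and predicted responses,
    computed separately for each neuron.

    recorded, predicted: (n_neurons, n_stimuli), same stimulus order.
    Neurons with constant responses yield NaN (zero variance).
    """
    recorded = np.asarray(recorded, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if recorded.shape != predicted.shape:
        raise ValueError("recorded and predicted must have the same shape")

    # Center each neuron's responses across stimuli.
    r = recorded - recorded.mean(axis=1, keepdims=True)
    p = predicted - predicted.mean(axis=1, keepdims=True)

    denom = np.sqrt((r ** 2).sum(axis=1) * (p ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (r * p).sum(axis=1) / denom
```

Low correlations on synthesized stimuli, relative to held-out natural images, would indicate that the twins do not generalize as the premise requires.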

What would settle it

In vivo recordings showing that images synthesized from the semantic hypotheses fail to drive most neurons above the 95th percentile for activation or below the 5th percentile for suppression.

Figures

Figures reproduced from arXiv: 2605.12485 by Andreas S. Tolias, Katrin Franke, Nikos Karantzas, Sophia Sanborn, Surya Ganguli, Tamar Rott Shaham, Vedang Lad.

Figure 1: Framework for translating neural selectivity into interpretable semantic hypotheses. The pipeline consists of three stages: Translate: Each image is converted into a detailed textual description using Gemini 3.0 Pro. To evaluate the fidelity of this image-to-text translation, we regenerate images from the captions using a text-to-image model and quantify correspondence to the original image in an image-sim…
Figure 2: Translation and faithfulness of image-to-text descriptions. The Translate stage of our framework converts input images into detailed captions via Gemini 3.0 Pro and assesses faithfulness by comparing caption-conditioned reconstructions to the originals in DINOv3 embedding space. (a) Translate: Area V4. Given an input image (top left), Gemini 3.0 Pro generates a detailed, multi-sentence caption describing t…
Figure 3: Deriving semantic hypotheses from neurons in macaque visual cortex. For each V1 and V4 neuron, extreme-response images are identified from a large naturalistic image dataset via a functional digital twin. For neurons with baseline activity, we extract both top- and bottom-activating images and distill each set separately into an excitatory and a suppressive semantic hypothesis; for sparse neurons, we extra…
Figure 4: Area V4: Closed-loop verification of semantic hypotheses using generative stimuli and spatial optimization. Top: A generated semantic hypothesis for an example V4 neuron is expanded into multiple diverse text prompts, which are then rendered into novel images using a text-to-image model. These generated images resemble the neuron’s most-activating natural images, capturing core feature conjunctions such a…
Figure 5: Area V1: Closed-loop verification of semantic hypotheses using generative stimuli and spatial optimization. Semantic hypotheses successfully generate stimuli that drive neurons above the random baseline, confirming that the pipeline generalizes across the visual hierarchy. The smaller gain from spatial optimization relative to V4 quantifies the expected gradient: language is a coarser coordinate system fo…
Figure 7: Semantic structure of neural selectivity revealed through population activity clustering. Left: UMAP embedding of V4 neurons clustered by population activity similarity, annotated with nouns and adjectives extracted from the first sentence of each neuron’s semantic hypothesis. Large-scale neighborhoods exhibit smooth transitions in both visual content and descriptive language, from eyes and circular organi…
Original abstract

Understanding what individual neurons encode is a core question in neuroscience. In primary visual cortex (V1), mathematical models (e.g., Gabor functions) capture neural selectivity, but no comparable framework exists for higher areas. We show that natural language can fill this role: across macaque V1 and V4, the selectivity of most neurons is captured by concise, verifiable semantic descriptions. Using digital twins of V1 and V4, we develop a closed-loop framework that translates each neuron's high- and low-activating images into dense captions, generates a semantic hypothesis and synthesized images, and verifies the hypothesis in silico. Descriptions range from oriented edges and spatial frequency in V1 to conjunctions of form, color, and texture in V4. In V4, images generated from activating and suppressing hypotheses drove 96.1% of neurons above the 95th and 97.6% below the 5th percentile of natural-image responses, respectively (vs. ~10% for random images); V1 activation results matched V4, while V1 suppression was less describable in language. Representational similarity analysis reveals partial alignment between neural activity, vision embeddings, and language embeddings, with vision most aligned to neural activity; alignment lost in the text bottleneck is recovered when hypotheses are rendered back into images, showing that linguistic compression is lossy yet semantically faithful. Together, these results show that combining generative models with neural digital twins enables interpretable, testable descriptions of neural function at scale, toward agentic scientific discovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a closed-loop framework that uses digital twins of macaque V1 and V4 neurons to translate high- and low-activating natural images into dense captions, derive semantic hypotheses, synthesize new images, and verify the hypotheses in silico. It claims that concise language descriptions capture neural selectivity, with V4 activating and suppressing images driving 96.1% of neurons above the 95th and 97.6% below the 5th percentile of natural-image responses (versus ~10% for random images), V1 activation results matching V4, and representational similarity analysis showing partial alignment between neural activity, vision embeddings, and language embeddings that is recovered upon rendering hypotheses back to images.

Significance. If the digital-twin predictions prove accurate on the synthesized stimuli, the work would offer a scalable, interpretable approach to characterizing selectivity in higher visual areas using natural language, extending beyond Gabor-like models for V1. The closed-loop generation and verification pipeline, together with the embedding alignment analysis, represents a concrete step toward automated, testable descriptions of neural function.

major comments (2)
  1. [Abstract and Results] The headline quantitative claims (96.1% of V4 neurons driven above the 95th percentile by activating images and 97.6% below the 5th percentile by suppressing images) rest exclusively on predictions from digital twins trained on natural images. No new electrophysiological recordings from the biological neurons are reported for any of the synthesized images, which constitute an out-of-distribution shift. Without direct validation of twin accuracy on these stimuli, the percentile rankings do not establish that the linguistic hypotheses correctly characterize real-neuron selectivity.
  2. [Methods and Results] The degree of distributional shift between the natural-image training set for the twins and the language-generated synthesized images is not quantified, nor is twin prediction error measured on held-out synthesized stimuli. This omission is load-bearing because the entire verification step is performed in silico.
minor comments (2)
  1. [Abstract] The abstract states that V1 suppression was less describable in language, but the corresponding quantitative comparison to V4 is not shown; a supplementary table or figure would clarify the asymmetry.
  2. [Results] Notation for the percentile thresholds and the exact definition of 'natural-image responses' used as the reference distribution should be made explicit in the main text rather than left to supplementary material.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments, which correctly identify the reliance on in silico verification and the need to characterize distributional shift. We respond to each major comment below, indicating revisions where we can strengthen the manuscript without misrepresenting the work.

Point-by-point responses
  1. Referee: [Abstract and Results] The headline quantitative claims (96.1% of V4 neurons driven above the 95th percentile by activating images and 97.6% below the 5th percentile by suppressing images) rest exclusively on predictions from digital twins trained on natural images. No new electrophysiological recordings from the biological neurons are reported for any of the synthesized images, which constitute an out-of-distribution shift. Without direct validation of twin accuracy on these stimuli, the percentile rankings do not establish that the linguistic hypotheses correctly characterize real-neuron selectivity.

    Authors: We agree that the quantitative claims rest on digital-twin predictions and that no new electrophysiological recordings were obtained on the synthesized images. This is an inherent limitation of the closed-loop, scalable framework presented, which uses the twins as a proxy to enable testing at the scale of hundreds of neurons without repeated in-vivo sessions for every generated stimulus. The twins were trained and cross-validated on large natural-image datasets with high predictive performance on held-out natural images; the synthesized images are not arbitrary but are produced from semantic hypotheses derived directly from those same natural-image responses. We will revise the abstract, results, and discussion to state explicitly that all percentile rankings are twin predictions, to frame the work as demonstrating the feasibility of language-based characterization via digital twins, and to note that direct biological confirmation remains an important direction for future experiments. revision: partial

  2. Referee: [Methods and Results] The degree of distributional shift between the natural-image training set for the twins and the language-generated synthesized images is not quantified, nor is twin prediction error measured on held-out synthesized stimuli. This omission is load-bearing because the entire verification step is performed in silico.

    Authors: We accept this criticism and will add the requested analyses. In the revised Methods and Results we will quantify the distributional shift by reporting distances in a shared embedding space (e.g., CLIP or DINO features) between the original natural-image sets and the language-generated synthesized images, as well as basic statistics on low-level image properties. Because ground-truth neural responses for the synthesized images are unavailable, direct prediction-error measurement on them is not possible; however, we will report the twins’ internal uncertainty estimates on the synthesized stimuli and will discuss the implications for the in-silico verification step. revision: yes
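
The embedding-space comparison promised in this response could take many forms. The sketch below shows one simple option, assuming image embeddings (for example CLIP or DINO features, as the response suggests) have already been extracted into two hypothetical arrays. The nearest-neighbor cosine distance and the centroid distance are illustrative stand-ins, not the analysis the authors commit to.

```python
import numpy as np


def _unit(x):
    # L2-normalize each row so dot products become cosine similarities.
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def nearest_neighbor_shift(natural_emb, synth_emb):
    """Compare how far synthesized images sit from the natural-image set
    against how far natural images sit from each other.

    natural_emb: (n_natural, d) embeddings (hypothetical input)
    synth_emb:   (n_synth, d) embeddings (hypothetical input)
    """
    nat, syn = _unit(natural_emb), _unit(synth_emb)

    # Cosine distance from each synthesized image to every natural image.
    cross = 1.0 - syn @ nat.T

    # Leave-one-out baseline: natural image to its nearest other natural image.
    within = 1.0 - nat @ nat.T
    np.fill_diagonal(within, np.inf)

    return {
        "synth_to_natural": float(cross.min(axis=1).mean()),
        "natural_to_natural": float(within.min(axis=1).mean()),
        "centroid_distance": float(np.linalg.norm(nat.mean(axis=0) - syn.mean(axis=0))),
    }
```

A `synth_to_natural` value well above `natural_to_natural` would suggest that the synthesized images fall outside the region the twins were trained on.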

standing simulated objections not resolved
  • Direct electrophysiological recordings on the synthesized images to obtain ground-truth responses from the biological neurons were not performed; such recordings would require new experimental sessions outside the scope and dataset of the present study.

Circularity Check

0 steps flagged

No significant circularity; derivation uses independent digital-twin predictions on novel images

full rationale

The paper's core chain (captioning high/low-activating natural images, generating semantic hypotheses, synthesizing new images, and verifying via in-silico twin responses) does not reduce to any input by construction. The percentile comparisons (96.1% above 95th, 97.6% below 5th) are computed from twin predictions on out-of-distribution synthesized stimuli, not from refitting or re-using the original natural-image data. No self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. The digital twins serve as an external predictive benchmark whose training distribution is distinct from the test images, satisfying the criteria for non-circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the accuracy of the digital twins and on the assumption that language models can extract the relevant semantic features from image captions without introducing systematic bias; no explicit free parameters or new physical entities are introduced in the abstract.

axioms (1)
  • Domain assumption: Digital twins trained on natural images generalize to images synthesized from language hypotheses.
    Invoked when the paper states that synthesized images drive extreme responses in the twins.

pith-pipeline@v0.9.0 · 5837 in / 1351 out tokens · 35009 ms · 2026-05-20T21:13:41.978970+00:00 · methodology




  75. [75]

    Local vs

    Thorpe, Simon. Local vs. Distributed Coding. Intellectica

  76. [76]

    Choosing the right basis for interpretability: Psychophysical comparison between neuron-based and dictionary-based representations.arXiv preprint arXiv:2411.03993, 2024

    Local vs distributed representations: What is the right basis for interpretability? , author=. International Conference on Learning Representations (ICLR) , year=. 2411.03993 , archivePrefix=

  77. [77]

    Deep learning-driven characterization of single cell tuning in primate visual area V4 supports topological organization

    Willeke, Konstantin F and Restivo, Kelli and Franke, Katrin and Nix, Arne F and Cadena, Santiago A and Shinn, Tori and Nealley, Cate and Rodriguez, Gabrielle and Patel, Saumil and Ecker, Alexander S and Sinz, Fabian H and Tolias, Andreas S. Deep learning-driven characterization of single cell tuning in primate visual area V4 supports topological organizat...

  78. [78]

    What's ``up'' with vision-language models? Investigating their struggle with spatial reasoning

    Kamath, Amita and Hessel, Jack and Chang, Kai-Wei. What's ``up'' with vision-language models? Investigating their struggle with spatial reasoning. arXiv [cs.CL]

  79. [79]

    SpatialVLM : Endowing Vision-Language Models with spatial reasoning capabilities

    Chen, Boyuan and Xu, Zhuo and Kirmani, Sean and Ichter, Brian and Driess, Danny and Florence, Pete and Sadigh, Dorsa and Guibas, Leonidas and Xia, Fei. SpatialVLM : Endowing Vision-Language Models with spatial reasoning capabilities. arXiv [cs.CV]

  80. [80]

    Mind the gap: Benchmarking spatial reasoning in Vision-Language Models

    Stogiannidis, Ilias and McDonagh, Steven and Tsaftaris, Sotirios A. Mind the gap: Benchmarking spatial reasoning in Vision-Language Models. arXiv [cs.CV]

Showing first 80 references.