Letting the neural code speak: Automated characterization of monkey visual neurons through human language
Pith reviewed 2026-05-20 21:13 UTC · model grok-4.3
The pith
Semantic descriptions in natural language capture the selectivity of most neurons in macaque V1 and V4.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using digital twins of V1 and V4 neurons, a closed-loop framework translates each neuron's high- and low-activating images into dense captions, generates a semantic hypothesis and synthesized images, and verifies the hypothesis in silico. Descriptions range from oriented edges and spatial frequency in V1 to conjunctions of form, color, and texture in V4. In V4, images generated from activating hypotheses drove 96.1% of neurons above the 95th percentile of natural-image responses, and images generated from suppressing hypotheses drove 97.6% of neurons below the 5th percentile (vs. ~10% for random images).
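The percentile criterion behind these numbers can be made concrete with a short sketch, assuming twin-predicted responses are available as arrays. The function name `fraction_driven`, the array layout, and the choice to average each neuron's responses over its synthesized images are our assumptions; the paper may instead score each image separately or use a different summary.

```python
import numpy as np

def fraction_driven(natural_resp, hypothesis_resp, q, above=True):
    """Fraction of neurons whose mean response to hypothesis images lies beyond
    the q-th percentile of that neuron's own natural-image responses.

    natural_resp:    (n_neurons, n_natural_images) twin-predicted responses
    hypothesis_resp: (n_neurons, n_hypothesis_images) twin-predicted responses
    """
    natural_resp = np.asarray(natural_resp, dtype=float)
    hypothesis_resp = np.asarray(hypothesis_resp, dtype=float)
    # Per-neuron threshold taken from that neuron's natural-image distribution.
    thresholds = np.percentile(natural_resp, q, axis=1)
    # Assumption: summarize each neuron's synthesized images by their mean response.
    summary = hypothesis_resp.mean(axis=1)
    hits = summary > thresholds if above else summary < thresholds
    return float(hits.mean())

# Activating criterion: q=95, above=True. Suppressing criterion: q=5, above=False.
```

Because each threshold comes from the neuron's own natural-image distribution, the score is comparable across neurons with very different firing rates.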
What carries the argument
Closed-loop framework that turns neuron responses into captions, semantic hypotheses, and verified synthesized images via digital twins.
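A minimal sketch of the loop's control flow, with every model passed in as a hypothetical callable: a captioner, a language model that writes the hypothesis, a text-to-image generator, and a twin-based verifier. None of these names come from the paper; the sketch only shows how the four stages chain together.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class LoopResult:
    hypothesis: str
    verified: bool

def characterize_neuron(
    top_images: Sequence,
    bottom_images: Sequence,
    caption: Callable,      # hypothetical: image -> dense caption
    hypothesize: Callable,  # hypothetical: (top captions, bottom captions) -> hypothesis text
    synthesize: Callable,   # hypothetical: hypothesis text -> list of images
    verify: Callable,       # hypothetical: images -> bool, e.g. twin response vs. percentile
) -> LoopResult:
    # 1. Dense captions for the neuron's high- and low-activating natural images.
    top_caps = [caption(img) for img in top_images]
    bottom_caps = [caption(img) for img in bottom_images]
    # 2. Compress the captions into one concise semantic hypothesis.
    hypothesis = hypothesize(top_caps, bottom_caps)
    # 3. Render the hypothesis back into new images.
    images = synthesize(hypothesis)
    # 4. Check the hypothesis in silico against the digital twin.
    return LoopResult(hypothesis, bool(verify(images)))
```

Injecting the stages as callables keeps the loop independent of any particular captioning, language, or image model, so components can be swapped or the control flow tested with stubs.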
If this is right
- Concise semantic descriptions capture selectivity for most neurons across V1 and V4.
- Activating hypothesis images drive 96.1% of V4 neurons above the 95th percentile of natural-image responses.
- Suppressing hypothesis images drive 97.6% of V4 neurons below the 5th percentile of natural-image responses.
- Vision embeddings align more closely with neural activity than language embeddings do.
- Alignment lost in the text bottleneck is recovered when hypotheses are rendered back into images.
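Alignment between neural activity and embedding spaces is typically measured with representational similarity analysis (RSA). The sketch below assumes rows index the same stimuli in every array, uses 1 minus Pearson correlation as the dissimilarity, and compares representational dissimilarity matrices (RDMs) with a Spearman correlation over their upper triangles; these are common choices, not necessarily the ones the paper uses.

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix over stimuli.

    features: (n_stimuli, n_dims) array, e.g. neural responses or embeddings.
    Entry (i, j) is 1 - Pearson correlation between stimuli i and j.
    """
    return 1.0 - np.corrcoef(np.asarray(features, dtype=float))

def _rank(values):
    # Simple ranking; assumes no exact ties (true for continuous data).
    ranks = np.empty(len(values), dtype=float)
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks

def rsa_score(features_a, features_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    a, b = rdm(features_a), rdm(features_b)
    iu = np.triu_indices_from(a, k=1)
    return float(np.corrcoef(_rank(a[iu]), _rank(b[iu]))[0, 1])
```

Because RDMs are stimulus-by-stimulus, neural responses, vision embeddings, and language embeddings can have different dimensionalities and still be compared, e.g. `rsa_score(neural, vision_emb)` against `rsa_score(neural, language_emb)` (hypothetical array names).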
Where Pith is reading between the lines
- The same closed-loop approach could characterize neurons in brain areas beyond V1 and V4.
- Language-based descriptions might serve as an interface for querying large neural datasets.
- Automated hypothesis generation could speed up experiments that link neural responses to behavior.
- The method invites testing whether semantic descriptions predict neuron responses in new animals or tasks.
Load-bearing premise
The digital twins accurately predict responses of real biological neurons to novel synthesized images never shown during twin training.
What would settle it
In vivo recordings showing that images synthesized from the semantic hypotheses fail to drive most neurons above the 95th percentile for activation or below the 5th percentile for suppression.
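One way to read such recordings against chance is an exact one-sided binomial test on the number of recorded neurons that cross threshold. This is an illustrative sketch, not a protocol from the paper; the helper names and the default null rate are our assumptions.

```python
from math import comb

def binomial_tail(k, n, p0):
    """One-sided exact p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

def summarize_in_vivo(k, n, p0=0.05):
    """k of n recorded neurons crossed their natural-image percentile threshold.

    p0 is the chance rate under the null: a natural image drawn from the reference
    set exceeds that set's 95th percentile about 5% of the time. An empirically
    measured random-image rate could be substituted.
    """
    return {
        "fraction": k / n,
        "majority": k > n / 2,   # "most neurons", as in the falsification criterion
        "p_value": binomial_tail(k, n, p0),
    }
```

Under this reading, a result with `majority` false, or a `p_value` that is not small, would count against the hypotheses.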
Original abstract
Understanding what individual neurons encode is a core question in neuroscience. In primary visual cortex (V1), mathematical models (e.g., Gabor functions) capture neural selectivity, but no comparable framework exists for higher areas. We show that natural language can fill this role: across macaque V1 and V4, the selectivity of most neurons is captured by concise, verifiable semantic descriptions. Using digital twins of V1 and V4, we develop a closed-loop framework that translates each neuron's high- and low-activating images into dense captions, generates a semantic hypothesis and synthesized images, and verifies the hypothesis in silico. Descriptions range from oriented edges and spatial frequency in V1 to conjunctions of form, color, and texture in V4. In V4, images generated from activating and suppressing hypotheses drove 96.1% of neurons above the 95th and 97.6% below the 5th percentile of natural-image responses, respectively (vs. ~10% for random images); V1 activation results matched V4, while V1 suppression was less describable in language. Representational similarity analysis reveals partial alignment between neural activity, vision embeddings, and language embeddings, with vision most aligned to neural activity; alignment lost in the text bottleneck is recovered when hypotheses are rendered back into images, showing that linguistic compression is lossy yet semantically faithful. Together, these results show that combining generative models with neural digital twins enables interpretable, testable descriptions of neural function at scale, toward agentic scientific discovery.
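For contrast with the language descriptions, the Gabor model mentioned in the abstract is a Gaussian envelope multiplied by an oriented sinusoidal grating. The sketch below uses a common parameterization (envelope width `sigma`, `wavelength`, orientation `theta`, `phase`, aspect ratio `gamma`); these names and the linear read-out `model_response` are illustrative, not taken from the paper.

```python
import numpy as np

def gabor(size, sigma, wavelength, theta, phase=0.0, gamma=1.0):
    """2-D Gabor filter on a size x size grid (size should be odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the grating varies along orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_r / wavelength + phase)
    return envelope * carrier

def model_response(image, kernel):
    """Linear response of a Gabor 'simple cell' to an image patch of equal size."""
    return float(np.sum(image * kernel))
```

A handful of numeric parameters fully specifies this model's preferred stimulus, which is the kind of compact description the paper argues natural language can provide for V4.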
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a closed-loop framework that uses digital twins of macaque V1 and V4 neurons to translate high- and low-activating natural images into dense captions, derive semantic hypotheses, synthesize new images, and verify the hypotheses in silico. It claims that concise language descriptions capture neural selectivity, with V4 activating and suppressing images driving 96.1% of neurons above the 95th percentile and 97.6% below the 5th percentile of natural-image responses, respectively (versus ~10% for random images), and with V1 activation results matching V4. Representational similarity analysis shows partial alignment between neural activity, vision embeddings, and language embeddings; the alignment lost in the text bottleneck is recovered when hypotheses are rendered back into images.
Significance. If the digital-twin predictions prove accurate on the synthesized stimuli, the work would offer a scalable, interpretable approach to characterizing selectivity in higher visual areas using natural language, extending beyond Gabor-like models for V1. The closed-loop generation and verification pipeline, together with the embedding alignment analysis, represents a concrete step toward automated, testable descriptions of neural function.
major comments (2)
- [Abstract and Results] The headline quantitative claims (96.1% of V4 neurons driven above the 95th percentile by activating images and 97.6% below the 5th percentile by suppressing images) rest exclusively on predictions from digital twins trained on natural images. No new electrophysiological recordings from the biological neurons are reported for any of the synthesized images, which lie outside the twins' natural-image training distribution. Without direct validation of twin accuracy on these stimuli, the percentile rankings do not establish that the linguistic hypotheses correctly characterize real-neuron selectivity.
- [Methods and Results] The degree of distributional shift between the natural images used to train the twins and the language-generated synthesized images is not quantified, nor is twin prediction error measured on held-out synthesized stimuli. This omission is load-bearing because the entire verification step is performed in silico.
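The shift the referee asks about could be quantified, for example, with a kernel two-sample statistic on image embeddings. The sketch assumes embeddings (e.g. CLIP or DINO features) have already been computed as arrays and uses the biased estimate of squared maximum mean discrepancy (MMD) with a fixed RBF bandwidth; both the statistic and the bandwidth are our choices, not the paper's.

```python
import numpy as np

def _rbf(a, b, bandwidth):
    # Squared Euclidean distances between every row of a and every row of b.
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased squared MMD between embedding sets x (n, d) and y (m, d)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(
        _rbf(x, x, bandwidth).mean()
        + _rbf(y, y, bandwidth).mean()
        - 2 * _rbf(x, y, bandwidth).mean()
    )
```

In practice the bandwidth is often set from the median pairwise distance, and a permutation test over the pooled samples gives a p-value for the observed statistic.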
minor comments (2)
- [Abstract] The abstract states that V1 suppression was less describable in language, but the corresponding quantitative comparison to V4 is not shown; a supplementary table or figure would clarify the asymmetry.
- [Results] Notation for the percentile thresholds and the exact definition of 'natural-image responses' used as the reference distribution should be made explicit in the main text rather than left to supplementary material.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which correctly identify the reliance on in silico verification and the need to characterize distributional shift. We respond to each major comment below, indicating revisions where we can strengthen the manuscript without misrepresenting the work.
Point-by-point responses
- Referee: [Abstract and Results] The headline quantitative claims (96.1% of V4 neurons driven above the 95th percentile by activating images and 97.6% below the 5th percentile by suppressing images) rest exclusively on predictions from digital twins trained on natural images. No new electrophysiological recordings from the biological neurons are reported for any of the synthesized images, which lie outside the twins' natural-image training distribution. Without direct validation of twin accuracy on these stimuli, the percentile rankings do not establish that the linguistic hypotheses correctly characterize real-neuron selectivity.
Authors: We agree that the quantitative claims rest on digital-twin predictions and that no new electrophysiological recordings were obtained on the synthesized images. This is an inherent limitation of the closed-loop, scalable framework presented, which uses the twins as a proxy to enable testing at the scale of hundreds of neurons without repeated in-vivo sessions for every generated stimulus. The twins were trained and cross-validated on large natural-image datasets with high predictive performance on held-out natural images; the synthesized images are not arbitrary but are produced from semantic hypotheses derived directly from those same natural-image responses. We will revise the abstract, results, and discussion to state explicitly that all percentile rankings are twin predictions, to frame the work as demonstrating the feasibility of language-based characterization via digital twins, and to note that direct biological confirmation remains an important direction for future experiments. revision: partial
- Referee: [Methods and Results] The degree of distributional shift between the natural images used to train the twins and the language-generated synthesized images is not quantified, nor is twin prediction error measured on held-out synthesized stimuli. This omission is load-bearing because the entire verification step is performed in silico.
Authors: We accept this criticism and will add the requested analyses. In the revised Methods and Results we will quantify the distributional shift by reporting distances in a shared embedding space (e.g., CLIP or DINO features) between the original natural-image sets and the language-generated synthesized images, as well as basic statistics on low-level image properties. Because ground-truth neural responses for the synthesized images are unavailable, direct prediction-error measurement on them is not possible; however, we will report the twins’ internal uncertainty estimates on the synthesized stimuli and will discuss the implications for the in-silico verification step. revision: yes
- Direct electrophysiological recordings on the synthesized images to obtain ground-truth responses from the biological neurons were not performed; such recordings would require new experimental sessions outside the scope and dataset of the present study.
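If the twins are run as an ensemble (an assumption; the rebuttal does not say how uncertainty would be estimated), disagreement across members is one readily computed uncertainty proxy. The sketch compares mean disagreement on synthesized images with that on natural images; the function names are hypothetical.

```python
import numpy as np

def ensemble_uncertainty(predictions):
    """predictions: (n_models, n_images, n_neurons) twin outputs.
    Returns the per-image, per-neuron standard deviation across ensemble members."""
    return np.asarray(predictions, dtype=float).std(axis=0, ddof=1)

def uncertainty_ratio(pred_natural, pred_synth):
    """Mean ensemble disagreement on synthesized images relative to natural ones.
    Values well above 1 flag stimuli on which the twin is extrapolating."""
    return float(
        ensemble_uncertainty(pred_synth).mean() / ensemble_uncertainty(pred_natural).mean()
    )
```

Low disagreement does not guarantee accuracy: members trained on the same natural images can share the same out-of-distribution error, so this proxy complements rather than replaces in vivo recordings.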
Circularity Check
No significant circularity; derivation uses independent digital-twin predictions on novel images
Full rationale
The paper's core chain—captioning high/low-activating natural images, generating semantic hypotheses, synthesizing new images, and verifying via in-silico twin responses—does not reduce to any input by construction. The percentile comparisons (96.1% above 95th, 97.6% below 5th) are computed from twin predictions on out-of-distribution synthesized stimuli, not from refitting or re-using the original natural-image data. No self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described framework. The digital twins serve as an external predictive benchmark whose training distribution is distinct from the test images, satisfying the criteria for non-circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: digital twins trained on natural images generalize to images synthesized from language hypotheses.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction (tag: unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
Using digital twins of V1 and V4, we develop a closed-loop framework that translates each neuron’s high- and low-activating images into dense captions, generates a semantic hypothesis and synthesized images, and verifies the hypothesis in silico.
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
In V4, images generated from activating and suppressing hypotheses drove 96.1% of neurons above the 95th and 97.6% below the 5th percentile of natural-image responses
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.