NeuroCogMap Reveals Cognitive Organization of Large Language Models

Chenyu Liu; Guoqi Li; Haolang Lu; Hao Sun; Ji-Rong Wen; Jun Xu; Kun Wang; Liang Pang; Qiang Ma; Qiankun Li

arxiv: 2607.00397 · v1 · pith:AAZ5IFBWnew · submitted 2026-07-01 · 🧬 q-bio.NC · cs.AI · cs.CL


Zhongxiang Sun, Haolang Lu, Qiang Ma, Qi Li, Qipeng Wang, Liang Pang, Chenyu Liu, Qiankun Li, Hao Sun, Kun Wang, Yi Zeng, Jun Xu, Guoqi Li, Ji-Rong Wen


Pith reviewed 2026-07-02 02:31 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.AI · cs.CL

keywords NeuroCogMap · large language models · functional parcels · cognitive organization · model failures · human cortical responses · decision-making strategies


The pith

NeuroCogMap partitions LLM internal features into stable functional parcels that form a coherent organization partly conserved across models and linked to human cortical responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NeuroCogMap as a framework that divides the internal activations of large language models into functional parcels, each tied to specific cognitive roles and arranged in a hierarchy. These parcels show consistent boundaries and labels across different models, connect directly to model outputs, and mark distinct internal disruptions for failures such as hallucination and bias. The same parcels also yield better predictions of how human brains respond to natural language, especially in higher association areas, and surface hidden strategies used in decision tasks. A sympathetic reader would care because the work supplies a system-level map that treats artificial systems as having reproducible cognitive structure rather than opaque weights.

Core claim

NeuroCogMap organizes internal features of LLMs into functional parcels that form a stable and semantically coherent organization partly conserved across models. These parcels are functionally linked to model outputs, with major failures including hallucination, bias, refusal failure and sycophancy corresponding to distinct disruptions in representational and behavioural-control systems. The organization improves prediction of human cortical responses during naturalistic language comprehension, strongest in higher-order association cortex, and its internal signatures expose latent strategies that can refine classical models of human decision-making.

What carries the argument

NeuroCogMap, the cognitive neuroscience-inspired framework that partitions internal LLM features into functional parcels linked to interpretable functions, cognitive capabilities, and a cognitive hierarchy.

If this is right

Major LLM failures map to distinct disruptions inside the parcel organization, supplying internal signatures for detection and targeted fixes.
The parcels improve prediction accuracy for human cortical activity during language tasks, especially in higher-order areas.
Internal signatures from the parcels reveal latent strategies that can update classical models of human decision-making.
The organization is partly conserved across models, allowing functional comparisons between different LLMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same parcel-based approach could be applied to other transformer-based systems to test whether cognitive-like organization is architecture-dependent.
If the parcels truly correspond to human systems, interventions that alter specific parcels in an LLM should produce predictable changes in both model behavior and alignment with brain data.
The conservation of parcels across models raises the possibility that training dynamics impose common organizational constraints independent of scale or data.

Load-bearing premise

The internal features of LLMs can be divided into parcels whose boundaries and functional labels remain stable and meaningfully similar to human cognitive systems instead of arising from the analysis method itself.

What would settle it

Re-running the parcel identification on a held-out set of models and finding that parcel boundaries shift substantially or that parcel labels no longer predict either model failures or human brain responses would falsify the stability and cross-system correspondence claims.
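One way to make this test concrete is to score the agreement between two parcel assignments of the same features with the adjusted Rand index, the statistic the simulated rebuttal also names. The sketch below is a minimal illustration and not the paper's procedure: the function name `adjusted_rand_index` and the toy label lists are hypothetical, and the abstract does not say how features would be matched across models.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total = comb(len(labels_a), 2)  # number of item pairs
    if total == 0:
        return 1.0  # fewer than two items: nothing to disagree about
    # Count co-clustered pairs in the joint table and in each marginal.
    joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = rows * cols / total
    max_index = (rows + cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (joint - expected) / (max_index - expected)


# Hypothetical parcel labels for the same six features from three runs.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [2, 2, 0, 0, 1, 1]  # same grouping, parcels renumbered
run_3 = [0, 1, 0, 1, 0, 1]  # grouping that cuts across run_1

print(adjusted_rand_index(run_1, run_2))
print(adjusted_rand_index(run_1, run_3))
```

Values near 1 mean two runs group features the same way regardless of how parcels are numbered. Values near or below 0 mean agreement no better than chance, which is the outcome that would count against the stability claim.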

Original abstract

Understanding how complex cognitive functions are organized within artificial systems is central to interpreting large language models (LLMs) and relating them to biological cognition. Yet although LLMs exhibit broad cognitive-like behaviours, it remains unclear whether their internal representations form reproducible functional systems that explain behaviour, failure and links to human cognition. Here we present NeuroCogMap, a cognitive neuroscience-inspired framework that organizes internal features of LLMs into functional parcels and links them to interpretable functions, cognitive capabilities and a cognitive hierarchy. These parcels form a stable and semantically coherent organization that is partly conserved across models and functionally linked to model outputs. Within this organization, major LLM failures, including hallucination, bias, refusal failure and sycophancy, correspond to distinct disruptions in representational and behavioural-control systems, yielding internal signatures for mechanism-guided detection and targeted intervention. Beyond model behaviour, NeuroCogMap improves prediction of human cortical responses during naturalistic language comprehension, with the strongest correspondence in higher-order association cortex. At the cognitive level, its internal signatures expose latent strategies that guide refinements of classical models of human decision-making. Together, these findings establish NeuroCogMap as a system-level framework for mapping functional organization in artificial systems and for relating this organization to human cortical function and cognitive behaviour.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

NeuroCogMap claims a stable parcellation of LLM features into cognitive parcels with links to failures and human cortex, but the abstract supplies no methods or validation to assess whether any of it holds.


The main point is that this paper introduces NeuroCogMap as a framework that organizes LLM internal features into functional parcels, claims those parcels are stable across models, ties major failures like hallucination and sycophancy to distinct disruptions, and reports better prediction of human cortical responses especially in association areas.

What is actually new is the named system-level framing that tries to give a reusable vocabulary for LLM internals plus explicit signatures for specific failure modes. The cross-link to human cortex and the suggestion that internal signatures can refine classical decision-making models are also presented as outcomes.

The paper does a decent job laying out why such a map could be useful for interpretability and for connecting artificial and biological cognition.

The soft spots are substantial and central. The abstract contains no description of how parcels are defined, what data or models were used, what statistics support stability or conservation, or any quantitative results on prediction improvement. Without those, it is impossible to tell whether the claimed organization is reproducible or method-dependent. The load-bearing assumption that parcels have stable boundaries and meaningful analogy to human cognitive systems therefore remains untested in the provided text.

This is for researchers at the AI-neuroscience boundary who are looking for high-level organizing ideas. A reader focused on concrete methods or reproducible findings will get little from it.

It deserves peer review once the full methods and results are available so referees can check the actual evidence.

Referee Report

2 major / 0 minor

Summary. The paper presents NeuroCogMap, a cognitive neuroscience-inspired framework that partitions internal features of large language models into functional parcels. These parcels are claimed to form a stable, semantically coherent organization partly conserved across models, to correspond to distinct disruptions underlying failures such as hallucination, bias, refusal failure and sycophancy, to improve prediction of human cortical responses during language comprehension (especially in association cortex), and to expose latent strategies relevant to models of human decision-making.

Significance. If the parcellation is shown to be robust, reproducible, and not an artifact of the chosen analysis pipeline, the work could supply a concrete bridge between mechanistic interpretability in LLMs and systems-level cognitive neuroscience. The potential to link internal representational disruptions to specific behavioral failures and to improve brain-activity prediction would be of broad interest. However, the absence of any methodological equations, hyperparameter specifications, statistical validation, or quantitative results in the provided text prevents assessment of whether these correspondences are load-bearing or method-dependent.

major comments (2)

[Abstract] Abstract (and entire manuscript): No equations, algorithms, distance metrics, clustering procedures, or hyperparameter choices are supplied for the parcellation step that defines the functional parcels. Without these details the central claim that the parcels possess stable boundaries and reproducible functional labels cannot be evaluated for robustness or circularity.
[Abstract] Abstract: The statements that NeuroCogMap 'improves prediction of human cortical responses' and that 'major LLM failures correspond to distinct disruptions' are presented without any reported correlation coefficients, baseline comparisons, cross-validation statistics, or effect sizes. These quantitative claims are load-bearing for the significance of the framework yet remain unsupported.
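To illustrate the kind of evidence the second comment asks for, here is a minimal sketch of a cross-validated encoding comparison: ridge regression maps a feature set to a response, and the held-out Pearson correlation is compared against a baseline feature set. Everything in it is assumed for illustration. The synthetic data stand in for cortical responses, and `ridge_fit`, `cv_correlation`, the fold count, and the penalty `alpha` are hypothetical choices, not details from the paper.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights: (X^T X + alpha I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def cv_correlation(X, y, n_folds=5, alpha=1.0):
    """Mean Pearson r between held-out predictions and responses."""
    folds = np.array_split(np.arange(len(y)), n_folds)  # contiguous folds
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Standardize with training statistics only, to avoid leakage.
        mu = X[train_mask].mean(axis=0)
        sd = X[train_mask].std(axis=0)
        X_train = (X[train_mask] - mu) / sd
        X_test = (X[test_idx] - mu) / sd
        y_mean = y[train_mask].mean()
        w = ridge_fit(X_train, y[train_mask] - y_mean, alpha)
        pred = X_test @ w + y_mean
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))


rng = np.random.default_rng(0)
n_samples = 400
candidate = rng.normal(size=(n_samples, 10))  # stand-in for parcel features
baseline = rng.normal(size=(n_samples, 10))   # stand-in for a control feature set
weights = rng.normal(size=10)
# Synthetic response driven by the candidate features plus noise.
response = candidate @ weights + rng.normal(scale=1.0, size=n_samples)

r_candidate = cv_correlation(candidate, response)
r_baseline = cv_correlation(baseline, response)
print(f"candidate r = {r_candidate:.3f}, baseline r = {r_baseline:.3f}")
```

In a real analysis the folds would respect the temporal structure of the recording, and the comparison would be repeated per cortical region with a correction for multiple comparisons. The sketch only shows the shape of a baseline-versus-candidate report.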

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which identify key areas where additional detail will strengthen the manuscript. We address each major comment below and will incorporate the requested information in a revised version.


Referee: [Abstract] Abstract (and entire manuscript): No equations, algorithms, distance metrics, clustering procedures, or hyperparameter choices are supplied for the parcellation step that defines the functional parcels. Without these details the central claim that the parcels possess stable boundaries and reproducible functional labels cannot be evaluated for robustness or circularity.

Authors: We agree that these methodological details are absent from the provided text. We will revise the manuscript to include a dedicated Methods section specifying the equations for feature extraction and similarity computation, the distance metric (cosine similarity), the clustering algorithm (agglomerative hierarchical clustering with Ward linkage), and hyperparameter selection (number of parcels chosen via silhouette score maximization on held-out data). Stability will be quantified with bootstrap resampling and cross-model adjusted Rand indices. revision: yes
Referee: [Abstract] Abstract: The statements that NeuroCogMap 'improves prediction of human cortical responses' and that 'major LLM failures correspond to distinct disruptions' are presented without any reported correlation coefficients, baseline comparisons, cross-validation statistics, or effect sizes. These quantitative claims are load-bearing for the significance of the framework yet remain unsupported.

Authors: We agree that the abstract lacks the specific quantitative metrics. We will revise the abstract to report key statistics (e.g., prediction correlations, effect sizes) and will add a Results subsection that includes baseline comparisons, cross-validation procedures, and all relevant coefficients and p-values to support the claims. revision: yes
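The pipeline named in the first response can be sketched as follows. This illustrates the stated ingredients (cosine similarity, agglomerative clustering with Ward linkage, silhouette-based choice of parcel count) and is not the authors' implementation: the rebuttal is simulated, the data are synthetic, and `choose_parcels` and its candidate range are hypothetical. Ward linkage is defined for Euclidean distance, so the sketch L2-normalizes the feature vectors first; for unit vectors, squared Euclidean distance equals 2 * (1 - cosine similarity), which keeps the two notions consistent. For brevity the silhouette is computed on the clustered data itself rather than on held-out data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score


def choose_parcels(features, candidate_counts):
    """Ward clustering on unit-normalized features; pick k by silhouette."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    tree = linkage(unit, method="ward")  # Euclidean distance on unit vectors
    best = None
    for k in candidate_counts:
        # Cut the same tree into at most k flat clusters.
        labels = fcluster(tree, t=k, criterion="maxclust")
        score = silhouette_score(unit, labels, metric="cosine")
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


# Synthetic stand-in: three groups of feature vectors around distinct directions.
rng = np.random.default_rng(0)
centers = np.eye(8)[:3] * 5.0
features = np.vstack([c + rng.normal(scale=0.3, size=(40, 8)) for c in centers])

k, score, labels = choose_parcels(features, candidate_counts=range(2, 7))
print(f"selected {k} parcels (silhouette = {score:.2f})")
```

Building the tree once and cutting it at several values of k keeps the candidate parcellations nested, which makes the silhouette comparison across k a comparison of cuts of one hierarchy rather than of unrelated clusterings.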

Circularity Check

0 steps flagged

No significant circularity detected from available text


The manuscript text supplied consists solely of the abstract, which outlines the NeuroCogMap framework at a conceptual level without any equations, methodological procedures, fitting steps, or derivation chains that could be inspected for self-definition, fitted-input predictions, or load-bearing self-citation. No passage can be quoted in which a claimed prediction or functional parcel reduces to its own inputs by construction. The central claims about stable parcels, links to failures, and improved cortical prediction are described only at a high level, with no load-bearing step shown that collapses into its inputs. This is the most common honest finding when methodological details are absent; the derivation is therefore treated as self-contained on the evidence provided.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit parameters, axioms, or invented entities. The framework itself appears to introduce 'functional parcels' as a new organizational unit, but their definition and validation details are not supplied.

pith-pipeline@v0.9.1-grok · 5791 in / 1183 out tokens · 31297 ms · 2026-07-02T02:31:31.493495+00:00 · methodology


Reference graph

Works this paper leans on

164 extracted references · 10 canonical work pages · 6 internal anchors

[1]

Neuroscience- inspired artificial intelligence.Neuron, 95(2):245–258, 2017

Demis Hassabis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick. Neuroscience- inspired artificial intelligence.Neuron, 95(2):245–258, 2017

2017
[2]

A deep learning framework for neuroscience.Nature neuroscience, 22(11):1761–1770, 2019

Blake A Richards, Timothy P Lillicrap, Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Chris- tensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, et al. A deep learning framework for neuroscience.Nature neuroscience, 22(11):1761–1770, 2019

2019
[3]

A survey of large language models.Frontiers of Computer Science, 20(12):2012627, 2026

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Zican Dong, Yupeng Hou, Beichen Zhang, Yingqian Min, Junjie Zhang, Peiyu Liu, et al. A survey of large language models.Frontiers of Computer Science, 20(12):2012627, 2026

2026
[4]

Deep neural networks as scientific models.Trends in cognitive sciences, 23(4):305–317, 2019

Radoslaw M Cichy and Daniel Kaiser. Deep neural networks as scientific models.Trends in cognitive sciences, 23(4):305–317, 2019

2019
[5]

Evaluating large language models in theory of mind tasks.Proceedings of the National Academy of Sciences, 121(45):e2405460121, 2024

Michal Kosinski. Evaluating large language models in theory of mind tasks.Proceedings of the National Academy of Sciences, 121(45):e2405460121, 2024

2024
[6]

Larger and more instructable language models become less reliable.Nature, 634(8032):61–68, 2024

Lexin Zhou, Wout Schellaert, Fernando Martínez-Plumed, Yael Moros-Daval, Cèsar Ferri, and José Hernández-Orallo. Larger and more instructable language models become less reliable.Nature, 634(8032):61–68, 2024

2024
[7]

Language models transmit behavioural traits through hidden signals in data.Nature, 652(8110):615–621, 2026

Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Sören Mindermann, Jacob Hilton, Samuel Marks, and Owain Evans. Language models transmit behavioural traits through hidden signals in data.Nature, 652(8110):615–621, 2026

2026
[8]

Vempala, and Edwin Zhang

Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. Evaluating large language models for accuracy incentivizes hallucinations.Nature, April 2026

2026
[9]

Knowledge neurons in pretrained transformers

Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, and Furu Wei. Knowledge neurons in pretrained transformers. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8493–8502, 2022

2022
[10]

Sparse autoen- coders find highly interpretable features in language models

Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey. Sparse autoen- coders find highly interpretable features in language models. InThe Twelfth International Conference on Learning Representations, 2023. 27 NeuroCogMap Reveals Cognitive Organization of Large Language Models

2023
[11]

Toward universal steering and monitoring of ai models.Science, 391(6787):787–792, 2026

Daniel Beaglehole, Adityanarayanan Radhakrishnan, Enric Boix-Adsera, and Mikhail Belkin. Toward universal steering and monitoring of ai models.Science, 391(6787):787–792, 2026

2026
[12]

Causal abstraction: A theoretical foundation for mechanistic interpretability.Journal of Machine Learning Research, 26(83):1–64, 2025

Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, et al. Causal abstraction: A theoretical foundation for mechanistic interpretability.Journal of Machine Learning Research, 26(83):1–64, 2025

2025
[13]

Mechanistic understanding and validation of large ai models with semanticlens.Nature Machine Intelligence, 7(9):1572–1585, 2025

Maximilian Dreyer, Jim Berend, Tobias Labarta, Johanna Vielhaben, Thomas Wiegand, Sebastian Lapuschkin, and Wojciech Samek. Mechanistic understanding and validation of large ai models with semanticlens.Nature Machine Intelligence, 7(9):1572–1585, 2025

2025
[14]

The organization of the human cerebral cortex estimated by intrinsic functional connectivity.Journal of neurophysiology, 2011

BT Thomas Yeo, Fenna M Krienen, Jorge Sepulcre, Mert R Sabuncu, Danial Lashkari, Marisa Hollinshead, Joshua L Roffman, Jordan W Smoller, Lilla Zöllei, Jonathan R Polimeni, et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity.Journal of neurophysiology, 2011

2011
[15]

Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri.Cerebral cortex, 28(9):3095–3114, 2018

Alexander Schaefer, Ru Kong, Evan M Gordon, Timothy O Laumann, Xi-Nian Zuo, Avram J Holmes, Simon B Eickhoff, and BT Thomas Yeo. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri.Cerebral cortex, 28(9):3095–3114, 2018

2018
[16]

A multi-modal parcellation of human cerebral cortex.Nature, 536(7615):171–178, 2016

Matthew F Glasser, Timothy S Coalson, Emma C Robinson, Carl D Hacker, John Harwell, Essa Ya- coub, Kamil Ugurbil, Jesper Andersson, Christian F Beckmann, Mark Jenkinson, et al. A multi-modal parcellation of human cerebral cortex.Nature, 536(7615):171–178, 2016

2016
[17]

Natural speech reveals the semantic maps that tile human cerebral cortex.Nature, 532(7600):453–458, 2016

Alexander G Huth, Wendy A De Heer, Thomas L Griffiths, Frédéric E Theunissen, and Jack L Gallant. Natural speech reveals the semantic maps that tile human cerebral cortex.Nature, 532(7600):453–458, 2016

2016
[18]

Human cognition involves the dynamic integration of neural activity and neuromodulatory systems.Nature neuroscience, 22(2):289–296, 2019

JamesMShine,MichaelBreakspear,PeterTBell,KaylenaAEhgoetzMartens,RichardShine,Oluwasanmi Koyejo, Olaf Sporns, and Russell A Poldrack. Human cognition involves the dynamic integration of neural activity and neuromodulatory systems.Nature neuroscience, 22(2):289–296, 2019

2019
[19]

Bloom’s taxonomy of cognitive learning objectives.Journal of the Medical Library Association: JMLA, 103(3):152, 2015

Nancy E Adams. Bloom’s taxonomy of cognitive learning objectives.Journal of the Medical Library Association: JMLA, 103(3):152, 2015

2015
[20]

Using human brain lesions to infer function: a relic from a past era in the fmri age?Nature Reviews Neuroscience, 5(10):812–819, 2004

Chris Rorden and Hans-Otto Karnath. Using human brain lesions to infer function: a relic from a past era in the fmri age?Nature Reviews Neuroscience, 5(10):812–819, 2004

2004
[21]

Locating and editing factual associations in gpt.Advances in neural information processing systems, 35:17359–17372, 2022

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in gpt.Advances in neural information processing systems, 35:17359–17372, 2022

2022
[22]

Weight- sparse transformers have interpretable circuits.arXiv preprint arXiv:2511.13653, 2025

Leo Gao, Achyuta Rajaram, Jacob Coxon, Soham V Govande, Bowen Baker, and Dan Mossing. Weight- sparse transformers have interpretable circuits.arXiv preprint arXiv:2511.13653, 2025

work page arXiv 2025
[23]

Gemma scope: Open sparse autoencoders everywhere all at once on gemma 2

Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, János Kramár, Anca Dragan, Rohin Shah, and Neel Nanda. Gemma scope: Open sparse autoencoders everywhere all at once on gemma 2. InProceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 278–300, 2024

2024
[24]

Gemma 2: Improving Open Language Models at a Practical Size

Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. Gemma 2: Improving open language models at a practical size.arXiv preprint arXiv:2408.00118, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[25]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al- Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[26]

Pythia: A suite for analyzing large language models across training and scaling

Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, et al. Pythia: A suite for analyzing large language models across training and scaling. InInternational conference on machine learning, pages 2397–2430. PMLR, 2023. 28 NeuroCogM...

2023
[27]

Truthfulqa: Measuring how models mimic human falsehoods

Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pages 3214–3252, 2022

2022
[28]

Latent retrieval for weakly supervised open domain question answering

Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. Latent retrieval for weakly supervised open domain question answering. InProceedings of the 57th annual meeting of the association for computational linguistics, pages 6086–6096, 2019

2019
[29]

Halueval: A large-scale hallucination evaluation benchmark for large language models

Junyi Li, Xiaoxue Cheng, Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. Halueval: A large-scale hallucination evaluation benchmark for large language models. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6449–6464, 2023

2023
[30]

Medhallu: A comprehensive benchmark for detecting medical hallucinations in large language models

Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, and Ying Ding. Medhallu: A comprehensive benchmark for detecting medical hallucinations in large language models. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 2858–2873, 2025

2025
[31]

Free dolly: Introducing the world’s first truly open instruction-tuned llm

Mike Conover, Matt Hayes, Ankit Mathur, Jianwei Xie, Jun Wan, Sam Shah, Ali Ghodsi, Patrick Wendell, Matei Zaharia, and Reynold Xin. Free dolly: Introducing the world’s first truly open instruction-tuned llm. https://www.databricks.com/blog/2023/04/12/ dolly-first-open-commercially-viable-instruction-tuned-llm , 2023. Accessed: 2023-06-30

2023
[32]

Crowdsourcing multiple choice science questions

Johannes Welbl, Nelson F Liu, and Matt Gardner. Crowdsourcing multiple choice science questions. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 94–106, 2017

2017
[33]

Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models

Potsawee Manakul, Adian Liusie, and Mark Gales. Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. InProceedings of the 2023 conference on empirical methods in natural language processing, pages 9004–9017, 2023

2023
[34]

Real- time detection of hallucinated entities in long-form generation.arXiv preprint arXiv:2509.03531, 2025

Oscar Obeso, Andy Arditi, Javier Ferrando, Joshua Freeman, Cameron Holmes, and Neel Nanda. Real- time detection of hallucinated entities in long-form generation.arXiv preprint arXiv:2509.03531, 2025

work page arXiv 2025
[35]

Uncertainty estimation in autoregressive structured prediction

Andrey Malinin and Mark Gales. Uncertainty estimation in autoregressive structured prediction. In International Conference on Learning Representations, 2021

2021
[36]

Out-of-distribution detection and selective generation for conditional language models

Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mohammad Saleh, Balaji Lakshminarayanan, and Peter J Liu. Out-of-distribution detection and selective generation for conditional language models. In The Eleventh International Conference on Learning Representations, 2023

2023
[37]

Detecting hallucinations in large language models using semantic entropy.Nature, 630(8017):625–630, 2024

Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in large language models using semantic entropy.Nature, 630(8017):625–630, 2024

2024
[38]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[39]

Jailbreakbench: An open robustness benchmark for jailbreaking large language models.Advances in Neural Information Processing Systems, 37:55005–55029, 2024

Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J Pappas, Florian Tramer, et al. Jailbreakbench: An open robustness benchmark for jailbreaking large language models.Advances in Neural Information Processing Systems, 37:55005–55029, 2024

2024
[40]

Baseline defenses for adversarial attacks against aligned language models, 2024

Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models, 2024

2024
[41]

Single- pass detection of jailbreaking input in large language models.Transactions on Machine Learning Research, 2025

Leyla Naz Candogan, Yongtao Wu, Elias Abad Rocamora, Grigorios Chrysos, and Volkan Cevher. Single- pass detection of jailbreaking input in large language models.Transactions on Machine Learning Research, 2025

2025
[42]

Smoothllm: Defending large language models against jailbreaking attacks.Transactions on Machine Learning Research, 2025

Alexander Robey, Eric Wong, Hamed Hassani, and George J Pappas. Smoothllm: Defending large language models against jailbreaking attacks.Transactions on Machine Learning Research, 2025. 29 NeuroCogMap Reveals Cognitive Organization of Large Language Models

2025
[43]

The neural architecture of language: Inte- grative modeling converges on predictive processing.Proceedings of the National Academy of Sciences, 118(45):e2105646118, 2021

Martin Schrimpf, Idan Asher Blank, Greta Tuckute, Carina Kauf, Eghbal A Hosseini, Nancy Kan- wisher, Joshua B Tenenbaum, and Evelina Fedorenko. The neural architecture of language: Inte- grative modeling converges on predictive processing.Proceedings of the National Academy of Sciences, 118(45):e2105646118, 2021

2021
[44]

Shared functional specialization in transformer-based language models and the human brain.Nature communications, 15(1):5523, 2024

Sreejan Kumar, Theodore R Sumers, Takateru Yamakoshi, Ariel Goldstein, Uri Hasson, Kenneth A Norman, Thomas L Griffiths, Robert D Hawkins, and Samuel A Nastase. Shared functional specialization in transformer-based language models and the human brain.Nature communications, 15(1):5523, 2024

2024
[45]

Increasing alignment of large language models with language processing in the human brain.Nature computational science, 5(11):1080–1090, 2025

Changjiang Gao, Zhengwu Ma, Jiajun Chen, Ping Li, Shujian Huang, and Jixing Li. Increasing alignment of large language models with language processing in the human brain.Nature computational science, 5(11):1080–1090, 2025

2025
[46]

Instruction- tuning aligns llms to the human brain

Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, and Antoine Bosselut. Instruction- tuning aligns llms to the human brain. InFirst Conference on Language Modeling, 2024

2024
[47]

A natural language fmri dataset for voxelwise encoding models.Scientific Data, 10(1):555, 2023

Amanda LeBel, Lauren Wagner, Shailee Jain, Aneesh Adhikari-Desai, Bhavin Gupta, Allyson Morgenthal, Jerry Tang, Lixiang Xu, and Alexander G Huth. A natural language fmri dataset for voxelwise encoding models.Scientific Data, 10(1):555, 2023

2023
[48]

Toward a universal decoder of linguistic meaning from brain activation.Nature communications, 9(1):963, 2018

Francisco Pereira, Bin Lou, Brianna Pritchett, Samuel Ritter, Samuel J Gershman, Nancy Kanwisher, Matthew Botvinick, and Evelina Fedorenko. Toward a universal decoder of linguistic meaning from brain activation.Nature communications, 9(1):963, 2018

2018
[49]

Bert: Pre-training of deep bidi- rectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidi- rectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

2019
[50]

LITcoder: A General-Purpose Library for Building and Comparing Encoding Models

Taha Binhuraib, Ruimin Gao, and Anna A Ivanova. Litcoder: A general-purpose library for building and comparing encoding models.arXiv preprint arXiv:2509.09152, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[51]

David Thissen, Lynne Steinberg, and Daniel Kuang. Quick and easy implementation of the benjamini- hochberg procedure for controlling the false positive rate in multiple comparisons.Journal of educational and behavioral statistics, 27(1):77–83, 2002

2002
[52]

Justine Y Hansen, Golia Shafiei, Ross D Markello, Kelly Smart, Sylvia ML Cox, Martin Nørgaard, Vincent Beliveau, Yanjun Wu, Jean-Dominique Gallezot, Étienne Aumont, et al. Mapping neurotransmitter systems to the structural and functional organization of the human neocortex. Nature Neuroscience, 25(11):1569–1581, 2022
[53]

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. Qwen3 Embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176, 2025
[54]

Nikolaus Kriegeskorte, Marieke Mur, and Peter A Bandettini. Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2:249, 2008
[55]

Carolina Feher da Silva, Gaia Lombardi, Micah Edelson, and Todd A Hare. Rethinking model-based and model-free influences on mental effort and striatal prediction errors. Nature Human Behaviour, 7(6):956–969, 2023
[56]

Wouter Kool, Fiery A Cushman, and Samuel J Gershman. When does model-based control pay off? PLoS Computational Biology, 12(8):e1005090, 2016
[57]

Wouter Kool, Samuel J Gershman, and Fiery A Cushman. Cost-benefit arbitration between multiple reinforcement-learning systems. Psychological Science, 28(9):1321–1333, 2017
[58]

Marcel Binz, Elif Akata, Matthias Bethge, Franziska Brändle, Fred Callaway, Julian Coda-Forno, Peter Dayan, Can Demircan, Maria K Eckstein, Noémi Éltető, et al. A foundation model to predict and capture human cognition. Nature, 644(8078):1002–1009, 2025
[59]

John P O’Doherty, Alan Hampton, and Hackjin Kim. Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104(1):35–53, 2007
[60]

Jan Gläscher, Nathaniel Daw, Peter Dayan, and John P O’Doherty. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66(4):585–595, 2010
[61]

Nathaniel D Daw, Samuel J Gershman, Ben Seymour, Peter Dayan, and Raymond J Dolan. Model-based influences on humans’ choices and striatal prediction errors. Neuron, 69(6):1204–1215, 2011
[62]

Vencislav Popov and Hannah Dames. Intent matters: Resolving the intentional versus incidental learning paradox in episodic long-term memory. Journal of Experimental Psychology: General, 152(1):268, 2023
[63]

Benjamin E Hilbig and Morten Moshagen. Generalized outcome-based strategy classification: Comparing deterministic and probabilistic choice models. Psychonomic Bulletin & Review, 21(6):1431–1443, 2014
[64]

Stephen P Badham, Adam N Sanborn, and Elizabeth A Maylor. Deficits in category learning in older adults: Rule-based versus clustering accounts. Psychology and Aging, 32(5):473, 2017
[65]

Kai Ruggeri, Amma Panin, Milica Vdovic, Bojana Većkalov, Nazeer Abdul-Salaam, Jascha Achterberg, Carla Akil, Jolly Amatya, Kanchan Amatya, Thomas Lind Andersen, et al. The globalizability of temporal discounting. Nature Human Behaviour, 6(10):1386–1397, 2022
[66]

B Bahrami and J Navajas. arm bandit task dataset (2022). URL osf.io/f3t2a, 4
[67]

Hirotugu Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974
[68]

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38, 2023
[69]

Isabel O Gallegos, Ryan A Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, and Nesreen K Ahmed. Bias and fairness in large language models: A survey. Computational Linguistics, 50(3):1097–1179, 2024
[70]

Yueqi Xie, Jingwei Yi, Jiawei Shao, Justin Curl, Lingjuan Lyu, Qifeng Chen, Xing Xie, and Fangzhao Wu. Defending ChatGPT against jailbreak attack via self-reminders. Nature Machine Intelligence, 5(12):1486–1496, 2023
[71]

Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, Dyllan Han, and Dan Jurafsky. Sycophantic AI decreases prosocial intentions and promotes dependence. Science, 391(6792):eaec8352, 2026
[72]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017
[73]

Russell A. Poldrack, Aniket Kittur, Donald Kalar, Eric Miller, Christian Seppa, Yolanda Gil, Douglas S. Parker, Fred W. Sabb, and Robert M. Bilder. The cognitive atlas: Toward a knowledge foundation for cognitive neuroscience. Frontiers in Neuroinformatics, 5:17, 2011
[74]

Tal Yarkoni, Russell A. Poldrack, Thomas E. Nichols, David C. Van Essen, and Tor D. Wager. Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8:665–670, 2011
[75]

Vencislav Popov and Hannah Dames. Intent matters: Resolving the intentional versus incidental learning paradox in episodic long-term memory. Journal of Experimental Psychology: General, 2021
[76]

Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. A survey on multimodal large language models. National Science Review, 11(12):nwae403, 2024
[77]

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in Neural Information Processing Systems, 36:34892–34916, 2023
[78]

Xinlong Wang, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze Wang, Xiaosong Zhang, Zhengxiong Luo, Quan Sun, Zhen Li, Yuqi Wang, et al. Multimodal learning with next-token prediction for large multimodal models. Nature, pages 1–7, 2026
[79]

International Brain Laboratory, Dora Angelaki, Brandon Benson, Julius Benson, Daniel Birman, et al. A brain-wide map of neural activity during complex behaviour. Nature, 645:177–191, 2025
[80]

Matthew D. Greaves, Leonardo Novelli, Sina Mansour L., Andrew Zalesky, and Adeel Razi. Structurally informed models of directed brain connectivity. Nature Reviews Neuroscience, 26:23–41, 2025

[47] [47]

A natural language fmri dataset for voxelwise encoding models.Scientific Data, 10(1):555, 2023

Amanda LeBel, Lauren Wagner, Shailee Jain, Aneesh Adhikari-Desai, Bhavin Gupta, Allyson Morgenthal, Jerry Tang, Lixiang Xu, and Alexander G Huth. A natural language fmri dataset for voxelwise encoding models.Scientific Data, 10(1):555, 2023

2023

[48] [48]

Toward a universal decoder of linguistic meaning from brain activation.Nature communications, 9(1):963, 2018

Francisco Pereira, Bin Lou, Brianna Pritchett, Samuel Ritter, Samuel J Gershman, Nancy Kanwisher, Matthew Botvinick, and Evelina Fedorenko. Toward a universal decoder of linguistic meaning from brain activation.Nature communications, 9(1):963, 2018

2018

[49] [49]

Bert: Pre-training of deep bidi- rectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidi- rectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

2019

[50] [50]

LITcoder: A General-Purpose Library for Building and Comparing Encoding Models

Taha Binhuraib, Ruimin Gao, and Anna A Ivanova. Litcoder: A general-purpose library for building and comparing encoding models.arXiv preprint arXiv:2509.09152, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[51] [51]

David Thissen, Lynne Steinberg, and Daniel Kuang. Quick and easy implementation of the benjamini- hochberg procedure for controlling the false positive rate in multiple comparisons.Journal of educational and behavioral statistics, 27(1):77–83, 2002

2002

[52] [52]

Mapping neurotransmitter systems to the structural and functional organization of the human neocortex.Nature neuroscience, 25(11):1569–1581, 2022

Justine Y Hansen, Golia Shafiei, Ross D Markello, Kelly Smart, Sylvia ML Cox, Martin Nørgaard, Vincent Beliveau, Yanjun Wu, Jean-Dominique Gallezot, Étienne Aumont, et al. Mapping neurotransmitter systems to the structural and functional organization of the human neocortex.Nature neuroscience, 25(11):1569–1581, 2022

2022

[53] [53]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. Qwen3 embedding: Advancing text embedding and reranking through foundation models.arXiv preprint arXiv:2506.05176, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[54] [54]

Representational similarity analysis- connecting the branches of systems neuroscience.Frontiers in systems neuroscience, 2:249, 2008

Nikolaus Kriegeskorte, Marieke Mur, and Peter A Bandettini. Representational similarity analysis- connecting the branches of systems neuroscience.Frontiers in systems neuroscience, 2:249, 2008

2008

[55] [55]

Rethinking model-based and model-free influences on mental effort and striatal prediction errors.Nature Human Behaviour, 7(6):956–969, 2023

Carolina Feher da Silva, Gaia Lombardi, Micah Edelson, and Todd A Hare. Rethinking model-based and model-free influences on mental effort and striatal prediction errors.Nature Human Behaviour, 7(6):956–969, 2023

2023

[56] [56]

When does model-based control pay off? PLoS computational biology, 12(8):e1005090, 2016

Wouter Kool, Fiery A Cushman, and Samuel J Gershman. When does model-based control pay off? PLoS computational biology, 12(8):e1005090, 2016

2016

[57] [57]

Cost-benefit arbitration between multiple reinforcement-learning systems.Psychological science, 28(9):1321–1333, 2017

Wouter Kool, Samuel J Gershman, and Fiery A Cushman. Cost-benefit arbitration between multiple reinforcement-learning systems.Psychological science, 28(9):1321–1333, 2017

2017

[58] [58]

A foundation model to predict and capture human cognition.Nature, 644(8078):1002–1009, 2025

Marcel Binz, Elif Akata, Matthias Bethge, Franziska Brändle, Fred Callaway, Julian Coda-Forno, Peter Dayan, Can Demircan, Maria K Eckstein, Noémi Éltető, et al. A foundation model to predict and capture human cognition.Nature, 644(8078):1002–1009, 2025. 30 NeuroCogMap Reveals Cognitive Organization of Large Language Models

2025

[59] [59]

Model-based fmri and its application to reward learning and decision making.Annals of the New York Academy of sciences, 1104(1):35–53, 2007

JOHN P O’DOHERTY, Alan Hampton, and Hackjin Kim. Model-based fmri and its application to reward learning and decision making.Annals of the New York Academy of sciences, 1104(1):35–53, 2007

2007

[60] [60]

States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.Neuron, 66(4):585–595, 2010

Jan Gläscher, Nathaniel Daw, Peter Dayan, and John P O’Doherty. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.Neuron, 66(4):585–595, 2010

2010

[61] [61]

Model-based influences on humans’ choices and striatal prediction errors.Neuron, 69(6):1204–1215, 2011

Nathaniel D Daw, Samuel J Gershman, Ben Seymour, Peter Dayan, and Raymond J Dolan. Model-based influences on humans’ choices and striatal prediction errors.Neuron, 69(6):1204–1215, 2011

2011

[62] [62]

Intent matters: Resolving the intentional versus incidental learning paradox in episodic long-term memory.Journal of Experimental Psychology: General, 152(1):268, 2023

Vencislav Popov and Hannah Dames. Intent matters: Resolving the intentional versus incidental learning paradox in episodic long-term memory.Journal of Experimental Psychology: General, 152(1):268, 2023

2023

[63] [63]

Generalized outcome-based strategy classification: Comparing deterministic and probabilistic choice models.Psychonomic bulletin & review, 21(6):1431–1443, 2014

Benjamin E Hilbig and Morten Moshagen. Generalized outcome-based strategy classification: Comparing deterministic and probabilistic choice models.Psychonomic bulletin & review, 21(6):1431–1443, 2014

2014

[64] [64]

Deficits in category learning in older adults: Rule-based versus clustering accounts.Psychology and Aging, 32(5):473, 2017

Stephen P Badham, Adam N Sanborn, and Elizabeth A Maylor. Deficits in category learning in older adults: Rule-based versus clustering accounts.Psychology and Aging, 32(5):473, 2017

2017

[65]

Kai Ruggeri, Amma Panin, Milica Vdovic, Bojana Većkalov, Nazeer Abdul-Salaam, Jascha Achterberg, Carla Akil, Jolly Amatya, Kanchan Amatya, Thomas Lind Andersen, et al. The globalizability of temporal discounting. Nature Human Behaviour, 6(10):1386–1397, 2022

[66]

B Bahrami and J Navajas. 4 arm bandit task dataset (2022). URL osf.io/f3t2a

[67]

Hirotugu Akaike. A new look at the statistical model identification. IEEE transactions on automatic control, 19(6):716–723, 1974

[68]

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation. ACM computing surveys, 55(12):1–38, 2023

[69]

Isabel O Gallegos, Ryan A Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, and Nesreen K Ahmed. Bias and fairness in large language models: A survey. Computational linguistics, 50(3):1097–1179, 2024

[70]

Yueqi Xie, Jingwei Yi, Jiawei Shao, Justin Curl, Lingjuan Lyu, Qifeng Chen, Xing Xie, and Fangzhao Wu. Defending chatgpt against jailbreak attack via self-reminders. Nature Machine Intelligence, 5(12):1486–1496, 2023

[71]

Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, Dyllan Han, and Dan Jurafsky. Sycophantic ai decreases prosocial intentions and promotes dependence. Science, 391(6792):eaec8352, 2026

[72]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017

[73]

Russell A. Poldrack, Aniket Kittur, Donald Kalar, Eric Miller, Christian Seppa, Yolanda Gil, Douglas S. Parker, Fred W. Sabb, and Robert M. Bilder. The cognitive atlas: Toward a knowledge foundation for cognitive neuroscience. Frontiers in Neuroinformatics, 5:17, 2011

[74]

Tal Yarkoni, Russell A. Poldrack, Thomas E. Nichols, David C. Van Essen, and Tor D. Wager. Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8:665–670, 2011

[75]

Vencislav Popov and Hannah Dames. Intent matters: Resolving the intentional versus incidental learning paradox in episodic long-term memory. Journal of experimental psychology. General, 2021

[76]

Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. A survey on multimodal large language models. National Science Review, 11(12):nwae403, 2024

[77]

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in neural information processing systems, 36:34892–34916, 2023

[78]

Xinlong Wang, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze Wang, Xiaosong Zhang, Zhengxiong Luo, Quan Sun, Zhen Li, Yuqi Wang, et al. Multimodal learning with next-token prediction for large multimodal models. Nature, pages 1–7, 2026

[79]

International Brain Laboratory, Dora Angelaki, Brandon Benson, Julius Benson, Daniel Birman, et al. A brain-wide map of neural activity during complex behaviour. Nature, 645:177–191, 2025

[80]

Matthew D. Greaves, Leonardo Novelli, Sina Mansour L., Andrew Zalesky, and Adeel Razi. Structurally informed models of directed brain connectivity. Nature Reviews Neuroscience, 26:23–41, 2025