Artificial Phantasia: Emergent Mental Imagery in Large Language Models
Pith reviewed 2026-05-21 21:37 UTC · model grok-4.3
The pith
Large language models outperform humans on tasks that require imagining compositional letter and shape transformations, pointing to an emergent form of mental imagery that relies on language rather than pictures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The best LLMs achieved markedly higher accuracy than human participants on novel items that require mentally composing letter and shape transformations, with statistical significance at p < .0001. Accuracy improved when reasoning models were given more tokens for step-by-step linguistic manipulation. These results support the existence of an emergent, non-pictorial mental imagery capacity in LLMs that can be driven entirely by language.
What carries the argument
An extended version of a classic mental-imagery task in which subjects must imagine successive compositional transformations of letters and shapes and then identify the resulting figure.
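To make the task concrete, the sketch below shows one way a compositional transformation could be carried out symbolically instead of pictorially. It is an illustration only, not the paper's stimuli or method: the coarse stroke encoding, the helper names `rotate_ccw_90` and `translate`, and the hypothetical D-and-J item are all assumptions introduced here.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Stroke = Tuple[str, List[Point]]  # (stroke kind, control points on a grid)


def rotate_ccw_90(strokes: List[Stroke]) -> List[Stroke]:
    """Rotate every point 90 degrees counterclockwise about the origin."""
    return [(kind, [(-y, x) for x, y in pts]) for kind, pts in strokes]


def translate(strokes: List[Stroke], dx: int, dy: int) -> List[Stroke]:
    """Shift every point by (dx, dy)."""
    return [(kind, [(x + dx, y + dy) for x, y in pts]) for kind, pts in strokes]


# Hypothetical coarse encodings (assumptions, not the paper's items).
# Capital D: a vertical bar with a bulge to the right.
LETTER_D: List[Stroke] = [
    ("line", [(0, 0), (0, 2)]),
    ("arc", [(0, 0), (1, 1), (0, 2)]),
]
# Capital J: a vertical stem with a hook at the bottom left.
LETTER_J: List[Stroke] = [
    ("line", [(0, 0), (0, -2)]),
    ("arc", [(0, -2), (-1, -3), (-2, -2)]),
]

# "Rotate the D a quarter turn to the left, then set it on top of the J."
dome = translate(rotate_ccw_90(LETTER_D), 1, 0)  # flat side down, centered on x = 0
figure = dome + LETTER_J                         # dome over a hooked stem
```

Each instruction becomes an operation on a list of labeled strokes, so the final figure is derived without rendering any image. Whether LLMs do anything comparable internally is exactly the open question the review raises.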
If this is right
- Language alone can be sufficient for solving tasks previously assumed to need pictorial imagery.
- Longer reasoning chains improve performance, showing a direct linguistic contribution to the imagery-like behavior.
- LLMs may possess an emergent cognitive capacity that functions without internal pictures.
- Traditional debates about whether mental imagery must be pictorial are reopened by the existence of this non-pictorial alternative.
Where Pith is reading between the lines
- If language-based operations can mimic visual imagery results, then some human imagery tasks might also be solved propositionally under certain conditions.
- The finding invites experiments that isolate whether LLMs are truly simulating spatial relations or simply exploiting statistical patterns in textual descriptions of shapes.
- Designers of future AI systems could deliberately train or prompt models to use extended linguistic chains for tasks that currently rely on vision modules.
Load-bearing premise
The paper assumes that the chosen transformation tasks, as classically argued, cannot be solved by language-based reasoning alone and instead require pictorial mental representations.
What would settle it
A controlled version of the same tasks in which language-based shortcuts are removed or blocked and LLM performance drops to or below human levels.
Original abstract
Can visual imagery be driven solely by language? This idea goes against cognitive science's traditional view that visual mental imagery is only possible through pictorial representations. Large Language Models (LLMs) provide nascent evidence not only that visual mental imagery via propositional-representations is possible, but that it can be more robust than human imagination. We created dozens of novel items for an extension to a classic task which is argued to be solvable exclusively via pictorial representations (i.e., language alone would be insufficient). Subjects were asked to imagine a series of compositional letter and shape transformations and identify the resultant "image". We found that the best LLMs performed significantly better than humans ($n = 100$ human participants, $p < .0001$), indicating the existence of an artificial phantasia, or emergent "visual" mental imagery that may not be pictorial. Furthermore, we tested reasoning models with variable reasoning-token allocation and found that models perform best with longer reasoning chains, demonstrating a linguistic impact on the task -- language alone may be sufficient. We examined three emergent imagery hypotheses: pure propositional imagery, propositional imagery with visio-linguistic priors, or pictorial visual imagery (classical visual imagery). Our study not only presents evidence for a previously unreported emergent cognitive capacity of LLMs, but also reignites debate on the requirement for a pictorial format in mental imagery.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that LLMs exhibit an emergent capacity for 'visual' mental imagery, termed artificial phantasia, which may operate via propositional rather than pictorial representations. This is supported by superior LLM performance over 100 human participants (p < .0001) on novel compositional letter/shape transformation tasks argued to require pictorial representations, with further evidence that longer reasoning chains improve results, challenging the necessity of pictorial formats in mental imagery.
Significance. If substantiated, the result would provide empirical grounds to revisit the long-standing assumption in cognitive science that visual mental imagery necessitates pictorial representations, while highlighting LLMs' capacity for robust task performance through language alone. The inclusion of variable reasoning-token tests and explicit hypothesis examination (propositional, visio-linguistic, or pictorial) adds concrete data, though the interpretation depends on validating the task premise.
major comments (3)
- [Abstract and Methods] The interpretation of LLM superiority as evidence for non-pictorial phantasia rests on the premise (Abstract) that the novel items 'are argued to be solvable exclusively via pictorial representations (i.e., language alone would be insufficient)'. No control experiments, verbal-strategy probes, or analysis demonstrating that propositional decomposition cannot reliably solve the items are reported, leaving open the possibility that humans underperform for reasons unrelated to imagery format (e.g., working memory or instruction compliance).
- [Results] The central performance claim (LLMs significantly better than humans, p < .0001) lacks reported details on exact task items, prompting protocols, controls for stochastic LLM output, or full statistical reporting (e.g., effect sizes, per-item breakdowns), which are required to evaluate whether the result is robust or sensitive to implementation choices.
- [Discussion] The three emergent imagery hypotheses (pure propositional, propositional with visio-linguistic priors, pictorial) are examined but without specific ablations or tests that would distinguish them; superior performance plus reasoning-length effects alone do not yet adjudicate between propositional sufficiency and any imagery format.
minor comments (2)
- [Introduction] Define 'artificial phantasia' with a concise operational contrast to classical pictorial imagery in the introduction to prevent terminological overlap.
- [Figures] Add error bars or confidence intervals to all performance figures and ensure legends clearly distinguish human vs. LLM conditions.
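The request for confidence intervals on accuracy figures can be met with a standard binomial interval. The following is a minimal sketch, assuming each condition is summarized as a count of correct responses out of a total; the function name `wilson_interval` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% coverage.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = correct / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical counts for illustration only.
low, high = wilson_interval(correct=50, total=100)
```

The Wilson interval is used here instead of the simpler normal approximation because it stays inside [0, 1] and behaves sensibly when accuracy is near 0 or 1, which can happen in per-item breakdowns.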
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which has identified important areas for clarification and strengthening in our manuscript. We respond to each major comment below and indicate the revisions we will make.
- Referee: [Abstract and Methods] The interpretation of LLM superiority as evidence for non-pictorial phantasia rests on the premise (Abstract) that the novel items 'are argued to be solvable exclusively via pictorial representations (i.e., language alone would be insufficient)'. No control experiments, verbal-strategy probes, or analysis demonstrating that propositional decomposition cannot reliably solve the items are reported, leaving open the possibility that humans underperform for reasons unrelated to imagery format (e.g., working memory or instruction compliance).
  Authors: We accept this critique as valid. The task premise draws from established cognitive science literature on mental imagery transformations (e.g., mental rotation studies), but we did not include dedicated verbal-strategy probes or controls to isolate propositional solvability. In revision we will expand the Methods and Discussion sections with explicit references to the supporting literature, add a dedicated limitations paragraph addressing alternative explanations such as working memory load and instruction compliance, and note that future work could incorporate verbal probes. This constitutes a partial revision because new empirical controls cannot be added retroactively without additional data collection.
  Revision: partial
- Referee: [Results] The central performance claim (LLMs significantly better than humans, p < .0001) lacks reported details on exact task items, prompting protocols, controls for stochastic LLM output, or full statistical reporting (e.g., effect sizes, per-item breakdowns), which are required to evaluate whether the result is robust or sensitive to implementation choices.
  Authors: We agree that greater transparency is required. The revised manuscript will include the complete set of task items and examples in the supplementary materials, a detailed account of prompting protocols (including system prompts and temperature settings), and explicit controls for stochasticity (multiple runs with fixed seeds where applicable). We will also report effect sizes, confidence intervals, and per-item accuracy breakdowns alongside the existing p-value to allow full assessment of robustness.
  Revision: yes
- Referee: [Discussion] The three emergent imagery hypotheses (pure propositional, propositional with visio-linguistic priors, pictorial) are examined but without specific ablations or tests that would distinguish them; superior performance plus reasoning-length effects alone do not yet adjudicate between propositional sufficiency and any imagery format.
  Authors: We acknowledge that the current evidence, while suggestive, does not fully adjudicate among the three hypotheses. The observed benefit of longer reasoning chains supports propositional sufficiency but cannot rule out contributions from visio-linguistic priors or latent pictorial mechanisms. In the revised Discussion we will more explicitly delineate these limitations, clarify that our primary claim concerns the sufficiency of language-based processing, and outline targeted future experiments (e.g., visual-priming ablations and non-visual control tasks) that could distinguish the hypotheses. This will be a partial revision focused on interpretive framing rather than new empirical tests.
  Revision: partial
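The rebuttal's promise of repeated runs and effect sizes can be sketched as follows. This is a minimal illustration, assuming accuracy is recorded once per LLM run and once for the pooled human sample; the helper names `summarize_runs` and `cohens_h` and all numeric values are hypothetical placeholders, not results from the paper.

```python
import math
from statistics import mean, stdev


def summarize_runs(run_accuracies):
    """Mean and sample standard deviation of accuracy across repeated runs."""
    return mean(run_accuracies), stdev(run_accuracies)


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Hypothetical placeholder values for illustration only.
llm_runs = [0.70, 0.65, 0.75]   # accuracy from three repeated runs
human_accuracy = 0.50           # pooled human accuracy

llm_mean, llm_sd = summarize_runs(llm_runs)
effect = cohens_h(llm_mean, human_accuracy)
```

Reporting the spread across runs alongside the effect size lets a reader judge whether a small p-value reflects a large, stable gap or a modest one amplified by sample size.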
Circularity Check
No circularity; central claim rests on direct empirical comparison
The paper's derivation consists of creating novel task items, administering them to LLMs and 100 human participants, and reporting a statistically significant performance advantage for the best LLMs (p < .0001). This result is obtained from experimental data rather than any equation, fitted parameter, or self-citation that reduces the outcome to its own inputs by construction. The interpretation linking superior LLM performance to non-pictorial 'phantasia' draws on the classic task's prior literature and the observed benefit of longer reasoning chains, but introduces no self-definitional loop, fitted-input prediction, or load-bearing self-citation chain. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The classic task is solvable exclusively via pictorial representations, and language alone would be insufficient.
invented entities (1)
- artificial phantasia: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We found that the best LLMs performed significantly better than humans (n = 100 human participants, p < .0001), indicating the existence of an artificial phantasia, or emergent 'visual' mental imagery that may not be pictorial."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Bainbridge, Zoë Pounder, Alison F
Wilma A. Bainbridge, Zoë Pounder, Alison F. Eardley, and Chris I. Baker. Quantifying aphantasia through drawing: Those without visual imagery show deficits in object but not spatial memory. Cortex, 135: 0 159--172, 2021. ISSN 0010-9452. doi:10.1016/j.cortex.2020.11.014
-
[2]
Tell me about yourself: Llms are aware of their learned behaviors, 2025
Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, and Owain Evans. Tell me about yourself: Llms are aware of their learned behaviors, 2025. URL https://arxiv.org/abs/2501.11120
-
[3]
Eric J. Bigelow, John P. McCoy, and Tomer D. Ullman. Non-commitment in mental imagery. Cognition, 238: 0 105498, 2023
work page 2023
-
[4]
The Border between Seeing and Thinking
Ned Block. The Border between Seeing and Thinking. Oxford University Press, 2023
work page 2023
-
[5]
Aphantasia: In search of a theory
Andrea Blomkvist. Aphantasia: In search of a theory. Mind & Language, 38 0 (3): 0 866--888, 2023. doi:https://doi.org/10.1111/mila.12432. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/mila.12432
-
[6]
The key of the maze: The role of mental imagery and cognitive flexibility in navigational planning
Alessia Bocchi, Marika Carrieri, Stefania Lancia, Valentina Quaresima, and Laura Piccardi. The key of the maze: The role of mental imagery and cognitive flexibility in navigational planning. Neuroscience Letters, 651: 0 146--150, 2017. ISSN 0304-3940. doi:https://doi.org/10.1016/j.neulet.2017.05.009. URL https://www.sciencedirect.com/science/article/pii/S...
-
[7]
Smith, Yejin Choi, and Hannaneh Hajishirzi
Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, and Hannaneh Hajishirzi. The art of saying no: Contextual noncompliance in language models. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. P...
work page 2024
-
[8]
Remembering the past and imagining the future: a neural model of spatial memory and imagery
Patrick Byrne, Suzanna Becker, and Neil Burgess. Remembering the past and imagining the future: a neural model of spatial memory and imagery. Psychological review, 114 0 (2): 0 340, 2007
work page 2007
-
[9]
Zirui Chen and Michael F. Bonner. Universal dimensions of visual representation. Science Advances, 11 0 (27): 0 eadw7697, 2025. doi:10.1126/sciadv.adw7697. URL https://www.science.org/doi/abs/10.1126/sciadv.adw7697
-
[10]
Arc prize 2024: Technical report, 2025
Francois Chollet, Mike Knoop, Gregory Kamradt, and Bryan Landers. Arc prize 2024: Technical report, 2025. URL https://arxiv.org/abs/2412.04604
-
[11]
C.J. Dance, A. Ipser, and J. Simner. The prevalence of aphantasia (imagery weakness) in the general population. Consciousness and Cognition, 97: 0 103243, 2022. ISSN 1053-8100. doi:https://doi.org/10.1016/j.concog.2021.103243. URL https://www.sciencedirect.com/science/article/pii/S1053810021001690
-
[12]
Dawes, Rebecca Keogh, Sarah Robuck, and Joel Pearson
Alexei J. Dawes, Rebecca Keogh, Sarah Robuck, and Joel Pearson. Memories with a blind mind: Remembering the past and imagining the future with aphantasia. Cognition, 227: 0 105192, 2022. ISSN 0010-0277. doi:10.1016/j.cognition.2022.105192
-
[13]
Using large language models in psychology
Dorottya Demszky, Diyi Yang, David S Yeager, Christopher J Bryan, Margarett Clapper, Susannah Chandhok, Johannes C Eichstaedt, Cameron Hecht, Jeremy Jamieson, Meghann Johnson, et al. Using large language models in psychology. Nature Reviews Psychology, 2 0 (11): 0 688--701, 2023
work page 2023
-
[14]
Shared neural mechanisms of visual perception and imagery
Nadine Dijkstra, Sander E Bosch, and Marcel AJ van Gerven. Shared neural mechanisms of visual perception and imagery. Trends in cognitive sciences, 23 0 (5): 0 423--434, 2019
work page 2019
-
[15]
A Survey on In-context Learning
Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. A survey on in-context learning, 2024. URL https://arxiv.org/abs/2301.00234
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[16]
Is visual imagery really visual? overlooked evidence from neuropsychology
Martha J Farah. Is visual imagery really visual? overlooked evidence from neuropsychology. Psychological review, 95 0 (3): 0 307, 1988
work page 1988
-
[17]
Jeanne Farrington. Seven plus or minus two. Performance Improvement Quarterly, 23 0 (4): 0 113--116, 2011. doi:https://doi.org/10.1002/piq.20099. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/piq.20099
-
[18]
Conflicting intuitions may be based on differing abilities: Evidence from mental imaging research
Bill Faw. Conflicting intuitions may be based on differing abilities: Evidence from mental imaging research. Journal of Consciousness Studies, 16: 0 45--68, 01 2009
work page 2009
-
[19]
Creative Imagery: Discoveries and Inventions in Visualization
Ronald Finke. Creative Imagery: Discoveries and Inventions in Visualization. Psychology Press, 1990
work page 1990
-
[20]
Reinterpreting visual patterns in mental imagery
Ronald A Finke, Steven Pinker, and Martha J Farah. Reinterpreting visual patterns in mental imagery. Cognitive Science, 13 0 (1): 0 51--78, 1989
work page 1989
-
[21]
Michael C. Frank and Noah D. Goodman. Cognitive modeling using artificial intelligence. Annual Review of Psychology, 2025. ISSN 0066-4308. doi:https://doi.org/10.1146/annurev-psych-030625-040748. URL https://www.annualreviews.org/content/journals/10.1146/annurev-psych-030625-040748
-
[22]
ImageBind : One embedding space to bind them all
Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. ImageBind : One embedding space to bind them all. arXiv , 2023. doi:10.48550/arxiv.2305.05665
-
[23]
Measuring Massive Multitask Language Understanding
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding, 2021. URL https://arxiv.org/abs/2009.03300
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[24]
Mental imagery in emotion and emotional disorders
Emily A Holmes and Andrew Mathews. Mental imagery in emotion and emotional disorders. Clinical psychology review, 30 0 (3): 0 349--362, 2010
work page 2010
-
[25]
T2i-compbench: A comprehensive benchmark for open-world compositional text-to-image generation
Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, and Xihui Liu. T2i-compbench: A comprehensive benchmark for open-world compositional text-to-image generation. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (eds.), Advances in Neural Information Processing Systems, volume 36, pp.\ 78723--78747. Curran Associates, Inc., 2023. URL http...
work page 2023
-
[26]
Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H
Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, Vivian Paulun, Maria Ryskina, Ekin Akyürek, Ethan Wilcox, Nafisa Rashid, Leshem Choshen, Roger Levy, Evelina Fedorenko, Joshua Tenenbaum, and Jacob Andreas. Elements of world knowledge (ewok): A cognition-...
-
[27]
Language Models (Mostly) Know What They Know
Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec,...
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[28]
Lachlan Kay, Rebecca Keogh, and Joel Pearson. Slower but more accurate mental rotation performance in aphantasia linked to differences in cognitive strategies. Consciousness and Cognition, 121: 0 103694, 2024. ISSN 1053-8100. doi:https://doi.org/10.1016/j.concog.2024.103694. URL https://www.sciencedirect.com/science/article/pii/S1053810024000618
-
[29]
Visual working memory in aphantasia: Retained accuracy and capacity with a different strategy
Rebecca Keogh, Marcus Wicken, and Joel Pearson. Visual working memory in aphantasia: Retained accuracy and capacity with a different strategy. Cortex, 143: 0 237--253, 2021. ISSN 0010-9452. doi:https://doi.org/10.1016/j.cortex.2021.07.012. URL https://www.sciencedirect.com/science/article/pii/S0010945221002628
-
[30]
Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, and Shafiq Joty. xcodeeval: A large scale multilingual multitask benchmark for code understanding, generation, translation and retrieval, 2023. URL https://arxiv.org/abs/2303.03004
-
[31]
Learning image embeddings using convolutional neural networks for improved multi-modal semantics
Douwe Kiela and L \'e on Bottou. Learning image embeddings using convolutional neural networks for improved multi-modal semantics. In Alessandro Moschitti, Bo Pang, and Walter Daelemans (eds.), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ( EMNLP ) , pp.\ 36--45, Doha, Qatar, October 2014. Association for Computat...
-
[32]
Mule: Multimodal universal language embedding
Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, and Bryan Plummer. Mule: Multimodal universal language embedding. Proceedings of the AAAI Conference on Artificial Intelligence, 34 0 (07): 0 11254--11261, Apr. 2020. doi:10.1609/aaai.v34i07.6785. URL https://ojs.aaai.org/index.php/AAAI/article/view/6785
-
[33]
Najoung Kim and Tal Linzen. COGS : A compositional generalization challenge based on semantic interpretation. In Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.\ 9087--9105, Online, November 2020. Association for Computational Linguistics. doi:10...
-
[34]
The N arrative QA reading comprehension challenge
Tom \'a s Ko c isk \'y , Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, G \'a bor Melis, and Edward Grefenstette. The N arrative QA reading comprehension challenge. Transactions of the Association for Computational Linguistics, 6: 0 317--328, 2018. doi:10.1162/tacl_a_00023. URL https://aclanthology.org/Q18-1023/
-
[35]
Image and brain: The resolution of the imagery debate
Stephen M Kosslyn. Image and brain: The resolution of the imagery debate. MIT Press, 1996
work page 1996
-
[36]
Scanning visual images: Some structural implications
Stephen Michael Kosslyn. Scanning visual images: Some structural implications. Perception & Psychophysics, 14 0 (1): 0 90--94, 1973
work page 1973
-
[37]
Looking at mental images: Eye-tracking mental simulation during retrospective causal judgment
Kristina Krasich, Kevin O'Neill, and Felipe De Brigard. Looking at mental images: Eye-tracking mental simulation during retrospective causal judgment. Cognitive Science, 48 0 (3): 0 e13426, 2024. doi:https://doi.org/10.1111/cogs.13426. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/cogs.13426
-
[38]
Phantasia, aphantasia, and hyperphantasia: Empirical data and conceptual considerations
AJ Larner, AP Leff, and PC Nachev. Phantasia, aphantasia, and hyperphantasia: Empirical data and conceptual considerations. Neuroscience & Biobehavioral Reviews, 164: 0 105819, 2024. ISSN 0149-7634. doi:https://doi.org/10.1016/j.neubiorev.2024.105819. URL https://www.sciencedirect.com/science/article/pii/S0149763424002884
-
[39]
Alex Lawsen. Comment on the illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity, 2025. URL https://arxiv.org/abs/2506.09250
-
[40]
Revisiting the mental imagery debate: New evidence from aphantasia and neuroimaging, Sep 2025
Florent Lebon. Revisiting the mental imagery debate: New evidence from aphantasia and neuroimaging, Sep 2025. URL osf.io/preprints/psyarxiv/cfh85_v1
work page 2025
-
[41]
Cognitively inspired interpretability in large neural networks
Anna Leshinskaya, Taylor Webb, Ellie Pavlick, Jiahai Feng, Gustaw Opielka, Claire Stevenson, and Idan A Blank. Cognitively inspired interpretability in large neural networks. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 47, 2025
work page 2025
-
[42]
Quantifying ai psychology: A psychometrics benchmark for large language models, 2024
Yuan Li, Yue Huang, Hongyi Wang, Xiangliang Zhang, James Zou, and Lichao Sun. Quantifying ai psychology: A psychometrics benchmark for large language models, 2024. URL https://arxiv.org/abs/2406.17675
-
[43]
Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, and Joshua B. Tenenbaum. Compositional visual generation with composable diffusion models. In Shai Avidan, Gabriel Brostow, Moustapha Ciss \'e , Giovanni Maria Farinella, and Tal Hassner (eds.), Computer Vision -- ECCV 2022, pp.\ 423--439, Cham, 2022. Springer Nature Switzerland. ISBN 978-3-031-19790-1
work page 2022
-
[44]
Wu, Ilia Sucholutsky, Tania Lombrozo, and Thomas L
Ryan Liu, Jiayi Geng, Addison J. Wu, Ilia Sucholutsky, Tania Lombrozo, and Thomas L. Griffiths. Mind your step (by step): Chain-of-thought can reduce performance on tasks where thinking makes humans worse, 2025. URL https://arxiv.org/abs/2410.21333
-
[45]
Learning by thinking in natural and artificial minds
Tania Lombrozo. Learning by thinking in natural and artificial minds. Trends in Cognitive Sciences, 28: 0 1011--1022, 2024
work page 2024
-
[46]
Joel J. Lorenzatti. Aphantasia: a philosophical approach. Philosophical Psychology, 38 0 (4): 0 1476--1504, 2025. doi:10.1080/09515089.2023.2253854. URL https://doi.org/10.1080/09515089.2023.2253854
-
[47]
Uncertainty estimation in autoregressive structured prediction, 2021
Andrey Malinin and Mark Gales. Uncertainty estimation in autoregressive structured prediction, 2021. URL https://arxiv.org/abs/2002.07650
-
[48]
David F. Marks. Visual imagery differences in the recall of pictures. British Journal of Psychology, 64 0 (1): 0 17--24, 1973. doi:https://doi.org/10.1111/j.2044-8295.1973.tb01322.x. URL https://bpspsychub.onlinelibrary.wiley.com/doi/abs/10.1111/j.2044-8295.1973.tb01322.x
-
[49]
RNNs Implicitly Implement Tensor Product Representations
R. Thomas McCoy, Tal Linzen, Ewan Dunbar, and Paul Smolensky. Rnns implicitly implement tensor product representations, 2019. URL https://arxiv.org/abs/1812.08718
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[50]
Sam Whitman McGrath, Jacob Russin, Ellie Pavlick, and Roman Feiman. How can deep neural networks inform theory in psychological science? Current Directions in Psychological Science, 33 0 (5): 0 325--333, 2024. doi:10.1177/09637214241268098. URL https://doi.org/10.1177/09637214241268098
-
[51]
Aphantasia as imagery blindsight
Matthias Michel, Jorge Morales, Ned Block, and Hakwan Lau. Aphantasia as imagery blindsight. Trends in Cognitive Sciences, 29 0 (1): 0 8--9, 2025. doi:10.1016/j.tics.2024.11.002
-
[53]
Bence Nanay. Unconscious mental imagery. Philosophical Transactions of the Royal Society B: Biological Sciences, 376 0 (1817): 0 20190689, 2021. doi:10.1098/rstb.2019.0689. URL https://royalsocietypublishing.org/doi/abs/10.1098/rstb.2019.0689
-
[54]
Thomas Naselaris, Cheryl A. Olman, Dustin E. Stansbury, Kamil Ugurbil, and Jack L. Gallant. A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes. NeuroImage, 105: 0 215--228, 2015. ISSN 1053-8119. doi:https://doi.org/10.1016/j.neuroimage.2014.10.018. URL https://www.sciencedirect.com/science/article/pii/S1053811914008428
-
[55]
Richard E. Nisbett and Timothy D. Wilson. Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84 0 (3): 0 231--259, 1977. doi:doi:10.1037/0033-295X.84.3.231
-
[56]
Individual differences in autobiographical memory
Daniela J Palombo, Signy Sheldon, and Brian Levine. Individual differences in autobiographical memory. Trends in Cognitive Sciences, 22 0 (7): 0 583--597, 2018
work page 2018
-
[57]
Mapping language models to grounded conceptual spaces
Roma Patel and Ellie Pavlick. Mapping language models to grounded conceptual spaces. In International Conference on Learning Representations, 2022. URL https://openreview.net/pdf?id=gJcEM8sxHK
work page 2022
-
[58]
Symbols and grounding in large language models
Ellie Pavlick. Symbols and grounding in large language models. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 381 0 (2251): 0 20220041, 2023. doi:10.1098/rsta.2022.0041. URL https://royalsocietypublishing.org/doi/abs/10.1098/rsta.2022.0041
-
[59]
Joel Pearson and Stephen M. Kosslyn. The heterogeneity of mental representation: Ending the imagery debate. Proceedings of the National Academy of Sciences, 112 0 (33): 0 10089--10092, 2015. doi:10.1073/pnas.1504933112. URL https://www.pnas.org/doi/abs/10.1073/pnas.1504933112
-
[60]
Mental imagery: functional mechanisms and clinical applications
Joel Pearson, Thomas Naselaris, Emily A Holmes, and Stephen M Kosslyn. Mental imagery: functional mechanisms and clinical applications. Trends in cognitive sciences, 19 0 (10): 0 590--602, 2015
work page 2015
-
[61]
Ian B. Phillips. Aphantasia reimagined. Noûs, pp.\ 1--25, 2025. doi:10.1111/nous.12551
-
[62]
Why concepts are (probably) vectors
Steven T Piantadosi, Dyana CY Muller, Joshua S Rule, Karthikeya Kaushik, Mark Gorenstein, Elena R Leib, and Emily Sanford. Why concepts are (probably) vectors. Trends in Cognitive Sciences, 28 0 (9): 0 844--856, 2024
work page 2024
-
[63]
Dillon Plunkett, Adam Morris, Keerthi Reddy, and Jorge Morales. Self-interpretability: Llms can describe complex internal processes that drive their decisions, and improve with training, 2025. URL https://arxiv.org/abs/2505.17120
-
[64]
Psychological Review 63(2), 81–97 (1956) https://doi.org/10.1037/h0043158
Zoë Pounder, Jane Jacob, Samuel Evans, Catherine Loveday, Alison F. Eardley, and Juha Silvanto. Only minimal differences between individuals with congenital aphantasia and those with typical imagery on neuropsychological tasks that involve imagery. Cortex, 148: 0 180--192, 2022. doi:10.1037/h0043158
-
[65]
Measuring and Narrowing the Compositionality Gap in Language Models
Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, and Mike Lewis. Measuring and narrowing the compositionality gap in language models, 2023. URL https://arxiv.org/abs/2210.03350
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[66]
What the mind's eye tells the mind's brain: A critique of mental imagery
Zenon W Pylyshyn. What the mind's eye tells the mind's brain: A critique of mental imagery. Psychological bulletin, 80 0 (1): 0 1, 1973
work page 1973
-
[67]
Zenon W. Pylyshyn. Mental imagery: In search of a theory. Behavioral and Brain Sciences, 25 0 (2): 0 157–182, 2002. doi:10.1017/S0140525X02000043
-
[68]
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021. URL https://arxiv.org/abs/2103.00020
-
[69]
Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Kraehenbuehl, and Vladlen Koltun. Does spatial cognition emerge in frontier models?, 2025. URL https://arxiv.org/abs/2410.06468
-
[70]
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R. Bowman. GPQA: A graduate-level Google-proof Q&A benchmark, 2023. URL https://arxiv.org/abs/2311.12022
-
[71]
Matthew Renze and Erhan Guven. The effect of sampling temperature on problem solving in large language models. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (eds.), Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 7346–7356, Miami, Florida, USA, November 2024. Association for Computational Linguistics. doi:10.18653/v1/20...
-
[72]
Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, and Eneko Agirre. NLP evaluation in trouble: On the need to measure LLM data contamination for each benchmark, 2023. URL https://arxiv.org/abs/2310.18018
-
[73]
Roger N. Shepard and Jacqueline Metzler. Mental rotation of three-dimensional objects. Science, 171(3972):701–703, 1971. doi:10.1126/science.171.3972.701. URL https://www.science.org/doi/abs/10.1126/science.171.3972.701
-
[74]
Richard Shiffrin and Melanie Mitchell. Probing the psychology of AI models. Proceedings of the National Academy of Sciences, 120(10):e2300963120, 2023. doi:10.1073/pnas.2300963120. URL https://www.pnas.org/doi/abs/10.1073/pnas.2300963120
-
[75]
Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar. The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity, 2025. URL https://arxiv.org/abs/2506.06941
-
[76]
Lu Teng. Conscious schematic imagery in aphantasia. Unpublished manuscript, 2025
-
[77]
Alan M Turing. I.—Computing machinery and intelligence. Mind, 59(236), 1950. ISSN 0026-4423
-
[78]
Yiming Wang, Ziyang Zhang, Hanwei Chen, and Huayi Shen. Reasoning with large language models on graph tasks: The influence of temperature. In 2024 5th International Conference on Computer Engineering and Application (ICCEA), pp. 630–634, 2024a. doi:10.1109/ICCEA62105.2024.10603677
-
[79]
Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, and Wenhu Chen. MMLU-Pro: A more robust and challenging multi-task language understanding benchmark. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J...
-
[80]
Mark E. Wheeler, Steven E. Petersen, and Randy L. Buckner. Memory's echo: Vivid remembering reactivates sensory-specific cortex. Proceedings of the National Academy of Sciences, 97(20):11125–11129, 2000. doi:10.1073/pnas.97.20.11125. URL https://www.pnas.org/doi/abs/10.1073/pnas.97.20.11125
-
[81]
David J. Wright, Matthew W. Scott, Sarah N. Kraeutner, Pamela Barhoun, Maurizio Bertollo, Mark J. Campbell, Baptiste M. Waltzing, Stephan F. Dahm, Maaike Esselaar, Cornelia Frank, Robert M. Hardwick, Ian Fuelscher, Ben Marshall, Nicola J. Hodges, Christian Hyde, and Paul S. Holmes. An international estimate of the prevalence of differing visual imagery ab...