arxiv: 2605.18759 · v1 · pith:BIBLKYMPnew · submitted 2026-04-05 · 💻 cs.HC · cs.AI

Interoceptive Divergence in Aesthetic Evaluation and Implications for Human-AI Alignment

Pith reviewed 2026-05-21 10:42 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords aesthetic evaluation · large language models · interoception · bodily sensations · AI alignment · human-AI comparison · emotional responses

The pith

Large language models diverge from humans in how bodily sensations connect to beauty judgments

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether large language models match humans when evaluating the beauty of images by posing the same questions about beauty ratings, emotions, and bodily sensations. Humans and LLMs display similar links between beauty and emotions plus similar attention to image features. Clear differences appear in the range of emotional responses and especially in how beauty ratings relate to internal bodily feelings. The results suggest that text-trained AI can capture some average human aesthetic patterns but falls short on the embodied side of experience, with consequences for building AI that aligns with human sensibilities.

Core claim

By presenting LLMs with the same questionnaire items used in human studies, comparative analyses show broadly similar patterns in correlations between beauty ratings and emotions as well as in prioritized image features. Notable divergences emerge in the distribution of emotional responses and the relationship between beauty ratings and bodily sensations. These findings indicate that state-of-the-art LLMs can approximate average human tendencies in aesthetic evaluation to a certain extent but exhibit limitations particularly in relation to interoceptive aspects, which may reflect insufficient representation in training data or unintended consequences of alignment processes.

What carries the argument

Direct comparison of questionnaire responses on beauty ratings, emotional responses, and bodily sensations between humans and LLMs
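
As a rough illustration of this kind of comparison, the sketch below computes Spearman's rank correlation (the statistic named in the Figure 3 and Figure 4 captions) between beauty ratings and one other questionnaire score, once for a human rater and once for an LLM. All data values, variable names, and helper functions here are hypothetical; this page does not describe the paper's actual analysis code, so this is an illustration under those assumptions rather than the authors' method.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five images (not data from the paper).
beauty_human = [1, 2, 3, 4, 5]
sensation_human = [2, 1, 4, 3, 5]
beauty_llm = [1, 2, 3, 4, 5]
sensation_llm = [3, 3, 3, 2, 4]

rho_human = spearman(beauty_human, sensation_human)
rho_llm = spearman(beauty_llm, sensation_llm)
print(f"human rho = {rho_human:.3f}, LLM rho = {rho_llm:.3f}")
```

Computing the same statistic on both sets of answers keeps the comparison at the level of response patterns; it makes no claim about what, if anything, stands behind the LLM's answers.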

If this is right

  • LLMs can approximate some average human aesthetic evaluation patterns from large-scale text training.
  • Divergences in bodily sensation relationships point to specific challenges for achieving full human-AI alignment in tasks involving sensibility.
  • Training data limitations or alignment processes may need targeted changes to better capture interoceptive elements.
  • Developing AI systems with more human-like aesthetic processing requires addressing gaps in internal state representation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These gaps could limit AI usefulness in creative or artistic domains where bodily intuition shapes judgments.
  • Multimodal models that incorporate sensory or simulated body data might close the observed divergences.
  • Alignment work may need to model internal states explicitly rather than relying on output behavior alone.

Load-bearing premise

That prompting LLMs with the same questionnaire items used in human studies produces responses that are directly comparable to human self-reported internal states, especially for interoceptive bodily sensations.

What would settle it

Finding that LLMs trained or fine-tuned on data that includes explicit descriptions of bodily sensations produce beauty-to-bodily-sensation correlations matching those seen in human data would challenge the claim of inherent interoceptive limitations.

Figures

Figures reproduced from arXiv: 2605.18759 by Tatsuya Daikoku, Yasuo Kuniyoshi, Yoshia Abe.

Figure 1: Number of images with zero responses in human evaluations, where the emotion never …
Figure 2: Number of images with zero responses in AI evaluations, where the emotion never …
Figure 3: Spearman’s rank correlation coefficients between the beauty and the emotion intensity …
Figure 4: Spearman’s rank correlation coefficients between the beauty/valence/arousal scores and …
Figure 5: Differences from individual human evaluators ( …
Figure 6: Differences from the human average (rH). Values are first averaged across multiple raters of the same image, and then averaged across all 347 images. Error bars indicate the standard error of the mean. The red bars represent the deviation of an individual human evaluator from the human average (|rh − rH|); the blue bars represent the deviation of the human average from the overall average of all AI-based …
Figure 7: Relationship between the human average ( …
Figure 8: Regression coefficients of image features in linear regression with the human average …
Figure 9: Relative importance of image features in SHAP. Using two random forest regression …
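
The aggregation described in the Figure 6 caption (average within each image across raters, then average across images, with the standard error of the mean as the error bar) can be sketched as follows for the human-versus-human-average quantity |rh − rH|. The ratings, image identifiers, and function names are hypothetical, and the exact procedure in the paper may differ; this is only one reading of the caption.

```python
from statistics import mean, stdev


def per_image_deviation(ratings_by_image):
    """For each image, average |r_h - r_H| over its raters.

    r_H is the mean rating of that image across its human raters.
    """
    deviations = []
    for ratings in ratings_by_image.values():
        r_H = mean(ratings)
        deviations.append(mean([abs(r_h - r_H) for r_h in ratings]))
    return deviations


def mean_and_sem(values):
    """Mean across images and standard error of the mean (needs >= 2 values)."""
    return mean(values), stdev(values) / len(values) ** 0.5


# Hypothetical beauty ratings for three images (the paper uses 347 images).
ratings_by_image = {
    "img_a": [4, 6],
    "img_b": [3, 3, 3],
    "img_c": [1, 5],
}

deviations = per_image_deviation(ratings_by_image)
avg_dev, sem = mean_and_sem(deviations)
print(f"mean |r_h - r_H| = {avg_dev:.3f} +/- {sem:.3f} (SEM)")
```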
Original abstract

Artificial intelligence (AI), exemplified by large language models (LLMs), is rapidly approaching and in some cases surpassing human performance across a wide range of cognitive tasks. However, human nature is not limited to intelligence alone; it also encompasses sensibility, including the capacity to perceive and experience beauty in visual scenes. This raises a fundamental question: how humans and AI systems converge or diverge in such aesthetic experiences. Aesthetic evaluation depends not only on objective properties of images but also on internal processes within the observer. As part of ongoing efforts in AI alignment, building upon prior human studies that have examined the relationship between beauty ratings, bodily sensations, and emotions, we adopt a comparable set of questionnaire items and present them to LLMs, enabling a direct comparison between human and AI responses. Our comparative analyses revealed that, while humans and AI exhibited broadly similar patterns in the correlations between beauty ratings and emotions, as well as in the image features they prioritized, notable divergences emerged in both the distribution of emotional responses and the relationship between beauty ratings and bodily sensations. These findings suggest that state-of-the-art LLMs, trained on large-scale textual data, can approximate average human tendencies in aesthetic evaluation to a certain extent. However, they also indicate limitations, particularly in relation to interoceptive aspects, which may reflect insufficient representation in training data or unintended consequences of alignment processes. These findings highlight key challenges for AI alignment and suggest important directions for developing AI systems with human-like aesthetic processing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper compares human and LLM responses to questionnaires on aesthetic evaluation of visual scenes, building on prior human studies of beauty ratings, bodily sensations, and emotions. It reports broadly similar patterns in beauty-emotion correlations and prioritized image features, but notable divergences in emotional response distributions and the relationship between beauty ratings and bodily sensations. These are interpreted as evidence that state-of-the-art LLMs can approximate average human aesthetic tendencies but exhibit limitations in interoceptive aspects, with implications for AI alignment.

Significance. If the empirical comparisons are robust, the work highlights a potentially important gap in current LLMs' capacity to simulate embodied, interoceptive components of human aesthetic experience. The direct reuse of questionnaire items from human studies is a methodological strength that facilitates comparability, and the focus on alignment challenges beyond pure cognitive performance is timely.

major comments (2)
  1. [Methods] The abstract and methods description provide no information on human sample sizes, specific LLM models and versions tested, number of prompt repetitions or temperature settings, statistical tests for distribution and correlation comparisons, image selection criteria, or controls for prompt variability. Without these details it is impossible to assess whether the reported divergences are statistically reliable or generalizable, which directly bears on the central claim of LLM limitations in interoceptive aesthetic processing.
  2. [Discussion] The interpretation that divergences in beauty-bodily sensation correlations reflect specific limitations in interoceptive aspects (Abstract and Discussion) assumes LLM questionnaire responses function as comparable proxies for internal bodily states. Because LLMs generate text via next-token prediction without any embodied sensory apparatus, this equivalence is not established and the observed divergences could be an artifact of linguistic simulation rather than a targeted deficit in aesthetic processing.
minor comments (1)
  1. [Abstract] The abstract states that humans and LLMs 'exhibited broadly similar patterns' in some correlations but does not quantify the degree of similarity (e.g., via correlation coefficients or p-values), which would help readers gauge the practical magnitude of the reported divergences.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which have helped us identify areas where the manuscript can be clarified and strengthened. We address each major comment below and describe the corresponding revisions.

Point-by-point responses
  1. Referee: [Methods] The abstract and methods description provide no information on human sample sizes, specific LLM models and versions tested, number of prompt repetitions or temperature settings, statistical tests for distribution and correlation comparisons, image selection criteria, or controls for prompt variability. Without these details it is impossible to assess whether the reported divergences are statistically reliable or generalizable, which directly bears on the central claim of LLM limitations in interoceptive aesthetic processing.

    Authors: We agree that these methodological details are essential for readers to evaluate the reliability and generalizability of the reported divergences. The current manuscript version omitted a sufficiently detailed Methods section. In the revised manuscript we will add a dedicated Methods section specifying the human sample size, the exact LLM models and versions tested, the number of prompt repetitions per item, the temperature settings employed, the statistical tests used for distribution comparisons (e.g., Kolmogorov-Smirnov) and correlation comparisons (e.g., Fisher z-tests), the criteria for selecting the visual scenes, and controls for prompt variability such as standardized phrasing and multiple independent runs. These additions will directly support assessment of the central claims. revision: yes

  2. Referee: [Discussion] The interpretation that divergences in beauty-bodily sensation correlations reflect specific limitations in interoceptive aspects (Abstract and Discussion) assumes LLM questionnaire responses function as comparable proxies for internal bodily states. Because LLMs generate text via next-token prediction without any embodied sensory apparatus, this equivalence is not established and the observed divergences could be an artifact of linguistic simulation rather than a targeted deficit in aesthetic processing.

    Authors: We appreciate the referee’s clarification of the interpretive limits. We do not claim that LLM responses constitute direct proxies for embodied interoceptive states; the study compares observable patterns in questionnaire answers. Nevertheless, the selective divergence in beauty-bodily sensation relationships—while beauty-emotion correlations and feature priorities remain broadly aligned—still indicates that current LLMs’ textual approximations fall short of reproducing the full structure of human aesthetic reports. To address the concern we will revise the Abstract and Discussion to (a) explicitly state that the comparison concerns response patterns rather than internal states and (b) frame the implications for AI alignment in terms of limitations in simulating human-like output distributions rather than assuming proxy equivalence. This revision will make the claims more precise without altering the empirical findings. revision: partial
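
The first response above names Kolmogorov-Smirnov and Fisher z-tests as examples of the statistics to be reported. A minimal standard-library sketch of both is given below, assuming two independent samples; the function names and the input numbers are hypothetical and do not come from the paper. Two caveats apply: the Fisher transformation is derived for Pearson correlations and is only an approximation when applied to Spearman coefficients, and humans and LLMs rating the same images are not strictly independent samples, so a dependent-correlation test may be more appropriate in practice.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Only the statistic is computed here, not its p-value.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a_sorted) | set(b_sorted)):
        f_a = bisect_right(a_sorted, x) / len(a_sorted)  # empirical CDF of a at x
        f_b = bisect_right(b_sorted, x) / len(b_sorted)  # empirical CDF of b at x
        d = max(d, abs(f_a - f_b))
    return d


def fisher_z_test(r1, n1, r2, n2):
    """Compare two independent correlations via Fisher's z transformation.

    Returns (z, two-sided p). Requires |r| < 1 and n > 3 for both samples.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical emotion-intensity answers from two groups of respondents.
d_stat = ks_statistic([0, 0, 1, 2, 3], [2, 3, 3, 4, 4])

# Hypothetical correlations, each computed over 347 images.
z_value, p_value = fisher_z_test(0.5, 347, 0.1, 347)
print(f"KS D = {d_stat:.2f}, Fisher z = {z_value:.2f}, p = {p_value:.2g}")
```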

Circularity Check

0 steps flagged

Empirical comparison with no derivation chain or self-referential reduction

The paper performs a direct empirical comparison by feeding the same questionnaire items from prior human studies to LLMs and reporting observed divergences in emotional distributions and beauty-bodily sensation correlations. No equations, fitted parameters, or first-principles derivations are present that could reduce the reported findings to inputs by construction. Prior human studies are invoked only as the source of the questionnaire items, providing an independent empirical baseline rather than a self-citation that bears the load of the central claim. The analysis therefore remains self-contained against external benchmarks and exhibits no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that LLM questionnaire responses can be treated as proxies for aesthetic processing comparable to human reports, particularly for interoception.

axioms (1)
  • domain assumption: LLM responses to the same questionnaire items used with humans yield data that can be directly compared to human self-reports of internal states
    Invoked when the paper states it presents the questionnaire items to LLMs for direct comparison.

pith-pipeline@v0.9.0 · 5798 in / 1224 out tokens · 31880 ms · 2026-05-21T10:42:28.682670+00:00 · methodology
