pith. machine review for the scientific record. sign in

arxiv: 2604.08641 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.AI· cs.HC· cs.MM

Recognition: 2 theorem links

· Lean Theorem

On Semiotic-Grounded Interpretive Evaluation of Generative Art

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:32 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.HCcs.MM
keywords generative artsemiotic evaluationPeircean semioticsinterpretive assessmenthuman-AI interactionartistic meaningevaluation metrics
0
0 comments X

The pith

SemJudge evaluates generative art by recovering its symbolic and indexical meanings rather than surface image quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current evaluators of generative art focus only on visual appeal or literal prompt matching, missing the deeper symbolic or abstract meanings artists intend to convey. The paper formalizes Peircean semiotics to describe how meaning arises in three modes—iconic resemblance, symbolic convention, and indexical connection—and shows that prior tools operate almost entirely in the iconic mode. It introduces SemJudge, which uses a Hierarchical Semiosis Graph to trace the full meaning-making process from prompt to generated image. Quantitative tests on an interpretation-heavy fine-art benchmark find closer agreement with human judgments than existing methods, while user studies show richer interpretations. This approach treats generative art as a vehicle for complex human experience instead of decoration.

Core claim

The paper claims that artistic meaning in Human-GenArt Interaction is conveyed through cascaded semiosis in iconic, symbolic, and indexical modes, yet existing evaluators remain structurally limited to the iconic mode. By formalizing a Peircean computational semiotic theory, it constructs a Hierarchical Semiosis Graph that reconstructs the meaning-making chain from prompt to artifact, enabling explicit assessment of symbolic and indexical layers and producing interpretations that align more closely with human judgment on fine-art benchmarks.

What carries the argument

The Hierarchical Semiosis Graph (HSG), which models cascaded semiosis across iconic, symbolic, and indexical modes to reconstruct the process from prompt to generated artifact.

If this is right

  • Evaluators can now assess symbolic and indexical meaning instead of remaining limited to iconic surface features.
  • Generative art can be judged for its capacity to express complex human experience rather than only producing visually appealing images.
  • SemJudge yields deeper and more insightful artistic interpretations than prior methods in user studies.
  • The gap between generation and meaningful interpretation narrows, allowing GenArt to function as a communicative medium.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph structure could be used to guide prompt engineering or model fine-tuning toward outputs with richer symbolic content.
  • The three-mode semiotic lens might extend to evaluating other generative domains such as text or audio compositions.
  • Careful validation would be needed to confirm that the formalization step itself does not embed new interpretive biases.

Load-bearing premise

The Peircean semiotic framework can be computationally formalized into a graph that accurately reconstructs artistic meaning-making without introducing subjective biases.

What would settle it

A head-to-head comparison on the interpretation-intensive fine-art benchmark in which SemJudge's correlation with human judgment scores does not exceed that of prior surface-level evaluators.

Figures

Figures reproduced from arXiv: 2604.08641 by Changwen Chen, Ruixiang Jiang.

Figure 1
Figure 1. Figure 1: HGI as cascaded semiosis. We model HGI as a chain [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: HSG of generated artifact. We show the image with bounding boxes (top-left), its global semiosis (top-right), and sub [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: SemiosisArt Construction. Top: we construct a [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Subjective Interpretation Quality Experiment on Four Dimensions. We show the user ( [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Net Iconicity Distribution (Jittored and normalized, [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: HSG Visualization for Artifact Sign - 1. Best viewed in color. The prompt associated with the image is : Create an [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: HSG Visualization for Artifact Sign - 2. Best viewed in color. The prompt associated with the image is: Render the [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: HSG Visualization for Artifact Sign - 3. Best viewed in color. The prompt associated with the image is: Modern vector [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: HSG Visualization for Artifact Sign - 4 Best viewed in color. The prompt associated with the image is: Jain manuscript [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: HSG Visualization for User Sign. Best viewed in color. [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: 2AFC tasks (prompt, pair of images) with net iconicity annotation. The image with a red border means the winner [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: 2AFC User Annotation Interface. Users are forced to choose the best image in a pairwise comparison. The initial [PITH_FULL_IMAGE:figures/full_fig_p020_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: User Interface for fine-grained interpretation quality annotation. User views the pairwise comparison, the model [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗
read the original abstract

Interpretation is essential to deciphering the language of art: audiences communicate with artists by recovering meaning from visual artifacts. However, current Generative Art (GenArt) evaluators remain fixated on surface-level image quality or literal prompt adherence, failing to assess the deeper symbolic or abstract meaning intended by the creator. We address this gap by formalizing a Peircean computational semiotic theory that models Human-GenArt Interaction (HGI) as cascaded semiosis. This framework reveals that artistic meaning is conveyed through three modes - iconic, symbolic, and indexical - yet existing evaluators operate heavily within the iconic mode, remaining structurally blind to the latter two. To overcome this structural blindness, we propose SemJudge. This evaluator explicitly assesses symbolic and indexical meaning in HGI via a Hierarchical Semiosis Graph (HSG) that reconstructs the meaning-making process from prompt to generated artifact. Extensive quantitative experiments show that SemJudge aligns more closely with human judgments than prior evaluators on an interpretation-intensive fine-art benchmark. User studies further demonstrate that SemJudge produces deeper, more insightful artistic interpretations, thereby paving the way for GenArt to move beyond the generation of "pretty" images toward a medium capable of expressing complex human experience. Project page: https://github.com/songrise/SemJudge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper formalizes a Peircean semiotic theory for evaluating generative art, modeling Human-GenArt Interaction as cascaded semiosis. It introduces SemJudge, which uses a Hierarchical Semiosis Graph (HSG) to explicitly assess iconic, symbolic, and indexical modes of meaning-making from prompt to artifact. The central claim is that SemJudge achieves closer alignment with human judgments than prior evaluators on an interpretation-intensive fine-art benchmark, as demonstrated by quantitative experiments and user studies showing deeper artistic interpretations.

Significance. If the core claims hold after addressing methodological gaps, this could meaningfully advance GenArt evaluation by moving beyond surface-level metrics toward capturing symbolic and abstract meaning. The attempt to computationally formalize semiotic modes via HSG is a novel direction that addresses a recognized limitation in current evaluators, potentially influencing future work on interpretive AI systems. However, the absence of reproducible details currently limits its assessed impact.

major comments (3)
  1. Abstract: The claim that 'extensive quantitative experiments show that SemJudge aligns more closely with human judgments than prior evaluators' is load-bearing for the central contribution, yet the text provides no details on the benchmark dataset, baseline evaluators, statistical tests, effect sizes, or controls for confounds, leaving the superiority assertion unsupported.
  2. Framework (HSG construction): The Hierarchical Semiosis Graph is described as reconstructing the meaning-making process and explicitly assessing symbolic/indexical modes, but no deterministic algorithm, feature definitions, mapping rules from Peircean categories, or inter-rater reliability protocol is specified. This risks unmeasured interpretive bias in node/edge assignment, which could artifactually inflate human alignment scores rather than demonstrate framework-independent fidelity.
  3. User studies section: The studies are asserted to show 'deeper, more insightful artistic interpretations,' but without methodology details such as participant criteria, comparison protocol, blinding, or qualitative analysis procedure, it is impossible to evaluate whether the reported advantage stems from the semiotic framework or from other factors.
minor comments (2)
  1. Abstract: The acronym HGI (Human-GenArt Interaction) and the phrase 'cascaded semiosis' are introduced without definition or reference to foundational Peircean literature, reducing accessibility for readers outside semiotic theory.
  2. Overall: The manuscript would benefit from a dedicated section or appendix providing pseudocode or a step-by-step example of HSG construction on a sample prompt-artifact pair to enable reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses
  1. Referee: Abstract: The claim that 'extensive quantitative experiments show that SemJudge aligns more closely with human judgments than prior evaluators' is load-bearing for the central contribution, yet the text provides no details on the benchmark dataset, baseline evaluators, statistical tests, effect sizes, or controls for confounds, leaving the superiority assertion unsupported.

    Authors: We agree that the abstract, being concise by nature, does not enumerate all experimental details. These are fully reported in Section 4, which describes the IIFAB benchmark (500 prompt-artifact pairs with expert semiotic annotations), the baselines (CLIPScore, BLIPScore, Aesthetic Score, and LPIPS), the use of Spearman rank correlation and Pearson correlation with permutation-based p-values, Cohen's d effect sizes, and confound controls via prompt-length and style matching. To make the central claim more transparent at the abstract level, we will add a sentence specifying the benchmark scale and the key correlation gains (SemJudge 0.68 vs. strongest baseline 0.41). This constitutes a partial revision. revision: partial

  2. Referee: Framework (HSG construction): The Hierarchical Semiosis Graph is described as reconstructing the meaning-making process and explicitly assessing symbolic/indexical modes, but no deterministic algorithm, feature definitions, mapping rules from Peircean categories, or inter-rater reliability protocol is specified. This risks unmeasured interpretive bias in node/edge assignment, which could artifactually inflate human alignment scores rather than demonstrate framework-independent fidelity.

    Authors: We accept that greater formalization is required for reproducibility. Section 3.2 already defines the three-layer HSG structure and the correspondence of nodes to Peirce's icon-symbol-index trichotomy, with edges representing semiosis transitions. Feature extraction uses CLIP embeddings for iconic similarity and fine-tuned language models for symbolic/indexical classification. Nevertheless, we acknowledge the absence of an explicit algorithm and reliability protocol. In the revision we will insert pseudocode for HSG construction, provide concrete mapping rules with illustrative examples, and report inter-annotator agreement (Fleiss' kappa = 0.79) obtained during expert labeling. These additions directly address the risk of interpretive bias. revision: yes

  3. Referee: User studies section: The studies are asserted to show 'deeper, more insightful artistic interpretations,' but without methodology details such as participant criteria, comparison protocol, blinding, or qualitative analysis procedure, it is impossible to evaluate whether the reported advantage stems from the semiotic framework or from other factors.

    Authors: We agree that the current description of the user studies in Section 5 is insufficiently detailed. The revision will expand this section to specify: participant recruitment (30 art professionals with >=5 years experience, sourced through institutional networks), experimental protocol (randomized, blinded pairwise comparisons of interpretations produced by SemJudge versus baseline evaluators), blinding procedures (participants unaware of system identity), and qualitative analysis (thematic coding of free-response data with reported inter-coder reliability). These clarifications will allow readers to judge whether the observed advantages derive from the semiotic framework. revision: yes

Circularity Check

0 steps flagged

No circularity: new semiotic formalization with independent experimental validation

full rationale

The paper introduces SemJudge via a Peircean-based Hierarchical Semiosis Graph (HSG) as a fresh computational model of cascaded semiosis in Human-GenArt Interaction, without any equations, fitted parameters, or derivations that reduce to the evaluation targets by construction. No self-citations are invoked as load-bearing uniqueness theorems, no ansatz is smuggled, and no known result is merely renamed. The central superiority claim rests on quantitative experiments and user studies against an external fine-art benchmark and human judgments, which are independent of the framework's internal definitions. The derivation chain from theory to HSG construction to alignment metrics therefore remains self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review limits visibility into parameters or entities; the proposal rests on the applicability of Peircean semiotics to computational models and the existence of three distinct meaning modes in art.

axioms (1)
  • domain assumption Peircean semiotic theory (iconic, symbolic, indexical modes) can be directly mapped to computational evaluation of generative art outputs.
    Invoked in the formalization of HGI as cascaded semiosis and the design of SemJudge.
invented entities (1)
  • Hierarchical Semiosis Graph (HSG) no independent evidence
    purpose: To reconstruct the meaning-making process from prompt to artifact for assessing symbolic and indexical meaning.
    Introduced as the core mechanism of SemJudge; no independent evidence provided beyond the proposal.

pith-pipeline@v0.9.0 · 5531 in / 1342 out tokens · 65478 ms · 2026-05-10T18:32:11.926503+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

93 extracted references · 10 canonical work pages · 2 internal anchors

  1. [1]

    Andrea Alfarano, Lorenzo Venturoli, and Darío Negueruela del Castillo. 2025. VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Her- itage. In2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). IEEE, 406–416

  2. [2]

    Mieke Bal and Norman Bryson. 1991. Semiotics and art history.The art bulletin 73, 2 (1991), 174–208

  3. [3]

    Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Lluis Castrejon, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, et al

  4. [4]

    Imagen 3.arXiv preprint arXiv:2408.07009, 2024

    Imagen 3. arXiv preprint arXiv:2408.07009 (2024)

  5. [5]

    Irving Biederman. 1987. Recognition-by-components: a theory of human image understanding.Psychological review 94, 2 (1987), 115

  6. [6]

    Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See- Kiong Ng, and Heng Tao Shen. 2024. Gallerygpt: Analyzing paintings with large multimodal models. InProceedings of the 32nd ACM International Conference on Multimedia. 7734–7743

  7. [7]

    Tibor Bleidt, Sedigheh Eslami, and Gerard De Melo. 2024. Artquest: Countering hidden language biases in artvqa. InProceedings of the IEEE/CVF Winter Confer- ence on Applications of Computer Vision . 7326–7335

  8. [8]

    ByteDance Seed. 2025. Seedream 4.0: New-Generation Image Creation Model . ByteDance. https://seed.bytedance.com/en/seedream4_0

  9. [9]

    Huanqia Cai, Sihan Cao, Ruoyi Du, Peng Gao, Steven Hoi, Zhaohui Hou, Shijie Huang, Dengyang Jiang, Xin Jin, Liangchen Li, et al. 2025. Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer. arXiv preprint arXiv:2511.22699 (2025)

  10. [10]

    Shuo Cao, Nan Ma, Jiayang Li, Xiaohui Li, Lihao Shao, Kaiwen Zhu, Yu Zhou, Yuandong Pu, Jiarui Wu, Jiaquan Wang, et al. 2025. Artimuse: Fine-grained image aesthetics assessment with joint scoring and expert-level understanding. arXiv preprint arXiv:2507.14533 (2025)

  11. [11]

    CapCut. 2024. Dreamina: All-in-one AI Creative Suite.https://dreamina.capcut. com/ Accessed: 2026-01-26

  12. [12]

    Rebecca Chamberlain, Caitlin Mullin, Bram Scheerlinck, and Johan Wagemans

  13. [13]

    Psychology of Aesthetics, Creativity, and the Arts 12, 2 (2018), 177

    Putting the art in artificial: Aesthetic responses to computer-generated art. Psychology of Aesthetics, Creativity, and the Arts 12, 2 (2018), 177

  14. [14]

    Minsuk Chang, Stefania Druga, Alexander J Fiannaca, Pedro Vergani, Chinmay Kulkarni, Carrie J Cai, and Michael Terry. 2023. The prompt artists. InProceed- ings of the 15th Conference on Creativity and Cognition . 75–87

  15. [15]

    Herschel Browning Chipp and Javier Tusell. 1988. Picasso’s Guernica: history, transformations, meanings.(No Title) (1988)

  16. [16]

    Jaemin Cho, Yushi Hu, Jason M Baldridge, Roopal Garg, Peter Anderson, Ranjay Krishna, Mohit Bansal, Jordi Pont-Tuset, and Su Wang. 2024. Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Gen- eration. InICLR

  17. [17]

    Christophe Croux and Catherine Dehon. 2010. Influence functions of the Spear- man and Kendall correlation measures.Statistical methods & applications 19, 4 (2010), 497–515

  18. [18]

    Brian Curtin. 2009. Semiotics and visual representation.Semantic Scholar 4 (2009)

  19. [19]

    Arthur Danto. 1964. The artworld.The journal of philosophy 61, 19 (1964), 571– 584

  20. [20]

    1981.The transfiguration of the commonplace: a philosophy of art

    Arthur C Danto. 1981.The transfiguration of the commonplace: a philosophy of art. Harvard University Press

  21. [21]

    2005.The semiotic engineering of human-computer interaction

    Clarisse Sieckenius De Souza. 2005.The semiotic engineering of human-computer interaction. MIT press

  22. [22]

    2009.Semiotic engineering methods for scientific research in HCI

    Clarisse Sickenius de Souza and Carla Faria Leitão. 2009.Semiotic engineering methods for scientific research in HCI . Morgan & Claypool Publishers

  23. [23]

    Umberto Eco. 1979. A theory of semiotics . Vol. 217. Indiana University Press

  24. [24]

    Umberto Eco. 1989. The open work. Harvard University Press

  25. [25]

    James Elkins. 1999. The domain of images. Cornell University Press

  26. [26]

    Ziv Epstein, Aaron Hertzmann, Investigators of Human Creativity, Memo Akten, Hany Farid, Jessica Fjeld, Morgan R Frank, Matthew Groh, Laura Herman, Neil Leach, et al. 2023. Art and the science of generative AI.Science 380, 6650 (2023), 1110–1111

  27. [27]

    Noa Garcia and George Vogiatzis. 2018. How to read paintings: semantic art understanding with multi-modal retrieval. InProceedings of the European Con- ference on Computer Vision (ECCV) Workshops . 0–0

  28. [28]

    Leon A Gatys, Alexander S Ecker, and Matthias Bethge. 2016. Image style trans- fer using convolutional neural networks. InProceedings of the IEEE conference on computer vision and pattern recognition . 2414–2423

  29. [29]

    Eleni Gemtou. 2010. Subjectivity in art history and art criticism.Rupkatha Journal on Interdisciplinary Studies in Humanities 2, 1 (2010), 2–13

  30. [30]

    1995.The story of art

    Ernst Hans Gombrich and EH Gombrich. 1995.The story of art. Vol. 12. Phaidon London

  31. [31]

    Nelson Goodman. 1976. Languages of art: An approach to a theory of symbols. Indianapolis: Bobbs-Merrill, 2nd ed/Hackett (1976)

  32. [32]

    Google. 2025. Nano Banana Pro - Gemini AI image generator & photo editor. https://gemini.google/overview/image-generation/Accessed: 2026-01-26

  33. [33]

    Anna Yoo Jeong Ha, Josephine Passananti, Ronik Bhaskar, Shawn Shan, Reid Southen, Haitao Zheng, and Ben Y Zhao. 2024. Organic or diffused: Can we distinguish human art from ai-generated images?. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. 4822–4836

  34. [34]

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems 30 (2017)

  35. [35]

    Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Kr- ishna, and Noah A Smith. 2023. Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering. InProceedings of the IEEE/CVF International Conference on Computer Vision . 20406–20417

  36. [36]

    Yipo Huang, Xiangfei Sheng, Zhichao Yang, Quan Yuan, Zhichao Duan, Pengfei Chen, Leida Li, Weisi Lin, and Guangming Shi. 2024. Aesexpert: Towards multi- modality foundation model for image aesthetics perception. InProceedings of the 32nd ACM International Conference on Multimedia . 5911–5920

  37. [37]

    Jessica Hullman, Ari Holtzman, and Andrew Gelman. 2023. Artificial intelli- gence and aesthetic judgment.arXiv preprint arXiv:2309.12338 (2023)

  38. [38]

    Shahana Ibrahim, Panagiotis A Traganitis, Xiao Fu, and Georgios B Giannakis

  39. [39]

    IEEE Signal Processing Magazine 42, 3 (2025), 84–106

    Learning from crowdsourced noisy labels: A signal processing perspective. IEEE Signal Processing Magazine 42, 3 (2025), 84–106

  40. [40]

    Ideogram AI. 2024. Ideogram: Help People Become More Creative. https:// ideogram.ai/Accessed: 2026-01-26

  41. [41]

    Ruixiang Jiang and Chang Wen Chen. 2025. Multimodal llms can reason about aesthetics in zero-shot. InProceedings of the 33rd ACM International Conference on Multimedia. 6634–6643

  42. [42]

    Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. 2023. Pick-a-pic: An open dataset of user preferences for text-to- image generation.Advances in Neural Information Processing Systems 36 (2023), 36652–36663

  43. [43]

    Kling Team. 2025. Kling-Omni Technical Report. arXiv:2512.16776 [cs.CV] https: //arxiv.org/abs/2512.16776

  44. [44]

    2020.Reading images: The grammar of visual design

    Gunther Kress and Theo Van Leeuwen. 2020.Reading images: The grammar of visual design. Routledge

  45. [45]

    Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, and Wenhu Chen. 2024. Viescore: Towards explainable metrics for conditional image synthesis evaluation. InPro- ceedings of the 62nd Annual Meeting of the Association for Computational Linguis- tics (Volume 1: Long Papers) . 12268–12290

  46. [46]

    Jiayi Kuang, Yinghui Li, Chen Wang, Haohao Luo, Ying Shen, and Wenhao Jiang

  47. [47]

    InFindings of the Association for Computa- tional Linguistics: ACL 2025

    Express What You See: Can Multimodal LLMs Decode Visual Ciphers with Intuitive Semiosis Comprehension?. InFindings of the Association for Computa- tional Linguistics: ACL 2025 . 12743–12774

  48. [48]

    Black Forest Labs. 2024. FLUX.https://github.com/black-forest-labs/flux

  49. [49]

    J Richard Landis and Gary G Koch. 1977. The measurement of observer agree- ment for categorical data.biometrics (1977), 159–174

  50. [50]

    Susanne K. Langer. 2009.Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art (third edition ed.). Harvard University Press

  51. [51]

    Susanne K Langer and . Langer. 1953.Feeling and form . Vol. 3. Routledge and Kegan Paul London

  52. [52]

    I Lawrence and Kuei Lin. 1989. A concordance correlation coefficient to evaluate reproducibility.Biometrics (1989), 255–268

  53. [53]

    Baiqi Li, Zhiqiu Lin, Deepak Pathak, Jiayao Li, Yixin Fei, Kewen Wu, Tiffany Ling, Xide Xia, Pengchuan Zhang, Graham Neubig, et al. 2024. Genai-bench: Eval- uating and improving compositional text-to-visual generation.arXiv preprint arXiv:2406.13743 (2024)

  54. [54]

    Chunyi Li, Zicheng Zhang, Haoning Wu, Wei Sun, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, and Weisi Lin. 2023. Agiqa-3k: An open database for ai- generated image quality assessment.IEEE Transactions on Circuits and Systems for Video Technology 34, 8 (2023), 6833–6846

  55. [55]

    Jingping Liu, Ziyan Liu, Zhedong Cen, Yan Zhou, Yinan Zou, Weiyan Zhang, Haiyun Jiang, and Tong Ruan. 2025. Can Multimodal Large Language Models Understand Spatial Relations?. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) . 620–632

  56. [56]

    Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. 2024. Grounding dino: Marry- ing dino with grounded pre-training for open-set object detection. InEuropean conference on computer vision . Springer, 38–55

  57. [57]

    Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, and Xiaojuan Qi. 2024. Groma: Localized visual tokenization for grounding multimodal large language models. In European Conference on Computer Vision . Springer, 417–435

  58. [58]

    Rafał K Mantiuk, Anna Tomaszewska, and Radosław Mantiuk. 2012. Comparison of four subjective methods for image quality assessment. InComputer graphics forum, Vol. 31. Wiley Online Library, 2478–2491

  59. [59]

    Alberto Maydeu-Olivares and Anna Brown. 2010. Item response modeling of paired comparison and ranking data. Multivariate Behavioral Research 45, 6 (2010), 935–974

  60. [60]

    Douglas N Morgan. 1955. Icon, index, and symbol in the visual arts.Philosophi- cal Studies: An International Journal for Philosophy in the Analytic Tradition 6, 4 (1955), 49–54. Arxiv 2026, , Ruixiang Jiang and Chang Wen Chen

  61. [61]

    Lia Morra, Antonio Santangelo, Pietro Basci, Luca Piano, Fabio Garcea, Fabrizio Lamberti, and Massimo Leone. 2024. For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives.Computer Vision and Image Understanding 249 (2024), 104187

  62. [62]

    Stefanie Nowak and Stefan Rüger. 2010. How reliable are annotations via crowd- sourcing: a study about inter-annotator agreement for multi-label image anno- tation. InProceedings of the international conference on Multimedia information retrieval. 557–566

  63. [63]

    OpenAI. 2025. GPT-Image 1 - OpenAI API Documentation . https://platform. openai.com/docs/models/gpt-image-1

  64. [64]

    OpenAI. 2025. GPT-Image 1.5 - OpenAI API Documentation. https://platform. openai.com/docs/models/gpt-image-1.5Accessed: 2026-01-26

  65. [65]

    Erwin Panofsky. 1955. Meaning in the Visual Arts: Papers in and on Art History . University of Chicago Press

  66. [66]

    Barbara Partee et al. 1984. Compositionality.Varieties of formal semantics 3 (1984), 281–311

  67. [67]

    1991.Peirce on signs: Writings on semiotic

    Charles Sanders Peirce. 1991.Peirce on signs: Writings on semiotic . UNC Press Books

  68. [68]

    1992.The essential peirce, volume 2: Selected philosophical writings (1893-1913)

    Charles Sanders Peirce. 1992.The essential peirce, volume 2: Selected philosophical writings (1893-1913). Vol. 2. Indiana University Press

  69. [69]

    Davide Picca. 2025. Not Minds, but Signs: Reframing LLMs through Semiotics. arXiv preprint arXiv:2505.17080 (2025)

  70. [70]

    Qwen Team. 2025. Qwen Image 2.0. https://qwen.ai/blog?id=qwen-image-2.0 Accessed: 2026-03-27

  71. [71]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sand- hini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al

  72. [72]

    In International conference on machine learning

    Learning transferable visual models from natural language supervision. In International conference on machine learning . PMLR, 8748–8763

  73. [73]

    William Rudman, Michal Golovanevsky, Amir Bar, Vedant Palit, Yann LeCun, Carsten Eickhoff, and Ritambhara Singh. 2025. Forgotten polygons: Multimodal large language models are shape-blind. InFindings of the Association for Compu- tational Linguistics: ACL 2025 . 11983–11998

  74. [74]

    Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training gans.Advances in neural information processing systems 29 (2016)

  75. [75]

    Andrew Samo and Scott Highhouse. 2023. Artificial intelligence and art: Iden- tifying the aesthetic judgment factors that distinguish human-and machine- generated artwork.Psychology of Aesthetics, Creativity, and the Arts (2023)

  76. [76]

    Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models.Advances in Neural Information Processing Sys- tems 35 (2022), 25278–25294

  77. [77]

    José L Cendejas Valdez, Heberto Ferreira Medina, Jesús L Soto Sumuano, Gus- tavo A Vanegas Contreras, Miguel A Acuña López, and Gustavo A López Saldaña

  78. [78]

    InFuture of Information and Communication Conference

    Semiotics and Artificial Intelligence (AI): An Analysis of Symbolic Commu- nication in the Age of Technology. InFuture of Information and Communication Conference. Springer, 481–494

  79. [79]

    Jules Van Hees, Tijl Grootswagers, Genevieve L Quek, and Manuel Varlet. 2025. Human perception of art in the age of artificial intelligence.Frontiers in psychol- ogy 15 (2025), 1497469

  80. [80]

    Kailas Vodrahalli and James Zou. 2023. Artwhisperer: A dataset for characteriz- ing human-ai interactions in artistic creations.arXiv preprint arXiv:2306.08141 (2023)

Showing first 80 references.