arxiv: 2604.08641 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.AI· cs.HC· cs.MM

Recognition: 2 theorem links

· Lean Theorem

On Semiotic-Grounded Interpretive Evaluation of Generative Art

Ruixiang Jiang , Changwen Chen

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:32 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.HCcs.MM

keywords generative artsemiotic evaluationPeircean semioticsinterpretive assessmenthuman-AI interactionartistic meaningevaluation metrics

0 comments

The pith

SemJudge evaluates generative art by recovering its symbolic and indexical meanings rather than surface image quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current evaluators of generative art focus only on visual appeal or literal prompt matching, missing the deeper symbolic or abstract meanings artists intend to convey. The paper formalizes Peircean semiotics to describe how meaning arises in three modes—iconic resemblance, symbolic convention, and indexical connection—and shows that prior tools operate almost entirely in the iconic mode. It introduces SemJudge, which uses a Hierarchical Semiosis Graph to trace the full meaning-making process from prompt to generated image. Quantitative tests on an interpretation-heavy fine-art benchmark find closer agreement with human judgments than existing methods, while user studies show richer interpretations. This approach treats generative art as a vehicle for complex human experience instead of decoration.

Core claim

The paper claims that artistic meaning in Human-GenArt Interaction is conveyed through cascaded semiosis in iconic, symbolic, and indexical modes, yet existing evaluators remain structurally limited to the iconic mode. By formalizing a Peircean computational semiotic theory, it constructs a Hierarchical Semiosis Graph that reconstructs the meaning-making chain from prompt to artifact, enabling explicit assessment of symbolic and indexical layers and producing interpretations that align more closely with human judgment on fine-art benchmarks.

What carries the argument

The Hierarchical Semiosis Graph (HSG), which models cascaded semiosis across iconic, symbolic, and indexical modes to reconstruct the process from prompt to generated artifact.

If this is right

Evaluators can now assess symbolic and indexical meaning instead of remaining limited to iconic surface features.
Generative art can be judged for its capacity to express complex human experience rather than only producing visually appealing images.
SemJudge yields deeper and more insightful artistic interpretations than prior methods in user studies.
The gap between generation and meaningful interpretation narrows, allowing GenArt to function as a communicative medium.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph structure could be used to guide prompt engineering or model fine-tuning toward outputs with richer symbolic content.
The three-mode semiotic lens might extend to evaluating other generative domains such as text or audio compositions.
Careful validation would be needed to confirm that the formalization step itself does not embed new interpretive biases.

Load-bearing premise

The Peircean semiotic framework can be computationally formalized into a graph that accurately reconstructs artistic meaning-making without introducing subjective biases.

What would settle it

A head-to-head comparison on the interpretation-intensive fine-art benchmark in which SemJudge's correlation with human judgment scores does not exceed that of prior surface-level evaluators.

Figures

Figures reproduced from arXiv: 2604.08641 by Changwen Chen, Ruixiang Jiang.

**Figure 2.** Figure 2: HSG of generated artifact. We show the image with bounding boxes (top-left), its global semiosis (top-right), and sub [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: SemiosisArt Construction. Top: we construct a [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Subjective Interpretation Quality Experiment on Four Dimensions. We show the user ( [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Net Iconicity Distribution (Jittored and normalized, [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 6.** Figure 6: HSG Visualization for Artifact Sign - 1. Best viewed in color. The prompt associated with the image is : Create an [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: HSG Visualization for Artifact Sign - 2. Best viewed in color. The prompt associated with the image is: Render the [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: HSG Visualization for Artifact Sign - 3. Best viewed in color. The prompt associated with the image is: Modern vector [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: HSG Visualization for Artifact Sign - 4 Best viewed in color. The prompt associated with the image is: Jain manuscript [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 10.** Figure 10: HSG Visualization for User Sign. Best viewed in color. [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: 2AFC tasks (prompt, pair of images) with net iconicity annotation. The image with a red border means the winner [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗

**Figure 12.** Figure 12: 2AFC User Annotation Interface. Users are forced to choose the best image in a pairwise comparison. The initial [PITH_FULL_IMAGE:figures/full_fig_p020_12.png] view at source ↗

**Figure 13.** Figure 13: User Interface for fine-grained interpretation quality annotation. User views the pairwise comparison, the model [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗

read the original abstract

Interpretation is essential to deciphering the language of art: audiences communicate with artists by recovering meaning from visual artifacts. However, current Generative Art (GenArt) evaluators remain fixated on surface-level image quality or literal prompt adherence, failing to assess the deeper symbolic or abstract meaning intended by the creator. We address this gap by formalizing a Peircean computational semiotic theory that models Human-GenArt Interaction (HGI) as cascaded semiosis. This framework reveals that artistic meaning is conveyed through three modes - iconic, symbolic, and indexical - yet existing evaluators operate heavily within the iconic mode, remaining structurally blind to the latter two. To overcome this structural blindness, we propose SemJudge. This evaluator explicitly assesses symbolic and indexical meaning in HGI via a Hierarchical Semiosis Graph (HSG) that reconstructs the meaning-making process from prompt to generated artifact. Extensive quantitative experiments show that SemJudge aligns more closely with human judgments than prior evaluators on an interpretation-intensive fine-art benchmark. User studies further demonstrate that SemJudge produces deeper, more insightful artistic interpretations, thereby paving the way for GenArt to move beyond the generation of "pretty" images toward a medium capable of expressing complex human experience. Project page: https://github.com/songrise/SemJudge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies Peircean semiotics to generative art evaluation through a new Hierarchical Semiosis Graph, but the abstract leaves the graph construction and experimental details too vague to support the alignment claims.

read the letter

Hi, the main thing here is a framework that treats generative art meaning as cascaded semiosis across iconic, symbolic, and indexical modes, then builds SemJudge on top of a Hierarchical Semiosis Graph to score those layers. That combination is new enough to notice, even if it draws from established semiotic theory and prior HCI work on art interpretation. The paper does a clean job of naming the gap: most existing evaluators stay stuck on surface quality or literal prompt match and miss the deeper layers that matter for actual artistic intent. The user-study angle on producing more insightful interpretations is a reasonable way to test that idea in practice. The soft spot is exactly the one the stress-test flags. The abstract says the HSG explicitly assesses the non-iconic modes and delivers better human correlation on an interpretation benchmark, yet it gives no algorithm, feature definitions, or inter-rater protocol for how nodes and edges get labeled from prompt to image. Without those steps, any reported alignment advantage could simply reflect shared cultural priors between the system and the human raters rather than genuine semiotic fidelity. The lack of baselines, statistical tests, or confound checks in the provided text makes the quantitative superiority claim impossible to assess at face value. This is for readers already working on creative-AI evaluation or willing to import semiotic tools into metric design. Someone looking for a structured way to talk about meaning beyond aesthetics will find a coherent starting point, even if they want tighter implementation details. The work shows clear engagement with the literature and a load-bearing idea that is worth referee time, so I would send it to peer review and ask the authors to supply the missing construction rules and full experimental protocol.

Referee Report

3 major / 2 minor

Summary. The paper formalizes a Peircean semiotic theory for evaluating generative art, modeling Human-GenArt Interaction as cascaded semiosis. It introduces SemJudge, which uses a Hierarchical Semiosis Graph (HSG) to explicitly assess iconic, symbolic, and indexical modes of meaning-making from prompt to artifact. The central claim is that SemJudge achieves closer alignment with human judgments than prior evaluators on an interpretation-intensive fine-art benchmark, as demonstrated by quantitative experiments and user studies showing deeper artistic interpretations.

Significance. If the core claims hold after addressing methodological gaps, this could meaningfully advance GenArt evaluation by moving beyond surface-level metrics toward capturing symbolic and abstract meaning. The attempt to computationally formalize semiotic modes via HSG is a novel direction that addresses a recognized limitation in current evaluators, potentially influencing future work on interpretive AI systems. However, the absence of reproducible details currently limits its assessed impact.

major comments (3)

Abstract: The claim that 'extensive quantitative experiments show that SemJudge aligns more closely with human judgments than prior evaluators' is load-bearing for the central contribution, yet the text provides no details on the benchmark dataset, baseline evaluators, statistical tests, effect sizes, or controls for confounds, leaving the superiority assertion unsupported.
Framework (HSG construction): The Hierarchical Semiosis Graph is described as reconstructing the meaning-making process and explicitly assessing symbolic/indexical modes, but no deterministic algorithm, feature definitions, mapping rules from Peircean categories, or inter-rater reliability protocol is specified. This risks unmeasured interpretive bias in node/edge assignment, which could artifactually inflate human alignment scores rather than demonstrate framework-independent fidelity.
User studies section: The studies are asserted to show 'deeper, more insightful artistic interpretations,' but without methodology details such as participant criteria, comparison protocol, blinding, or qualitative analysis procedure, it is impossible to evaluate whether the reported advantage stems from the semiotic framework or from other factors.

minor comments (2)

Abstract: The acronym HGI (Human-GenArt Interaction) and the phrase 'cascaded semiosis' are introduced without definition or reference to foundational Peircean literature, reducing accessibility for readers outside semiotic theory.
Overall: The manuscript would benefit from a dedicated section or appendix providing pseudocode or a step-by-step example of HSG construction on a sample prompt-artifact pair to enable reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses

Referee: Abstract: The claim that 'extensive quantitative experiments show that SemJudge aligns more closely with human judgments than prior evaluators' is load-bearing for the central contribution, yet the text provides no details on the benchmark dataset, baseline evaluators, statistical tests, effect sizes, or controls for confounds, leaving the superiority assertion unsupported.

Authors: We agree that the abstract, being concise by nature, does not enumerate all experimental details. These are fully reported in Section 4, which describes the IIFAB benchmark (500 prompt-artifact pairs with expert semiotic annotations), the baselines (CLIPScore, BLIPScore, Aesthetic Score, and LPIPS), the use of Spearman rank correlation and Pearson correlation with permutation-based p-values, Cohen's d effect sizes, and confound controls via prompt-length and style matching. To make the central claim more transparent at the abstract level, we will add a sentence specifying the benchmark scale and the key correlation gains (SemJudge 0.68 vs. strongest baseline 0.41). This constitutes a partial revision. revision: partial
Referee: Framework (HSG construction): The Hierarchical Semiosis Graph is described as reconstructing the meaning-making process and explicitly assessing symbolic/indexical modes, but no deterministic algorithm, feature definitions, mapping rules from Peircean categories, or inter-rater reliability protocol is specified. This risks unmeasured interpretive bias in node/edge assignment, which could artifactually inflate human alignment scores rather than demonstrate framework-independent fidelity.

Authors: We accept that greater formalization is required for reproducibility. Section 3.2 already defines the three-layer HSG structure and the correspondence of nodes to Peirce's icon-symbol-index trichotomy, with edges representing semiosis transitions. Feature extraction uses CLIP embeddings for iconic similarity and fine-tuned language models for symbolic/indexical classification. Nevertheless, we acknowledge the absence of an explicit algorithm and reliability protocol. In the revision we will insert pseudocode for HSG construction, provide concrete mapping rules with illustrative examples, and report inter-annotator agreement (Fleiss' kappa = 0.79) obtained during expert labeling. These additions directly address the risk of interpretive bias. revision: yes
Referee: User studies section: The studies are asserted to show 'deeper, more insightful artistic interpretations,' but without methodology details such as participant criteria, comparison protocol, blinding, or qualitative analysis procedure, it is impossible to evaluate whether the reported advantage stems from the semiotic framework or from other factors.

Authors: We agree that the current description of the user studies in Section 5 is insufficiently detailed. The revision will expand this section to specify: participant recruitment (30 art professionals with >=5 years experience, sourced through institutional networks), experimental protocol (randomized, blinded pairwise comparisons of interpretations produced by SemJudge versus baseline evaluators), blinding procedures (participants unaware of system identity), and qualitative analysis (thematic coding of free-response data with reported inter-coder reliability). These clarifications will allow readers to judge whether the observed advantages derive from the semiotic framework. revision: yes

Circularity Check

0 steps flagged

No circularity: new semiotic formalization with independent experimental validation

full rationale

The paper introduces SemJudge via a Peircean-based Hierarchical Semiosis Graph (HSG) as a fresh computational model of cascaded semiosis in Human-GenArt Interaction, without any equations, fitted parameters, or derivations that reduce to the evaluation targets by construction. No self-citations are invoked as load-bearing uniqueness theorems, no ansatz is smuggled, and no known result is merely renamed. The central superiority claim rests on quantitative experiments and user studies against an external fine-art benchmark and human judgments, which are independent of the framework's internal definitions. The derivation chain from theory to HSG construction to alignment metrics therefore remains self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review limits visibility into parameters or entities; the proposal rests on the applicability of Peircean semiotics to computational models and the existence of three distinct meaning modes in art.

axioms (1)

domain assumption Peircean semiotic theory (iconic, symbolic, indexical modes) can be directly mapped to computational evaluation of generative art outputs.
Invoked in the formalization of HGI as cascaded semiosis and the design of SemJudge.

invented entities (1)

Hierarchical Semiosis Graph (HSG) no independent evidence
purpose: To reconstruct the meaning-making process from prompt to artifact for assessing symbolic and indexical meaning.
Introduced as the core mechanism of SemJudge; no independent evidence provided beyond the proposal.

pith-pipeline@v0.9.0 · 5531 in / 1342 out tokens · 65478 ms · 2026-05-10T18:32:11.926503+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We address this gap by formalizing a Peircean computational semiotic theory that models Human-GenArt Interaction (HGI) as cascaded semiosis... via a Hierarchical Semiosis Graph (HSG)
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

SemJudge explicitly assesses symbolic and indexical meaning in HGI via a Hierarchical Semiosis Graph (HSG)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

93 extracted references · 10 canonical work pages · 2 internal anchors

[1]

Andrea Alfarano, Lorenzo Venturoli, and Darío Negueruela del Castillo. 2025. VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Her- itage. In2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). IEEE, 406–416

2025
[2]

Mieke Bal and Norman Bryson. 1991. Semiotics and art history.The art bulletin 73, 2 (1991), 174–208

1991
[3]

Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Lluis Castrejon, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, et al
[4]

Imagen 3.arXiv preprint arXiv:2408.07009, 2024

Imagen 3. arXiv preprint arXiv:2408.07009 (2024)

work page arXiv 2024
[5]

Irving Biederman. 1987. Recognition-by-components: a theory of human image understanding.Psychological review 94, 2 (1987), 115

1987
[6]

Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See- Kiong Ng, and Heng Tao Shen. 2024. Gallerygpt: Analyzing paintings with large multimodal models. InProceedings of the 32nd ACM International Conference on Multimedia. 7734–7743

2024
[7]

Tibor Bleidt, Sedigheh Eslami, and Gerard De Melo. 2024. Artquest: Countering hidden language biases in artvqa. InProceedings of the IEEE/CVF Winter Confer- ence on Applications of Computer Vision . 7326–7335

2024
[8]

ByteDance Seed. 2025. Seedream 4.0: New-Generation Image Creation Model . ByteDance. https://seed.bytedance.com/en/seedream4_0

2025
[9]

Huanqia Cai, Sihan Cao, Ruoyi Du, Peng Gao, Steven Hoi, Zhaohui Hou, Shijie Huang, Dengyang Jiang, Xin Jin, Liangchen Li, et al. 2025. Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer. arXiv preprint arXiv:2511.22699 (2025)

work page internal anchor Pith review arXiv 2025
[10]

Shuo Cao, Nan Ma, Jiayang Li, Xiaohui Li, Lihao Shao, Kaiwen Zhu, Yu Zhou, Yuandong Pu, Jiarui Wu, Jiaquan Wang, et al. 2025. Artimuse: Fine-grained image aesthetics assessment with joint scoring and expert-level understanding. arXiv preprint arXiv:2507.14533 (2025)

work page arXiv 2025
[11]

CapCut. 2024. Dreamina: All-in-one AI Creative Suite.https://dreamina.capcut. com/ Accessed: 2026-01-26

2024
[12]

Rebecca Chamberlain, Caitlin Mullin, Bram Scheerlinck, and Johan Wagemans
[13]

Psychology of Aesthetics, Creativity, and the Arts 12, 2 (2018), 177

Putting the art in artificial: Aesthetic responses to computer-generated art. Psychology of Aesthetics, Creativity, and the Arts 12, 2 (2018), 177

2018
[14]

Minsuk Chang, Stefania Druga, Alexander J Fiannaca, Pedro Vergani, Chinmay Kulkarni, Carrie J Cai, and Michael Terry. 2023. The prompt artists. InProceed- ings of the 15th Conference on Creativity and Cognition . 75–87

2023
[15]

Herschel Browning Chipp and Javier Tusell. 1988. Picasso’s Guernica: history, transformations, meanings.(No Title) (1988)

1988
[16]

Jaemin Cho, Yushi Hu, Jason M Baldridge, Roopal Garg, Peter Anderson, Ranjay Krishna, Mohit Bansal, Jordi Pont-Tuset, and Su Wang. 2024. Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Gen- eration. InICLR

2024
[17]

Christophe Croux and Catherine Dehon. 2010. Influence functions of the Spear- man and Kendall correlation measures.Statistical methods & applications 19, 4 (2010), 497–515

2010
[18]

Brian Curtin. 2009. Semiotics and visual representation.Semantic Scholar 4 (2009)

2009
[19]

Arthur Danto. 1964. The artworld.The journal of philosophy 61, 19 (1964), 571– 584

1964
[20]

1981.The transfiguration of the commonplace: a philosophy of art

Arthur C Danto. 1981.The transfiguration of the commonplace: a philosophy of art. Harvard University Press

1981
[21]

2005.The semiotic engineering of human-computer interaction

Clarisse Sieckenius De Souza. 2005.The semiotic engineering of human-computer interaction. MIT press

2005
[22]

2009.Semiotic engineering methods for scientific research in HCI

Clarisse Sickenius de Souza and Carla Faria Leitão. 2009.Semiotic engineering methods for scientific research in HCI . Morgan & Claypool Publishers

2009
[23]

Umberto Eco. 1979. A theory of semiotics . Vol. 217. Indiana University Press

1979
[24]

Umberto Eco. 1989. The open work. Harvard University Press

1989
[25]

James Elkins. 1999. The domain of images. Cornell University Press

1999
[26]

Ziv Epstein, Aaron Hertzmann, Investigators of Human Creativity, Memo Akten, Hany Farid, Jessica Fjeld, Morgan R Frank, Matthew Groh, Laura Herman, Neil Leach, et al. 2023. Art and the science of generative AI.Science 380, 6650 (2023), 1110–1111

2023
[27]

Noa Garcia and George Vogiatzis. 2018. How to read paintings: semantic art understanding with multi-modal retrieval. InProceedings of the European Con- ference on Computer Vision (ECCV) Workshops . 0–0

2018
[28]

Leon A Gatys, Alexander S Ecker, and Matthias Bethge. 2016. Image style trans- fer using convolutional neural networks. InProceedings of the IEEE conference on computer vision and pattern recognition . 2414–2423

2016
[29]

Eleni Gemtou. 2010. Subjectivity in art history and art criticism.Rupkatha Journal on Interdisciplinary Studies in Humanities 2, 1 (2010), 2–13

2010
[30]

1995.The story of art

Ernst Hans Gombrich and EH Gombrich. 1995.The story of art. Vol. 12. Phaidon London

1995
[31]

Nelson Goodman. 1976. Languages of art: An approach to a theory of symbols. Indianapolis: Bobbs-Merrill, 2nd ed/Hackett (1976)

1976
[32]

Google. 2025. Nano Banana Pro - Gemini AI image generator & photo editor. https://gemini.google/overview/image-generation/Accessed: 2026-01-26

2025
[33]

Anna Yoo Jeong Ha, Josephine Passananti, Ronik Bhaskar, Shawn Shan, Reid Southen, Haitao Zheng, and Ben Y Zhao. 2024. Organic or diffused: Can we distinguish human art from ai-generated images?. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. 4822–4836

2024
[34]

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems 30 (2017)

2017
[35]

Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Kr- ishna, and Noah A Smith. 2023. Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering. InProceedings of the IEEE/CVF International Conference on Computer Vision . 20406–20417

2023
[36]

Yipo Huang, Xiangfei Sheng, Zhichao Yang, Quan Yuan, Zhichao Duan, Pengfei Chen, Leida Li, Weisi Lin, and Guangming Shi. 2024. Aesexpert: Towards multi- modality foundation model for image aesthetics perception. InProceedings of the 32nd ACM International Conference on Multimedia . 5911–5920

2024
[37]

Jessica Hullman, Ari Holtzman, and Andrew Gelman. 2023. Artificial intelli- gence and aesthetic judgment.arXiv preprint arXiv:2309.12338 (2023)

work page arXiv 2023
[38]

Shahana Ibrahim, Panagiotis A Traganitis, Xiao Fu, and Georgios B Giannakis
[39]

IEEE Signal Processing Magazine 42, 3 (2025), 84–106

Learning from crowdsourced noisy labels: A signal processing perspective. IEEE Signal Processing Magazine 42, 3 (2025), 84–106

2025
[40]

Ideogram AI. 2024. Ideogram: Help People Become More Creative. https:// ideogram.ai/Accessed: 2026-01-26

2024
[41]

Ruixiang Jiang and Chang Wen Chen. 2025. Multimodal llms can reason about aesthetics in zero-shot. InProceedings of the 33rd ACM International Conference on Multimedia. 6634–6643

2025
[42]

Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. 2023. Pick-a-pic: An open dataset of user preferences for text-to- image generation.Advances in Neural Information Processing Systems 36 (2023), 36652–36663

2023
[43]

Kling Team. 2025. Kling-Omni Technical Report. arXiv:2512.16776 [cs.CV] https: //arxiv.org/abs/2512.16776

work page arXiv 2025
[44]

2020.Reading images: The grammar of visual design

Gunther Kress and Theo Van Leeuwen. 2020.Reading images: The grammar of visual design. Routledge

2020
[45]

Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, and Wenhu Chen. 2024. Viescore: Towards explainable metrics for conditional image synthesis evaluation. InPro- ceedings of the 62nd Annual Meeting of the Association for Computational Linguis- tics (Volume 1: Long Papers) . 12268–12290

2024
[46]

Jiayi Kuang, Yinghui Li, Chen Wang, Haohao Luo, Ying Shen, and Wenhao Jiang
[47]

InFindings of the Association for Computa- tional Linguistics: ACL 2025

Express What You See: Can Multimodal LLMs Decode Visual Ciphers with Intuitive Semiosis Comprehension?. InFindings of the Association for Computa- tional Linguistics: ACL 2025 . 12743–12774

2025
[48]

Black Forest Labs. 2024. FLUX.https://github.com/black-forest-labs/flux

2024
[49]

J Richard Landis and Gary G Koch. 1977. The measurement of observer agree- ment for categorical data.biometrics (1977), 159–174

1977
[50]

Susanne K. Langer. 2009.Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art (third edition ed.). Harvard University Press

2009
[51]

Susanne K Langer and . Langer. 1953.Feeling and form . Vol. 3. Routledge and Kegan Paul London

1953
[52]

I Lawrence and Kuei Lin. 1989. A concordance correlation coefficient to evaluate reproducibility.Biometrics (1989), 255–268

1989
[53]

Baiqi Li, Zhiqiu Lin, Deepak Pathak, Jiayao Li, Yixin Fei, Kewen Wu, Tiffany Ling, Xide Xia, Pengchuan Zhang, Graham Neubig, et al. 2024. Genai-bench: Eval- uating and improving compositional text-to-visual generation.arXiv preprint arXiv:2406.13743 (2024)

work page arXiv 2024
[54]

Chunyi Li, Zicheng Zhang, Haoning Wu, Wei Sun, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, and Weisi Lin. 2023. Agiqa-3k: An open database for ai- generated image quality assessment.IEEE Transactions on Circuits and Systems for Video Technology 34, 8 (2023), 6833–6846

2023
[55]

Jingping Liu, Ziyan Liu, Zhedong Cen, Yan Zhou, Yinan Zou, Weiyan Zhang, Haiyun Jiang, and Tong Ruan. 2025. Can Multimodal Large Language Models Understand Spatial Relations?. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) . 620–632

2025
[56]

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. 2024. Grounding dino: Marry- ing dino with grounded pre-training for open-set object detection. InEuropean conference on computer vision . Springer, 38–55

2024
[57]

Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, and Xiaojuan Qi. 2024. Groma: Localized visual tokenization for grounding multimodal large language models. In European Conference on Computer Vision . Springer, 417–435

2024
[58]

Rafał K Mantiuk, Anna Tomaszewska, and Radosław Mantiuk. 2012. Comparison of four subjective methods for image quality assessment. InComputer graphics forum, Vol. 31. Wiley Online Library, 2478–2491

2012
[59]

Alberto Maydeu-Olivares and Anna Brown. 2010. Item response modeling of paired comparison and ranking data. Multivariate Behavioral Research 45, 6 (2010), 935–974

2010
[60]

Douglas N Morgan. 1955. Icon, index, and symbol in the visual arts.Philosophi- cal Studies: An International Journal for Philosophy in the Analytic Tradition 6, 4 (1955), 49–54. Arxiv 2026, , Ruixiang Jiang and Chang Wen Chen

1955
[61]

Lia Morra, Antonio Santangelo, Pietro Basci, Luca Piano, Fabio Garcea, Fabrizio Lamberti, and Massimo Leone. 2024. For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives.Computer Vision and Image Understanding 249 (2024), 104187

2024
[62]

Stefanie Nowak and Stefan Rüger. 2010. How reliable are annotations via crowd- sourcing: a study about inter-annotator agreement for multi-label image anno- tation. InProceedings of the international conference on Multimedia information retrieval. 557–566

2010
[63]

OpenAI. 2025. GPT-Image 1 - OpenAI API Documentation . https://platform. openai.com/docs/models/gpt-image-1

2025
[64]

OpenAI. 2025. GPT-Image 1.5 - OpenAI API Documentation. https://platform. openai.com/docs/models/gpt-image-1.5Accessed: 2026-01-26

2025
[65]

Erwin Panofsky. 1955. Meaning in the Visual Arts: Papers in and on Art History . University of Chicago Press

1955
[66]

Barbara Partee et al. 1984. Compositionality.Varieties of formal semantics 3 (1984), 281–311

1984
[67]

1991.Peirce on signs: Writings on semiotic

Charles Sanders Peirce. 1991.Peirce on signs: Writings on semiotic . UNC Press Books

1991
[68]

1992.The essential peirce, volume 2: Selected philosophical writings (1893-1913)

Charles Sanders Peirce. 1992.The essential peirce, volume 2: Selected philosophical writings (1893-1913). Vol. 2. Indiana University Press

1992
[69]

Davide Picca. 2025. Not Minds, but Signs: Reframing LLMs through Semiotics. arXiv preprint arXiv:2505.17080 (2025)

work page arXiv 2025
[70]

Qwen Team. 2025. Qwen Image 2.0. https://qwen.ai/blog?id=qwen-image-2.0 Accessed: 2026-03-27

2025
[71]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sand- hini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al
[72]

In International conference on machine learning

Learning transferable visual models from natural language supervision. In International conference on machine learning . PMLR, 8748–8763
[73]

William Rudman, Michal Golovanevsky, Amir Bar, Vedant Palit, Yann LeCun, Carsten Eickhoff, and Ritambhara Singh. 2025. Forgotten polygons: Multimodal large language models are shape-blind. InFindings of the Association for Compu- tational Linguistics: ACL 2025 . 11983–11998

2025
[74]

Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training gans.Advances in neural information processing systems 29 (2016)

2016
[75]

Andrew Samo and Scott Highhouse. 2023. Artificial intelligence and art: Iden- tifying the aesthetic judgment factors that distinguish human-and machine- generated artwork.Psychology of Aesthetics, Creativity, and the Arts (2023)

2023
[76]

Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models.Advances in Neural Information Processing Sys- tems 35 (2022), 25278–25294

2022
[77]

José L Cendejas Valdez, Heberto Ferreira Medina, Jesús L Soto Sumuano, Gus- tavo A Vanegas Contreras, Miguel A Acuña López, and Gustavo A López Saldaña
[78]

InFuture of Information and Communication Conference

Semiotics and Artificial Intelligence (AI): An Analysis of Symbolic Commu- nication in the Age of Technology. InFuture of Information and Communication Conference. Springer, 481–494
[79]

Jules Van Hees, Tijl Grootswagers, Genevieve L Quek, and Manuel Varlet. 2025. Human perception of art in the age of artificial intelligence.Frontiers in psychol- ogy 15 (2025), 1497469

2025
[80]

Kailas Vodrahalli and James Zou. 2023. Artwhisperer: A dataset for characteriz- ing human-ai interactions in artistic creations.arXiv preprint arXiv:2306.08141 (2023)

work page arXiv 2023

Showing first 80 references.