LLMs Generate Kitsch
Pith reviewed 2026-05-13 22:21 UTC · model grok-4.3
The pith
Large language models systematically generate kitsch because of how they are trained on next-token prediction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLMs generate kitsch as a systematic outcome of their training, which favors statistically frequent and conventional patterns over original or risky ones, producing artifacts that readers experience as hollow despite surface-level appeal.
What carries the argument
Next-token prediction training objective, which rewards outputs matching common training-data patterns and thereby favors the safe, sentimental, and conventional qualities that define kitsch.
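To make the mechanism concrete, here is a minimal sketch in Python. It is not taken from the paper: it fits a toy bigram model by counting (the maximum-likelihood solution for next-word prediction) and decodes greedily. The corpus and the names `train_bigram` and `greedy_continue` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-word frequencies; normalized counts are the
    maximum-likelihood bigram model for next-word prediction."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def greedy_continue(counts, start, max_words=5):
    """Repeatedly append the single most frequent follower of the last word."""
    out = [start]
    for _ in range(max_words):
        followers = counts.get(out[-1])
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical toy corpus: three conventional lines and one unusual one.
corpus = [
    "her heart was full of love",
    "her heart was full of hope",
    "her heart was full of love",
    "her heart was a clenched fist",
]
model = train_bigram(corpus)
result = greedy_continue(model, "her")
print(result)  # her heart was full of love
```

In this toy setting the unusual continuation ("a clenched fist") is present in the training data but never surfaces under greedy decoding, because every step picks the most frequent follower. Real LLMs are far more complex, so this illustrates only the direction of the bias, not its size.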
If this is right
- Readers perceive LLM stories as kitschier than human-written ones once their personal definition of kitsch is controlled for in the rating task.
- Evaluations of LLM creativity must treat kitsch as a separate dimension from overall quality or human-likeness.
- Applications of LLMs to open-ended creative work such as research or coding inherit the same bias toward conventional outputs.
Where Pith is reading between the lines
- Training changes that penalize predictability could reduce the kitsch output without harming fluency.
- The same mechanism likely applies to image, music, or video generators trained on similar objectives.
- Success metrics for creative AI may need to include explicit checks for conventionality rather than relying on human preference ratings alone.
Load-bearing premise
Kitsch can be isolated and measured as a distinct property independent of other quality judgments, and training is the primary cause rather than prompt design or model scale.
What would settle it
An experiment in which readers rate LLM stories as no kitschier than human stories when definitions are controlled, or a model trained under a different objective that eliminates the elevated kitsch ratings.
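A comparison of this kind could be analyzed with a simple permutation test. The sketch below is an assumption about one reasonable analysis, not the paper's procedure, which this page does not describe. The ratings are invented for illustration, and `permutation_test` is a hypothetical helper. It also does not model the control for each reader's definition of kitsch.

```python
import random
from statistics import mean

def permutation_test(a, b, n_permutations=5000, seed=0):
    """One-sided permutation test for mean(a) > mean(b).

    Returns the observed mean difference and an estimated p-value."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)

# Hypothetical 1-7 kitsch ratings, invented for illustration only.
llm_ratings = [5, 6, 5, 7, 6, 5, 6, 4]
human_ratings = [3, 4, 5, 3, 4, 2, 4, 3]

observed_diff, p_value = permutation_test(llm_ratings, human_ratings)
print(observed_diff, p_value)
```

A result that would count against the paper's claim is an observed difference near zero with a large p-value, ideally from a study with enough raters to detect a meaningful gap.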
Original abstract
Large Language Models (LLMs) are increasingly used to generate pictures, texts, music, videos, and other works that have traditionally required human creativity. LLM-generated artifacts are often rated better than human-generated works in controlled studies. At the same time, they can come across as generic and hollow. We propose to resolve this tension by arguing that LLMs systematically generate kitsch, and that this is a consequence of the way in which they are trained. We also show empirically that readers perceive LLM-generated stories as kitschier, if we control for their definition of "kitsch". We discuss implications for the design of future studies and for creative tasks such as research and coding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs systematically generate kitsch as a direct consequence of their training objectives, which accounts for the observed tension between high controlled-study ratings and the generic, hollow quality of outputs. It supports this via an empirical reader study showing that participants rate LLM-generated stories as kitschier than human ones when their individual definitions of kitsch are controlled for, and discusses implications for evaluation protocols in creative tasks such as research and coding.
Significance. If the central claim is substantiated with adequate controls, the work supplies a coherent explanatory framework for limitations in LLM creativity that goes beyond surface-level quality metrics. It could usefully inform the design of future human-AI comparison studies and prompt the development of training or evaluation methods that better distinguish depth from superficial appeal.
major comments (2)
- [Empirical results section] The reader-perception experiment supplies no information on sample size, story selection criteria, statistical tests, or inter-rater reliability. These omissions prevent assessment of whether the reported difference in kitsch perception is robust or generalizable, directly undermining the empirical support for the central claim.
- [Training mechanism discussion] Training-to-kitsch argument (likely §3): the manuscript asserts rather than derives that the pretraining objective is the primary driver of kitsch; no ablations are reported that vary scale, prompt design, or post-training alignment while holding story content fixed. Without such controls, the causal attribution remains untested and cannot bear the weight the central claim places on it.
minor comments (2)
- [Abstract] The clause 'if we control for their definition of kitsch' should specify the exact experimental procedure used to implement that control.
- [Introduction] Notation and terminology: ensure consistent use of 'kitsch' across the manuscript and provide a brief operational definition early in the text to aid readers unfamiliar with the aesthetic concept.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which identify key areas where the manuscript can be clarified and strengthened. We respond to each major comment below and indicate the revisions we will make.
Point-by-point responses
- Referee: [Empirical results section] The reader-perception experiment supplies no information on sample size, story selection criteria, statistical tests, or inter-rater reliability. These omissions prevent assessment of whether the reported difference in kitsch perception is robust or generalizable, directly undermining the empirical support for the central claim.
  Authors: We agree that these methodological details are essential for assessing robustness. In the revised manuscript we will expand the empirical results section to report the sample size, the criteria used for selecting and matching stories, the statistical tests applied, and inter-rater reliability measures. These additions will allow readers to evaluate the strength and generalizability of the observed difference in kitsch perception.
  Revision: yes
- Referee: [Training mechanism discussion] Training-to-kitsch argument (likely §3): the manuscript asserts rather than derives that the pretraining objective is the primary driver of kitsch; no ablations are reported that vary scale, prompt design, or post-training alignment while holding story content fixed. Without such controls, the causal attribution remains untested and cannot bear the weight the central claim places on it.
  Authors: We will revise §3 to derive the link more explicitly: the autoregressive pretraining objective minimizes cross-entropy loss and thereby favors high-probability continuations drawn from the training distribution, which statistically favors conventional, formulaic patterns that readers perceive as kitsch. While the current work does not include ablations that vary scale, prompt design, or alignment with content held fixed, the core mechanism follows directly from the properties of next-token prediction itself rather than from any particular model variant. We will add a limitations paragraph noting the absence of such controls and suggesting them as future work.
  Revision: partial
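The link described in this response can be written out in standard notation. This is a textbook formulation, not taken from the manuscript; the symbols $\theta$, $p_\theta$, and $p_{\text{data}}$ are assumptions introduced here for illustration.

```latex
% Autoregressive pretraining minimizes the expected negative log-likelihood:
\mathcal{L}(\theta)
  = -\,\mathbb{E}_{x \sim p_{\text{data}}}
    \Big[ \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}) \Big]

% For a fixed context, cross-entropy splits into entropy plus a KL term,
% so the loss is minimized when the model matches the data distribution:
H\big(p_{\text{data}}, p_\theta\big)
  = H\big(p_{\text{data}}\big)
  + D_{\mathrm{KL}}\big(p_{\text{data}} \,\|\, p_\theta\big)

% Greedy (or low-temperature) decoding then selects the mode at each step:
\hat{x}_t = \arg\max_{x} \; p_\theta(x \mid x_{<t})
```

The first two lines show only that a well-trained model approximates the training distribution. The preference for the most common continuation enters at the third line, which depends on the decoding strategy. This is one reason ablations over decoding and post-training would help separate the possible causes.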
Circularity Check
No significant circularity; central claim is an interpretive assertion supported by independent empirical evidence
The paper asserts that LLMs generate kitsch as a consequence of training but provides no equations, fitted parameters, or self-citations that reduce the claim to its own inputs by construction. The reader-perception experiment is described as controlling for participants' definition of kitsch and is presented as separate evidence rather than a statistical renaming of a fit. No uniqueness theorems, ansatzes smuggled via citation, or self-definitional loops appear in the abstract or described structure. The derivation chain is therefore self-contained as an argument plus observation, with no load-bearing step that collapses to a tautology or prior self-result.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Kitsch is a measurable perceptual quality that can be isolated by controlling for individual definitions in rating tasks.