Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor
Pith reviewed 2026-05-09 20:31 UTC · model grok-4.3
The pith
Temporal features outweigh semantic incongruity in predicting audience appreciation for humor.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Dual Prediction Violation framework shows that temporal features substantially outweigh semantic incongruity in predicting audience appreciation. Analysis of 828 professional Chinese stand-up performances reveals that peak semantic violations matter more than average incongruity levels, and pauses systematically lengthen before high-surprise punchlines in successful performances. This positions timing as a scaffold for semantic surprise, with content and delivery operating in strategic coordination rather than independently.
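As a rough illustration of the quantities contrasted above, the following Python sketch computes peak versus mean surprise over a sequence of segments, plus a correlation between pre-punchline pause length and punchline surprise. The function names, the toy numbers, and the choice of Pearson correlation as the coupling measure are assumptions made for illustration; this summary does not describe the paper's actual feature extraction.

```python
from statistics import mean


def peak_and_mean(surprise):
    """Return (peak, mean) of per-segment surprise scores."""
    return max(surprise), mean(surprise)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: one surprise score and one preceding pause
# (in seconds) per punchline. These are not values from the paper.
surprise = [0.2, 0.3, 0.9, 0.1]
pauses = [0.3, 0.4, 1.2, 0.2]

peak, avg = peak_and_mean(surprise)
coupling = pearson(pauses, surprise)
```

In this toy case the peak (0.9) is far above the mean (0.375), and a positive `coupling` is the pattern the claim describes: longer pauses before more surprising punchlines.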
What carries the argument
The Dual Prediction Violation (DPV) framework, which models the interplay between semantic content violations and temporal dynamics in humor processing.
Load-bearing premise
Quantitative measures of semantic incongruity and temporal features extracted from performance transcripts and timing data accurately reflect the cognitive processes driving audience appreciation.
What would settle it
A new dataset of performances in which temporal features such as pre-punchline pause lengths fail to predict audience appreciation better than semantic incongruity measures, or show no systematic lengthening before high-surprise moments.
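One way such a test could be set up is sketched below: fit a one-predictor linear model for each feature family and compare leave-one-out prediction error. The single-feature models, the variable names, and the toy data are all hypothetical; the paper's own models and features may differ.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def loo_mse(xs, ys):
    """Leave-one-out mean squared prediction error of the line fit."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return mean(errors)


# Hypothetical toy data, one row per performance (not from the paper).
pause = [0.2, 0.5, 0.8, 1.1, 1.4, 1.7]         # temporal feature
incongruity = [0.6, 0.1, 0.9, 0.3, 0.8, 0.2]   # semantic feature
appreciation = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]  # audience response proxy

temporal_error = loo_mse(pause, appreciation)
semantic_error = loo_mse(incongruity, appreciation)
```

On a new dataset, the claim would be undermined if `temporal_error` were not reliably lower than `semantic_error` under this kind of out-of-sample comparison.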
Original abstract
Humor is a fundamental cognitive phenomenon in which humans derive pleasure from expectation violations and their resolution, exemplifying the brain's dynamic capacity for predictive processing. Classical humor theories emphasize semantic incongruity as the primary driver of amusement, yet overlook temporal dynamics despite comedians' intuition that "timing is everything." The extent to which temporal structure contributes to humor appreciation and how it interacts with semantic content remains poorly understood. Here, we propose the Dual Prediction Violation (DPV) framework to capture the interplay between content and timing. By analyzing 828 professional Chinese stand-up performances, we show that temporal features substantially outweigh semantic incongruity in predicting audience appreciation. Specifically, we find that peak semantic violations matter more than average incongruity levels, and pauses systematically lengthen before high-surprise punchlines--a strategic coupling that distinguishes successful from unsuccessful performances. These findings reframe humor as temporally scaffolded, where timing and semantic content operate in strategic coordination rather than independently. Our DPV framework bridges humor theory with predictive processing, demonstrating that temporal structure plays a central role in naturalistic humor appreciation with implications for understanding multi-scale prediction integration in linguistic processing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Dual Prediction Violation (DPV) framework to model the interplay between semantic incongruity and temporal dynamics in humor appreciation. Analyzing 828 professional Chinese stand-up performances, it claims that temporal features (e.g., pre-punchline pause lengthening) substantially outweigh semantic incongruity in predicting audience appreciation, with peak semantic violations mattering more than average levels and a strategic coupling distinguishing successful from unsuccessful performances.
Significance. If the empirical results hold after addressing measurement validation, this would meaningfully advance humor research by reframing it as temporally scaffolded predictive processing, bridging classical incongruity theories with multi-scale linguistic prediction. The large-scale naturalistic dataset of 828 performances is a clear strength, as is the attempt to derive falsifiable claims about timing-semantics coordination with potential implications for computational models of humor and cognitive science.
major comments (3)
- [Methods] Methods section: The central claim that temporal features substantially outweigh semantic incongruity depends on unvalidated proxies for both semantic surprise (e.g., how peak violations vs. average incongruity are computed from transcripts, likely via embeddings) and audience appreciation (e.g., laugh timing or ratings). No validation against human judgments of surprise or controls for comedian-specific delivery styles is reported, risking that the reported dominance is an artifact of the chosen metrics.
- [Results] Results section: The regression-style comparisons showing temporal outperformance lack reported details on statistical controls (multiple comparisons correction, comedian fixed effects, audience variability), robustness to alternative operationalizations of DPV quantities, or effect sizes that would confirm the 'substantially outweigh' claim is not driven by proxy choice.
- [Discussion] Discussion section: The interpretation of 'strategic coupling' (pauses lengthening before high-surprise punchlines) as distinguishing successful performances assumes the extracted temporal and semantic measures accurately reflect cognitive processes without confounds; this load-bearing assumption requires explicit testing or sensitivity analysis to support the reframing of humor theory.
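The comedian fixed effects requested in the comments above can be approximated by demeaning each variable within comedian before fitting, so that only within-comedian variation drives the estimate. The sketch below uses invented data and hypothetical names (`demean_by_group`, `pooled_slope`, `within_slope`) to show how a pooled slope can reflect between-comedian differences rather than a within-comedian timing effect.

```python
from collections import defaultdict
from statistics import mean


def slope(xs, ys):
    """OLS slope of y on x (with intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transform)."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    centers = {g: mean(vs) for g, vs in by_group.items()}
    return [v - centers[g] for v, g in zip(values, groups)]


# Hypothetical data: comedian A pauses longer and gets more laughter
# overall, but within each comedian longer pauses go with less laughter.
comedian = ["A", "A", "A", "B", "B", "B"]
pause = [1.0, 1.2, 1.4, 0.2, 0.4, 0.6]
laughter = [5.1, 5.0, 4.9, 2.1, 2.0, 1.9]

pooled_slope = slope(pause, laughter)
within_slope = slope(
    demean_by_group(pause, comedian),
    demean_by_group(laughter, comedian),
)
```

Here the pooled slope is positive while the within-comedian slope is negative. This is the kind of artifact the referee's request for comedian-level controls is meant to rule out.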
minor comments (2)
- [Abstract] The abstract and introduction could more clearly distinguish the DPV framework's novel predictions from prior work on timing in humor to strengthen the contribution statement.
- [Results] Figure or table presenting the regression coefficients comparing temporal vs. semantic predictors would benefit from explicit confidence intervals and model comparison metrics (e.g., AIC or cross-validation scores) for clarity.
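For the model comparison metrics mentioned above, a minimal sketch of a Gaussian AIC comparison between a temporal-only and a semantic-only linear model follows. The data are invented, the single-predictor models are a simplification, and the AIC is computed up to an additive constant shared by both models; none of this reproduces the paper's analysis.

```python
import math
from statistics import mean


def residual_sum_of_squares(xs, ys):
    """RSS of the least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def gaussian_aic(rss, n, k):
    """AIC for a Gaussian linear model, dropping constants common to
    all models fitted to the same n observations."""
    return n * math.log(rss / n) + 2 * k


# Hypothetical toy data (not from the paper).
pause = [0.2, 0.5, 0.8, 1.1, 1.4, 1.7]
incongruity = [0.6, 0.1, 0.9, 0.3, 0.8, 0.2]
appreciation = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]

n = len(appreciation)
k = 3  # slope, intercept, and residual variance
aic_temporal = gaussian_aic(residual_sum_of_squares(pause, appreciation), n, k)
aic_semantic = gaussian_aic(residual_sum_of_squares(incongruity, appreciation), n, k)
```

A lower AIC indicates the better-supported model. Reporting the difference alongside confidence intervals on the coefficients would let readers judge how large the gap between temporal and semantic predictors actually is.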
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have prompted us to clarify and strengthen several aspects of our manuscript. Below, we provide point-by-point responses to the major comments.
- Referee: [Methods] Methods section: The central claim that temporal features substantially outweigh semantic incongruity depends on unvalidated proxies for both semantic surprise (e.g., how peak violations vs. average incongruity are computed from transcripts, likely via embeddings) and audience appreciation (e.g., laugh timing or ratings). No validation against human judgments of surprise or controls for comedian-specific delivery styles is reported, risking that the reported dominance is an artifact of the chosen metrics.
Authors: We appreciate the referee's concern regarding the validation of our proxies. In the revised manuscript, we will elaborate on the computation of semantic surprise using state-of-the-art embedding models, specifying the exact methods for identifying peak violations versus average incongruity. For audience appreciation, our laugh detection approach follows protocols established in previous computational humor studies. Although we did not conduct a new human validation experiment due to the scale of the dataset, we will add a dedicated subsection discussing the reliability of these proxies based on prior validations and include comedian fixed effects to control for delivery styles. This should address concerns about potential artifacts.
Revision: partial
- Referee: [Results] Results section: The regression-style comparisons showing temporal outperformance lack reported details on statistical controls (multiple comparisons correction, comedian fixed effects, audience variability), robustness to alternative operationalizations of DPV quantities, or effect sizes that would confirm the 'substantially outweigh' claim is not driven by proxy choice.
Authors: We agree that more rigorous statistical reporting is essential. The revised Results section will include comprehensive details on our regression models, incorporating comedian fixed effects, corrections for multiple comparisons, and considerations for audience variability. We will report effect sizes and perform additional robustness analyses using alternative measures for semantic surprise and temporal features. These enhancements will substantiate that the observed dominance of temporal features is robust and not an artifact of specific proxy choices.
Revision: yes
- Referee: [Discussion] Discussion section: The interpretation of 'strategic coupling' (pauses lengthening before high-surprise punchlines) as distinguishing successful performances assumes the extracted temporal and semantic measures accurately reflect cognitive processes without confounds; this load-bearing assumption requires explicit testing or sensitivity analysis to support the reframing of humor theory.
Authors: The referee raises a valid point about the assumptions underlying our interpretations. In the revised Discussion, we will explicitly address potential confounds and include sensitivity analyses by varying key parameters in our DPV measures. This will provide a more robust foundation for our claims about strategic coupling and the reframing of humor as temporally scaffolded predictive processing. We will also discuss the limitations of observational data in directly testing cognitive mechanisms.
Revision: yes
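A sensitivity analysis of the kind promised in the last response could, for example, vary the cutoff that defines a "high-surprise" punchline and check whether the pause difference keeps its sign. The sketch below assumes a simple threshold-based definition and uses invented data; the threshold values and the name `pause_gap` are hypothetical, and the authors' planned analysis may look quite different.

```python
from statistics import mean


def pause_gap(surprise, pauses, threshold):
    """Mean pause before high-surprise punchlines minus mean pause before
    low-surprise ones; None if either group is empty at this threshold."""
    high = [p for s, p in zip(surprise, pauses) if s >= threshold]
    low = [p for s, p in zip(surprise, pauses) if s < threshold]
    if not high or not low:
        return None
    return mean(high) - mean(low)


# Hypothetical toy data: one surprise score and one preceding pause
# (in seconds) per punchline. These are not values from the paper.
surprise = [0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9]
pauses = [0.2, 0.3, 0.4, 0.5, 0.8, 0.9, 1.1]

# Recompute the gap under several definitions of "high surprise".
gaps = {t: pause_gap(surprise, pauses, t) for t in (0.3, 0.45, 0.6)}
```

If the gap changed sign or vanished under plausible alternative thresholds, the strategic-coupling interpretation would rest on a particular parameter choice rather than on a stable pattern.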
Circularity Check
No circularity: empirical results from performance data analysis
The paper proposes the DPV framework and reports findings from analyzing 828 Chinese stand-up performances, comparing temporal features against semantic incongruity measures to predict audience appreciation. No equations, derivations, or self-referential definitions appear in the abstract or described claims. Results are framed as data-driven observations (e.g., peak violations matter more, pauses lengthen before punchlines) rather than quantities defined in terms of fitted parameters or reduced by construction to inputs. Self-citations are not load-bearing here, and the central claim rests on external performance transcripts and timing data without reducing to tautology.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Audience appreciation in stand-up performances can be measured reliably from observable responses and used to evaluate predictive processing models.
invented entities (1)
- Dual Prediction Violation (DPV) framework: no independent evidence