Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor

Chi Zhang; Junchen Lyu; Yixin Zhu; Yongqian Peng; Yuxi Ma

arxiv: 2605.00143 · v1 · submitted 2026-04-30 · 💻 cs.CL

Pith reviewed 2026-05-09 20:31 UTC · model grok-4.3

keywords: humor appreciation, temporal scaffolding, semantic incongruity, stand-up comedy, predictive processing, pause timing, Dual Prediction Violation, audience response

The pith

Temporal features outweigh semantic incongruity in predicting audience appreciation for humor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports that in stand-up comedy, the timing of delivery plays a larger role than the semantic content of the surprise itself in determining how much audiences appreciate a joke. By examining 828 professional Chinese performances, it finds that comedians lengthen pauses before high-surprise punchlines and that these temporal adjustments predict success more strongly than overall levels of semantic incongruity. Peak violations of expectation matter more than average ones, and the pause findings suggest that humor depends on when the surprise arrives as much as on what the surprise is. A sympathetic reader would care because the work links everyday comedy to the brain's ongoing prediction processes and explains why skilled performers treat timing as central rather than incidental.

Core claim

Under the Dual Prediction Violation framework, the analysis indicates that temporal features substantially outweigh semantic incongruity in predicting audience appreciation. Analysis of 828 professional Chinese stand-up performances reveals that peak semantic violations matter more than average incongruity levels, and pauses systematically lengthen before high-surprise punchlines in successful performances. This positions timing as a scaffold for semantic surprise, with content and delivery operating in strategic coordination rather than independently.
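The distinction between average and peak semantic violation can be made concrete with a small sketch. It assumes, as the Figure 1 caption suggests, that surprise is measured as a distance between consecutive sentences; the choice of cosine distance, the function names, and the 2-D toy vectors are illustrative assumptions, not the paper's pipeline.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def semantic_profile(embeddings):
    """Return (average, peak) distance between consecutive sentence vectors."""
    distances = [cosine_distance(a, b) for a, b in zip(embeddings, embeddings[1:])]
    return sum(distances) / len(distances), max(distances)

# Hypothetical toy "embeddings": three similar setup sentences, then a distant punchline.
toy = [(1.0, 0.0), (1.0, 0.1), (1.0, 0.0), (0.0, 1.0)]
average, peak = semantic_profile(toy)
print(f"average distance = {average:.3f}, peak distance = {peak:.3f}")
```

In this toy case one large jump dominates the peak while the average stays moderate, which is the kind of contrast the core claim relies on.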

What carries the argument

The Dual Prediction Violation (DPV) framework, which models the interplay between semantic content violations and temporal dynamics in humor processing.

Load-bearing premise

Quantitative measures of semantic incongruity and temporal features extracted from performance transcripts and timing data accurately reflect the cognitive processes driving audience appreciation.

What would settle it

A new dataset of performances in which temporal features such as pre-punchline pause lengths fail to predict audience appreciation better than semantic incongruity measures, or show no systematic lengthening before high-surprise moments.

Figures

Figures reproduced from arXiv: 2605.00143 by Chi Zhang, Junchen Lyu, Yixin Zhu, Yongqian Peng, Yuxi Ma.

**Figure 1.** Illustration of the proposed Dual Prediction Violation (DPV) mechanism. Setup statements establish temporal and semantic expectations through regular patterns (0.6s pauses, small semantic distances between consecutive sentences). The punchline simultaneously violates both dimensions: an extended pause (1.5s) disrupts established rhythm while semantically distant content deviates sharply from contextual pre…

**Figure 2.** Correlation between features and audience appreciation. Partial correlation coefficients (controlling for performance duration) between temporal features (green labels) and semantic features (yellow labels) with audience vote rates. Temporal dynamics substantially outperform semantic features in predicting appreciation, with average pause duration and pause variability showing the strongest effects. Stati…
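The Figure 2 caption refers to partial correlations that control for performance duration. A minimal sketch of that statistic follows, using the standard first-order partial correlation formula; the variable names and toy numbers are hypothetical, and the paper's actual estimation procedure is not described on this page.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data: vote rate is the pause feature plus a duration effect.
pause_feature = [3.0, 1.0, 1.0, 3.0]
duration = [1.0, 2.0, 3.0, 4.0]
vote_rate = [p + d for p, d in zip(pause_feature, duration)]

raw = pearson(pause_feature, vote_rate)
controlled = partial_correlation(pause_feature, vote_rate, duration)
print(f"raw r = {raw:.3f}, partial r = {controlled:.3f}")
```

Here duration adds variance to the outcome that is unrelated to the pause feature, so controlling for it raises the correlation; with real data the adjustment can move the estimate in either direction.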

**Figure 4.** Group comparisons of semantic features. Mean values for high-performing (top 20%, red) and low-performing (bottom 20%, blue) sets across semantic dimensions: average distance (left) and peak distance (right). High-performing comedians show greater semantic incongruity, with peak distance demonstrating stronger discrimination than average distance. Error bars represent standard deviation. Statistical signif…
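The group comparison in Figure 4 can be sketched as a split on audience vote rate followed by per-group means and standard deviations. The record layout, field names, and values below are made up for illustration; only the top 20% versus bottom 20% design comes from the caption.

```python
import statistics

def top_and_bottom(records, fraction=0.2):
    """Split performances into (top, bottom) groups by audience vote rate."""
    ranked = sorted(records, key=lambda r: r["vote_rate"])
    k = max(2, int(len(ranked) * fraction))  # at least 2 so a standard deviation exists
    return ranked[-k:], ranked[:k]

def mean_and_sd(group, feature):
    """Mean and sample standard deviation of one feature within a group."""
    values = [r[feature] for r in group]
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical toy data for 10 performances.
vote_rates = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
avg_dist = [0.20, 0.22, 0.21, 0.23, 0.22, 0.24, 0.23, 0.25, 0.24, 0.26]
peak_dist = [0.30, 0.40, 0.45, 0.50, 0.50, 0.55, 0.60, 0.65, 0.70, 0.80]
records = [
    {"vote_rate": v, "avg_distance": a, "peak_distance": p}
    for v, a, p in zip(vote_rates, avg_dist, peak_dist)
]

top, bottom = top_and_bottom(records)
top_peak_mean, top_peak_sd = mean_and_sd(top, "peak_distance")
bottom_peak_mean, bottom_peak_sd = mean_and_sd(bottom, "peak_distance")
top_avg_mean, _ = mean_and_sd(top, "avg_distance")
bottom_avg_mean, _ = mean_and_sd(bottom, "avg_distance")
print(f"peak gap = {top_peak_mean - bottom_peak_mean:.2f}, "
      f"average gap = {top_avg_mean - bottom_avg_mean:.2f}")
```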

**Figure 5.** Strategic coupling between timing and semantic surprise. Pauses systematically extend before high-surprise content, with successful comedians showing stronger modulation. (a) Across all performances, pauses before high-surprise pairs (top 20% semantic distance) are 35.6% longer than before low-surprise pairs (bottom 20% semantic distance). (b) High-performing comedians (top 20%, red) exhibit steeper timin…
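The percentage reported in panel (a) is a relative difference in mean pause length between the highest and lowest surprise sentence pairs. A rough sketch of that calculation follows, assuming each sentence pair is stored as a (semantic distance, preceding pause in seconds) tuple; the data are invented and do not reproduce the 35.6% figure.

```python
def pause_lengthening(pairs, fraction=0.2):
    """Percent by which mean pause before high-surprise pairs exceeds low-surprise pairs.

    pairs: list of (semantic_distance, preceding_pause_seconds) tuples.
    """
    ranked = sorted(pairs, key=lambda p: p[0])
    k = max(1, int(len(ranked) * fraction))
    low = [pause for _, pause in ranked[:k]]
    high = [pause for _, pause in ranked[-k:]]
    mean_low = sum(low) / len(low)
    mean_high = sum(high) / len(high)
    return 100.0 * (mean_high - mean_low) / mean_low

# Hypothetical toy data: the most surprising pair follows the longest pause.
toy_pairs = [(0.1, 0.5), (0.2, 0.6), (0.3, 0.6), (0.4, 0.7), (0.9, 0.75)]
lengthening = pause_lengthening(toy_pairs)
print(f"pauses before high-surprise pairs are {lengthening:.1f}% longer")
```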

Original abstract

Humor is a fundamental cognitive phenomenon in which humans derive pleasure from expectation violations and their resolution, exemplifying the brain's dynamic capacity for predictive processing. Classical humor theories emphasize semantic incongruity as the primary driver of amusement, yet overlook temporal dynamics despite comedians' intuition that "timing is everything." The extent to which temporal structure contributes to humor appreciation and how it interacts with semantic content remain poorly understood. Here, we propose the Dual Prediction Violation (DPV) framework to capture the interplay between content and timing. By analyzing 828 professional Chinese stand-up performances, we show that temporal features substantially outweigh semantic incongruity in predicting audience appreciation. Specifically, we find that peak semantic violations matter more than average incongruity levels, and pauses systematically lengthen before high-surprise punchlines, a strategic coupling that distinguishes successful from unsuccessful performances. These findings reframe humor as temporally scaffolded, where timing and semantic content operate in strategic coordination rather than independently. Our DPV framework bridges humor theory with predictive processing, demonstrating that temporal structure plays a central role in naturalistic humor appreciation with implications for understanding multi-scale prediction integration in linguistic processing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper introduces a DPV framework and reports from 828 Chinese stand-up shows that pre-punchline pauses and other timing features predict audience response better than semantic incongruity scores.

Timing really does seem to be everything according to this paper, at least in its analysis of stand-up comedy. The authors propose the DPV framework to link semantic surprise with temporal scaffolding and back it with data from hundreds of performances.

They do a few things right. Pulling together 828 professional shows gives them a decent sample size for naturalistic data, which offers more ecological validity than lab experiments on humor. Tying the work to predictive coding accounts makes sense as an update to older incongruity theories. The specific pattern they report (longer pauses before high-surprise punchlines in successful acts) is a concrete empirical claim that could be useful if replicated.

The weak part is the measurement pipeline. The claim that temporal features substantially outweigh semantic incongruity depends on how they turned transcripts into surprise scores and how they scored appreciation. If those scores are based on embeddings without validation against human ratings, or if comedian style correlates with both timing and success, the result could shift. The abstract doesn't lay out the stats or controls, so it's hard to tell how robust the outperformance is.

This paper is aimed at cognitive scientists and computational linguists who study humor or predictive processing in language. Someone looking for new angles on multi-scale prediction in real-world settings might find it worth reading. I would send it to peer review. The data and the framing are fresh enough that referees can sort out the methods questions.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes the Dual Prediction Violation (DPV) framework to model the interplay between semantic incongruity and temporal dynamics in humor appreciation. Analyzing 828 professional Chinese stand-up performances, it claims that temporal features (e.g., pre-punchline pause lengthening) substantially outweigh semantic incongruity in predicting audience appreciation, with peak semantic violations mattering more than average levels and a strategic coupling distinguishing successful from unsuccessful performances.

Significance. If the empirical results hold after addressing measurement validation, this would meaningfully advance humor research by reframing it as temporally scaffolded predictive processing, bridging classical incongruity theories with multi-scale linguistic prediction. The large-scale naturalistic dataset of 828 performances is a clear strength, as is the attempt to derive falsifiable claims about timing-semantics coordination with potential implications for computational models of humor and cognitive science.

major comments (3)

[Methods] Methods section: The central claim that temporal features substantially outweigh semantic incongruity depends on unvalidated proxies for both semantic surprise (e.g., how peak violations vs. average incongruity are computed from transcripts, likely via embeddings) and audience appreciation (e.g., laugh timing or ratings). No validation against human judgments of surprise or controls for comedian-specific delivery styles is reported, risking that the reported dominance is an artifact of the chosen metrics.
[Results] Results section: The regression-style comparisons showing temporal outperformance lack reported details on statistical controls (multiple comparisons correction, comedian fixed effects, audience variability), robustness to alternative operationalizations of DPV quantities, or effect sizes that would confirm the 'substantially outweigh' claim is not driven by proxy choice.
[Discussion] Discussion section: The interpretation of 'strategic coupling' (pauses lengthening before high-surprise punchlines) as distinguishing successful performances assumes the extracted temporal and semantic measures accurately reflect cognitive processes without confounds; this load-bearing assumption requires explicit testing or sensitivity analysis to support the reframing of humor theory.
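The comedian-style confound raised in these comments can be illustrated with within-comedian demeaning, which removes each comedian's own average before correlating (the same adjustment a comedian fixed effect makes to a linear slope). This is a sketch on invented data, not the authors' model; the comedian labels and values are hypothetical.

```python
from collections import defaultdict
import math

def demean_within(values, groups):
    """Subtract each group's mean from its members (fixed-effects style)."""
    totals = defaultdict(lambda: [0.0, 0])
    for v, g in zip(values, groups):
        totals[g][0] += v
        totals[g][1] += 1
    return [v - totals[g][0] / totals[g][1] for v, g in zip(values, groups)]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical toy data: comedian B pauses longer and is more popular overall,
# but within each comedian longer pauses go with fewer votes.
comedian = ["A", "A", "A", "B", "B", "B"]
pause = [0.5, 0.6, 0.7, 1.0, 1.1, 1.2]
votes = [0.3, 0.2, 0.1, 0.8, 0.7, 0.6]

pooled_r = pearson(pause, votes)
within_r = pearson(demean_within(pause, comedian), demean_within(votes, comedian))
print(f"pooled r = {pooled_r:.2f}, within-comedian r = {within_r:.2f}")
```

The pooled and within-comedian correlations have opposite signs in this toy case, which is why the referee asks for comedian-level controls before the timing effect is interpreted.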

minor comments (2)

[Abstract] The abstract and introduction could more clearly distinguish the DPV framework's novel predictions from prior work on timing in humor to strengthen the contribution statement.
[Results] The figure or table presenting the regression coefficients for temporal vs. semantic predictors would benefit from explicit confidence intervals and model comparison metrics (e.g., AIC or cross-validation scores) for clarity.
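One of the model comparison metrics suggested here, AIC, can be sketched for two single-predictor linear models. The example assumes ordinary least squares with Gaussian errors and counts slope, intercept, and error variance as parameters; the data and variable names are hypothetical.

```python
import math

def fit_rss(x, y):
    """Residual sum of squares of a least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

def gaussian_aic(rss, n, n_params):
    """AIC up to an additive constant for a Gaussian linear model."""
    return n * math.log(rss / n) + 2 * n_params

# Hypothetical toy data for five performances.
votes = [0.2, 0.4, 0.5, 0.7, 0.9]
pause = [0.5, 0.7, 0.8, 1.0, 1.3]          # temporal predictor
peak_distance = [0.6, 0.3, 0.7, 0.4, 0.8]  # semantic predictor

n = len(votes)
aic_temporal = gaussian_aic(fit_rss(pause, votes), n, n_params=3)
aic_semantic = gaussian_aic(fit_rss(peak_distance, votes), n, n_params=3)
print(f"AIC temporal = {aic_temporal:.1f}, AIC semantic = {aic_semantic:.1f}")
```

Lower AIC indicates the better-supported model; reporting both values side by side would let readers judge how large the temporal advantage is.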

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which have prompted us to clarify and strengthen several aspects of our manuscript. Below, we provide point-by-point responses to the major comments.

Referee: [Methods] Methods section: The central claim that temporal features substantially outweigh semantic incongruity depends on unvalidated proxies for both semantic surprise (e.g., how peak violations vs. average incongruity are computed from transcripts, likely via embeddings) and audience appreciation (e.g., laugh timing or ratings). No validation against human judgments of surprise or controls for comedian-specific delivery styles is reported, risking that the reported dominance is an artifact of the chosen metrics.

Authors: We appreciate the referee's concern regarding the validation of our proxies. In the revised manuscript, we will elaborate on the computation of semantic surprise using state-of-the-art embedding models, specifying the exact methods for identifying peak violations versus average incongruity. For audience appreciation, our laugh detection approach follows protocols established in previous computational humor studies. Although we did not conduct a new human validation experiment due to the scale of the dataset, we will add a dedicated subsection discussing the reliability of these proxies based on prior validations and include comedian fixed effects to control for delivery styles. This should address concerns about potential artifacts.
revision: partial

Referee: [Results] Results section: The regression-style comparisons showing temporal outperformance lack reported details on statistical controls (multiple comparisons correction, comedian fixed effects, audience variability), robustness to alternative operationalizations of DPV quantities, or effect sizes that would confirm the 'substantially outweigh' claim is not driven by proxy choice.

Authors: We agree that more rigorous statistical reporting is essential. The revised Results section will include comprehensive details on our regression models, incorporating comedian fixed effects, corrections for multiple comparisons, and considerations for audience variability. We will report effect sizes and perform additional robustness analyses using alternative measures for semantic surprise and temporal features. These enhancements will substantiate that the observed dominance of temporal features is robust and not an artifact of specific proxy choices.
revision: yes

Referee: [Discussion] Discussion section: The interpretation of 'strategic coupling' (pauses lengthening before high-surprise punchlines) as distinguishing successful performances assumes the extracted temporal and semantic measures accurately reflect cognitive processes without confounds; this load-bearing assumption requires explicit testing or sensitivity analysis to support the reframing of humor theory.

Authors: The referee raises a valid point about the assumptions underlying our interpretations. In the revised Discussion, we will explicitly address potential confounds and include sensitivity analyses by varying key parameters in our DPV measures. This will provide a more robust foundation for our claims about strategic coupling and the reframing of humor as temporally scaffolded predictive processing. We will also discuss the limitations of observational data in directly testing cognitive mechanisms.
revision: yes
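The responses above promise corrections for multiple comparisons without naming a method. As one possible choice, the sketch below applies the Holm-Bonferroni step-down procedure to a set of p-values; the procedure is an assumption made for illustration and the p-values are invented.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-down procedure: compare the smallest p-value to alpha / m, the next
    to alpha / (m - 1), and so on, stopping at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # every larger p-value is also retained
    return rejected

# Hypothetical p-values for four feature correlations.
p_values = [0.001, 0.04, 0.03, 0.2]
decisions = holm_bonferroni(p_values)
uncorrected = [p <= 0.05 for p in p_values]
print("uncorrected:", uncorrected)
print("Holm-corrected:", decisions)
```

In this toy case three tests pass the uncorrected 0.05 threshold but only one survives correction, which shows why the referee wants the correction reported alongside the feature correlations.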

Circularity Check

0 steps flagged

No circularity: empirical results from performance data analysis

The paper proposes the DPV framework and reports findings from analyzing 828 Chinese stand-up performances, comparing temporal features against semantic incongruity measures to predict audience appreciation. No equations, derivations, or self-referential definitions appear in the abstract or described claims. Results are framed as data-driven observations (e.g., peak violations matter more, pauses lengthen before punchlines) rather than quantities defined in terms of fitted parameters or reduced by construction to inputs. Self-citations are not load-bearing here, and the central claim rests on external performance transcripts and timing data without reducing to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper introduces the DPV framework as a new organizing lens; no free parameters are described in the abstract. The analysis rests on the domain assumption that audience appreciation in stand-up can be reliably quantified from observable responses and that transcript-derived semantic measures capture expectation violation.

axioms (1)

domain assumption: Audience appreciation in stand-up performances can be measured reliably from observable responses and used to evaluate predictive processing models.
Invoked when treating laughter or appreciation ratings as the outcome variable for comparing temporal and semantic predictors.

invented entities (1)

Dual Prediction Violation (DPV) framework (no independent evidence)
purpose: To capture the interplay between semantic content and temporal structure in humor appreciation.
Newly proposed in the paper to integrate timing with classical incongruity accounts.
