Computational Analysis of Speech Clarity Predicts Audience Engagement in TED Talks

arxiv: 2604.04583 · v1 · submitted 2026-04-06 · 💻 cs.HC

Roni Segal (1), Matan Lary (1), Ralf Schmaelzle (2), Yossi Ben-Zion (1)

(1) Department of Physics, Bar Ilan University, Ramat Gan, Israel; (2) Department of Communication, Michigan State University, East Lansing, MI, USA

Pith reviewed 2026-05-10 19:48 UTC · model grok-4.3

classification 💻 cs.HC

keywords: TED Talks, speech clarity, audience engagement, processing fluency, large language models, YouTube metrics, public speaking, computational analysis

The pith

Clarity of explanation in TED Talks emerges as the strongest predictor of YouTube likes and views.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors set out to test whether linguistic clarity drives audience engagement with public talks. They start from the idea that accessible presentation of complex ideas reduces cognitive effort and therefore elicits more positive responses from listeners. To examine this, they scored 1,239 TED Talk transcripts with large language models on clarity and organization, then related those scores to actual YouTube likes and views while controlling for talk length, topic, and scientific content. Clarity proved the best single predictor and added meaningful explanatory power beyond the controls. The same pattern held across different content domains and strengthened over the years studied.

Core claim

Computational scoring of 1,239 TED Talk transcripts across fifty independent large-language-model runs shows that clarity of explanation is the strongest predictor of audience engagement, yielding standardized coefficients of .339 for likes and .314 for views and contributing an incremental R-squared of approximately .095 beyond duration, topic, and scientific status; the full model accounts for 29 percent of variance in likes and 22.5 percent in views, with the clarity effect remaining stable across content categories and outperforming conventional readability formulas.
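The two statistics quoted in the core claim have standard textbook definitions, written out here for reference. The notation ($b_j$, $s_{x_j}$, $s_y$, and the subscripts on $R^2$) is introduced for illustration and is not taken from the paper; the sketch also assumes that the incremental step adds only the clarity score to the control-only model.

```latex
% Standardized coefficient: the raw slope b_j rescaled by the ratio of
% the predictor's standard deviation to the outcome's standard deviation.
\beta_j = b_j \cdot \frac{s_{x_j}}{s_y}

% Incremental variance explained: the gain in R^2 when clarity is added
% to a model that already contains the controls.
\Delta R^{2} = R^{2}_{\text{controls + clarity}} - R^{2}_{\text{controls}}
```

Read this way, a standardized coefficient of .339 would mean that a one-standard-deviation increase in clarity is associated with a .339-standard-deviation increase in the likes measure, holding the other predictors fixed.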

What carries the argument

LLM-derived clarity scores obtained from repeated independent evaluations of each transcript on explanation accessibility and structural organization, entered as predictors in regression models of engagement metrics.
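A minimal sketch of that pipeline, under stated assumptions: each talk's clarity score is taken as the mean over its runs (the rebuttal further down names the mean as the aggregation rule), and the incremental R-squared is the difference between a controls-only fit and a controls-plus-clarity fit. All data are synthetic, and the variable names (`run_scores`, `duration`, `is_science`, `log_likes`) are hypothetical stand-ins rather than the authors' variables.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (an intercept is added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals.var() / y.var()

def incremental_r_squared(controls, predictor, y):
    """Delta R^2 from adding `predictor` to a model that already has `controls`."""
    r2_controls = r_squared(controls, y)
    r2_full = r_squared(np.column_stack([controls, predictor]), y)
    return r2_controls, r2_full, r2_full - r2_controls

# Synthetic stand-in data: NOT the paper's data.
rng = np.random.default_rng(0)
n_talks, n_runs = 500, 50
true_clarity = rng.normal(size=n_talks)
# Each run is modeled as the latent clarity plus independent scoring noise.
run_scores = true_clarity[:, None] + rng.normal(scale=0.8, size=(n_talks, n_runs))
clarity = run_scores.mean(axis=1)            # aggregate: mean across runs, per talk
duration = rng.normal(size=n_talks)          # hypothetical control
is_science = rng.integers(0, 2, n_talks)     # hypothetical binary control
log_likes = 0.3 * true_clarity + 0.4 * duration + rng.normal(size=n_talks)

controls = np.column_stack([duration, is_science])
r2_c, r2_f, delta = incremental_r_squared(controls, clarity, log_likes)
print(f"R2 controls={r2_c:.3f}  R2 full={r2_f:.3f}  delta R2={delta:.3f}")
```

Averaging over many runs shrinks run-to-run scoring noise, which is presumably why repeated evaluation is used; it does not by itself show that the averaged score measures what a human listener would call clarity.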

If this is right

Clearer transcripts should produce measurably higher engagement across both scientific and non-scientific topics.
Transcript-based clarity assessment offers a scalable method for evaluating and improving public communication.
The observed rise in average clarity across years indicates a standardization effect within the TED format.
Clarity-focused training can be expected to increase reach more effectively than emphasis on topic novelty alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the clarity-engagement link is causal, targeted editing of existing talks for clearer phrasing could raise their reach without altering core content.
The same computational approach could be tested on other lecture or podcast archives to determine whether the pattern generalizes beyond TED.
Real-time AI tools that flag low-clarity sections during speech drafting might improve outcomes more than generic delivery coaching.

Load-bearing premise

The large-language-model clarity scores validly reflect the processing ease that actual human audiences experience when hearing the talks.

What would settle it

A controlled study in which the same speaker delivers two otherwise identical talks that differ only in measured clarity, with direct measurement of subsequent audience likes or retention showing no difference.

Figures

Figures reproduced from arXiv: 2604.04583 by Roni Segal, Matan Lary, Ralf Schmaelzle, and Yossi Ben-Zion.

**Figure 2.** Histograms of the log-transformed TED Talk engagement metrics.

**Figure 3.** Distribution of AI-derived clarity scores across all TED Talks in the dataset (N = 1,280).

**Figure 4.** Ridgeline density plots of clarity scores by year. Each curve represents the distribution of ...

Original abstract

What makes a public talk resonate with large audiences? While prior research has emphasized speaker delivery or topic novelty, we reasoned that a core driver of engagement is linguistic clarity. This aligns with theories of processing fluency and cognitive load, which posit that audiences reward speakers who present complex ideas accessibly. We leveraged artificial intelligence to analyze 1,239 TED Talk transcripts (2006-2013), supplemented by a later-phase longitudinal sample. Each transcript was evaluated across 50 independent large language model runs on two dimensions, clarity of explanation and structural organization, and linked to YouTube engagement metrics (likes and views). Clarity emerged as the strongest predictor of audience responses ($\beta = .339$ for likes; $\beta = .314$ for views), contributing substantial incremental variance ($\Delta R^{2} \approx .095$) beyond duration, topic, and scientific status. The full model explained 29% of variance in likes and 22.5% in views. This effect was domain-general, remaining invariant across content categories and between scientific and non-scientific talks. Notably, clarity outperformed traditional readability metrics, indicating that discourse coherence predicts engagement more powerfully than surface-level linguistic simplicity. Longitudinal analyses further revealed standardization within TED, characterized by increasing clarity and reduced variability over time. Theoretically, these results support processing fluency accounts: clearer communication reduces cognitive friction and elicits more positive evaluative responses. Practically, transcript-based clarity represents a scalable and trainable strategy for improving public discourse. By demonstrating that language models can reliably capture latent communicative qualities, this study paves the way for feedback systems in education, science communication, and public speaking.
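To make the contrast with "surface-level" readability concrete, here is a sketch of one widely used readability formula, Flesch Reading Ease. The abstract does not name the readability metrics the authors compared against, so choosing this formula is an assumption, and the syllable counter is a crude heuristic written for illustration only.

```python
import re

def count_syllables(word):
    """Crude heuristic: count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean shorter sentences and shorter words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

simple_text = "The cat sat. The dog ran."
dense_text = ("Computational evaluation of organizational communication "
              "necessitates interdisciplinary methodological sophistication.")
simple_score = flesch_reading_ease(simple_text)
dense_score = flesch_reading_ease(dense_text)
print(round(simple_score, 1), round(dense_score, 1))
```

The formula only sees sentence length and word length. It has no access to whether ideas are introduced in a sensible order or whether examples connect to the claims they support, which is the kind of discourse-level property the abstract contrasts it with.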

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LLM clarity scores add about 9.5% incremental variance to TED engagement predictions beyond duration, topic, and scientific status, and outperform readability metrics, but without human validation the scores may simply track overall talk quality.

The paper runs LLM prompts on 1,239 TED transcripts to score clarity and organization, then regresses those scores against YouTube likes and views while controlling for duration, topic, and scientific content. Clarity comes out strongest, with betas near 0.34 and 0.31, adding roughly 9.5% variance and beating traditional readability measures. The effect looks stable across talk categories, and the authors also report a longitudinal trend of increasing clarity and less variability in later TED talks.

That is the concrete empirical piece worth noting: a large public corpus linked to real behavioral metrics rather than self-report or lab data. The scale and the incremental R-squared are straightforward to evaluate and give the work some grounding. The longitudinal standardization angle is a small but useful side observation for anyone tracking how public science communication has evolved.

The main limitation is the absence of any human-rated validation for the LLM clarity scores. Without that check, it is hard to rule out that the scores are simply capturing broader talk quality, speaker appeal, or content interest that also drives views and likes. The controls are limited, so unmeasured factors like visuals or delivery style remain possible confounds. Prompt details and aggregation across the 50 runs are not spelled out in the abstract, which makes replication or robustness checks harder.

This is the sort of paper that would interest researchers in science communication, HCI, or education technology who want scalable ways to score talks. A reader building automated feedback tools or studying processing fluency could pull the regression results and the comparison to readability metrics. It has enough data and a clear question to deserve referee time, even if the validation gap would need addressing in revision. I would send it out for review rather than desk reject.

Referee Report

3 major / 2 minor

Summary. The paper analyzes 1,239 TED Talk transcripts (2006-2013) by scoring each for clarity of explanation and structural organization via 50 independent LLM runs, then regresses these scores against YouTube likes and views. It reports that clarity is the strongest predictor (β = .339 for likes; β = .314 for views), adding ΔR² ≈ .095 beyond controls for duration, topic, and scientific status, with full models explaining 29% and 22.5% of variance respectively. The effect is domain-general, outperforms traditional readability metrics, and longitudinal data show increasing standardization in TED clarity over time. The work claims support for processing-fluency accounts of audience engagement.

Significance. If the LLM-derived clarity scores are shown to validly index human processing fluency and the regressions adequately address confounds, the study would offer a scalable, transcript-based demonstration that linguistic clarity drives engagement in public talks, extending processing-fluency theory with a large, real-world corpus. The domain-general pattern and outperformance of surface readability metrics would be notable strengths; the longitudinal standardization finding could inform institutional practices in science communication.

major comments (3)

[Methods] Methods section: No human validation, inter-rater reliability, or correlation with established cognitive-load measures is reported for the LLM clarity and organization scores. Because these scores are the central predictor whose theoretical interpretation rests on capturing processing fluency, the absence of such validation undermines the claim that the observed β and ΔR² reflect reduced cognitive friction rather than overall talk quality.
[Results] Results section (regression tables): Exact LLM prompts, aggregation rule across the 50 runs, variance-inflation factors for multicollinearity between clarity/organization and controls, and any correction for multiple tests or post-hoc model comparisons are not described. These omissions directly affect the reliability of the reported 'strongest predictor' status and the incremental R² values.
[Methods] Methods/Results: The control set (duration, topic category, scientific status) leaves unmeasured variables such as speaker charisma, prosody, visual aids, and title appeal that plausibly correlate with both LLM clarity scores and YouTube engagement. Without additional robustness checks (e.g., speaker fixed effects or external quality ratings), the attribution of ΔR² ≈ .095 to clarity remains vulnerable to omitted-variable bias.
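The multicollinearity check requested in the second major comment can be sketched as follows. The variance inflation factor for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The data are synthetic, and the correlation between the hypothetical `clarity` and `organization` columns is built in by construction; nothing here reflects the paper's actual values.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X: 1 / (1 - R_j^2) for column j vs. the rest."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus every other column.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r2_j = 1.0 - residuals.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r2_j)
    return vifs

# Synthetic stand-in predictors: NOT the paper's data.
rng = np.random.default_rng(1)
n = 400
clarity = rng.normal(size=n)
organization = 0.7 * clarity + rng.normal(scale=0.7, size=n)  # correlated by design
duration = rng.normal(size=n)                                  # independent control
vifs = variance_inflation_factors(np.column_stack([clarity, organization, duration]))
print(dict(zip(["clarity", "organization", "duration"], vifs.round(2))))
```

A VIF close to 1 indicates that a predictor shares little variance with the others. Larger values mean its coefficient is estimated less precisely, which matters when a paper ranks predictors by coefficient size.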

minor comments (2)

[Abstract] Abstract: The 'later-phase longitudinal sample' is referenced but its size, time window, and how it was merged with the main 1,239-talk corpus are not specified; this detail should be added for clarity.
[Results] Results: Standard errors, exact p-values, and confidence intervals should accompany all β coefficients and ΔR² values to allow readers to assess precision.
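One way to attach an interval to a standardized coefficient is a percentile bootstrap over talks, sketched below. This is a swapped-in illustration rather than the analytic standard errors the referee most likely has in mind, and it is not the authors' procedure. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def standardized_betas(X, y):
    """OLS coefficients after z-scoring every predictor and the outcome."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # Centering removes the need for an intercept column.
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef

def bootstrap_ci(X, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for each standardized beta, resampling rows."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample talks with replacement
        draws[b] = standardized_betas(X[idx], y[idx])
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1 - alpha / 2, axis=0)
    return lower, upper

# Synthetic stand-in data: NOT the paper's data.
rng = np.random.default_rng(2)
n = 300
clarity = rng.normal(size=n)
duration = rng.normal(size=n)
log_likes = 0.5 * clarity + 0.2 * duration + rng.normal(size=n)
X = np.column_stack([clarity, duration])
point = standardized_betas(X, log_likes)
lower, upper = bootstrap_ci(X, log_likes, n_boot=500)
for name, p, lo, hi in zip(["clarity", "duration"], point, lower, upper):
    print(f"{name}: beta={p:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Resampling whole talks keeps each talk's predictors and outcome together, so the interval reflects sampling variability across talks rather than noise within a single talk's scores.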

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate. Our responses focus on strengthening the methodological transparency and acknowledging limitations without overstating the current evidence.

Referee: [Methods] Methods section: No human validation, inter-rater reliability, or correlation with established cognitive-load measures is reported for the LLM clarity and organization scores. Because these scores are the central predictor whose theoretical interpretation rests on capturing processing fluency, the absence of such validation undermines the claim that the observed β and ΔR² reflect reduced cognitive friction rather than overall talk quality.

Authors: We agree that the lack of direct human validation is a limitation for interpreting the scores as measures of processing fluency. The 50 independent LLM runs were designed to provide internal consistency, and we can report inter-run agreement statistics (e.g., intraclass correlation) in a revision. However, we did not collect human ratings or cognitive-load data in this study, as the focus was on scalable computational analysis of existing transcripts. In the revised manuscript, we will add a limitations section explicitly discussing this gap, describe the prompt design rationale in greater detail to support the targeted assessment of clarity and organization, and suggest human validation as an important direction for future work. This will allow readers to better evaluate the theoretical claims.
revision: partial
Referee: [Results] Results section (regression tables): Exact LLM prompts, aggregation rule across the 50 runs, variance-inflation factors for multicollinearity between clarity/organization and controls, and any correction for multiple tests or post-hoc model comparisons are not described. These omissions directly affect the reliability of the reported 'strongest predictor' status and the incremental R² values.

Authors: We appreciate this feedback on reproducibility. The exact prompts and aggregation procedure (mean score across runs) were omitted to keep the main text concise but will be fully documented in supplementary materials in the revision. We will also compute and report variance inflation factors for all predictors to confirm low multicollinearity. The primary regression models were pre-specified based on theoretical considerations, with no post-hoc comparisons; we will clarify this in the text and note that no multiple-testing correction was applied to the main analyses. These changes will be incorporated into the Methods and Results sections.
revision: yes
Referee: [Methods] Methods/Results: The control set (duration, topic category, scientific status) leaves unmeasured variables such as speaker charisma, prosody, visual aids, and title appeal that plausibly correlate with both LLM clarity scores and YouTube engagement. Without additional robustness checks (e.g., speaker fixed effects or external quality ratings), the attribution of ΔR² ≈ .095 to clarity remains vulnerable to omitted-variable bias.

Authors: We acknowledge the potential for omitted-variable bias from unmeasured factors such as charisma, prosody, and visual elements, which transcripts alone cannot capture. Our controls for duration, topic category, and scientific status address some confounds, and the domain-general pattern (consistent effects across content categories) provides partial protection against topic-specific biases. In the revision, we will expand the limitations section to discuss these issues explicitly and note that speaker fixed effects are not feasible here due to the low number of repeated speakers in the 2006-2013 sample. We will also emphasize that the outperformance of clarity over traditional readability metrics helps differentiate it from general talk quality. These additions will be made without altering the core findings.
revision: partial
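The inter-run agreement statistic mentioned in the first response could be computed along the following lines. This sketch uses the one-way random-effects intraclass correlation, treating the runs as interchangeable; which ICC form the authors would report is not stated, so that choice is an assumption. The score matrix is synthetic.

```python
import numpy as np

def icc_oneway(scores):
    """One-way random-effects ICC for an (n_talks, k_runs) score matrix.

    Returns (ICC(1,1), ICC(1,k)): the reliability of a single run and the
    reliability of the mean over all k runs.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand_mean = scores.mean()
    talk_means = scores.mean(axis=1)
    # Mean squares from a one-way ANOVA with talks as the grouping factor.
    ms_between = k * np.sum((talk_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - talk_means[:, None]) ** 2) / (n * (k - 1))
    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average

# Synthetic stand-in scores: NOT the paper's data.
rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 1))              # one latent clarity value per talk
runs = latent + rng.normal(size=(200, 50))      # 50 noisy runs per talk
icc_single, icc_average = icc_oneway(runs)
print(f"ICC(1,1)={icc_single:.3f}  ICC(1,k)={icc_average:.3f}")
```

High agreement across runs would show that the model scores a given transcript consistently. It would not address the referee's core concern, which is whether those consistent scores match human judgments of clarity.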

standing simulated objections not resolved

We cannot add new human validation ratings or cognitive-load measures, as these data were not collected in the original study.

Circularity Check

0 steps flagged

No circularity: independent LLM scoring regressed on external engagement metrics

The paper's central derivation consists of generating clarity and organization scores via 50 independent LLM runs on TED talk transcripts, then fitting a linear regression of those scores (plus controls for duration, topic, and scientific status) against YouTube likes and views. The reported β values (.339/.314) and ΔR² (.095) are statistical outputs of this fit; they are not inputs that define the scores by construction, nor do they reduce to any self-citation, ansatz, or renaming of a known result. No load-bearing self-citations, uniqueness theorems, or fitted-input-as-prediction patterns appear in the derivation chain. The analysis remains a standard, externally benchmarked regression whose results are falsifiable against the held-out engagement data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that LLM clarity scores are valid proxies for human processing fluency and that the chosen control variables suffice to isolate the clarity effect.

axioms (2)

domain assumption: LLM evaluations across 50 independent runs reliably and validly measure latent clarity and structural organization in spoken discourse.
Invoked when treating the aggregated LLM scores as the key independent variable linked to engagement via processing-fluency theory.
ad hoc to paper: The regression controls (duration, topic, scientific status) capture all major confounds, so that incremental R² can be attributed to clarity.
Required for the claim that clarity contributes substantial unique variance.

pith-pipeline@v0.9.0 · 5646 in / 1310 out tokens · 151413 ms · 2026-05-10T19:48:40.291850+00:00 · methodology
