Teachers' Vocal Expressions and Student Engagement in Asynchronous Video Learning

Hung-Yue Suen; Yu-Sheng Su

arxiv: 2605.17463 · v1 · pith:GS5KFGUWnew · submitted 2026-05-17 · 💻 cs.HC · cs.CY

Pith reviewed 2026-05-19 22:40 UTC · model grok-4.3

keywords: vocal expressions, student engagement, asynchronous video learning, MOOCs, valence and arousal, nonverbal communication, affective engagement, online education

The pith

Nonverbal vocal expressions with positive valence and high arousal enhance student engagement in asynchronous video learning, while verbal emotive expressions do not.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how teachers' vocal expressions shape students' self-reported affective engagement during asynchronous video courses such as MOOCs. It separates verbal content from nonverbal tone using computational acoustic and sentiment analysis on 210 lectures and feedback from 738 students. The results show that positive high-arousal vocal emotions like happiness and surprise increase engagement, negative high-arousal ones like anger decrease it, and the actual words spoken have little effect. A reader would care because this identifies a concrete, modifiable feature of instructional videos that could make online learning feel more involving without changing the subject matter.

Core claim

Using computational acoustic and sentiment analysis on 210 video lectures from four MOOC platforms, the study extracted valence and arousal scores from teachers' verbal vocal expressions and classified nonverbal vocal emotions into six categories: anger, fear, happiness, neutral, sadness, and surprise. Analysis of post-class feedback from 738 students showed that nonverbal expressions with positive valence and high arousal, such as happiness and surprise, enhanced affective engagement; negative high-arousal expressions, such as anger, reduced it; and verbal emotive expressions produced no significant change.

What carries the argument

Classification of vocal expressions by valence-arousal dimensions and discrete emotion categories (anger, fear, happiness, neutral, sadness, surprise) extracted via acoustic and sentiment tools, applied to isolate nonverbal tone effects from verbal content on self-reported affective engagement.
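
The mapping from discrete emotion labels to valence-arousal cells can be sketched as follows. This is a minimal illustration, not the authors' pipeline: only the placements of happiness, surprise, and anger are stated in the summary above, while the cells for fear, sadness, and neutral are assumptions following common circumplex conventions. The function name `quadrant_shares` and the sample labels are hypothetical.

```python
from collections import Counter

# (valence, arousal) cell for each of the six categories.
# happiness, surprise, anger: as stated in the summary.
# fear, sadness, neutral: ASSUMED placements, not taken from the paper.
EMOTION_QUADRANT = {
    "happiness": ("positive", "high"),
    "surprise": ("positive", "high"),
    "anger": ("negative", "high"),
    "fear": ("negative", "high"),    # assumption
    "sadness": ("negative", "low"),  # assumption
    "neutral": ("neutral", "low"),   # assumption
}


def quadrant_shares(segment_labels):
    """Return the fraction of labelled segments in each (valence, arousal) cell."""
    total = len(segment_labels)
    if total == 0:
        return {}
    counts = Counter(EMOTION_QUADRANT[label] for label in segment_labels)
    return {cell: n / total for cell, n in counts.items()}


# Hypothetical per-segment labels for a single lecture.
labels = ["happiness", "neutral", "surprise", "anger", "neutral", "happiness"]
shares = quadrant_shares(labels)
print(shares[("positive", "high")])  # 0.5
```

A per-lecture share like this is one plausible way to turn segment-level labels into a lecture-level predictor; the paper may aggregate differently.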

If this is right

Teachers preparing asynchronous videos should deliver content using happy or surprised vocal tones to raise engagement.
Avoiding angry vocal tones can prevent drops in student involvement.
Instructional video creators can improve outcomes by focusing on vocal delivery over scripting emotional language.
The pattern holds across multiple MOOC platforms based on the collected data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Voice-focused training for educators could become a practical addition to online teaching preparation.
Automated video analysis tools might one day flag or suggest vocal styles to improve engagement.
The verbal-nonverbal split could be tested in live online sessions or other digital formats like podcasts.

Load-bearing premise

The computational tools correctly identify the vocal emotions teachers intended to convey and student self-reports accurately capture engagement caused by those expressions rather than other features of the videos.

What would settle it

A controlled replication that holds lecture content fixed, varies only the vocal expressions, and measures objective outcomes such as video completion rates or quiz scores instead of self-reports.

Figures

Figures reproduced from arXiv: 2605.17463 by Hung-Yue Suen, Yu-Sheng Su.

Original abstract

Asynchronous video learning, including massive open online courses (MOOCs), offers flexibility but often lacks students' affective engagement. This study examines how teachers' verbal and nonverbal vocal emotive expressions influence students' self-reported affective engagement. Using computational acoustic and sentiment analysis, valence and arousal scores were extracted from teachers' verbal vocal expressions, and nonverbal vocal emotions were classified into six categories: anger, fear, happiness, neutral, sadness, and surprise. Data from 210 video lectures across four MOOC platforms and feedback from 738 students collected after class were analyzed. Results revealed that teachers' verbal emotive expressions, even with positive valence and high arousal, did not significantly impact engagement. Conversely, vocal expressions with positive valence and high arousal, such as happiness and surprise, enhanced engagement, while negative high-arousal emotions, such as anger, reduced it. These findings offer practical insights for instructional video creators, teachers, and influencers to foster emotional engagement in asynchronous video learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Nonverbal vocal emotions like happiness and surprise boost self-reported engagement in these MOOC videos, while anger hurts it and verbal sentiment shows no effect. The observational data, however, leave the causal link open to other factors.

The core finding is that in 210 videos from four MOOC platforms, teachers' nonverbal vocal expressions with positive valence and high arousal improved students' post-class affective engagement reports, anger reduced them, and the verbal content itself added nothing detectable. The authors pulled valence-arousal scores and six emotion categories from acoustic and sentiment tools, then matched them to feedback from 738 students. That gives a concrete, if limited, data point on vocal delivery in async settings that some course designers might actually use.

Referee Report

2 major / 1 minor

Summary. The manuscript examines how teachers' verbal and nonverbal vocal emotive expressions influence students' self-reported affective engagement in asynchronous video learning. It analyzes 210 video lectures from four MOOC platforms using computational acoustic and sentiment analysis to extract valence/arousal scores and classify nonverbal emotions into six categories, drawing on feedback from 738 students. The central finding is that nonverbal expressions with positive valence and high arousal (e.g., happiness, surprise) enhance engagement while negative high-arousal emotions (e.g., anger) reduce it, whereas verbal emotive expressions show no significant impact.

Significance. If the associations hold after addressing confounds and providing statistical transparency, the results could supply practical guidance for instructional video design in MOOCs and similar platforms, highlighting the role of specific nonverbal vocal cues over verbal sentiment.

major comments (2)

[Results] Results section: The abstract and reported findings state directional effects of specific vocal emotions on engagement but supply no statistical details (p-values, effect sizes, confidence intervals, error bars), sample breakdowns by video or student, or validation metrics for the emotion classifiers. This prevents confirmation that the claims are supported by the data.
[Methods] Methods: The observational design collects data across 210 videos without described controls or matching for confounding variables such as lecture topic, video duration, visual elements, instructor identity, or student characteristics. The central claim attributing engagement differences to extracted vocal emotions therefore remains vulnerable to alternative explanations from unmeasured correlated factors.
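
One way to probe the confound concern is a blockwise (hierarchical) regression: enter a covariate first, then add the vocal feature, and compare explained variance. The sketch below is illustrative only and is not the authors' model. All data are synthetic, and the names `duration`, `happy_share`, and `engagement` are hypothetical stand-ins for a covariate, a vocal predictor, and the outcome.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """Fit OLS with an intercept on the predictor columns and return R^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y[i] for i, r in enumerate(rows)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data only: none of these numbers come from the paper.
rng = random.Random(0)
n = 210
duration = [rng.uniform(5, 30) for _ in range(n)]  # hypothetical covariate (minutes)
happy_share = [rng.random() for _ in range(n)]     # hypothetical vocal feature
engagement = [3 - 0.02 * d + 1.0 * h + rng.gauss(0, 0.3)
              for d, h in zip(duration, happy_share)]

r2_step1 = r_squared([duration], engagement)               # covariate only
r2_step2 = r_squared([duration, happy_share], engagement)  # covariate + vocal feature
delta_r2 = r2_step2 - r2_step1
print(round(r2_step1, 3), round(r2_step2, 3), round(delta_r2, 3))
```

The increment `delta_r2` is the variance attributable to the vocal feature after the covariate has been accounted for. Pure-Python normal equations keep the sketch dependency-free; a real analysis would more likely use a statistics package and report significance tests for the increment.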

minor comments (1)

[Abstract] Abstract: The description of the computational tools could specify the exact acoustic and sentiment analysis packages or models employed to allow reproducibility assessment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate planned revisions to improve statistical transparency and discussion of limitations.

Referee: [Results] Results section: The abstract and reported findings state directional effects of specific vocal emotions on engagement but supply no statistical details (p-values, effect sizes, confidence intervals, error bars), sample breakdowns by video or student, or validation metrics for the emotion classifiers. This prevents confirmation that the claims are supported by the data.

Authors: We agree that the current manuscript lacks sufficient statistical details to fully support the reported directional effects. In the revised version, we will expand the Results section to report p-values, effect sizes, confidence intervals, and error bars for the key associations between vocal emotions and engagement. We will also include sample breakdowns by video and student, as well as validation metrics for the emotion classifiers such as accuracy, precision, recall, and F1 scores.
revision: yes

Referee: [Methods] Methods: The observational design collects data across 210 videos without described controls or matching for confounding variables such as lecture topic, video duration, visual elements, instructor identity, or student characteristics. The central claim attributing engagement differences to extracted vocal emotions therefore remains vulnerable to alternative explanations from unmeasured correlated factors.

Authors: We acknowledge the limitations inherent in the observational design and the potential for unmeasured confounds. In the revision, we will add an expanded limitations subsection that explicitly discusses these factors, including lecture topic, video duration, visual elements, instructor identity, and student characteristics. We will also clarify any steps already taken, such as sampling across four MOOC platforms, and explore adding available covariates (e.g., video length) to the models. A fully matched or experimental design is beyond the scope of the current study.
revision: partial
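
The classifier validation metrics named in the first response (accuracy, precision, recall, F1) could be computed per emotion category roughly as below. This is a minimal sketch under stated assumptions: the gold and predicted labels are invented for illustration, and the helper `per_class_metrics` is hypothetical rather than part of any tool the paper uses.

```python
def per_class_metrics(gold, predicted, classes):
    """Return overall accuracy and a {class: (precision, recall, f1)} mapping."""
    pairs = list(zip(gold, predicted))
    metrics = {}
    for c in classes:
        tp = sum(1 for g, p in pairs if g == c and p == c)
        fp = sum(1 for g, p in pairs if g != c and p == c)
        fn = sum(1 for g, p in pairs if g == c and p != c)
        # Report 0.0 when a ratio is undefined (no predictions or no gold items).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    return accuracy, metrics


CLASSES = ["anger", "fear", "happiness", "neutral", "sadness", "surprise"]

# Invented labels for eight audio segments; not data from the paper.
gold = ["happiness", "happiness", "anger", "neutral",
        "surprise", "sadness", "fear", "neutral"]
pred = ["happiness", "surprise", "anger", "neutral",
        "surprise", "neutral", "fear", "neutral"]

accuracy, metrics = per_class_metrics(gold, pred, CLASSES)
print(accuracy)                   # 0.75
print(metrics["happiness"][:2])   # (1.0, 0.5)
```

Reporting the metrics per class matters here because the headline findings rest on specific categories (happiness, surprise, anger); a good overall accuracy could hide weak recall on exactly those labels.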

Circularity Check

0 steps flagged

No circularity: purely observational empirical analysis with external tools

The paper reports an observational study that applies off-the-shelf computational acoustic and sentiment analysis tools to extract valence/arousal scores and classify nonverbal emotions from 210 existing MOOC videos, then correlates those features with post-class self-reported engagement from 738 students. No equations, fitted parameters, self-citations, or ansatzes are invoked to derive the central claims; the reported associations are direct statistical outputs from the collected data rather than quantities defined in terms of themselves. The derivation chain is therefore self-contained and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of automated emotion detection from audio and the assumption that post-class self-reports isolate the causal effect of vocal expressions.

axioms (2)

domain assumption: Self-reported affective engagement after viewing accurately reflects the influence of specific vocal emotional expressions rather than other aspects of the video content or student background. The study uses post-class student feedback as the primary outcome measure without describing controls for alternative explanations.
domain assumption: The computational tools for valence-arousal extraction and six-category nonverbal emotion classification produce reliable labels for teachers' vocal expressions. Results are reported directly from these extracted scores and classifications.

pith-pipeline@v0.9.0 · 5694 in / 1451 out tokens · 38651 ms · 2026-05-19T22:40:23.311371+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: Using computational acoustic and sentiment analysis, valence and arousal scores were extracted... nonverbal vocal emotions were classified into six categories... hierarchical linear regression

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Results revealed that teachers' verbal emotive expressions... did not significantly impact engagement. Conversely, vocal expressions with positive valence and high arousal... enhanced engagement

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

[1]

https://doi.org/10.1007/s42438-022-00288-2 Chiu, T. K. F. (2022). Applying the self -determination theory (SDT) to explain student engagement in online learning during the COVID -19 pandemic. Journal of Research on Technology in Education, 54, 14–30. https://doi.org/10.1080/15391523.2021.1891998 Daher, W., Sabbah, K., & Abuzant, M. (2021). Affective engag...

work page doi:10.1007/s42438-022-00288-2 2022
[2]

Weiss, H

https://doi.org/10.3389/fpsyg.2022.810451. Weiss, H. M., & Cropanzano, R. (1996). Affective events theory: A theoretical discussion of the structure, causes and consequences of affective experiences at work. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior: An annual series of analytical essays and critical reviews, Vol. 18 (pp. ...

work page doi:10.3389/fpsyg.2022.810451 2022
