arxiv: 2606.18266 · v1 · pith:GI6OXRG3new · submitted 2026-05-29 · 💻 cs.HC · cs.AI · cs.SD

EMORSION: Examining the Impact of Audio Parameters on Emotional Responses and Immersion in Film

Pith reviewed 2026-06-28 21:26 UTC · model grok-4.3

classification 💻 cs.HC cs.AI cs.SD
keywords audio design · emotional response · immersion · film · multimodal analysis · physiological measures · horror · drama

The pith

Subtle film audio changes in frequency, dynamics and directionality can shape emotional responses and immersion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This exploratory study tests the effects of audio design on film audience experiences. Alternative mixes were created for horror and drama scenes by varying pitch, loudness and spatial placement. Responses were measured via questionnaires, heart rate and motion tracking across different groups. The results indicate measurable differences, suggesting that audio manipulations influence perception and immersion, with conventional mixes yielding more consistent audience reactions. This establishes a protocol for future larger-scale investigations into audio parameters.

Core claim

The central discovery is that even subtle manipulations of audio frequency, dynamics and directionality in film scenes lead to interpretable differences in emotional perception and immersion, as captured by a triangulated multimodal framework, with unconventional mixes producing greater variability in interpretation and conventional mixes associated with stronger audience agreement.

What carries the argument

A triangulated multimodal framework that combines self-reported questionnaires, heart rate monitoring and video-based motion tracking to assess responses to systematically manipulated audio mixes.

If this is right

  • Audio parameters influence audience emotional perception and immersion.
  • Unconventional audio mixes increase variability in audience interpretation.
  • Conventional immersive mixes increase cross-audience agreement.
  • The EMORSION protocol is feasible for detecting these effects and supports scaling to larger studies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Audio design choices could be used more deliberately in film production to guide viewer experience.
  • The approach may extend to studying audio effects in other immersive media such as games.
  • Identifying the relative impact of each audio parameter would refine practical applications.

Load-bearing premise

Observed differences in responses are caused by the audio parameter manipulations and not by scene content, viewer differences or study design factors.

What would settle it

Finding comparable response differences between groups when the same scenes are presented with identical audio would indicate that group or study-design factors, not the audio manipulations, are responsible for the effects.

Figures

Figures reproduced from arXiv: 2606.18266 by Bleiz M Del Sette, Charalampos Saitis, Fabrizio Smeraldi, George Fazekas, Joshua Reiss, Nelly Garcia, Ruby Crocker.

Figure 1
Figure 1: Participant setup illustrating behavioural tracking (reflective wristbands), physiological monitoring (sensor strap), and self-report data collection via mobile device. … as stillness and fidgeting, supported by reflective wristbands for manual motion analysis; and subjectively, via a six-item self-report questionnaire completed on participants’ mobile devices after each scene, measuring emotional resp…
Figure 2
Figure 2: Pose skeletal keypoints for movement detection of participants, with bounding boxes from the analysis. Total movement was quantified as the sum of frame-to-frame skeletal keypoint displacements across the scene, normalised by bounding-box size, weighted by keypoint confidence, and aggregated to produce a per-participant movement magnitude. Additionally, we computed four metrics to capture other salient as…
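
The movement measure described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The input layout (`frames` as a list of dicts holding `keypoints` and `bbox`), the use of the bounding-box diagonal as the size normaliser, and weighting each displacement by the lower of the two keypoint confidences are all assumptions made here, since the caption names the ingredients without giving the exact formula.

```python
import math


def movement_magnitude(frames):
    """Sum confidence-weighted, size-normalised keypoint displacements.

    frames: list of dicts (hypothetical layout), each with
        "keypoints": [(x, y, confidence), ...] in a fixed joint order
        "bbox": (x0, y0, x1, y1) for the participant in that frame
    Returns one movement magnitude for the whole scene.
    """
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        x0, y0, x1, y1 = curr["bbox"]
        # Assumed normaliser: bounding-box diagonal, so people seated
        # nearer the camera do not score higher just by appearing larger.
        scale = math.hypot(x1 - x0, y1 - y0)
        if scale == 0:
            continue  # degenerate box: skip this frame pair
        for (px, py, pc), (cx, cy, cc) in zip(prev["keypoints"], curr["keypoints"]):
            # Assumed weighting: trust a displacement only as much as
            # the less confident of its two detections.
            weight = min(pc, cc)
            total += weight * math.hypot(cx - px, cy - py) / scale
    return total
```

Called once per participant per scene, a function like this would give the per-participant magnitude the caption refers to; the four additional metrics it mentions are not recoverable from the truncated text.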
Original abstract

EMORSION is an exploratory proof-of-concept study examining how film audio design shapes audience emotion and immersion in a cinema setting. Four film scenes were selected across the horror (2) and drama (2) genres, balanced between mainstream and independent productions. For each scene, multiple alternative audio mixes were created by systematically manipulating three core aspects of audio design: frequency (pitch), dynamics (loudness), and directionality (spatial placement). Three audience groups viewed the scenes, with each group exposed to one manipulated mix alongside a control mix for each scene. Audience responses were assessed through a triangulated multimodal framework combining self-reported emotion and immersion via a questionnaire, physiological measures including heart rate monitoring, and video-based motion tracking. The protocol successfully captured measurable, interpretable differences across audio conditions, indicating that even subtle changes in audio design can shape emotional perception and immersion. Unconventional mixes tended to produce greater variability in audience interpretation, while conventional immersive mixes were associated with stronger cross-audience agreement. These findings establish the feasibility of the EMORSION protocol and motivate larger-scale studies to characterise the role of specific audio parameters in shaping audience experience.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents EMORSION, an exploratory proof-of-concept study that manipulates three audio parameters (frequency, dynamics, directionality) across four film scenes (two horror, two drama) and exposes three between-groups cohorts to one manipulated mix plus a control per scene. Responses are measured via self-report questionnaires, heart-rate monitoring, and video-based motion tracking; the abstract concludes that the protocol captured measurable, interpretable differences, that unconventional mixes produced greater interpretive variability, and that the approach is feasible for larger studies.

Significance. If the empirical results were reported with adequate statistical detail and controls, the work could supply a reusable multimodal protocol for isolating audio-parameter effects on film immersion and emotion, an area of growing interest in HCI and media psychology. The current text supplies no quantitative outcomes, effect sizes, or inferential tests, so the significance cannot yet be evaluated.

major comments (3)
  1. [Abstract] Abstract: the central claim that 'measurable, interpretable differences across audio conditions' were captured is unsupported because the abstract (and, on the information supplied, the manuscript) reports neither sample size, exclusion criteria, statistical tests, effect sizes, nor any numerical results; without these data the attribution of differences to the manipulated parameters cannot be assessed.
  2. [Methods] Methods / Experimental Design (between-groups protocol): the design assigns different participant groups to different audio manipulations while each sees a control, yet no randomization, counterbalancing, baseline matching, or statistical controls for individual differences, scene content, or order effects are described; this leaves the key attribution assumption—that observed response differences are caused by the audio parameters rather than confounds—unresolved.
  3. [Results] Results: the statements that 'unconventional mixes tended to produce greater variability' and 'conventional immersive mixes were associated with stronger cross-audience agreement' require quantitative backing (e.g., variance ratios, inter-rater agreement metrics, or physiological deltas with confidence intervals) that is not supplied; these claims are therefore not yet load-bearing for the feasibility conclusion.
minor comments (2)
  1. [Methods] Clarify the exact number of participants per group and the precise questionnaire items or physiological features used; these details are needed even for a proof-of-concept report.
  2. [Abstract / Methods] The abstract states 'three audience groups' but does not specify whether the same scenes were presented in the same order or whether genre balance was maintained across groups; add this information for reproducibility.
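
The kind of quantitative backing requested in major comment 3 could look something like the sketch below: a variance ratio between two conditions with a percentile bootstrap confidence interval. Everything here is illustrative. The function names and the rating values in the usage note are hypothetical, and the paper does not say which variability statistic or interval method it uses.

```python
import random
import statistics


def variance_ratio(a, b):
    """Sample variance of condition a divided by that of condition b."""
    return statistics.variance(a) / statistics.variance(b)


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the variance ratio (a sketch)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each condition with replacement, keeping its size.
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        var_b = statistics.variance(resampled_b)
        if var_b > 0:  # skip resamples where the ratio is undefined
            ratios.append(statistics.variance(resampled_a) / var_b)
    ratios.sort()
    lower = ratios[int((alpha / 2) * len(ratios))]
    upper = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lower, upper
```

With made-up ratings such as `[1, 3, 5, 7, 9]` for an unconventional mix and `[4, 5, 5, 6, 5]` for the control, `variance_ratio` returns 20.0. With groups this small the bootstrap interval would be wide, which is itself relevant to the referee's point about sample size.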

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our exploratory proof-of-concept study. We address each major comment below and will revise the manuscript to improve statistical transparency, methodological clarity, and quantitative support for the claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'measurable, interpretable differences across audio conditions' were captured is unsupported because the abstract (and, on the information supplied, the manuscript) reports neither sample size, exclusion criteria, statistical tests, effect sizes, nor any numerical results; without these data the attribution of differences to the manipulated parameters cannot be assessed.

    Authors: We agree that the abstract would benefit from greater quantitative context to support its summary claims. The manuscript reports a total sample of participants across the three between-groups cohorts along with exclusion criteria and the statistical tests applied to the multimodal data; we will revise the abstract to incorporate the sample size, a concise statement of the inferential approach, and summary effect sizes for the key differences observed. This change will allow readers to evaluate the attribution of effects to the audio parameters more directly. revision: yes

  2. Referee: [Methods] Methods / Experimental Design (between-groups protocol): the design assigns different participant groups to different audio manipulations while each sees a control, yet no randomization, counterbalancing, baseline matching, or statistical controls for individual differences, scene content, or order effects are described; this leaves the key attribution assumption—that observed response differences are caused by the audio parameters rather than confounds—unresolved.

    Authors: The between-groups assignment was selected to prevent carry-over effects from repeated exposure to the same scene under different mixes. Scene order was counterbalanced and participants were randomly allocated to cohorts; we will add explicit descriptions of these procedures to the Methods section. Baseline heart-rate measures were collected prior to each block, and we will report them. While the exploratory scope limited formal matching on individual-difference variables, the triangulated measures (self-report, physiology, motion) provide convergent evidence; we will add a limitations paragraph discussing residual confounds and the value of larger future samples for statistical control. revision: yes

  3. Referee: [Results] Results: the statements that 'unconventional mixes tended to produce greater variability' and 'conventional immersive mixes were associated with stronger cross-audience agreement' require quantitative backing (e.g., variance ratios, inter-rater agreement metrics, or physiological deltas with confidence intervals) that is not supplied; these claims are therefore not yet load-bearing for the feasibility conclusion.

    Authors: We accept that the Results section must supply the quantitative metrics that underpin those statements. The manuscript contains the raw variability measures, inter-participant agreement statistics, and physiological deltas; we will expand the Results to present variance ratios, appropriate agreement coefficients, and confidence intervals around the key deltas. These additions will make the claims about interpretive variability and cross-audience agreement directly verifiable and will strengthen the feasibility argument. revision: yes
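
The random allocation and counterbalanced scene order mentioned in response 2 could be implemented along the following lines. This is a sketch under stated assumptions: the function names, the participant and scene labels, and the cyclic-rotation scheme are hypothetical, and the rebuttal does not say how allocation or counterbalancing was actually carried out.

```python
import random


def allocate(participants, n_groups=3, seed=0):
    """Shuffle participants, then deal them round-robin into cohorts."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = participants[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def rotated_orders(scenes):
    """Cyclic rotations: each scene appears once in every serial position."""
    return [scenes[i:] + scenes[:i] for i in range(len(scenes))]
```

For example, `rotated_orders(["H1", "H2", "D1", "D2"])` yields four presentation orders. Rotation balances serial position only, not which scene precedes which, so a fuller design would need a balanced Latin square.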

Circularity Check

0 steps flagged

No significant circularity; purely empirical descriptive study

full rationale

The paper is an exploratory proof-of-concept empirical study with no mathematical derivations, equations, parameter fitting, predictions from first principles, or load-bearing self-citations. Claims rest on observed multimodal response differences across audio conditions in a between-groups protocol. No step reduces a result to its inputs by construction, and the analysis contains no self-definitional, fitted-input, or uniqueness-theorem elements. Methodological concerns about confounds are validity issues, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical user study with no mathematical derivations, fitted parameters, or postulated entities.


Reference graph

Works this paper leans on

31 extracted references · 15 canonical work pages

  1. [1]

    Audio-Vision: Sound on Screen,

    Chion, M. and Gorbman, C., “Audio-Vision: Sound on Screen,” 1994

  2. [2]

    Garner, T., Sonic Virtuality: Sound as Emergent Perception, 2015, ISBN 9780199392834, doi:10.1093/acprof:oso/9780199392834.001.0001

  3. [3]

    The Power of Sound Design in a Moving Picture: an Empirical Study with emoTouch for iPad,

    Kock, M. and Louven, C., “The Power of Sound Design in a Moving Picture: an Empirical Study with emoTouch for iPad,” Empirical Musicology Review, 13(3-4), 2019, ISSN 1559-5749, doi:10.18061/emr.v13i3-4.6572

  4. [4]

    Greene, L. and Kulezic-Wilson, D., editors, The Palgrave Handbook of Sound Design and Music in Screen Media, Palgrave Macmillan UK, London, 2016, ISBN 978-1-137-51679-4 978-1-137-51680-0, doi:10.1057/978-1-137-51680-0

  5. [5]

    Defining immersion: Literature review and implications for research on immersive audiovisual experiences,

    Agrawal, S., Simon, A., Bech, S., Bærentsen, K., and Forchhammer, S., “Defining immersion: Literature review and implications for research on immersive audiovisual experiences,” Journal of AES, 68(6), pp. 404–417, 2019

  6. [6]

    Can Neuroscience Help Us Do a Better Job of Teaching Music?

    Hodges, D., “Can Neuroscience Help Us Do a Better Job of Teaching Music?” General Music Today, 23, pp. 3–12, 2010, doi:10.1177/1048371309349569

  7. [7]

    An empirical approach to the relationship between emotion and music production quality,

    Ronan, D., Reiss, J., and Gunes, H., “An empirical approach to the relationship between emotion and music production quality,” 2018

  8. [8]

    The why, what, and how of immersive experience,

    Zhang, C., “The why, what, and how of immersive experience,” IEEE Access, 8, pp. 90878–90888, 2020

  9. [9]

    How Cinema Sounds Affect the Perception of a Motion Picture,

    Anestis, A. and Goussios, C., “How Cinema Sounds Affect the Perception of a Motion Picture,” Universal Journal of Psychology, 3, pp. 147–152, 2015, doi:10.13189/ujp.2015.030503.

    AES 160th Convention, Copenhagen, Denmark 2026 May 28–30. Garcia, Crocker, et al. EMORSION - Examining the Impact of Audio Parameters in Film

  10. [10]

    The Role of Sound in the Immersive Experience,

    Saroka, V., “The Role of Sound in the Immersive Experience,” Avant, 14, 2024, doi:10.26913/ava3202406

  11. [11]

    The sound of storytelling: An exploratory study of sound design and music in film drama,

    Crocker, R., Garcia, N., Reiss, J., and Fazekas, G., “The sound of storytelling: An exploratory study of sound design and music in film drama,” in 157th AES Convention, AES, 2024

  12. [12]

    How soundtracks shape what we see: Analyzing the influence of music on visual scenes through self-assessment, eye tracking, and pupillometry,

    Ansani, A., Marini, M., D’Errico, F., and Poggi, I., “How soundtracks shape what we see: Analyzing the influence of music on visual scenes through self-assessment, eye tracking, and pupillometry,” Frontiers in Psychology, 11, p. 556697, 2020

  13. [13]

    From cinema to the lab: Psychological experiments as liminal affective technologies,

    Zulato, E., “From cinema to the lab: Psychological experiments as liminal affective technologies,” Theory & Psychology, 2025, doi:10.1177/09593543251391140

  14. [14]

    Engaging with contemporary dance: What can body movements tell us about audience responses?

    Theodorou, L., Healey, P. G., and Smeraldi, F., “Engaging with contemporary dance: What can body movements tell us about audience responses?” Frontiers in Psychology, 10, p. 71, 2019

  15. [15]

    Measuring and defining the experience of immersion in games,

    Jennett, C., Cox, A. L., Cairns, P., Dhoparee, S., Epps, A., Tijs, T., and Walton, A., “Measuring and defining the experience of immersion in games,” International Journal of Human-Computer Studies, 66(9), pp. 641–661, 2008

  16. [16]

    Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening,

    Juslin, P. and Laukka, P., “Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening,” Journal of New Music Research, 33, pp. 217–238, 2004

  17. [17]

    RR interval signal quality of a heart rate monitor and an ECG Holter at rest and during exercise,

    Gilgen-Ammann, R., Schweizer, T., and Wyss, T., “RR interval signal quality of a heart rate monitor and an ECG Holter at rest and during exercise,” European Journal of Applied Physiology, 119(7), pp. 1525–1532, 2019

  18. [18]

    Viewer versus film: Exploring interaction effects of immersion and cognitive stance on the heart rate and self-reported engagement of viewers of short films,

    Rooney, B., Hennessy, E., and Bálint, K., “Viewer versus film: Exploring interaction effects of immersion and cognitive stance on the heart rate and self-reported engagement of viewers of short films,” Society for Cognitive Studies of the Moving Image, 2014

  19. [19]

    Correspondences Between Music and Involuntary Human Micromotion During Standstill,

    Gonzalez, V. and Żelechowska, A., “Correspondences Between Music and Involuntary Human Micromotion During Standstill,” Frontiers in Psychology, 9, 2018, doi:10.3389/fpsyg.2018.01382

  20. [20]

    The Role of Embodiment in the Perception of Music,

    Leman, M. and Maes, P.-J., “The Role of Embodiment in the Perception of Music,” Empirical Musicology Review, 9, pp. 236–246, 2015, doi:10.18061/emr.v9i3-4.4498

  21. [21]

    How long is long enough to induce immersion?

    Zhang, C., Hoel, A. S., Perkis, A., and Zadtootaghaj, S., “How long is long enough to induce immersion?” in 10th QoMEX, pp. 1–6, IEEE, 2018

  22. [22]

    Why Independent Films Matter?

    Aditya, D., “Why Independent Films Matter?” ResearchGate. Artikkeli. Julkaistu, 26, p. 2024, 2024

  23. [23]

    Detection of Arousal and Valence from Facial Expressions and Physiological Responses Evoked by Different Types of Stressors,

    Bruin, J., Stuldreher, I. V., Perone, P., Hogenelst, K., Naber, M., Kamphuis, W., and Brouwer, A.-M., “Detection of Arousal and Valence from Facial Expressions and Physiological Responses Evoked by Different Types of Stressors,” Frontiers in Neuroergonomics, 5, 2024, ISSN 2673-6195, doi:10.3389/fnrgo.2024.1338243

  24. [24]

    The Impact of Missing Data on Heart Rate Variability Features: A Comparative Study of Interpolation Methods for Ambulatory Health Monitoring,

    Benchekroun, M., Chevallier, B., Zalc, V., Istrate, D., Lenne, D., and Vera, N., “The Impact of Missing Data on Heart Rate Variability Features: A Comparative Study of Interpolation Methods for Ambulatory Health Monitoring,” IRBM, 44(4), p. 100776, 2023, ISSN 1959-0318, doi:10.1016/j.irbm.2023.100776

  25. [25]

    Strategies for Reliable Stress Recognition: A Machine Learning Approach Using Heart Rate Variability Features,

    Bahameish, M., Stockman, T., and Requena Carrión, J., “Strategies for Reliable Stress Recognition: A Machine Learning Approach Using Heart Rate Variability Features,” Sensors, 24(10), p. 3210, 2024, ISSN 1424-8220, doi:10.3390/s24103210

  26. [26]

    Surround Sound Spreads Visual Attention and Increases Cognitive Effort in Immersive Media Reproductions,

    Mendonça, C. and Korshunova, V., “Surround Sound Spreads Visual Attention and Increases Cognitive Effort in Immersive Media Reproductions,” in Proceedings of the 15th Audio Mostly, pp. 16–21, ACM, Graz Austria, 2020, ISBN 978-1-4503-7563-4, doi:10.1145/3411109.3411118

  27. [27]

    The Effect of Auditory Stimulation on the Nonlinear Dynamics of Heart Rate: The Impact of Emotional Valence and Arousal,

    Dimitriev, D., Indeykina, O., and Dimitriev, A., “The Effect of Auditory Stimulation on the Nonlinear Dynamics of Heart Rate: The Impact of Emotional Valence and Arousal,” Noise and Health, 25(118), p. 165, 2023, ISSN 1463-1741, doi:10.4103/nah.nah_15_22

  28. [28]

    Autonomic Correlates of Physical and Moral Disgust,

    Ottaviani, C., Mancini, F., Petrocchi, N., Medea, B., and Couyoumdjian, A., “Autonomic Correlates of Physical and Moral Disgust,” International Journal of Psychophysiology, 89(1), pp. 57–62, 2013, ISSN 0167-8760, doi:10.1016/j.ijpsycho.2013.05.003

  29. [29]

    Autonomic Nervous System Activity in Emotion: A Review,

    Kreibig, S. D., “Autonomic Nervous System Activity in Emotion: A Review,” Biological Psychology, 84(3), pp. 394–421, 2010, ISSN 0301-0511, doi:10.1016/j.biopsycho.2010.03.010

  30. [30]

    Sensory unpleasantness of high-frequency sounds,

    Kurakata, K., Mizunami, T., and Matsushita, K., “Sensory unpleasantness of high-frequency sounds,” Acoustical Science and Technology, 34(1), pp. 26–33, 2013

  31. [31]

    That sounds awful! Does sound unpleasantness modulate the mismatch negativity and its habituation?

    Ringer, H., Rösch, S. A., Roeber, U., Deller, J., Escera, C., and Grimm, S., “That sounds awful! Does sound unpleasantness modulate the mismatch negativity and its habituation?” Psychophysiology, 61(2), p. e14450, 2024.