pith.

arXiv: 1907.04926 · v1 · pith:HQWM6TGX · submitted 2019-07-05 · eess.AS · cs.MM · cs.SD · eess.IV

Synchronizing Audio-Visual Film Stimuli in Unity (version 5.5.1f1): Game Engines as a Tool for Research

Pith reviewed 2026-05-25 02:09 UTC · model grok-4.3

classification: eess.AS · cs.MM · cs.SD · eess.IV
keywords: Unity · audio-visual synchronization · game engine · stimuli presentation · timing compensation · biometric signals · experimental design · video playback

The pith

A compensation protocol corrects four timing problems in Unity 5.5.1f1, allowing accurate audio-visual stimulus presentation synchronized with biometric signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the suitability of Unity version 5.5.1f1 as a tool for presenting audio-visual stimuli in research setups that demand synchronization with biometric data acquisition. Analysis of standard playback revealed four timing-related problems: video and audio desynchronization, counter drift relative to the video, delays when taking screenshots, and encoding-dependent playback fluency issues that cause frame freezes compensated by small temporal jumps. The authors respond by outlining a protocol of checks and compensations to mitigate these issues. If true, this would matter because it could turn a widely available game development platform into a reliable instrument for controlled psychological or neuroscientific experiments involving film stimuli.

Core claim

Unity 5.5.1f1 exhibits desynchronization between video and audio, desynchronization between the temporary counter and the video, delay in screenshot execution, and encoding-dependent fluency problems during audio-visual playback. A compensation and verification process is designed to address these, enabling accurate work with audio-visual material and experiments that are robust in terms of reliability.

What carries the argument

A protocol for checks and compensations that resolves the four identified timing problems in Unity's execution of audio-visual material.
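As a rough illustration of what a single compensation step might look like, the sketch below corrects logged event times for fixed, previously measured latencies, treating video frame times as the reference timeline. It assumes each latency is constant; the `LATENCY_S` table, its values and the `compensate` helper are hypothetical and not taken from the paper.

```python
# Hypothetical latencies in seconds, assumed to have been measured beforehand
# (for example with a photodiode and a microphone). Not values from the paper.
LATENCY_S = {
    "audio_onset": 0.120,  # sound is emitted this long after the logged play call
    "screenshot": 0.050,   # frame is captured this long after the logged request
}


def compensate(kind: str, t_logged: float) -> float:
    """Map a logged event time onto the physical timeline.

    Video frame times serve as the reference, so kinds without a measured
    latency (e.g. "video_frame") are returned unchanged.
    """
    return t_logged + LATENCY_S.get(kind, 0.0)


for kind in ("video_frame", "audio_onset", "screenshot"):
    print(kind, round(compensate(kind, 10.0), 3))
```

A constant-offset table is the simplest possible model; a drifting quantity such as the counter needs a time-dependent correction instead.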

If this is right

  • Accurate synchronization between audio-visual emission and biometric signal acquisition becomes possible.
  • Total playback time is preserved even when fluency issues are present.
  • Experimental designs can incorporate real-time screenshot functions with known delays.
  • The temporary counter can be aligned reliably with video content.
  • Game engines can serve as versatile tools for stimuli presentation once limitations are compensated.
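Aligning the temporary counter with the video, as in the fourth point above, could be done by fitting the counter against video time at a few reference frames and inverting the fit. This sketch assumes the drift is linear (a constant rate error plus a start offset); the model, the helper names and the sample points are illustrative assumptions, not the paper's procedure.

```python
def fit_drift(video_t, counter_t):
    """Ordinary least-squares fit of counter = a * video + b. Returns (a, b)."""
    n = len(video_t)
    mx = sum(video_t) / n
    my = sum(counter_t) / n
    sxx = sum((x - mx) ** 2 for x in video_t)
    sxy = sum((x - mx) * (y - my) for x, y in zip(video_t, counter_t))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def counter_to_video(t_counter, a, b):
    """Invert the fitted model to express a counter reading in video time."""
    return (t_counter - b) / a


# Hypothetical reference points: the counter runs 1% fast and starts 0.2 s late.
video = [0.0, 10.0, 20.0, 30.0]
counter = [0.2 + 1.01 * v for v in video]
a, b = fit_drift(video, counter)
print(round(a, 4), round(b, 4))                # 1.01 0.2
print(round(counter_to_video(30.5, a, b), 3))  # 30.0
```

If residuals from the fit are not small, the drift is not linear and a piecewise or per-segment correction would be needed.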

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar timing issues may exist in other versions of Unity or comparable game engines, warranting similar checks.
  • The protocol could be adapted for non-biometric experiments requiring precise AV timing.
  • This approach might lower barriers for researchers without access to specialized stimulus software.
  • Verification in diverse hardware setups would test the protocol's robustness beyond the tested procedure.

Load-bearing premise

The observed timing issues are the complete set of problems and the compensation protocol generalizes to other experimental procedures using Unity for audio-visual stimuli.

What would settle it

Implementing the protocol in Unity 5.5.1f1 and measuring the actual synchronization error between video frames, audio, and a biometric recording device in a new experimental setup.
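One way such a measurement might be scored: given the times at which the same synchronization markers (for example, flashes and beeps) are detected in the video stream and in the biometric recording, pair each reference marker with the nearest marker in the other stream and report the offsets in milliseconds. The marker-based setup, the function names and the data are hypothetical.

```python
from bisect import bisect_left
from statistics import mean


def nearest(sorted_times, t):
    """Return the element of sorted_times closest to t."""
    i = bisect_left(sorted_times, t)
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda x: abs(x - t))


def sync_error_ms(reference, other):
    """Offset (other - reference) in ms for each reference marker."""
    other_sorted = sorted(other)
    return [1000.0 * (nearest(other_sorted, t) - t) for t in reference]


# Hypothetical marker times in seconds.
video_markers = [1.000, 5.000, 9.000]
eeg_markers = [1.012, 5.015, 9.011]
errors = sync_error_ms(video_markers, eeg_markers)
print(f"mean {mean(errors):.1f} ms, max |err| {max(abs(e) for e in errors):.1f} ms")
```

Nearest-neighbour pairing only works when the true offset is well below half the spacing between markers; otherwise markers can be matched to the wrong partner.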

Original abstract

Unity is software specifically designed for the development of video games. However, due to its programming possibilities and the polyvalence of its architecture, it can prove to be a versatile tool for stimuli presentation in research experiments. Nevertheless, it also has some limitations and conditions that need to be taken into account to ensure optimal performance in particular experimental situations. Such is the case if we want to use it in an experimental design that includes the acquisition of biometric signals synchronized with the broadcasting of video and audio in real time. In the present paper, we analyse how Unity (version 5.5.1f1) reacts in one such experimental design that requires the execution of audio-visual material. From the analysis of an experimental procedure in which the video was executed following the standard software specifications, we have detected the following problems: desynchronization between the emission of the video and the audio; desynchronization between the temporary counter and the video; a delay in the execution of the screenshot; and, depending on the encoding of the video, bad fluency in the video playback, which, even though it maintains the total playback time, causes Unity to freeze frames and compensate with small temporal jumps in the video. Finally, having detected all the problems, a compensation and verification process is designed to be able to work with audio-visual material in Unity (version 5.5.1f1) in an accurate way. We present a protocol for checks and compensations that allows solving these problems to ensure the execution of robust experiments in terms of reliability.
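The fluency problem described in the abstract (frozen frames followed by small jumps, with total playback time preserved) could be checked from a per-render log of which video frame was on screen. A minimal sketch, assuming such a log of displayed frame indices, one per render tick, is available; the log format and the function name are assumptions, not something the paper specifies.

```python
def find_freezes_and_jumps(frame_indices):
    """Scan displayed frame indices, one per render tick.

    A freeze is a tick where the frame index did not advance; a jump is a tick
    where it advanced by more than one frame. Returns two lists of tick positions.
    """
    freezes, jumps = [], []
    for tick in range(1, len(frame_indices)):
        step = frame_indices[tick] - frame_indices[tick - 1]
        if step == 0:
            freezes.append(tick)
        elif step > 1:
            jumps.append(tick)
    return freezes, jumps


# Hypothetical log: frame 3 is shown for two ticks, then frame 5 is skipped,
# so the clip still ends on schedule at frame 7.
log = [0, 1, 2, 3, 3, 4, 6, 7]
print(find_freezes_and_jumps(log))  # ([4], [6])
```

Because a freeze and a later jump cancel out, checking only total duration would miss this pattern; the per-tick scan exposes it.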

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes the use of Unity (version 5.5.1f1) for presenting audio-visual film stimuli in research experiments that require real-time synchronization with biometric signal acquisition. From observations in a single experimental procedure run with standard software settings, the authors identify four timing-related problems: desynchronization between video and audio emission, desynchronization between the temporary counter and the video, a delay in screenshot execution, and encoding-dependent issues with video playback fluency (frame freezing compensated by temporal jumps). The central contribution is a compensation and verification protocol designed to address these issues and enable accurate, reliable use of AV material in Unity.

Significance. If the protocol were accompanied by quantitative validation, the work would offer practical value to researchers adapting game engines for precisely timed multimodal stimuli, particularly in fields requiring AV-biometric synchronization. The explicit enumeration of concrete artifacts (video-audio offset, counter drift, screenshot latency, encoding effects) is a useful starting point. However, the manuscript provides no measured error values, tolerance specifications, or before/after comparisons, so its significance remains that of an observational note rather than a demonstrated solution.

major comments (2)
  1. [Abstract] Abstract: The claim that the compensation and verification process allows work 'in an accurate way' is unsupported by any quantitative measurements of synchronization error (e.g., milliseconds of residual AV offset), tolerance thresholds, or empirical verification that the protocol restores timing to a stated precision. This directly undermines the central claim that the protocol solves the problems for robust experiments.
  2. [Abstract] Abstract: The assertion that the four listed issues are exhaustive and that the protocol generalizes rests solely on observations from one experimental procedure; no tests on alternate encodings, hardware configurations, Unity versions, or stimulus lengths are reported, leaving the exhaustiveness and robustness claims unverified.
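The quantitative check requested in the first major comment could take the form sketched below: estimate a constant correction on a calibration run, then report the worst residual offset on a separate run before and after correction, against an explicit tolerance. The 10 ms tolerance, the constant-offset model and all numbers are hypothetical, not results from the paper.

```python
from statistics import mean

TOLERANCE_MS = 10.0  # hypothetical acceptance threshold, not from the paper


def verify(offsets_ms, correction_ms=0.0, tol_ms=TOLERANCE_MS):
    """Apply a constant correction; return (worst |residual|, within tolerance?)."""
    residuals = [o - correction_ms for o in offsets_ms]
    worst = max(abs(r) for r in residuals)
    return worst, worst <= tol_ms


calibration = [118.0, 122.0, 121.0, 119.0]  # hypothetical AV offsets, calibration run
test_run = [120.0, 124.0, 117.0, 121.0]     # hypothetical AV offsets, separate run
correction = mean(calibration)              # 120.0 ms

print("before:", verify(test_run))              # before: (124.0, False)
print("after: ", verify(test_run, correction))  # after:  (4.0, True)
```

Estimating the correction on one run and verifying it on another avoids judging the compensation on the same data used to fit it.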
minor comments (2)
  1. [Abstract] Abstract: 'bad fluency' is imprecise; replace with 'reduced playback fluency' or 'frame dropping with temporal jumps'.
  2. [Abstract] Abstract: 'temporary counter' should be clarified as 'time counter' or 'frame counter' for readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We agree that the language requires adjustment to better match the observational scope of the work and will revise accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that the compensation and verification process allows work 'in an accurate way' is unsupported by any quantitative measurements of synchronization error (e.g., milliseconds of residual AV offset), tolerance thresholds, or empirical verification that the protocol restores timing to a stated precision. This directly undermines the central claim that the protocol solves the problems for robust experiments.

    Authors: We acknowledge that the manuscript reports observations from a single procedure and does not include quantitative error measurements or before/after verification data. The phrase 'in an accurate way' was intended to describe the intent of the compensation protocol rather than a demonstrated precision. We will revise the abstract to remove this phrasing and instead state that the protocol is designed to address the observed timing issues in the reported setup. revision: yes

  2. Referee: [Abstract] Abstract: The assertion that the four listed issues are exhaustive and that the protocol generalizes rests solely on observations from one experimental procedure; no tests on alternate encodings, hardware configurations, Unity versions, or stimulus lengths are reported, leaving the exhaustiveness and robustness claims unverified.

    Authors: The manuscript explicitly bases its analysis on one experimental procedure and does not present data from additional configurations. We will revise the abstract to clarify that the four issues were identified in this specific case and that the protocol is proposed for the described conditions, without asserting exhaustiveness or generalization. revision: yes

Circularity Check

0 steps flagged

No circularity; purely observational protocol report

full rationale

The paper reports detection of four timing issues (video-audio desync, counter drift, screenshot delay, encoding-dependent fluency) during testing of Unity 5.5.1f1 with AV stimuli in one experimental procedure, then describes a compensation/verification protocol based on those observations. No equations, fitted parameters, predictions, or self-citations are present. The central claim does not reduce any quantity to a quantity defined by the authors' own prior work or inputs; it is an empirical report without derivation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests entirely on empirical observation of one software version in one experimental setup; no mathematical axioms, free parameters, or new entities are introduced.

pith-pipeline@v0.9.0 · 5847 in / 1135 out tokens · 18508 ms · 2026-05-25T02:09:40.822017+00:00 · methodology

