Synchronizing Audio-Visual Film Stimuli in Unity (version 5.5.1f1): Game Engines as a Tool for Research
Pith reviewed 2026-05-25 02:09 UTC · model grok-4.3
The pith
A compensation protocol corrects four timing problems in Unity 5.5.1f1, allowing accurate audio-visual stimulus presentation synchronized with biometric signals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Unity 5.5.1f1 exhibits four problems during audio-visual playback: desynchronization between video and audio, desynchronization between the temporary counter and the video, a delay in screenshot execution, and encoding-dependent fluency problems. The authors design a compensation and verification process to address these, so that experiments using audio-visual material can be run accurately and reliably.
What carries the argument
A protocol for checks and compensations that resolves the four identified timing problems in Unity's execution of audio-visual material.
If this is right
- Accurate synchronization between audio-visual emission and biometric signal acquisition becomes possible.
- Total playback time is preserved even when fluency issues are present.
- Experimental designs can incorporate real-time screenshot functions with known delays.
- The temporary counter can be aligned reliably with video content.
- Game engines can serve as versatile tools for stimuli presentation once limitations are compensated.
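As a rough illustration of what "known delays" could mean in practice, the sketch below estimates a fixed latency from calibration trials and subtracts it from logged timestamps. The function names and all timing values are hypothetical and are not taken from the paper; the sketch also assumes the delay is roughly constant, which would have to be checked on the actual setup.

```python
from statistics import mean, stdev


def estimate_delay(requested, observed):
    """Return mean and standard deviation of (observed - requested), in seconds."""
    lags = [o - r for r, o in zip(requested, observed)]
    return mean(lags), stdev(lags)


def compensate(timestamps, delay):
    """Shift logged timestamps back by the estimated fixed delay."""
    return [t - delay for t in timestamps]


# Hypothetical calibration data (seconds): when a screenshot was requested
# versus when it was actually captured.
requested = [1.000, 2.000, 3.000, 4.000]
observed = [1.050, 2.048, 3.052, 4.050]

delay, spread = estimate_delay(requested, observed)
corrected = compensate(observed, delay)

print(f"estimated delay: {delay * 1000:.1f} ms (sd {spread * 1000:.1f} ms)")
```

A small spread relative to the mean suggests a fixed offset is a reasonable correction; a large spread would mean the delay cannot be treated as known.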
Where Pith is reading between the lines
- Similar timing issues may exist in other versions of Unity or comparable game engines, warranting similar checks.
- The protocol could be adapted for non-biometric experiments requiring precise AV timing.
- This approach might lower barriers for researchers without access to specialized stimulus software.
- Verification in diverse hardware setups would test the protocol's robustness beyond the tested procedure.
Load-bearing premise
The four observed timing issues are the complete set of problems, and the compensation protocol generalizes to other experimental procedures that use Unity for audio-visual stimuli.
What would settle it
Implementing the protocol in Unity 5.5.1f1 in a new experimental setup and measuring the actual synchronization error between video frames, audio, and a biometric recording device.
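A minimal sketch of how such a measurement could be summarized, assuming paired onset timestamps for video and audio events are already available on a common clock. How those timestamps are obtained is outside this sketch, and the function names, sample values, and the 20 ms tolerance are illustrative assumptions rather than figures from the paper.

```python
def sync_errors_ms(video_onsets, audio_onsets):
    """Signed audio-minus-video offset for each paired event, in milliseconds."""
    if len(video_onsets) != len(audio_onsets):
        raise ValueError("onset lists must have the same length")
    return [(a - v) * 1000.0 for v, a in zip(video_onsets, audio_onsets)]


def summarize(errors_ms, tolerance_ms):
    """Mean offset, worst absolute offset, and whether the worst case is tolerable."""
    worst = max(abs(e) for e in errors_ms)
    return {
        "mean_ms": sum(errors_ms) / len(errors_ms),
        "worst_ms": worst,
        "within_tolerance": worst <= tolerance_ms,
    }


# Hypothetical onset times in seconds on a shared clock.
video_onsets = [0.000, 10.000, 20.000]
audio_onsets = [0.012, 10.015, 20.009]

errors = sync_errors_ms(video_onsets, audio_onsets)
report = summarize(errors, tolerance_ms=20.0)  # tolerance is an assumed value
print(report)
```

Running the same summary before and after applying the protocol would give the kind of before/after comparison a quantitative validation needs.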
Original abstract
Unity is software specifically designed for the development of video games. However, due to its programming possibilities and the polyvalence of its architecture, it can prove to be a versatile tool for stimuli presentation in research experiments. Nevertheless, it also has some limitations and conditions that need to be taken into account to ensure optimal performance in particular experimental situations. Such is the case if we want to use it in an experimental design that includes the acquisition of biometric signals synchronized with the broadcasting of video and audio in real time. In the present paper, we analyse how Unity (version 5.5.1f1) reacts in one such experimental design that requires the execution of audio-visual material. From the analysis of an experimental procedure in which the video was executed following the standard software specifications, we have detected the following problems: desynchronization between the emission of the video and the audio; desynchronization between the temporary counter and the video; a delay in the execution of the screenshot; and, depending on the encoding of the video, a bad fluency in the video playback which, even though it maintains the total playback time, causes Unity to freeze frames and then compensate with small temporal jumps in the video. Finally, having detected all the problems, a compensation and verification process is designed to be able to work with audio-visual material in Unity (version 5.5.1f1) in an accurate way. We present a protocol for checks and compensations that allows solving these problems to ensure the execution of robust experiments in terms of reliability.
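The freeze-and-jump behaviour described in the abstract can be illustrated with a small check over logged playback samples. This is only a sketch under stated assumptions: each sample is a hypothetical pair of wall-clock time and reported media time, and the thresholds `eps` and `jump_tol` are arbitrary illustrative values, not parameters from the paper.

```python
def classify_steps(wall, media, eps=1e-6, jump_tol=0.02):
    """Find sample indices where media time stalls (freeze) or leaps ahead (jump).

    wall  -- wall-clock time of each logged sample, in seconds
    media -- media (video) time reported at each sample, in seconds
    """
    freezes, jumps = [], []
    for i in range(1, len(wall)):
        d_wall = wall[i] - wall[i - 1]
        d_media = media[i] - media[i - 1]
        if d_media < eps:
            freezes.append(i)  # video position did not advance
        elif d_media - d_wall > jump_tol:
            jumps.append(i)  # video position advanced more than real time
    return freezes, jumps


# Hypothetical log at roughly 25 samples per second: the video stalls for
# two samples and then catches up in a single step.
wall = [0.00, 0.04, 0.08, 0.12, 0.16, 0.20]
media = [0.00, 0.04, 0.04, 0.04, 0.16, 0.20]

freezes, jumps = classify_steps(wall, media)
drift = (media[-1] - media[0]) - (wall[-1] - wall[0])

print("freezes at samples:", freezes)
print("jumps at samples:", jumps)
print("total drift (s):", drift)
```

In this made-up log the total drift is zero even though individual frames are mistimed, which is why a check on total playback time alone would not reveal the problem.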
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the use of Unity (version 5.5.1f1) for presenting audio-visual film stimuli in research experiments that require real-time synchronization with biometric signal acquisition. From observations in a single experimental procedure run with standard software settings, the authors identify four timing-related problems: desynchronization between video and audio emission, desynchronization between the temporary counter and the video, a delay in screenshot execution, and encoding-dependent issues with video playback fluency (frame freezing compensated by temporal jumps). The central contribution is a compensation and verification protocol designed to address these issues and enable accurate, reliable use of AV material in Unity.
Significance. If the protocol were accompanied by quantitative validation, the work would offer practical value to researchers adapting game engines for precisely timed multimodal stimuli, particularly in fields requiring AV-biometric synchronization. The explicit enumeration of concrete artifacts (video-audio offset, counter drift, screenshot latency, encoding effects) is a useful starting point. However, the manuscript provides no measured error values, tolerance specifications, or before/after comparisons, so its significance remains that of an observational note rather than a demonstrated solution.
major comments (2)
- [Abstract] Abstract: The claim that the compensation and verification process allows work 'in an accurate way' is unsupported by any quantitative measurements of synchronization error (e.g., milliseconds of residual AV offset), tolerance thresholds, or empirical verification that the protocol restores timing to a stated precision. This directly undermines the central claim that the protocol solves the problems for robust experiments.
- [Abstract] Abstract: The assertion that the four listed issues are exhaustive and that the protocol generalizes rests solely on observations from one experimental procedure; no tests on alternate encodings, hardware configurations, Unity versions, or stimulus lengths are reported, leaving the exhaustiveness and robustness claims unverified.
minor comments (2)
- [Abstract] Abstract: 'bad fluency' is imprecise; replace with 'reduced playback fluency' or 'frame dropping with temporal jumps'.
- [Abstract] Abstract: 'temporary counter' should be clarified as 'time counter' or 'frame counter' for readability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract. We agree that the language requires adjustment to better match the observational scope of the work and will revise accordingly.
- Referee: [Abstract] Abstract: The claim that the compensation and verification process allows work 'in an accurate way' is unsupported by any quantitative measurements of synchronization error (e.g., milliseconds of residual AV offset), tolerance thresholds, or empirical verification that the protocol restores timing to a stated precision. This directly undermines the central claim that the protocol solves the problems for robust experiments.
  Authors: We acknowledge that the manuscript reports observations from a single procedure and does not include quantitative error measurements or before/after verification data. The phrase 'in an accurate way' was intended to describe the intent of the compensation protocol rather than a demonstrated precision. We will revise the abstract to remove this phrasing and instead state that the protocol is designed to address the observed timing issues in the reported setup.
  Revision: yes
- Referee: [Abstract] Abstract: The assertion that the four listed issues are exhaustive and that the protocol generalizes rests solely on observations from one experimental procedure; no tests on alternate encodings, hardware configurations, Unity versions, or stimulus lengths are reported, leaving the exhaustiveness and robustness claims unverified.
  Authors: The manuscript explicitly bases its analysis on one experimental procedure and does not present data from additional configurations. We will revise the abstract to clarify that the four issues were identified in this specific case and that the protocol is proposed for the described conditions, without asserting exhaustiveness or generalization.
  Revision: yes
Circularity Check
No circularity; purely observational protocol report
The paper reports detection of four timing issues (video-audio desync, counter drift, screenshot delay, encoding-dependent fluency) during testing of Unity 5.5.1f1 with AV stimuli in one experimental procedure, then describes a compensation/verification protocol based on those observations. No equations, fitted parameters, predictions, or self-citations are present. The central claim does not reduce any quantity to a quantity defined by the authors' own prior work or inputs; it is an empirical report without derivation chains.