Analyzing Visual Attention Patterns During Band Rehearsal with Mobile Eye Tracking
Pith reviewed 2026-06-28 08:32 UTC · model grok-4.3
The pith
Band rehearsals form a hub-and-spoke gaze pattern centered on the leader, with attention stabilizing after repeated attempts on new material.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that visual attention during ensemble rehearsal exhibits a hub-and-spoke topology, with the session leader as the dominant fixation target for all members and the learning guitarist directing up to 97 percent of interpersonal dwell time to this reference. Transition matrices show gaze shifts falling by up to 65 percent on average (and by up to 82 percent for individual participants) between successive attempts on unfamiliar material, while scarf plots distinguish fragmented attention during teaching breakdowns from consolidated attention during uninterrupted runs. These quantitative patterns align with participants' post-session reflections.
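To make the dwell figure concrete, here is a minimal sketch of how an interpersonal dwell share could be computed. The record format, the target labels, and the function name `interpersonal_dwell_share` are assumptions made for illustration, not the paper's pipeline, and the durations are invented.

```python
from collections import defaultdict


def interpersonal_dwell_share(fixations, people):
    """Return each person's share of the total dwell time spent on people.

    fixations: iterable of (target_label, duration_ms) pairs (hypothetical format).
    people: labels that count as ensemble members; fixations on anything else
            (instruments, sheet music, ...) are left out of the denominator.
    """
    dwell = defaultdict(float)
    for target, duration_ms in fixations:
        if target in people:
            dwell[target] += duration_ms
    total = sum(dwell.values())
    if total == 0:
        return {}
    return {target: ms / total for target, ms in dwell.items()}


# Invented example values, not data from the study.
fixations = [
    ("leader", 400), ("fretboard", 900), ("leader", 500),
    ("drummer", 100), ("sheet", 300),
]
shares = interpersonal_dwell_share(fixations, {"leader", "drummer", "bassist"})
print(shares)
```

Restricting the denominator to people is what makes the share "interpersonal": time spent on the fretboard or sheet music does not dilute it.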
What carries the argument
The hub-and-spoke attention topology, recovered from fixation matrices, transition matrices, and temporal scarf plots built from mobile eye-tracking data mapped to people and objects via YOLOv8 scene annotations.
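As a rough illustration of the transition matrices mentioned here, the sketch below counts gaze shifts in a sequence of fixation targets. It assumes that consecutive fixations on the same target do not count as a shift; the paper's exact definition is not given in this summary, and the labels and function names are hypothetical.

```python
from collections import Counter


def transition_counts(targets):
    """Count gaze shifts between distinct consecutive fixation targets."""
    counts = Counter()
    for src, dst in zip(targets, targets[1:]):
        if src != dst:  # assumption: staying on the same target is not a shift
            counts[(src, dst)] += 1
    return counts


def as_matrix(counts, labels):
    """Lay the counts out as a square matrix (rows: from, columns: to)."""
    return [[counts[(src, dst)] for dst in labels] for src in labels]


# Invented sequence of fixation targets for one musician.
sequence = ["leader", "leader", "sheet", "leader", "drummer", "leader"]
labels = ["leader", "drummer", "sheet"]
matrix = as_matrix(transition_counts(sequence), labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
```

In a hub-and-spoke pattern, most of the mass in such a matrix would sit in the hub's row and column, with few direct transitions between the other targets.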
If this is right
- Attention concentrates on one reference person rather than distributing evenly across the group.
- Repeated practice on new material reduces the frequency of gaze shifts between members.
- Teaching interruptions produce visible fragmentation in the sequence of fixations.
- Uninterrupted performance runs produce visibly consolidated fixation sequences.
- Participant self-reports after the session match the recorded gaze patterns.
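The second consequence above, fewer gaze shifts on a repeated attempt, comes down to a percent-change calculation. The sketch below assumes raw shift counts are compared directly; if the two attempts differ in length, a per-minute rate may be the fairer comparison. The sequences and function names are invented for illustration.

```python
def count_shifts(targets):
    """Number of times the fixated target changes between consecutive fixations."""
    return sum(1 for a, b in zip(targets, targets[1:]) if a != b)


def percent_reduction(first_attempt, second_attempt):
    """Percent drop in gaze shifts from the first attempt to the second."""
    before = count_shifts(first_attempt)
    after = count_shifts(second_attempt)
    if before == 0:
        raise ValueError("first attempt has no gaze shifts to compare against")
    return 100.0 * (before - after) / before


# Invented example sequences, not data from the study.
attempt_1 = ["leader", "sheet", "leader", "drummer", "leader",
             "sheet", "leader", "bassist", "leader"]
attempt_2 = ["leader", "leader", "sheet", "leader", "leader",
             "leader", "drummer", "leader", "leader"]
reduction = percent_reduction(attempt_1, attempt_2)
print(f"{reduction:.0f}% fewer gaze shifts")
```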
Where Pith is reading between the lines
- Rehearsal software could track live attention distribution and flag moments when focus drifts from the leader.
- The same recording method could be tested in other coordinated group activities such as chamber music or team sports to check for similar topologies.
- If the stabilization effect proves reliable, rehearsal protocols might deliberately include repeated attempts to accelerate the drop in unnecessary scanning.
Load-bearing premise
The automated scene annotations correctly assign fixations to individual musicians and objects, and the small group of four players and three songs reveals patterns that hold more generally.
What would settle it
Repeating the same rehearsal protocol with a different ensemble or larger sample and finding either no single dominant gaze target or no consistent drop in transitions between attempts would falsify the reported topology and stabilization effect.
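One way such a replication could test for "a single dominant gaze target" is sketched below. The criterion used (every musician other than the candidate dwells most on that candidate) is an assumption chosen for illustration, as are the role labels other than the leader and guitarist and all of the dwell values.

```python
def dominant_targets(dwell_by_viewer):
    """Map each viewer to the person they dwell on most."""
    return {
        viewer: max(dwell, key=dwell.get)
        for viewer, dwell in dwell_by_viewer.items()
        if dwell
    }


def single_hub(dwell_by_viewer):
    """Return the hub if every other viewer's dominant target is that person."""
    tops = dominant_targets(dwell_by_viewer)
    for candidate in sorted(set(tops.values())):
        others = [top for viewer, top in tops.items() if viewer != candidate]
        if others and all(top == candidate for top in others):
            return candidate
    return None


# Invented dwell times in milliseconds, not data from the study.
dwell = {
    "leader": {"guitarist": 300, "drummer": 500, "bassist": 200},
    "guitarist": {"leader": 800, "drummer": 150, "bassist": 50},
    "drummer": {"leader": 600, "guitarist": 250, "bassist": 150},
    "bassist": {"leader": 550, "guitarist": 200, "drummer": 250},
}
print(single_hub(dwell))
```

A stricter test might also require the hub's share to exceed a threshold, which this sketch does not attempt.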
Original abstract
Visual attention is central to ensemble coordination, yet how musicians allocate gaze during naturalistic rehearsal remains poorly understood. We present a pilot study using mobile eye tracking to examine gaze behaviour in a four-member band across three songs, each practiced twice. Musicians wore Pupil Labs Neon eye trackers, and YOLOv8-assisted scene annotations mapped fixations to ensemble members and objects in view. Analyzing fixation matrices, transition matrices, temporal scarf plots, and dwell-transition correlations, we uncover a hub-and-spoke attention topology: the session leader was the dominant gaze target for all members, while the learning guitarist concentrated up to 97% of interpersonal dwell on this single reference. Between attempts, gaze transitions decreased by up to 65% on average for unfamiliar material (up to 82% for individual participants) as scanning stabilized. Scarf plots reveal how teaching breakdowns fragment attention and uninterrupted runs consolidate it. Post-session participant reflections align with the quantitative patterns, and we discuss implications for gaze-aware tools in ensemble pedagogy.
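A scarf plot draws each run of fixations on one target as a colored segment on a timeline. The sketch below shows one plausible way to build those segments and to put a number on fragmentation. The tuple format, the function names, and the "segments per minute" measure are assumptions for illustration, not the paper's method.

```python
from itertools import groupby


def scarf_segments(fixations):
    """Merge consecutive fixations on one target into (target, start_s, end_s).

    fixations: time-ordered list of (start_s, end_s, target) tuples
               (hypothetical format).
    """
    segments = []
    for target, run in groupby(fixations, key=lambda fix: fix[2]):
        run = list(run)
        segments.append((target, run[0][0], run[-1][1]))
    return segments


def segments_per_minute(segments):
    """Crude fragmentation score: more segments per minute means choppier gaze."""
    if not segments:
        return 0.0
    duration_s = segments[-1][2] - segments[0][1]
    return 60.0 * len(segments) / duration_s if duration_s > 0 else 0.0


# Invented fixations, not data from the study.
fixations = [
    (0.0, 1.0, "leader"), (1.0, 2.5, "leader"),
    (2.5, 3.0, "sheet"), (3.0, 6.0, "leader"),
]
segments = scarf_segments(fixations)
print(segments, segments_per_minute(segments))
```

Under this measure, a teaching breakdown would be expected to score higher than an uninterrupted run, matching the visual impression the abstract describes.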
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a pilot study using mobile eye tracking (Pupil Labs Neon) on four band members rehearsing three songs twice each. YOLOv8-assisted scene annotations map fixations to ensemble members and objects; analysis of fixation and transition matrices, scarf plots, and dwell-transition correlations reveals a hub-and-spoke attention topology with the session leader as dominant target (up to 97% interpersonal dwell concentration for the learning guitarist) and reductions in gaze transitions (up to 65% on average, up to 82% for individual participants) between attempts as scanning stabilizes.
Significance. If the fixation-to-object mappings prove reliable, the work supplies the first quantitative description of visual attention dynamics in naturalistic ensemble rehearsal, documenting practice-induced stabilization and alignment with participant reflections. The naturalistic mobile-eye-tracking design in a moving rehearsal setting is a methodological strength that could inform gaze-aware ensemble pedagogy tools.
major comments (2)
- [Abstract] Abstract (methods/results): All reported percentages (97% dwell concentration, 65% transition reduction) and the hub-and-spoke topology are derived from fixation matrices produced by YOLOv8-assisted annotations. No accuracy, precision, recall, or inter-annotator agreement metrics are supplied for the annotation step in a dynamic, multi-person, moving-camera rehearsal environment; without such validation the quantitative claims rest on an untested measurement pipeline.
- [Abstract] Abstract (results/discussion): The sample comprises only four musicians and three songs. The manuscript must clarify whether the observed topology and transition reductions are presented as general ensemble phenomena or as case-specific observations, and must address how the small N affects the strength of the stabilization claim.
minor comments (1)
- [Abstract] The abstract states that 'post-session participant reflections align with the quantitative patterns' but supplies no information on interview protocol, coding, or how alignment was assessed.
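The validation requested in the first major comment could start with per-label precision and recall of the automated labels against a hand-labeled reference subset. The sketch below is a plain-Python illustration under that assumption; the label lists and the function name are invented, and no such figures are reported in the paper.

```python
def precision_recall(predicted, reference, label):
    """Precision and recall of `label` for automated vs. reference annotations."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must be the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(p == label and r == label for p, r in pairs)
    fp = sum(p == label and r != label for p, r in pairs)
    fn = sum(p != label and r == label for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented labels: one entry per fixation in a spot-checked subset.
reference = ["leader", "leader", "drummer", "sheet", "leader", "drummer"]
predicted = ["leader", "drummer", "drummer", "sheet", "leader", "leader"]
precision, recall = precision_recall(predicted, reference, "leader")
print(f"leader: precision={precision:.2f}, recall={recall:.2f}")
```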
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our pilot study. We address each major point below and will revise the manuscript accordingly to strengthen the presentation of methods and the framing of results.
Point-by-point responses
- Referee: [Abstract] Abstract (methods/results): All reported percentages (97% dwell concentration, 65% transition reduction) and the hub-and-spoke topology are derived from fixation matrices produced by YOLOv8-assisted annotations. No accuracy, precision, recall, or inter-annotator agreement metrics are supplied for the annotation step in a dynamic, multi-person, moving-camera rehearsal environment; without such validation the quantitative claims rest on an untested measurement pipeline.
  Authors: We agree that formal validation metrics for the annotation pipeline are absent from the current manuscript. The study is a pilot, and annotations combined automated YOLOv8 detection with manual review by the research team, but no quantitative metrics (e.g., precision/recall or inter-annotator agreement) were computed. In the revised version we will add a methods subsection describing the annotation workflow in detail, report any available spot-check agreement figures, and explicitly list the lack of full validation metrics as a limitation of the pilot. This will qualify the quantitative claims without altering the reported patterns. Revision: yes
- Referee: [Abstract] Abstract (results/discussion): The sample comprises only four musicians and three songs. The manuscript must clarify whether the observed topology and transition reductions are presented as general ensemble phenomena or as case-specific observations, and must address how the small N affects the strength of the stabilization claim.
  Authors: The manuscript already labels the work a 'pilot study,' but we accept that the abstract and discussion do not sufficiently emphasize the case-specific nature of the findings. In revision we will (1) rephrase the abstract and results to state that the hub-and-spoke topology and transition reductions are observations from this particular four-person ensemble and these three songs, and (2) add an explicit paragraph in the discussion addressing the implications of N=4 for the stabilization claim, noting that the patterns are consistent with participant reflections yet require larger-scale replication before generalizing to ensemble rehearsal at large. Revision: yes
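The spot-check agreement figures promised in the first response could take the form of a chance-corrected statistic such as Cohen's kappa. Choosing kappa is an assumption here, since the rebuttal names no specific metric; the labels below are invented and the function is a plain-Python sketch.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same fixations."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented labels for four spot-checked fixations.
annotator_a = ["leader", "leader", "drummer", "sheet"]
annotator_b = ["leader", "drummer", "drummer", "sheet"]
kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for chance, which matters when one label (here, plausibly the leader) dominates the fixations.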
Circularity Check
No significant circularity; purely observational descriptive analysis
The paper is a pilot observational study that collects mobile eye-tracking data, applies YOLOv8-assisted annotations to map fixations, and reports descriptive statistics (dwell percentages, transition counts, scarf plots) on the resulting matrices. No equations, fitted models, predictions, or derivation chains exist that could reduce to author-defined inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The hub-and-spoke topology and percentage reductions are direct empirical summaries of the annotated data, not quantities defined in terms of themselves. This is the normal case of a self-contained descriptive study with no circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Mobile eye trackers and YOLOv8 scene annotations produce sufficiently accurate fixation-to-target mappings for the purposes of the analysis.