arxiv: 2604.21707 · v1 · submitted 2026-04-23 · 💻 cs.RO

Effects of Swarm Size Variability on Operator Workload

Pith reviewed 2026-05-09 21:31 UTC · model grok-4.3

classification 💻 cs.RO
keywords: human-swarm interaction · operator workload · swarm size changes · workload residue · cognitive reset · drone monitoring · subjective workload · performance stability

The pith

Small decreases in swarm size leave operator workload elevated, small increases keep it low, and large changes in either direction reset it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests how abrupt shifts in the number of robots in a swarm affect a human operator's workload and task performance. It shows that actual success at monitoring stays steady across small size changes, but how hard the work feels depends on whether the swarm grew or shrank and by how much. Small reductions cause workload to remain high from the previous state, while small additions keep it lower and big jumps in either direction seem to clear prior effects. These patterns matter for real missions where robots fail or get reassigned, because balancing human effort helps avoid overload without losing the benefits of teaming with many machines. The studies used a simulated drone monitoring setup with controlled size changes between episodes to measure these responses.

Core claim

The central claim is that objective performance is largely unaffected by small changes in swarm size, while subjective workload is sensitive to both change direction and magnitude. Small increases preserve lower workload, whereas small decreases leave workload elevated, indicating workload residue; large changes in either direction attenuate these effects, suggesting a reset response.

What carries the argument

Workload history, the carryover of prior effort levels into current perception, combined with a cognitive reset triggered when swarm size changes exceed a threshold.
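
To make the direction-and-magnitude reading concrete, the sketch below labels a swarm-size transition with the response the claim would predict. It is illustrative only: the `Transition` type, the `predicted_response` function, and the `large_threshold` parameter are hypothetical names, and the source does not state the threshold value or whether it is absolute or relative, so an absolute robot count is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """Swarm size before and after a change between two episodes."""
    prev_size: int
    new_size: int


def predicted_response(t: Transition, large_threshold: int) -> str:
    """Label a transition with the workload pattern the claim predicts.

    large_threshold is a hypothetical absolute cut-off (in robots); the
    source does not report the actual boundary between small and large.
    """
    delta = t.new_size - t.prev_size
    if delta == 0:
        return "no change"
    if abs(delta) >= large_threshold:
        return "reset"  # large change in either direction
    if delta > 0:
        return "low workload preserved"  # small increase
    return "workload residue"  # small decrease


if __name__ == "__main__":
    # Illustrative sizes only, not the conditions used in the studies.
    for prev, new in [(10, 12), (10, 8), (10, 30), (30, 10)]:
        label = predicted_response(Transition(prev, new), large_threshold=10)
        print(prev, "->", new, ":", label)
```

Treating the threshold as a parameter keeps the open empirical question (where "small" ends and "large" begins) visible rather than baking in a value the source does not give.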

Load-bearing premise

That workload dynamics measured in short simulated drone monitoring episodes with discrete size shifts accurately capture the effects in continuous, high-stakes real-world human-swarm operations.

What would settle it

A study that measures subjective workload continuously during long-running real drone missions and finds no sustained elevation after small reductions or no drop after large shifts would falsify the residue and reset claims.

Figures

Figures reproduced from arXiv: 2604.21707 by Aleksandra Landowska, Horia A. Maior, Mohammad Soorati, Sarvapali D. Ramchurn, William Hunt.

Figure 2: Review panel. After the video ends, the user reports which colour drone disappeared and their workload.
Figure 3: Directional arrows showing changes vs baseline (black dots). E.g. A blue arrow…
Figure 4: Study 2 directional changes relative to baseline (black dots), grouped by swarm…
Original abstract

Real-world deployments of human-swarm teams depend on balancing operator workload to leverage human strengths without inducing overload. A key challenge is that swarm size is often dynamic: robots may join or leave the mission due to failures or redeployment, causing abrupt workload fluctuations. Understanding how such changes affect human workload and performance is critical for robust human-swarm interaction design. This paper investigates how the magnitude and direction of changes in swarm size influence operator workload. Drawing on the concept of workload history, we test three hypotheses: (1) workload remains elevated following decreases in swarm size, (2) small increases are more manageable than large jumps, and (3) sufficiently large changes override these effects by inducing a cognitive reset. We conducted two studies (N = 34) using a monitoring task with simulated drone swarms of varying sizes. By varying the swarm size between episodes, we measured perceived workload relative to swarm size changes. Results show that objective performance is largely unaffected by small changes in swarm size, while subjective workload is sensitive to both change direction and magnitude. Small increases preserve lower workload, whereas small decreases leave workload elevated, indicating workload residue; large changes in either direction attenuate these effects, suggesting a reset response. These findings offer actionable guidance for managing swarm-size transitions to support operator workload in dynamic human-swarm systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper examines how changes in swarm size affect operator workload and performance in human-swarm interaction. Using two empirical studies with a total of 34 participants in a simulated drone monitoring task, it tests hypotheses regarding workload history effects: elevated workload after decreases in swarm size, better manageability of small increases, and cognitive reset from large changes. Key findings are that objective performance remains largely unaffected by small swarm size variations, while subjective workload is sensitive to both the direction and magnitude of changes, with small decreases causing persistent elevated workload (residue) and large changes leading to a reset effect.

Significance. If validated, these results offer practical guidance for managing dynamic swarm sizes in real-world deployments to optimize operator workload without compromising performance. The work contributes empirical evidence on workload dynamics in HRI, highlighting the importance of considering change history and magnitude in system design. Strengths include the hypothesis-driven approach with two studies testing specific predictions about direction and magnitude effects.

major comments (2)
  1. [Methods] Methods section: The total sample size is reported as N=34 across two studies, but no details on power analysis, effect sizes, or statistical power are provided. This is critical because the central claim relies on detecting directional effects in subjective workload measures while finding no effect on objective performance; without power information, it is unclear if null results on objective measures reflect true absence or insufficient sensitivity.
  2. [Results and Discussion] Results and Discussion: The interpretations of 'workload residue' following small decreases and 'reset response' from large changes depend on measurements across discrete episodes. The manuscript does not provide evidence addressing potential confounds such as task switching effects, adaptation during inter-episode breaks, or differences due to the low-stakes simulated environment, which directly impacts the validity of generalizing to continuous, high-stakes real-world operations.
minor comments (1)
  1. [Abstract] The abstract states 'two studies (N = 34)' but does not clarify the distribution of participants between studies, which would help assess the robustness of the findings.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and indicate where revisions will be made to improve clarity and rigor.

  1. Referee: [Methods] Methods section: The total sample size is reported as N=34 across two studies, but no details on power analysis, effect sizes, or statistical power are provided. This is critical because the central claim relies on detecting directional effects in subjective workload measures while finding no effect on objective performance; without power information, it is unclear if null results on objective measures reflect true absence or insufficient sensitivity.

    Authors: We agree that explicit reporting of power and effect sizes would strengthen the paper. The studies were designed drawing on sample sizes from prior HRI workload research, but a formal a priori power analysis was not included. In the revised manuscript we will add a post-hoc power analysis (using G*Power or equivalent) based on the observed effect sizes for the significant subjective workload effects, report Cohen's d or partial eta-squared for all key comparisons, and explicitly discuss the implications for interpreting the null findings on objective performance measures. This will allow readers to better evaluate the sensitivity of the design. revision: yes

  2. Referee: [Results and Discussion] Results and Discussion: The interpretations of 'workload residue' following small decreases and 'reset response' from large changes depend on measurements across discrete episodes. The manuscript does not provide evidence addressing potential confounds such as task switching effects, adaptation during inter-episode breaks, or differences due to the low-stakes simulated environment, which directly impacts the validity of generalizing to continuous, high-stakes real-world operations.

    Authors: We appreciate the referee's attention to these methodological considerations. The experimental protocol maintained the same monitoring task across episodes specifically to minimize task-switching confounds, and breaks were kept brief and standardized to permit workload ratings without continuous-operation carryover. Nevertheless, we acknowledge that discrete episodes and the low-stakes simulation cannot fully replicate adaptation dynamics or stakes in real deployments. In the revision we will add an expanded limitations paragraph that directly discusses these factors, their potential influence on the residue and reset interpretations, and the consequent boundaries on generalizability. We maintain that the controlled design still yields internally valid evidence on workload-history effects that can inform subsequent real-world studies. revision: partial
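
The effect-size and sensitivity reporting promised in response 1 can be sketched as follows. This is a minimal illustration, assuming a within-subject (paired) comparison of workload ratings, which the source does not confirm. The function names are hypothetical, the ratings in the usage section are made up, and the power figure uses a normal approximation rather than the noncentral t computation a tool such as G*Power would perform.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_dz(before, after):
    """Paired-samples effect size: mean difference / SD of differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / stdev(diffs)


def approx_power_paired(dz, n, alpha=0.05):
    """Approximate two-sided power for a paired test with n participants.

    Normal approximation only; exact tools use the noncentral t
    distribution and give slightly lower power for small n.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(dz) * sqrt(n)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Made-up workload ratings for illustration, not data from the paper.
    before = [40, 55, 35, 60, 50, 45]
    after = [48, 57, 42, 61, 58, 50]
    dz = cohens_dz(before, after)
    print("d_z:", round(dz, 3))
    print("approx. power at n=6:", round(approx_power_paired(dz, len(before)), 3))
```

Running the same power function with a smallest effect size of interest, instead of the observed one, is one way to judge whether a null result on objective performance could simply reflect limited sensitivity.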

Circularity Check

0 steps flagged

No circularity: purely empirical hypothesis-testing study

The paper reports results from two controlled studies (N=34) testing three hypotheses on workload history effects in a simulated drone monitoring task. No mathematical derivations, equations, fitted parameters, or predictions appear in the central claims. Workload residue and reset interpretations are direct inferences from measured subjective ratings across discrete episodes, with no self-citation chains or ansatzes invoked to justify the findings. The analysis is self-contained against external benchmarks and contains no load-bearing reductions to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Claims rest on empirical data from controlled simulations rather than theoretical derivations; relies on standard assumptions of behavioral measurement scales and statistical testing.

axioms (2)
  • domain assumption: Subjective workload scales validly capture mental load in monitoring tasks.
    The study depends on participants' self-reported workload to distinguish residue and reset effects.
  • standard math: Standard statistical assumptions for comparing conditions across episodes.
    Used to interpret differences in workload and performance measures.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

  1. [1]

    Abioye, A. O., Landowska, A., Hunt, W., Maior, H., Ramchurn, S. D., Naiseh, M., Banks, A., and Soorati, M. D. (2024). Adaptive human-swarm interaction based on workload measurement using functional near-infrared spectroscopy

  2. [2]

    Adams, J., Hamell, J., and Walker, P. (2023). Can a single human supervise a swarm of 100 heterogeneous robots? Field Robotics

  3. [3]

    Chandarana, M., Lewis, M., Sycara, K., and Scherer, S. (2018). Determining effective swarm sizes for multi-job type missions. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4848–4853

  4. [4]

    Devlin, S. P., Moacdieh, N. M., Wickens, C. D., and Riggs, S. L. (2020). Transitions between low and high levels of mental workload can improve multitasking performance. IISE transactions on occupational ergonomics and human factors, 8(2):72–87

  5. [5]

    Divband Soorati, M., Clark, J., Ghofrani, J., Tarapore, D., and Ramchurn, S. D. (2021). Designing a user-centered interaction interface for human–swarm teaming. Drones, 5(4):131

  6. [6]

    Duchevet, A., Imbert, J.-P., Garcia, J., Lamirault, B., and Causse, M. (2025). Investigating the independent and combined effects of startle and surprise in a simulated flight task. Human Factors, 67(11):1170–1187. PMID: 40373188

  7. [7]

    Fuenzalida, E. (2007). Effect of workload history on task performance. Human factors, 49:277–91

  8. [8]

    Harriott, C. E., Seiffert, A. E., Hayes, S. T., and Adams, J. A. (2014). Biologically-inspired human-swarm interaction metrics. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 58, pages 1471–1475

  9. [9]

    Hart, S. G. and Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Hancock, P. A. and Meshkati, N., editors, Human Mental Workload, pages 139–183. North-Holland

  10. [10]

    Humann, J. and Pollard, K. A. (2019). Human factors in the scalability of multi-robot operation: A review and simulation. In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pages 700–707

  11. [11]

    Jansen, R. J., Sawyer, B. D., Van Egmond, R., De Ridder, H., and Hancock, P. A. (2016). Hysteresis in mental workload and task performance: the influence of demand transitions and task prioritization. Human factors, 58(8):1143–1157

  12. [12]

    Kaduk, J., Cavdan, M., Drewing, K., Vatakis, A., and Hamann, H. (2023). Effects of human-swarm interaction on subjective time perception: Swarm size and speed. In Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, pages 456–465

  13. [13]

    Kaduk, J., Cavdan, M., Drewing, K., Vatakis, A., and Hamann, H. (2024). From one to many: How active robot swarm sizes influence human cognitive processes. In 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)

  14. [14]

    Kolling, A., Sycara, K., Nunnally, S., and Lewis, M. (2013). Human swarm interaction: An experimental study of two types of interaction with foraging swarms. Journal of Human-Robot Interaction, 2(2):103–129

  15. [15]

    Kolling, A., Walker, P., Chakraborty, N., Sycara, K., and Lewis, M. (2015). Human interaction with robot swarms: A survey. IEEE Transactions on Human-Machine Systems, 46(1):9–26

  16. [16]

    Lyons, J. B., Capiola, A., Adams, J. A., Mator, J. D., Cherry, E., and Barrera, K. (2025). Examining the human-centred challenges of human–swarm interaction. Philosophical Transactions A, 383(2289):20240140

  17. [17]

    Marois, A., Mouratille, D., Pratviel, Y., Chamberland, C., and Tremblay, S. (2024). Using cardiac and electrodermal activity as cognitive markers for interruptions and distraction in a surveillance simulation. In Neuroergonomics and Cognitive Engineering (AHFE Conference Proceedings)

  18. [18]

    Meyer, J., Pinosky, A., Trzpit, T., Colgate, E., and Murphey, T. D. (2022). A game benchmark for real-time human-swarm control. In 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), pages 743–750

  19. [19]

    Morrow, J. and Zawodniok, M. (2024). Evaluation of the human-robot-interaction dynamic under mental fatigue constraints in search and rescue operations. In 2024 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), pages 1–7. IEEE

  20. [20]

    Pendleton, B. and Goodrich, M. (2013). Scalable human interaction with robotic swarms. In AIAA Infotech@Aerospace (I@A) Conference

  21. [21]

    Ramchurn, S. D., Huynh, T. D., Wu, F., Ikuno, Y., Flann, J., Moreau, L., Fischer, J. E., Jiang, W., Rodden, T., Simpson, E., et al. (2016). A disaster response system based on human-agent collectives. Journal of Artificial Intelligence Research, 57:661–708

  22. [22]

    Reynolds, C. W. (1987). Flocks, herds and schools: A distributed behavioral model. In Proceedings of the 14th annual conference on Computer graphics and interactive techniques, pages 25–34

  23. [23]

    Singh, S. (2025). Optimizing human-machine interfaces for neuroergonomics: Cognitive workload and performance in suas operations. In Human-Computer Interaction & Emerging Technologies (AHFE Conference Proceedings)

  24. [24]

    Soorati, M. D., Naiseh, M., Hunt, W., Parnell, K., Clark, J., and Ramchurn, S. D. (2024). Enabling trustworthiness in human-swarm systems through a digital twin. In Putting AI in the Critical Loop, pages 93–125. Elsevier

  25. [25]

    St-Onge, D., Kaufmann, M., Panerati, J., Ramtoula, B., Cao, Y., Coffey, E. B., and Beltrame, G. (2019). Planetary exploration with robot teams: Implementing higher autonomy with swarm intelligence. IEEE Robotics & Automation Magazine, 27(2):159–168

  26. [26]

    Watson, D. and Clark, L. A. (1994). The panas-x: Manual for the positive and negative affect schedule-expanded form.