arxiv: 1907.01925 · v2 · pith:QP5XPH5Nnew · submitted 2019-07-03 · 💻 cs.HC

Multitasking with Alexa: How Using Intelligent Personal Assistants Impacts Language-based Primary Task Performance

Pith reviewed 2026-05-25 09:53 UTC · model grok-4.3

classification 💻 cs.HC
keywords intelligent personal assistants · multitasking · dual-task paradigm · writing tasks · cognitive resources · multiple resource theory · human-computer interaction

The pith

Using intelligent personal assistants disrupts content generation in writing more than copying.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how interacting with IPAs affects performance on language-based primary tasks using a dual-task setup. It compares copying content, a less demanding task, to generating new content, which is more demanding and shares resources with speaking to an IPA. The results show greater disruption for the generation task, explained by cognitive resource theories like multiple resource theory and working memory. This matters because IPAs are marketed for multitasking but may hinder certain types of work.

Core claim

In experiments using a dual-task paradigm, IPA interactions disrupted performance on content-generating writing tasks significantly more than on copying tasks, because content generation shares more of the cognitive resources needed for IPA use.

What carries the argument

Dual-task paradigm with two writing primary tasks: copying versus generating content, during IPA interactions.

If this is right

  • Content generation writing is more impaired by concurrent IPA use than copying.
  • Multiple resource theory and working memory explain why language-based tasks vary in susceptibility to IPA interference.
  • Future studies should examine how interruption length, relevance, and timing affect primary task performance.
  • IPA design may need to account for the cognitive demands of primary tasks to minimize disruption.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers of IPAs could prioritize minimizing interruptions during creative tasks.
  • Voice interfaces might be better suited to low-demand primary tasks.
  • Similar effects might appear in other multitasking scenarios involving speech and writing.

Load-bearing premise

That the observed differences in disruption between tasks are caused by shared cognitive resources rather than variations in interruption timing, length, or user familiarity with the tasks.

What would settle it

An experiment that controls for interruption timing and length and finds no difference in disruption between copying and generating tasks would falsify the claim.

Figures

Figures reproduced from arXiv: 1907.01925 by Benjamin R. Cowan, He Liu, Justin Edwards, Leigh Clark, Philip Doyle, Sandy J. J. Gould, Tianyu Zhou.

Figure 1: Mean Raw TLX score (with standard error) for ...
Figure 2: Mean interruption lag (with standard error) for ...
Figure 3: Mean resumption lag (with standard error) for ...
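
Figure 1 reports a Raw TLX score. As a reminder of what that measure usually means, the sketch below computes Raw TLX as the unweighted mean of the six NASA-TLX subscale ratings. The function name `raw_tlx`, the dictionary keys, and the ratings are hypothetical and chosen for illustration; they are not taken from the paper, and the 0-100 rating scale is an assumption.

```python
from statistics import mean

# The six NASA-TLX subscales. The key names are illustrative assumptions.
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Return the Raw TLX score: the unweighted mean of the six subscale ratings."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    return mean(ratings[name] for name in SUBSCALES)


# Hypothetical ratings for one participant (assumed 0-100 scale), not study data.
example_ratings = {
    "mental_demand": 70,
    "physical_demand": 10,
    "temporal_demand": 50,
    "performance": 40,
    "effort": 60,
    "frustration": 40,
}

print(raw_tlx(example_ratings))
```

Raw TLX skips the pairwise weighting step of the full NASA-TLX procedure, which is why a plain mean is enough here.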
Original abstract

Intelligent personal assistants (IPAs) are supposed to help us multitask. Yet the impact of IPA use on multitasking is not clearly quantified, particularly in situations where primary tasks are also language based. Using a dual task paradigm, our study observes how IPA interactions impact two different types of writing primary tasks: copying and generating content. We found writing tasks that involve content generation, which are more cognitively demanding and share more of the resources needed for IPA use, are significantly more disrupted by IPA interaction than less demanding tasks such as copying content. We discuss how theories of cognitive resources, including multiple resource theory and working memory, explain these results. We also outline the need for future work on how interruption length and relevance may impact primary task performance as well as the need to identify effects of interruption timing in user and IPA led interruptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript reports a dual-task experiment on the effects of IPA (Alexa) interactions on two language-based primary tasks: copying text versus generating content. It claims that content-generation tasks are significantly more disrupted by IPA use than copying tasks because they are more cognitively demanding and share more resources with IPA interaction, consistent with multiple resource theory and working memory models. The authors discuss theoretical implications and flag the need for future work on interruption length, relevance, and timing.

Significance. If the differential disruption result holds after proper controls and statistical reporting, the work would offer empirical support for resource-competition accounts of voice-assistant multitasking in language tasks and could inform IPA interface design. The contribution is modest in scope, however, because the study is purely observational and the abstract already identifies the key confounds as open questions.

major comments (2)
  1. [Abstract] The claim that content-generation tasks 'are significantly more disrupted' supplies no sample size, statistical test, p-value, effect size, or error bars, so the central empirical result cannot be evaluated from the text provided.
  2. [Abstract] The explicit statement that future work is needed on 'interruption length and relevance' and 'interruption timing in user and IPA led interruptions' indicates these factors were not controlled or matched between the copying and generation conditions; this directly undermines the attribution of the performance gap to resource overlap rather than to differences in interruption properties.
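
The reporting items the referee lists (sample size, test statistic, effect size, standard error) can be illustrated with a minimal sketch. It assumes a within-subjects design in which each participant contributes one score per task, which the text above does not confirm; the paper's own analysis may differ. The function name `paired_summary` and all numbers are hypothetical placeholders, not study data. No p-value is computed, because that would need a t-distribution routine outside the standard library.

```python
import math
from statistics import mean, stdev


def paired_summary(copy_scores, generate_scores):
    """Summarise a paired comparison of a disruption measure across two tasks.

    Assumes each position holds one participant's scores in both tasks.
    """
    if len(copy_scores) != len(generate_scores):
        raise ValueError("both conditions need one score per participant")
    n = len(copy_scores)
    if n < 2:
        raise ValueError("need at least two participants")

    # Per-participant difference: generation minus copying.
    diffs = [g - c for c, g in zip(copy_scores, generate_scores)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("differences have zero variance")

    se_diff = sd_diff / math.sqrt(n)
    return {
        "n": n,
        "mean_diff": mean_diff,
        "se_diff": se_diff,
        "t": mean_diff / se_diff,        # paired t statistic
        "df": n - 1,                     # degrees of freedom
        "cohens_dz": mean_diff / sd_diff,  # standardised paired effect size
    }


# Hypothetical resumption lags in seconds, for illustration only.
copy_lags = [1.0, 2.0, 3.0, 4.0]
generate_lags = [2.0, 4.0, 4.0, 6.0]

summary = paired_summary(copy_lags, generate_lags)
for key, value in summary.items():
    print(key, value)
```

A summary of this kind gives a reader enough to judge the size and precision of the difference, not just whether it crossed a significance threshold.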

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments. We address each major point below and have prepared revisions to strengthen the abstract.

Point-by-point responses
  1. Referee: [Abstract] The claim that content-generation tasks 'are significantly more disrupted' supplies no sample size, statistical test, p-value, effect size, or error bars, so the central empirical result cannot be evaluated from the text provided.

    Authors: We agree that the abstract should report these details for transparency. In the revised version we will add the sample size, the statistical test performed, exact p-value, effect size, and reference to variability measures from the results section. revision: yes

  2. Referee: [Abstract] The explicit statement that future work is needed on 'interruption length and relevance' and 'interruption timing in user and IPA led interruptions' indicates these factors were not controlled or matched between the copying and generation conditions; this directly undermines the attribution of the performance gap to resource overlap rather than to differences in interruption properties.

    Authors: We disagree. The study used identical IPA interaction scripts in both primary-task conditions, holding interruption length, relevance, and timing constant by design. The observed difference is therefore attributable to primary-task resource demands. The abstract flags future work on variations of these factors (e.g., longer or user-initiated interruptions), not on the matched controls already implemented. We will revise the abstract to state explicitly that interruption properties were matched across conditions. revision: no

Circularity Check

0 steps flagged

Purely empirical study; no derivations or load-bearing self-citations

Full rationale

This is an experimental HCI paper reporting observed performance differences in a dual-task paradigm (copying vs. content-generation writing under IPA interruption). The abstract and description contain no equations, fitted parameters, model predictions, uniqueness theorems, or ansatzes. Results are presented as direct empirical observations, with explicit notes on future work for confounds such as interruption timing. No self-citation chains or renamings of known results appear as load-bearing steps. The central claim rests on data collection rather than any reduction to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the dual-task method and the mapping of task demands onto cognitive resource pools; no free parameters, invented entities, or non-standard axioms are introduced in the abstract.

axioms (1)
  • domain assumption: Dual-task performance differences can be attributed to overlap in cognitive resources (multiple resource theory and working memory limits).
    Invoked to explain why content generation is more disrupted than copying.

pith-pipeline@v0.9.0 · 5691 in / 1112 out tokens · 33503 ms · 2026-05-25T09:53:23.187343+00:00 · methodology

