pith. machine review for the scientific record.

arxiv: 2603.02050 · v4 · submitted 2026-03-02 · 💻 cs.HC · cs.AI

Recognition: no theorem link

"When to Hand Off, When to Work Together": Expanding Human-Agent Co-Creative Collaboration through Concurrent Interaction

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 18:05 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI
keywords concurrent interaction · human-agent collaboration · co-creative collaboration · delegation · context awareness · action patterns · design probe · decision model

The pith

Concurrent interaction lets agents treat user actions as feedback or independent work, improving delegation in visible shared workspaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that as agents execute visibly in shared spaces, collaboration shifts from sequential hand-offs to concurrent co-creation, requiring agents to interpret simultaneous user actions correctly. Two studies with ten participants each show that visibility prompts intervention, but agents need context awareness to distinguish feedback from parallel work. The authors introduce the CLEO probe to provide this awareness, revealing five action patterns, six triggers, four enabling factors, and concurrent interaction in 31.8 percent of turns. They deliver a decision model, design implications, and an annotated dataset. This matters because it reframes how delegation succeeds when humans and agents can act at the same time.

Core claim

Through user studies, the authors demonstrate that concurrent interaction arises naturally when agent execution is visible, occurring in 31.8 percent of turns, and that agents require collaborative context awareness to respond to it as either feedback or independent work. The CLEO probe supplies this capability and supports a taxonomy of five action patterns, along with the triggers and enabling factors that explain mode shifts. The result is a decision model and dataset that position concurrent interaction as the element that makes delegation more effective.

What carries the argument

CLEO, the design probe that supplies collaborative context awareness by classifying concurrent user actions as feedback versus independent parallel work and adapting agent execution accordingly.
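This review does not spell out CLEO's implementation, but the mechanism it names, classifying a concurrent user action as feedback on the agent's in-progress work versus independent parallel work and adapting execution accordingly, can be sketched. The Python below is a minimal illustration under an assumed working-set-overlap heuristic; every name in it (ActionKind, UserAction, AgentPlan, classify_action) is a hypothetical stand-in, not the paper's code.

```python
from dataclasses import dataclass
from enum import Enum, auto

class ActionKind(Enum):
    FEEDBACK = auto()          # user is reacting to the agent's in-progress work
    INDEPENDENT_WORK = auto()  # user is working in parallel elsewhere

@dataclass
class UserAction:
    target_ids: set[str]  # canvas elements the user just touched

@dataclass
class AgentPlan:
    active_ids: set[str]  # canvas elements the agent is currently editing
    steps: list[str]      # remaining plan steps

def classify_action(action: UserAction, plan: AgentPlan) -> ActionKind:
    # Assumed heuristic: actions overlapping the agent's working set read as
    # feedback; disjoint actions read as independent work. A real system would
    # also weigh timing, action content, and dialogue context.
    if action.target_ids & plan.active_ids:
        return ActionKind.FEEDBACK
    return ActionKind.INDEPENDENT_WORK

def adapt_execution(action: UserAction, plan: AgentPlan) -> AgentPlan:
    if classify_action(action, plan) is ActionKind.FEEDBACK:
        # Re-plan around the user's correction instead of overwriting it.
        return AgentPlan(plan.active_ids - action.target_ids,
                         ["revise_around_user_edit"] + plan.steps)
    # Independent work: keep executing, but stay out of the user's territory.
    return AgentPlan(plan.active_ids - action.target_ids, plan.steps)
```

The point the sketch carries is that the same user event routes to two different agent behaviors, which is exactly the capability gap Study 1 identified.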

If this is right

  • A taxonomy of five action patterns and ten codes describes how users and agents interact concurrently.
  • Six triggers and four enabling factors explain when users shift between delegation and joint-work modes.
  • A decision model guides agents on when to hand off versus collaborate in real time (a minimal sketch follows this list).
  • Design implications specify agent capabilities needed for visible, concurrent workflows.
  • An annotated dataset supports further research on these interaction patterns.
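The decision model itself (Figure 6) is not reproduced in this review. As a rough illustration of the decision-model bullet above, the sketch below expresses a hand-off-versus-collaborate choice in code; the three boolean predicates are hypothetical stand-ins, not the paper's six triggers or four enabling factors.

```python
from enum import Enum, auto

class Mode(Enum):
    HAND_OFF = auto()       # delegate and let the agent run unattended
    WORK_TOGETHER = auto()  # collaborate concurrently in the shared workspace

def choose_mode(task_is_well_specified: bool,
                user_wants_fine_control: bool,
                agent_work_is_visible: bool) -> Mode:
    # Hypothetical predicates standing in for the paper's triggers and
    # enabling factors; the published model (Figure 6) is richer than this.
    if task_is_well_specified and not user_wants_fine_control:
        return Mode.HAND_OFF
    if agent_work_is_visible:
        return Mode.WORK_TOGETHER
    return Mode.HAND_OFF
```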

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same concurrent-interpretation approach could apply to non-creative domains such as code review or data exploration where real-time corrections matter.
  • Agents might learn the identified triggers over repeated sessions to anticipate user interventions without explicit signals.
  • Multi-user scenarios could extend the model to coordinate among several people and agents acting simultaneously.
  • Systems incorporating these patterns may reduce reliance on explicit commands and lower coordination overhead in creative tools.

Load-bearing premise

Patterns and triggers observed with ten participants in the chosen tasks will generalize to other users and tasks, and correctly interpreting concurrent actions produces better collaboration outcomes.

What would settle it

A larger study across diverse tasks finding no improvement in task quality, completion time, or user satisfaction when agents use concurrent-action interpretation versus treating all input as sequential commands would falsify the claim.
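For a rough sense of the scale such a settling study would need, the sketch below sizes a two-condition between-subjects comparison (concurrent-action interpretation versus sequential-command handling) with a standard power analysis. The medium effect size is an assumption for illustration, not a figure from the paper.

```python
from statsmodels.stats.power import TTestIndPower

# Size a two-condition comparison on one outcome measure, e.g. task quality.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,          # assumed medium effect (Cohen's d)
    alpha=0.05,               # two-sided significance level
    power=0.8,                # conventional target power
    alternative="two-sided",
)
print(f"~{n_per_group:.0f} participants per condition")  # roughly 64
```

Under these assumptions each condition needs roughly 64 participants, an order of magnitude beyond the N=10 studies reported, which is what makes this a genuine test rather than a replication.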

Figures

Figures reproduced from arXiv: 2603.02050 by DaEun Choi, Hyewon Lee, HyunJoon Jung, John Joon Young Chung, Juho Kim, Kihoon Son, Tae Soo Kim, Yoonjoo Lee, Yoonsu Kim.

Figure 1: An example of collaborative friction in human…
Figure 2: Interaction flow of the first probe system. (a) The user sends a voice message (b) while selecting canvas elements for…
Figure 3: Flexible co-creation scenario with Cleo. (a) The user invokes Cleo by calling its name. (b) While Cleo is acting on the canvas, the user can (c) work concurrently on the same task, or (d) intervene directly with additional instructions at any time.
Figure 4: Agent structure of Cleo. Three modules have been updated from the first probe pipeline (Appendix A.4): User Change Detection Module, Attribution Change Module, and Plan Update Module.
Figure 5: How Cleo achieves context awareness. (a) The user invokes the agent. (b) It recalls past observations and detects user updates against the current state. (c) It monitors concurrent user actions while executing its plan, prioritizes interventions, and updates its plan as needed.
Figure 6: Decision model of human-agent co-creative collaboration.
Figure 7: Agent structure of the first probe system. Upon receiving user input, the Plan Module generates an action plan.
Figure 8: Visualization of interaction logs. (a) The overall task process is segmented into periods when the agent is activated and…
Figure 9: Visual representation of user action patterns.
Original abstract

As agents move into shared workspaces and their execution becomes visible, human-agent collaboration faces a fundamental shift from sequential delegation to concurrent co-creation. This raises a new coordination problem: what interaction patterns emerge, and what agent capabilities are required to support them? Study 1 (N=10) revealed that process visibility naturally prompted concurrent intervention, but exposed a critical capability gap: agents lacked the collaborative context awareness needed to distinguish user feedback from independent parallel work. This motivated CLEO, a design probe that embodies this capability, interpreting concurrent user actions as feedback or independent work and adapting execution accordingly. Study 2 (N=10) analyzed 214 turn-level interactions, identifying a taxonomy of five action patterns and ten codes, along with six triggers and four enabling factors explaining when and why users shift between collaboration modes. Concurrent interaction appeared in 31.8% of turns. We present a decision model, design implications, and an annotated dataset, positioning concurrent interaction as what makes delegation work better.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that visible agent execution in shared workspaces shifts human-agent collaboration from sequential delegation to concurrent co-creation. Study 1 (N=10) shows process visibility prompts concurrent intervention but reveals agents lack context awareness to distinguish feedback from independent work. This motivates CLEO, a probe that interprets concurrent actions accordingly. Study 2 (N=10) codes 214 turns into a taxonomy of five action patterns and ten codes, plus six triggers and four enabling factors; concurrent interaction occurs in 31.8% of turns. The work presents a decision model, design implications, and an annotated dataset, positioning concurrent interaction as improving delegation.

Significance. If the taxonomy and triggers generalize, the work offers a useful descriptive framework for coordination in co-creative human-agent systems and supplies an annotated dataset that supports future replication. The emphasis on concurrent modes and the CLEO probe concept add to HCI literature on mixed-initiative collaboration. However, the absence of performance metrics or controlled comparisons means the claim that concurrent interaction 'makes delegation work better' remains interpretive rather than demonstrated.

major comments (2)
  1. [Study 2, 214 turns from N=10] The positioning that concurrent interaction enables better delegation lacks any quantitative outcome measures (task time, quality, error rates) or statistical comparison to a sequential baseline condition; the 31.8% rate and taxonomy are descriptive only and do not establish benefit.
  2. [Abstract and Discussion] Generalizability claims rest on N=10 per study with no discussion of task specificity or participant pool limitations; the decision model and design implications therefore require explicit caveats and proposed validation experiments before they can be treated as actionable.
minor comments (2)
  1. [Study 2] Clarify how the ten codes were derived and validated (e.g., inter-rater reliability statistics; a computation sketch follows this list) to strengthen the taxonomy presentation.
  2. [Abstract] The abstract states 'positioning concurrent interaction as what makes delegation work better' without supporting data; revise phrasing to reflect the descriptive nature of the findings.
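To make the statistics behind these comments concrete: 31.8% of 214 coded turns implies roughly 68 concurrent-interaction turns, whose uncertainty a Wilson interval captures, and agreement on the coding scheme could be reported as Cohen's kappa. A minimal sketch follows; the two coders' label lists are hypothetical placeholders, not the paper's annotations.

```python
from statsmodels.stats.proportion import proportion_confint
from sklearn.metrics import cohen_kappa_score

# 31.8% of 214 coded turns implies about 68 concurrent-interaction turns.
n_turns = 214
k = round(0.318 * n_turns)  # 68
lo, hi = proportion_confint(k, n_turns, alpha=0.05, method="wilson")
print(f"{k}/{n_turns} concurrent turns; 95% Wilson CI [{lo:.3f}, {hi:.3f}]")

# Inter-rater reliability on the coding scheme, reported as Cohen's kappa.
# These labels are illustrative placeholders, not the paper's data.
coder_a = ["feedback", "parallel", "feedback", "parallel", "feedback"]
coder_b = ["feedback", "parallel", "parallel", "parallel", "feedback"]
print(f"Cohen's kappa: {cohen_kappa_score(coder_a, coder_b):.2f}")
```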

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Study 2, 214 turns from N=10] The positioning that concurrent interaction enables better delegation lacks any quantitative outcome measures (task time, quality, error rates) or statistical comparison to a sequential baseline condition; the 31.8% rate and taxonomy are descriptive only and do not establish benefit.

    Authors: We agree that the manuscript currently positions concurrent interaction as improving delegation in an interpretive manner without supporting quantitative metrics or baseline comparisons. The study is exploratory and provides a descriptive taxonomy, patterns, and triggers from observed interactions. We will revise the abstract, discussion, and conclusion to remove or qualify the claim that concurrent interaction 'makes delegation work better,' instead framing the contribution as an initial descriptive framework and dataset that motivates future controlled experiments measuring task time, quality, and error rates. revision: yes

  2. Referee: [Abstract and Discussion] Generalizability claims rest on N=10 per study with no discussion of task specificity or participant pool limitations; the decision model and design implications therefore require explicit caveats and proposed validation experiments before they can be treated as actionable.

    Authors: We accept this point. The small sample sizes and the specific co-creative task context limit generalizability, and the manuscript does not currently discuss these constraints in sufficient detail. We will add an explicit limitations subsection covering sample size, participant characteristics, and task specificity. We will also outline proposed validation experiments, including larger-scale studies with diverse participants and controlled comparisons of concurrent versus sequential conditions on outcome measures. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical study grounded in observed interaction data

Full rationale

The paper reports two small-N user studies (N=10 each) that collect and qualitatively code 214 turns of human-agent interaction to derive a taxonomy of action patterns, triggers, and enabling factors. No mathematical derivations, equations, fitted parameters, or predictive models are present. Claims about concurrent interaction (31.8% of turns) and the design implications follow directly from interpretive coding of the collected data rather than from any self-referential reduction or self-citation chain. The argument therefore contains no load-bearing steps that collapse to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on empirical observations from two small user studies and the introduction of a design probe without external benchmarks, formal models, or independent validation data.

axioms (2)
  • domain assumption: Process visibility naturally prompts concurrent user intervention.
    Stated as a key observation from Study 1.
  • domain assumption: Agents require collaborative context awareness to distinguish feedback from independent parallel work.
    Core motivation for developing the CLEO probe.
invented entities (1)
  • CLEO · no independent evidence
    purpose: Design probe that interprets concurrent user actions as feedback or independent work and adapts agent execution accordingly.
    Introduced to address the capability gap identified in Study 1.

pith-pipeline@v0.9.0 · 5510 in / 1401 out tokens · 67337 ms · 2026-05-15T18:05:57.993197+00:00 · methodology

