
arXiv: 2604.15344 · v1 · submitted 2026-03-15 · 💻 cs.HC · cs.AI · cs.IR · cs.LG

To LLM, or Not to LLM: How Designers and Developers Navigate LLMs as Tools or Teammates

Pith reviewed 2026-05-15 11:54 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.IR · cs.LG
keywords large language models · human-AI collaboration · design workflows · accountability · role framing · tool vs teammate · grounded theory · organizational decision making

The pith

Designers and developers position LLMs as either controllable tools or collaborative teammates, which determines how they assign authority, accountability, and oversight in workflows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Interviews with 33 designers and developers across three large tech organizations show that people do not evaluate LLMs only by technical ability. Instead they reason about the role an LLM would occupy relative to human work. When framed as a tool under clear human control, use fits existing governance and feels acceptable. When framed as a teammate with ambiguous agency, participants hesitate unless responsibility can be explicitly justified. The paper supplies an analytic rubric that links these framings to concrete effects on decision authority, accountability ownership, oversight strategies, and organisational acceptability. The work therefore treats the question of LLM use as a sociotechnical positioning choice made during design rather than a post-deployment technical judgment.

Core claim

Participants consistently framed LLMs either as tools that remain under direct human control or as teammates that share or blur agency with humans. Tool framings preserved clear lines of decision authority and accountability ownership, allowing integration into existing organisational structures. Teammate framings raised concerns about ambiguous responsibility for outcomes, though some participants described productive teammate arrangements when explicit oversight mechanisms were kept in place. The resulting analytic rubric maps role framing directly onto shifts in authority, accountability, oversight strategies, and organisational acceptability.

What carries the argument

The tool and teammate framings together with the analytic rubric that shows how each framing alters decision authority, accountability ownership, oversight strategies, and organisational acceptability.

If this is right

  • Tool-framed LLMs can be adopted inside existing governance structures without new accountability rules.
  • Teammate-framed LLMs require explicit oversight structures to remain organisationally acceptable.
  • Ambiguous agency in teammate framings blocks clear justification of responsibility for outcomes.
  • Productive teammate use emerges only when collaborative reasoning stays embedded in human oversight.
  • The choice between framings is made at design time and shapes downstream workflow integration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interface designs that make the intended role framing explicit could reduce hesitation around LLM use.
  • The same framing logic may appear in other knowledge-work domains such as legal or medical drafting.
  • Organisations could codify role-framing guidelines to speed consistent decision-making about new models.
  • Training materials that teach designers to articulate their chosen framing might improve accountability documentation.

Load-bearing premise

The framings observed in interviews with 33 participants from three large technology organisations represent stable, general patterns that apply beyond these specific contexts and without major influence from unexamined organisational cultures or recruitment biases.

What would settle it

A replication study in a different industry or smaller organisation that finds participants rely on entirely different role categories or show no consistent distinction between tool and teammate framings when deciding on LLM use.

Figures

Figures reproduced from arXiv: 2604.15344 by Ivan Flechais, Marina Jirotka, Nigel Shadbolt, Varad Vishwarupe.

Figure 1 (p. 3): Role positioning of LLMs as tools or teammates in design
Figure 2 (p. 4): Sociotechnical tension matrix mapping LLM agency against ...
Figure 1 (p. 6): The paper is intended to stand independently of both appen...

Original abstract

Large language models (LLMs) are increasingly integrated into design and development workflows, yet decisions about their use are rarely binary or purely technical. We report findings from a constructivist grounded theory study based on interviews with 33 designers and developers across three large technology organisations. Rather than evaluating LLMs solely by capability, participants reasoned about the role an LLM could occupy within a workflow and how that role would interact with existing structures of responsibility and organisational accountability. When LLMs were framed as tools under clear human control, their use was typically acceptable and could be integrated within existing governance structures. When framed as teammates with shared or ambiguous agency, practitioners expressed hesitation, particularly when responsibility for outcomes could not be clearly justified. At the same time, participants also described productive teammate configurations in which LLMs supported collaborative reasoning while remaining embedded within explicit oversight structures. We identify tool and teammate framings as recurring ways in which designers and developers position LLMs relative to human work and present an analytic rubric describing how role framing shapes decision authority, accountability ownership, oversight strategies, and organisational acceptability. By foregrounding design-time reasoning, this work reframes To LLM or Not to LLM as a sociotechnical positioning problem that emerges during system design rather than during post-deployment evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports findings from a constructivist grounded theory study of 33 interviews with designers and developers across three large technology organizations. It identifies recurring 'tool' and 'teammate' framings of LLMs relative to human work and presents an analytic rubric showing how these framings shape decision authority, accountability ownership, oversight strategies, and organizational acceptability. The central claim reframes LLM adoption decisions as sociotechnical positioning problems arising during system design.

Significance. If the patterns and rubric hold, the work offers a useful analytic lens for HCI research on AI integration in professional workflows, foregrounding organizational accountability structures over purely technical capability assessments and potentially informing design guidelines for LLM use.

major comments (2)
  1. [Methods] Methods section: The abstract outlines a constructivist grounded theory approach with 33 interviews but provides no details on coding procedures, saturation criteria, or resolution of contradictions in participant accounts. These omissions are load-bearing for the central claim that tool and teammate framings are recurring and that the rubric reliably captures their effects on authority, accountability, and acceptability.
  2. [Findings / Discussion] Findings and discussion: The rubric is presented as transferable without explicit scope limitations or cross-context validation; the sample is restricted to three large technology organizations, where accountability structures and risk tolerance may systematically differ from startups, smaller firms, or non-tech domains, weakening the claim that the framings represent stable patterns of reasoning.
minor comments (2)
  1. [Abstract] Abstract: The final sentence is long and compound; splitting it would improve readability while preserving the reframing claim.
  2. [Findings] Terminology: 'Teammate' framing is used both for ambiguous agency (hesitation) and for productive collaborative configurations; a brief clarification of subtypes early in the findings would reduce potential reader confusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major point below and have prepared revisions to improve methodological transparency and clarify the scope of our claims.

Point-by-point responses
  1. Referee: [Methods] Methods section: The abstract outlines a constructivist grounded theory approach with 33 interviews but provides no details on coding procedures, saturation criteria, or resolution of contradictions in participant accounts. These omissions are load-bearing for the central claim that tool and teammate framings are recurring and that the rubric reliably captures their effects on authority, accountability, and acceptability.

    Authors: We agree that the Methods section requires greater detail to support the claims. Although the full manuscript describes the overall constructivist grounded theory approach and interview protocol, it does not explicitly document the coding procedures, saturation criteria, or how contradictions across accounts were resolved. In the revised manuscript we will expand the Methods section to include these elements: a description of the iterative open and axial coding process, the criteria used to determine theoretical saturation, and the constant-comparison techniques employed to reconcile divergent participant accounts.

    revision: yes

  2. Referee: [Findings / Discussion] Findings and discussion: The rubric is presented as transferable without explicit scope limitations or cross-context validation; the sample is restricted to three large technology organizations, where accountability structures and risk tolerance may systematically differ from startups, smaller firms, or non-tech domains, weakening the claim that the framings represent stable patterns of reasoning.

    Authors: We accept the need for clearer scope limitations. The study is indeed confined to three large technology organizations, and we will add an explicit limitations paragraph in the Discussion that notes potential differences in accountability structures, risk tolerance, and governance practices in startups, smaller firms, or non-technology domains. At the same time, we maintain that the tool and teammate framings are presented as recurring patterns observed within the sampled contexts rather than as universally stable across all organizational settings; the rubric is offered as an analytic lens for examining sociotechnical positioning, not as a validated general model. We will revise the language in the Findings and Discussion to avoid any implication of broad transferability without further empirical validation.

    revision: partial

Circularity Check

0 steps flagged

No circularity: empirical grounded theory analysis of interview data

full rationale

The paper reports findings from a constructivist grounded theory study of 33 interviews across three organizations. It identifies recurring tool and teammate framings and presents an analytic rubric describing their effects on authority, accountability, oversight, and acceptability. No mathematical derivations, parameter fitting, or self-citation chains exist; the central claims are direct summaries of observed participant reasoning rather than outputs that reduce to the inputs by construction. The analysis is self-contained as an empirical report of sociotechnical positioning patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the methodological validity of constructivist grounded theory applied to the interview transcripts; no free parameters, quantitative fits, or new postulated entities are introduced.

axioms (1)
  • domain assumption: Constructivist grounded theory provides a valid way to surface practitioners' reasoning about technology roles from interview data.
    The study explicitly adopts this approach to derive the tool/teammate framings and rubric.



Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. [1]

    Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, ...

  2. [2]

    Gagan Bansal, Tongshuang Wu, and Joyce Zhou. 2021. Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 592, 16 pages. doi:10.1145/3411764.3445088

  3. [3]

    Anol Bhattacherjee. 2001. Understanding Information Systems Continuance: An Expectation-Confirmation Model. MIS Quarterly 25, 3 (2001), 351–370. doi:10.2307/3250921

  4. [4]

    Kathy Charmaz. 2014. Constructing Grounded Theory (2 ed.). SAGE Publications Ltd, London, United Kingdom

  5. [5]

    Malin Eiband, Daniel Buschek, Heinrich Hussmann, and Alexander Butz. 2018. Bringing Transparency Design into Practice. In Proceedings of the 23rd International Conference on Intelligent User Interfaces. Association for Computing Machinery, New York, NY, USA, 211–223. doi:10.1145/3172944.3172961

  6. [6]

    Kevin A. Hoff and Masooda Bashir. 2015. Trust in Automation: Integrating Empirical Evidence on Factors That Influence Trust. Human Factors 57, 3 (2015), 407–434. doi:10.1177/0018720814554227

  7. [7]

    Harmanpreet Kaur, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach, and Jennifer Wortman Vaughan. 2020. Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA...

  8. [8]

    John D. Lee and Katrina A. See. 2004. Trust in Automation: Designing for Appropriate Reliance. Human Factors 46, 1 (2004), 50–80. doi:10.1518/hfes.46.1.50_30392

  9. [9]

    Q. Vera Liao, Daniel Gruen, and Sarah Miller. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 416, 15 pages. doi:10.1145/3313831.3376590

  10. [10]

    Michael A. Madaio, Luke Stark, Jennifer Wortman Vaughan, and Hanna Wallach. 2020. Co-designing Checklists to Understand Organizational Challenges and Opportunities Around Fairness in AI. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 693, 14 pages. doi:10.1145/3313831.3376445

  11. [11]

    ... Guo, Robert DeLine, and Sumit Gulwani ...

  12. [12]

    Aoife O’Driscoll and Alan F. Blackwell. 2025. Social Norms, Social AI: Investigating the Effects of AI (Im)politeness and Gender on User Perception. In Proceedings of BCS Human-Computer Interaction Conference 2025. BCS Learning & Development Ltd. doi:10.14236/ewic/BCSHCI2025.66

  13. [13]

    Richard L. Oliver. 1980. A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions. Journal of Marketing Research 17, 4 (1980), 460–469

  14. [14]

    Samir Passi and Solon Barocas. 2019. Problem Formulation and Fairness. In Proceedings of the Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery, New York, NY, USA, 39–48. doi:10.1145/3287560.3287567

  15. [15]

    Haazique Sayyed, Meshari Alwazae, and Varad Vishwarupe. 2025. BlockSafe: Universal Blockchain-Based Identity Management. In Big Data in Finance: Transforming the Financial Landscape: Volume 2. Springer Nature Switzerland, Cham, 57–66

  16. [16]

    Ben Shneiderman. 2020. Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy. International Journal of Human-Computer Interaction 36, 6 (2020), 495–504. doi:10.1080/10447318.2020.1741118

  17. [17]

    Lucy A. Suchman. 2007. Human-Machine Reconfigurations: Plans and Situated Actions (2 ed.). Cambridge University Press, Cambridge, United Kingdom

  18. [18]

    Varad Vishwarupe, Mangesh Bedekar, Milind Pande, and Anil Hiwale. 2018. Intelligent Twitter Spam Detection: A Hybrid Approach. In Smart Trends in Systems, Security and Sustainability. Springer Singapore, Singapore, 189–197

  19. [19]

    Varad Vishwarupe, Alexander Hankey, Shailesh Pangaonkar, Shwetanshu Shekhar, R. Sheena Rani, Meshari Alwazae, Haazique Sayyed, Vishal Pawar, Vidya Kamma, and Priyanka Kuklani. 2025. Predicting Mental Health Ailments Using Social Media Activities and Keystroke Dynamics with Machine Learning. In Big Data in Finance: Transforming the Financial Landscape: Volu...

  20. [20]

    Varad Vishwarupe, Prachi Joshi, Shrey Maheshwari, Priyanka Kuklani, Prathamesh Shingote, Milind Pande, Vishal Pawar, and Aseem Deshmukh. 2023. Exploring Human Computer Interaction in Industry 4.0. In AI, IoT, Big Data and Cloud Computing for Industry 4.0. Springer, 21–38. doi:10.1007/978-3-031-29713-7_2

    CHI EA ’26, April 13–17, 2026, Barcelona, Spain Vish...

  21. [21]

    Varad Vishwarupe, Prachi M. Joshi, Nicole Mathias, Shrey Maheshwari, Shweta Mhaisalkar, and Vishal Pawar. 2022. Explainable AI and Interpretable Machine Learning: A Case Study in Perspective. Procedia Computer Science 204 (2022), 869–876. doi:10.1016/j.procs.2022.08.105

  22. [22]

    Varad Vishwarupe, Shrey Maheshwari, Aseem Deshmukh, Shweta Mhaisalkar, Prachi M. Joshi, and Nicole Mathias. 2022. Bringing Humans at the Epicenter of Artificial Intelligence: A Confluence of AI, HCI and Human Centered Computing. Procedia Computer Science 204 (2022), 914–921. doi:10.1016/j.procs.2022.08.111

  23. [23]

    I’m accountable, so I need to be able to stand behind the decision

    Saniya Zahoor, Mangesh Bedekar, Vinod Mane, and Varad Vishwarupe. 2016. Uniqueness in User Behavior While Using the Web. In Proceedings of the International Congress on Information and Communication Technology. Springer Singapore, Singapore, 221–228.

    A APPENDIX A: Interview Guide

    Interviews were semi-structured and adaptive. Questions were used flexibly ...