
arxiv: 2604.18096 · v1 · submitted 2026-04-20 · 💻 cs.HC · cs.AI · cs.IR · cs.LG

The Collaboration Gap in Human-AI Work

Pith reviewed 2026-05-10 04:14 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.IR · cs.LG
keywords human-AI collaboration · grounding conditions · LLM interactions · repair mechanisms · collaboration structures · interaction design · asymmetric repair

The pith

Stable collaboration with AI depends on the interaction's grounding conditions, not just model capability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that human-AI work with large language models often falls short because users must constantly diagnose misunderstandings and repair misaligned outputs. Drawing on interviews with designers and developers, it identifies three recurring patterns of interaction: one-shot assistance, weak collaboration where repair falls asymmetrically on the human, and grounded collaboration with shared assumptions. The key argument is that breakdowns occur when the surface appearance of partnership exceeds the actual capacity for establishing and maintaining common ground. This matters for anyone building or using AI tools because capability improvements alone will not close the gap without attention to how context and repairs are handled in the interaction.

Core claim

Drawing on a constructivist grounded theory analysis of 16 interviews, the authors argue that stable collaboration depends not only on model capability but on the interaction's grounding conditions. They distinguish three recurrent structures of human-AI work: one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration. Collaboration breaks down when the appearance of partnership outpaces the grounding capacity of the interaction.

What carries the argument

Grounding conditions in the interaction, which enable shared assumptions and symmetric repair of misalignments across the three identified structures of human-AI work.

If this is right

  • Design efforts focused solely on increasing model capability will leave persistent repair burdens on users in weak collaboration settings.
  • Interfaces that make grounding explicit, such as shared context summaries or assumption checks, could shift more interactions toward grounded collaboration.
  • Evaluation of LLM tools should measure repair effort and grounding failures rather than output quality alone.
  • The three structures provide a vocabulary for comparing collaboration experiences across programming, design, and analysis tasks.
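
The suggestion to measure repair effort rather than output quality alone can be illustrated with a small sketch. The paper does not define a metric; everything below is an assumption made for illustration: the `Turn` record, the `is_repair` annotation (imagined here as a manual label on each turn), and the two ratios `repair_share` and `human_repair_burden`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One turn in a human-AI session (hypothetical log format)."""
    speaker: str             # "human" or "ai"
    is_repair: bool = False  # assumed annotation: the turn fixes a misunderstanding


def repair_share(turns: List[Turn]) -> float:
    """Fraction of all turns that are spent on repair."""
    if not turns:
        return 0.0
    return sum(t.is_repair for t in turns) / len(turns)


def human_repair_burden(turns: List[Turn]) -> float:
    """Fraction of repair turns that the human had to initiate."""
    repairs = [t for t in turns if t.is_repair]
    if not repairs:
        return 0.0
    return sum(t.speaker == "human" for t in repairs) / len(repairs)


# Synthetic example session, not data from the paper.
session = [
    Turn("human"), Turn("ai"),
    Turn("human", is_repair=True), Turn("ai"),
    Turn("human", is_repair=True), Turn("ai", is_repair=True),
]

print(repair_share(session))
print(human_repair_burden(session))
```

Under these assumptions, a session with a high `human_repair_burden` would resemble what the paper calls weak collaboration with asymmetric repair, while a more even split would be closer to grounded collaboration. Where to draw the line between the two is left open.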

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers might reduce user frustration by defaulting to one-shot assistance modes unless explicit grounding features are enabled.
  • The framework could extend to non-LLM AI systems where users similarly reconstruct missing context during tasks.
  • Training data for future models could incorporate patterns of successful repair to improve baseline grounding.

Load-bearing premise

The conceptual distinctions identified in the 16 interviews represent recurrent and generalizable structures across different human-AI work settings.

What would settle it

Systematic observation of human-AI sessions in additional domains, such as medical analysis or legal review, showing either new collaboration structures or breakdowns driven by factors other than mismatched grounding capacity.

Figures

Figures reproduced from arXiv: 2604.18096 by Ivan Flechais, Marina Jirotka, Nigel Shadbolt, Varad Vishwarupe.

Figure 1
Figure 1: Grounding and repair conditions for human–AI collaboration. One-shot assistance, weak collaboration, and grounded collaboration differ in how much grounding the interaction supports and how repair burden is distributed.
4.3 Grounded collaboration. At the highest level of grounding, the interaction begins to support explicit clarification, signalling, and mutual repair. The system helps surface assumptions, …
Original abstract

LLMs are increasingly presented as collaborators in programming, design, writing, and analysis. Yet the practical experience of working with them often falls short of this promise. In many settings, users must diagnose misunderstandings, reconstruct missing assumptions, and repeatedly repair misaligned responses. This poster introduces a conceptual framework for understanding why such collaboration remains fragile. Drawing on a constructivist grounded theory analysis of 16 interviews with designers, developers, and applied AI practitioners working on LLM-enabled systems, and informed by literature on human-AI collaboration, we argue that stable collaboration depends not only on model capability but on the interaction's grounding conditions. We distinguish three recurrent structures of human-AI work: one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration. We propose that collaboration breaks down when the appearance of partnership outpaces the grounding capacity of the interaction and contribute a framework for discussing grounding, repair, and interaction structure in LLM-enabled work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a conceptual framework for human-AI collaboration with LLMs based on a constructivist grounded theory analysis of 16 interviews with designers, developers, and applied AI practitioners. It identifies three recurrent structures of human-AI work—one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration—and argues that stable collaboration depends on the interaction's grounding conditions, breaking down when the appearance of partnership outpaces the grounding capacity.

Significance. This framework offers a valuable perspective for the HCI community by shifting focus from model capabilities to interactional factors like grounding and repair in LLM-enabled work. If the distinctions prove generalizable, it could guide the development of more robust collaborative systems and inform future empirical studies on human-AI partnerships.

major comments (2)
  1. [Methods] The description of the constructivist grounded theory analysis lacks specific details on participant selection, interview protocols, the coding process, and how theoretical saturation was achieved. This information is essential to assess the robustness of the derived framework and the claim that the three structures are recurrent.
  2. [Findings] The manuscript does not provide evidence, such as participant counts per structure or representative quotes, demonstrating that the three structures reliably appear across the sample. Without this, the assertion of 'recurrent structures' remains under-supported for a general claim about human-AI work.
minor comments (2)
  1. [Abstract] The abstract could benefit from a brief mention of the key literature informing the framework to better contextualize the contribution.
  2. [Discussion] Consider adding implications for design or future research directions to strengthen the practical impact of the framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of the framework's potential contribution to HCI and for highlighting areas where additional transparency and evidence are needed. We agree that the poster format constrained the level of methodological detail and empirical illustration provided. Below we address each major comment and outline the revisions we will make.

Point-by-point responses
  1. Referee: [Methods] The description of the constructivist grounded theory analysis lacks specific details on participant selection, interview protocols, the coding process, and how theoretical saturation was achieved. This information is essential to assess the robustness of the derived framework and the claim that the three structures are recurrent.

    Authors: We accept this observation. The current poster version omitted these details due to length limits. In the revised manuscript we will add a dedicated methods subsection that specifies: (1) purposive sampling through HCI and AI practitioner networks with inclusion criteria focused on recent LLM tool use; (2) the semi-structured interview guide covering workflow, breakdowns, and repair strategies; (3) the constructivist coding process (initial line-by-line coding, focused coding, and memoing per Charmaz); and (4) the saturation criterion, which was reached after the 14th interview with the final two interviews confirming no new categories. These additions will allow readers to evaluate the analytic rigor. revision: yes

  2. Referee: [Findings] The manuscript does not provide evidence, such as participant counts per structure or representative quotes, demonstrating that the three structures reliably appear across the sample. Without this, the assertion of 'recurrent structures' remains under-supported for a general claim about human-AI work.

    Authors: We agree that the poster does not currently display the supporting evidence. In revision we will insert a summary table showing the distribution of the 16 participants across the three structures (with a note that some participants exhibited elements of more than one) and will include one or two anonymized, representative quotes per structure drawn directly from the interview transcripts. This will make the recurrence claim empirically traceable while preserving participant confidentiality. revision: yes
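
The two checks promised in the responses above, a saturation criterion and a per-structure participant tally, can be sketched in a few lines. This is an illustrative sketch only, not the authors' procedure: the function names, the `confirm` parameter, and all data are hypothetical, and saturation is reduced here to "no new category appears in the last `confirm` interviews".

```python
from collections import Counter
from typing import Dict, List, Optional, Set

STRUCTURES = ("one-shot assistance", "weak collaboration", "grounded collaboration")


def saturation_point(codes_per_interview: List[Set[str]], confirm: int = 2) -> Optional[int]:
    """Return the 1-based index of the last interview that added a new category,
    provided at least `confirm` later interviews added none; otherwise None."""
    seen: Set[str] = set()
    last_new = 0
    for index, codes in enumerate(codes_per_interview, start=1):
        if codes - seen:  # this interview introduced an unseen category
            last_new = index
        seen |= codes
    if last_new == 0:
        return None
    if len(codes_per_interview) - last_new >= confirm:
        return last_new
    return None


def tally_structures(assignments: Dict[str, Set[str]]) -> Counter:
    """Count participants per structure; a participant may count toward several."""
    counts = Counter({name: 0 for name in STRUCTURES})
    for structures in assignments.values():
        counts.update(structures)
    return counts


# Synthetic placeholder data, not taken from the study.
interviews = [{f"category-{i}"} for i in range(1, 15)] + [{"category-1"}, {"category-3"}]
assignments = {
    "P01": {"one-shot assistance"},
    "P02": {"weak collaboration", "grounded collaboration"},
    "P03": {"weak collaboration"},
}

print(saturation_point(interviews))
print(dict(tally_structures(assignments)))
```

Because participants can exhibit more than one structure, the tally may sum to more than the number of participants, which is why the promised table would need the accompanying note.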

Circularity Check

0 steps flagged

No circularity: qualitative framework derived from interviews and literature

Full rationale

The paper presents a conceptual framework distinguishing three structures of human-AI work (one-shot assistance, weak collaboration with asymmetric repair, grounded collaboration) obtained via constructivist grounded theory from 16 interviews with designers, developers, and AI practitioners, plus literature review. No equations, quantitative derivations, parameter fitting, or predictions exist. No self-citations are load-bearing for the core distinctions, and the framework is not defined in terms of itself or renamed from prior results by the same authors. The derivation chain is self-contained against external benchmarks (interview data and cited literature) with no reduction by construction. Generalizability from the sample is a validity question, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim relies on qualitative interpretive assumptions and introduces new conceptual entities without external validation or quantitative measures.

axioms (1)
  • domain assumption: The patterns identified in interviews with 16 practitioners reflect generalizable structures in human-AI collaboration.
    The framework is built on this inductive generalization from a small sample.
invented entities (2)
  • Grounding capacity of the interaction (no independent evidence)
    purpose: Explains the limit on stable collaboration
    A new conceptual construct introduced to account for collaboration fragility.
  • Asymmetric repair in weak collaboration (no independent evidence)
    purpose: Describes a common failure mode
    Derived category from the analysis.

pith-pipeline@v0.9.0 · 5468 in / 1345 out tokens · 54253 ms · 2026-05-10T04:14:51.336147+00:00 · methodology


Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

  1. [1]

    Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz

    Guidelines for Human-AI Interaction. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3290605.3300233 Bansal, Gagan, Tongshuang Wu, and Joyce Zhou

  2. [2]

    In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

    Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3411764.3445088 Charmaz, Kathy. 2014. Constructing Grounded Theory. London, United Kingdom: SA...

  3. [3]

    The collaboration gap

    The Collaboration Gap. arXiv preprint arXiv:2511.02687. Eiband, Malin, Daniel Buschek, Heinrich Hussmann, and Alexander Butz

  4. [4]

    In: Proceedings of the 23rd International Conference on Intelligent User Interfaces, pp

    Bringing Transparency Design into Practice. In: Proceedings of the 23rd International Conference on Intelligent User Interfaces, pp. 211–223. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3172944.3172961 Fussell, Susan R. and Robert M. Krauss

  5. [5]

    Brockman, Nasir Memon, and Sameer Patil

    Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3313831.3376212 Kraut, Robert E., Darren Gergle, and Susan R. Fussell

  6. [6]

    In: Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work, pp

    The Use of Visual Information in Shared Visual Spaces: Informing the Development of Virtual Co-Presence. In: Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work, pp. 31–40. New York, NY, USA: Association for Computing Machinery. Liao, Q. Vera, Daniel Gruen, and Sarah Miller

  7. [7]

    Vera Liao, Daniel Gruen, and Sarah Miller

    Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3313831.3376590 Poelitz, Christian, Finale Doshi-Velez, and Siân Lindley

  8. [8]

    arXiv preprint arXiv:2602.21337

    A Benchmark to Assess Common Ground in Human–AI Collaboration. arXiv preprint arXiv:2602.21337. Roschelle, Jeremy and Stephanie D. Teasley

  9. [9]

    In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp

    Grounding Gaps in Language Model Generations. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 6279–6296. Mexico City, Mexico: Association for Computational Linguistics. Shneiderman, Ben

  10. [10]

    International Journal of Human–Computer Interaction 36, 1902–1911

    Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy. International Journal of Human-Computer Interaction, 36 (6): 495–504. https://doi.org/10.1080/10447318.2020.1741118 Traum, David

  11. [11]

    To LLM, or Not to LLM?

    “To LLM, or Not to LLM?”: How Designers and Developers Navigate LLMs as Tools or Teammates. Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3772363.3798953