The Collaboration Gap in Human-AI Work

Ivan Flechais; Marina Jirotka; Nigel Shadbolt; Varad Vishwarupe

arXiv: 2604.18096 · v1 · submitted 2026-04-20 · cs.HC · cs.AI · cs.IR · cs.LG


Pith reviewed 2026-05-10 04:14 UTC · model grok-4.3

Keywords: human-AI collaboration · grounding conditions · LLM interactions · repair mechanisms · collaboration structures · interaction design · asymmetric repair

The pith

Stable collaboration with AI depends on the interaction's grounding conditions, not just model capability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that human-AI work with large language models often falls short because users must repeatedly diagnose misunderstandings and repair misaligned outputs. Drawing on interviews with designers, developers, and applied AI practitioners, it identifies three recurring patterns of interaction: one-shot assistance, weak collaboration in which the burden of repair falls asymmetrically on the human, and grounded collaboration with shared assumptions. The key argument is that breakdowns occur when the surface appearance of partnership exceeds the actual capacity for establishing and maintaining common ground. This matters for anyone building or using AI tools: if the framework holds, capability improvements alone will not close the gap without attention to how context and repair are handled in the interaction.

Core claim

Drawing on a constructivist grounded theory analysis of 16 interviews, the authors argue that stable collaboration depends not only on model capability but on the interaction's grounding conditions. They distinguish three recurrent structures of human-AI work: one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration. Collaboration breaks down when the appearance of partnership outpaces the grounding capacity of the interaction.

What carries the argument

Grounding conditions in the interaction: how far it supports shared assumptions, and how the work of repairing misalignments is distributed between human and AI. These conditions are what distinguish the three identified structures of human-AI work.

If this is right

Design efforts focused solely on increasing model capability will leave persistent repair burdens on users in weak collaboration settings.
Interfaces that make grounding explicit, such as shared context summaries or assumption checks, could shift more interactions toward grounded collaboration.
Evaluation of LLM tools should measure repair effort and grounding failures, not output quality alone.
The three structures provide a vocabulary for comparing collaboration experiences across programming, design, and analysis tasks.
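The evaluation point above (measuring repair effort, not just output quality) can be made concrete with a minimal sketch of how repair and its asymmetry might be logged for one session. This is not from the paper: the `Turn` record, the hand-annotated `is_repair` flag, and the two ratios `repair_rate` and `human_repair_share` are hypothetical choices for illustration. A `human_repair_share` near 1.0 would correspond to the asymmetric repair the framework associates with weak collaboration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    speaker: str     # "human" or "ai" (hypothetical labels)
    is_repair: bool  # hand-annotated: does this turn correct an earlier misalignment?


def repair_profile(turns: List[Turn]) -> Dict[str, float]:
    """Summarise repair effort in one session (hypothetical metric, not from the paper)."""
    repairs = [t for t in turns if t.is_repair]
    human_repairs = sum(1 for t in repairs if t.speaker == "human")
    return {
        # fraction of all turns spent on repair
        "repair_rate": len(repairs) / len(turns) if turns else 0.0,
        # fraction of repair turns made by the human; near 1.0 suggests asymmetric repair
        "human_repair_share": human_repairs / len(repairs) if repairs else 0.0,
    }


session = [
    Turn("human", False),  # initial request
    Turn("ai", False),     # first draft
    Turn("human", True),   # user restates a missing assumption
    Turn("ai", False),     # revised draft
    Turn("human", True),   # user corrects the output again
    Turn("ai", True),      # model flags its own earlier error
]
print(repair_profile(session))  # {'repair_rate': 0.5, 'human_repair_share': 0.6666666666666666}
```

Deciding which turns count as repairs is itself an interpretive judgment, so any metric of this kind inherits the coding scheme behind it.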

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Developers might reduce user frustration by defaulting to one-shot assistance modes unless explicit grounding features are enabled.
The framework could extend to non-LLM AI systems where users similarly reconstruct missing context during tasks.
Training data for future models could incorporate patterns of successful repair to improve baseline grounding.

Load-bearing premise

The conceptual distinctions identified in the 16 interviews represent recurrent and generalizable structures across different human-AI work settings.

What would settle it

Systematic observation of human-AI sessions in additional domains, such as medical analysis or legal review, showing either new collaboration structures or breakdowns driven by factors other than mismatched grounding capacity.

Figures

Figures reproduced from arXiv: 2604.18096 by Ivan Flechais, Marina Jirotka, Nigel Shadbolt, Varad Vishwarupe.

**Figure 1.** Grounding and repair conditions for human–AI collaboration. One-shot assistance, weak collaboration, and grounded collaboration differ in how much grounding the interaction supports and how repair burden is distributed.

Excerpt from Section 4.3 (Grounded collaboration): "At the highest level of grounding, the interaction begins to support explicit clarification, signalling, and mutual repair. The system helps surface assumptions, …"

Original abstract

LLMs are increasingly presented as collaborators in programming, design, writing, and analysis. Yet the practical experience of working with them often falls short of this promise. In many settings, users must diagnose misunderstandings, reconstruct missing assumptions, and repeatedly repair misaligned responses. This poster introduces a conceptual framework for understanding why such collaboration remains fragile. Drawing on a constructivist grounded theory analysis of 16 interviews with designers, developers, and applied AI practitioners working on LLM-enabled systems, and informed by literature on human-AI collaboration, we argue that stable collaboration depends not only on model capability but on the interaction's grounding conditions. We distinguish three recurrent structures of human-AI work: one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration. We propose that collaboration breaks down when the appearance of partnership outpaces the grounding capacity of the interaction and contribute a framework for discussing grounding, repair, and interaction structure in LLM-enabled work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a workable three-way split of LLM collaboration patterns from 16 interviews, but the claim that these patterns recur needs more methodological backing to generalize beyond the sample.

The main takeaway is that stable human-LLM work hinges on grounding conditions more than on raw model smarts. The authors split observed interactions into one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration. The framing comes from a constructivist grounded theory analysis of interviews with designers, developers, and applied AI practitioners, and it connects directly to existing human-AI literature without forcing a big theoretical leap. The distinction between the three structures is clear on paper and could help interface designers think about when repair work falls on the user and when the interaction stays aligned. That part is useful, and its emphasis on the appearance of partnership outrunning actual grounding capacity is fairly new. The interviews also supply concrete practitioner experience that prior conceptual pieces sometimes lack.

The soft spot is the leap from 16 purposive interviews to recurrent structures. The abstract omits details on participant selection, interview guides, the coding process, and saturation checks, and the stress-test note flags the same gap. Without them, it is hard to tell whether the three patterns are stable across settings or mainly reflect the shared professional world of the sample. If the full paper shows clear cross-case evidence and variation analysis, that concern shrinks; otherwise the framework stays suggestive rather than general.

The paper is aimed at HCI researchers and tool builders who want a lens for analyzing LLM workflows. A reader already working on collaboration interfaces or evaluation methods would find it worth reading for the categories alone. I would send it to peer review so referees can check the methods section and any supporting quotes or diagrams against the claims. The ideas are coherent, and the empirical base is honest even if narrow.

Referee Report

2 major / 2 minor

Summary. The paper introduces a conceptual framework for human-AI collaboration with LLMs based on a constructivist grounded theory analysis of 16 interviews with designers, developers, and applied AI practitioners. It identifies three recurrent structures of human-AI work—one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration—and argues that stable collaboration depends on the interaction's grounding conditions, breaking down when the appearance of partnership outpaces the grounding capacity.

Significance. This framework offers a valuable perspective for the HCI community by shifting focus from model capabilities to interactional factors like grounding and repair in LLM-enabled work. If the distinctions prove generalizable, it could guide the development of more robust collaborative systems and inform future empirical studies on human-AI partnerships.

major comments (2)

[Methods] The description of the constructivist grounded theory analysis lacks specific details on participant selection, interview protocols, the coding process, and how theoretical saturation was achieved. This information is essential to assess the robustness of the derived framework and the claim that the three structures are recurrent.
[Findings] The manuscript does not provide evidence, such as participant counts per structure or representative quotes, demonstrating that the three structures reliably appear across the sample. Without this, the assertion of 'recurrent structures' remains under-supported for a general claim about human-AI work.

minor comments (2)

[Abstract] The abstract could benefit from a brief mention of the key literature informing the framework to better contextualize the contribution.
[Discussion] Consider adding implications for design or future research directions to strengthen the practical impact of the framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of the framework's potential contribution to HCI and for highlighting areas where additional transparency and evidence are needed. We agree that the poster format constrained the level of methodological detail and empirical illustration provided. Below we address each major comment and outline the revisions we will make.

Referee: [Methods] The description of the constructivist grounded theory analysis lacks specific details on participant selection, interview protocols, the coding process, and how theoretical saturation was achieved. This information is essential to assess the robustness of the derived framework and the claim that the three structures are recurrent.

Authors: We accept this observation. The current poster version omitted these details due to length limits. In the revised manuscript we will add a dedicated methods subsection that specifies: (1) purposive sampling through HCI and AI practitioner networks, with inclusion criteria focused on recent LLM tool use; (2) the semi-structured interview guide covering workflow, breakdowns, and repair strategies; (3) the constructivist coding process (initial line-by-line coding, focused coding, and memoing, following Charmaz); and (4) the saturation criterion: saturation was reached after the 14th interview, and the final two interviews yielded no new categories. These additions will allow readers to evaluate the analytic rigor. revision: yes
Referee: [Findings] The manuscript does not provide evidence, such as participant counts per structure or representative quotes, demonstrating that the three structures reliably appear across the sample. Without this, the assertion of 'recurrent structures' remains under-supported for a general claim about human-AI work.

Authors: We agree that the poster does not currently display the supporting evidence. In revision we will insert a summary table showing the distribution of the 16 participants across the three structures (with a note that some participants exhibited elements of more than one) and will include one or two anonymized, representative quotes per structure drawn directly from the interview transcripts. This will make the recurrence claim empirically traceable while preserving participant confidentiality. revision: yes
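The saturation criterion described in the first response (no new categories in the final interviews) can be read as a simple stopping rule. The sketch below is illustrative only: in constructivist grounded theory, saturation is an interpretive judgment about categories rather than a mechanical count, and the function name `saturation_point`, the `window` parameter, and the code labels are all hypothetical.

```python
from typing import List, Optional, Set


def saturation_point(codes_per_interview: List[Set[str]], window: int = 2) -> Optional[int]:
    """Return the 1-based index of the last interview that introduced a new code,
    provided at least `window` later interviews introduced none; otherwise None.

    Hypothetical helper for illustration, not the authors' procedure.
    """
    seen: Set[str] = set()
    last_new = 0  # 0 means no interview has introduced a code yet
    for i, codes in enumerate(codes_per_interview, start=1):
        if codes - seen:  # at least one code not seen in earlier interviews
            last_new = i
        seen |= codes
    trailing = len(codes_per_interview) - last_new  # interviews after the last new code
    if last_new and trailing >= window:
        return last_new
    return None


# Hypothetical focused codes from five interviews.
interviews = [
    {"re-explains context"},
    {"re-explains context", "fixes output by hand"},
    {"fixes output by hand", "model asks clarifying question"},
    {"model asks clarifying question"},
    {"re-explains context"},
]
print(saturation_point(interviews))            # 3
print(saturation_point(interviews, window=3))  # None
```

Under the rebuttal's description, 16 interviews whose last new category appeared in interview 14 would return 14 with `window=2`.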

Circularity Check

0 steps flagged

No circularity: qualitative framework derived from interviews and literature

The paper presents a conceptual framework distinguishing three structures of human-AI work (one-shot assistance, weak collaboration with asymmetric repair, grounded collaboration) obtained via constructivist grounded theory from 16 interviews with designers, developers, and AI practitioners, plus literature review. No equations, quantitative derivations, parameter fitting, or predictions exist. No self-citations are load-bearing for the core distinctions, and the framework is not defined in terms of itself or renamed from prior results by the same authors. The derivation chain is self-contained against external benchmarks (interview data and cited literature) with no reduction by construction. Generalizability from the sample is a validity question, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim relies on qualitative interpretive assumptions and introduces new conceptual entities without external validation or quantitative measures.

axioms (1)

Domain assumption: the patterns identified in interviews with 16 practitioners reflect generalizable structures in human-AI collaboration.
The framework is built on this inductive generalization from a small sample.

invented entities (2)

Grounding capacity of the interaction (no independent evidence)
Purpose: explains the limit on stable collaboration.
A new conceptual construct introduced to account for collaboration fragility.
Asymmetric repair in weak collaboration (no independent evidence)
Purpose: describes a common failure mode.
A category derived from the analysis.

