Human-AI Coordination Zones: A Framework for Designing Human-in-the-Loop Experiences with Agentic AI

Brian Granger; James Pierce; Siddharth Gupta; Vaiva Kalnikaitė

arxiv: 2606.09848 · v1 · pith:NIJM7VNCnew · submitted 2026-05-01 · 💻 cs.HC · cs.AI · cs.CY

James Pierce, Vaiva Kalnikaitė, Siddharth Gupta, Brian Granger

Pith reviewed 2026-07-01 08:18 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.CY

keywords: human-AI coordination, agentic AI, human-in-the-loop, design framework, coordination zones, salience, involvement, activity

The pith

Analysis of 60 commercial AI applications produces a three-dimension framework for human-AI coordination zones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a mid-level design framework that treats human-AI coordination as the combined result of how prominently AI appears, what actions users can take with it, and what the AI actually does. The framework supplies concrete tools: four coordination zones, an input taxonomy, and journey-mapping curves. A sympathetic reader would care because designers of current AI products lack a shared language between high-level guidelines and specific interface choices, which leaves them without reliable ways to support usability, trust, and safety. The paper presents the framework as a way to generate new designs, evaluate existing ones, and help teams communicate requirements.

Core claim

Through landscape and artifact analysis of 60 commercial AI applications, the paper defines human-AI coordination as the interplay of three dimensions—salience, involvement, and activity—and introduces coordination zones (done-for-me, done-under-me, done-with-me, done-without-me), an input taxonomy (prompted, sparked, inferred, layered), coordination curves, and reusable design patterns.

What carries the argument

The coordination zones framework, which treats human-AI coordination as the interplay of salience (AI prominence), involvement (user actions), and activity (AI actions) to generate, evaluate, and communicate interface designs.
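
One way to make this vocabulary concrete is to encode it as data types. The sketch below is illustrative only: the zone and input names come from the paper, but the `CoordinationProfile` class and the 0-to-1 numeric scales are assumptions, since the abstract does not say how the three dimensions are measured or how a zone follows from them.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    """The four coordination zones named in the paper."""
    DONE_FOR_ME = "done-for-me"
    DONE_UNDER_ME = "done-under-me"
    DONE_WITH_ME = "done-with-me"
    DONE_WITHOUT_ME = "done-without-me"


class InputType(Enum):
    """The four input types named in the paper's taxonomy."""
    PROMPTED = "prompted"
    SPARKED = "sparked"
    INFERRED = "inferred"
    LAYERED = "layered"


@dataclass(frozen=True)
class CoordinationProfile:
    """Hypothetical record of one coded interface moment."""
    salience: float      # how prominently AI is presented (assumed 0..1 scale)
    involvement: float   # what users can do to engage AI (assumed 0..1 scale)
    activity: float      # what the AI actually does (assumed 0..1 scale)
    zone: Zone           # assigned by a human coder, not computed here
    input_type: InputType

    def __post_init__(self):
        # Reject values outside the assumed scale.
        for name in ("salience", "involvement", "activity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


# Invented example values, not taken from the paper's corpus.
example = CoordinationProfile(0.8, 0.6, 0.4, Zone.DONE_WITH_ME, InputType.PROMPTED)
print(example.zone.value, example.input_type.value)
```

The zone is stored as a coder-assigned label rather than derived from the three numbers, because the source does not give a rule for that mapping.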

If this is right

Designers can use the zones and curves to create new human-in-the-loop experiences.
Existing AI interfaces can be evaluated by mapping them onto the salience-involvement-activity space.
Cross-functional teams gain a shared vocabulary for specifying coordination requirements.
The input taxonomy supports systematic choices about how users trigger AI actions.
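
As a rough illustration of what mapping a user journey onto the three dimensions could look like, the sketch below treats a coordination curve as one dimension's values read in journey order. The `JourneyStep` type, the 0-to-1 scores, and the example journey are all invented for illustration; the paper's own curve notation may differ.

```python
from typing import NamedTuple


class JourneyStep(NamedTuple):
    """Hypothetical coding of one step in a user journey."""
    label: str
    salience: float
    involvement: float
    activity: float


def curve(steps, dimension):
    """Return the values of one dimension in journey order."""
    return [getattr(step, dimension) for step in steps]


def largest_shift(steps, dimension):
    """Return (from_label, to_label, delta) for the biggest step-to-step change."""
    values = curve(steps, dimension)
    if len(values) < 2:
        raise ValueError("need at least two steps to measure a shift")
    best = max(range(1, len(values)), key=lambda i: abs(values[i] - values[i - 1]))
    return steps[best - 1].label, steps[best].label, values[best] - values[best - 1]


# Made-up journey for a hypothetical email-drafting feature.
journey = [
    JourneyStep("open inbox", 0.1, 0.0, 0.2),
    JourneyStep("click suggest reply", 0.6, 0.5, 0.5),
    JourneyStep("edit draft", 0.4, 0.9, 0.3),
    JourneyStep("send", 0.1, 0.2, 0.0),
]

print(curve(journey, "involvement"))
print(largest_shift(journey, "salience"))
```

Looking for the largest shift is one possible way to flag the moments where coordination changes most abruptly, which is where a design review might start.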

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The zones could be applied to non-commercial or research prototypes to check whether the same patterns appear outside the original 60 apps.
Coordination curves might be used in longitudinal studies to track how user journeys change after software updates.
The framework offers a way to compare coordination styles across product categories such as productivity tools versus entertainment apps.

Load-bearing premise

Landscape and artifact analysis of 60 commercial AI applications yields generalizable mid-level design knowledge that applies beyond the sampled products.

What would settle it

A collection of commercial AI applications whose coordination behaviors cannot be described by any combination of the three dimensions or four zones would show the framework does not cover the space of existing products.
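
A coverage check of this kind can be sketched in a few lines, assuming each application has already been hand-coded with a zone label or marked as unplaceable. The app names and the out-of-framework label below are invented for illustration.

```python
# The four zone labels named in the paper.
ZONES = {"done-for-me", "done-under-me", "done-with-me", "done-without-me"}


def uncovered(codings):
    """Return app names whose coded zone is missing or outside the four zones."""
    return sorted(app for app, zone in codings.items() if zone not in ZONES)


# Hypothetical codings; app names and labels are invented.
codings = {
    "app-a": "done-for-me",
    "app-b": "done-with-me",
    "app-c": None,              # coder could not place this behavior
    "app-d": "done-beside-me",  # label outside the framework
}

misses = uncovered(codings)
print(f"{len(misses)} of {len(codings)} apps not covered: {misses}")
```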

Original abstract

As generative and agentic AI becomes embedded in everyday products, practitioners face a persistent challenge: how to design human-AI coordination -- the ongoing mutual adjustment between users and AI systems as mediated through interfaces -- that supports usability, trust, and safety. Existing resources offer high-level principles ("be transparent," "maintain user control") or low-level UI patterns, but there is a lack of mid-level design knowledge bridging the two. Through landscape and artifact analysis of 60 commercial AI applications, we introduce a framework defining human-AI coordination as the interplay of three dimensions: salience (how prominently AI is presented), involvement (what users can do to engage AI), and activity (what AI actually does). We contribute mid-level tools including coordination zones (done-for-me, done-under-me, done-with-me, done-without-me), an input taxonomy (prompted, sparked, inferred, layered), coordination curves for mapping user journeys, and design patterns demonstrating the generative capacity of the framework. The framework can be applied generatively to design experiences, analytically to evaluate existing ones, and communicatively to articulate ideas across stakeholders.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper supplies a usable mid-level framework, with named zones and taxonomies drawn from 60 apps, but the extraction process is not described in enough detail to judge how solid the categories are.

The main takeaway is that this work gives HCI designers a practical set of labels and diagrams for thinking about how users and agentic AI share control. The three dimensions (salience, involvement, activity) plus the four coordination zones (done-for-me through done-without-me), the input taxonomy, and the coordination curves are presented as new organizing tools pulled from commercial examples.

What the paper does well is turn scattered interface choices into something that can be used both to generate designs and to talk about existing ones. Naming the zones and showing example patterns makes the framework more than another set of high-level guidelines. That kind of mid-level language is genuinely missing in this area.

The soft spot is the analysis step itself. The abstract says the framework comes from landscape and artifact analysis of 60 apps, yet it gives no information on selection criteria, how the dimensions were identified, or any check on consistency. Without those details it is hard to tell whether the zones reflect a stable pattern across apps or mainly the authors' reading of the ones they chose. That does not make the framework useless, but it does limit how far we can trust it as general design knowledge right now.

This is for people who design or evaluate agentic AI products and need something more concrete than broad principles. A practitioner could try the zones on a new feature and see if they help surface options. It is worth sending to peer review because the idea is concrete enough that referees can test the categories against more cases and ask for the missing methodological steps. The core synthesis looks worth refining rather than discarding.

Referee Report

3 major / 2 minor

Summary. The paper claims that landscape and artifact analysis of 60 commercial AI applications yields a three-dimensional framework for human-AI coordination (salience: how prominently AI is presented; involvement: what users can do to engage AI; activity: what AI actually does). It contributes mid-level design tools including coordination zones (done-for-me, done-under-me, done-with-me, done-without-me), an input taxonomy (prompted, sparked, inferred, layered), coordination curves for mapping user journeys, and design patterns. The framework is intended for generative design, analytical evaluation, and cross-stakeholder communication of human-in-the-loop experiences with agentic AI.

Significance. If the derivation and generalizability hold, the work supplies needed mid-level design knowledge that sits between high-level principles and low-level UI patterns, offering structured vocabulary and visual tools that could improve usability, trust, and safety in everyday agentic AI products. The explicit coordination zones and input taxonomy provide concrete, reusable artifacts for practitioners.

major comments (3)

[Methods] Methods section: The manuscript provides no information on selection criteria for the 60 applications, the coding scheme or process used to surface the three dimensions and zones, inter-rater reliability, or any validation against external data. This absence directly undermines the claim that the framework constitutes generalizable mid-level design knowledge rather than an interpretive synthesis.
[§4] §4 (Framework): The mapping from observed app features to the specific coordination zones and input taxonomy is presented as emergent from the analysis, yet no trace of the analytic steps, counter-examples, or saturation criteria is supplied. Without this, it is impossible to evaluate whether the zones are load-bearing constructs or post-hoc categorizations.
[§5] §5 (Design patterns): The generative capacity of the framework is illustrated with examples, but these examples are not cross-checked against the original 60-app corpus or tested for predictive utility; the section therefore does not demonstrate that the framework adds explanatory power beyond existing high-level guidelines.
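
For context on the first comment, inter-rater reliability for categorical codes such as zone assignments is commonly reported as Cohen's kappa, which corrects the observed agreement between two coders for the agreement expected by chance. The sketch below is a minimal implementation under the assumption that two coders each assign one zone label to the same list of apps; the labels are invented, and nothing here reflects the authors' actual coding process.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both coders labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented zone assignments for six hypothetical apps.
coder_a = ["for", "for", "with", "with", "under", "without"]
coder_b = ["for", "with", "with", "with", "under", "without"]

print(round(cohens_kappa(coder_a, coder_b), 3))
```

With more than two coders, a different statistic such as Fleiss' kappa would be needed; this sketch covers only the two-coder case.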

minor comments (2)

[Figure 2] Figure 2 (coordination curves): Axis labels and legend are too small for print; add a caption that explicitly ties each curve segment to the three dimensions.
[Introduction] The abstract states the framework is derived from 'landscape and artifact analysis' but the introduction does not cite prior HCI landscape studies; adding 2-3 references would clarify the methodological lineage.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which highlight important opportunities to strengthen the methodological transparency of our landscape analysis. We address each major comment below and will revise the manuscript accordingly.

Referee: [Methods] Methods section: The manuscript provides no information on selection criteria for the 60 applications, the coding scheme or process used to surface the three dimensions and zones, inter-rater reliability, or any validation against external data. This absence directly undermines the claim that the framework constitutes generalizable mid-level design knowledge rather than an interpretive synthesis.

Authors: We agree that the current manuscript lacks a dedicated Methods section with these details. In the revision, we will add a Methods section describing: (1) selection criteria (apps were chosen from major platforms for diversity in domains such as productivity, creative tools, and consumer services, prioritizing those with agentic features released or updated in 2023-2024); (2) the artifact analysis process (systematic review of interfaces, onboarding flows, and feature sets by the author team through iterative discussion); and (3) the emergent coding that surfaced the dimensions of salience, involvement, and activity. We will explicitly note that this was an interpretive synthesis without formal inter-rater reliability metrics or external validation, and we will add a limitations subsection discussing implications for generalizability.

revision: yes

Referee: [§4] §4 (Framework): The mapping from observed app features to the specific coordination zones and input taxonomy is presented as emergent from the analysis, yet no trace of the analytic steps, counter-examples, or saturation criteria is supplied. Without this, it is impossible to evaluate whether the zones are load-bearing constructs or post-hoc categorizations.

Authors: We acknowledge the absence of traceable analytic steps in the current draft. The revision will expand §4 with a new subsection that outlines the derivation process, including representative examples from the 60-app corpus that informed each coordination zone (done-for-me, done-under-me, done-with-me, done-without-me) and input type, as well as counter-examples considered during refinement. This will make the grounding of the constructs more explicit while preserving the mid-level, design-oriented nature of the contribution.

revision: yes

Referee: [§5] §5 (Design patterns): The generative capacity of the framework is illustrated with examples, but these examples are not cross-checked against the original 60-app corpus or tested for predictive utility; the section therefore does not demonstrate that the framework adds explanatory power beyond existing high-level guidelines.

Authors: The examples in §5 are primarily illustrative to demonstrate generative use. In revision, we will add explicit cross-references linking each design pattern back to specific applications from the original corpus and clarify how the coordination zones and curves provide structured vocabulary and visual mapping that extend beyond generic principles. We agree that formal predictive testing lies outside the scope of this paper and will add language in the Discussion and Limitations sections to this effect, positioning the work as a mid-level framework rather than a validated predictive model.

revision: partial

Circularity Check

0 steps flagged

No significant circularity

The paper derives its three-dimensional framework (salience, involvement, activity) and mid-level tools (coordination zones, input taxonomy, coordination curves) explicitly from landscape and artifact analysis of 60 external commercial AI applications. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the provided abstract or description; the central claim rests on independent empirical synthesis rather than reducing to prior author work or internal definitions by construction. This is the most common honest non-finding for descriptive design papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The paper's contribution rests on a domain assumption about the value of commercial app analysis and introduces new conceptual entities without independent empirical validation beyond the analysis itself.

axioms (1)

domain assumption: Analysis of 60 commercial AI applications can reveal generalizable mid-level design knowledge for human-AI coordination.
This premise directly supports the construction and claimed utility of the framework.

invented entities (3)

Coordination zones (done-for-me, done-under-me, done-with-me, done-without-me) · no independent evidence
purpose: Categorize modes of human-AI coordination
New conceptual categories derived from the app analysis.

Input taxonomy (prompted, sparked, inferred, layered) · no independent evidence
purpose: Classify user engagement methods with AI
New classification scheme introduced by the framework.

Coordination curves · no independent evidence
purpose: Map changes in coordination over user journeys
New visualization tool proposed in the framework.

