Cognitive Offloading in Agile Teams: How Artificial Intelligence Reshapes Risk Assessment and Planning Quality

Adriana Caraeni; Alexander Shick; Andrew Lan

arxiv: 2604.13814 · v1 · submitted 2026-04-15 · 💻 cs.HC · cs.AI

Pith reviewed 2026-05-10 12:42 UTC · model grok-4.3

keywords: cognitive offloading · Agile sprint planning · AI in project management · risk assessment · hybrid planning · rework rates · estimation accuracy · planning quality

The pith

AI-only Agile planning cuts time and cost but degrades risk capture and increases rework from unstated assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the effects of cognitive offloading by comparing three sprint planning models on a real client project in a controlled experiment. AI-only planning finishes faster and cheaper yet misses risks and triggers more rework because it leaves assumptions unexamined. Human-only planning adapts well to changes but requires much more time and effort. The authors therefore outline a hybrid approach that lets AI manage estimates and backlog formatting while people handle risk identification and ambiguity resolution. This matters for teams because it shows that speed gains from AI can come at the expense of planning quality unless roles are deliberately split.

Core claim

In a controlled three-condition experiment on a live deliverable, AI-only planning minimized time and cost yet significantly degraded risk capture rates and increased rework due to unstated assumptions, whereas human-only planning excelled at adaptability but incurred substantial overhead. These results support a proposed theoretical framework for hybrid AI-human sprint planning that assigns algorithmic tools to estimation and backlog formatting while requiring human deliberation for risk assessment and ambiguity resolution. The findings indicate that efficiency does not equate to effectiveness when planning quality is measured by risk-related outcomes.

What carries the argument

The three-condition controlled experiment comparing AI-only, human-only, and hybrid sprint planning models, evaluated with quantitative metrics of estimation accuracy, rework rates, and scope change recovery time plus qualitative indicators of planning robustness.
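
The quantitative metrics named here can be made concrete with a small sketch. The paper's exact definitions are not given in the text above, so every formula and field name below is an assumption chosen for illustration: estimation error as mean absolute percentage error, rework rate as the share of effort spent on rework, and risk capture rate as the fraction of materialized risks that were flagged during planning.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    # Hypothetical record layout; the paper does not publish its schema.
    estimated_hours: float
    actual_hours: float
    rework_hours: float  # hours spent redoing work already marked done


def estimation_error(tasks):
    """Mean absolute percentage error of the estimates (lower is better)."""
    return mean(
        abs(t.actual_hours - t.estimated_hours) / t.actual_hours for t in tasks
    )


def rework_rate(tasks):
    """Share of total actual effort that went into rework."""
    total = sum(t.actual_hours for t in tasks)
    return sum(t.rework_hours for t in tasks) / total


def risk_capture_rate(flagged, materialized):
    """Fraction of materialized risks that had been flagged during planning.

    Both arguments are sets of risk identifiers (an assumed representation).
    """
    if not materialized:
        return 1.0  # nothing went wrong, so nothing was missed
    return len(flagged & materialized) / len(materialized)
```

Computing all three per condition, rather than time and cost alone, is what lets an efficiency gain be weighed against a loss in planning quality.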

If this is right

Hybrid models can preserve adaptability and risk awareness while using AI to reduce routine planning overhead.
Teams must measure rework and risk outcomes, not just time and cost, to judge AI planning tools.
Governance rules should require human review of assumptions before finalizing AI-generated plans.
The split of tasks in the proposed framework offers a practical way to augment rather than erode team cognition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same role split could be tested in non-digital domains such as engineering or research projects to identify needed adjustments.
Sustained AI use without explicit human checkpoints might gradually reduce team skill at spotting risks.
Planning software could add features that surface and require confirmation of assumptions to support hybrid use.
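
A feature of this kind might look like the following minimal sketch. It is not drawn from the paper or from any existing planning tool; the class and field names (`Assumption`, `SprintPlan`, `confirmed_by`) are hypothetical. The idea is only that a plan cannot be finalized while any assumption lacks a named human reviewer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assumption:
    text: str
    confirmed_by: Optional[str] = None  # name of the human reviewer, if any


@dataclass
class SprintPlan:
    items: List[str]
    assumptions: List[Assumption] = field(default_factory=list)

    def unconfirmed(self):
        """Return the assumptions that no human has signed off on yet."""
        return [a for a in self.assumptions if a.confirmed_by is None]

    def finalize(self):
        """Refuse to finalize while any assumption is unreviewed."""
        pending = self.unconfirmed()
        if pending:
            raise ValueError(
                f"{len(pending)} assumption(s) still need human review"
            )
        return {"items": list(self.items), "status": "final"}


plan = SprintPlan(
    items=["login page"],
    assumptions=[Assumption("client supplies final copy")],
)
plan.assumptions[0].confirmed_by = "reviewer"  # explicit human checkpoint
result = plan.finalize()
```

Raising an error rather than emitting a warning is a deliberate choice in this sketch: it makes the human checkpoint impossible to skip silently.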

Load-bearing premise

That results from one controlled experiment on a single live deliverable at a mid-sized digital agency will generalize to other Agile teams and project types.

What would settle it

A replication experiment across multiple projects and organizations that finds no significant difference in rework rates or risk capture between AI-only and hybrid conditions would falsify the reported trade-offs.

Original abstract

Recent advances in artificial intelligence (AI) have shown promise in automating key aspects of Agile project management, yet their impact on team cognition remains underexplored. In this work, we investigate cognitive offloading in Agile sprint planning by conducting a controlled, three-condition experiment comparing AI-only, human-only, and hybrid planning models on a live client deliverable at a mid-sized digital agency. Using quantitative metrics -- including estimation accuracy, rework rates, and scope change recovery time -- alongside qualitative indicators of planning robustness, we evaluate each model's effectiveness beyond raw efficiency. We find that while AI-only planning minimizes time and cost, it significantly degrades risk capture rates and increases rework due to unstated assumptions, whereas human-only planning excels at adaptability but incurs substantial overhead. Drawing on these findings, we propose a theoretical framework for hybrid AI-human sprint planning that assigns algorithmic tools to estimation and backlog formatting while mandating human deliberation for risk assessment and ambiguity resolution. Our results challenge the assumption that efficiency equates to effectiveness, offering actionable governance strategies for organizations seeking to augment rather than erode team cognition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

One live-project experiment finds AI-only planning misses risks and raises rework while human planning adds overhead, but the single-case design leaves the hybrid recommendation on shaky ground.

The main thing to know is that they ran a three-condition trial on one actual client deliverable at a mid-sized agency and saw AI-only sprint planning cut time and cost but drop risk capture and raise rework from unstated assumptions, while human-only planning handled adaptability better at the price of extra effort. The hybrid they sketch, with AI on estimation and backlog work and humans on risk and ambiguity, comes straight out of those patterns.

Referee Report

2 major / 2 minor

Summary. The paper claims to have conducted a controlled three-condition experiment comparing AI-only, human-only, and hybrid planning in an Agile sprint for a live client deliverable. Quantitative results show AI-only minimizes time and cost but significantly degrades risk capture rates and increases rework due to unstated assumptions. Human-only planning is adaptable but has substantial overhead. The authors propose a hybrid framework for sprint planning that uses AI for estimation and backlog formatting while requiring human input for risk assessment and ambiguity resolution.

Significance. If the results are valid, this study contributes to the understanding of how AI can lead to cognitive offloading in team settings, particularly in risk assessment during planning. It provides actionable insights for organizations on balancing AI efficiency with human strengths in Agile environments. The empirical approach with mixed methods is a strength, offering both metrics and qualitative insights into planning quality.

major comments (2)

[§3 Experimental Procedure] The experiment is limited to a single live deliverable at one mid-sized digital agency with a three-condition design. This single-case approach means that the observed patterns in estimation accuracy, rework rates, and risk capture may not generalize beyond this specific context, project type, or team, which is critical for the validity of the proposed hybrid framework.
[§4.2 Quantitative Metrics] The paper reports differences in metrics such as rework rates and scope change recovery time without providing sample sizes, statistical significance tests, confidence intervals, or details on data collection procedures. This omission makes it impossible to determine if the differences are statistically meaningful or practically significant, directly affecting the support for the main findings.
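
To illustrate the kind of reporting the second major comment asks for, here is a minimal sketch that compares rework rates between two conditions and returns the difference, a confidence interval, and a p-value. It uses a two-proportion z-test with a normal approximation, which is an assumption on our part and not the authors' analysis; with small samples an exact test would be more appropriate. The function name and the counts in the usage line are hypothetical placeholders, not data from the paper.

```python
from math import sqrt
from statistics import NormalDist


def compare_rework_rates(rework_a, n_a, rework_b, n_b, confidence=0.95):
    """Compare two rework proportions (reworked items / total items).

    Returns (difference, confidence interval, two-sided p-value).
    """
    p_a, p_b = rework_a / n_a, rework_b / n_b
    diff = p_a - p_b

    # Unpooled standard error, used for the confidence interval.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Pooled standard error, used to test the hypothesis of no difference.
    pooled = (rework_a + rework_b) / (n_a + n_b)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se0 == 0:
        return diff, ci, 1.0  # both rates are 0 or 1; no evidence of a gap
    z = diff / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, ci, p_value


# Placeholder counts for illustration only (not results from the paper).
diff, ci, p_value = compare_rework_rates(12, 40, 4, 40)
```

Reporting the interval alongside the p-value speaks to practical as well as statistical significance, since it shows how large or small the true gap could plausibly be.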

minor comments (2)

[Abstract] The abstract refers to 'qualitative indicators of planning robustness' without specifying what these indicators are or how they were measured.
[References] A few references appear to be missing DOIs or have inconsistent citation styles.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have addressed each major comment point by point below, providing honest responses and indicating where revisions will be incorporated to strengthen the paper.

Referee: [§3 Experimental Procedure] The experiment is limited to a single live deliverable at one mid-sized digital agency with a three-condition design. This single-case approach means that the observed patterns in estimation accuracy, rework rates, and risk capture may not generalize beyond this specific context, project type, or team, which is critical for the validity of the proposed hybrid framework.

Authors: We agree that the single-case design is a genuine limitation that restricts generalizability of the observed patterns to other contexts, project types, or teams, and this directly affects how broadly the hybrid framework can be claimed to apply. At the same time, the live client deliverable provides ecological validity that simulated tasks cannot match, revealing real cognitive offloading effects in an authentic Agile setting. In the revised manuscript we will add an explicit limitations subsection in the Discussion that acknowledges the single-case constraint, explains why a live project was chosen despite this trade-off, and outlines plans for future multi-site replications to test the framework's robustness.

revision: yes

Referee: [§4.2 Quantitative Metrics] The paper reports differences in metrics such as rework rates and scope change recovery time without providing sample sizes, statistical significance tests, confidence intervals, or details on data collection procedures. This omission makes it impossible to determine if the differences are statistically meaningful or practically significant, directly affecting the support for the main findings.

Authors: The referee is correct that the original submission omitted these critical details, which weakens the ability to evaluate the quantitative claims. We will revise §4.2 to include the exact sample sizes per condition, the statistical tests performed (with assumptions verified), p-values, effect sizes, confidence intervals around key differences such as rework rates, and a full description of data collection and coding procedures for metrics like scope change recovery time. These additions will allow readers to assess both statistical and practical significance, directly supporting the main findings on hybrid planning.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical experiment reports direct outcomes without self-referential derivations

The paper conducts a controlled three-condition experiment comparing AI-only, human-only, and hybrid planning on one live deliverable, then reports observed metrics (estimation accuracy, rework rates, scope recovery time) and qualitative indicators before proposing a hybrid framework. No equations, fitted parameters, predictions, or self-citations appear in the provided text; the central claims rest on experimental data rather than reducing to inputs by construction or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the controlled experiment and the chosen metrics; no free parameters, new entities, or non-standard axioms are introduced in the abstract.

axioms (1)

[standard math] Standard assumptions of controlled experimental design and statistical comparison across conditions.
Invoked when claiming differences in risk capture and rework rates between AI-only, human-only, and hybrid conditions.

pith-pipeline@v0.9.0 · 5489 in / 1164 out tokens · 33901 ms · 2026-05-10T12:42:17.090148+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1] Anthropic. Claude Sonnet 4.6. https://www.anthropic.com, 2025. Accessed via claude.ai.

[2] Juan Campoverde Morales. A systematic literature review of AI-powered Scrum. Journal of Software Engineering and Applications, 2024.

[3] Thordur Vikingur Fridgeirsson, Helgi Thor Ingason, Haukur Ingi Jonasson, and Helena Gunnarsdottir. A qualitative study on artificial intelligence and its impact on the project schedule, cost and risk management knowledge areas as presented in PMBOK. Applied Sciences, 13(19):11081, 2023.

[4] Gartner Research. AI-assisted planning in project workflows. Technical report, Gartner, 2023.

[5] Tessa Haesevoets et al. Human-machine collaboration in managerial decision making. Computers in Human Behavior, 119:106730, 2021.

[6] Mohammad Hossein Jarrahi. Artificial intelligence and the future of work: Human-AI symbiosis in organizational decision making. Business Horizons, 61(4):577–586, 2018.

[7] Han Li and Feng Tian. Advancing decision-making through AI-human collaboration: A systematic review and conceptual framework. Group Decision and Negotiation, 35(2), 2026.

[8] David Lyell et al. How machine learning is embedded to support clinician decision making: An analysis of FDA-approved medical devices. BMJ Health & Care Informatics, 28(1):e100301, 2021.

[9] McKinsey & Company. The state of AI in 2022 — and a half decade in review. Technical report, McKinsey & Company, 2022.

[10] Jessica Morley et al. From what to how: An initial review of publicly available AI ethics tools, methods and research to translate principles into practices. Science and Engineering Ethics, 26(4):2141–2168, 2020.

[11] Giuseppe Romeo and Daniela Conti. Exploring automation bias in Human–AI collaboration: A review and implications for explainable AI. AI & Society, 41(1):259–278, 2026.

[12] Sara Shafiee and David Sundaram. Augmenting team collaboration using artificial intelligence systems. Group Decision and Negotiation, 30:1–30, 2021.

[13] Ianire Taboada et al. Artificial intelligence enabled project management: A systematic literature review. Applied Sciences, 13(8):5014, 2023.

[14] Amos Tversky and Daniel Kahneman. Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5(2):207–232, 1973.

[15] Mathieu Wauters and Mario Vanhoucke. A comparative study of artificial intelligence methods for project duration forecasting. Expert Systems with Applications, 46:249–261, 2016.
