Agentic AI in Industry: Adoption Level and Deployment Barriers

Helena Holmström Olsson; Jan Bosch; Spyridon Alvanakis Apostolou

arxiv: 2605.14675 · v1 · pith:PNA363CUnew · submitted 2026-05-14 · 💻 cs.SE


Pith reviewed 2026-06-30 20:20 UTC · model grok-4.3

classification 💻 cs.SE

keywords agentic AI · adoption maturity · verification gap · deployment barriers · human-in-the-loop · industrial software engineering · multi-agent systems

The pith

Industrial firms encounter a verification gap that blocks deployment of advanced agentic AI systems into production.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports on interviews with sixteen practitioners from twelve companies to map how agentic AI is being adopted in industry. It applies a six-level maturity model and finds that most organizations remain at the first two levels, with only one reaching multi-agent orchestration. The key observation is that four companies have tested higher capabilities yet cannot move them into live workflows because no reliable way exists to check AI outputs without human review. Technical limits on context, language support, and determinism, along with confidentiality rules, create this blockage. The authors identify information asymmetry between what the AI produces and what can be qualified as the core open problem.

Core claim

The paper establishes that a capability-deployment verification gap prevents four of the studied companies from integrating experimentally demonstrated higher-level agentic AI into production workflows. Adequate output verification mechanisms are missing, so human-in-the-loop review stays the sole trusted method. This situation is produced by four recurring barriers: constraints on LLM context windows when aggregating diverse knowledge, weaker results on proprietary programming languages and protocols, non-determinism that conflicts with qualification standards, and worries over data confidentiality. From these barriers two interdependent dimensions arise, information asymmetry and qualifica

What carries the argument

The capability-deployment verification gap, which separates experimental AI capabilities from deployable production systems due to the lack of trusted verification beyond human oversight.
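
The gap described above can be pictured as a routing decision. The following Python sketch is purely illustrative and is not taken from the paper: the names AgentOutput, Verifier, and route are hypothetical, and the logic assumes only what the summary states, namely that human review is the fallback whenever no trusted automated check exists.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentOutput:
    """Hypothetical container for something an agent produced."""
    task: str
    content: str


# Hypothetical type: an automated check that returns True when the output is acceptable.
Verifier = Callable[[AgentOutput], bool]


def route(output: AgentOutput, verifier: Optional[Verifier]) -> str:
    """Decide how an agent output is checked before it reaches production."""
    if verifier is None:
        # No trusted automated mechanism exists: the situation the study reports.
        return "human_review"
    # A trusted verifier exists; outputs that fail it still go to a human.
    return "auto_accept" if verifier(output) else "human_review"
```

Under this framing, every output in the four affected companies takes the first branch, which is why experimental capabilities do not reach production workflows without human review.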

If this is right

Seven companies operate only at Level 1 with basic AI assistants.
Four companies reach Level 2 as AI compensators but no further in production.
Only one company achieves Level 3 multi-agent orchestration.
Large and safety-regulated organizations appear among the more advanced adopters.
The verification gap is reinforced by context constraints, proprietary language issues, non-determinism, and confidentiality concerns.
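
As a quick arithmetic check on the distribution listed above, the following Python sketch tallies the reported counts per level and derives each level's share of the sample. The counts and the three level names come from the abstract; the dictionary layout and the function name share are illustrative assumptions, and Levels 4 to 6 are omitted because this page does not name them.

```python
# Counts per maturity level as reported in the abstract (Levels 1-3 only;
# no company in the sample is reported above Level 3).
LEVEL_NAMES = {1: "AI Assistants", 2: "AI Compensators", 3: "Multi-Agent Orchestration"}
companies_per_level = {1: 7, 2: 4, 3: 1}

total_companies = sum(companies_per_level.values())


def share(level: int) -> float:
    """Fraction of the sampled companies reported at the given level."""
    return companies_per_level[level] / total_companies


if __name__ == "__main__":
    for level, count in companies_per_level.items():
        name = LEVEL_NAMES[level]
        print(f"Level {level} ({name}): {count} of {total_companies} ({share(level):.0%})")
```

The counts sum to twelve, matching the number of companies interviewed, and the bottom two levels account for eleven of them.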

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Developing automated verification tools could allow more companies to advance beyond experimental stages.
The gap may widen in highly regulated sectors where qualification standards are strict.
Exploring standardized processes for qualifying AI outputs might reduce information asymmetry.
Further studies could test whether the identified barriers persist across additional industries not covered here.

Load-bearing premise

The six-level maturity framework correctly sorts the observed adoption stages and the interview sample from twelve companies captures the main barriers that would apply more broadly.

What would settle it

Observing a company that has moved higher-level agentic AI capabilities into production using automated verification instead of human review would indicate the gap does not hold as described.

Original abstract

Agentic AI systems are entering software engineering workflows, yet empirical evidence on how industrial organizations actually adopt them remains sparse. We present a qualitative interview study with sixteen practitioners across twelve companies of varying size and domain. This study characterizes the current agentic AI adoption state of these companies, employing a six-level maturity framework adapted from established AI-driven organizations. The findings reveal that seven companies operate at Level 1 (AI Assistants), four companies at Level 2 (AI Compensators), and only one at Level 3 (Multi-Agent Orchestration), with large and safety-regulated organizations among the most advanced adopters. The primary finding is a capability-deployment verification gap: four companies demonstrated higher-level experimental AI capabilities but cannot integrate them into production workflows because adequate output verification mechanisms are absent, leaving human-in-the-loop as the only trusted verification mechanism. This gap is shaped by four recurring barriers: LLM context window constraints, especially when diverse knowledge aggregation is needed; under-performance on proprietary programming languages and protocols; non-determinism incompatible with qualification standards; and data confidentiality concerns. Two interdependent dimensions of this gap emerge from these findings (information asymmetry and qualification absence), framing a core open problem for industrial agentic integration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Small interview study maps agentic AI maturity levels and flags a verification gap, but thin methods details limit how far the findings travel.

The main takeaway is that this paper gives a snapshot of where 12 companies sit on an adapted six-level AI maturity scale, with most at the bottom and a clear split between experimental capability and production deployment. The four companies that reached higher experimental levels hit a wall on verification, and the authors pull out four recurring barriers from the interviews: context-window limits, poor handling of proprietary languages, non-determinism clashing with qualification rules, and confidentiality worries. Two dimensions (information asymmetry and missing qualification processes) are named as the core open problem.

What the work actually does is collect practitioner accounts and map them to the framework. The distribution (seven at level 1, four at level 2, one at level 3) and the explicit listing of the barriers are new data points for this specific domain. The barriers themselves line up with known LLM weaknesses, but tying them directly to the capability-deployment gap in agentic settings adds a concrete industry angle.

The soft spots are straightforward. The abstract and stress-test note give no information on interview protocol, how the maturity framework was adapted, or the coding process, so the mapping from quotes to levels and barriers rests on unshown judgment calls. With only 16 people across 12 firms, any claim about recurring barriers stays descriptive of this sample; the paper does not show evidence that the pattern holds more widely. Large or safety-regulated firms appear more advanced in the sample, but that could be selection or reporting bias.

This is the kind of paper that belongs in a reading group if the group wants current industry signals on agentic AI, but it is not strong enough on its own to shift research priorities. I would not cite it for a technical claim, though the barrier list could be referenced as practitioner-reported obstacles. It deserves a serious referee because the topic is live and the empirical gap is real, even if the current write-up needs clearer methods and tighter scope statements before publication.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a qualitative interview study with 16 practitioners from 12 companies on the adoption of agentic AI in industrial software engineering. Using an adapted six-level maturity framework, it reports that most companies are at low maturity levels (7 at Level 1, 4 at Level 2, 1 at Level 3), identifies a capability-deployment verification gap in four companies where experimental capabilities cannot be deployed due to lack of verification mechanisms, and outlines four barriers: LLM context window constraints, under-performance on proprietary languages, non-determinism, and data confidentiality concerns. The study frames this gap in terms of information asymmetry and qualification absence.

Significance. If the findings hold, they offer valuable empirical evidence on the current state of agentic AI adoption in industry, particularly highlighting the verification gap as a key barrier to deployment. This contributes to the software engineering literature by providing practitioner perspectives on practical challenges, which could guide future tool development and research on verification methods for non-deterministic AI systems. The qualitative approach allows for in-depth insights into real-world barriers.

major comments (2)

[Methodology] The description of the interview protocol, participant recruitment, data coding and analysis process, and the specific adaptations made to the six-level maturity framework is insufficient. Without these details, it is difficult to evaluate the reliability of the maturity level assignments and the identification of the four recurring barriers, which are central to the primary finding of the verification gap.
[Findings] The claim that four companies demonstrated higher-level experimental capabilities but could not integrate them due to absent verification mechanisms relies on self-reported data; additional evidence or examples from the interviews would strengthen this load-bearing observation.

minor comments (1)

[Abstract] The abstract mentions the sample size and findings but could briefly note the qualitative nature and limitations for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and outline the revisions we will incorporate.

Referee: [Methodology] The description of the interview protocol, participant recruitment, data coding and analysis process, and the specific adaptations made to the six-level maturity framework is insufficient. Without these details, it is difficult to evaluate the reliability of the maturity level assignments and the identification of the four recurring barriers, which are central to the primary finding of the verification gap.

Authors: We agree that the methodology section would benefit from greater detail to support evaluation of the findings. In the revised manuscript we will expand the relevant sections to describe: the participant recruitment process and selection criteria for the 12 companies; the semi-structured interview protocol with representative questions; the data coding and thematic analysis procedure, including how themes were derived and validated; and the precise adaptations made to the six-level maturity framework together with the criteria used to assign levels from interview responses. These additions will improve transparency without altering the core findings. revision: yes
Referee: [Findings] The claim that four companies demonstrated higher-level experimental capabilities but could not integrate them due to absent verification mechanisms relies on self-reported data; additional evidence or examples from the interviews would strengthen this load-bearing observation.

Authors: We acknowledge that additional concrete examples would strengthen the presentation of this central observation. While the study relies on self-reported interview data, which is inherent to qualitative research of this type, we will add further anonymized interview excerpts and specific illustrative examples in the revised manuscript to substantiate the capability-deployment verification gap across the four companies, while preserving confidentiality. revision: yes

Circularity Check

0 steps flagged

No significant circularity

This paper is a qualitative interview study with 16 practitioners across 12 companies. It maps self-reported adoption states to an adapted six-level maturity framework and enumerates observed barriers directly from the interview data. No equations, fitted parameters, predictions, or derivations exist; the capability-deployment gap and four barriers are presented as empirical observations within the studied sample. The framework is described as adapted from prior work but functions only as a classification lens, not as a self-referential input that forces the reported outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of the adapted six-level maturity framework and the assumption that the 16 interviews capture representative barriers; no free parameters or invented entities are introduced.

axioms (1)

domain assumption The six-level maturity framework adapted from established AI-driven organizations is a valid and appropriate measure for classifying agentic AI adoption in software engineering contexts.
The study explicitly employs this framework to assign companies to levels 1-3.

Reference graph

Works this paper leans on

23 extracted references · 15 canonical work pages · 2 internal anchors

[1] Akbar, M.A., Khan, A.A., Hamza, M., et al.: Agentic AI in Software Engineering: Practitioner Perspectives Across the Software Development Life Cycle (Sep 2025), https://papers.ssrn.com/abstract=5520159

[2] Akhoroz, M., Yildirim, C.: Conversational AI as a Coding Assistant: Understanding Programmers’ Interactions with and Expectations from Large Language Models for Coding (Mar 2025), http://arxiv.org/abs/2503.16508

[3] Alenezi, M., Akour, M.: AI-Driven Innovations in Software Engineering: A Review of Current Practices and Future Directions. Applied Sciences 15(3) (Jan 2025), https://www.mdpi.com/2076-3417/15/3/1344

[4] Bosch, J., Olsson, H.H.: Towards AI-Driven Organizations. In: Software Engineering and Advanced Applications. pp. 280–295. Springer Nature Switzerland, Cham (2026). https://doi.org/10.1007/978-3-032-04207-1_19

[5] Ferino, S., Hoda, R., Grundy, J., Treude, C.: Walking the Tightrope of LLMs for Software Development: A Practitioners’ Perspective (Nov 2025), http://arxiv.org/abs/2511.06428, arXiv:2511.06428 [cs]

[6] Glaser, B., Strauss, A.: Discovery of grounded theory: Strategies for qualitative research. Routledge (2017)

[7] He, J., Treude, C., Lo, D.: LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead (Apr 2024), https://arxiv.org/abs/2404.04834v4

[8] Hughes, L., Dwivedi, Y.K., Malik, T., et al.: AI Agents and Agentic Systems: A Multi-Expert Analysis. Journal of Computer Information Systems 65(4), 489–517 (Jul 2025), https://doi.org/10.1080/08874417.2025.2483832

[9] Kelly, H., Anton, T., Jeff, H.: Context Rot: How Increasing Input Tokens Impacts LLM Performance (Jul 2025), https://research.trychroma.com/context-rot

[10] Kwa, T., West, B., Becker, J., et al.: Measuring AI Ability to Complete Long Tasks (Mar 2025), http://arxiv.org/abs/2503.14499

[11] Liang, J.T., Yang, C., Myers, B.A.: A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges (Sep 2023), http://arxiv.org/abs/2303.17125

[12] Mastropaolo, A., Pascarella, L., Guglielmi, E., et al.: On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). pp. 2149–2160 (May 2023), https://ieeexplore.ieee.org/abstract/document/10172792

[13] Nguyen, M.H., Chau, T.P., Nguyen, P.X., et al.: AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology (Jul 2024), http://arxiv.org/abs/2406.11912

[14] Patton, M.Q.: Qualitative Research & Evaluation Methods: Integrating Theory and Practice. SAGE Publications (Oct 2014)

[15] Qin, Y., Wang, S., Lou, Y., et al.: SoapFL: A Standard Operating Procedure for LLM-Based Method-Level Fault Localization. IEEE Transactions on Software Engineering 51(4), 1173–1187 (Apr 2025), https://ieeexplore.ieee.org/document/10891926

[16] Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering 14(2), 131–164 (Apr 2009), https://doi.org/10.1007/s10664-008-9102-8

[17] Sawant, P.: Agentic AI: A Quantitative Analysis of Performance and Applications (Feb 2025), https://www.preprints.org/manuscript/202502.1647

[18] Stray, V., Brandtzæg, E.G., Wivestad, V.T., et al.: Developer Productivity With and Without GitHub Copilot: A Longitudinal Mixed-Methods Case Study (Jan 2026), http://arxiv.org/abs/2509.20353

[19] Sun, S., Staron, M.: Agentic Pipelines in Embedded Software Engineering: Emerging Practices and Challenges (Jan 2026), http://arxiv.org/abs/2601.10220

[20] Walsham, G.: Interpretive case studies in IS research: nature and method. European Journal of Information Systems 4(2), 74–81 (May 1995), https://doi.org/10.1057/ejis.1995.9

[21] Watanabe, M., Li, H., Kashiwa, Y., et al.: On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub (Sep 2025), https://arxiv.org/abs/2509.14745v3

[22] Yin, R.K.: Case Study Research: Design and Methods. SAGE (2009)

[23] Yujia, F., Peng, L., Amjed, T., et al.: Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study. ACM Transactions on Software Engineering and Methodology (2025), https://dl.acm.org/doi/10.1145/3716848
