
arxiv: 2605.14675 · v1 · pith:PNA363CUnew · submitted 2026-05-14 · 💻 cs.SE

Agentic AI in Industry: Adoption Level and Deployment Barriers

Pith reviewed 2026-06-30 20:20 UTC · model grok-4.3

classification 💻 cs.SE
keywords agentic AI · adoption maturity · verification gap · deployment barriers · human-in-the-loop · industrial software engineering · multi-agent systems

The pith

Industrial firms encounter a verification gap that blocks deployment of advanced agentic AI systems into production.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports on interviews with sixteen practitioners from twelve companies to map how agentic AI is being adopted in industry. It applies a six-level maturity model and finds that most organizations remain at the first two levels, with only one reaching multi-agent orchestration. The key observation is that four companies have tested higher capabilities yet cannot move them into live workflows because no reliable way exists to check AI outputs without human review. Technical limits on context, language support, and determinism, along with confidentiality rules, create this blockage. The authors identify information asymmetry between what the AI produces and what can be qualified as the core open problem.

Core claim

The paper establishes that a capability-deployment verification gap prevents four of the studied companies from integrating experimentally demonstrated higher-level agentic AI into production workflows. Adequate output verification mechanisms are missing, so human-in-the-loop review stays the sole trusted method. This situation is produced by four recurring barriers: constraints on LLM context windows when aggregating diverse knowledge, weaker results on proprietary programming languages and protocols, non-determinism that conflicts with qualification standards, and worries over data confidentiality. From these barriers two interdependent dimensions arise: information asymmetry and qualification absence.

What carries the argument

The capability-deployment verification gap, which separates experimental AI capabilities from deployable production systems due to the lack of trusted verification beyond human oversight.

If this is right

  • Seven companies operate only at Level 1 with basic AI assistants.
  • Four companies reach Level 2 as AI compensators but no further in production.
  • Only one company achieves Level 3 multi-agent orchestration.
  • Large and safety-regulated organizations appear among the more advanced adopters.
  • The verification gap is reinforced by context constraints, proprietary language issues, non-determinism, and confidentiality concerns.
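The level counts above can be checked with a quick tally. This is a minimal sketch that assumes only the per-level counts reported in the paper; the dictionary, function, and variable names are illustrative and do not come from the study.

```python
# Companies per maturity level, as reported in the paper's findings.
# No company is reported above Level 3 of the six-level framework.
level_counts = {
    "Level 1 (AI Assistants)": 7,
    "Level 2 (AI Compensators)": 4,
    "Level 3 (Multi-Agent Orchestration)": 1,
}

# The study covers twelve companies, so the counts should sum to 12.
total_companies = sum(level_counts.values())


def share(count, total):
    """Return the percentage of companies at a level, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {level: share(n, total_companies) for level, n in level_counts.items()}

for level, pct in shares.items():
    print(f"{level}: {level_counts[level]} of {total_companies} ({pct}%)")
```

With a sample of twelve companies, each company shifts a share by roughly eight percentage points, so the percentages are better read as a description of this sample than as population estimates.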

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developing automated verification tools could allow more companies to advance beyond experimental stages.
  • The gap may widen in highly regulated sectors where qualification standards are strict.
  • Exploring standardized processes for qualifying AI outputs might reduce information asymmetry.
  • Further studies could test whether the identified barriers persist across additional industries not covered here.

Load-bearing premise

The six-level maturity framework correctly sorts the observed adoption stages and the interview sample from twelve companies captures the main barriers that would apply more broadly.

What would settle it

Observing a company that has moved higher-level agentic AI capabilities into production using automated verification instead of human review would indicate the gap does not hold as described.

Original abstract

Agentic AI systems are entering software engineering workflows, yet empirical evidence on how industrial organizations actually adopt them remains sparse. We present a qualitative interview study with sixteen practitioners across twelve companies of varying size and domain. This study characterizes the current agentic AI adoption state of these companies, employing a six-level maturity framework adapted from established AI-driven organizations. The findings reveal that seven companies operate at Level 1 (AI Assistants), four companies at Level 2 (AI Compensators), and only one in Level 3 (Multi-Agent Orchestration), with large and safety-regulated organizations among the most advanced adopters. The primary finding is a capability-deployment verification gap, four companies demonstrated higher-level experimental AI capabilities but cannot integrate them into production workflows because adequate output verification mechanisms are absent, leaving human-in-the-loop as the only trusted verification mechanism. This gap is shaped by four recurring barriers: context window of LLMs constraints especially when diverse knowledge aggregation is needed, under-performance on proprietary programming languages and protocols, non-determinism incompatible with qualification standards, and data confidentiality concerns. Two interdependent dimensions of this gap emerge from these findings (information asymmetry and qualification absence) framing a core open problem for industrial agentic integration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a qualitative interview study with 16 practitioners from 12 companies on the adoption of agentic AI in industrial software engineering. Using an adapted six-level maturity framework, it reports that most companies are at low maturity levels (7 at Level 1, 4 at Level 2, 1 at Level 3), identifies a capability-deployment verification gap in four companies where experimental capabilities cannot be deployed due to lack of verification mechanisms, and outlines four barriers: LLM context window constraints, under-performance on proprietary languages, non-determinism, and data confidentiality concerns. The study frames this gap in terms of information asymmetry and qualification absence.

Significance. If the findings hold, they offer valuable empirical evidence on the current state of agentic AI adoption in industry, particularly highlighting the verification gap as a key barrier to deployment. This contributes to the software engineering literature by providing practitioner perspectives on practical challenges, which could guide future tool development and research on verification methods for non-deterministic AI systems. The qualitative approach allows for in-depth insights into real-world barriers.

major comments (2)
  1. [Methodology] The description of the interview protocol, participant recruitment, data coding and analysis process, and the specific adaptations made to the six-level maturity framework is insufficient. Without these details, it is difficult to evaluate the reliability of the maturity level assignments and the identification of the four recurring barriers, which are central to the primary finding of the verification gap.
  2. [Findings] The claim that four companies demonstrated higher-level experimental capabilities but could not integrate them due to absent verification mechanisms relies on self-reported data; additional evidence or examples from the interviews would strengthen this load-bearing observation.
minor comments (1)
  1. [Abstract] The abstract mentions the sample size and findings but could briefly note the qualitative nature and limitations for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and outline the revisions we will incorporate.

  1. Referee: [Methodology] The description of the interview protocol, participant recruitment, data coding and analysis process, and the specific adaptations made to the six-level maturity framework is insufficient. Without these details, it is difficult to evaluate the reliability of the maturity level assignments and the identification of the four recurring barriers, which are central to the primary finding of the verification gap.

    Authors: We agree that the methodology section would benefit from greater detail to support evaluation of the findings. In the revised manuscript we will expand the relevant sections to describe: the participant recruitment process and selection criteria for the 12 companies; the semi-structured interview protocol with representative questions; the data coding and thematic analysis procedure, including how themes were derived and validated; and the precise adaptations made to the six-level maturity framework together with the criteria used to assign levels from interview responses. These additions will improve transparency without altering the core findings.

    Revision: yes

  2. Referee: [Findings] The claim that four companies demonstrated higher-level experimental capabilities but could not integrate them due to absent verification mechanisms relies on self-reported data; additional evidence or examples from the interviews would strengthen this load-bearing observation.

    Authors: We acknowledge that additional concrete examples would strengthen the presentation of this central observation. While the study relies on self-reported interview data, which is inherent to qualitative research of this type, we will add further anonymized interview excerpts and specific illustrative examples in the revised manuscript to substantiate the capability-deployment verification gap across the four companies, while preserving confidentiality.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity


This paper is a qualitative interview study with 16 practitioners across 12 companies. It maps self-reported adoption states to an adapted six-level maturity framework and enumerates observed barriers directly from the interview data. No equations, fitted parameters, predictions, or derivations exist; the capability-deployment gap and four barriers are presented as empirical observations within the studied sample. The framework is described as adapted from prior work but functions only as a classification lens, not as a self-referential input that forces the reported outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the adapted six-level maturity framework and the assumption that the 16 interviews capture representative barriers; no free parameters or invented entities are introduced.

axioms (1)
  • Domain assumption: The six-level maturity framework adapted from established AI-driven organizations is a valid and appropriate measure for classifying agentic AI adoption in software engineering contexts.
    The study explicitly employs this framework to assign companies to levels 1-3.


