Agentic AI in Industry: Adoption Level and Deployment Barriers
Pith reviewed 2026-06-30 20:20 UTC · model grok-4.3
The pith
Industrial firms encounter a verification gap that blocks deployment of advanced agentic AI systems into production.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that a capability-deployment verification gap prevents four of the studied companies from integrating experimentally demonstrated higher-level agentic AI into production workflows. Adequate output verification mechanisms are missing, so human-in-the-loop review remains the sole trusted method. The gap is shaped by four recurring barriers: constraints on LLM context windows when aggregating diverse knowledge, weaker results on proprietary programming languages and protocols, non-determinism that conflicts with qualification standards, and worries over data confidentiality. From these barriers two interdependent dimensions arise: information asymmetry and qualification absence.
What carries the argument
The capability-deployment verification gap, which separates experimental AI capabilities from deployable production systems due to the lack of trusted verification beyond human oversight.
If this is right
- Seven companies operate only at Level 1 with basic AI assistants.
- Four companies reach Level 2 as AI compensators but no further in production.
- Only one company achieves Level 3 multi-agent orchestration.
- Large and safety-regulated organizations appear among the more advanced adopters.
- The verification gap is reinforced by context constraints, proprietary language issues, non-determinism, and confidentiality concerns.
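The level counts in the list above can be checked with a short tally. This is a minimal sketch: the counts come from the abstract, while the dictionary layout and the `share` helper are illustrative assumptions, not anything the paper provides.

```python
# Company counts per maturity level, as reported in the abstract.
level_counts = {
    "Level 1 (AI Assistants)": 7,
    "Level 2 (AI Compensators)": 4,
    "Level 3 (Multi-Agent Orchestration)": 1,
}

# The three groups should account for all twelve companies in the study.
total = sum(level_counts.values())


def share(count: int, total: int) -> float:
    """Return the percentage of companies, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical helper output: percentage of the sample at each level.
shares = {level: share(n, total) for level, n in level_counts.items()}

for level, pct in shares.items():
    print(f"{level}: {level_counts[level]} of {total} companies ({pct}%)")
```

The four companies affected by the verification gap are not a further group; the summary does not say which maturity levels they sit at, so the tally leaves them out.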
Where Pith is reading between the lines
- Developing automated verification tools could allow more companies to advance beyond experimental stages.
- The gap may widen in highly regulated sectors where qualification standards are strict.
- Exploring standardized processes for qualifying AI outputs might reduce information asymmetry.
- Further studies could test whether the identified barriers persist across additional industries not covered here.
Load-bearing premise
The six-level maturity framework correctly sorts the observed adoption stages, and the interview sample from twelve companies captures the main barriers that would apply more broadly.
What would settle it
Observing a company that has moved higher-level agentic AI capabilities into production using automated verification instead of human review would indicate the gap does not hold as described.
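To make one of the named barriers concrete, the sketch below shows a repeatability check of the kind an automated verification step might include. It is an illustrative assumption only: the paper does not describe any such mechanism, and `is_repeatable`, `deterministic_stub`, and `varying_stub` are hypothetical names standing in for a real model call.

```python
from itertools import count
from typing import Callable


def is_repeatable(generate: Callable[[str], str], prompt: str, runs: int = 5) -> bool:
    """Return True if `generate` yields the same output on every run."""
    # Collect distinct outputs; a repeatable generator produces exactly one.
    outputs = {generate(prompt) for _ in range(runs)}
    return len(outputs) == 1


def deterministic_stub(prompt: str) -> str:
    """Hypothetical stand-in for a generator with fixed output."""
    return prompt.upper()


_calls = count()


def varying_stub(prompt: str) -> str:
    """Hypothetical stand-in for a generator whose output changes per call."""
    return f"{prompt} #{next(_calls)}"


print(is_repeatable(deterministic_stub, "generate config"))
print(is_repeatable(varying_stub, "generate config"))
```

Repeatability is a narrow property: a generator can return the same wrong answer every time, so passing this check would not by itself replace human review.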
Original abstract
Agentic AI systems are entering software engineering workflows, yet empirical evidence on how industrial organizations actually adopt them remains sparse. We present a qualitative interview study with sixteen practitioners across twelve companies of varying size and domain. This study characterizes the current agentic AI adoption state of these companies, employing a six-level maturity framework adapted from established AI-driven organizations. The findings reveal that seven companies operate at Level 1 (AI Assistants), four companies at Level 2 (AI Compensators), and only one at Level 3 (Multi-Agent Orchestration), with large and safety-regulated organizations among the most advanced adopters. The primary finding is a capability-deployment verification gap: four companies demonstrated higher-level experimental AI capabilities but cannot integrate them into production workflows because adequate output verification mechanisms are absent, leaving human-in-the-loop as the only trusted verification mechanism. This gap is shaped by four recurring barriers: LLM context window constraints, especially when diverse knowledge aggregation is needed; under-performance on proprietary programming languages and protocols; non-determinism incompatible with qualification standards; and data confidentiality concerns. Two interdependent dimensions of this gap emerge from these findings (information asymmetry and qualification absence), framing a core open problem for industrial agentic integration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a qualitative interview study with 16 practitioners from 12 companies on the adoption of agentic AI in industrial software engineering. Using an adapted six-level maturity framework, it reports that most companies are at low maturity levels (7 at Level 1, 4 at Level 2, 1 at Level 3), identifies a capability-deployment verification gap in four companies where experimental capabilities cannot be deployed due to lack of verification mechanisms, and outlines four barriers: LLM context window constraints, under-performance on proprietary languages, non-determinism, and data confidentiality concerns. The study frames this gap in terms of information asymmetry and qualification absence.
Significance. If the findings hold, they offer valuable empirical evidence on the current state of agentic AI adoption in industry, particularly highlighting the verification gap as a key barrier to deployment. This contributes to the software engineering literature by providing practitioner perspectives on practical challenges, which could guide future tool development and research on verification methods for non-deterministic AI systems. The qualitative approach allows for in-depth insights into real-world barriers.
major comments (2)
- [Methodology] The description of the interview protocol, participant recruitment, data coding and analysis process, and the specific adaptations made to the six-level maturity framework is insufficient. Without these details, it is difficult to evaluate the reliability of the maturity level assignments and the identification of the four recurring barriers, which are central to the primary finding of the verification gap.
- [Findings] The claim that four companies demonstrated higher-level experimental capabilities but could not integrate them due to absent verification mechanisms relies on self-reported data; additional evidence or examples from the interviews would strengthen this load-bearing observation.
minor comments (1)
- [Abstract] The abstract mentions the sample size and findings but could briefly note the qualitative nature and limitations for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment below and outline the revisions we will incorporate.
Point-by-point responses
- Referee: [Methodology] The description of the interview protocol, participant recruitment, data coding and analysis process, and the specific adaptations made to the six-level maturity framework is insufficient. Without these details, it is difficult to evaluate the reliability of the maturity level assignments and the identification of the four recurring barriers, which are central to the primary finding of the verification gap.
  Authors: We agree that the methodology section would benefit from greater detail to support evaluation of the findings. In the revised manuscript we will expand the relevant sections to describe: the participant recruitment process and selection criteria for the 12 companies; the semi-structured interview protocol with representative questions; the data coding and thematic analysis procedure, including how themes were derived and validated; and the precise adaptations made to the six-level maturity framework together with the criteria used to assign levels from interview responses. These additions will improve transparency without altering the core findings.
  Revision: yes
- Referee: [Findings] The claim that four companies demonstrated higher-level experimental capabilities but could not integrate them due to absent verification mechanisms relies on self-reported data; additional evidence or examples from the interviews would strengthen this load-bearing observation.
  Authors: We acknowledge that additional concrete examples would strengthen the presentation of this central observation. While the study relies on self-reported interview data, which is inherent to qualitative research of this type, we will add further anonymized interview excerpts and specific illustrative examples in the revised manuscript to substantiate the capability-deployment verification gap across the four companies, while preserving confidentiality.
  Revision: yes
Circularity Check
No significant circularity
Full rationale
This paper is a qualitative interview study with 16 practitioners across 12 companies. It maps self-reported adoption states to an adapted six-level maturity framework and enumerates observed barriers directly from the interview data. No equations, fitted parameters, predictions, or derivations exist; the capability-deployment gap and four barriers are presented as empirical observations within the studied sample. The framework is described as adapted from prior work but functions only as a classification lens, not as a self-referential input that forces the reported outcomes.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The six-level maturity framework adapted from established AI-driven organizations is a valid and appropriate measure for classifying agentic AI adoption in software engineering contexts.