AI4SE and SE4AI Exploration: A Decade Looking Back and Forward
Pith reviewed 2026-06-26 20:29 UTC · model grok-4.3
The pith
A human-AI literature review of systems engineering publications identifies five critical gaps in AI4SE and SE4AI.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through a human-AI agreement literature review, the paper identifies five critical research gaps in AI4SE and SE4AI, while describing progress across three phases and providing guidance for AI adoption, assurance, and workforce transformation in systems engineering.
What carries the argument
The human-AI agreement literature review process, which combines human expertise with ratings from six AI models to assess the relevance of over 2,600 publications (1,712 INCOSE INSIGHT articles and 889 SERC publications) and to surface gaps.
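As a rough illustration of what a human-AI agreement score could look like, the sketch below computes Cohen's kappa between one human rater and each AI rater on binary relevance labels. The source does not state which agreement statistic the authors used, so the choice of kappa is an assumption, and the labels and model names (`model_a`, `model_b`) are hypothetical placeholders rather than data from the paper.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical relevance labels (1 = relevant, 0 = not relevant).
human = [1, 0, 1, 1, 0, 0, 1, 0]
model_ratings = {
    "model_a": [1, 0, 1, 1, 0, 0, 1, 0],
    "model_b": [1, 0, 0, 1, 0, 1, 1, 0],
}

kappas = {name: cohens_kappa(human, labels) for name, labels in model_ratings.items()}
for name, kappa in kappas.items():
    print(f"{name}: kappa = {kappa:.2f}")
```

Kappa corrects raw percent agreement for chance, which matters when most publications in a corpus are judged not relevant. If the ratings are graded rather than binary, a weighted statistic would be the more natural choice.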
If this is right
- The five gaps offer concrete priorities for practitioners working on AI adoption in systems engineering projects.
- The shared agreement data and web application let readers test their own judgments against the human and AI raters.
- Guidance on assurance and workforce transformation follows directly from the identified gaps.
- The three-phase historical framing can be used to track future convergence or divergence in the field.
Where Pith is reading between the lines
- Similar human-AI review methods could be applied to map gaps in other interdisciplinary areas such as AI and biology or AI and materials science.
- The phase labels (foundational, applied, LLM inflection) invite testing whether a fourth phase emerges with newer model capabilities.
- If the gaps are closed, the result would be measurable improvements in the reliability of AI components within large engineered systems.
Load-bearing premise
The authors' selection of core papers and the relevance judgments produced by the six AI models together provide an unbiased and sufficiently complete picture of the field's open problems.
What would settle it
Re-running the relevance assessment on a larger or differently sampled set of publications, or with a different collection of AI models, and obtaining a substantially different set of gaps would falsify the claim that the five gaps are the critical ones.
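One way to make "substantially different" concrete is to measure the overlap between the original gap set and the gap set from a re-run. The sketch below uses the Jaccard index for this. The metric, the gap identifiers, and any cut-off for calling two sets different are assumptions for illustration and do not come from the paper.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Placeholder identifiers; the actual gaps are not listed in this summary.
original_gaps = {"gap_1", "gap_2", "gap_3", "gap_4", "gap_5"}
rerun_gaps = {"gap_1", "gap_2", "gap_3", "gap_6", "gap_7"}

overlap = jaccard(original_gaps, rerun_gaps)
print(f"gap-set overlap: {overlap:.2f}")
```

A value near 1 would suggest the gaps are stable under resampling or a change of models. A low value would suggest they depend on the particular corpus or raters.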
Original abstract
The March 2020 INCOSE INSIGHT special issue on AI and Systems Engineering (SE) became the most downloaded issue in the publication's history and launched a research community that now draws over 250 registrants to its annual workshop. In this article, we trace the progress in AI and SE across three phases (labeled here foundational, applied, and LLM inflection) based on the authors' reading of the field's core papers, and describe our opinions of where the community has converged and where critical gaps remain. Separately, a human-AI agreement literature review leveraging both human expertise and six AI models was performed to assess the relevance of 1,712 INCOSE INSIGHT articles and 889 SERC publications. The results identify five critical research gaps and offer guidance for practitioners navigating AI adoption, assurance, and workforce transformation in SE. We share the agreement data and the AI4SE/SE4AI Explorer web application so readers can compare their own relevance judgments with the human and AI raters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript traces progress in AI4SE and SE4AI across three phases (foundational, applied, and LLM inflection) based on the authors' reading of core papers. It separately reports a human-AI agreement literature review of 1,712 INCOSE INSIGHT articles and 889 SERC publications using six AI models, from which five critical research gaps are identified, along with practitioner guidance. The agreement data and an AI4SE/SE4AI Explorer web application are released.
Significance. If the five gaps are extracted through a transparent, reproducible process and validated as field-wide rather than corpus-specific, the work could usefully guide AI adoption, assurance, and workforce issues in systems engineering. The public release of agreement data and the web application is a clear strength that supports community verification and extension.
major comments (2)
- [Abstract and literature-review section] The claim that the review 'identifies five critical research gaps' is not supported by any description of how the gaps were derived from the agreement scores (e.g., thresholds, clustering, or post-hoc filtering), making the central output non-reproducible from the reported data.
- [Human-AI agreement methodology] No external accuracy benchmark against held-out expert labels or comparison to a broader corpus (e.g., IEEE/ACM venues) is provided, so the relevance judgments from the six AI models cannot be shown to be unbiased or complete enough to establish field-wide gaps rather than artifacts of the chosen models and sources.
minor comments (1)
- [Three-phase narrative] The three-phase narrative relies on unblinded core-paper selection; adding explicit inclusion criteria or a supplementary table of selected papers would improve transparency without altering the main claim.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on reproducibility and scope. We address each major comment below, indicating planned revisions where appropriate.
- Referee: [Abstract and literature-review section] The claim that the review 'identifies five critical research gaps' is not supported by any description of how the gaps were derived from the agreement scores (e.g., thresholds, clustering, or post-hoc filtering), making the central output non-reproducible from the reported data.
  Authors: We agree that the manuscript does not provide an explicit description of the derivation process for the five gaps from the agreement scores. The gaps were identified through qualitative synthesis of topics with low human-AI agreement and low coverage in the INCOSE and SERC corpora. We will revise the literature-review section to detail the methodology, including agreement thresholds, topic identification approach, and any post-hoc filtering applied. The released agreement data supports verification of this process.
  Revision: yes
- Referee: [Human-AI agreement methodology] No external accuracy benchmark against held-out expert labels or comparison to a broader corpus (e.g., IEEE/ACM venues) is provided, so the relevance judgments from the six AI models cannot be shown to be unbiased or complete enough to establish field-wide gaps rather than artifacts of the chosen models and sources.
  Authors: The analysis is scoped to the INCOSE INSIGHT and SERC publications as representative sources for the AI4SE/SE4AI community. No external benchmark against held-out labels from broader venues was conducted. We will revise the methodology section to explicitly limit claims to this corpus, discuss potential model and source biases, and clarify that gaps are corpus-specific rather than field-wide. A full external validation is outside the current scope.
  Revision: partial
- Conducting an external accuracy benchmark against held-out expert labels from a broader corpus (e.g., IEEE/ACM venues) would require new data collection and labeling not feasible in this revision.
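The rebuttal describes the gaps as coming from a qualitative synthesis of topics with low human-AI agreement and low corpus coverage. The sketch below shows what a simple quantitative pre-filter for such topics might look like. It is not the authors' procedure: the `TopicStats` structure, the topic names, and both thresholds are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class TopicStats:
    name: str
    agreement: float  # human-AI agreement for the topic, in [0, 1]
    n_papers: int     # number of corpus publications on the topic


def candidate_gaps(topics, max_agreement=0.6, max_papers=10):
    """Return names of topics with both low agreement and low coverage.

    The default thresholds are illustrative assumptions, not values from the paper.
    """
    return [
        t.name
        for t in topics
        if t.agreement <= max_agreement and t.n_papers <= max_papers
    ]


# Hypothetical per-topic statistics.
topics = [
    TopicStats("topic_a", agreement=0.45, n_papers=4),
    TopicStats("topic_b", agreement=0.90, n_papers=3),
    TopicStats("topic_c", agreement=0.50, n_papers=120),
    TopicStats("topic_d", agreement=0.30, n_papers=7),
]

flagged = candidate_gaps(topics)
print(flagged)
```

Stating the thresholds explicitly, as the referee requests, would let readers re-derive the candidate list from the released agreement data and see how sensitive it is to the chosen cut-offs.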
Circularity Check
Literature review derives gaps from external corpus without self-referential reduction
The paper conducts a human-AI literature review on 1,712 INCOSE INSIGHT articles and 889 SERC publications to extract five critical research gaps, supplemented by the authors' narrative reading of core papers across three phases. No equations, fitted parameters, or derivations are present. The output (gap list) is produced by direct analysis of the external publications rather than by construction from the review process itself. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the central claims. The process is self-contained against external benchmarks because the source articles are independent of the present paper. This is the standard honest finding for a descriptive review paper.
Axiom & Free-Parameter Ledger
axioms (2)
- ad hoc to paper: The three phases (foundational, applied, LLM inflection) accurately represent the field's development.
- domain assumption: AI models can be used to assess publication relevance at a level comparable to human experts.