AI4SE and SE4AI Exploration: A Decade Looking Back and Forward

Daniel R. Herber; H. Sinan Bank; Thomas Bradley

arxiv: 2606.19630 · v1 · pith:T2FYQJV6new · submitted 2026-06-17 · 💻 cs.AI · cs.DL · cs.SY · eess.SY

Pith reviewed 2026-06-26 20:29 UTC · model grok-4.3

keywords AI4SE · SE4AI · systems engineering · literature review · research gaps · human-AI agreement · INCOSE · SERC

The pith

A human-AI literature review of systems engineering publications identifies five critical gaps in AI4SE and SE4AI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper traces the evolution of research at the intersection of artificial intelligence and systems engineering over the past decade, organizing it into foundational, applied, and LLM inflection phases. It presents the authors' views on areas of convergence and remaining challenges based on key papers. A separate analysis using human experts and six AI models evaluated the relevance of 1,712 INCOSE INSIGHT articles and 889 SERC publications to pinpoint five specific research gaps. This matters because the integration affects how complex engineered systems incorporate AI and how systems engineering practices can support reliable AI development. The work shares the underlying agreement data and a web application for others to examine the ratings.

Core claim

Through a human-AI agreement literature review, the paper identifies five critical research gaps in AI4SE and SE4AI, while describing progress across three phases and providing guidance for AI adoption, assurance, and workforce transformation in systems engineering.

What carries the argument

The human-AI agreement literature review process that combines human expertise with ratings from six AI models to assess relevance of over 2,600 publications and surface gaps.
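
As an illustration only, the sketch below shows one common way such a process could be scored: a majority vote across the six model ratings gives a consensus label per article, and Cohen's kappa measures chance-corrected agreement between that consensus and the human label. The summary does not say which voting rule or statistic the authors used, so `majority_vote`, `cohen_kappa`, the tie-breaking rule, and the toy ratings are all assumptions.

```python
def majority_vote(model_labels):
    """Return 1 when strictly more than half of the model ratings are 1 (relevant).

    Assumption: a 3-3 tie among six models counts as not relevant.
    """
    return int(sum(model_labels) * 2 > len(model_labels))


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary (0/1) labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # share of items rater A marks relevant
    p_b = sum(b) / n  # share of items rater B marks relevant
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both raters gave the same constant label
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical, not from the paper): one row per article,
# one column per AI model, 1 = rated relevant.
votes = [
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
human = [1, 1, 0, 0, 1, 0]  # hypothetical human relevance labels

consensus = [majority_vote(row) for row in votes]
agreement = sum(h == c for h, c in zip(human, consensus)) / len(human)
kappa = cohen_kappa(human, consensus)

print("consensus labels:", consensus)
print("raw agreement:", round(agreement, 3))
print("Cohen's kappa:", round(kappa, 3))
```

Raw agreement overstates reliability when most articles are irrelevant, which is why a chance-corrected statistic is used in this sketch; any comparable measure would serve the same purpose.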

If this is right

The five gaps offer concrete priorities for practitioners working on AI adoption in systems engineering projects.
The shared agreement data and web application let readers test their own judgments against the human and AI raters.
Guidance on assurance and workforce transformation follows directly from the identified gaps.
The three-phase historical framing can be used to track future convergence or divergence in the field.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar human-AI review methods could be applied to map gaps in other interdisciplinary areas such as AI and biology or AI and materials science.
The phase labels (foundational, applied, LLM inflection) invite testing whether a fourth phase emerges with newer model capabilities.
If the gaps are closed, the result would be measurable improvements in the reliability of AI components within large engineered systems.

Load-bearing premise

The authors' selection of core papers and the relevance judgments produced by the six AI models together provide an unbiased and sufficiently complete picture of the field's open problems.

What would settle it

Re-running the relevance assessment on a larger or differently sampled set of publications, or with a different collection of AI models, and obtaining a substantially different set of gaps would falsify the claim that the five gaps are the critical ones.
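
One inexpensive version of this check can be sketched as a leave-one-model-out test: drop each AI model in turn, recompute the consensus label per article, and list the articles whose label flips. This is a hypothetical robustness check, not a procedure described in the paper; the function names, the majority rule, and the toy ratings are assumptions.

```python
def consensus(votes):
    """Majority label per article; a tie counts as 0 (assumed rule)."""
    return [int(sum(row) * 2 > len(row)) for row in votes]


def leave_one_out_flips(votes):
    """Map each model index to the articles whose consensus changes without it."""
    baseline = consensus(votes)
    n_models = len(votes[0])
    flips = {}
    for m in range(n_models):
        # Remove column m (one model's ratings) from every article row.
        reduced = [row[:m] + row[m + 1:] for row in votes]
        flips[m] = [
            i for i, (b, r) in enumerate(zip(baseline, consensus(reduced)))
            if b != r
        ]
    return flips


# Toy data (hypothetical): rows are articles, columns are six AI models.
votes = [
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
]

flips = leave_one_out_flips(votes)
for model_index, changed in flips.items():
    print(f"without model {model_index}: flipped articles {changed}")
```

If many labels flip when a single model is removed, conclusions drawn from the consensus depend heavily on the choice of models, which is the sensitivity the paragraph above describes.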

Figures

Figures reproduced from arXiv: 2606.19630 by Daniel R. Herber, H. Sinan Bank, Thomas Bradley.

**Figure 2.** Top: Human–AI agreement by year (1995–2025). Bottom: human–model consensus by year (2020–2025),

**Figure 3.** Mean citations per article: agreed-relevant AI articles versus general INCOSE INSIGHT articles.

**Figure 4.** AI4SE/SE4AI Explorer interactive web application for readers to test their relevance judgments against the

**Figure 5.** Consensus AI-relevant publication count by year (2020 onwards): INSIGHT (red) and SERC (light blue)

Original abstract

The March 2020 INCOSE INSIGHT special issue on AI and Systems Engineering (SE) became the most downloaded issue in the publication's history and launched a research community that now draws over 250 registrants to its annual workshop. In this article, we trace the progress in AI and SE across three phases (labeled here foundational, applied, and LLM inflection) based on the authors' reading of the field's core papers, and describe our opinions of where the community has converged and where critical gaps remain. Separately, a human-AI agreement literature review leveraging both human expertise and six AI models was performed to assess the relevance of 1,712 INCOSE INSIGHT articles and 889 SERC publications. The results identify five critical research gaps and offer guidance for practitioners navigating AI adoption, assurance, and workforce transformation in SE. We share the agreement data and the AI4SE/SE4AI Explorer web application so readers can compare their own relevance judgments with the human and AI raters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a synthesis paper that frames AI-SE history in three phases and lists five gaps from a human-AI review of INCOSE and SERC articles, with data released.

The paper's core contribution is a three-phase narrative (foundational, applied, LLM inflection) drawn from the authors' reading of key papers, paired with a human-AI scan of 1,712 INCOSE INSIGHT articles and 889 SERC publications that surfaces five gaps. They also release the agreement data and a web app for others to inspect the ratings.

What stands out is the effort to combine human expertise with six AI models for relevance labeling and the decision to make the raw judgments public. That makes the work more checkable than most opinion-based reviews. The practitioner guidance on adoption, assurance, and workforce issues is straightforward and tied to the sources they examined.

The soft spot is the leap from agreement statistics to "critical" gaps. The abstract and description give no detail on how the five gaps were derived from the scores, no held-out expert validation of the AI labels, and no comparison against broader venues like IEEE or ACM. Without that, the gaps risk reflecting the models' training biases or the limited corpus rather than field-wide priorities. The core-paper selection for the phases is also unblinded author judgment.

This is aimed at systems engineers and AI researchers already working on integration questions. A reader looking for an organized starting point and downloadable data will get value; someone needing new empirical results or rigorously validated open problems will not.

It deserves peer review as a review article. The data release is a plus, but referees will likely press on the validation of the AI-driven gap identification.

Referee Report

2 major / 1 minor

Summary. The manuscript traces progress in AI4SE and SE4AI across three phases (foundational, applied, and LLM inflection) based on the authors' reading of core papers. It separately reports a human-AI agreement literature review of 1,712 INCOSE INSIGHT articles and 889 SERC publications using six AI models, from which five critical research gaps are identified, along with practitioner guidance. The agreement data and an AI4SE/SE4AI Explorer web application are released.

Significance. If the five gaps are extracted through a transparent, reproducible process and validated as field-wide rather than corpus-specific, the work could usefully guide AI adoption, assurance, and workforce issues in systems engineering. The public release of agreement data and the web application is a clear strength that supports community verification and extension.

major comments (2)

[Abstract and literature-review section] The claim that the review 'identifies five critical research gaps' is not supported by any description of how the gaps were derived from the agreement scores (e.g., thresholds, clustering, or post-hoc filtering), making the central output non-reproducible from the reported data.
[Human-AI agreement methodology] No external accuracy benchmark against held-out expert labels or comparison to a broader corpus (e.g., IEEE/ACM venues) is provided, so the relevance judgments from the six AI models cannot be shown to be unbiased or complete enough to establish field-wide gaps rather than artifacts of the chosen models and sources.

minor comments (1)

[Three-phase narrative] The three-phase narrative relies on unblinded core-paper selection; adding explicit inclusion criteria or a supplementary table of selected papers would improve transparency without altering the main claim.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments on reproducibility and scope. We address each major comment below, indicating planned revisions where appropriate.

Referee: [Abstract and literature-review section] The claim that the review 'identifies five critical research gaps' is not supported by any description of how the gaps were derived from the agreement scores (e.g., thresholds, clustering, or post-hoc filtering), making the central output non-reproducible from the reported data.

Authors: We agree that the manuscript does not provide an explicit description of the derivation process for the five gaps from the agreement scores. The gaps were identified through qualitative synthesis of topics with low human-AI agreement and low coverage in the INCOSE and SERC corpora. We will revise the literature-review section to detail the methodology, including agreement thresholds, topic identification approach, and any post-hoc filtering applied. The released agreement data supports verification of this process.

revision: yes

Referee: [Human-AI agreement methodology] No external accuracy benchmark against held-out expert labels or comparison to a broader corpus (e.g., IEEE/ACM venues) is provided, so the relevance judgments from the six AI models cannot be shown to be unbiased or complete enough to establish field-wide gaps rather than artifacts of the chosen models and sources.

Authors: The analysis is scoped to the INCOSE INSIGHT and SERC publications as representative sources for the AI4SE/SE4AI community. No external benchmark against held-out labels from broader venues was conducted. We will revise the methodology section to explicitly limit claims to this corpus, discuss potential model and source biases, and clarify that gaps are corpus-specific rather than field-wide. A full external validation is outside the current scope.

revision: partial
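
The first response describes the gaps as topics with low human-AI agreement and low coverage, but gives no thresholds. A minimal sketch of what a reproducible version of that filter might look like follows. The class `TopicStats`, the function `flag_gaps`, both threshold values, and the topic figures are hypothetical placeholders; only the corpus size (1,712 + 889 articles) comes from the paper.

```python
from dataclasses import dataclass


@dataclass
class TopicStats:
    name: str
    n_articles: int   # articles in the corpus tagged with this topic
    agreement: float  # share of those articles where human and AI labels match


def flag_gaps(topics, corpus_size, max_coverage=0.02, max_agreement=0.6):
    """Return topic names that are both thinly covered and low in agreement.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    flagged = []
    for t in topics:
        coverage = t.n_articles / corpus_size
        if coverage <= max_coverage and t.agreement <= max_agreement:
            flagged.append(t.name)
    return flagged


CORPUS_SIZE = 1712 + 889  # INCOSE INSIGHT articles plus SERC publications

# Toy topic statistics (hypothetical, for demonstration only).
topics = [
    TopicStats("topic A", n_articles=120, agreement=0.90),
    TopicStats("topic B", n_articles=30, agreement=0.50),
    TopicStats("topic C", n_articles=20, agreement=0.85),
    TopicStats("topic D", n_articles=200, agreement=0.40),
]

flagged = flag_gaps(topics, CORPUS_SIZE)
print("candidate gaps:", flagged)
```

Publishing explicit thresholds of this kind alongside the released agreement data would let readers regenerate the candidate list and see how it shifts as the cut-offs change.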

standing simulated objections not resolved

Conducting an external accuracy benchmark against held-out expert labels from a broader corpus (e.g., IEEE/ACM venues) would require new data collection and labeling not feasible in this revision.

Circularity Check

0 steps flagged

Literature review derives gaps from external corpus without self-referential reduction

The paper conducts a human-AI literature review on 1,712 INCOSE INSIGHT articles and 889 SERC publications to extract five critical research gaps, supplemented by the authors' narrative reading of core papers across three phases. No equations, fitted parameters, or derivations are present. The output (gap list) is produced by direct analysis of the external publications rather than by construction from the review process itself. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the central claims. The process is self-contained against external benchmarks because the source articles are independent of the present paper. This is the standard honest finding for a descriptive review paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two ad-hoc elements: the authors' subjective division of the field into three phases and the assumption that AI-model relevance scores are sufficiently reliable to surface the true gaps.

axioms (2)

[ad hoc to paper] The three phases (foundational, applied, LLM inflection) accurately represent the field's development.
Phases are labeled based on the authors' reading of core papers.
[domain assumption] AI models can be used to assess publication relevance at a level comparable to human experts.
Six AI models were employed alongside human raters for the 2,601-article screen.

pith-pipeline@v0.9.1-grok · 5718 in / 1262 out tokens · 25801 ms · 2026-06-26T20:29:40.568056+00:00 · methodology
