pith. machine review for the scientific record.

arxiv: 2604.24521 · v1 · submitted 2026-04-27 · 💻 cs.SE

Recognition: unknown

How Do Software Engineering Students Use Generative AI in Real-World Capstone Projects? An Empirical Baseline Study

Authors on Pith · no claims yet

Pith reviewed 2026-05-08 03:05 UTC · model grok-4.3

classification 💻 cs.SE
keywords generative AI · capstone projects · software engineering education · empirical study · responsible AI use · student workflows · AI in software development

The pith

Software engineering students use generative AI across the full project lifecycle but emphasize verification and keeping independent understanding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to build an empirical baseline for how students actually bring generative AI into real client capstone projects, rather than controlled exercises. It gathered mixed-method survey responses from 150 of the 178 enrolled students, working in 18 teams on 15 external client projects, plus input from client stakeholders, to describe usage patterns, emerging workflows, recommended practices, and concerns. A sympathetic reader would care because these projects are the main place where students learn to ship software under authentic constraints, so unguided AI adoption could quietly change what skills are practiced and what clients receive. The work distinguishes phase-specific workflows and surfaces directives that students themselves propose for responsible use.

Core claim

Generative AI appears throughout the software engineering lifecycle in student capstone work, producing distinct workflows; students recommend targeted use cases paired with verification steps and maintenance of personal comprehension; clients endorse the tools yet require teams to demonstrate understanding, deliver quality, and safeguard data; and courses will need explicit guidelines, AI literacy support, and team governance roles to handle the shift.

What carries the argument

Mixed-method surveys of self-reported attitudes, usage prevalence, workflows, benefits, risks, and client expectations, collected from 150 students plus client stakeholders in a single four-month real-world capstone course.

If this is right

  • Generative AI use spans requirements, design, coding, testing, and documentation, creating identifiable workflow clusters rather than uniform adoption.
  • Students converge on verification of outputs and preservation of independent understanding as core responsible-use rules.
  • Clients voice strong support for AI assistance conditional on teams retaining comprehension and meeting quality plus data-protection standards.
  • Course design should add explicit responsible-use guidelines, targeted literacy resources, and designated team roles for AI governance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Educators could test whether assigning separate reflection tasks on AI-assisted versus non-assisted work changes learning outcomes.
  • The observed workflow differences suggest that phase-specific training modules might reduce uneven adoption across teams.
  • Client data-protection concerns point to a need for explicit privacy modules inside capstone syllabi.

Load-bearing premise

Self-reported survey answers from one course accurately reflect real usage behaviors and perceptions without recall or social-desirability bias and can be generalized beyond that single setting.

What would settle it

A follow-up study that logs actual tool invocations in comparable capstone projects and finds large gaps between logged behavior and the survey reports, or a replication across several independent courses that produces inconsistent client expectations or workflow patterns.
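
To make the first option concrete, here is a minimal sketch, assuming hypothetical per-student logs of GenAI tool invocations alongside the survey's self-reported frequency ratings (both invented for illustration), of how a logged-versus-reported gap could be quantified with Spearman's rank correlation, the statistic already present in the paper's reference list [1, 19].

```python
# Hypothetical gap check: do self-reported GenAI-use ratings track logged
# tool invocations? All numbers below are invented; the paper collected no logs.
from scipy.stats import spearmanr

logged_invocations = [12, 85, 40, 3, 57, 22, 70, 9]  # per-student log counts
self_reported = [2, 5, 3, 1, 4, 3, 4, 2]             # 1-5 Likert self-reports

rho, p_value = spearmanr(logged_invocations, self_reported)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A low rho, or systematic over-reporting at a fixed rank order, would be the
# kind of logged-vs-reported gap that undermines the survey baseline.
```

A replication across several courses would run the same comparison per site and check whether the correlations, not just the prevalence estimates, agree.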

Figures

Figures reproduced from arXiv: 2604.24521 by Elisa Schmid, Jakob Droste, Kurt Schneider, Michael Mircea.

Figure 1: Programming experience of the 150 participants. view at source ↗
Figure 2: Distribution of GenAI application type use among … view at source ↗
Figure 3: Distribution of use cases for which 150 students used … view at source ↗
Figure 4: Heatmap of the distribution of Human-In-The-Loop … view at source ↗
Figure 5: Distribution of the views of students (N = 150) on GenAI in the context of a RWCP. view at source ↗
read the original abstract

Real-world Capstone Projects (RWCPs) are a key component of software engineering education, enabling students to develop software for external clients under authentic conditions. Their high ecological validity, combined with substantial variation in domains, technologies, and stakeholders, typically requires flexible and minimally prescriptive teaching approaches. The rapid integration of generative AI (GenAI) into professional software development adds new challenges: students are expected to use AI tools that are common in practice, yet unguided use may affect learning, collaboration, and consistency in ways that are not yet well understood. To establish an empirical baseline for responsible GenAI integration, we conducted a large-scale study of self-determined GenAI use in an undergraduate RWCP course. The module involved 178 students working in 18 teams across 15 client projects over four months, with GenAI use explicitly permitted. We collected mixed-method survey data from 150 students on attitudes, usage prevalence, workflows, use cases, and perceived benefits and risks, and surveyed client stakeholders regarding expectations and concerns. Our findings provide (1) a characterization of GenAI practices across the software engineering lifecycle, including a distinction between emerging workflows; (2) student-recommended use cases and responsible-use directives emphasizing verification and maintaining independent understanding; (3) client perspectives highlighting strong support for GenAI use but clear expectations regarding understanding, quality, and data protection; and (4) implications for future course iterations, including the need for explicit responsible-use guidelines, targeted AI literacy resources, and team-level governance roles. This study offers a status quo baseline for evidence-based pedagogical interventions in the era of GenAI.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports results from a mixed-methods survey of 150 students (out of 178) and client stakeholders in a single undergraduate real-world capstone project course where GenAI use was explicitly permitted. It characterizes self-reported GenAI practices across the software engineering lifecycle, distinguishes emerging workflows, presents student-recommended use cases and responsible-use directives, reports client expectations around understanding/quality/data protection, and derives pedagogical implications for guidelines, AI literacy resources, and team governance roles.

Significance. If the self-report data accurately reflect actual behaviors, the study supplies a useful empirical baseline for GenAI integration in authentic SE capstone settings. The large sample, client perspective, and ecological validity of the four-month, multi-client design are strengths that could inform curriculum development in similar courses.

major comments (3)
  1. §3 (Data Collection and Analysis): The manuscript provides no description of survey instrument validation, pilot testing, or inter-rater reliability for the qualitative coding of open responses on workflows and recommendations (a minimal agreement-statistics sketch follows these comments). Because the four main findings rest entirely on these self-reports, the absence of these details directly undermines confidence in the prevalence numbers and workflow distinctions.
  2. §4 (Results): The characterization of practices and the distinction between emerging workflows are derived solely from retrospective self-reports, without triangulation against project artifacts, commit logs, or observational data. In a permitted-use setting this leaves the central claims vulnerable to recall and social-desirability bias.
  3. §5 (Implications and Generalizability): The recommendations for responsible-use guidelines and team-level governance are presented as broadly applicable, yet the study is limited to one institution and one cohort; the manuscript does not discuss how the single-case design constrains external validity of the reported patterns.
minor comments (2)
  1. Abstract and §3: The abstract states the response rate (150/178), but the methods section should explicitly report how non-response was handled and whether any weighting or sensitivity analysis was performed.
  2. §4: Table or figure captions for workflow diagrams should include the exact number of respondents per category to allow readers to assess the robustness of the emerging-workflow distinction.
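
On major comment 1, a minimal sketch of the agreement statistic the revision could report for the qualitative coding, assuming two hypothetical coders labeled the same sample of open responses; the categories and label arrays below are invented for illustration and are not from the paper.

```python
# Percent agreement and Cohen's kappa for two hypothetical coders of
# open survey responses. All labels below are invented for illustration.
from collections import Counter

coder_a = ["coding", "coding", "testing", "documentation", "coding", "requirements"]
coder_b = ["coding", "testing", "testing", "documentation", "coding", "coding"]
n = len(coder_a)

# Observed agreement: share of responses given the same code by both coders.
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Chance agreement: sum over categories of the product of the coders' marginals.
pa, pb = Counter(coder_a), Counter(coder_b)
expected = sum((pa[c] / n) * (pb[c] / n) for c in set(coder_a) | set(coder_b))

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
# -> observed agreement = 0.67, Cohen's kappa = 0.50
```

Landis and Koch [10] supply the conventional interpretation bands for such values; the Brennan–Prediger variant [3] replaces the marginal-based chance term with 1/k for k categories, which is more robust when code frequencies are skewed.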

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to improve transparency and the treatment of limitations.

read point-by-point responses
  1. Referee: §3 (Data Collection and Analysis): The manuscript provides no description of survey instrument validation, pilot testing, or inter-rater reliability for the qualitative coding of open responses on workflows and recommendations. Because the four main findings rest entirely on these self-reports, the absence of these details directly undermines confidence in the prevalence numbers and workflow distinctions.

    Authors: We agree that greater methodological transparency is needed. In the revised manuscript we will add a dedicated subsection in §3 that describes how the survey items were adapted from prior instruments on GenAI use in software engineering education and tailored to the capstone context. For the qualitative coding of open responses we will report that the authors applied thematic analysis with iterative consensus discussions to resolve disagreements. We will explicitly note that a formal pilot test was not performed owing to course scheduling constraints and that inter-rater reliability statistics were not computed; both points will be listed as limitations. revision: partial

  2. Referee: §4 (Results): The characterization of practices and the distinction between emerging workflows are derived solely from retrospective self-reports, without triangulation against project artifacts, commit logs, or observational data. In a permitted-use setting this leaves the central claims vulnerable to recall and social-desirability bias.

    Authors: The study was intentionally scoped as a large-scale mixed-methods survey to obtain student-reported practices and perceptions across 15 heterogeneous client projects; collecting consistent artifact or observational data would have required a substantially different research design. In the revision we will expand the Limitations section to discuss recall bias and social-desirability bias in detail and to qualify the prevalence figures and workflow distinctions accordingly. We maintain that the self-report data still constitute a useful empirical baseline for an authentic, permitted-use setting, while acknowledging that future work could strengthen validity through triangulation. revision: partial

  3. Referee: §5 (Implications and Generalizability): The recommendations for responsible-use guidelines and team-level governance are presented as broadly applicable, yet the study is limited to one institution and one cohort; the manuscript does not discuss how the single-case design constrains external validity of the reported patterns.

    Authors: We accept that the single-institution, single-cohort design limits external validity. The revised manuscript will include an explicit paragraph in the Implications and Limitations section that states the constraints arising from the particular institutional policy, client mix, and four-month timeline, while noting the compensating strength of ecological validity. Recommendations will be reframed as context-informed implications rather than universally prescriptive. revision: yes

standing simulated objections not resolved
  • The study collected only post-project self-report data; objective triangulation against commit logs, code artifacts, or observational records cannot be added retrospectively without a new data-collection effort.

Circularity Check

0 steps flagged

No significant circularity in empirical survey study

full rationale

This paper is a purely descriptive empirical study relying on mixed-method survey data from 150 students and client stakeholders in a single capstone course. It contains no equations, derivations, fitted parameters, ansatzes, or uniqueness theorems. All findings (characterizations of workflows, recommendations, and implications) are presented as direct summaries and interpretations of the collected responses, without reduction to inputs by construction or self-referential definitions. The central claims rest on external data collection rather than internal logical closure, so there is no circular dependence between the paper's premises and its conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard empirical assumptions about data collection rather than new parameters or entities.

axioms (2)
  • domain assumption Self-reported survey responses accurately reflect students' actual GenAI usage, attitudes, and perceptions.
    Core data source is student surveys without independent verification such as usage logs.
  • domain assumption The specific RWCP course setting and participant pool are representative of typical software engineering capstone experiences.
    Generalizations to broader education and implications are drawn from this single module.

pith-pipeline@v0.9.0 · 5599 in / 1377 out tokens · 77894 ms · 2026-05-08T03:05:05.726143+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 2 canonical work pages

  1. [1]

    Khawla Ali Abd Al-Hameed. 2022. Spearman’s correlation coefficient in statistical analysis. International Journal of Nonlinear Analysis and Applications 13, 1 (2022), 3249–3255

  2. [2]

    María Cecilia Bastarrica, Daniel Perovich, and Maíra Marques Samary. 2017. What Can Students Get from a Software Engineering Capstone Course? In 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering Education and Training Track (ICSE-SEET). 137–145. doi:10.1109/ICSE-SEET.2017.15

  3. [3]

    Robert L. Brennan and Dale J. Prediger. 1981. Coefficient Kappa: Some Uses, Misuses, and Alternatives. Educational and Psychological Measurement 41, 3 (1981), 687–699

  4. [4]

    Rudrajit Choudhuri, Dylan Liu, Igor Steinmacher, Marco Gerosa, and Anita Sarma. 2024. How far are we? The triumphs and trials of generative AI in learning software engineering. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

  5. [5]

    Rudrajit Choudhuri, Ambareesh Ramakrishnan, Amreeta Chatterjee, Bianca Trinkenreich, Igor Steinmacher, Marco Gerosa, and Anita Sarma. 2025. Insights from the frontline: GenAI utilization among software engineering students. In 2025 IEEE/ACM 37th International Conference on Software Engineering Education and Training (CSEE&T). IEEE, 1–12

  6. [6]

    Marian Daun and Jennifer Brings. 2023. How ChatGPT will change software engineering education. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1. 110–116

  7. [7]

    Paul Denny, James Prather, Brett A Becker, James Finnie-Ansley, Arto Hellas, Juho Leinonen, Andrew Luxton-Reilly, Brent N Reeves, Eddie Antonio Santos, and Sami Sarsa. 2024. Computing education in the era of generative AI. Commun. ACM 67, 2 (2024), 56–67

  8. [8–9]

    Luis A Gonzalez, Andres Neyem, Ignacio Contreras-McKay, and Danilo Molina. 2022. Improving learning experiences in software engineering capstone courses using artificial intelligence virtual assistants. Computer Applications in Engineering Education 30, 5 (2022), 1370–1389

  10. [10]

    J. Richard Landis and Gary G. Koch. 1977. The Measurement of Observer Agreement for Categorical Data. Biometrics 33, 1 (1977), 159–174

  11. [11]

    Michael Mircea, Jakob Droste, Elisa Schmid, and Kurt Schneider. 2026. "How Do Software Engineering Students Use Generative AI in Real-World Capstone Projects? An Empirical Baseline Study" - Supplementary Material. doi:10.5281/zenodo.18836366

  12. [12]

    MIT NANDA. 2025. State of AI in Business 2025. Preprint at https://www.artificialintelligence-news.com/wp-content/uploads/2025/08/ai_report_2025.pdf (2025)

  13. [13]

    Sriraam Natarajan, Saurabh Mathur, Sahil Sidheekh, Wolfgang Stammer, and Kristian Kersting. 2025. Human-in-the-loop or AI-in-the-loop? Automate or Collaborate? In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 28594–28600

  14. [14]

    Andres Neyem, Juan Pablo Sandoval Alcocer, Marcelo Mendoza, Leonardo Centellas-Claros, Luis A Gonzalez, and Carlos Paredes-Robles. 2024. Exploring the impact of generative AI for standup report recommendations in software capstone project development. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1. 951–957

  15. [15]

    James Robertson and Suzanne Robertson. 2000. Volere. Requirements Specification Templates (2000)

  16. [16]

    Nimisha Roy, Omojokun Olufisayo, and Oleksandr Horielko. 2025. Empowering Future Software Engineers: Integrating AI Tools into Advanced CS Curriculum. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 2. 1747–1747

  17. [17]

    Johnny Saldaña. 2021. The coding manual for qualitative researchers. (2021)

  18. [18]

    Mario Simaremare and Henry Edison. 2024. The state of generative AI adoption from software practitioners’ perspective: An empirical study. In 2024 50th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 106–113

  19. [19]

    Charles Spearman. 1904. The Proof and Measurement of Association between Two Things. The American Journal of Psychology 15, 1 (1904), 72–101. doi:10.2307/1412159

  20. [20]

    Saara Tenhunen, Tomi Männistö, Matti Luukkainen, and Petri Ihantola. 2023. A systematic literature review of capstone courses in software engineering. Information and Software Technology 159 (2023), 107191

  21. [21]

    Claes Wohlin, Per Runeson, Martin Höst, Magnus C Ohlsson, Björn Regnell, and Anders Wesslén. 2012. Experimentation in software engineering. Vol. 236. Springer