Generative AI Use in Professional Graduate Thesis Writing: Adoption, Perceived Outcomes, and the Role of a Research-Specialized Agent
Pith reviewed 2026-05-13 19:00 UTC · model grok-4.3
The pith
Nearly all surveyed MBA thesis students used generative AI, and most reported gains in structure and quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the survey, 95.2 percent of students reported at least some generative AI use and 77.1 percent reported heavy use across literature review, drafting, and stuck-point consultation. Reported benefits included clearer arguments and structure for 82.3 percent, better revision quality for 73.4 percent, and faster writing for 70.9 percent. When students compared GAMER PAT, a research-specialized agent, against other AI tools, they significantly preferred it for inquiry deepening and structural organization.
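To give a rough sense of the sampling uncertainty behind the headline adoption figure, the sketch below computes a Wilson score interval. It is an illustration and not an analysis from the paper: the count of 79 out of 83 is inferred from the reported 95.2 percent, the function name `wilson_interval` is hypothetical, and the interval ignores non-response bias and the finite target population of 230.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumption: 79 of 83 is inferred from the reported 95.2%;
# the raw count is not quoted in the review.
low, high = wilson_interval(79, 83)
print(f"adoption: {79 / 83:.3f}, approx. 95% interval: {low:.3f} to {high:.3f}")
```

Even under this simple model the interval stays well above one half, so "nearly universal" is a fair description of the respondents. Whether it describes the 147 students who did not respond is a separate question.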
What carries the argument
A post-thesis survey measuring AI adoption rates, together with direct preference ratings of a research-specialized agent (GAMER PAT) against general tools across the full thesis workflow.
Where Pith is reading between the lines
- If specialized agents continue to outperform general ones, programs could develop or license custom research scaffolds for graduate work.
- Objective metrics such as examiner scores or citation error rates would be needed to confirm whether perceived quality gains are real.
- The same workflow integration and verification challenges are likely to appear in other professional master's programs beyond business.
Load-bearing premise
Students' self-reported feelings of quality improvement and tool preference accurately reflect real gains in thesis quality without any independent checks or comparison to non-AI writing.
What would settle it
A controlled comparison in which examiners score thesis quality, argument strength, and citation accuracy for matched AI-assisted versus non-AI-assisted submissions from the same program.
Original abstract
This paper reports a survey of generative AI use among 83 MBA thesis students in Japan (target population 230; 36.1% response rate), conducted after thesis examiner evaluation. AI use was nearly universal: 95.2% reported at least some use and 77.1% heavy use. Students engaged AI across the full research-writing workflow - literature review, drafting, and consultation when stuck - reporting benefits centered on clearer argument and structure (82.3%), better revision quality (73.4%), and faster writing (70.9%), with a mean perceived quality improvement of 6.27 out of 7. Concerns about output accuracy (75.9%) and citation handling persisted alongside these gains. Among respondents who rated GAMER PAT, a research-specialized agent, against other AI, preferences significantly favored it for inquiry deepening and structural organization (both p < 0.05, exact binomial). A preliminary qualitative analysis of follow-up interviews further reveals active epistemic vigilance strategies and differentiated tool use across thesis phases. The central implication is not adoption itself but a shift in the educational challenge toward verification, source governance, and AI tool design - with GAMER PAT offering preliminary evidence that research-specialized scaffolding matters.
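The abstract quotes percentages without their denominators. As a back-of-the-envelope check, the sketch below lists which integer counts would round to each reported figure. The helper `consistent_counts` is hypothetical, and the candidate denominator of 79 (the respondents who reported any AI use) is an assumption, not something the abstract states.

```python
def consistent_counts(percent: float, n: int, decimals: int = 1) -> list:
    """Return every integer count k for which k/n rounds to the reported percent."""
    return [
        k for k in range(n + 1)
        if abs(round(100 * k / n, decimals) - percent) < 1e-9
    ]


# Reported percentages from the abstract.
reported = {
    "any AI use": 95.2,
    "heavy use": 77.1,
    "clearer argument and structure": 82.3,
    "better revision quality": 73.4,
    "faster writing": 70.9,
    "accuracy concerns": 75.9,
}

# 83 = all respondents; 79 = assumed number of AI users (95.2% of 83).
for label, pct in reported.items():
    for n in (83, 79):
        print(f"{label:32s} {pct:5.1f}%  n={n}: {consistent_counts(pct, n)}")
```

Under these assumptions the adoption figures fit a denominator of 83, while the three benefit percentages have no matching integer count out of 83 but do match counts out of 79. That is consistent with the benefit items having been asked only of AI users, though the source does not confirm it.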
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper surveys 83 MBA thesis students in Japan (36.1% response rate from 230) on generative AI use in professional thesis writing after examiner evaluation. It reports near-universal adoption (95.2% at least some use, 77.1% heavy use), perceived benefits in argument structure (82.3%), revision quality (73.4%), writing speed (70.9%), and overall quality (mean 6.27/7), alongside concerns about accuracy (75.9%) and citations. Quantitative analysis shows statistically significant preference for the research-specialized agent GAMER PAT over other AI for inquiry deepening and structural organization (both p<0.05, exact binomial), supported by qualitative interviews revealing epistemic vigilance strategies. The central claim is that the key educational challenge has shifted to verification and source governance, with GAMER PAT providing preliminary evidence that specialized scaffolding matters.
Significance. If the results hold under objective validation, this study supplies timely empirical data on generative AI adoption patterns and perceived impacts in graduate professional writing, particularly in a Japanese MBA context. The documentation of differentiated tool use, active vigilance strategies, and statistically tested preferences for specialized agents like GAMER PAT offers a concrete basis for curriculum design and AI tool development focused on critical evaluation skills rather than blanket prohibition.
major comments (2)
- [Results] Results section on perceived outcomes: The headline claims of quality improvement (mean 6.27/7) and structural benefits (82.3%) rest solely on unvalidated self-reports collected after thesis evaluation, with no reported correlations to objective metrics such as examiner scores, final grades, or blinded quality rubrics. This directly weakens support for the stronger implications about actual outcomes and the value of specialized scaffolding.
- [Results] Methods and Results on GAMER PAT preferences: The statistically significant preference for GAMER PAT (p<0.05) is presented as preliminary evidence that research-specialized scaffolding matters, yet no linkage is shown to differences in actual thesis quality, usage logs, or control comparisons with non-specialized tools, leaving the causal implication unsupported.
minor comments (3)
- [Methods] The 36.1% response rate is acknowledged but not accompanied by non-response bias analysis or demographic comparison to the full target population of 230 students.
- [Results] Clarify the exact sample sizes and respondent subsets for each binomial test on GAMER PAT preferences, as the abstract does not specify how many rated the specialized agent.
- [Methods] The qualitative interview analysis is described as preliminary; adding a brief methods subsection on coding procedures and inter-rater reliability would strengthen transparency.
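The request for exact subset sizes matters because an exact binomial test is sensitive to small changes in counts. The sketch below implements a two-sided exact binomial test in plain Python. The counts are hypothetical, since the review does not report how many respondents rated GAMER PAT or how ties were handled, and the function name `exact_binomial_two_sided` is illustrative only.

```python
from math import comb


def exact_binomial_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Two-sided exact binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k under the null hypothesis of success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts, NOT taken from the paper: respondents who expressed a
# preference either way, with "no preference" answers assumed to be dropped.
p_15 = exact_binomial_two_sided(15, 20)
p_14 = exact_binomial_two_sided(14, 20)
print(f"15 of 20 prefer the specialized agent: p = {p_15:.4f}")
print(f"14 of 20 prefer the specialized agent: p = {p_14:.4f}")
```

With these made-up numbers, one respondent switching sides moves the result across the 0.05 threshold, which is why the subset size for each test needs to be stated.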
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope and limitations of our exploratory survey. Our study focuses on self-reported adoption patterns, perceived outcomes, and tool preferences among Japanese MBA students, rather than objective measures of thesis quality. We address each major comment below by revising the manuscript to better align claims with the data collected.
- Referee: [Results] Results section on perceived outcomes: The headline claims of quality improvement (mean 6.27/7) and structural benefits (82.3%) rest solely on unvalidated self-reports collected after thesis evaluation, with no reported correlations to objective metrics such as examiner scores, final grades, or blinded quality rubrics. This directly weakens support for the stronger implications about actual outcomes and the value of specialized scaffolding.
  Authors: We agree that the reported benefits are based exclusively on self-reports and that no objective metrics (e.g., examiner scores or grades) were collected or correlated. The study design was a post-evaluation survey capturing student perceptions in a real-world professional thesis context; institutional privacy policies prevented access to individual examiner data. We have revised the abstract, results, and discussion to explicitly frame all quality-related findings as 'perceived' or 'self-reported' and have added a dedicated limitations subsection stating the absence of objective validation. This tempers the implications while preserving the value of documenting perceived patterns in an under-studied population.
  revision: partial
- Referee: [Results] Methods and Results on GAMER PAT preferences: The statistically significant preference for GAMER PAT (p<0.05) is presented as preliminary evidence that research-specialized scaffolding matters, yet no linkage is shown to differences in actual thesis quality, usage logs, or control comparisons with non-specialized tools, leaving the causal implication unsupported.
  Authors: The GAMER PAT preference results derive from within-respondent comparisons (exact binomial test) among students who used multiple tools, supplemented by qualitative interview themes on differentiated use. We present this only as preliminary evidence of preference, not as proof of causal impact on thesis quality. We have revised the results and conclusion sections to use more cautious phrasing ('suggestive of potential value' rather than 'evidence that specialized scaffolding matters') and have expanded the methods and limitations sections to note the lack of usage logs, control groups, or quality linkages, which were outside the survey's scope. No stronger causal claims are retained.
  revision: partial
Circularity Check
No circularity; all claims derive from survey data and standard statistical tests
The paper reports empirical results from a survey of 83 respondents (36.1% response rate), including adoption rates (95.2% some use, 77.1% heavy use), perceived benefits (e.g., 82.3% clearer structure), mean perceived quality improvement (6.27/7), and exact binomial tests on GAMER PAT preferences (p<0.05 for inquiry deepening and structural organization). No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the derivation chain. Claims about specialized scaffolding rest on direct respondent ratings and tests within this dataset, not on renaming, ansatz smuggling, or uniqueness theorems imported from prior author work. The analysis is self-contained: it relies on standard survey methods rather than on results imported from elsewhere.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Self-reported survey responses accurately reflect actual AI usage patterns and perceived quality changes.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "AI use was nearly universal: 95.2% reported at least some use... preferences significantly favored it for inquiry deepening and structural organization (both p < 0.05, exact binomial)."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.