
arxiv: 2604.02792 · v1 · submitted 2026-04-03 · 💻 cs.CY · cs.HC

Generative AI Use in Professional Graduate Thesis Writing: Adoption, Perceived Outcomes, and the Role of a Research-Specialized Agent

Pith reviewed 2026-05-13 19:00 UTC · model grok-4.3

classification 💻 cs.CY cs.HC
keywords generative AI · thesis writing · MBA students · research agents · adoption · perceived quality · AI scaffolding · verification skills

The pith

Nearly all MBA thesis students use generative AI and report major gains in structure and quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents results from a post-thesis survey of 83 Japanese MBA students showing that generative AI has become a standard part of the research-writing process for almost everyone. Students describe AI as helpful for organizing arguments, revising drafts, and speeding up work, with an average perceived quality lift of 6.27 out of 7. They still worry about factual accuracy and proper citations. A specialized research agent called GAMER PAT was rated higher than general tools for deepening questions and building structure. The finding reframes the practical problem from adoption to teaching students how to check and govern AI outputs.

Core claim

In the survey, 95.2 percent of students reported at least some generative AI use and 77.1 percent reported heavy use. Students used AI across the full research-writing workflow: literature review, drafting, and consultation when stuck. Reported benefits included clearer argument and structure (82.3 percent), better revision quality (73.4 percent), and faster writing (70.9 percent). Among respondents who rated GAMER PAT, a research-specialized agent, against other AI tools, preferences significantly favored it for inquiry deepening and structural organization (both p < 0.05, exact binomial).

What carries the argument

Post-thesis survey of AI adoption rates and direct preference ratings for a research-specialized agent (GAMER PAT) versus general tools across the full thesis workflow.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If specialized agents continue to outperform general ones, programs could develop or license custom research scaffolds for graduate work.
  • Objective metrics such as examiner scores or citation error rates would be needed to confirm whether perceived quality gains are real.
  • The same workflow integration and verification challenges are likely to appear in other professional master's programs beyond business.

Load-bearing premise

Students' self-reported feelings of quality improvement and tool preference accurately reflect real gains in thesis quality without any independent checks or comparison to non-AI writing.

What would settle it

A controlled comparison in which examiners score thesis quality, argument strength, and citation accuracy for matched AI-assisted versus non-AI-assisted submissions from the same program.

Figures

Figures reproduced from arXiv: 2604.02792 by Hiroshi Kanno, Kenji Saito, Rei Tajika, Satoru Shibuya.

Figure 1: AI tools used, among AI users (n = 79). [...] such as DeepL or Grammarly (19.0%), and search-summarization tools such as Perplexity (13.9%)
Figure 2: Phases of AI use in the thesis workflow, among AI users (...
Figure 3: Perceived quality improvement (7-point scale), among AI u...
Figure 4: Perceived benefits and concerns among AI users (...
Figure 5: GAMER PAT vs. other AI: respondent ratings on overall p...
Original abstract

This paper reports a survey of generative AI use among 83 MBA thesis students in Japan (target population 230; 36.1% response rate), conducted after thesis examiner evaluation. AI use was nearly universal: 95.2% reported at least some use and 77.1% heavy use. Students engaged AI across the full research-writing workflow - literature review, drafting, and consultation when stuck - reporting benefits centered on clearer argument and structure (82.3%), better revision quality (73.4%), and faster writing (70.9%), with a mean perceived quality improvement of 6.27 out of 7. Concerns about output accuracy (75.9%) and citation handling persisted alongside these gains. Among respondents who rated GAMER PAT, a research-specialized agent, against other AI, preferences significantly favored it for inquiry deepening and structural organization (both p < 0.05, exact binomial). A preliminary qualitative analysis of follow-up interviews further reveals active epistemic vigilance strategies and differentiated tool use across thesis phases. The central implication is not adoption itself but a shift in the educational challenge toward verification, source governance, and AI tool design - with GAMER PAT offering preliminary evidence that research-specialized scaffolding matters.
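
The percentages in the abstract can be sanity-checked against the reported sample sizes. The sketch below is illustrative only: raw counts are not given in the text above, so they are reconstructed under the assumption that each percentage is a count divided by n and rounded to one decimal, with n = 83 for the adoption figures and n = 79 (the AI-user base stated in the Figure 1 caption) for benefits and concerns. The helper name `implied_count` is hypothetical.

```python
def implied_count(percent: float, n: int) -> list[int]:
    """Return every integer count k whose share k/n rounds to `percent` (one decimal)."""
    return [k for k in range(n + 1) if round(100 * k / n, 1) == percent]


RESPONDENTS = 83  # reported in the abstract
TARGET = 230      # reported in the abstract
AI_USERS = 79     # reported as n in the Figure 1 caption

# Response rate: 83 / 230
print(round(100 * RESPONDENTS / TARGET, 1))  # 36.1

# Adoption figures, ASSUMING the denominator is all 83 respondents
print(implied_count(95.2, RESPONDENTS))  # [79]
print(implied_count(77.1, RESPONDENTS))  # [64]

# Benefits and concerns, ASSUMING the denominator is the 79 AI users
for pct in (82.3, 73.4, 70.9, 75.9):
    print(pct, implied_count(pct, AI_USERS))
```

Under these assumptions each percentage maps to exactly one integer count (for example, 79 of 83 for 95.2%), which suggests the reported figures are internally consistent. It does not confirm which denominators the authors actually used.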

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. This paper surveys 83 MBA thesis students in Japan (36.1% response rate from 230) on generative AI use in professional thesis writing after examiner evaluation. It reports near-universal adoption (95.2% at least some use, 77.1% heavy use), perceived benefits in argument structure (82.3%), revision quality (73.4%), writing speed (70.9%), and overall quality (mean 6.27/7), alongside concerns about accuracy (75.9%) and citations. Quantitative analysis shows statistically significant preference for the research-specialized agent GAMER PAT over other AI for inquiry deepening and structural organization (both p<0.05, exact binomial), supported by qualitative interviews revealing epistemic vigilance strategies. The central claim is that the key educational challenge has shifted to verification and source governance, with GAMER PAT providing preliminary evidence that specialized scaffolding matters.

Significance. If the results hold under objective validation, this study supplies timely empirical data on generative AI adoption patterns and perceived impacts in graduate professional writing, particularly in a Japanese MBA context. The documentation of differentiated tool use, active vigilance strategies, and statistically tested preferences for specialized agents like GAMER PAT offers a concrete basis for curriculum design and AI tool development focused on critical evaluation skills rather than blanket prohibition.

major comments (2)
  1. [Results] Results section on perceived outcomes: The headline claims of quality improvement (mean 6.27/7) and structural benefits (82.3%) rest solely on unvalidated self-reports collected after thesis evaluation, with no reported correlations to objective metrics such as examiner scores, final grades, or blinded quality rubrics. This directly weakens support for the stronger implications about actual outcomes and the value of specialized scaffolding.
  2. [Results] Methods and Results on GAMER PAT preferences: The statistically significant preference for GAMER PAT (p<0.05) is presented as preliminary evidence that research-specialized scaffolding matters, yet no linkage is shown to differences in actual thesis quality, usage logs, or control comparisons with non-specialized tools, leaving the causal implication unsupported.
minor comments (3)
  1. [Methods] The 36.1% response rate is acknowledged but not accompanied by non-response bias analysis or demographic comparison to the full target population of 230 students.
  2. [Results] Clarify the exact sample sizes and respondent subsets for each binomial test on GAMER PAT preferences, as the abstract does not specify how many rated the specialized agent.
  3. [Methods] The qualitative interview analysis is described as preliminary; adding a brief methods subsection on coding procedures and inter-rater reliability would strengthen transparency.
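
To make minor comment 2 concrete, here is a minimal sketch of a two-sided exact binomial test against a null of no preference (p = 0.5). The counts used (15 of 20 raters favoring one tool) are purely hypothetical; the subset sizes for the GAMER PAT comparisons are not stated in the text above, which is exactly what the comment asks the authors to supply. The function name is also an assumption, and the paper may have used a different tail convention.

```python
from math import comb


def exact_binomial_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Two-sided exact binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to float rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# HYPOTHETICAL counts, not taken from the paper.
p_value = exact_binomial_two_sided(k=15, n=20)
print(f"{p_value:.4f}")  # 0.0414
```

The example shows why the subset size matters: with the same 75% share, 15 of 20 falls below 0.05, while a smaller subset such as 6 of 8 would not.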

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and limitations of our exploratory survey. Our study focuses on self-reported adoption patterns, perceived outcomes, and tool preferences among Japanese MBA students, rather than objective measures of thesis quality. We address each major comment below by revising the manuscript to better align claims with the data collected.

Point-by-point responses
  1. Referee: [Results] Results section on perceived outcomes: The headline claims of quality improvement (mean 6.27/7) and structural benefits (82.3%) rest solely on unvalidated self-reports collected after thesis evaluation, with no reported correlations to objective metrics such as examiner scores, final grades, or blinded quality rubrics. This directly weakens support for the stronger implications about actual outcomes and the value of specialized scaffolding.

    Authors: We agree that the reported benefits are based exclusively on self-reports and that no objective metrics (e.g., examiner scores or grades) were collected or correlated. The study design was a post-evaluation survey capturing student perceptions in a real-world professional thesis context; institutional privacy policies prevented access to individual examiner data. We have revised the abstract, results, and discussion to explicitly frame all quality-related findings as 'perceived' or 'self-reported' and have added a dedicated limitations subsection stating the absence of objective validation. This tempers the implications while preserving the value of documenting perceived patterns in an under-studied population.
    revision: partial

  2. Referee: [Results] Methods and Results on GAMER PAT preferences: The statistically significant preference for GAMER PAT (p<0.05) is presented as preliminary evidence that research-specialized scaffolding matters, yet no linkage is shown to differences in actual thesis quality, usage logs, or control comparisons with non-specialized tools, leaving the causal implication unsupported.

    Authors: The GAMER PAT preference results derive from within-respondent comparisons (exact binomial test) among students who used multiple tools, supplemented by qualitative interview themes on differentiated use. We present this only as preliminary evidence of preference, not as proof of causal impact on thesis quality. We have revised the results and conclusion sections to use more cautious phrasing ('suggestive of potential value' rather than 'evidence that specialized scaffolding matters') and have expanded the methods and limitations sections to note the lack of usage logs, control groups, or quality linkages, which were outside the survey's scope. No stronger causal claims are retained.
    revision: partial

Circularity Check

0 steps flagged

No circularity; all claims derive from survey data and standard statistical tests

Full rationale

The paper reports empirical results from a survey of 83 respondents (36.1% response rate), including adoption rates (95.2% some use, 77.1% heavy use), perceived benefits (e.g., 82.3% clearer structure), mean quality improvement (6.27/7), and exact binomial tests on GAMER PAT preferences (p<0.05 for inquiry and organization). No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the derivation chain. Claims about specialized scaffolding rest on direct respondent ratings and tests within this dataset, not on renaming, ansatz smuggling, or uniqueness theorems imported from prior author work. The analysis is self-contained against external benchmarks via standard survey methods.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard survey research assumptions about response accuracy and sample representativeness rather than new axioms or invented entities.

axioms (1)
  • domain assumption: Self-reported survey responses accurately reflect actual AI usage patterns and perceived quality changes.
    This assumption underpins the reported 95.2% adoption rate, 82.3% clarity benefit, and mean quality score of 6.27.
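
The representativeness side of this assumption can be illustrated with a worst-case bound on non-response. The sketch below is not an analysis from the paper: it assumes 79 AI users among the 83 respondents (inferred from the reported 95.2%) and asks how far the adoption rate in the full target population of 230 could move if all 147 non-respondents were non-users or all were users. The function name is hypothetical.

```python
def nonresponse_bounds(users: int, respondents: int, target: int) -> tuple[float, float]:
    """Worst-case bounds on a population share when non-respondents are unobserved."""
    nonrespondents = target - respondents
    low = users / target                      # every non-respondent is a non-user
    high = (users + nonrespondents) / target  # every non-respondent is a user
    return low, high


# 79 users is INFERRED from 95.2% of 83; 83 and 230 are reported.
low, high = nonresponse_bounds(users=79, respondents=83, target=230)
print(f"{low:.1%} to {high:.1%}")  # 34.3% to 98.3%
```

The interval is deliberately extreme. Its width shows why a comparison of respondents against the full cohort would be needed before generalizing the adoption rate beyond the sample.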

pith-pipeline@v0.9.0 · 5537 in / 1403 out tokens · 79376 ms · 2026-05-13T19:00:30.475403+00:00 · methodology

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

  1. [1]

    Large language models reflect human citation patterns with a heightened citation bias

    Andres Algaba, Carmen Mazijn, Vincent Holst, Floriano Tori, Sylvia Wenmackers, and Vincent Ginis. Large language models reflect human citation patterns with a heightened citation bias. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 6844–6879, Albuquerque, New Mexico, 2025. Association for Computational Linguistics

  2. [2]

    Debby R. E. Cotton, Peter A. Cotton, and James R. Shipway. Chatting and cheating: Ensuring academic integrity in the era of ChatGPT. Innovations in Education and Teaching International, 61(2):228–239, 2024

  3. [3]

    ChatGPT for good? On opportunities and challenges of large language models for education

    Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, George Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller...

  4. [4]

    Students–generative AI interaction patterns and its impact on academic writing

    Jinhee Kim, Sang-Soog Lee, Rita Detrick, Jingyi Wang, and Na Li. Students–generative AI interaction patterns and its impact on academic writing. Journal of Computing in Higher Education, 38:504–525, 2026

  5. [5]

    SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models

    Potsawee Manakul, Adian Liusie, and Mark Gales. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9004–9017, Singapore, 2023. Association for Computational Linguistics

  6. [6]

    GAMER PAT: Research as a serious game, 2025

    Kenji Saito and Rei Tadika. GAMER PAT: Research as a serious game, 2025. arXiv:2510.21719 [cs.HC]. https://arxiv.org/abs/2510.21719.