Game of Tones: Faculty detection of GPT-4 generated content in university assessments

Mike Perkins (1), Jasper Roe (2), Darius Postma (1), James McGaughran (1), Don Hickerson (1) ((1) British University Vietnam, Vietnam; (2) James Cook University Singapore, Singapore)

arXiv: 2305.18081 · v1 · pith:4KYNH4YZ · submitted 2023-05-29 · cs.CY · cs.AI

Keywords: content, detection, academic, assessment, faculty, GPT-4, submissions, AI-generated

Abstract

This study explores the robustness of university assessments against the use of content generated by OpenAI's Generative Pre-trained Transformer 4 (GPT-4) and evaluates the ability of academic staff to detect its use when supported by the Turnitin Artificial Intelligence (AI) detection tool. Twenty-two GPT-4-generated submissions were created and included in the assessment process, where they were marked by fifteen different faculty members. Although the detection tool identified 91% of the experimental submissions as containing some AI-generated content, only 54.8% of the total content was detected as AI-generated. This suggests that adversarial prompt-engineering techniques are an effective way to evade AI detection tools and that AI detection software needs improvement. With the support of the Turnitin AI detection tool, faculty reported 54.5% of the experimental submissions to the academic misconduct process, suggesting a need for greater awareness of, and training on, these tools. Genuine submissions received a mean score of 54.4, whereas AI-generated submissions received a mean score of 52.3, indicating that GPT-4 performs comparably to genuine submissions in real-life assessment situations. Recommendations include adjusting assessment strategies to make them more resistant to the use of AI tools, using AI-inclusive assessment where possible, and providing comprehensive training programs for faculty and students. This research contributes to understanding the relationship between AI-generated content and academic assessment and calls for further investigation to preserve academic integrity.
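
Because the experiment used a small pool of twenty-two submissions, each reported percentage corresponds to a whole number of submissions. The Python sketch below recovers those counts. It assumes (this is not stated in the abstract) that both the 91% detection rate and the 54.5% reporting rate use all twenty-two experimental submissions as the denominator and are rounded to the precision shown; the helper `implied_counts` is a hypothetical name introduced for illustration.

```python
# Recover the whole-number counts implied by percentages reported over N submissions.
# ASSUMPTION: every percentage uses all 22 experimental submissions as its
# denominator and is rounded to the number of decimals shown in the abstract.

N_SUBMISSIONS = 22  # experimental GPT-4 submissions (from the abstract)


def implied_counts(percent: float, decimals: int, n: int = N_SUBMISSIONS) -> list[int]:
    """Hypothetical helper: every count k in 0..n whose share k/n rounds to `percent`."""
    return [k for k in range(n + 1) if round(100 * k / n, decimals) == percent]


# "identified 91% of the experimental submissions as containing some AI-generated content"
flagged = implied_counts(91.0, 0)
# "faculty reported 54.5% of the experimental submissions to the academic misconduct process"
reported = implied_counts(54.5, 1)
# Difference between the mean scores of genuine (54.4) and AI-generated (52.3) submissions
score_gap = round(54.4 - 52.3, 1)

print(f"flagged by detector: {flagged} of {N_SUBMISSIONS}")
print(f"reported by faculty: {reported} of {N_SUBMISSIONS}")
print(f"mean score gap: {score_gap} points")
```

Under that assumption, the only consistent counts are 20 of 22 submissions flagged by the detector (90.9%, reported as 91%) and 12 of 22 reported by faculty (54.5%). The 54.8% figure is deliberately not converted, because it measures the share of detected content rather than a count of submissions.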

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization
cs.LG · 2026-05 · unverdicted · novelty 7.0

DEPO formulates detector-evasive paraphrasing as a constrained MDP and solves it via Lagrangian primal-dual RL with GRPO-style updates to achieve evasion while satisfying a semantic-preservation constraint.

Fighting AI with AI: AI-Agent Augmented DNS Blocking of LLM Services during Student Evaluations
cs.NI · 2026-03 · unverdicted · novelty 6.0

AI-Sinkhole uses AI classification with quantized LLMs and Pi-hole DNS blocking to dynamically prevent access to LLM services during student evaluations, reporting F1 scores above 0.83.

GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization
cs.CL · 2024-10 · unverdicted · novelty 6.0

GigaCheck detects LLM-generated text at both document and span levels by combining fine-tuned language-model embeddings with a DETR-like architecture that treats generated intervals as detectable objects.