A Comparative Study of Student Perspectives on Technical Writing Feedback Quality: Evaluating LLMs, SLMs, and Humans in Computer Science Topics
1 Pith paper cites this work.
abstract
To address the scalability of feedback in computer science while mitigating the privacy and cost limitations of commercial Large Language Models (LLMs), this study evaluates a locally hosted Small Language Model (SLM). We deployed a quantized Llama-3.1, GPT-4, and human instructors across introductory programming (N=176), operating systems (N=80), and a writing seminar (N=7). Mixed-methods analysis of student perceptions reveals that while the local SLM matched commercial LLMs and was rated higher by students for readability and actionability in technical courses, human feedback remained more favoured for highly specialized writing tasks. We demonstrate that local SLMs offer a privacy-preserving, zero-marginal-cost alternative for foundational feedback, supporting a tiered pedagogical framework where AI handles structural guidance while instructors focus on high-level conceptual scaffolding.
fields
cs.CL 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
FOXGLOVE: Understanding Goal-Oriented and Anchored Writing Feedback from Experts and LLMs on Argumentative Essays
The FOXGLOVE dataset of 2340 comments shows that LLMs and instructors align on feedback goals and positions but diverge on sentence selection; LLMs use more complex language and ask fewer questions, and higher quality ratings are driven by comment length.