'A bit of chaos and madness': The AI Assessment Scale and the work of assessment reform
Pith reviewed 2026-06-29 05:21 UTC · model grok-4.3
The pith
The AI Assessment Scale gives staff a shared language for GenAI use but risks becoming compliance without ties to learning outcomes and context.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The AIAS supplies a common vocabulary that legitimises certain GenAI practices and prompts reflection on assessment design, yet its movement from policy to classroom practice depends on governance structures, tool access, staff confidence, workload, integrity concerns, disciplinary context, and explicit alignment with learning outcomes; when those conditions are absent the scale tends to function as a compliance mechanism rather than a catalyst for authentic assessment.
What carries the argument
The Artificial Intelligence Assessment Scale (AIAS), a structured framework for categorising levels of AI use in student work, which staff treat as a legitimising device and reflection prompt.
If this is right
- The scale supplies staff with a shared language that legitimises GenAI use and clarifies boundaries.
- Implementation success depends on governance, tool access, staff confidence, and workload.
- When aligned with learning outcomes the scale can encourage authentic assessment design and student engagement.
- Without that alignment and without disciplinary fit the scale tends to operate as a compliance layer.
Where Pith is reading between the lines
- Similar scales may require embedded professional development to prevent adoption from remaining superficial.
- Student perspectives on the same scale would likely reveal additional barriers or enablers not captured in staff-only data.
- Comparative trials across more varied institutional types could test whether the reported conditions are general or context-specific.
Load-bearing premise
The hybrid thematic analysis of five focus groups with thirty staff, guided by Critical AI Literacy, sufficiently identifies the institutional conditions that decide whether the AIAS produces pedagogical change or mere compliance.
What would settle it
A follow-up study that measures changes in assessment authenticity and student engagement after AIAS adoption while holding constant the degree of alignment with learning outcomes and disciplinary context.
Original abstract
Generative artificial intelligence (GenAI) has intensified pressure on universities to redesign assessment while maintaining integrity, equity, and validity. Structured frameworks such as the Artificial Intelligence Assessment Scale (AIAS) offer one response, but evidence of how staff experience their implementation remains limited. This qualitative study examines AIAS implementation at a private international university in Vietnam and a public university in the United Kingdom. Data from five focus groups with 30 academic staff were analysed using hybrid thematic analysis, with Critical AI Literacy used as a sensitising concept. Six themes were developed: recognising and integrating AI, facilitating conditions, building capacity, pathways to adoption, ethics in practice, and reframing pedagogy. Staff valued the AIAS as a shared language for legitimising GenAI use, clarifying boundaries, and prompting reflection on assessment design. However, implementation was shaped by governance, tool access, staff confidence, workload, integrity concerns, disciplinary context, and alignment with learning outcomes. The findings show that the AIAS could prompt authentic assessment design and student engagement, but may become a compliance layer when disconnected from learning outcomes, disciplinary context, and staff capacity. This study contributes empirical evidence on the institutional conditions through which GenAI assessment frameworks move from policy adoption to pedagogical enactment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a qualitative study of staff experiences implementing the Artificial Intelligence Assessment Scale (AIAS) at a private university in Vietnam and a public university in the UK. Five focus groups with 30 academic staff were analysed via hybrid thematic analysis, using Critical AI Literacy as a sensitising concept. Six themes emerged (recognising and integrating AI, facilitating conditions, building capacity, pathways to adoption, ethics in practice, reframing pedagogy). The central claim is that AIAS can serve as a shared language that legitimises GenAI use and prompts authentic assessment design and student engagement, but risks becoming a compliance layer when disconnected from learning outcomes, disciplinary context, and staff capacity. The study positions itself as providing empirical evidence on the institutional conditions that determine whether such frameworks move from policy to effective pedagogical practice.
Significance. If the findings hold after methodological strengthening, the paper supplies useful empirical data on real-world enactment of GenAI assessment frameworks. The themes on governance, workload, disciplinary variation, and alignment with outcomes offer concrete levers that institutions could use when designing implementation support. This is a timely contribution to the assessment-reform literature in higher education.
major comments (2)
- [Methods] Methods (hybrid thematic analysis description): The manuscript supplies no information on coding procedures, inter-coder agreement, member checking, or handling of researcher positionality. Because the six themes and the central distinction between 'prompting authentic assessment' and 'compliance layer' rest entirely on this analysis, the absence of these standard qualitative safeguards is load-bearing for the trustworthiness of the reported findings.
- [Findings / Discussion] Findings and Discussion: The study relies exclusively on focus-group self-reports. No direct observation of assessment artefacts, student work, or policy documents is described. Consequently, the claim that specific institutional conditions determine whether AIAS produces authentic design versus compliance remains an inference from perceptions rather than an evidenced mechanism linking reported conditions to observable practice.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below, indicating where revisions will be made to improve transparency and precision while preserving the study's qualitative focus on staff perceptions.
- Referee: [Methods] Methods (hybrid thematic analysis description): The manuscript supplies no information on coding procedures, inter-coder agreement, member checking, or handling of researcher positionality. Because the six themes and the central distinction between 'prompting authentic assessment' and 'compliance layer' rest entirely on this analysis, the absence of these standard qualitative safeguards is load-bearing for the trustworthiness of the reported findings.
  Authors: We agree that the Methods section requires expanded detail on analytical procedures. The hybrid thematic analysis combined deductive elements from Critical AI Literacy with inductive development. In revision we will add a subsection specifying: independent coding of an initial transcript subset by two researchers, iterative codebook refinement through team discussion, percentage agreement calculated on a 20% sample of transcripts, rationale for not conducting formal member checking (logistical constraints with international sites), and researcher positionality (team expertise in education technology and AI ethics, with reflexive notes on interpretive influences). These additions will directly support the trustworthiness of the six themes.
  Revision: yes
- Referee: [Findings / Discussion] Findings and Discussion: The study relies exclusively on focus-group self-reports. No direct observation of assessment artefacts, student work, or policy documents is described. Consequently, the claim that specific institutional conditions determine whether AIAS produces authentic design versus compliance remains an inference from perceptions rather than an evidenced mechanism linking reported conditions to observable practice.
  Authors: The study design prioritises focus-group data to capture staff-reported experiences of AIAS implementation, which is appropriate for surfacing the perceptual conditions that shape policy-to-practice translation. We do not claim direct observational mechanisms. In the revised Discussion we will explicitly frame all claims as grounded in self-reported perceptions, note the absence of artefact or policy-document analysis as a scope limitation, and clarify that the authentic-design versus compliance distinction reflects staff accounts rather than verified practice outcomes. This increases transparency without altering the empirical contribution.
  Revision: partial
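The percentage agreement that the authors propose in their response to the Methods comment can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the random sampling step, the focus-group identifiers, and the example codes are all hypothetical, and only the theme labels and the 20% figure come from the text above.

```python
import math
import random


def sample_transcripts(transcript_ids, fraction=0.2, seed=0):
    """Pick a reproducible random subset of transcripts for double coding."""
    k = max(1, math.ceil(len(transcript_ids) * fraction))
    return sorted(random.Random(seed).sample(list(transcript_ids), k))


def percent_agreement(codes_a, codes_b):
    """Share of segments (0-100) to which both coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same segments")
    if not codes_a:
        raise ValueError("No segments to compare")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical identifiers for the five focus-group transcripts.
double_coded = sample_transcripts(["FG1", "FG2", "FG3", "FG4", "FG5"])

# Hypothetical codes for five segments, labelled with theme names from the paper.
coder_1 = ["facilitating conditions", "building capacity", "ethics in practice",
           "reframing pedagogy", "pathways to adoption"]
coder_2 = ["facilitating conditions", "building capacity", "ethics in practice",
           "building capacity", "pathways to adoption"]

agreement = percent_agreement(coder_1, coder_2)  # 4 of 5 segments match
```

With five transcripts, a 20% sample is a single transcript, so the agreement figure would rest on one focus group. Percentage agreement also does not correct for agreement expected by chance; Cohen's kappa is a common chance-corrected alternative.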
Circularity Check
No circularity: standard qualitative empirical study with no derivations or self-referential logic
This paper reports a qualitative study based on five focus groups (n=30) analysed via hybrid thematic analysis with Critical AI Literacy as a sensitising concept. Six themes are developed from staff perceptions regarding AIAS implementation. No equations, models, predictions, fitted parameters, or derivations appear anywhere in the text. No self-citations are invoked as load-bearing premises, and no claims reduce by construction to the study's own inputs. The findings are presented as direct outputs of the thematic analysis of participant data, making the work self-contained against external benchmarks, with no circular steps.