'A bit of chaos and madness': The AI Assessment Scale and the work of assessment reform

arxiv: 2606.26729 · v2 · pith:E4GG3U7Cnew · submitted 2026-06-25 · 💻 cs.HC

Mike Perkins (1), Darius Postma (1), Jasper Roe (2), Susan Sisay (3), Craig Holdcroft (3)

(1) British University Vietnam, Vietnam; (2) Durham University, United Kingdom; (3) University of Staffordshire, United Kingdom

Pith reviewed 2026-06-29 05:21 UTC · model grok-4.3

classification 💻 cs.HC

keywords AI Assessment Scale · generative AI · assessment reform · academic staff · qualitative study · Critical AI Literacy · university implementation · focus groups

The pith

The AI Assessment Scale gives staff a shared language for GenAI use but risks becoming compliance without ties to learning outcomes and context.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies how academic staff at one private university in Vietnam and one public university in the UK experienced rolling out the Artificial Intelligence Assessment Scale under pressure from generative AI. Five focus groups with thirty staff yielded six themes on integration, capacity, ethics, and pedagogy. Staff saw the scale as useful for legitimising AI boundaries and prompting assessment redesign. Yet the same data show that governance, workload, disciplinary fit, and outcome alignment decide whether the scale drives real change or just adds a procedural layer.

Core claim

The AIAS supplies a common vocabulary that legitimises certain GenAI practices and prompts reflection on assessment design, yet its movement from policy to classroom practice depends on governance structures, tool access, staff confidence, workload, integrity concerns, disciplinary context, and explicit alignment with learning outcomes; when those conditions are absent the scale tends to function as a compliance mechanism rather than a catalyst for authentic assessment.

What carries the argument

The Artificial Intelligence Assessment Scale (AIAS), a structured framework for categorising levels of AI use in student work, which staff treat as a legitimising device and reflection prompt.

If this is right

The scale supplies staff with a shared language that legitimises GenAI use and clarifies boundaries.
Implementation success depends on governance, tool access, staff confidence, and workload.
When aligned with learning outcomes the scale can encourage authentic assessment design and student engagement.
Without that alignment and without disciplinary fit the scale tends to operate as a compliance layer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar scales may require embedded professional development to prevent adoption from remaining superficial.
Student perspectives on the same scale would likely reveal additional barriers or enablers not captured in staff-only data.
Comparative trials across more varied institutional types could test whether the reported conditions are general or context-specific.

Load-bearing premise

The hybrid thematic analysis of five focus groups with thirty staff, guided by Critical AI Literacy, sufficiently identifies the institutional conditions that decide whether the AIAS produces pedagogical change or mere compliance.

What would settle it

A follow-up study that measures changes in assessment authenticity and student engagement after AIAS adoption while holding constant the degree of alignment with learning outcomes and disciplinary context.

Original abstract

Generative artificial intelligence (GenAI) has intensified pressure on universities to redesign assessment while maintaining integrity, equity, and validity. Structured frameworks such as the Artificial Intelligence Assessment Scale (AIAS) offer one response, but evidence of how staff experience their implementation remains limited. This qualitative study examines AIAS implementation at a private international university in Vietnam and a public university in the United Kingdom. Data from five focus groups with 30 academic staff were analysed using hybrid thematic analysis, with Critical AI Literacy used as a sensitising concept. Six themes were developed: recognising and integrating AI, facilitating conditions, building capacity, pathways to adoption, ethics in practice, and reframing pedagogy. Staff valued the AIAS as a shared language for legitimising GenAI use, clarifying boundaries, and prompting reflection on assessment design. However, implementation was shaped by governance, tool access, staff confidence, workload, integrity concerns, disciplinary context, and alignment with learning outcomes. The findings show that the AIAS could prompt authentic assessment design and student engagement, but may become a compliance layer when disconnected from learning outcomes, disciplinary context, and staff capacity. This study contributes empirical evidence on the institutional conditions through which GenAI assessment frameworks move from policy adoption to pedagogical enactment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper supplies original focus-group data from two universities on AIAS rollout but the analysis stays at the level of staff perceptions without linking them to observable practice changes.

The punchline is that this paper brings some new focus-group evidence from two different universities on how the AI Assessment Scale lands with academic staff, but the evidence doesn't really pin down whether the scale drives better assessment or just adds another administrative hoop.

What stands out as new is the collection of data from a private international university in Vietnam and a public university in the UK. That's not something that's been done much for this particular framework. The paper does a reasonable job of pulling out six themes from the five focus groups with 30 staff – things like recognising AI, facilitating conditions, building capacity, adoption pathways, ethics, and reframing pedagogy. Staff apparently see value in having a shared language for AI use, and the authors tie this to Critical AI Literacy ideas.

Where it gets soft is in the methods. The description mentions hybrid thematic analysis but gives no details on coding, agreement between coders, or how researcher bias was handled. More importantly, everything rests on what staff said in groups. There's no look at actual assessment documents, student work, or changes in practice before and after. So when the paper says the scale could prompt authentic design but might turn into compliance when disconnected from outcomes and context, that's an inference from perceptions, not a tested link. The stress-test note captures this accurately.

For readers, this is probably most relevant to people in university teaching and learning centers or those writing local GenAI policies. It might give them some talking points on what staff worry about. Serious researchers in AI and education might skim it for the themes but won't find much that challenges existing literature.

I'd recommend sending it to peer review. The data is original enough that referees could help strengthen the methods reporting and see if the claims can be tightened.

Referee Report

2 major / 0 minor

Summary. The paper reports a qualitative study of staff experiences implementing the Artificial Intelligence Assessment Scale (AIAS) at a private university in Vietnam and a public university in the UK. Five focus groups with 30 academic staff were analysed via hybrid thematic analysis, using Critical AI Literacy as a sensitising concept. Six themes emerged (recognising and integrating AI, facilitating conditions, building capacity, pathways to adoption, ethics in practice, reframing pedagogy). The central claim is that AIAS can serve as a shared language that legitimises GenAI use and prompts authentic assessment design and student engagement, but risks becoming a compliance layer when disconnected from learning outcomes, disciplinary context, and staff capacity. The study positions itself as providing empirical evidence on the institutional conditions that determine whether such frameworks move from policy to effective pedagogical practice.

Significance. If the findings hold after methodological strengthening, the paper supplies useful empirical data on real-world enactment of GenAI assessment frameworks. The themes on governance, workload, disciplinary variation, and alignment with outcomes offer concrete levers that institutions could use when designing implementation support. This is a timely contribution to the assessment-reform literature in higher education.

major comments (2)

[Methods] Methods (hybrid thematic analysis description): The manuscript supplies no information on coding procedures, inter-coder agreement, member checking, or handling of researcher positionality. Because the six themes and the central distinction between 'prompting authentic assessment' and 'compliance layer' rest entirely on this analysis, the absence of these standard qualitative safeguards is load-bearing for the trustworthiness of the reported findings.
[Findings / Discussion] Findings and Discussion: The study relies exclusively on focus-group self-reports. No direct observation of assessment artefacts, student work, or policy documents is described. Consequently, the claim that specific institutional conditions determine whether AIAS produces authentic design versus compliance remains an inference from perceptions rather than an evidenced mechanism linking reported conditions to observable practice.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below, indicating where revisions will be made to improve transparency and precision while preserving the study's qualitative focus on staff perceptions.

Referee: [Methods] Methods (hybrid thematic analysis description): The manuscript supplies no information on coding procedures, inter-coder agreement, member checking, or handling of researcher positionality. Because the six themes and the central distinction between 'prompting authentic assessment' and 'compliance layer' rest entirely on this analysis, the absence of these standard qualitative safeguards is load-bearing for the trustworthiness of the reported findings.

Authors: We agree that the Methods section requires expanded detail on analytical procedures. The hybrid thematic analysis combined deductive elements from Critical AI Literacy with inductive development. In revision we will add a subsection specifying: independent coding of an initial transcript subset by two researchers, iterative codebook refinement through team discussion, percentage agreement calculated on a 20% sample of transcripts, rationale for not conducting formal member checking (logistical constraints with international sites), and researcher positionality (team expertise in education technology and AI ethics, with reflexive notes on interpretive influences). These additions will directly support the trustworthiness of the six themes. revision: yes
Referee: [Findings / Discussion] Findings and Discussion: The study relies exclusively on focus-group self-reports. No direct observation of assessment artefacts, student work, or policy documents is described. Consequently, the claim that specific institutional conditions determine whether AIAS produces authentic design versus compliance remains an inference from perceptions rather than an evidenced mechanism linking reported conditions to observable practice.

Authors: The study design prioritises focus-group data to capture staff-reported experiences of AIAS implementation, which is appropriate for surfacing the perceptual conditions that shape policy-to-practice translation. We do not claim direct observational mechanisms. In the revised Discussion we will explicitly frame all claims as grounded in self-reported perceptions, note the absence of artefact or policy-document analysis as a scope limitation, and clarify that the authentic-design versus compliance distinction reflects staff accounts rather than verified practice outcomes. This increases transparency without altering the empirical contribution. revision: partial
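
The percentage-agreement check the authors propose for a 20% transcript sample can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `percent_agreement`, the one-code-per-segment layout, and the example labels are hypothetical and do not come from the paper or its data.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share of segments (in percent) that two coders labelled identically."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must label the same segments.")
    if not coder_a:
        raise ValueError("At least one coded segment is required.")
    # Count positions where both coders assigned the same code.
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical labels for five transcript segments (not data from the paper).
coder_a = ["capacity", "ethics", "pedagogy", "ethics", "capacity"]
coder_b = ["capacity", "ethics", "pedagogy", "capacity", "capacity"]
agreement = percent_agreement(coder_a, coder_b)  # 4 of 5 segments match
```

Simple percentage agreement does not correct for agreement that would occur by chance, so a chance-corrected statistic such as Cohen's kappa is often reported alongside it; the rebuttal itself commits only to percentage agreement.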

Circularity Check

0 steps flagged

No circularity: standard qualitative empirical study with no derivations or self-referential logic

This paper reports a qualitative study based on five focus groups (n=30) analyzed via hybrid thematic analysis with Critical AI Literacy as a sensitising concept. Six themes are developed from staff perceptions regarding AIAS implementation. No equations, models, predictions, fitted parameters, or derivations appear anywhere in the text. No self-citations are invoked as load-bearing premises, and no claims reduce by construction to the study's own inputs. The findings are presented as direct outputs of the thematic analysis of participant data, making the work self-contained against external benchmarks with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a qualitative empirical study in education research. It relies on standard social-science methods (focus groups, hybrid thematic analysis) and the established sensitising concept of Critical AI Literacy. No free parameters, mathematical axioms, or new invented entities are introduced or fitted.
