AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations
Pith reviewed 2026-05-08 16:37 UTC · model grok-4.3
The pith
AISSA combines LLMs with dashboards to give students scalable rubric feedback on presentation slides.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a system combining large language model analysis of slide content and features with Learning Analytics dashboards can deliver reliable, rubric-aligned formative feedback at scale, as demonstrated by technical stability, low deployment cost, and positive student perceptions of usefulness for iterative improvement in a real classroom pilot.
What carries the argument
AISSA, the web-based platform that ingests slide decks, extracts slide-level and content features, prompts ChatGPT 5.2 to produce structured rubric scores and qualitative comments, and renders the output through interactive student and teacher dashboards.
If this is right
- Instructors in large classes can supply detailed pre-presentation feedback without a linear increase in their own review time.
- Students can iterate on slides multiple times using immediate, structured suggestions rather than waiting for a single teacher pass.
- Dashboards allow both students and teachers to monitor patterns in slide quality across submissions or over a course term.
- The same LLM-plus-dashboard pattern could extend to other rubric-driven student artifacts such as reports or posters.
Where Pith is reading between the lines
- If LLM outputs prove consistent with expert judgment over time, departments could reduce routine slide-review workload while preserving or improving feedback volume.
- Integration with video recordings of the actual presentations could create closed-loop analysis that links slide design directly to delivery outcomes.
- Similar architectures might support peer-review workflows in which student reviewers receive AI-augmented guidance on what to look for.
Load-bearing premise
The LLM generates accurate, unbiased, and rubric-aligned feedback that students can act on without further human validation or expert comparison.
What would settle it
A side-by-side study in which human experts review the same student slides and produce feedback that differs markedly from the LLM outputs in accuracy, actionability, or alignment with the rubric.
Original abstract
Providing timely and actionable feedback on oral presentation slides is challenging in higher education, particularly in large classes where teachers cannot realistically deliver detailed formative feedback before students present. This paper introduces AISSA (AI-based Student Slides Analysis tool), a web-based system that combines large language models (LLMs) and Learning Analytics dashboards to support scalable, rubric-based feedback on presentation slides. AISSA allows students to upload their slide decks prior to an oral presentation and automatically receive quantitative scores and qualitative feedback based on teacher-defined evaluation rubrics. The system analyzes both slide-level features and slide content, generates structured feedback through an LLM (ChatGPT 5.2), and presents the results through interactive dashboards for students and teachers. We tested AISSA on a pilot deployment with 46 undergraduate students in a real academic setting. The results indicate that AISSA is technically reliable, economically feasible, and perceived by students as useful for iterative slide improvement. These findings suggest that combining LLM-based analysis with Learning Analytics dashboards is a promising approach for supporting formative feedback on presentation slides at scale.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AISSA, a web-based system that integrates large language models (specifically ChatGPT 5.2) with learning analytics dashboards to deliver rubric-based quantitative scores and qualitative feedback on student presentation slides. Students upload slide decks for automated analysis of both visual features and content; results are presented via interactive dashboards. A pilot deployment with 46 undergraduate students in a real academic setting is described, with the authors concluding that the system is technically reliable, economically feasible, and perceived as useful for iterative slide improvement.
Significance. If the central claims were supported by rigorous evidence, the work would represent a practical contribution to scalable formative assessment in HCI and educational technology, demonstrating how LLMs can be combined with dashboards for large-class presentation feedback. The pilot framing and focus on deployment costs and student perceptions are relevant to the field, but the absence of objective validation metrics substantially limits the current significance.
major comments (3)
- [Abstract / Pilot evaluation] Abstract and results description: the claim that 'AISSA is technically reliable' rests solely on system uptime and API cost figures from the n=46 pilot; no error rates, accuracy metrics, baseline comparisons (e.g., vs. human raters), or statistical tests are reported, leaving the reliability assertion unsupported.
- [Evaluation / Results] The weakest assumption—that the LLM produces rubric-aligned, accurate, and actionable feedback—is never tested. No section compares ChatGPT 5.2 outputs against expert human ratings on the same rubrics, measures inter-rater agreement (e.g., Cohen’s kappa), or tracks objective pre/post improvements in slide quality.
- [Pilot study results] Perceived usefulness is reported only via self-reported student perceptions through the dashboard; without controls for novelty effects or placebo, this cannot distinguish genuine feedback value from other factors, undermining the claim that the system supports 'iterative slide improvement'.
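To make the requested agreement check concrete, here is a minimal sketch of unweighted Cohen's kappa between LLM scores and expert scores on the same slide decks. The function name and the sample scores are hypothetical and are not taken from the paper, which reports no such comparison.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1 to 4) for eight slide decks; illustrative only.
llm_scores = [3, 4, 2, 3, 1, 4, 2, 3]
human_scores = [3, 4, 2, 2, 1, 4, 3, 3]
kappa = cohens_kappa(llm_scores, human_scores)
print(round(kappa, 3))
```

Because rubric scores are ordinal, a weighted variant of kappa that penalizes near-misses less than large disagreements may be a better fit than the unweighted form shown here.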
minor comments (2)
- [System description] The model version 'ChatGPT 5.2' is not a standard release; clarify the exact model identifier and any prompting or fine-tuning details used for rubric alignment.
- [Implementation] No details are given on how the teacher-defined rubrics are encoded for the LLM or how slide-level visual features are extracted; adding a short methods subsection would improve reproducibility.
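As an illustration of the kind of detail the second minor comment asks for, the sketch below shows one way a teacher-defined rubric could be turned into a prompt and a structured reply could be validated. Everything here is an assumption made for illustration: the `Criterion` type, the prompt wording, and the JSON reply shape are hypothetical and do not describe how AISSA is actually implemented. No LLM API is called; the reply is a hard-coded sample string.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric row (hypothetical structure, not from the paper)."""
    key: str
    description: str
    max_score: int


def build_prompt(criteria, slide_text):
    """Render the rubric and the slide text into a single prompt string."""
    lines = [
        "Score the slides against each criterion. Reply with JSON only:",
        '{"scores": {<key>: <int>}, "comments": {<key>: <string>}}',
        "",
        "Criteria:",
    ]
    for c in criteria:
        lines.append(f"- {c.key} (0-{c.max_score}): {c.description}")
    lines += ["", "Slides:", slide_text]
    return "\n".join(lines)


def parse_feedback(raw, criteria):
    """Parse a JSON reply and reject missing or out-of-range scores."""
    data = json.loads(raw)
    scores = data["scores"]
    comments = data.get("comments", {})
    result = {}
    for c in criteria:
        if c.key not in scores:
            raise ValueError(f"missing score for {c.key}")
        score = scores[c.key]
        is_int = isinstance(score, int) and not isinstance(score, bool)
        if not is_int or not 0 <= score <= c.max_score:
            raise ValueError(f"invalid score for {c.key}: {score!r}")
        result[c.key] = (score, str(comments.get(c.key, "")))
    return result


# Hypothetical rubric and a hard-coded sample reply; illustrative only.
rubric = [
    Criterion("clarity", "Text is concise and readable", 4),
    Criterion("structure", "Slides follow a logical order", 4),
]
prompt = build_prompt(rubric, "Slide 1: Introduction")
reply = ('{"scores": {"clarity": 3, "structure": 4}, '
         '"comments": {"clarity": "Shorten the bullet text."}}')
feedback = parse_feedback(reply, rubric)
```

Validating the reply against the rubric before rendering it is one way to keep malformed or out-of-range scores from reaching a dashboard; whether the paper's system does anything similar is not stated.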
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below. We agree that several claims require clarification and qualification given the pilot nature of the study, and we will make revisions to the abstract, results, discussion, and conclusions to address these points while preserving the paper's focus on system deployment and feasibility.
- Referee: [Abstract / Pilot evaluation] Abstract and results description: the claim that 'AISSA is technically reliable' rests solely on system uptime and API cost figures from the n=46 pilot; no error rates, accuracy metrics, baseline comparisons (e.g., vs. human raters), or statistical tests are reported, leaving the reliability assertion unsupported.
  Authors: We acknowledge that the term 'technically reliable' in the abstract and results was used to describe operational aspects of the deployed system, such as uptime during the pilot and manageable API costs, rather than the accuracy or validity of the LLM-generated feedback. We will revise the abstract, results description, and add an explicit definition in the methods to clarify this scope. We will also add a limitations subsection noting the absence of error rates, accuracy metrics, baseline comparisons to human raters, and statistical tests on output quality.
  Revision: yes
- Referee: [Evaluation / Results] The weakest assumption, that the LLM produces rubric-aligned, accurate, and actionable feedback, is never tested. No section compares ChatGPT 5.2 outputs against expert human ratings on the same rubrics, measures inter-rater agreement (e.g., Cohen’s kappa), or tracks objective pre/post improvements in slide quality.
  Authors: We agree that the manuscript does not include objective validation of the LLM outputs, such as comparisons against expert human ratings, inter-rater agreement metrics like Cohen’s kappa, or objective pre/post measures of slide quality improvements. This study is positioned as a real-world pilot deployment focused on implementation, technical feasibility, and initial student perceptions rather than a controlled evaluation of feedback accuracy. We will revise the evaluation and discussion sections to explicitly state this scope, temper related claims, and add a limitations paragraph outlining plans for such validations (including human comparisons) in future work.
  Revision: yes
- Referee: [Pilot study results] Perceived usefulness is reported only via self-reported student perceptions through the dashboard; without controls for novelty effects or placebo, this cannot distinguish genuine feedback value from other factors, undermining the claim that the system supports 'iterative slide improvement'.
  Authors: The usefulness findings are based solely on self-reported student perceptions collected through the dashboard. We recognize that the absence of controls for novelty effects or placebo effects limits our ability to attribute improvements specifically to the feedback quality. We will revise the pilot study results and conclusions to qualify the 'iterative slide improvement' claim as based on perceived usefulness, add this as an explicit limitation, and suggest that future controlled studies could better isolate the feedback's impact.
  Revision: yes
Circularity Check
No circularity: empirical pilot reports rest on observed deployment data
The paper describes a software implementation (AISSA) and reports outcomes from a real-world pilot with 46 students, including uptime, API costs, and self-reported perceptions via dashboards. No equations, fitted parameters, predictions, or derivation steps are present. Claims of technical reliability and usefulness are grounded directly in the pilot observations rather than any self-referential definitions, self-citations as load-bearing premises, or reductions of outputs to inputs by construction. The central argument is therefore self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can accurately and consistently evaluate slide content and design against teacher-defined rubrics.