arXiv: 2605.04729 · v1 · submitted 2026-05-06 · 💻 cs.HC · cs.AI · cs.SE

AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations

Pith reviewed 2026-05-08 16:37 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.SE
keywords AI feedback · presentation slides · learning analytics · large language models · formative assessment · higher education · student tools · rubric evaluation

The pith

AISSA combines LLMs with dashboards to give students scalable rubric feedback on presentation slides.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AISSA, a web-based tool that lets students upload slide decks before an oral presentation and receive automatic quantitative scores plus qualitative suggestions drawn from teacher-defined rubrics. The system examines both visual features and textual content, routes the analysis through an LLM, and surfaces results in interactive dashboards usable by students for revisions and by teachers for oversight. In a pilot with 46 undergraduates, the tool proved technically stable, low-cost to run, and viewed by students as helpful for refining slides iteratively. This setup targets the practical limit that instructors in large courses cannot review every deck in detail ahead of time.

Core claim

The central claim is that a system combining large language model analysis of slide content and features with Learning Analytics dashboards can deliver reliable, rubric-aligned formative feedback at scale, as demonstrated by technical stability, low deployment cost, and positive student perceptions of usefulness for iterative improvement in a real classroom pilot.

What carries the argument

AISSA, the web-based platform that ingests slide decks, extracts slide-level and content features, prompts ChatGPT 5.2 to produce structured rubric scores and qualitative comments, and renders the output through interactive student and teacher dashboards.

If this is right

  • Instructors in large classes can supply detailed pre-presentation feedback without a linear increase in their own review time.
  • Students can iterate on slides multiple times using immediate, structured suggestions rather than waiting for a single teacher pass.
  • Dashboards allow both students and teachers to monitor patterns in slide quality across submissions or over a course term.
  • The same LLM-plus-dashboard pattern could extend to other rubric-driven student artifacts such as reports or posters.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If LLM outputs prove consistent with expert judgment over time, departments could reduce routine slide-review workload while preserving or improving feedback volume.
  • Integration with video recordings of the actual presentations could create closed-loop analysis that links slide design directly to delivery outcomes.
  • Similar architectures might support peer-review workflows in which student reviewers receive AI-augmented guidance on what to look for.

Load-bearing premise

The LLM generates accurate, unbiased, and rubric-aligned feedback that students can act on without further human validation or expert comparison.

What would settle it

A side-by-side study in which human experts review the same student slides and produce feedback that differs markedly from the LLM outputs in accuracy, actionability, or alignment with the rubric.
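
One way to quantify such a side-by-side comparison is Cohen's kappa, the agreement statistic the referee report names. The sketch below is a minimal, unweighted version in plain Python. The function name `cohens_kappa` and the score lists are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items that received identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal score distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 5-point Likert scores for eight decks on a single criterion.
human_scores = [4, 3, 5, 2, 4, 3, 1, 5]
llm_scores = [4, 3, 4, 2, 4, 2, 1, 5]
kappa = cohens_kappa(human_scores, llm_scores)
print(f"kappa = {kappa:.3f}")
```

Because rubric scores are ordinal, a weighted variant that penalizes a 5-versus-1 disagreement more than a 5-versus-4 one may be a better fit. The unweighted form is shown only because it is the simplest to read.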

Figures

Figures reproduced from arXiv: 2605.04729 by Alvaro Becerra, Diego Gomez, Ruth Cobos.

Figure 1: Modular architecture of the AISSA tool.
3.1. Visualization Module. The frontend of AISSA consists of multiple learning analytics dashboards that provide support for teachers, students, and administrators:
  • Student Dashboard: Allows students to upload PowerPoint presentations (.pptx) or review previous submissions (…
Figure 2: Student Dashboard interface.
Figure 3: Feedback, Analysis and Slides Visualizations extracted from the Student Dashboard.
  • Teacher Dashboard: Serves as a centralized control interface where educators can configure dynamic 5-point Likert scale rubrics, manage student cohorts, and monitor engagement through activity logs that include indicators such as the number of logins, the number of slide decks uploaded, and the frequency and duration of st…
Figure 4: Teacher Dashboard interface.
3.2. Extraction Module. The Extraction Module is responsible for extracting slide-level syntactic and visual features from uploaded .pptx presentations prior to downstream AI-based analysis. Using python-pptx library, the module traverses the underlying XML structure of each presentation to identify and extract features from each individual slide, including word counts, font siz…
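
The excerpt says the Extraction Module walks the XML of each .pptx with python-pptx. As a rough illustration of the same idea, and not the authors' code, the sketch below uses only the Python standard library: a .pptx file is a ZIP archive whose slides live under `ppt/slides/`, with text runs stored in DrawingML `a:t` elements and explicit font sizes in the `sz` attribute of `a:rPr`. The function name `slide_features` and the returned keys are assumptions made for this example.

```python
from __future__ import annotations

import io
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_PATH = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def slide_features(pptx_bytes: bytes) -> list[dict]:
    """Return per-slide word counts and the smallest explicit font size (pt)."""
    features = []
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as archive:
        # Sort by the number in the file name; this is an approximation,
        # since the real presentation order is defined in presentation.xml.
        slides = sorted(
            (int(m.group(1)), name)
            for name in archive.namelist()
            if (m := SLIDE_PATH.match(name))
        )
        for number, name in slides:
            root = ET.fromstring(archive.read(name))
            text = " ".join(t.text or "" for t in root.iter(f"{A_NS}t"))
            # The sz attribute is stored in hundredths of a point.
            sizes = [
                int(run.get("sz")) / 100
                for run in root.iter(f"{A_NS}rPr")
                if run.get("sz")
            ]
            features.append({
                "slide": number,
                "word_count": len(text.split()),
                "min_font_pt": min(sizes) if sizes else None,
            })
    return features
```

Sizes inherited from a layout or master slide carry no `sz` attribute on the run, so this sketch reports `None` for them. A higher-level library such as python-pptx, which the paper reports using, is the more practical choice for real decks.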
Figure 5: Prompt structure and organization for the AI Module.
3.4. Processing and Analysis Module. The Processing Module constitutes the operational backbone of AISSA, orchestrating the execution workflow that connects the visualization, extraction, artificial intelligence, and data persistence modules. After a presentation is submitted, this module receives the uploaded file, stores it temporarily, and coordinates …
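
The paper's actual prompt is not reproduced on this page, so the following is only a guess at the general shape of a rubric-driven prompt and the validation a processing step might apply to the reply. Everything here is hypothetical: the `Criterion` class, the prompt wording, the JSON reply schema, and the function names. No LLM API is called; the sketch covers only prompt assembly and checking that each criterion receives an integer score on the 5-point scale.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    key: str          # machine-readable identifier (hypothetical)
    description: str  # teacher-written text for the criterion


def build_prompt(criteria, slide_features) -> str:
    """Assemble a rubric-scoring prompt that asks for a JSON-only reply."""
    rubric_lines = [f"- {c.key}: {c.description}" for c in criteria]
    return "\n".join([
        "Score the slide deck on each criterion from 1 (poor) to 5 (excellent).",
        "Rubric:",
        *rubric_lines,
        "Extracted slide features (JSON):",
        json.dumps(slide_features),
        'Reply with JSON only: {"scores": {"<key>": <1-5>}, '
        '"comments": {"<key>": "<text>"}}',
    ])


def parse_feedback(raw: str, criteria) -> dict:
    """Check that the reply scores every criterion with an integer in 1..5."""
    data = json.loads(raw)
    scores = data["scores"]
    for c in criteria:
        value = scores.get(c.key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"invalid or missing score for {c.key!r}: {value!r}")
    return data
```

Rejecting malformed replies before they reach a dashboard keeps out-of-range or missing scores from being shown to students. Whether AISSA does anything similar is not stated in the material above.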
Original abstract

Providing timely and actionable feedback on oral presentation slides is challenging in higher education, particularly in large classes where teachers cannot realistically deliver detailed formative feedback before students present. This paper introduces AISSA (AI-based Student Slides Analysis tool), a web-based system that combines large language models (LLMs) and Learning Analytics dashboards to support scalable, rubric-based feedback on presentation slides. AISSA allows students to upload their slide decks prior to an oral presentation and automatically receive quantitative scores and qualitative feedback based on teacher-defined evaluation rubrics. The system analyzes both slide-level features and slide content, generates structured feedback through an LLM (ChatGPT 5.2), and presents the results through interactive dashboards for students and teachers. We tested AISSA on a pilot deployment with 46 undergraduate students in a real academic setting. The results indicate that AISSA is technically reliable, economically feasible, and perceived by students as useful for iterative slide improvement. These findings suggest that combining LLM-based analysis with Learning Analytics dashboards is a promising approach for supporting formative feedback on presentation slides at scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces AISSA, a web-based system that integrates large language models (specifically ChatGPT 5.2) with learning analytics dashboards to deliver rubric-based quantitative scores and qualitative feedback on student presentation slides. Students upload slide decks for automated analysis of both visual features and content; results are presented via interactive dashboards. A pilot deployment with 46 undergraduate students in a real academic setting is described, with the authors concluding that the system is technically reliable, economically feasible, and perceived as useful for iterative slide improvement.

Significance. If the central claims were supported by rigorous evidence, the work would represent a practical contribution to scalable formative assessment in HCI and educational technology, demonstrating how LLMs can be combined with dashboards for large-class presentation feedback. The pilot framing and focus on deployment costs and student perceptions are relevant to the field, but the absence of objective validation metrics substantially limits the current significance.

major comments (3)
  1. [Abstract / Pilot evaluation] Abstract and results description: the claim that 'AISSA is technically reliable' rests solely on system uptime and API cost figures from the n=46 pilot; no error rates, accuracy metrics, baseline comparisons (e.g., vs. human raters), or statistical tests are reported, leaving the reliability assertion unsupported.
  2. [Evaluation / Results] The weakest assumption—that the LLM produces rubric-aligned, accurate, and actionable feedback—is never tested. No section compares ChatGPT 5.2 outputs against expert human ratings on the same rubrics, measures inter-rater agreement (e.g., Cohen’s kappa), or tracks objective pre/post improvements in slide quality.
  3. [Pilot study results] Perceived usefulness is reported only via self-reported student perceptions through the dashboard; without controls for novelty effects or placebo, this cannot distinguish genuine feedback value from other factors, undermining the claim that the system supports 'iterative slide improvement'.
minor comments (2)
  1. [System description] The model version 'ChatGPT 5.2' is not a standard release; clarify the exact model identifier and any prompting or fine-tuning details used for rubric alignment.
  2. [Implementation] No details are given on how the teacher-defined rubrics are encoded for the LLM or how slide-level visual features are extracted; adding a short methods subsection would improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below. We agree that several claims require clarification and qualification given the pilot nature of the study, and we will make revisions to the abstract, results, discussion, and conclusions to address these points while preserving the paper's focus on system deployment and feasibility.

  1. Referee: [Abstract / Pilot evaluation] Abstract and results description: the claim that 'AISSA is technically reliable' rests solely on system uptime and API cost figures from the n=46 pilot; no error rates, accuracy metrics, baseline comparisons (e.g., vs. human raters), or statistical tests are reported, leaving the reliability assertion unsupported.

    Authors: We acknowledge that the term 'technically reliable' in the abstract and results was used to describe operational aspects of the deployed system, such as uptime during the pilot and manageable API costs, rather than the accuracy or validity of the LLM-generated feedback. We will revise the abstract and the results description, and add an explicit definition in the methods, to clarify this scope. We will also add a limitations subsection noting the absence of error rates, accuracy metrics, baseline comparisons to human raters, and statistical tests on output quality. revision: yes

  2. Referee: [Evaluation / Results] The weakest assumption—that the LLM produces rubric-aligned, accurate, and actionable feedback—is never tested. No section compares ChatGPT 5.2 outputs against expert human ratings on the same rubrics, measures inter-rater agreement (e.g., Cohen’s kappa), or tracks objective pre/post improvements in slide quality.

    Authors: We agree that the manuscript does not include objective validation of the LLM outputs, such as comparisons against expert human ratings, inter-rater agreement metrics like Cohen’s kappa, or objective pre/post measures of slide quality improvements. This study is positioned as a real-world pilot deployment focused on implementation, technical feasibility, and initial student perceptions rather than a controlled evaluation of feedback accuracy. We will revise the evaluation and discussion sections to explicitly state this scope, temper related claims, and add a limitations paragraph outlining plans for such validations (including human comparisons) in future work. revision: yes

  3. Referee: [Pilot study results] Perceived usefulness is reported only via self-reported student perceptions through the dashboard; without controls for novelty effects or placebo, this cannot distinguish genuine feedback value from other factors, undermining the claim that the system supports 'iterative slide improvement'.

    Authors: The usefulness findings are based solely on self-reported student perceptions collected through the dashboard. We recognize that the absence of controls for novelty effects or placebo effects limits our ability to attribute improvements specifically to the feedback quality. We will revise the pilot study results and conclusions to qualify the 'iterative slide improvement' claim as based on perceived usefulness, add this as an explicit limitation, and suggest that future controlled studies could better isolate the feedback's impact. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical pilot reports rest on observed deployment data

The paper describes a software implementation (AISSA) and reports outcomes from a real-world pilot with 46 students, including uptime, API costs, and self-reported perceptions via dashboards. No equations, fitted parameters, predictions, or derivation steps are present. Claims of technical reliability and usefulness are grounded directly in the pilot observations rather than any self-referential definitions, self-citations as load-bearing premises, or reductions of outputs to inputs by construction. The central argument is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim of technical reliability and usefulness depends on the unvalidated assumption that the chosen LLM can faithfully interpret and score slides according to arbitrary teacher rubrics, plus the assumption that a 46-student pilot generalizes.

axioms (1)
  • Domain assumption: Large language models can accurately and consistently evaluate slide content and design against teacher-defined rubrics.
    The system architecture and feedback generation rest on this premise without reported human validation or inter-rater agreement checks.

pith-pipeline@v0.9.0 · 5489 in / 1400 out tokens · 63743 ms · 2026-05-08T16:37:15.631666+00:00 · methodology
