
arXiv: 2605.04740 · v1 · submitted 2026-05-06 · cs.HC · cs.AI · cs.SE

AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education

Pith reviewed 2026-05-08 16:30 UTC · model grok-4.3

classification cs.HC · cs.AI · cs.SE
keywords peer feedback · AI in education · higher education · large language models · teacher-in-the-loop · learning analytics · collaborative feedback · educational technology

The pith

AICoFe combines multiple AI models with teacher review to turn inconsistent student peer feedback into coherent, actionable comments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the implementation and deployment of AICoFe to address the inconsistent quality of peer feedback that limits its value for building critical reflection in higher education. It describes a modular architecture that runs a pipeline of three language models to combine rubric scores and student observations into draft feedback. Teachers then curate these drafts using dedicated analytics dashboards before the feedback reaches students. The design keeps educators in control while using AI to reduce variability. A hybrid database setup tracks all versions and data for traceability and later analysis.

Core claim

AICoFe employs a modular architecture that orchestrates a multi-LLM pipeline (GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1) to synthesize quantitative rubric data and qualitative observations into coherent, actionable feedback. A teacher-in-the-loop mediation workflow lets educators curate the resulting drafts through Learning Analytics dashboards, and a hybrid SQL and MongoDB infrastructure provides version traceability.
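A minimal sketch of how such a multi-LLM step might look, assuming each model receives the same prompt and returns its own draft (the Figure 4 caption suggests per-model drafts, but the paper summary does not spell out the orchestration). `PeerEvaluation`, `build_prompt`, `generate_drafts`, and the stub model callables are hypothetical names introduced for illustration only; no real provider API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PeerEvaluation:
    """One peer's evaluation: rubric scores plus a free-text observation."""
    rubric_scores: Dict[str, int]
    observation: str


def build_prompt(evaluations: List[PeerEvaluation]) -> str:
    """Combine mean rubric scores and observations into a single prompt."""
    items = sorted({item for ev in evaluations for item in ev.rubric_scores})
    lines = ["Write coherent, actionable feedback from these peer evaluations."]
    for item in items:
        scores = [ev.rubric_scores[item] for ev in evaluations
                  if item in ev.rubric_scores]
        lines.append(f"- {item}: mean score {sum(scores) / len(scores):.2f}")
    lines.append("Observations:")
    lines.extend(f"- {ev.observation}" for ev in evaluations if ev.observation)
    return "\n".join(lines)


def generate_drafts(prompt: str,
                    models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every model and collect one draft per model."""
    return {name: call(prompt) for name, call in models.items()}


def make_stub(name: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model client (assumption)."""
    def call(prompt: str) -> str:
        return f"[{name}] draft based on {len(prompt.splitlines())} prompt lines"
    return call


# Model labels mirror the names in the text; the callables are stubs.
MODELS = {name: make_stub(name)
          for name in ("gpt-4.1-mini", "gemini-2.5-flash", "llama-3.1")}

evaluations = [
    PeerEvaluation({"clarity": 4, "structure": 3},
                   "Slides were clear but the ending felt rushed."),
    PeerEvaluation({"clarity": 2, "structure": 5},
                   "Good structure; speak more slowly."),
]
prompt = build_prompt(evaluations)
drafts = generate_drafts(prompt, MODELS)
```

Keeping the models behind a plain callable interface means a draft is always stored under the name of the model that produced it, which is what a later curation step would need.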

What carries the argument

The multi-LLM pipeline that synthesizes rubric and observation data into feedback drafts, combined with the teacher-in-the-loop workflow that lets educators curate outputs via specialized dashboards.
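The curation step can be illustrated with a small sketch, assuming the teacher picks individual sentences from the per-model drafts and the system records which model each chosen sentence came from (the figure captions mention sentence-level selection and an LLM contribution legend). The sentence splitter, the `Sentence` type, and the `curate` function are hypothetical and not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sentence:
    model: str  # which model produced this sentence
    text: str


def split_sentences(drafts: Dict[str, str]) -> List[Sentence]:
    """Split each draft into sentences, remembering the source model."""
    sentences = []
    for model, draft in drafts.items():
        # Naive split on whitespace that follows ., ! or ? (assumption).
        for part in re.split(r"(?<=[.!?])\s+", draft.strip()):
            if part:
                sentences.append(Sentence(model, part))
    return sentences


def curate(sentences: List[Sentence],
           selected: List[int]) -> Tuple[str, Dict[str, int]]:
    """Build the final text from teacher-selected sentence indices and
    count how many sentences each model contributed."""
    chosen = [sentences[i] for i in selected]
    contribution: Dict[str, int] = {}
    for sentence in chosen:
        contribution[sentence.model] = contribution.get(sentence.model, 0) + 1
    return " ".join(s.text for s in chosen), contribution


# Placeholder drafts; model names are illustrative labels only.
drafts = {
    "model_a": "Your structure was clear. Slow down near the end.",
    "model_b": "The conclusion felt rushed. Add a summary slide.",
}
sentences = split_sentences(drafts)
# The teacher keeps sentences 0, 3 and 1, in that order.
final_text, contribution = curate(sentences, [0, 3, 1])
```

Here the selection order is the teacher's, so the final text can interleave models freely while the contribution counts stay attributable to their source.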

If this is right

  • Educators can manage peer feedback for larger classes while preserving consistency and quality.
  • Students receive more uniform and useful comments that better support their ability to reflect on their work.
  • The traceable hybrid database enables systematic analysis of feedback patterns across courses and iterations.
  • The system provides a practical model for deploying AI assistance in education without removing teacher judgment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mediation pattern could apply to other collaborative student activities that require quality control, such as group project evaluations.
  • Adding direct links to individual student performance data might allow the pipeline to tailor drafts more precisely.
  • Wider adoption could reduce differences in feedback quality that currently depend on instructor workload or experience.

Load-bearing premise

That the multi-LLM synthesis will reliably generate coherent and actionable drafts that teachers can curate effectively enough to produce better student reflection and learning outcomes than traditional peer feedback.

What would settle it

A controlled comparison in which students receiving AICoFe-curated feedback show no greater gains in critical reflection measures or assignment performance than students receiving unassisted peer feedback.

Figures

Figures reproduced from arXiv: 2605.04740 by Alejandra Palma, Alvaro Becerra, Ruth Cobos.

Figure 1: Architecture of the AICoFe system, showcasing its modular design with four main modules.
  3.1. Management Module. The Management Module provides the data infrastructure underpinning the entire system. It integrates two complementary database technologies: a relational SQL database and a MongoDB document store. The SQL database manages structured academic data, including courses, student groups, rubric defini…
Figure 2: Screenshot of the AICoFe evaluator dashboard.
Figure 3: Screenshot of the AICoFe student dashboard.
Figure 4: Teacher Dashboard views: (a) granular feedback from the three LLMs with sentence-level selection, and (b) detailed evaluations by rubric item and evaluator.
  3.2.4. Feedback History and LLM Contribution Tracking. A key feature of the Teacher Dashboard is the Feedback History interface (…
Figure 5: Feedback History interface in the Teacher Dashboard, showing the LLM contribution legend.
  3.2.5. Admin Dashboard. System configuration and evaluation orchestration are managed through the Admin Dashboard, which allows administrators to create courses, define student groups, assign rubrics, and change the language of the system and the generated feedback. This dashboard is intended for system administrators …
Original abstract

Effective peer feedback is essential for developing critical reflection in higher education, yet its impact is often limited by the inconsistent quality of student-generated comments. This paper presents the implementation and deployment of AICoFe (AI-based Collaborative Feedback), a system designed to bridge this gap through a human-centered AI approach. We describe a modular architecture that orchestrates a multi-LLM pipeline, utilizing GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1, to synthesize quantitative rubric data and qualitative observations into coherent, actionable feedback. Key to the system is a "teacher-in-the-loop" mediation workflow, where educators use specialized Learning Analytics dashboards to curate and refine AI-generated drafts before delivery. Furthermore, we detail the underlying data infrastructure, which employs a hybrid SQL and MongoDB strategy to ensure traceability and manage semi-structured feedback versions.
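The hybrid storage idea can be sketched as follows, assuming structured academic records live in relational tables while each feedback version is stored as a semi-structured document keyed to those records. This is an illustration only: an in-memory SQLite database stands in for the SQL side, a plain Python list of dicts stands in for a MongoDB collection, and every table, field, and function name is hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict, List

# Relational side: structured academic data (courses, student groups).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE student_group ("
    "id INTEGER PRIMARY KEY, "
    "course_id INTEGER NOT NULL REFERENCES course(id), "
    "name TEXT NOT NULL)"
)
conn.execute("INSERT INTO course (id, name) VALUES (1, 'Example course')")
conn.execute(
    "INSERT INTO student_group (id, course_id, name) VALUES (10, 1, 'Group A')"
)

# Document side: a list of dicts standing in for a document collection.
feedback_versions: List[Dict[str, Any]] = []


def save_version(group_id: int, text: str, source: Dict[str, Any]) -> Dict[str, Any]:
    """Append a new feedback version; earlier versions are never overwritten."""
    version = 1 + sum(1 for d in feedback_versions if d["group_id"] == group_id)
    doc = {
        "group_id": group_id,   # links the document to the relational row
        "version": version,
        "text": text,
        "source": source,       # free-form: model draft, teacher edit, ...
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    feedback_versions.append(doc)
    return doc


def history(group_id: int) -> Dict[str, Any]:
    """Join relational context with the ordered version documents."""
    row = conn.execute(
        "SELECT c.name, g.name FROM student_group g "
        "JOIN course c ON c.id = g.course_id WHERE g.id = ?",
        (group_id,),
    ).fetchone()
    docs = sorted(
        (d for d in feedback_versions if d["group_id"] == group_id),
        key=lambda d: d["version"],
    )
    return {"course": row[0], "group": row[1], "versions": docs}


save_version(10, "Draft from model output.", {"kind": "llm", "model": "hypothetical-model"})
save_version(10, "Teacher-edited final text.", {"kind": "teacher"})
group_history = history(10)
```

Append-only version documents are one simple way to get traceability: the `source` field can vary in shape from version to version without a schema change, while the relational tables keep course and group data consistent.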

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents the implementation and deployment of AICoFe, an AI-based collaborative feedback system for higher education. It describes a modular architecture that orchestrates a multi-LLM pipeline (GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1) to synthesize quantitative rubric data and qualitative observations into coherent, actionable feedback. Central elements include a teacher-in-the-loop mediation workflow using specialized Learning Analytics dashboards for curation and refinement of AI drafts, along with a hybrid SQL/MongoDB data infrastructure to ensure traceability and manage semi-structured feedback versions.

Significance. If the described architecture and workflow perform as outlined, the work could provide a useful contribution to HCI and educational technology by demonstrating a practical human-centered integration of multiple LLMs with educator oversight. The detailed account of modular orchestration, dashboard mediation, and hybrid storage for versioned traceability offers reusable design patterns for systems aiming to improve peer feedback consistency in higher education settings.

major comments (1)
  1. [Abstract and system description sections] The central claim that AICoFe 'bridges this gap' in peer feedback quality via the multi-LLM synthesis and teacher-in-the-loop curation is presented without any supporting evaluation data. The manuscript provides no pilot metrics, inter-rater reliability scores on feedback coherence, pre/post measures of student reflection, baseline comparisons, or teacher workload statistics, leaving the effectiveness assertion as an untested design hypothesis rather than a demonstrated result.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review of our manuscript on the AICoFe system. We address the major comment below and will revise the paper accordingly to better scope its claims.

Point-by-point responses
  1. Referee: [Abstract and system description sections] The central claim that AICoFe 'bridges this gap' in peer feedback quality via the multi-LLM synthesis and teacher-in-the-loop curation is presented without any supporting evaluation data. The manuscript provides no pilot metrics, inter-rater reliability scores on feedback coherence, pre/post measures of student reflection, baseline comparisons, or teacher workload statistics, leaving the effectiveness assertion as an untested design hypothesis rather than a demonstrated result.

    Authors: We agree that the manuscript contains no empirical evaluation data (e.g., pilot metrics, inter-rater reliability, pre/post reflection measures, or workload statistics) and therefore cannot demonstrate that the system bridges the gap in peer feedback quality. The paper's contribution is the detailed description of a modular multi-LLM architecture, teacher-in-the-loop curation workflow, and hybrid SQL/MongoDB storage for versioned traceability, offered as reusable design patterns for HCI and educational technology. We will revise the abstract and system description sections to frame AICoFe as an implemented system designed to address the identified challenges in peer feedback, rather than asserting demonstrated effectiveness. The revised text will explicitly note that empirical validation remains future work. This change aligns the claims with the manuscript's implementation-and-deployment focus.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; paper is purely descriptive of system design with no derivation chain.

Full rationale

The manuscript is an implementation and deployment report that details a modular architecture, multi-LLM orchestration (GPT-4.1-mini, Gemini 2.5 Flash, Llama 3.1), teacher-in-the-loop curation via Learning Analytics dashboards, and hybrid SQL/MongoDB storage. No equations, fitted parameters, predictions, uniqueness theorems, or mathematical derivations are present. Claims about bridging gaps in peer feedback quality are presented as design motivations and architectural choices rather than results derived from prior steps or self-citations. The paper contains no load-bearing reductions where outputs equal inputs by construction, making it self-contained as a descriptive account.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an engineering implementation paper with no mathematical derivations, empirical fitting, or theoretical claims. No free parameters, axioms, or invented scientific entities are present.

pith-pipeline@v0.9.0 · 5452 in / 1014 out tokens · 62502 ms · 2026-05-08T16:30:53.414037+00:00 · methodology

