Qayyem: A Real-time Platform for Scoring Proficiency of Arabic Essays

Fatima Brahamia; Hoor Elbahnasawi; Marwan Sayed; Sohaila Eltanbouly; Tamer Elsayed

arxiv: 2603.01009 · v2 · pith:4BEEWZ36new · submitted 2026-03-01 · 💻 cs.CL

Hoor Elbahnasawi, Marwan Sayed, Sohaila Eltanbouly, Fatima Brahamia, Tamer Elsayed

Pith reviewed 2026-05-21 12:12 UTC · model grok-4.3

classification 💻 cs.CL

keywords Automated Essay Scoring · Arabic AES · Web platform · Essay evaluation · Arabic NLP · Educational technology · Real-time scoring

The pith

Qayyem provides a web platform that lets instructors score Arabic essays with advanced models through a simple interface without managing server APIs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Qayyem as a web-based platform to support automated essay scoring for Arabic by handling assignment creation, batch uploads, scoring setup, and trait-specific evaluation in one place. It removes the need for users to interact directly with scoring server APIs so instructors can reach state-of-the-art models via an ordinary browser interface. The work addresses the limited progress in Arabic AES caused by linguistic complexity and few large public datasets. Deploying several models that trade off effectiveness against efficiency aims to make consistent scoring of student writing more practical in real classrooms.

Core claim

Qayyem is a Web-based platform designed to support Arabic AES by providing an integrated workflow for assignment creation, batch essay upload, scoring configuration, and per-trait essay evaluation. It abstracts the technical complexity of interacting with scoring server APIs, allowing instructors to access advanced scoring services through a user-friendly interface while deploying a number of state-of-the-art Arabic essay scoring models with different effectiveness and efficiency figures.

What carries the argument

The Qayyem web platform that integrates the full essay-scoring workflow and hides direct calls to scoring server APIs to deliver access to multiple Arabic AES models.
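The workflow described above can be sketched as a thin facade over a scoring backend. This is a minimal illustration, not the platform's actual code: the class `ScoringFacade`, its method names, the trait names, and the `dummy_scorer` stand-in are all hypothetical, and a real deployment would forward the call to a scoring server instead of counting words.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A scorer maps (essay text, trait names) to one score per trait.
# In a real system this would wrap a call to a scoring server (assumption).
Scorer = Callable[[str, List[str]], Dict[str, float]]


@dataclass
class Assignment:
    title: str
    traits: List[str]
    essays: Dict[str, str] = field(default_factory=dict)  # essay id -> text


class ScoringFacade:
    """Hypothetical wrapper: callers use these methods, never the backend."""

    def __init__(self, scorer: Scorer) -> None:
        self._scorer = scorer

    def create_assignment(self, title: str, traits: List[str]) -> Assignment:
        return Assignment(title=title, traits=list(traits))

    def upload_batch(self, assignment: Assignment, essays: Dict[str, str]) -> None:
        # Batch upload: merge many essays into the assignment at once.
        assignment.essays.update(essays)

    def score(self, assignment: Assignment) -> Dict[str, Dict[str, float]]:
        # Per-trait evaluation: one score dictionary per essay.
        return {
            essay_id: self._scorer(text, assignment.traits)
            for essay_id, text in assignment.essays.items()
        }


def dummy_scorer(text: str, traits: List[str]) -> Dict[str, float]:
    # Placeholder only: uses the word count as the score for every trait.
    word_count = float(len(text.split()))
    return {trait: word_count for trait in traits}


facade = ScoringFacade(dummy_scorer)
demo = facade.create_assignment("demo", ["organization", "vocabulary"])
facade.upload_batch(demo, {"e1": "one two three", "e2": "four"})
results = facade.score(demo)
```

Because the backend is passed in as a plain callable, the same facade could in principle sit in front of any of the deployed models without changing what the instructor sees.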

If this is right

Instructors can set up assignments and upload batches of essays without separate technical configuration.
Scoring can be run in real time with results broken down by individual traits.
Users can choose among models that balance higher accuracy against faster processing.
The platform makes advanced Arabic AES services reachable for everyday classroom use.
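One way to picture the accuracy-versus-speed choice is a small selection rule: pick the model with the highest agreement score among those that meet a latency budget. The sketch below is purely illustrative. The model names and all numbers are made-up placeholders, not figures from the paper, and `pick_model` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelProfile:
    name: str
    agreement: float   # agreement with human raters (placeholder value)
    latency_ms: float  # time to score one essay (placeholder value)


def pick_model(models: List[ModelProfile], max_latency_ms: float) -> Optional[ModelProfile]:
    """Return the highest-agreement model within the latency budget, or None."""
    eligible = [m for m in models if m.latency_ms <= max_latency_ms]
    return max(eligible, key=lambda m: m.agreement, default=None)


# Placeholder registry: names and numbers are invented for illustration only.
MODELS = [
    ModelProfile("model_fast", agreement=0.60, latency_ms=50.0),
    ModelProfile("model_balanced", agreement=0.70, latency_ms=200.0),
    ModelProfile("model_accurate", agreement=0.75, latency_ms=900.0),
]

choice = pick_model(MODELS, max_latency_ms=250.0)
```

With these placeholder values, a 250 ms budget excludes the slowest model and selects the more accurate of the two that remain.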

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread classroom use could gradually create larger annotated Arabic datasets from scored essays.
The same abstraction approach might be applied to build similar platforms for other languages with limited AES resources.
Direct feedback from the trait scores could help students improve specific aspects of their Arabic writing sooner.

Load-bearing premise

The state-of-the-art Arabic essay scoring models deployed in the platform are accurate and suitable enough for real educational use despite the scarcity of large annotated datasets.

What would settle it

A test set of real Arabic student essays scored by the platform models showing low agreement with independent human raters would show the platform does not deliver usable proficiency scores.
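Agreement of this kind is commonly measured with quadratic weighted kappa (QWK), the metric the referee report also names. A minimal pure-Python sketch follows, assuming integer scores on a shared scale. The function name and the sample scores are illustrative and do not come from the paper.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], min_score: int, max_score: int
) -> float:
    """QWK between two lists of integer scores on the scale [min_score, max_score]."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1
    if k < 2:
        raise ValueError("the scale needs at least two distinct scores")
    n = len(rater_a)

    # Observed confusion matrix between the two raters.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1.0

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2      # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n       # count expected under chance
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # If both raters gave one identical score to every essay, treat it as full agreement.
    return 1.0 if denominator == 0.0 else 1.0 - numerator / denominator


human = [1, 2, 3]
same = quadratic_weighted_kappa(human, [1, 2, 3], min_score=1, max_score=3)
reversed_scores = quadratic_weighted_kappa(human, [3, 2, 1], min_score=1, max_score=3)
```

Here identical score lists give a kappa of 1, while the fully reversed list gives -1. Values near 0 indicate agreement no better than chance.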

Original abstract

Over the past years, Automated Essay Scoring (AES) systems have gained increasing attention as scalable and consistent solutions for assessing the proficiency of student writing. Despite recent progress, support for Arabic AES remains limited due to linguistic complexity and scarcity of large publicly-available annotated datasets. In this work, we present Qayyem, a Web-based platform designed to support Arabic AES by providing an integrated workflow for assignment creation, batch essay upload, scoring configuration, and per-trait essay evaluation. Qayyem abstracts the technical complexity of interacting with scoring server APIs, allowing instructors to access advanced scoring services through a user-friendly interface. The platform deploys a number of state-of-the-art Arabic essay scoring models with different effectiveness and efficiency figures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Qayyem is a web platform that wraps existing Arabic AES models with batch upload and per-trait scoring, but the paper supplies no metrics or validation to show the models work well enough for real classroom decisions.

The core point is that this paper describes a practical web interface for Arabic automated essay scoring rather than introducing new models or results. It integrates assignment setup, batch uploads, and trait-level feedback on top of several existing scorers, which fills a gap for instructors who do not want to handle APIs directly. That packaging is the main contribution here, and it is a reasonable engineering step given how little public Arabic AES data exists.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Qayyem, a web-based platform for automated essay scoring (AES) in Arabic. It describes an integrated workflow for assignment creation, batch essay upload, scoring configuration, and per-trait evaluation. The platform abstracts technical interactions with scoring server APIs to provide instructors a user-friendly interface to advanced Arabic AES services and deploys multiple state-of-the-art models noted for varying effectiveness and efficiency.

Significance. If the deployed models are reliable for educational decisions and the interface proves usable, Qayyem could help address the documented scarcity of Arabic AES tools by making advanced scoring accessible without requiring instructors to manage API complexities. The focus on practical deployment and real-time capability is a strength for applied computational linguistics work.

major comments (2)

[Abstract] Abstract: the claim that the platform deploys 'state-of-the-art Arabic essay scoring models with different effectiveness and efficiency figures' is load-bearing for the central assertion of providing advanced, usable scoring services, yet the manuscript supplies no quantitative metrics (e.g., QWK, Pearson correlation with human raters, latency figures), no error analysis, and no references to prior evaluations of those models.
[Platform Workflow and Model Deployment] Platform description sections: the integrated workflow is presented as enabling real educational use, but the manuscript contains no user studies, no platform-level usability data, and no comparison against existing Arabic AES interfaces or manual scoring baselines; without these, the abstraction benefit cannot be separated from the risk that the underlying scorers are not yet accurate enough for instructor decisions.

minor comments (2)

[Abstract] Clarify the specific traits used in 'per-trait essay evaluation' and whether they align with standard Arabic proficiency rubrics; an example or table would improve readability.
[Abstract] Add citations for the 'state-of-the-art' models referenced so readers can locate the original performance claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating where revisions will be incorporated to strengthen the presentation of the platform and its claims.

Point-by-point responses

Referee: [Abstract] Abstract: the claim that the platform deploys 'state-of-the-art Arabic essay scoring models with different effectiveness and efficiency figures' is load-bearing for the central assertion of providing advanced, usable scoring services, yet the manuscript supplies no quantitative metrics (e.g., QWK, Pearson correlation with human raters, latency figures), no error analysis, and no references to prior evaluations of those models.

Authors: We agree that the abstract claim would benefit from explicit support. The models integrated in Qayyem are drawn from previously published Arabic AES research rather than newly developed here. In the revised manuscript we will add citations to the original papers reporting those models, include a summary table of their published QWK, Pearson correlation, and latency/efficiency figures, and briefly reference the error analyses from those works. This will substantiate the claim of varying effectiveness and efficiency while keeping the paper's focus on platform integration rather than model evaluation.

Revision: yes

Referee: [Platform Workflow and Model Deployment] Platform description sections: the integrated workflow is presented as enabling real educational use, but the manuscript contains no user studies, no platform-level usability data, and no comparison against existing Arabic AES interfaces or manual scoring baselines; without these, the abstraction benefit cannot be separated from the risk that the underlying scorers are not yet accurate enough for instructor decisions.

Authors: We acknowledge that formal user studies and direct comparisons would provide stronger evidence of real-world utility. As this manuscript primarily describes the system architecture, workflow, and API abstraction, we did not conduct new empirical evaluations. In revision we will add a subsection on UI/UX design choices intended to support instructor usability, explicitly note the lack of formal usability testing or baseline comparisons as a limitation, and expand the related-work section to discuss any existing Arabic AES tools or general platforms. We will also clarify that the platform's value lies in lowering the barrier to existing scorers rather than claiming new accuracy guarantees. Comprehensive user studies remain planned for future work.

Revision: partial

Circularity Check

0 steps flagged

No circularity; descriptive platform paper with no derivations or load-bearing self-citations

full rationale

The paper is a system description of the Qayyem web platform for Arabic AES, outlining workflows for assignment creation, essay upload, and model deployment. It contains no equations, predictions, fitted parameters, or derivation chains that could reduce to inputs by construction. Claims about abstracting API complexity and deploying SOTA models are presented as engineering contributions without mathematical steps or self-referential uniqueness theorems. The scarcity of Arabic datasets is acknowledged but not used in any circular predictive loop. This is a standard non-circular descriptive account of software.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The description rests on the domain assumption that existing Arabic AES models can be usefully deployed despite data scarcity, with no new free parameters, axioms, or invented entities introduced in the abstract.

axioms (1)

Domain assumption: Arabic linguistic complexity and scarcity of large annotated datasets limit current AES support.
Explicitly stated in the abstract as the motivation for the platform.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

Qayyem abstracts the technical complexity of interacting with scoring server APIs, allowing instructors to access advanced scoring services through a user-friendly interface while deploying a number of state-of-the-art Arabic essay scoring models.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

TRATES achieves the best overall performance across all traits... QWK values

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.