Text-based approaches to item difficulty modeling in large-scale assessments: A systematic review. arXiv preprint arXiv:2509.23486

Sydney Peters, Nan Zhang, Hong Jiao, Ming Li, Tianyi Zhou, Robert Lissitz · 2025 · arXiv 2509.23486

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

RoMathExam: A Longitudinal Dataset of Romanian Math Exams (1895-2025) with a Seven-Decade Core (1957-2025)

cs.CY · 2026-03-28 · unverdicted · novelty 7.0

RoMathExam supplies a century-long collection of Romanian math exams together with a new intrinsic complexity metric that correlates across frontier models at r > 0.72.

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

Epi2Diff extracts cognitive episode sequences from LRM reasoning traces and combines them with semantic features to predict human item difficulty, outperforming baselines on four educational datasets.

Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning

cs.CL · 2026-05-16 · conditional · novelty 5.0

Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.

citing papers explorer

RoMathExam: A Longitudinal Dataset of Romanian Math Exams (1895-2025) with a Seven-Decade Core (1957-2025)

cs.CY · 2026-03-28 · unverdicted · none · ref 23

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

cs.CL · 2026-06-26 · unverdicted · none · ref 21

Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning

cs.CL · 2026-05-16 · conditional · none · ref 197
