Likely to stop? Predicting Stopout in Massive Open Online Courses

2014 · cs.CY · arXiv 1408.3382

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Understanding why students stop out will help in understanding how students learn in MOOCs. In this report, part of a three-unit compendium, we describe how we build accurate predictive models of MOOC student stopout. We document a scalable stopout prediction methodology, end to end, from raw source data to model analysis. We attempted to predict stopout for the Fall 2012 offering of 6.002x. This involved the meticulous, crowd-sourced engineering of over 25 predictive features extracted for thousands of students; the creation of temporal and non-temporal data representations for use in predictive modeling; the derivation of over 10,000 models with a variety of state-of-the-art machine learning techniques; and the analysis of feature importance by examining over 70,000 models. We found that stopout prediction is a tractable problem. Our models achieved an AUC (area under the receiver operating characteristic curve) as high as 0.95 (and generally 0.88) when predicting one week in advance. Even on more difficult prediction problems, such as predicting stopout at the end of the course with only one week's data, the models attained AUCs of 0.7.
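The AUC figures quoted in the abstract can be read as the probability that a model scores a randomly chosen stopout student above a randomly chosen persisting student. The sketch below illustrates that pairwise definition in plain Python. It is not the authors' pipeline: the function name `roc_auc` and the six-student toy data are hypothetical, chosen only to show how the metric is computed.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: label 1 = student stopped out, 0 = student persisted.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical model scores (higher = more likely to stop out).
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

On this toy data, 8 of the 9 stopout/persister pairs are ranked correctly. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.7 to 0.95 range reported above. The quadratic loop is meant for clarity; for real datasets a rank-based implementation is the usual choice.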

representative citing papers

When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction

cs.LG · 2026-06-02 · conditional · novelty 7.0

A three-stage diagnostic on edX data shows offline selectors (BC, DQN, CQL) fail to reach oracle performance due to local representational ambiguity rather than learner mismatch or label shift.

citing papers explorer

Showing 1 of 1 citing paper.

When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction

cs.LG · 2026-06-02 · conditional · none · ref 7 · internal anchor
