Cross-Platform Domain Adaptation for Multi-Modal MOOC Learner Satisfaction Prediction

Jakub Kowalski; Magdalena Piotrowska

arxiv: 2604.13247 · v1 · submitted 2026-04-14 · 💻 cs.CE


Pith reviewed 2026-05-10 13:30 UTC · model grok-4.3

classification 💻 cs.CE

keywords cross-platform domain adaptation · MOOC learner satisfaction · multi-modal prediction · domain-adversarial training · rating calibration · few-shot adaptation · unsupervised transfer


The pith

Domain-adversarial alignment and latent calibration let MOOC satisfaction predictors transfer across platforms with 0.66 RMSE using zero target labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Models that predict learner satisfaction from MOOC reviews and behavior logs lose accuracy when moved to a different platform because of shifts in text style, user groups, data recording methods, and rating scales. The work proposes a framework that encodes reviews with a fixed language model and logs with a basic network, then aligns the features across platforms through adversarial training while using a calibration layer to adjust for rating differences. This produces usable predictions even when the target platform supplies no labeled examples at all. The result matters for operators who cannot afford to label new data on every site and want to reuse models built on larger platforms.

Core claim

The paper establishes that a framework called ADAPT-MS transfers multi-modal satisfaction prediction models across three major MOOC platforms by freezing an LLM encoder for review text, processing behavioral traces with an MLP, aligning representations through domain-adversarial training with gradient reversal, correcting platform rating biases via a latent-variable calibration layer, and using gated fusion to handle missing modalities. On the collected multi-platform dataset this yields target-platform RMSE of 0.66 with no labeled target samples and 0.60 with 1000 labeled samples, surpassing naive pooling, uncalibrated adversarial baselines, and full fine-tuning. Ablation experiments show a

What carries the argument

ADAPT-MS, a framework that aligns multi-modal representations through domain-adversarial training with gradient reversal and corrects platform-specific rating norms with a latent-variable calibration layer.
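The gradient-reversal mechanism named here can be illustrated with a toy scalar model. This is a minimal sketch, not the paper's implementation: the one-weight feature extractor `w`, the one-weight domain classifier `v`, and the reversal strength `lam` are all hypothetical names chosen for the illustration. The layer acts as the identity on the forward pass and multiplies the incoming gradient by `-lam` on the backward pass, so the classifier learns to tell platforms apart while the feature extractor is pushed to make them indistinguishable.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def domain_loss_grads(w: float, v: float, x: float, d: int, lam: float):
    """One sample through: feature f = w*x -> reversal layer -> domain logit v*f.

    Returns (loss, grad_v, grad_w_plain, grad_w_reversed).
    All parameters are toy scalars (an assumption made for this sketch).
    """
    f = w * x                          # toy feature extractor
    p = sigmoid(v * f)                 # domain classifier's P(domain = 1)
    loss = -(d * math.log(p) + (1 - d) * math.log(1.0 - p))  # binary cross-entropy
    dlogit = p - d                     # dLoss/dlogit for sigmoid + cross-entropy
    grad_v = dlogit * f                # classifier gradient is left untouched
    grad_f = dlogit * v                # gradient arriving at the reversal layer
    grad_w_plain = grad_f * x          # what w would receive without reversal
    grad_w_reversed = -lam * grad_f * x  # identity forward, times -lam backward
    return loss, grad_v, grad_w_plain, grad_w_reversed


if __name__ == "__main__":
    w, v, x, d, lam, lr = 0.5, 1.0, 2.0, 1, 1.0, 0.1
    loss, gv, g_plain, g_rev = domain_loss_grads(w, v, x, d, lam)
    # A descent step on the reversed gradient is an ascent step on the domain loss.
    w_new = w - lr * g_rev
    loss_new = domain_loss_grads(w_new, v, x, d, lam)[0]
    print("domain loss before:", loss, "after feature step:", loss_new)
```

In a real framework the same effect is usually obtained with a custom autograd operation rather than hand-written gradients; the scalar version only makes the sign flip visible.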

Load-bearing premise

The assumption that domain-adversarial training with gradient reversal plus a latent-variable calibration layer can sufficiently align representations and correct for differences in review style, learner population, behavioral logging schemas, and platform-specific rating norms even when target-platform labels are absent or scarce.
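Why label scarcity matters for this premise can be seen in a deliberately simple stand-in for a calibration layer. The sketch below assumes a single scalar rating offset per platform with a zero-mean Gaussian prior and Gaussian rating noise; the function name, `noise_var`, and `prior_var` are hypothetical, and the paper's actual layer is not specified on this page. Under those assumptions the posterior-mean offset is the sum of target residuals divided by `n + noise_var / prior_var`.

```python
def map_platform_offset(residuals, noise_var=1.0, prior_var=1.0):
    """Posterior-mean rating offset for one platform.

    residuals: observed rating minus shared-model prediction, one value per
               labeled target sample (may be empty).
    Model (an assumption for this sketch): residual = offset + noise, with
    offset ~ N(0, prior_var) and noise ~ N(0, noise_var).
    """
    if noise_var <= 0 or prior_var <= 0:
        raise ValueError("variances must be positive")
    n = len(residuals)
    return sum(residuals) / (n + noise_var / prior_var)


if __name__ == "__main__":
    print(map_platform_offset([]))            # no target labels: estimate stays at the prior mean
    print(map_platform_offset([0.4] * 4))     # a few labels: shrunk toward zero
    print(map_platform_offset([0.4] * 1000))  # many labels: close to the raw mean residual
```

With an empty residual list the estimate cannot move from the prior mean, which is why any unsupervised correction has to draw its signal from somewhere other than target ratings, for example from the aligned features.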

What would settle it

An experiment on the same three platforms in which removing the latent-variable calibration layer leaves unsupervised target RMSE unchanged or lower than the full model would show that calibration is not required to correct platform biases.
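The decision rule in this test reduces to comparing two RMSE values on the same target samples. A minimal sketch follows; the helper names and the optional tolerance `tol` are assumptions introduced for illustration, and the prediction lists would come from the full and the calibration-free model respectively.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def calibration_is_needed(y_true, pred_full, pred_no_calib, tol=0.0):
    """True if removing calibration raises target RMSE by more than tol."""
    return rmse(y_true, pred_no_calib) - rmse(y_true, pred_full) > tol
```

A single comparison like this is only as reliable as the run behind it, so in practice the check would be repeated over several seeds before drawing a conclusion.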

Original abstract

Learner satisfaction prediction from MOOC reviews and behavioral logs is valuable for course quality improvement and platform operations. In practice, models trained on one platform degrade significantly when deployed on another due to domain shift in review style, learner population, behavioral logging schemas, and platform-specific rating norms. We study cross-platform domain adaptation for multi-modal MOOC satisfaction prediction under limited or absent target-platform labels. We propose ADAPT-MS, a platform-adaptive framework that (i) encodes review text with a frozen LLM encoder and behavioral traces with a canonical-vocabulary MLP, (ii) aligns cross-platform representations via domain-adversarial training with gradient reversal, (iii) corrects platform-specific rating bias through a latent-variable calibration layer, and (iv) handles missing behavioral modalities via gated fusion with modality dropout. Experiments on a multi-platform MOOC dataset spanning three major platforms demonstrate that ADAPT-MS achieves target-platform RMSE of 0.66 in the unsupervised setting (zero labeled target samples) and 0.60 with 1000 labeled target samples, outperforming strong baselines including naive pooling, domain-adversarial alignment without calibration, and full fine-tuning. Ablation studies confirm the independent contribution of each component, and few-shot adaptation curves demonstrate stable improvement even with as few as 50 labeled target samples.
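Component (iv) of the abstract, gated fusion with modality dropout, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' architecture: a single scalar gate computed from the concatenated inputs, equal-length text and behavior vectors, and a text-only fallback when behavior is missing are all choices made here, and the names `gate_w`, `gate_b`, and `p_drop` are hypothetical.

```python
import math
import random


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def gated_fusion(text_vec, behav_vec, gate_w, gate_b):
    """Blend text and behavior features with a learned scalar gate.

    behav_vec=None marks a missing behavioral modality; the text vector is
    then returned unchanged (fallback chosen for this sketch).
    """
    if behav_vec is None:
        return list(text_vec)
    if len(text_vec) != len(behav_vec):
        raise ValueError("text and behavior vectors must have equal length")
    concat = list(text_vec) + list(behav_vec)
    if len(gate_w) != len(concat):
        raise ValueError("gate_w must match the concatenated input length")
    g = sigmoid(sum(w * c for w, c in zip(gate_w, concat)) + gate_b)
    return [g * t + (1.0 - g) * b for t, b in zip(text_vec, behav_vec)]


def modality_dropout(behav_vec, p_drop, rng):
    """Training-time only: hide the behavioral vector with probability p_drop."""
    return None if rng.random() < p_drop else behav_vec


if __name__ == "__main__":
    rng = random.Random(0)
    text, behav = [1.0, 2.0], [3.0, 4.0]
    maybe_behav = modality_dropout(behav, p_drop=0.3, rng=rng)
    print(gated_fusion(text, maybe_behav, gate_w=[0.0] * 4, gate_b=0.0))
```

Randomly hiding the behavioral input during training exposes the model to the same missing-modality condition it will meet on platforms whose logs lack those fields.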

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

ADAPT-MS gives a usable pipeline for cross-platform MOOC satisfaction prediction with reported gains, but the unsupervised calibration step is under-specified and the results lack basic verification details.


The paper shows a concrete way to adapt multi-modal satisfaction models across MOOC platforms when target labels are scarce or absent. It freezes an LLM for review text, uses an MLP on behavioral logs, adds domain-adversarial alignment with gradient reversal, inserts a latent calibration layer for rating bias, and applies gated fusion for missing modalities. The headline numbers are 0.66 RMSE unsupervised and 0.60 with 1000 target labels on a three-platform dataset, both beating the listed baselines. The paper also reports ablations, and few-shot curves that look stable down to 50 labels.

Referee Report

2 major / 1 minor

Summary. The paper introduces ADAPT-MS, a framework for cross-platform domain adaptation in multi-modal MOOC learner satisfaction prediction. It encodes review text via a frozen LLM and behavioral logs via an MLP, aligns representations with domain-adversarial training and gradient reversal, applies a latent-variable calibration layer to correct platform-specific rating bias, and uses gated fusion with modality dropout for missing data. On a three-platform MOOC dataset, it reports target-platform RMSE of 0.66 in the fully unsupervised setting and 0.60 with 1000 labeled target samples, outperforming baselines such as naive pooling, domain-adversarial alignment without calibration, and full fine-tuning; ablations and few-shot curves are also presented.

Significance. If the central empirical claims hold after clarification, the work would be significant for practical MOOC operations, where models must transfer across platforms despite shifts in review style, learner demographics, logging schemas, and rating norms. The empirical framework with explicit ablations and few-shot adaptation curves provides concrete evidence of component contributions and is a strength.

major comments (2)

[Abstract] Abstract and method description: the latent-variable calibration layer is presented as correcting platform-specific rating bias in the unsupervised regime, yet no equations, prior, training objective, or identifiability argument are supplied for how the layer recovers the required shift from source labels and adversarially aligned features alone. This is load-bearing for the headline unsupervised RMSE of 0.66, as an orthogonal label-space bias would leave the layer without signal.
[Experiments] Experiments section: the reported RMSE values, outperformance margins, and ablation results are given without dataset sizes, statistical significance tests, error bars, or implementation details (e.g., hyper-parameters, exact baseline configurations). These omissions prevent verification of the central performance claims and undermine reproducibility.

minor comments (1)

[Abstract] The abstract could more explicitly state the total number of platforms, courses, and reviews in the multi-platform dataset to contextualize the scale of the evaluation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving technical clarity and reproducibility, and we address each point below with specific plans for revision.


Referee: [Abstract] Abstract and method description: the latent-variable calibration layer is presented as correcting platform-specific rating bias in the unsupervised regime, yet no equations, prior, training objective, or identifiability argument are supplied for how the layer recovers the required shift from source labels and adversarially aligned features alone. This is load-bearing for the headline unsupervised RMSE of 0.66, as an orthogonal label-space bias would leave the layer without signal.

Authors: We agree that the abstract and method description lack the necessary mathematical detail on the latent-variable calibration layer. In the revised manuscript we will expand the method section to include the explicit equations for the calibration layer (modeling platform bias as a latent variable with a Gaussian prior conditioned on aligned features), the full training objective combining the bias-correction term with the domain-adversarial loss, and a discussion of identifiability. The argument rests on the domain-adversarial alignment producing comparable feature distributions, allowing source labels to supervise the shared mapping while the latent variable absorbs platform-specific shifts; we will also add an explicit note on the assumption that rating bias is not fully orthogonal to the aligned representations and report supporting ablation evidence. revision: yes
Referee: [Experiments] Experiments section: the reported RMSE values, outperformance margins, and ablation results are given without dataset sizes, statistical significance tests, error bars, or implementation details (e.g., hyper-parameters, exact baseline configurations). These omissions prevent verification of the central performance claims and undermine reproducibility.

Authors: We acknowledge that the experiments section is missing several elements required for full verification and reproducibility. In the revision we will add the exact dataset sizes per platform, results of statistical significance tests (paired t-tests with p-values), error bars from five independent runs with different seeds, and complete hyper-parameter tables together with precise baseline implementations. These details will be integrated into the main experiments section, with any lengthy tables moved to the appendix. revision: yes
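The paired comparison over five seeds promised in this response could be computed along the following lines. This is a sketch using only the standard library; the RMSE lists are made-up placeholders, not results from the paper, and the function name is hypothetical. The statistic is compared against 2.776, the two-sided 5% critical value of Student's t with 4 degrees of freedom.

```python
import math
import statistics

T_CRIT_DF4_TWO_SIDED_05 = 2.776  # Student-t critical value, df = 4, alpha = 0.05


def paired_t(scores_a, scores_b):
    """Paired t statistic for per-seed scores of two systems.

    Returns (mean difference, std of differences, t statistic).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)      # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_d / (sd_d / math.sqrt(len(diffs)))
    return mean_d, sd_d, t_stat


if __name__ == "__main__":
    # Placeholder numbers for illustration only, NOT results from the paper.
    placeholder_baseline = [0.71, 0.70, 0.72, 0.69, 0.71]
    placeholder_full = [0.66, 0.67, 0.65, 0.66, 0.66]
    mean_d, sd_d, t_stat = paired_t(placeholder_baseline, placeholder_full)
    print("mean RMSE gap:", mean_d, "+/-", sd_d)
    print("significant at 5%:", abs(t_stat) > T_CRIT_DF4_TWO_SIDED_05)
```

An exact p-value needs the t-distribution CDF, which the standard library does not provide; `scipy.stats.ttest_rel` returns both the statistic and the p-value when SciPy is available.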

Circularity Check

0 steps flagged

No circularity: empirical framework evaluated on external data


The paper describes ADAPT-MS as an empirical pipeline combining frozen LLM text encoding, MLP behavioral encoding, standard domain-adversarial training with gradient reversal, a latent calibration layer, and gated fusion. All reported results (RMSE 0.66 unsupervised, 0.60 few-shot) are obtained by training and evaluating on a held-out multi-platform MOOC dataset against external baselines and ablations. No equations, self-definitions, or self-citations reduce the performance metrics to quantities defined by the model's own fitted parameters. The calibration layer is presented as a trainable component whose effectiveness is measured experimentally rather than assumed by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard domain-adaptation assumptions and modality-handling heuristics rather than new free parameters or invented entities.

axioms (2)

domain assumption: Domain shift exists across platforms in review style, learner population, behavioral logging schemas, and rating norms.
Invoked in the opening paragraph to motivate the need for adaptation.
domain assumption: A frozen LLM encoder and canonical-vocabulary MLP produce transferable features when aligned adversarially.
Assumed in the encoding and alignment steps of the proposed method.

pith-pipeline@v0.9.0 · 5534 in / 1353 out tokens · 51806 ms · 2026-05-10T13:30:16.499774+00:00 · methodology


Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

[1] J. Reich and J. Ruiperez-Valiente, “The MOOC pivot,” Science, vol. 363, no. 6423, pp. 130–131, 2019.
[2] D. Shah, “By the numbers: MOOCs in 2020,” Class Central Report, 2020.
[3] H. Chi, C. Qi, S. Wang, and Y. Ma, “Active learning for graphs with noisy structures,” in Proceedings of the 2024 SIAM International Conference on Data Mining (SDM). SIAM, 2024, pp. 262–270.
[4] C. Qi and S. Liu, “Evaluating on-line courses via reviews mining,” IEEE Access, vol. 9, pp. 35439–35451, 2021.
[5] K. F. Hew, X. Hu, C. Qiao, and Y. Tang, “What predicts student satisfaction with MOOCs: A gradient boosting trees supervised machine learning and sentiment analysis approach,” Computers & Education, vol. 145, p. 103724, 2020.
[6] O. Almatrafi and A. Johri, “Systematic review of discussion forums in massive open online courses (MOOCs),” IEEE Transactions on Learning Technologies, vol. 12, no. 3, pp. 413–428, 2019.
[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in NAACL, 2019.
[8] Y. Liu, M. Ott, N. Goyal et al., “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
[9] A. Onan, “Sentiment analysis on massive open online course evaluations: A text mining and deep learning approach,” Computer Applications in Engineering Education, vol. 29, no. 3, pp. 572–589, 2020.
[10] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[11] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011.
[12] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, Domain-Adversarial Training of Neural Networks. Springer, 2017, pp. 189–209.
[13] Z. Kastrati, “Aspect-based opinion mining of students’ reviews,” International Conference on Computing, 2020.
[14] K. F. Hew, “Promoting engagement in online courses: What strategies can we learn from three highly rated MOOCs,” British Journal of Educational Technology, vol. 47, no. 2, pp. 320–341, 2016.
[15] X. Peng et al., “Investigating learners’ behaviors in MOOCs,” Computers & Education, 2020.
[16] M. Wang and W. Deng, “Deep visual domain adaptation: A survey,” Neurocomputing, vol. 312, pp. 135–153, 2018.
[17] Z. C. Lipton, Y.-X. Wang, and A. Smola, “Detecting and correcting for label shift with black box predictors,” in ICML, 2018, pp. 3122–3130.
[18] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in Advances in Large Margin Classifiers, 1999, pp. 61–74.
[19] T. Baltrušaitis, C. Ahuja, and L.-P. Morency, “Multimodal machine learning: A survey and taxonomy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 423–443, 2019.
[20] M. Ma, J. Ren, L. Zhao, S. Tulyakov, C. Wu, and X. Peng, “SMIL: Multimodal learning with severely missing modality,” in AAAI, 2021, pp. 2302–2310.
[21] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,” vol. 79, no. 1–2, 2010, pp. 151–175.
[22] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2015.
