Personalized Student Stress Prediction with Deep Multitask Network

Abhinav Shaw; Iman Deznaby; Madalina Fiterau; Natcha Simsiri; Tauhidur Rahaman

arxiv: 1906.11356 · v1 · pith:HAXQZS5Nnew · submitted 2019-06-26 · 💻 cs.LG · cs.CY· stat.ML

Personalized Student Stress Prediction with Deep Multitask Network

Abhinav Shaw , Natcha Simsiri , Iman Deznaby , Madalina Fiterau , Tauhidur Rahaman This is my paper

Pith reviewed 2026-05-25 15:28 UTC · model grok-4.3

classification 💻 cs.LG cs.CYstat.ML

keywords stress predictionmultitask learningautoencodersmobile sensorsStudentLife datasetpersonalized modelingwearable devicesdeep learning

0 comments

The pith

A deep multitask network with autoencoders predicts student stress from mobile sensor data and covariates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a platform for personalized prediction of students' stress levels using physiological data from wearables and mobile sensors. It combines auto-encoders to process sensor sequences with multitask learning that also incorporates high-level covariates. The resulting model is evaluated on the StudentLife dataset and reported to improve F1 score by 45.6 percent over prior state-of-the-art methods. This setup aims to support clinical uses for monitoring mental states such as mood and stress. A sympathetic reader would see value in moving from raw passive data to actionable behavioral state forecasts without requiring active user input.

Core claim

The authors present a deep multitask network that uses auto-encoders to handle sequences of passive sensor data together with high-level covariates, enabling personalized prediction of stress levels; on the StudentLife dataset this yields a 45.6 percent improvement in F1 score relative to previous methods.

What carries the argument

Deep multitask network that integrates auto-encoders for sensor sequences and shared learning across tasks including stress prediction and covariate modeling.

If this is right

Stress level prediction can be performed from passive mobile sensor streams without requiring explicit user reports.
The same architecture can be applied to other behavioral states such as mood.
Personalized models become feasible by combining sequence data with subject-specific covariates.
Clinical monitoring applications gain a pathway from wearable data to mental-state estimates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the performance gain holds under matched experimental conditions, similar multitask auto-encoder designs could be tested on other longitudinal sensor datasets for health outcomes.
Deployment on consumer wearables would require checking whether the model maintains accuracy when sensor streams are shorter or noisier than those in the StudentLife collection.
The approach implicitly treats stress as a latent state recoverable from both low-level signals and high-level context; this framing could be examined against purely unsupervised representations of the same data.

Load-bearing premise

The StudentLife dataset supplies reliable ground-truth stress labels and the reported performance gain is attributable to the proposed architecture rather than unstated differences in preprocessing, hyperparameter search, or baseline re-implementation.

What would settle it

Reproduce the exact preprocessing pipeline, hyperparameter search, and baseline implementations on the StudentLife dataset and check whether the 45.6 percent F1 improvement still appears.

Figures

Figures reproduced from arXiv: 1906.11356 by Abhinav Shaw, Iman Deznaby, Madalina Fiterau, Natcha Simsiri, Tauhidur Rahaman.

**Figure 1.** Figure 1: Cross-personal Activity LSTM Multitask Auto-encoder Network (CALM-Net). 3.2.2. LSTM The state-of-the-art model which utilizes featured engineered aggregates doesn’t model the time-series. This leads to an inability to use the information in granular passive sensing data which is ubiquitous in these kinds of datasets. To model the temporal patterns of features like Activity, Audio and Conversation we put t… view at source ↗

read the original abstract

With the growing popularity of wearable devices, the ability to utilize physiological data collected from these devices to predict the wearer's mental state such as mood and stress suggests great clinical applications, yet such a task is extremely challenging. In this paper, we present a general platform for personalized predictive modeling of behavioural states like students' level of stress. Through the use of Auto-encoders and Multitask learning we extend the prediction of stress to both sequences of passive sensor data and high-level covariates. Our model outperforms the state-of-the-art in the prediction of stress level from mobile sensor data, obtaining a 45.6 % improvement in F1 score on the StudentLife dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The 45.6% F1 gain on StudentLife is the headline but looks difficult to credit to the multitask autoencoder without matching preprocessing and baseline details.

read the letter

The paper's main contribution is a multitask network that uses autoencoders on passive sensor sequences plus high-level covariates to predict student stress levels. It reports a 45.6% F1 improvement over prior work on the StudentLife dataset and frames the setup as a general platform for personalized behavioral modeling from wearables. That combination is a straightforward extension of existing autoencoder and multitask ideas rather than a new theoretical framework, but it is a reasonable practical step for this sensor task. The authors correctly note the clinical angle for mental-state monitoring and show how the model handles both sequential data and covariates in one architecture. Credit is due for targeting a real dataset and for trying to move beyond single-task prediction. The soft spot is the evaluation. StudentLife stress labels come from sparse EMA surveys, and decisions on binarization thresholds, missing-value handling, temporal alignment, and sensor feature extraction are not standardized across papers. If the baselines were not re-implemented inside the exact same pipeline and cross-validation scheme, the reported delta cannot be cleanly attributed to the proposed network. The abstract supplies no architecture diagram, training hyperparameters, ablation results, or statistical tests, so the central claim stays hard to assess even after the full text. Minor issues include the usual lack of external validation on a second dataset. This work is aimed at the mobile-health and affective-computing crowd who already use StudentLife or similar sensor streams. A reader already running multitask models on time-series health data could pick up the specific weighting and autoencoder choices. It is coherent enough on its own terms to deserve a serious referee who can examine the full experimental harness and decide whether the gain survives re-implementation. I would send it out for review rather than desk-reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a deep multitask network that combines autoencoders with multitask learning to predict students' stress levels from sequences of passive mobile sensor data together with high-level covariates. It evaluates the approach on the StudentLife dataset and claims a 45.6% improvement in F1 score over prior state-of-the-art methods.

Significance. If the reported F1 gain is shown to be caused by the proposed architecture rather than differences in preprocessing or baseline implementation, the work would provide a concrete advance in personalized behavioral-state modeling from wearable sensor streams and could support downstream clinical applications.

major comments (2)

[§4] §4 (Experiments): the 45.6% F1 improvement is presented without any description of the exact preprocessing pipeline applied to accelerometer/GPS features, the binarization threshold or missing-value policy used for EMA stress labels, the train/test splits, or whether the cited baselines were re-run inside the same harness; without these controls the performance delta cannot be attributed to the multitask autoencoder.
[§3] §3 (Proposed Method): the multitask objective and autoencoder architecture are described at a high level only; network depth/width, task-weighting coefficients, and the precise loss formulation are left unspecified even though they are listed among the free parameters, preventing reproduction or ablation of the central modeling claim.

minor comments (2)

[Figure 2] Figure 2 and Table 1 would benefit from explicit axis labels and a caption that states the exact evaluation metric and number of folds used.
[Abstract] The abstract and introduction use the phrase 'state-of-the-art' without citing the specific prior works being compared; a numbered reference list entry should be added.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments correctly identify gaps in experimental controls and architectural specification that limit reproducibility. We will revise the manuscript to address both points fully.

read point-by-point responses

Referee: [§4] §4 (Experiments): the 45.6% F1 improvement is presented without any description of the exact preprocessing pipeline applied to accelerometer/GPS features, the binarization threshold or missing-value policy used for EMA stress labels, the train/test splits, or whether the cited baselines were re-run inside the same harness; without these controls the performance delta cannot be attributed to the multitask autoencoder.

Authors: We agree that these controls are required to attribute the reported gain to the proposed model. In the revised version we will add a dedicated subsection in §4 that fully specifies: (i) the accelerometer and GPS feature extraction and normalization steps, (ii) the exact EMA binarization threshold and missing-value handling, (iii) the precise train/test split protocol (including any subject-wise or temporal partitioning), and (iv) confirmation that all baselines were re-implemented inside the identical preprocessing and evaluation harness. These additions will make the performance comparison unambiguous. revision: yes
Referee: [§3] §3 (Proposed Method): the multitask objective and autoencoder architecture are described at a high level only; network depth/width, task-weighting coefficients, and the precise loss formulation are left unspecified even though they are listed among the free parameters, preventing reproduction or ablation of the central modeling claim.

Authors: We accept that the current description in §3 is insufficient for reproduction. The revised manuscript will expand §3 with the missing implementation details: exact layer counts and widths for the autoencoder and task-specific heads, the numerical values of the task-weighting coefficients, and the complete mathematical formulation of the joint loss (including any regularization terms). We will also add a short hyper-parameter table so that the architecture can be reproduced exactly and ablated. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ML model with no derivation chain or self-referential predictions

full rationale

The paper presents a deep multitask autoencoder architecture for stress prediction on the StudentLife dataset and reports an empirical F1 improvement. No equations, first-principles derivations, uniqueness theorems, or fitted parameters renamed as predictions appear in the provided text. The central claim is a performance comparison, not a mathematical reduction that collapses to its own inputs by construction. Self-citations, if present, are not load-bearing for any derivation. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. Deep networks of this type typically contain dozens of free hyperparameters whose values are chosen or tuned on the target data.

free parameters (2)

network depth and width
Number of layers and hidden units in the multitask autoencoder are chosen to fit the data.
task weighting coefficients
Relative importance of the stress prediction task versus auxiliary tasks is set during training.

axioms (1)

domain assumption StudentLife stress labels constitute valid ground truth
The paper treats self-reported or survey-based stress scores as reliable targets for supervised learning.

pith-pipeline@v0.9.0 · 5657 in / 1283 out tokens · 41548 ms · 2026-05-25T15:28:23.814693+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1]

A global measure of perceived stress

Cohen, S., Kamarck, T., and Mermelstein, R. A global measure of perceived stress. Journal of Health and Social Behavior, 24: 0 386--396, 1983

work page 1983
[2]

Dickerson, S. S. and Kemenyr, M. E. Acute stressors and cortisol responses: a theoretical integration and synthesis of laboratory research. Psychological bulletin, 130: 0 355–91, 2004

work page 2004
[3]

and Schmidhuber, J

Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural computation, 9 0 (8): 0 1735--1780, 1997

work page 1997
[4]

O., Taylor, S., Sano, A., and Picard, R

Jaques, N., Rudovic, O. O., Taylor, S., Sano, A., and Picard, R. W. Predicting tomorrow's mood, health, and stress level using personalized multitask learning and domain adaptation. In Proceedings of the 1st IJCAI Workshop on Artificial Intelligence in Affective Computing (AffComp 2017), Melbourne, Australia, August 20, 2017. , pp.\ 17--33, 2017. URL http...

work page 2017
[5]

Multi-task and multi-view learning of user state

Kandemir, M., Vetek, A., Gönen, M., Klami, A., and Kaski, S. Multi-task and multi-view learning of user state. Neurocomputing, 139: 0 97–106, 09 2014. doi:10.1016/j.neucom.2014.02.057

work page doi:10.1016/j.neucom.2014.02.057 2014
[6]

Disasters and the heart: a review of the effects of earthquake-induced stress on cardiovascular disease

Kario, K., McEwen, B., and Pickering, T. Disasters and the heart: a review of the effects of earthquake-induced stress on cardiovascular disease. Hypertension Res, 26: 0 355–367, 2003

work page 2003
[7]

Effects of stress on the immune system

Khansari, D., Murgo, A., and Faith, R. Effects of stress on the immune system. Immunol Today, 11: 0 170–175, 1990

work page 1990
[8]

A review of unsupervised feature learning and deep learning for time-series modeling

Längkvist, M., Karlsson, L., and Loutfi, A. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters, 42: 0 11 -- 24, 2014. ISSN 0167-8655. doi:https://doi.org/10.1016/j.patrec.2014.01.008. URL http://www.sciencedirect.com/science/article/pii/S0167865514000221

work page doi:10.1016/j.patrec.2014.01.008 2014
[9]

Towards deep learning models for psychological state prediction using smartphone data: Challenges and opportunities

Mikelsons, G., Smith, M., Mehrotra, A., and Musolesi, M. Towards deep learning models for psychological state prediction using smartphone data: Challenges and opportunities. 31st Conference on Neural Information Processing Systems (NIPS) 2017, 2, 2018

work page 2017
[10]

Towards accurate non-intrusive recollection of stress levels using mobile sensing and contextual recall

Rahman, T., Zhang, M., Voida, S., and Choudhury, T. Towards accurate non-intrusive recollection of stress levels using mobile sensing and contextual recall. In Proceedings of the 8th International Conference on Pervasive Computing Technologies for Healthcare, pp.\ 166--169. ICST (Institute for Computer Sciences, Social-Informatics and …, 2014

work page 2014
[11]

Impact of psychological factors on the pathogenesis of cardiovascular disease and implications for therapy

Rozanski, A., Blumenthal, J., and Kaplan, J. Impact of psychological factors on the pathogenesis of cardiovascular disease and implications for therapy. Immunol Today, 99: 0 2192–2217, 1999

work page 1999
[12]

and Picard, R

Sano, A. and Picard, R. W. Stress recognition using wearable sensors and mobile phones. Humaine Association Conference on Affective Computing and Intelligent Interaction, 24: 0 386--396, 2013

work page 2013
[13]

J., Yu, A

Sano, A., Phillips, A. J., Yu, A. Y., and here full, F. Recognizing academic performance, sleep quality, stress level and mental health using personality traits, wearable sensors and mobile phones. Draft for Body Sensor Networks 2015, 24: 0 386--396, 2015

work page 2015
[14]

L., G.Tröste, and Ehler, U

Setz, C., Arnrich, B., Schumm, J., Marca, R. L., G.Tröste, and Ehler, U. Discriminating stress from cognitive load using a wearable eda device. A publication of the IEEE Engineering in Medicine and Biology Society, 14: 0 410–7, 2010

work page 2010
[15]

Effects of stress throughout the lifespan on the brain, behaviour and cognition

SJ, L., BS, M., MR, G., and C, H. Effects of stress throughout the lifespan on the brain, behaviour and cognition. Nat Rev Neurosci, 10: 0 434–445, 2009

work page 2009
[16]

Stults-Kolehmainen, M. A. and Sinha, R. The effects of stress on physical activity and exercise. Sports Med., 44: 0 81–121, 2014

work page 2014
[17]

T., Barnes, M

Trokel, M. T., Barnes, M. D., and Egget, D. L. Health-related variables and academic performance among first-year college students: Implications for sleep and other behaviours. Journal of American College health, 49: 0 125--131, 2000

work page 2000
[18]

G., van Doornen, L

Vrijkotte, T. G., van Doornen, L. J., and de Geus, E. J. Effects of work stress on ambulatory blood pressure, heart rate and heart rate variability. Hypertension, 35: 0 880–886, 2000

work page 2000
[19]

Wang, R., Chen, F., Chen, Z., Li, T., Harari, G., Tignor, S., Zhou, X., Ben-Zeev, D., , and Campbell, A. T. Studentlife: Assessing mental health, academic performance and behavioral trends of college students using smartphones. UbiComp, 2014

work page 2014
[20]

F., Kelley, W

Wang, R., Wang, W., Dasilva, A., Huckins, J. F., Kelley, W. M., Heatherton, T. F., and Chambell, A. T. Tracking depression dynamics in college students using mobile phone and wearable sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies,, 2, 2018

work page 2018
[21]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[1] [1]

A global measure of perceived stress

Cohen, S., Kamarck, T., and Mermelstein, R. A global measure of perceived stress. Journal of Health and Social Behavior, 24: 0 386--396, 1983

work page 1983

[2] [2]

Dickerson, S. S. and Kemenyr, M. E. Acute stressors and cortisol responses: a theoretical integration and synthesis of laboratory research. Psychological bulletin, 130: 0 355–91, 2004

work page 2004

[3] [3]

and Schmidhuber, J

Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural computation, 9 0 (8): 0 1735--1780, 1997

work page 1997

[4] [4]

O., Taylor, S., Sano, A., and Picard, R

Jaques, N., Rudovic, O. O., Taylor, S., Sano, A., and Picard, R. W. Predicting tomorrow's mood, health, and stress level using personalized multitask learning and domain adaptation. In Proceedings of the 1st IJCAI Workshop on Artificial Intelligence in Affective Computing (AffComp 2017), Melbourne, Australia, August 20, 2017. , pp.\ 17--33, 2017. URL http...

work page 2017

[5] [5]

Multi-task and multi-view learning of user state

Kandemir, M., Vetek, A., Gönen, M., Klami, A., and Kaski, S. Multi-task and multi-view learning of user state. Neurocomputing, 139: 0 97–106, 09 2014. doi:10.1016/j.neucom.2014.02.057

work page doi:10.1016/j.neucom.2014.02.057 2014

[6] [6]

Disasters and the heart: a review of the effects of earthquake-induced stress on cardiovascular disease

Kario, K., McEwen, B., and Pickering, T. Disasters and the heart: a review of the effects of earthquake-induced stress on cardiovascular disease. Hypertension Res, 26: 0 355–367, 2003

work page 2003

[7] [7]

Effects of stress on the immune system

Khansari, D., Murgo, A., and Faith, R. Effects of stress on the immune system. Immunol Today, 11: 0 170–175, 1990

work page 1990

[8] [8]

A review of unsupervised feature learning and deep learning for time-series modeling

Längkvist, M., Karlsson, L., and Loutfi, A. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters, 42: 0 11 -- 24, 2014. ISSN 0167-8655. doi:https://doi.org/10.1016/j.patrec.2014.01.008. URL http://www.sciencedirect.com/science/article/pii/S0167865514000221

work page doi:10.1016/j.patrec.2014.01.008 2014

[9] [9]

Towards deep learning models for psychological state prediction using smartphone data: Challenges and opportunities

Mikelsons, G., Smith, M., Mehrotra, A., and Musolesi, M. Towards deep learning models for psychological state prediction using smartphone data: Challenges and opportunities. 31st Conference on Neural Information Processing Systems (NIPS) 2017, 2, 2018

work page 2017

[10] [10]

Towards accurate non-intrusive recollection of stress levels using mobile sensing and contextual recall

Rahman, T., Zhang, M., Voida, S., and Choudhury, T. Towards accurate non-intrusive recollection of stress levels using mobile sensing and contextual recall. In Proceedings of the 8th International Conference on Pervasive Computing Technologies for Healthcare, pp.\ 166--169. ICST (Institute for Computer Sciences, Social-Informatics and …, 2014

work page 2014

[11] [11]

Impact of psychological factors on the pathogenesis of cardiovascular disease and implications for therapy

Rozanski, A., Blumenthal, J., and Kaplan, J. Impact of psychological factors on the pathogenesis of cardiovascular disease and implications for therapy. Immunol Today, 99: 0 2192–2217, 1999

work page 1999

[12] [12]

and Picard, R

Sano, A. and Picard, R. W. Stress recognition using wearable sensors and mobile phones. Humaine Association Conference on Affective Computing and Intelligent Interaction, 24: 0 386--396, 2013

work page 2013

[13] [13]

J., Yu, A

Sano, A., Phillips, A. J., Yu, A. Y., and here full, F. Recognizing academic performance, sleep quality, stress level and mental health using personality traits, wearable sensors and mobile phones. Draft for Body Sensor Networks 2015, 24: 0 386--396, 2015

work page 2015

[14] [14]

L., G.Tröste, and Ehler, U

Setz, C., Arnrich, B., Schumm, J., Marca, R. L., G.Tröste, and Ehler, U. Discriminating stress from cognitive load using a wearable eda device. A publication of the IEEE Engineering in Medicine and Biology Society, 14: 0 410–7, 2010

work page 2010

[15] [15]

Effects of stress throughout the lifespan on the brain, behaviour and cognition

SJ, L., BS, M., MR, G., and C, H. Effects of stress throughout the lifespan on the brain, behaviour and cognition. Nat Rev Neurosci, 10: 0 434–445, 2009

work page 2009

[16] [16]

Stults-Kolehmainen, M. A. and Sinha, R. The effects of stress on physical activity and exercise. Sports Med., 44: 0 81–121, 2014

work page 2014

[17] [17]

T., Barnes, M

Trokel, M. T., Barnes, M. D., and Egget, D. L. Health-related variables and academic performance among first-year college students: Implications for sleep and other behaviours. Journal of American College health, 49: 0 125--131, 2000

work page 2000

[18] [18]

G., van Doornen, L

Vrijkotte, T. G., van Doornen, L. J., and de Geus, E. J. Effects of work stress on ambulatory blood pressure, heart rate and heart rate variability. Hypertension, 35: 0 880–886, 2000

work page 2000

[19] [19]

Wang, R., Chen, F., Chen, Z., Li, T., Harari, G., Tignor, S., Zhou, X., Ben-Zeev, D., , and Campbell, A. T. Studentlife: Assessing mental health, academic performance and behavioral trends of college students using smartphones. UbiComp, 2014

work page 2014

[20] [20]

F., Kelley, W

Wang, R., Wang, W., Dasilva, A., Huckins, J. F., Kelley, W. M., Heatherton, T. F., and Chambell, A. T. Tracking depression dynamics in college students using mobile phone and wearable sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies,, 2, 2018

work page 2018

[21] [21]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page