Inter and Intra Document Attention for Depression Risk Assessment

Diego Maupomé, Marc Queudot, Marie-Jean Meurs

arxiv: 1907.00462 · v1 · pith:DD5KYOTTnew · submitted 2019-06-30 · 💻 cs.CL · cs.LG

Pith reviewed 2026-05-25 12:26 UTC · model grok-4.3

classification 💻 cs.CL cs.LG

keywords: depression risk assessment, social media analysis, attention mechanism, RNN classifiers, eRisk dataset, early detection, natural language processing, user sequence classification

The pith

An attention mechanism processes all user posts in parallel to prioritize important writings for depression risk classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper implements four RNN-based systems to classify social media users for depression risk using the eRisk 2018 dataset of sequential user writings. It tests multiple ways to aggregate predictions from individual posts and identifies the best performance from a model that reads all writings simultaneously while applying attention to emphasize the most relevant ones at each step. This setup differs from strict sequential processing by allowing flexible focus across a user's contributions. A sympathetic reader would care because it points to a concrete way of handling variable post sequences for early risk flagging.

Core claim

The best model reads through all writings of a user in parallel but uses an attention mechanism to prioritize the most important ones at each timestep, outperforming other RNN variants with different aggregation methods on the eRisk 2018 dataset.

What carries the argument

Inter and intra document attention mechanism that weighs importance of posts within and across a user's sequence.
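The inter-document part of this idea can be illustrated with a minimal sketch. It assumes each post has already been encoded by an RNN into one hidden-state vector per timestep, that posts are padded to a common length, and that posts are scored by a dot product with a learned query vector. The function names, the `query` parameter, and the dot-product scoring are illustrative assumptions; this page does not specify the paper's actual scoring function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_across_posts(hidden, query):
    """Weight posts against each other at every timestep.

    hidden[p][t] is the (hypothetical) RNN state vector of post p at
    timestep t; query is an assumed learned vector of the same size.
    Returns one context vector and one list of post weights per timestep.
    """
    n_posts = len(hidden)
    n_steps = len(hidden[0])
    dim = len(query)
    contexts, all_weights = [], []
    for t in range(n_steps):
        states = [hidden[p][t] for p in range(n_posts)]
        # Dot-product relevance score of each post at this timestep.
        scores = [sum(h * q for h, q in zip(state, query)) for state in states]
        weights = softmax(scores)
        # Context is the attention-weighted sum of the post states.
        context = [
            sum(w * state[d] for w, state in zip(weights, states))
            for d in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


# Three toy posts, two timesteps, two-dimensional states.
toy_hidden = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[3.0, 0.0], [0.0, 0.0]],
]
toy_contexts, toy_weights = attend_across_posts(toy_hidden, [1.0, 0.0])
```

In the toy input, the third post aligns most strongly with the query at the first timestep and so receives the largest weight there. At the second timestep all states are zero, so the weights fall back to uniform.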

If this is right

Parallel attention across posts yields higher classification accuracy than sequential RNN processing or simple aggregation.
The model can handle arbitrary numbers of user contributions by dynamically prioritizing key writings.
Attention-based aggregation improves upon methods that treat all posts equally or process them in fixed order.
The approach demonstrates that focusing on salient posts at each timestep enhances risk assessment performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same parallel attention structure could be tested on other social media signals such as anxiety indicators if labeled data becomes available.
Deployment would need safeguards for user privacy since the input is public but personal writing history.
The method might extend to real-time monitoring by updating attention weights as new posts arrive.
Similar inter-intra attention could apply to other sequence classification tasks like identifying misinformation spreaders.

Load-bearing premise

The eRisk 2018 dataset labels provide a valid and generalizable signal for depression risk that attention can reliably extract from post sequences.

What would settle it

Re-running the attention model on a fresh collection of social media users whose depression status was verified independently of the eRisk labels shows no accuracy gain over non-attention RNN baselines.
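One way such a comparison might be checked is a paired bootstrap over users, sketched below. Everything here is an assumption made for illustration: the function names, the use of plain accuracy (taken from the wording above, not from the shared task's official metrics), the 95% percentile interval, and the resample count.

```python
import random


def accuracy(preds, labels):
    """Fraction of users whose predicted label matches the reference label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def bootstrap_accuracy_gain(attn_preds, base_preds, labels,
                            n_resamples=1000, seed=0):
    """Paired bootstrap of (attention accuracy - baseline accuracy).

    Users are resampled with replacement, and both systems are scored on
    the same resample so that the comparison stays paired.
    Returns the observed gain and an approximate 95% percentile interval.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = accuracy(attn_preds, labels) - accuracy(base_preds, labels)
    gains = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        attn_acc = sum(attn_preds[i] == labels[i] for i in idx) / n
        base_acc = sum(base_preds[i] == labels[i] for i in idx) / n
        gains.append(attn_acc - base_acc)
    gains.sort()
    low = gains[int(0.025 * n_resamples)]
    high = gains[min(n_resamples - 1, int(0.975 * n_resamples))]
    return observed, (low, high)
```

Under these assumptions, an interval that comfortably contains zero on independently labeled users would be consistent with the "no accuracy gain" outcome described above.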

Original abstract

We take interest in the early assessment of risk for depression in social media users. We focus on the eRisk 2018 dataset, which represents users as a sequence of their written online contributions. We implement four RNN-based systems to classify the users. We explore several aggregation methods to combine predictions on individual posts. Our best model reads through all writings of a user in parallel but uses an attention mechanism to prioritize the most important ones at each timestep.
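The abstract does not name the aggregation methods it explores. As a rough illustration of what combining post-level predictions into one user-level score can look like, the sketch below shows three common choices: mean, max, and a softmax-weighted average. The function names and the `scores` input are hypothetical and are not taken from the paper.

```python
import math


def aggregate_mean(post_probs):
    """Treat every post equally."""
    return sum(post_probs) / len(post_probs)


def aggregate_max(post_probs):
    """Let the single most alarming post decide."""
    return max(post_probs)


def aggregate_weighted(post_probs, scores):
    """Softmax-weighted average; scores are assumed per-post relevance values."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return sum((e / total) * p for e, p in zip(exps, post_probs))


# Hypothetical per-post probabilities of the at-risk class for one user.
post_probs = [0.2, 0.4, 0.9]
user_mean = aggregate_mean(post_probs)
user_max = aggregate_max(post_probs)
user_weighted = aggregate_weighted(post_probs, [0.0, 0.0, 2.0])
```

With equal scores the weighted variant reduces to the mean. As one score dominates, it approaches the prediction of that single post, which is the behavior an attention-style aggregator is meant to provide.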

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Applies standard attention RNNs to eRisk 2018 but the abstract supplies zero results, baselines or validation, so the work is impossible to evaluate.

This paper applies RNNs with attention to the eRisk 2018 task of classifying social media users for depression risk. The authors test four RNN variants and several ways to aggregate post-level predictions, with the strongest version reading all posts in parallel and using attention to weight them at each step. That aggregation focus is a practical detail for these kinds of user timelines. Beyond that, nothing new appears in the abstract: the techniques are established, and the dataset is the standard shared-task benchmark. The paper does not claim a new framework or derive anything from first principles. The real problem is the complete absence of numbers. No accuracies, no baselines, no error bars, no comparison to prior eRisk entries. Without those, there is no way to tell whether the attention step improves anything or simply fits label noise. The stress-test point about the eRisk labels is fair; the abstract gives no analysis of whether attention weights align with known markers or whether the labels themselves are reliable enough for the claimed mechanism to be tested. This is narrow work aimed at people already running eRisk submissions. A broader reader gets little value. I would not bring it to a reading group or cite it. It does not deserve peer review until the full version shows actual results that can be checked against existing numbers.

Referee Report

2 major / 0 minor

Summary. The manuscript implements four RNN-based classifiers for early depression risk assessment on the eRisk 2018 dataset, where users are represented as sequences of social media posts. It examines multiple aggregation methods for combining per-post predictions and identifies a best model that processes all writings in parallel while applying an attention mechanism to prioritize the most important posts at each timestep.

Significance. If the empirical results hold, the work would contribute to NLP applications in mental health by showing how attention can manage long, variable-length user post sequences for risk classification, potentially informing scalable early-detection systems on social media platforms.

major comments (2)

[Abstract] Abstract: the description of the best model asserts superiority via attention but supplies no performance numbers, baselines, error bars, or validation details, preventing any determination of whether the data supports the effectiveness claim.
[Abstract] Abstract: the central claim that attention reliably prioritizes writings containing a true depression risk signal rests on the unexamined assumption that eRisk 2018 labels constitute a valid, low-noise proxy; no analysis of label quality, attention-weight correlation with lexical markers, or external clinical validation is described.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below.

Referee: [Abstract] Abstract: the description of the best model asserts superiority via attention but supplies no performance numbers, baselines, error bars, or validation details, preventing any determination of whether the data supports the effectiveness claim.

Authors: We agree that the abstract, due to its brevity, does not include quantitative results or validation details. The full manuscript reports performance metrics, baseline comparisons, and the experimental protocol (including how the eRisk 2018 data was split) in the experiments section. We will revise the abstract to include the key performance figures for the best model and a brief note on the evaluation setup.

Revision: yes

Referee: [Abstract] Abstract: the central claim that attention reliably prioritizes writings containing a true depression risk signal rests on the unexamined assumption that eRisk 2018 labels constitute a valid, low-noise proxy; no analysis of label quality, attention-weight correlation with lexical markers, or external clinical validation is described.

Authors: The abstract states that the model 'uses an attention mechanism to prioritize the most important ones,' without claiming clinical reliability or direct correlation to an underlying 'true' signal. The work treats the eRisk 2018 labels as the task definition, consistent with prior literature on this benchmark. No label-quality or external-validation analysis is present because it lies outside the paper's scope of evaluating attention-based aggregation methods. We will revise the abstract wording to ensure the claim is limited to improved classification performance on the given labels.

Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical ML application with no derivations or self-referential reductions

The paper describes four RNN-based classifiers applied to the eRisk 2018 dataset, including an attention-augmented model that processes user writings in parallel. No equations, parameter-fitting derivations, or uniqueness theorems are presented. Claims rest on standard attention mechanisms and aggregation methods evaluated empirically on the given labels; these are externally falsifiable via held-out performance rather than reducing to inputs by construction. No self-citations or ansatzes are invoked as load-bearing steps. This is a typical non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms beyond standard NLP assumptions, or invented entities are described.

axioms (1)

Domain assumption: Recurrent neural networks can effectively model sequences of text posts for classification.
Rationale: Standard assumption in sequence modeling for NLP tasks.

pith-pipeline@v0.9.0 · 5595 in / 1089 out tokens · 42087 ms · 2026-05-25T12:26:11.155498+00:00 · methodology
