Inter and Intra Document Attention for Depression Risk Assessment
Pith reviewed 2026-05-25 12:26 UTC · model grok-4.3
The pith
An attention mechanism processes all user posts in parallel to prioritize important writings for depression risk classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The best model reads through all writings of a user in parallel but uses an attention mechanism to prioritize the most important ones at each timestep, outperforming other RNN variants with different aggregation methods on the eRisk 2018 dataset.
What carries the argument
An inter- and intra-document attention mechanism that weighs the importance of posts both within each post and across a user's sequence of posts.
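As a rough illustration of attention across a user's posts, the sketch below scores each post vector against a query vector, normalizes the scores with a softmax, and returns the weighted sum. The dot-product scoring, the names `attention_pool`, `posts`, and `query`, and the toy numbers are all assumptions made for illustration; this page does not specify the paper's actual scoring function or dimensions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(post_vectors, query):
    """Weigh each post by its dot product with `query` (an assumed scoring
    function) and return the weights plus the weighted sum of post vectors."""
    scores = [sum(p_i * q_i for p_i, q_i in zip(p, query)) for p in post_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * p[d] for w, p in zip(weights, post_vectors)) for d in range(dim)
    ]
    return weights, pooled


# Hypothetical 2-dimensional post representations for one user.
posts = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]  # hypothetical query vector (learned in a real model)
weights, pooled = attention_pool(posts, query)
```

In this toy input the second post has the highest score, so it receives the largest weight and dominates the pooled representation.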
If this is right
- Parallel attention across posts yields higher classification accuracy than sequential RNN processing or simple aggregation.
- The model can handle arbitrary numbers of user contributions by dynamically prioritizing key writings.
- Attention-based aggregation improves upon methods that treat all posts equally or process them in fixed order.
- The approach demonstrates that focusing on salient posts at each timestep enhances risk assessment performance.
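To make the contrast between aggregation strategies concrete, the following sketch combines per-post risk probabilities in three ways: a plain mean, a maximum, and an attention-weighted average. The mean and max variants are illustrative stand-ins, since this page does not name the aggregation methods the paper compared; the function names, probabilities, and salience scores are hypothetical.

```python
import math


def aggregate_mean(probs):
    """Treat every post equally."""
    return sum(probs) / len(probs)


def aggregate_max(probs):
    """Let the single most alarming post decide."""
    return max(probs)


def aggregate_attention(probs, scores):
    """Weight each post's probability by a softmax over salience scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum((e / total) * p for e, p in zip(exps, probs))


# Hypothetical per-post probabilities and salience scores for one user.
post_probs = [0.1, 0.2, 0.9, 0.15]
salience = [0.0, 0.0, 3.0, 0.0]

mean_score = aggregate_mean(post_probs)
max_score = aggregate_max(post_probs)
attn_score = aggregate_attention(post_probs, salience)
```

With these toy values the attention-weighted score falls between the mean and the max: the salient third post dominates without the other posts being ignored entirely.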
Where Pith is reading between the lines
- The same parallel attention structure could be tested on other social media signals such as anxiety indicators if labeled data becomes available.
- Deployment would need safeguards for user privacy, since the input is a writing history that is public but still personal.
- The method might extend to real-time monitoring by updating attention weights as new posts arrive.
- Similar inter-intra attention could apply to other sequence classification tasks like identifying misinformation spreaders.
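The real-time monitoring idea above could, under strong simplifying assumptions, be sketched as a streaming softmax that keeps running totals instead of recomputing over the whole history. The class `StreamingAttention` is hypothetical and assumes each post's salience score and value are fixed once computed, which need not hold for the paper's model.

```python
import math


class StreamingAttention:
    """Maintains a softmax-weighted average that can be updated per post."""

    def __init__(self):
        self.max_score = -math.inf  # running maximum, for numerical stability
        self.denom = 0.0            # running sum of exp(score - max_score)
        self.weighted_sum = 0.0     # running sum of exp(score - max_score) * value

    def update(self, score, value):
        """Fold in one new post and return the current weighted average."""
        new_max = max(self.max_score, score)
        # Rescale the old totals to the new reference maximum.
        scale = math.exp(self.max_score - new_max) if self.denom > 0.0 else 0.0
        w = math.exp(score - new_max)
        self.denom = self.denom * scale + w
        self.weighted_sum = self.weighted_sum * scale + w * value
        self.max_score = new_max
        return self.weighted_sum / self.denom


# Hypothetical stream of (salience score, per-post risk probability) pairs.
stream = [(0.0, 0.1), (0.0, 0.2), (3.0, 0.9), (0.0, 0.15)]
tracker = StreamingAttention()
history = [tracker.update(score, value) for score, value in stream]
```

Each entry in `history` equals the attention-weighted average over the posts seen so far, so the final entry matches what a batch softmax over the full sequence would give.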
Load-bearing premise
The eRisk 2018 dataset labels provide a valid and generalizable signal for depression risk that attention can reliably extract from post sequences.
What would settle it
Re-running the attention model on a fresh collection of social media users whose depression status was verified independently of the eRisk labels shows no accuracy gain over non-attention RNN baselines.
Original abstract
We take interest in the early assessment of risk for depression in social media users. We focus on the eRisk 2018 dataset, which represents users as a sequence of their written online contributions. We implement four RNN-based systems to classify the users. We explore several aggregation methods to combine predictions on individual posts. Our best model reads through all writings of a user in parallel but uses an attention mechanism to prioritize the most important ones at each timestep.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript implements four RNN-based classifiers for early depression risk assessment on the eRisk 2018 dataset, where users are represented as sequences of social media posts. It examines multiple aggregation methods for combining per-post predictions and identifies a best model that processes all writings in parallel while applying an attention mechanism to prioritize the most important posts at each timestep.
Significance. If the empirical results hold, the work would contribute to NLP applications in mental health by showing how attention can manage long, variable-length user post sequences for risk classification, potentially informing scalable early-detection systems on social media platforms.
major comments (2)
- [Abstract] The description of the best model asserts superiority via attention but supplies no performance numbers, baselines, error bars, or validation details, preventing any determination of whether the data supports the effectiveness claim.
- [Abstract] The central claim that attention reliably prioritizes writings containing a true depression risk signal rests on the unexamined assumption that eRisk 2018 labels constitute a valid, low-noise proxy; no analysis of label quality, attention-weight correlation with lexical markers, or external clinical validation is described.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment below.
Point-by-point responses
- Referee: [Abstract] The description of the best model asserts superiority via attention but supplies no performance numbers, baselines, error bars, or validation details, preventing any determination of whether the data supports the effectiveness claim.
  Authors: We agree that the abstract, due to its brevity, does not include quantitative results or validation details. The full manuscript reports performance metrics, baseline comparisons, and the experimental protocol (including how the eRisk 2018 data was split) in the experiments section. We will revise the abstract to include the key performance figures for the best model and a brief note on the evaluation setup.
  Revision: yes
- Referee: [Abstract] The central claim that attention reliably prioritizes writings containing a true depression risk signal rests on the unexamined assumption that eRisk 2018 labels constitute a valid, low-noise proxy; no analysis of label quality, attention-weight correlation with lexical markers, or external clinical validation is described.
  Authors: The abstract states that the model 'uses an attention mechanism to prioritize the most important ones,' without claiming clinical reliability or direct correlation to an underlying 'true' signal. The work treats the eRisk 2018 labels as the task definition, consistent with prior literature on this benchmark. No label-quality or external-validation analysis is present because it lies outside the paper's scope of evaluating attention-based aggregation methods. We will revise the abstract wording to ensure the claim is limited to improved classification performance on the given labels.
  Revision: partial
Circularity Check
No circularity: empirical ML application with no derivations or self-referential reductions
The paper describes four RNN-based classifiers applied to the eRisk 2018 dataset, including an attention-augmented model that processes user writings in parallel. No equations, parameter-fitting derivations, or uniqueness theorems are presented. Claims rest on standard attention mechanisms and aggregation methods evaluated empirically on the given labels; these are externally falsifiable via held-out performance rather than reducing to inputs by construction. No self-citations or ansatzes are invoked as load-bearing steps. This is a typical non-circular empirical paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Recurrent neural networks can effectively model sequences of text posts for classification.