pith.

arxiv: 1907.00462 · v1 · pith:DD5KYOTTnew · submitted 2019-06-30 · 💻 cs.CL · cs.LG

Inter and Intra Document Attention for Depression Risk Assessment

Pith reviewed 2026-05-25 12:26 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords depression risk assessment · social media analysis · attention mechanism · RNN classifiers · eRisk dataset · early detection · natural language processing · user sequence classification

The pith

A model that reads all of a user's posts in parallel and uses attention to prioritize the most important ones performs best at depression risk classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper implements four RNN-based systems that classify social media users by depression risk, using the eRisk 2018 dataset of sequential user writings. It tests several ways to aggregate predictions from individual posts and finds that the best-performing model reads all of a user's writings simultaneously while applying attention to emphasize the most relevant ones at each step. This setup differs from strictly sequential processing by allowing flexible focus across a user's contributions. A sympathetic reader would care because it points to a concrete way of handling variable-length post sequences for early risk flagging.

Core claim

The best model reads through all writings of a user in parallel but uses an attention mechanism to prioritize the most important ones at each timestep, outperforming other RNN variants with different aggregation methods on the eRisk 2018 dataset.
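To make the per-timestep mechanism concrete, here is a minimal pure-Python sketch of one way it could work; it is not the authors' implementation. It assumes every writing is a list of token vectors of equal length (no padding mask), uses an Elman-style tanh cell as a stand-in for whichever recurrent cell the paper uses, scores writings by a dot product with a single learned query vector, and feeds each step's attended context into a user-level recurrence with a logistic read-out. All function names, the parameter tuple, and the random toy weights are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rnn_step(h, x, w_h, w_x):
    """Elman-style update h' = tanh(W_h h + W_x x); a stand-in for the paper's cell."""
    return [math.tanh(dot(row_h, h) + dot(row_x, x)) for row_h, row_x in zip(w_h, w_x)]


def read_in_parallel(writings, params):
    """Read all writings in lockstep and attend over them at every timestep.

    writings: list of writings, each a list of T token vectors (equal T assumed,
    no padding mask). Returns the risk probability and per-step attention weights.
    """
    w_h, w_x, u_h, u_x, query, w_out = params
    hidden = len(w_h)
    states = [[0.0] * hidden for _ in writings]  # one hidden state per writing
    user_state = [0.0] * hidden                  # user-level hidden state
    weights_per_step = []
    for t in range(len(writings[0])):
        # Advance every writing by one token.
        states = [rnn_step(h, w[t], w_h, w_x) for h, w in zip(states, writings)]
        # Attention over writings: which writing matters most at this timestep?
        alphas = softmax([dot(query, h) for h in states])
        context = [sum(a * h[k] for a, h in zip(alphas, states)) for k in range(hidden)]
        # Feed the attended context into the user-level recurrence.
        user_state = rnn_step(user_state, context, u_h, u_x)
        weights_per_step.append(alphas)
    risk = 1.0 / (1.0 + math.exp(-dot(w_out, user_state)))
    return risk, weights_per_step


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
H, D, T, N = 4, 3, 5, 3  # hidden size, token-vector size, tokens per writing, writings
params = (
    rand_matrix(H, H, rng), rand_matrix(H, D, rng),  # writing-level RNN weights
    rand_matrix(H, H, rng), rand_matrix(H, H, rng),  # user-level RNN weights
    rand_matrix(1, H, rng)[0], rand_matrix(1, H, rng)[0],  # attention query, read-out
)
writings = [rand_matrix(T, D, rng) for _ in range(N)]
risk, weights = read_in_parallel(writings, params)
```

Each entry of `weights` records how strongly each writing was prioritized at that timestep, so a writing can dominate some steps and be nearly ignored at others; that per-step flexibility is what separates this design from reading the posts one after another.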

What carries the argument

An inter- and intra-document attention mechanism that weighs the importance of content within each post and of posts across a user's sequence.
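One plausible reading of "inter and intra document" attention, assumed here rather than taken from the paper, is a two-level pooling in the style of Hierarchical Attention Networks (Yang et al., 2016): intra-document attention pools the word vectors inside each post into a post vector, and inter-document attention pools the post vectors into a user vector. The sketch below simplifies that design by scoring raw vectors with a dot product against a context vector, without the recurrent encoders and tanh projection HAN uses; `attention_pool`, `user_representation`, and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Score each vector by a dot product with `context`; return the weighted sum and weights."""
    weights = softmax([sum(v * c for v, c in zip(vec, context)) for vec in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights


def user_representation(posts, word_context, post_context):
    """posts: list of posts, each a list of word vectors (post lengths may differ)."""
    # Intra-document attention: pool the words inside each post into one post vector.
    post_vectors = [attention_pool(words, word_context)[0] for words in posts]
    # Inter-document attention: pool the post vectors across the user's history.
    return attention_pool(post_vectors, post_context)


posts = [
    [[1.0, 0.0], [0.0, 1.0]],  # post 1: two words
    [[2.0, 2.0]],              # post 2: one word
]
user_vec, post_weights = user_representation(
    posts, word_context=[0.0, 0.0], post_context=[1.0, 0.0]
)
```

With a zero `word_context`, every word in a post gets equal weight, so post 1 pools to its mean `[0.5, 0.5]`; post 2 scores higher against `post_context` and receives about 0.82 of the inter-document weight.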

If this is right

  • Parallel attention across posts yields higher classification accuracy than sequential RNN processing or simple aggregation.
  • The model can handle arbitrary numbers of user contributions by dynamically prioritizing key writings.
  • Attention-based aggregation improves upon methods that treat all posts equally or process them in fixed order.
  • The approach demonstrates that focusing on salient posts at each timestep enhances risk assessment performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same parallel attention structure could be tested on other social media signals such as anxiety indicators if labeled data becomes available.
  • Deployment would need safeguards for user privacy, since the input, though public, is a personal writing history.
  • The method might extend to real-time monitoring by updating attention weights as new posts arrive.
  • Similar inter-intra attention could apply to other sequence classification tasks like identifying misinformation spreaders.
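The real-time extension in the third bullet could, under one simplifying assumption, be done incrementally: if each post's attention score depends only on that post and a fixed query vector, the softmax-weighted summary can be updated as each post arrives by tracking a running maximum score (the same log-sum-exp rescaling used in online softmax), without storing or rescoring old posts. If the query itself changes as the history grows, as it may in a per-timestep mechanism, earlier weights would have to be recomputed. The `StreamingAttention` class and its names are hypothetical.

```python
import math


class StreamingAttention:
    """Softmax-weighted average of post vectors, updated one post at a time.

    A running maximum score keeps the exponentials from overflowing; after
    any number of posts the result equals a batch softmax over those posts.
    """

    def __init__(self, query):
        self.query = query
        self.max_score = -math.inf
        self.denom = 0.0                    # sum of exp(score - max_score)
        self.numer = [0.0] * len(query)     # sum of exp(score - max_score) * vec

    def add_post(self, vec):
        score = sum(q * v for q, v in zip(self.query, vec))
        new_max = max(self.max_score, score)
        # Rescale the earlier sums to the new maximum (nothing to rescale before the first post).
        scale = math.exp(self.max_score - new_max) if self.denom > 0 else 0.0
        weight = math.exp(score - new_max)
        self.denom = self.denom * scale + weight
        self.numer = [n * scale + weight * v for n, v in zip(self.numer, vec)]
        self.max_score = new_max

    def context(self):
        return [n / self.denom for n in self.numer]


stream = StreamingAttention(query=[1.0, -1.0])
posts = [[0.2, 0.1], [3.0, -2.0], [-1.0, 0.5]]
for post in posts:
    stream.add_post(post)
    current_summary = stream.context()  # user summary after each new post
```

Because every exponential is taken relative to the largest score seen so far, each term stays at or below 1, so long histories with large scores do not overflow.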

Load-bearing premise

The eRisk 2018 dataset labels provide a valid and generalizable signal for depression risk that attention can reliably extract from post sequences.

What would settle it

Re-running the attention model on a fresh collection of social media users whose depression status was verified independently of the eRisk labels shows no accuracy gain over non-attention RNN baselines.

Original abstract

We take interest in the early assessment of risk for depression in social media users. We focus on the eRisk 2018 dataset, which represents users as a sequence of their written online contributions. We implement four RNN-based systems to classify the users. We explore several aggregation methods to combine predictions on individual posts. Our best model reads through all writings of a user in parallel but uses an attention mechanism to prioritize the most important ones at each timestep.
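The abstract does not name its aggregation methods, so the four variants below are hypothetical illustrations of how per-post risk probabilities could be combined into a single user-level score. The attention variant also shows why attention generalizes equal-weight aggregation: with identical scores its softmax weights are uniform and it reduces to the mean. The `aggregate` function, the probabilities, and the scores are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(post_probs, method="mean", scores=None):
    """Combine per-post risk probabilities into one user-level probability."""
    if method == "mean":
        # Every post counts equally.
        return sum(post_probs) / len(post_probs)
    if method == "max":
        # A single alarming post decides.
        return max(post_probs)
    if method == "last":
        # Only the most recent post counts.
        return post_probs[-1]
    if method == "attention":
        # Relevance scores (learned, in a real model) decide the weights.
        weights = softmax(scores)
        return sum(w * p for w, p in zip(weights, post_probs))
    raise ValueError(f"unknown aggregation method: {method}")


probs = [0.1, 0.2, 0.9]  # hypothetical per-post classifier outputs
user_level = {
    m: aggregate(probs, m, scores=[0.0, 0.0, 2.0])
    for m in ("mean", "max", "last", "attention")
}
```

Here the attention scores favor the third post, so the attention result falls between the mean and the max; raising or lowering a post's score moves the user-level estimate toward or away from that post's probability.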

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript implements four RNN-based classifiers for early depression risk assessment on the eRisk 2018 dataset, where users are represented as sequences of social media posts. It examines multiple aggregation methods for combining per-post predictions and identifies a best model that processes all writings in parallel while applying an attention mechanism to prioritize the most important posts at each timestep.

Significance. If the empirical results hold, the work would contribute to NLP applications in mental health by showing how attention can manage long, variable-length user post sequences for risk classification, potentially informing scalable early-detection systems on social media platforms.

major comments (2)
  1. [Abstract] The description of the best model asserts superiority via attention but supplies no performance numbers, baselines, error bars, or validation details, preventing any determination of whether the data supports the effectiveness claim.
  2. [Abstract] The central claim that attention reliably prioritizes writings containing a true depression risk signal rests on the unexamined assumption that eRisk 2018 labels constitute a valid, low-noise proxy; no analysis of label quality, attention-weight correlation with lexical markers, or external clinical validation is described.
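The lexical-marker check raised in the second comment could start as simply as a Spearman rank correlation between a user's per-post attention weights and the number of lexicon terms in each post. In this sketch the three-word lexicon, the posts, and the attention weights are all made-up placeholders, not a validated clinical resource or real model output; a strong correlation would show only that attention tracks surface vocabulary, not that the labels are clinically valid.

```python
import math


def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return math.nan  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lexicon_hits(post, lexicon):
    return sum(1 for token in post.lower().split() if token in lexicon)


TOY_LEXICON = {"hopeless", "alone", "exhausted"}  # placeholder, not a clinical lexicon
posts = ["Great game last night", "I feel hopeless and alone", "So exhausted lately"]
attention = [0.05, 0.70, 0.25]  # hypothetical attention weights from a trained model
rho = spearman(attention, [lexicon_hits(p, TOY_LEXICON) for p in posts])
```

In practice this would be computed per user (or pooled over many users) on held-out data, since a single user's handful of posts gives a very noisy estimate.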

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] The description of the best model asserts superiority via attention but supplies no performance numbers, baselines, error bars, or validation details, preventing any determination of whether the data supports the effectiveness claim.

    Authors: We agree that the abstract, due to its brevity, does not include quantitative results or validation details. The full manuscript reports performance metrics, baseline comparisons, and the experimental protocol (including how the eRisk 2018 data was split) in the experiments section. We will revise the abstract to include the key performance figures for the best model and a brief note on the evaluation setup. revision: yes

  2. Referee: [Abstract] The central claim that attention reliably prioritizes writings containing a true depression risk signal rests on the unexamined assumption that eRisk 2018 labels constitute a valid, low-noise proxy; no analysis of label quality, attention-weight correlation with lexical markers, or external clinical validation is described.

    Authors: The abstract states that the model 'uses an attention mechanism to prioritize the most important ones,' without claiming clinical reliability or direct correlation to an underlying 'true' signal. The work treats the eRisk 2018 labels as the task definition, consistent with prior literature on this benchmark. No label-quality or external-validation analysis is present because it lies outside the paper's scope of evaluating attention-based aggregation methods. We will revise the abstract wording to ensure the claim is limited to improved classification performance on the given labels. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical ML application with no derivations or self-referential reductions

Full rationale

The paper describes four RNN-based classifiers applied to the eRisk 2018 dataset, including an attention-augmented model that processes user writings in parallel. No equations, parameter-fitting derivations, or uniqueness theorems are presented. Claims rest on standard attention mechanisms and aggregation methods evaluated empirically on the given labels; these are externally falsifiable via held-out performance rather than reducing to inputs by construction. No self-citations or ansatzes are invoked as load-bearing steps. This is a typical non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms beyond standard NLP assumptions, or invented entities are described.

axioms (1)
  • domain assumption: Recurrent neural networks can effectively model sequences of text posts for classification.
    Standard assumption in sequence modeling for NLP tasks.

pith-pipeline@v0.9.0 · 5595 in / 1089 out tokens · 42087 ms · 2026-05-25T12:26:11.155498+00:00 · methodology

