arxiv: 1806.05258 · v2 · pith:Q236KRIVnew · submitted 2018-06-13 · 💻 cs.CL

SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions

classification 💻 cs.CL
keywords mental health conditions language users smhd dataset diagnoses

Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study

    cs.CL 2019-07 unverdicted novelty 4.0

    Fine-tuning GPT-1 on 150,000 unlabeled Reachout.com posts and then feeding the resulting features into AutoML yields a new state-of-the-art macro F1 of 0.572 for triaging risk in 1,588 labeled CLPsych 2017 posts, without metadata or history.