DreamerNLplus: Interpretable Modeling of Mental Health Dynamics from Social Media Timelines using Hybrid Rule-Based and RAG Methods
Pith reviewed 2026-05-25 05:26 UTC · model grok-4.3
The pith
RAG-based hybrid modeling ranks first for detecting mental health improvement in timelines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The RAG-based method captures recurrent psychological change patterns across timelines, ranking first for improvement detection and third for deterioration detection in the sequence-level task (Task 3.2 of the CLPsych 2026 shared task).
What carries the argument
The retrieval-augmented generation pipeline that retrieves timeline segments to produce sequence-level summaries of mental health changes.
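To make the retrieval step concrete, here is a minimal sketch of segment retrieval over a timeline. It is not the authors' implementation: the bag-of-words cosine scoring, the function names (`tokenize`, `cosine`, `retrieve`), and the example posts are all assumptions chosen for illustration. The paper's actual retriever and generator are described in its released code and prompts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, segments, k=2):
    """Return up to k segments most similar to the query, best first.

    Segments with zero similarity are dropped, so the result may be
    shorter than k or empty.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]


# Illustrative timeline segments (invented for this sketch).
segments = [
    "I finally slept well and felt calmer today",
    "Work was busy, nothing new",
    "Feeling calmer after talking to my sister",
]
top = retrieve("felt calmer", segments, k=2)
```

In a full pipeline the retrieved segments would be passed to a language model as context for the sequence-level summary; that generation step is omitted here because it depends on the model and prompt the authors used.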
Load-bearing premise
Social media timelines contain sufficient and reliable signals of psychological state changes that can be captured by the described hybrid pipeline without substantial domain-specific validation or clinical ground truth.
What would settle it
A direct comparison of the system's predicted changes against independent clinical assessments of the same individuals' mental health over matching time periods.
Original abstract
We present DreamerNLplus, a hybrid framework for modeling mental health dynamics from social media timelines in the CLPsych 2026 shared task. Our system addresses three tasks: psychological state modeling, temporal change detection, and sequence-level summarization. For Task 1, we combine LLM-based data augmentation, DeBERTa classification, and Random Forest regression for structured state prediction. For Task 2, we use few-shot prompting with a locally deployed Llama 3.1 model to detect Switch and Escalation events using short-term temporal context. For Task 3.1, we explore both a deterministic rule-based summarization pipeline and a few-shot LLM-based approach, ranking 2nd officially. Our RAG-based method achieves strong performance in Task 3.2, ranking 1st for Improvement and 3rd for Deterioration, demonstrating its ability to capture recurrent psychological change patterns across timelines. Our analysis reveals key challenges, including the mismatch between classification and regression performance, the difficulty of modeling temporal transitions, and the disagreement between semantic and similarity-based evaluation metrics. These findings highlight the complexity of modeling mental health dynamics and motivate future work on unified evaluation frameworks. We share our code and prompts at https://github.com/4dpicture/CLPsych2026.
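The Task 2 setup in the abstract (few-shot prompting over short-term temporal context) can be pictured with a small sketch of how such a prompt might be assembled. Everything here is hypothetical: the window size, the prompt wording, the `None` label for posts with no event, and the function names are assumptions for illustration, not the authors' released prompts.

```python
def context_window(posts, index, size=2):
    """Return up to `size` posts immediately preceding posts[index]."""
    start = max(0, index - size)
    return posts[start:index]


def build_prompt(examples, posts, index, size=2):
    """Assemble a few-shot prompt for labeling posts[index].

    `examples` is a list of (context_posts, target_post, label) tuples.
    The instruction text and label set are assumed, not taken from the paper.
    """
    lines = ["Label the target post as Switch, Escalation, or None."]
    for ex_context, ex_target, ex_label in examples:
        lines.append("")
        lines.append("Context: " + " | ".join(ex_context))
        lines.append("Target: " + ex_target)
        lines.append("Label: " + ex_label)
    lines.append("")
    lines.append("Context: " + " | ".join(context_window(posts, index, size)))
    lines.append("Target: " + posts[index])
    lines.append("Label:")
    return "\n".join(lines)


# Invented example data for the sketch.
examples = [(["Felt fine all week"], "Today everything fell apart", "Switch")]
posts = ["a", "b", "c", "d"]
prompt = build_prompt(examples, posts, index=3, size=2)
```

The resulting string would then be sent to the locally deployed model; the call itself is left out because it depends on the serving setup.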
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents DreamerNLplus, a hybrid framework combining LLM-based data augmentation, DeBERTa classification, Random Forest regression, few-shot prompting with Llama 3.1, deterministic rule-based summarization, and RAG methods for three tasks in the CLPsych 2026 shared task on modeling mental health dynamics from social media timelines. It reports official rankings of 2nd on Task 3.1 and 1st for Improvement / 3rd for Deterioration on Task 3.2, analyzes challenges such as classification-regression mismatch and temporal transition modeling, and releases code and prompts.
Significance. If the reported shared-task rankings hold, the work demonstrates the viability of hybrid rule-based and RAG pipelines for capturing recurrent psychological change patterns in social media data. The explicit release of code and prompts is a clear strength that supports reproducibility and extension by other researchers.
major comments (1)
- [Abstract] The central performance claims rest on shared-task rankings (1st/3rd on Task 3.2), but the manuscript provides no error bars, baseline comparisons, or detailed ablation results for the RAG component versus the rule-based pipeline. This makes it difficult to isolate the contribution of the hybrid design to the reported rankings.
minor comments (1)
- The analysis section notes disagreement between semantic and similarity-based metrics and the difficulty of modeling temporal transitions, but does not provide concrete examples or quantitative breakdowns from the submitted runs to illustrate these challenges.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the constructive feedback on the abstract. We address the single major comment below.
- Referee: [Abstract] The central performance claims rest on shared-task rankings (1st/3rd on Task 3.2), but the manuscript provides no error bars, baseline comparisons, or detailed ablation results for the RAG component versus the rule-based pipeline. This makes it difficult to isolate the contribution of the hybrid design to the reported rankings.
  Authors: We acknowledge that the abstract, constrained by length, emphasizes the official shared-task rankings without error bars or component-wise ablations. The full manuscript describes the hybrid pipeline and notes challenges such as classification-regression mismatch, but does not include the requested detailed comparisons. In the revised version we will add (i) baseline comparisons against the individual rule-based and RAG pipelines on Task 3.2 and (ii) ablation results that isolate the contribution of each component. For error bars, the shared-task protocol uses a single fixed test set; we will therefore report standard deviation across our internal development-set cross-validation runs to quantify stability.
  Revision: yes
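The stability measure proposed in the response (standard deviation across cross-validation runs) amounts to a short computation. The sketch below assumes per-fold scores are already available as a list of floats; the function name and the example scores are invented for illustration and are not results from the paper.

```python
import statistics


def summarize_folds(scores):
    """Return (mean, sample standard deviation) of per-fold scores.

    With a single fold the standard deviation is undefined, so 0.0 is
    returned as a placeholder.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


# Hypothetical per-fold scores, not taken from the paper.
fold_scores = [0.5, 0.7]
mean, std = summarize_folds(fold_scores)
print(f"{mean:.3f} +/- {std:.3f}")
```

`statistics.stdev` uses the sample (n - 1) denominator, which is the usual choice when the folds are treated as a sample of possible splits.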
Circularity Check
No significant circularity; empirical shared-task results externally anchored
The paper reports system performance on CLPsych 2026 shared tasks using hybrid LLM/RAG/rule-based pipelines. All central claims (rankings of 1st/3rd on Task 3.2, 2nd on Task 3.1) are defined and scored by external task organizers rather than by any internal derivation, fitted parameter, or self-citation chain. No equations, predictions derived from fitted inputs, or load-bearing self-citations appear. The evaluation is externally falsifiable via the shared-task leaderboard and released code/prompts.