TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health
Pith reviewed 2026-05-21 06:06 UTC · model grok-4.3
The pith
TimeSRL routes time-series signals through language abstractions and RL tuning to generalize mental health predictions across datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TimeSRL is a two-stage LLM framework: it first abstracts raw signals into high-level natural language, then predicts behavioral outcomes from those abstractions alone. The pipeline is optimized end-to-end using Group Relative Policy Optimization (GRPO) with Reinforcement Learning from Verifiable Rewards (RLVR), and it achieves state-of-the-art performance on a mental health prediction benchmark designed to stress-test cross-cohort generalization.
What carries the argument
The semantic bottleneck that converts raw time-series into natural language abstractions before prediction, aligned end-to-end via RLVR to produce outcome-relevant descriptions.
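The bottleneck structure can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `toy_abstractor` and `toy_predictor` are simple rule-based functions, not the paper's LLM stages, and the scoring rule is invented purely to show that the second stage receives text and nothing else.

```python
from typing import Callable, Dict, List, Tuple

Signals = Dict[str, List[float]]


def toy_abstractor(signals: Signals) -> str:
    """Stage 1 stand-in: summarize each raw series as a short phrase."""
    phrases = []
    for name in sorted(signals):
        values = signals[name]
        mean = sum(values) / len(values)
        change = values[-1] - values[0]
        trend = "rising" if change > 0 else "falling" if change < 0 else "flat"
        phrases.append(f"{name}: average {mean:.1f}, {trend}")
    return "; ".join(phrases)


def toy_predictor(abstraction: str) -> float:
    """Stage 2 stand-in: scores the text only; it never sees raw numbers."""
    # Invented rule for illustration, not a clinical scoring scheme.
    return 5.0 + 2.0 * abstraction.count("falling") - 1.0 * abstraction.count("rising")


def two_stage_predict(
    signals: Signals,
    abstractor: Callable[[Signals], str],
    predictor: Callable[[str], float],
) -> Tuple[str, float]:
    """Route the prediction through the text abstraction (the bottleneck)."""
    abstraction = abstractor(signals)  # raw numbers stop here
    return abstraction, predictor(abstraction)


text, score = two_stage_predict(
    {"sleep_hours": [8.0, 7.0, 6.0], "steps": [4000.0, 5000.0, 6000.0]},
    toy_abstractor,
    toy_predictor,
)
print(text)
print(score)
```

Because `predictor` takes only a string, swapping in a different sensing pipeline changes the abstractor's input but not the predictor's interface, which is the property the generalization argument relies on.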
If this is right
- The same abstractions support accurate prediction on unseen sensing pipelines without any target-domain fine-tuning.
- Cross-benchmark transfer performance approaches the level of within-domain training for both anxiety and depression tasks.
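- Under the leave-one-dataset-out (LOSO) protocol, mean absolute error drops 3.1 to 10.1 percent (anxiety) and 3.2 to 9.6 percent (depression) versus strong non-LLM baselines, and 9.5 to 44.1 percent (anxiety) and 27.4 to 57.6 percent (depression) versus prior LLM baselines.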
- Outcome-aligned abstractions learned via RLVR eliminate the need for gold-standard intermediate annotations during training.
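The percentage figures above are relative MAE reductions. As a rough sketch of the arithmetic, assuming the usual definition (baseline MAE minus model MAE, divided by baseline MAE), the calculation looks like this. The labels and predictions are made-up numbers for illustration only, not results from the paper.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between labels and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction_pct(baseline_mae: float, model_mae: float) -> float:
    """Percent by which model_mae is lower than baseline_mae."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Illustrative values only.
labels = [4.0, 10.0, 7.0, 2.0]
baseline_preds = [6.0, 7.0, 9.0, 5.0]
model_preds = [5.0, 8.0, 8.0, 4.0]

baseline_mae = mae(labels, baseline_preds)
model_mae = mae(labels, model_preds)
print(baseline_mae, model_mae, relative_reduction_pct(baseline_mae, model_mae))
```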
Where Pith is reading between the lines
- The same semantic routing could apply to other longitudinal sensing tasks such as activity recognition or sleep staging where cohort shifts are common.
- If language abstractions prove reusable, new deployments might require far less labeled target data than current numeric models.
- The approach suggests a broader pattern: insert an explicit language layer between sensor streams and downstream models to improve robustness to distribution shift.
Load-bearing premise
High-level natural language abstractions of raw signals generalize better across datasets and sensing pipelines than models that operate directly on the numeric time series.
What would settle it
A new leave-one-dataset-out test where a direct numeric time-series model matches or beats TimeSRL on mean absolute error for anxiety or depression would show the semantic route does not deliver the claimed generalization gain.
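A leave-one-dataset-out comparison of this kind could be organized roughly as follows. This is a sketch under stated assumptions: the cohort names, the `fit`/`predict` interface, and the mean-predictor baseline are hypothetical and are not taken from the paper's code.

```python
from typing import Any, Callable, Dict, List, Tuple

Example = Tuple[List[float], float]  # (features, label)


def leave_one_dataset_out(
    datasets: Dict[str, List[Example]],
    fit: Callable[[List[Example]], Any],
    predict: Callable[[Any, List[float]], float],
) -> Dict[str, float]:
    """Train on all datasets but one, report MAE on the held-out one."""
    results: Dict[str, float] = {}
    for held_out, test_rows in datasets.items():
        train_rows = [
            row
            for name, rows in datasets.items()
            if name != held_out
            for row in rows
        ]
        model = fit(train_rows)
        errors = [abs(predict(model, x) - y) for x, y in test_rows]
        results[held_out] = sum(errors) / len(errors)
    return results


def fit_mean(rows: List[Example]) -> float:
    """Trivial baseline: remember the mean training label."""
    return sum(y for _, y in rows) / len(rows)


def predict_mean(model: float, features: List[float]) -> float:
    """Ignore the features and return the stored mean."""
    return model


toy = {
    "cohort_a": [([0.0], 1.0), ([0.0], 3.0)],
    "cohort_b": [([0.0], 5.0)],
    "cohort_c": [([0.0], 2.0), ([0.0], 4.0)],
}
scores = leave_one_dataset_out(toy, fit_mean, predict_mean)
print(scores)
```

Running the same loop with a direct numeric model and with the two-stage model, then comparing the per-cohort MAE values, gives the head-to-head comparison described above.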
Original abstract
Longitudinal passive sensing enables continuous health prediction, yet models often fail under cross-dataset distribution shifts. Traditional ML overfits cohort-specific artifacts, while Large Language Models (LLMs) struggle to reason reliably over long, heterogeneous time-series. We introduce TimeSRL, a two-stage LLM framework that routes predictions through an explicit semantic bottleneck. The model first abstracts raw signals into high-level natural language, then predicts behavioral outcomes from these abstractions alone. This forces the model to reason over semantic concepts that we argue generalize better than raw numbers. We optimize this process end-to-end using Group Relative Policy Optimization (GRPO) with Reinforcement Learning from Verifiable Rewards (RLVR), learning outcome-aligned abstractions without gold intermediate annotations. Instantiated on mental-health prediction, TimeSRL achieves state-of-the-art performance on a benchmark designed to stress-test cross-cohort generalization under a rigorous leave-one-dataset-out (LOSO) protocol, reducing mean absolute error (MAE) over strong non-LLM ML and LLM baselines by 3.1--10.1% and 9.5--44.1% for anxiety, and 3.2--9.6% and 27.4--57.6% for depression (all $p$s<0.05). TimeSRL significantly outperforms prior methods in cross-benchmark transfer across different sensing pipelines, rivaling its own within-domain performance without target-domain fine-tuning. These results demonstrate that semantic abstractions are reusable and point to a new direction for generalizable behavior modeling via RL-tuned LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TimeSRL, a two-stage LLM framework for generalizable time-series behavioral modeling. Raw passive-sensing signals are first abstracted into high-level natural-language descriptions; predictions of behavioral outcomes (anxiety and depression scores) are then made exclusively from these abstractions. The abstraction and prediction stages are optimized end-to-end with Group Relative Policy Optimization (GRPO) under Reinforcement Learning from Verifiable Rewards (RLVR), using only outcome-level supervision. The method is evaluated on a leave-one-dataset-out (LOSO) benchmark spanning multiple cohorts and sensing pipelines. The authors claim statistically significant MAE reductions of 3.1–10.1 % versus strong non-LLM ML baselines and 9.5–44.1 % versus prior LLM baselines for anxiety (3.2–9.6 % and 27.4–57.6 % for depression), together with strong cross-benchmark transfer without target-domain fine-tuning.
Significance. If the central claim holds, the work demonstrates that explicit semantic natural-language bottlenecks can yield reusable abstractions that survive cross-cohort and cross-pipeline shifts better than direct numeric modeling, offering a concrete path for LLM-based longitudinal health prediction. The use of verifiable outcome rewards rather than fitted intermediate targets supplies external grounding, and the rigorous LOSO protocol is a methodological strength that directly addresses distribution-shift concerns common in passive-sensing studies.
major comments (2)
- [Experiments (LOSO results and ablations)] The central claim is that routing predictions through an explicit semantic natural-language abstraction produces reusable concepts that drive the reported LOSO gains. No ablation is presented that keeps the base LLM and the GRPO/RLVR procedure fixed while removing the semantic bottleneck (i.e., feeding raw numeric series directly to the prediction stage). Without this isolation, the observed 3.1–10.1 % and 9.5–44.1 % MAE reductions cannot be attributed specifically to the semantic abstraction rather than to LLM capacity or RL tuning effects alone.
- [Methods and Experimental Setup] Full details of baseline implementations, exact data-exclusion criteria, and the computation of error bars and p-values under the LOSO protocol are not provided. This prevents independent verification that the claimed improvements are free of post-hoc choices or implementation artifacts.
minor comments (2)
- [Abstract] The abstract reports improvement ranges (e.g., 3.1--10.1 %) without mapping each endpoint to a specific baseline; a table or explicit listing would improve clarity.
- [Notation and Methods] Notation for the semantic abstraction function and the precise reward formulation in RLVR should be defined once and used consistently across sections.
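The second major comment asks how p-values were computed under LOSO. One common option, shown here only as an assumption and not as the authors' procedure, is a paired sign-flip permutation test on per-participant absolute errors from two models. The sketch enumerates every sign pattern, so it is practical only for small samples; larger samples would need random sampling of sign flips.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p_value(
    errors_a: Sequence[float], errors_b: Sequence[float]
) -> float:
    """Two-sided exact sign-flip test on paired error differences."""
    if len(errors_a) != len(errors_b) or len(errors_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    total = 0
    # Under the null hypothesis, each paired difference is equally
    # likely to have either sign, so enumerate all 2**n sign patterns.
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


# Illustrative absolute errors for four hypothetical participants.
baseline_errors = [3.0, 3.0, 3.0, 3.0]
model_errors = [2.0, 2.0, 2.0, 2.0]
p_value = paired_sign_flip_p_value(baseline_errors, model_errors)
print(p_value)
```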
Simulated Author's Rebuttal
We thank the referee for their constructive comments and for recognizing the potential significance of semantic bottlenecks in generalizable time-series modeling. We address each major comment below and commit to revisions that strengthen the manuscript.
Point-by-point responses
- Referee: [Experiments (LOSO results and ablations)] The central claim is that routing predictions through an explicit semantic natural-language abstraction produces reusable concepts that drive the reported LOSO gains. No ablation is presented that keeps the base LLM and the GRPO/RLVR procedure fixed while removing the semantic bottleneck (i.e., feeding raw numeric series directly to the prediction stage). Without this isolation, the observed 3.1–10.1 % and 9.5–44.1 % MAE reductions cannot be attributed specifically to the semantic abstraction rather than to LLM capacity or RL tuning effects alone.
  Authors: We agree that this specific ablation is necessary to isolate the contribution of the semantic natural-language bottleneck from LLM capacity and RL tuning effects. While the manuscript includes comparisons to non-LLM ML baselines (which operate directly on raw numeric features) and prior LLM baselines, it does not hold the base LLM and GRPO/RLVR procedure fixed while bypassing the abstraction stage. We will add this control experiment in the revision: raw numeric time series will be provided directly to the prediction-stage LLM under identical GRPO/RLVR optimization, allowing direct attribution of gains to the semantic abstraction.
  Revision: yes
- Referee: [Methods and Experimental Setup] Full details of baseline implementations, exact data-exclusion criteria, and the computation of error bars and p-values under the LOSO protocol are not provided. This prevents independent verification that the claimed improvements are free of post-hoc choices or implementation artifacts.
  Authors: We acknowledge that these implementation details are essential for reproducibility and independent verification. The revised manuscript will include an expanded Methods section and a dedicated appendix providing: (i) exact hyperparameter settings and code-level descriptions for all baselines, (ii) precise data-exclusion criteria applied per cohort and sensing pipeline, and (iii) full specification of how error bars and p-values were computed under the LOSO protocol, including the statistical tests and multiple-comparison corrections used.
  Revision: yes
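The rebuttal mentions multiple-comparison corrections without naming one. As an illustration only, the Holm step-down procedure is one standard choice when several baselines are compared against the same model; whether the authors use it is an assumption here, and the p-values below are invented.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down correction: which hypotheses are rejected at level alpha."""
    m = len(p_values)
    # Visit p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return reject


# Invented p-values for three hypothetical baseline comparisons.
decisions = holm_reject([0.01, 0.04, 0.03], alpha=0.05)
print(decisions)
```

In this example only the first comparison survives correction, even though all three raw p-values are below 0.05, which is why the referee's request for the exact procedure matters.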
Circularity Check
No circularity: derivation relies on external RLVR rewards and LOSO evaluation
Full rationale
The paper's central mechanism routes time-series through an explicit semantic abstraction step, then optimizes the full pipeline end-to-end via GRPO with RLVR. Rewards are defined from verifiable outcome labels (anxiety/depression scores) in the training datasets rather than from fitted intermediate targets. The LOSO protocol further separates training and test distributions across datasets and sensing pipelines, so the held-out dataset's labels are used only for evaluation. No equation or step reduces the claimed generalization advantage to a fitted parameter or self-referential definition inside the paper; the performance deltas are presented as empirical outcomes of this externally grounded optimization. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the derivation chain.
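To make the reward grounding concrete, the sketch below pairs a verifiable reward with GRPO's group-relative advantage, which standardizes each sampled output's reward against the other samples for the same input. The reward shape (negative absolute error) is an assumption for illustration, since the paper's exact reward is not given here, and the choice of population standard deviation and the `eps` value are likewise illustrative.

```python
import statistics
from typing import List


def verifiable_reward(predicted_score: float, true_score: float) -> float:
    """Assumed reward: higher when the prediction is closer to the label."""
    return -abs(predicted_score - true_score)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples for the same input."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an illustrative choice
    return [(r - mean) / (std + eps) for r in rewards]


# Two hypothetical sampled predictions for one participant with label 10.
true_score = 10.0
sampled_predictions = [9.0, 13.0]
rewards = [verifiable_reward(p, true_score) for p in sampled_predictions]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

The closer prediction receives a positive advantage and the farther one a negative advantage; when all samples in a group earn the same reward, every advantage is zero and that group contributes no learning signal.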
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: semantic concepts extracted from raw time-series signals generalize better than raw numeric features across cohorts and sensing pipelines.
Reference graph
Works this paper leans on
-
[1]
early to bed and early to rise
Saeed Abdullah, Mark Matthews, Elizabeth L. Murnane, Geri Gay, and Tanzeem Choudhury. 2014. Towards circadian computing: "early to bed and early to rise" makes some of us unhealthy and sleep deprived. InProceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’14). Association for Computing Machinery, New Y...
-
[2]
Adler, Dror Ben-Zeev, Vincent W.-S
Daniel A. Adler, Dror Ben-Zeev, Vincent W.-S. Tseng, John M. Kane, Rachel Brian, Andrew T. Campbell, Marta Hauser, Emily A. Scherer, and Tanzeem Choudhury. 2020. Predicting Early Warning Signs of Psychotic Relapse From Passive Sensing Data: An Approach Using Encoder-Decoder Neural Networks.JMIR mHealth and uHealth8, 8 (Aug. 2020), e19962. doi:10.2196/19962
-
[3]
Daniel A. Adler, Fei Wang, David C. Mohr, and Tanzeem Choudhury. 2022. Machine learning for passive mental health symptom prediction: Generalization across different longitudinal mobile sensing studies.PLOS ONE17, 4 (April 2022), e0266516. doi:10.1371/journal.pone.0266516
-
[4]
Iftikhar Ahmed, Anushree Brahmacharimayum, Raja Hashim Ali, Talha Ali Khan, and Muhammad Ovais Ahmad. 2025. Explainable AI for Depression Detection and Severity Classification From Activity Data: Development and Evaluation Study of an Interpretable Framework.JMIR Mental Health 12, 1 (Sept. 2025), e72038. doi:10.2196/72038
-
[5]
Rebeka Amin, Simon Schreynemackers, Hannah Oppenheimer, Milica Petrovic, Ulrich Hegerl, and Hanna Reich. 2025. Use of Mobile Sensing Data for Longitudinal Monitoring and Prediction of Depression Severity: Systematic Review.Journal of Medical Internet Research27 (Aug. 2025), e57418. doi:10.2196/57418
-
[6]
Puyana, Ryan Kurtz, Tammy Chung, and Anind K
Sangwon Bae, Denzil Ferreira, Brian Suffoletto, Juan C. Puyana, Ryan Kurtz, Tammy Chung, and Anind K. Dey. 2017. Detecting Drinking Episodes in Young Adults Using Smartphone-based Sensors.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.1, 2 (June 2017), 5:1–5:36. doi:10.1145/3090051 26 Fan et al
-
[7]
Andrey Bogomolov, Bruno Lepri, Michela Ferron, Fabio Pianesi, and Alex (Sandy) Pentland. 2014. Daily Stress Recognition from Mobile Phone Data, Weather Conditions and Individual Traits. InProceedings of the 22nd ACM international conference on Multimedia (MM ’14). Association for Computing Machinery, New York, NY, USA, 477–486. doi:10.1145/2647868.2654933
-
[8]
Borelli, Yuning Wang, Frances Haofei Li, Lyric N
Jessica L. Borelli, Yuning Wang, Frances Haofei Li, Lyric N. Russo, Marta Tironi, Ken Yamashita, Elayne Zhou, Jocelyn Lai, Brenda Nguyen, Iman Azimi, Christopher Marcotullio, Sina Labbaf, Salar Jafarlou, Nikil Dutt, and Amir Rahmani. 2025. Detection of Depressive Symptoms in College Students Using Multimodal Passive Sensing Data and Light Gradient Boostin...
-
[9]
Mehdi Boukhechba, Philip Chow, Karl Fua, Bethany A Teachman, and Laura E Barnes. 2018. Predicting Social Anxiety From Global Positioning System Traces of College Students: Feasibility Study.JMIR Mental Health5, 3 (July 2018), e10101. doi:10.2196/10101
-
[10]
Carrie J. Cai, Samantha Winter, David Steiner, Lauren Wilcox, and Michael Terry. 2019. "Hello AI": Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making.Proc. ACM Hum.-Comput. Interact.3, CSCW (Nov. 2019), 104:1–104:24. doi:10.1145/3359206
-
[11]
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. InProceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New York, NY, USA, 1293–1304. doi:10.1145/2750...
-
[12]
Prerna Chikersal, Afsaneh Doryab, Michael Tumminia, Daniella K. Villalba, Janine M. Dutcher, Xinwen Liu, Sheldon Cohen, Kasey G. Creswell, Jennifer Mankoff, J. David Creswell, Mayank Goel, and Anind K. Dey. 2021. Detecting Depression and Predicting its Onset Using Longitudinal Symptoms Captured by Passive Sensing: A Machine Learning Approach With Robust F...
-
[13]
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai D...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.12948 2025
-
[14]
Afsaneh Doryab, Daniella K Villalba, Prerna Chikersal, Janine M Dutcher, Michael Tumminia, Xinwen Liu, Sheldon Cohen, Kasey Creswell, Jennifer Mankoff, John D Creswell, and Anind K Dey. 2019. Identifying Behavioral Phenotypes of Loneliness and Social Isolation with Passive Sensing: Statistical Analysis, Data Mining and Machine Learning of Smartphone and F...
-
[15]
Zachary Englhardt, Chengqian Ma, Margaret E. Morris, Xuhai "Orson" Xu, Chun-Cheng Chang, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, and Vikram Iyer. 2024. From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models.Proceedings of the ACM on Interactive, Mobile, Weara...
-
[16]
Yuang Fan, Jingping Nie, Xinghua Sun, and Xiaofan Jiang. 2024. Exploring foundation models in detecting concerning daily functioning in psychotherapeutic context based on images from smart home devices. In2024 IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys). IEEE, 44–49
work page 2024
-
[17]
Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu, Girish Narayanswamy, Yun Liu, Ming-Zher Poh, Yuzhe Yang, Mark Malhotra, Shwetak Patel, Hamid Palangi, Xuhai Xu, Daniel McDuff, Tim Althoff, and Xin Liu. 2025. RADAR: Benchmarking Language Models on Imperfect Tabular Data. doi:...
-
[18]
Gabriella M. Harari, Nicholas D. Lane, Rui Wang, Benjamin S. Crosier, Andrew T. Campbell, and Samuel D. Gosling. 2016. Using Smartphones to Collect Behavioral Data in Psychological Science: Opportunities, Practical Considerations, and Challenges.Perspectives on Psychological Science: A Journal of the Association for Psychological Science11, 6 (Nov. 2016),...
-
[19]
A. Ali Heydari, Ken Gu, Vidya Srinivas, Hong Yu, Zhihan Zhang, Yuwei Zhang, Akshay Paruchuri, Qian He, Hamid Palangi, Nova Hammerquist, Ahmed A. Metwally, Brent Winslow, Yubin Kim, Kumar Ayush, Yuzhe Yang, Girish Narayanswamy, Maxwell A. Xu, Jake Garrison, Amy Armento Lee, Jenny Vafeiadou, Ben Graef, Isaac R. Galatzer-Levy, Erik Schenck, Andrew Barakat, J...
-
[20]
Karen Hovsepian, Mustafa al’Absi, Emre Ertin, Thomas Kamarck, Motohiro Nakajima, and Santosh Kumar. 2015. cStress: towards a gold standard for continuous stress assessment in the mobile environment. InProceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New Yor...
-
[21]
Sheikh Asif Imran, Mohammad Nur Hossain Khan, Subrata Biswas, and Bashima Islam. 2025. LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors. doi:10.48550/arXiv.2406.14498 arXiv:2406.14498 [cs]
-
[22]
Natasha Jaques, Sara Taylor, Asaph Azaria, Asma Ghandeharioun, Akane Sano, and Rosalind Picard. 2015. Predicting students’ happiness from physiology, phone, mobility, and behavioral data.International Conference on Affective Computing and Intelligent Interaction and workshops : [proceedings]. ACII (Conference)2015 (Sept. 2015), 222–228. doi:10.1109/ACII.2...
-
[23]
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. 2024. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. doi:10.48550/arXiv.2310.01728 arXiv:2310.01728 [cs]
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.01728 2024
-
[24]
James M. Joyce. 2011. Kullback-Leibler Divergence. InInternational Encyclopedia of Statistical Science, Miodrag Lovric (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 720–722. doi:10.1007/978-3-642-04898-2_327
-
[25]
Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, and Hae Won Park. 2024. Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data. doi:10.48550/arXiv.2401.06866 arXiv:2401.06866 [cs]
-
[26]
Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. 2020. Concept Bottleneck Models. doi:10.48550/arXiv.2007.04612 arXiv:2007.04612 [cs]
-
[27]
K. Kroenke, R. L. Spitzer, and J. B. Williams. 2001. The PHQ-9: validity of a brief depression severity measure.Journal of General Internal Medicine 16, 9 (Sept. 2001), 606–613. doi:10.1046/j.1525-1497.2001.016009606.x
-
[28]
Kurt Kroenke, Robert L. Spitzer, Janet B. W. Williams, and Bernd Löwe. 2009. An ultra-brief screening scale for anxiety and depression: the PHQ-4. Psychosomatics50, 6 (2009), 613–621. doi:10.1176/appi.psy.50.6.613
-
[29]
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, and Hannaneh Hajishirzi...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2411.15124 2025
- [30]
-
[31]
Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, and Flora D. Salim. 2025. SensorLLM: Human-Intuitive Alignment of Multivariate Sensor Data with LLMs for Activity Recognition. doi:10.48550/arXiv.2410.10624 arXiv:2410.10624 [cs]
-
[32]
Robert LiKamWa, Yunxin Liu, Nicholas D. Lane, and Lin Zhong. 2013. MoodScope: building a mood sensor from smartphone usage patterns. InProceeding of the 11th annual international conference on Mobile systems, applications, and services (MobiSys ’13). Association for Computing Machinery, New York, NY, USA, 389–402. doi:10.1145/2462456.2464449
-
[33]
Dante L. Mack, Alex W. DaSilva, Courtney Rogers, Elin Hedlund, Eilis I. Murphy, Vlado Vojdanovski, Jane Plomp, Weichen Wang, Subigya K. Nepal, Paul E. Holtzheimer, Dylan D. Wagner, Nicholas C. Jacobson, Meghan L. Meyer, Andrew T. Campbell, and Jeremy F. Huckins. 2021. Mental Health and Behavior of College Students During the COVID-19 Pandemic: Longitudina...
-
[34]
Lakmal Meegahapola, William Droz, Peter Kun, Amalia de Götzen, Chaitanya Nutakki, Shyam Diwakar, Salvador Ruiz Correa, Donglei Song, Hao Xu, Miriam Bidoglia, George Gaskell, Altangerel Chagnaa, Amarsanaa Ganbold, Tsolmon Zundui, Carlo Caprini, Daniele Miorandi, Alethia Hume, Jose Luis Zarza, Luca Cernuzzi, Ivano Bison, Marcelo Rodas Britez, Matteo Busso, ...
-
[35]
Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, and Xin Liu
-
[36]
Transforming wearable data into personal health insights using large language model agents.Nature Communications17, 1 (Jan. 2026), 1143. doi:10.1038/s41467-025-67922-y
-
[37]
Jun-Ki Min, Afsaneh Doryab, Jason Wiese, Shahriyar Amini, John Zimmerman, and Jason I. Hong. 2014. Toss ’n’ turn: smartphone as sleep and sleep quality detector. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). Association for Computing Machinery, New York, NY, USA, 477–486. doi:10.1145/2556288.2557220
-
[38]
Varun Mishra, Gunnar Pope, Sarah Lord, Stephanie Lewia, Byron Lowens, Kelly Caine, Sougata Sen, Ryan Halter, and David Kotz. 2020. Continuous Detection of Physiological Stress with Commodity Hardware.ACM Trans. Comput. Healthcare1, 2 (April 2020), 8:1–8:30. doi:10.1145/3361562
-
[39]
David C. Mohr, Mi Zhang, and Stephen M. Schueller. 2017. Personal Sensing: Understanding Mental Health Using Ubiquitous Sensors and Machine Learning.Annual Review of Clinical Psychology13 (May 2017), 23–47. doi:10.1146/annurev-clinpsy-032816-044949 28 Fan et al
-
[40]
Mohr, and Laura Pulkki- Råback
Isaac Moshe, Yannik Terhorst, Kennedy Opoku Asare, Lasse Bosse Sander, Denzil Ferreira, Harald Baumeister, David C. Mohr, and Laura Pulkki- Råback. 2021. Predicting Symptoms of Depression and Anxiety Using Smartphone and Wearable Data.Frontiers in Psychiatry12 (Jan. 2021). doi:10.3389/fpsyt.2021.625247 Publisher: Frontiers
-
[41]
Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, and Daniel McDuff. 2024. Scaling Wearable Foundation Models. doi:10.48550/arXiv.2410.13638 arXiv:2410.13638 [cs]
-
[42]
HUCKINS, COURTNEY ROGERS, MEGHAN L
SUBIGYA NEPAL, WENJUN LIU, ARVIND PILLAI, WEICHEN WANG, VLADO VOJDANOVSKI, JEREMY F. HUCKINS, COURTNEY ROGERS, MEGHAN L. MEYER, and ANDREW T. CAMPBELL. 2024. Capturing the College Experience: A Four-Year Mobile Sensing Study of Mental Health, Resilience and Behavior of College Students during the Pandemic.Proceedings of the ACM on interactive, mobile, wea...
-
[43]
HEINZ, ASHMITA KUNWAR, EUNSOL SOUL CHOI, XUHAI XU, JOANNA KUC, JEREMY F
SUBIGYA NEPAL, ARVIND PILLAI, WILLIAM CAMPBELL, TALIE MASSACHI, MICHAEL V. HEINZ, ASHMITA KUNWAR, EUNSOL SOUL CHOI, XUHAI XU, JOANNA KUC, JEREMY F. HUCKINS, JASON HOLDEN, SARAH M. PREUM, COLIN DEPP, NICHOLAS JACOBSON, MARY P. CZERWINSKI, ERIC GRANHOLM, and ANDREW T. CAMPBELL. 2024. MindScape Study: Integrating LLM and Behavioral Sensing for Personalized A...
-
[44]
Subigya Nepal, Arvind Pillai, Weichen Wang, Tess Griffin, Amanda C Collins, Michael Heinz, Damien Lekkas, Shayan Mirjafari, Matthew Nemesure, George Price, Nicholas Jacobson, and Andrew Campbell. 2024. MoodCapture: Depression Detection using In-the-Wild Smartphone Images. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI...
-
[45]
Jingping Nie, Yanchen Liu, Yigong Hu, Yuanyuting Wang, Stephen Xia, Matthias Preindl, and Xiaofan Jiang. 2021. SPIDERS+: A light-weight, wireless, and low-cost glasses-based wearable platform for emotion sensing and bio-signal acquisition.Pervasive and Mobile Computing75 (2021), 101424
work page 2021
-
[46]
Jingping Nie, Hanya (Vera) Shao, Yuang Fan, Qijia Shao, Haoxuan You, Matthias Preindl, and Xiaofan Jiang. 2025. LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices.ACM Trans. Comput. Healthcare(Jan. 2025). doi:10.1145/3712299 Just Accepted
-
[47]
Jingping Nie, Minghui Zhao, Stephen Xia, Xinghua Sun, Hanya Shao, Yuang Fan, Matthias Preindl, and Xiaofan Jiang. 2022. Ai therapist for daily functioning assessment and intervention using smart home devices. InProceedings of the 20th ACM Conference on Embedded Networked Sensor Systems. 764–765
work page 2022
- [48]
-
[49]
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human fee...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2203.02155 2022
-
[50]
Collins, Tess Griffin, Benjamin Buck, Sarah Masud Preum, Trevor Cohen, Nicholas C
Arvind Pillai, Subigya Kumar Nepal, Weichen Wang, Matthew Nemesure, Michael Heinz, George Price, Damien Lekkas, Amanda C. Collins, Tess Griffin, Benjamin Buck, Sarah Masud Preum, Trevor Cohen, Nicholas C. Jacobson, Dror Ben-Zeev, and Andrew Campbell. 2024. Investigating Generalizability of Speech-based Suicidal Ideation Detection Using Mobile Phones.Proc....
-
[51]
Mashfiqui Rabbi, Min Hane Aung, Mi Zhang, and Tanzeem Choudhury. 2015. MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones. InProceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New York, NY, USA, 707–718. d...
-
[52]
Yuri Rykov, Thuan-Quoc Thach, Iva Bojic, George Christopoulos, and Josip Car. 2021. Digital Biomarkers for Depression Screening With Wearable Devices: Cross-sectional Study With Machine Learning Modeling.JMIR mHealth and uHealth9, 10 (Oct. 2021), e24872. doi:10.2196/24872
-
[53]
Sohrab Saeb, Mi Zhang, Christopher J. Karr, Stephen M. Schueller, Marya E. Corden, Konrad P. Kording, and David C. Mohr. 2015. Mobile Phone Sensor Correlates of Depressive Symptom Severity in Daily-Life Behavior: An Exploratory Study.Journal of Medical Internet Research17, 7 (July 2015), e4273. doi:10.2196/jmir.4273
-
[54]
Akane Sano and Rosalind W. Picard. 2013. Stress Recognition Using Wearable Sensors and Mobile Phones. InProceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII ’13). IEEE Computer Society, USA, 671–676. doi:10.1109/ACII.2013.117
-
[55]
McHill, Andrew Jk Phillips, Laura K
Akane Sano, Sara Taylor, Andrew W. McHill, Andrew Jk Phillips, Laura K. Barger, Elizabeth Klerman, and Rosalind Picard. 2018. Identifying Objective Physiological Markers and Modifiable Behaviors for Self-Reported Stress and Mental Health Status Using Wearable Sensors and Mobile Phones: Observational Study.Journal of Medical Internet Research20, 6 (June 20...
-
[56]
Rachuri, Cecilia Mascolo, Peter J
Sandra Servia-Rodríguez, Kiran K. Rachuri, Cecilia Mascolo, Peter J. Rentfrow, Neal Lathia, and Gillian M. Sandstrom. 2017. Mobile Sensing at the Service of Mental Well-being: a Large-scale Longitudinal Study. InProceedings of the 26th International Conference on World Wide Web (WWW ’17). International World Wide Web Conferences Steering Committee, Republ...
-
[57]
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. doi:10.48550/arXiv.2402.03300 arXiv:2402.03300 [cs]
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.03300 2024
- [58]
-
[59]
Spitzer, Kurt Kroenke, Janet B
Robert L. Spitzer, Kurt Kroenke, Janet B. W. Williams, and Bernd Löwe. 2006. A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine166, 10 (May 2006), 1092–1097. doi:10.1001/archinte.166.10.1092
-
[60]
Shaoxiong Sun, Amos A. Folarin, Yuezhou Zhang, Nicholas Cummins, Rafael Garcia-Dias, Callum Stewart, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Petroula Laiou, Heet Sankesara, Faith Matcham, Daniel Leightley, Katie M. White, Carolin Oetzmann, Alina Ivan, Femke Lamers, Sara Siddi, Sara Simblett, Raluca Nica, Aki Rintala, David C. Mohr, Inez Myin-Ge...
-
[61]
Qwen Team. 2025. Qwen3 Technical Report. arXiv:2505.09388 [cs.CL] https://arxiv.org/abs/2505.09388
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[62]
Ye Tian, Xiaoyuan Ren, Zihao Wang, Onat Gungor, Xiaofan Yu, and Tajana Rosing. 2025. DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs. doi:10.48550/arXiv.2507.13737 arXiv:2507.13737 [cs] version: 1
-
[63]
Tseng, Akane Sano, Dror Ben-Zeev, Rachel Brian, Andrew T
Vincent W.-S. Tseng, Akane Sano, Dror Ben-Zeev, Rachel Brian, Andrew T. Campbell, Marta Hauser, John M. Kane, Emily A. Scherer, Rui Wang, Weichen Wang, Hongyi Wen, and Tanzeem Choudhury. 2020. Using behavioral rhythms and multi-task learning to predict fine-grained symptoms of schizophrenia.Scientific Reports10, 1 (Sept. 2020), 15100. doi:10.1038/s41598-0...
-
[64]
Rui Wang, Min S. H. Aung, Saeed Abdullah, Rachel Brian, Andrew T. Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Michael Merrill, Emily A. Scherer, Vincent W. S. Tseng, and Dror Ben-Zeev. 2016. CrossCheck: toward passive sensing and detection of mental health changes in people with schizophrenia. InProceedings of the 2016 ACM International Joint Co...
-
[65]
Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T. Campbell. 2014. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. InProceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computin...
-
[66]
Epstein, An Ping, James Fogarty, and Sean A
Rui Wang, Gabriella Harari, Peilin Hao, Xia Zhou, and Andrew T. Campbell. 2015. SmartGPA: how smartphones can assess and predict academic performance of college students. InProceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New York, NY, USA, 295–306. doi:10....
-
[67]
Rui Wang, Weichen Wang, Min S. H. Aung, Dror Ben-Zeev, Rachel Brian, Andrew T. Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Emily A. Scherer, and Megan Walsh. 2017. Predicting Symptom Trajectories of Schizophrenia using Mobile Sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1, 3 (Sept. 2017), 110:1–110:24. doi:10.1145/3130976
-
[68]
Rui Wang, Weichen Wang, Alex daSilva, Jeremy F. Huckins, William M. Kelley, Todd F. Heatherton, and Andrew T. Campbell. 2018. Tracking Depression Dynamics in College Students Using Mobile Phone and Wearable Sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 1 (March 2018), 43:1–43:26. doi:10.1145/3191775
-
[69]
Xumeng Wen, Zihan Liu, Shun Zheng, Shengyu Ye, Zhirong Wu, Yang Wang, Zhijian Xu, Xiao Liang, Junjie Li, Ziming Miao, Jiang Bian, and Mao Yang. 2025. Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs. doi:10.48550/arXiv.2506.14245 arXiv:2506.14245 [cs]
-
[70]
Wuyue Xia, Hanya Shao, Ningxin Kong, Yuang Fan, and Jingping Nie. 2025. The Convergence of Mental Health and AI: A Cross-Disciplinary Survey of Ubiquitous Sensing, LLMs, and Clinical Alignment. doi:10.36227/techrxiv.176521329.92810310/v1
-
[71]
Xuhai Xu, Prerna Chikersal, Afsaneh Doryab, Daniella K. Villalba, Janine M. Dutcher, Michael J. Tumminia, Tim Althoff, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Jennifer Mankoff, and Anind K. Dey. 2019. Leveraging Routine Behavior and Contextually-Filtered Features for Depression Detection among College Students. Proc. ACM Interact. Mob. Wearabl...
-
[72]
Xuhai Xu, Prerna Chikersal, Janine M. Dutcher, Yasaman S. Sefidgar, Woosuk Seo, Michael J. Tumminia, Daniella K. Villalba, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Afsaneh Doryab, Paula S. Nurius, Eve Riskin, Anind K. Dey, and Jennifer Mankoff. 2021. Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depress...
-
[73]
Xuhai Xu, Xin Liu, Han Zhang, Weichen Wang, Subigya Nepal, Yasaman Sefidgar, Woosuk Seo, Kevin S. Kuehn, Jeremy F. Huckins, Margaret E. Morris, Paula S. Nurius, Eve A. Riskin, Shwetak Patel, Tim Althoff, Andrew Campbell, Anind K. Dey, and Jennifer Mankoff. 2023. GLOBEM: Cross-Dataset Generalization of Longitudinal Human Behavior Modeling. Proc. ACM Interac...
-
[74]
Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, and Dakuo Wang. 2024. Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 1 (March 2024), 31:1–31:32. doi:10.1145/3643540
-
[75]
Xuhai Xu, Han Zhang, Yasaman Sefidgar, Yiyi Ren, Xin Liu, Woosuk Seo, Jennifer Brown, Kevin Kuehn, Mike Merrill, Paula Nurius, Shwetak Patel, Tim Althoff, Margaret E. Morris, Eve Riskin, Jennifer Mankoff, and Anind K. Dey. 2023. GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization. arXiv:2211.02733 [cs.LG] https://ar...
- [76]
-
[77]
Yuzhe Yang, Yuan Yuan, Guo Zhang, Hao Wang, Ying-Cong Chen, Yingcheng Liu, Christopher G Tarolli, Daniel Crepeau, Jan Bukartyk, Mithri R Junna, et al. 2022. Artificial intelligence-enabled detection and assessment of Parkinson’s disease using nocturnal breathing signals. Nature Medicine 28, 10 (2022), 2207–2215
-
[78]
Tianyi Zhang, Miu Kojima, and Simon D’Alfonso. 2024. AWARE Narrator and the Utilization of Large Language Models to Extract Behavioral Insights from Smartphone Sensing Data. doi:10.48550/arXiv.2411.04691 arXiv:2411.04691 [cs]
-
[79]
Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, and Yuzhe Yang. 2025. SensorLM: Learning the Language of Wearable Sensors. doi:10.48550/a...
-
[80]
Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Hong Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, and Yingda Chen. 2025. SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning. doi:10.48550/arXiv.2408.05517 arXiv:2408.05517 [cs] version: 4