TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health

Jingping Nie; Lilin Xu; Millie Wu; Qingyu Chen; Subigya Nepal; Xiaofan Jiang; Xin Liu; Xuhai "Orson" Xu; Yuang Fan; Yuzhe Yang

arXiv: 2605.21295 · v1 · pith:OCQWWXCUnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI · cs.HC

Yuang Fan, Lilin Xu, Millie Wu, Jingping Nie, Qingyu Chen, Yuzhe Yang, Zhuo Zhang, Xin Liu, Subigya Nepal, Xiaofan Jiang, Xuhai "Orson" Xu

Pith reviewed 2026-05-21 06:06 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.HC

keywords: time-series modeling, large language models, reinforcement learning, mental health prediction, semantic abstractions, generalization, passive sensing, cross-dataset transfer

The pith

TimeSRL routes time-series signals through language abstractions and RL tuning to generalize mental health predictions across datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TimeSRL as a two-stage process where an LLM first turns raw passive sensing data into high-level natural language descriptions of behavior. A second stage then predicts anxiety or depression scores from those descriptions alone. Training uses Group Relative Policy Optimization with reinforcement learning from verifiable rewards to shape the abstractions without needing labeled intermediate steps. On benchmarks that hold out entire datasets for testing, this yields lower mean absolute errors than both traditional machine learning and other LLM baselines. The results show the abstractions transfer to new sensing setups without additional fine-tuning on the target data.

Core claim

TimeSRL is a two-stage LLM framework that abstracts raw signals into high-level natural language then predicts behavioral outcomes from these abstractions alone, optimized end-to-end using Group Relative Policy Optimization with Reinforcement Learning from Verifiable Rewards, achieving state-of-the-art performance on cross-cohort generalization benchmarks for mental health prediction.
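The core of Group Relative Policy Optimization is that each sampled rollout is scored against the other rollouts drawn for the same input, so no learned value model is needed. The following is a minimal sketch of that scoring step only. The reward function `outcome_reward` (negative absolute error against the gold score) is an assumption made for illustration, since this page does not state the paper's reward formula, and the numbers are invented.

```python
from statistics import mean, pstdev


def outcome_reward(predicted: float, gold: float) -> float:
    """Hypothetical verifiable reward: negative absolute error.

    Assumption for illustration only; not taken from the paper.
    """
    return -abs(predicted - gold)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one group of rollouts for the same prompt.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled rollouts for one window whose gold score is 3 (made-up values).
predictions = [1.0, 3.0, 4.0, 6.0]
rewards = [outcome_reward(p, 3.0) for p in predictions]
advantages = group_relative_advantages(rewards)

for p, r, a in zip(predictions, rewards, advantages):
    print(f"prediction={p:.1f} reward={r:.1f} advantage={a:+.3f}")
```

Rollouts whose final prediction lands closer to the gold score receive a positive advantage, so the summaries that led to them are reinforced without any annotation of the summaries themselves.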

What carries the argument

The semantic bottleneck that converts raw time-series into natural language abstractions before prediction, aligned end-to-end via RLVR to produce outcome-relevant descriptions.
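The bottleneck described above can be pictured as a data-flow constraint: the second stage receives the generated summary and nothing else. The sketch below shows that constraint under stated assumptions. The `LLM` callable interface, the prompt wording, and `stub_llm` are all hypothetical stand-ins, not the paper's prompts or models; a single callable is used for both stages because the Figure 2 caption says the same model serves both.

```python
from typing import Callable, Sequence

# Hypothetical interface: a prompt string in, generated text out.
LLM = Callable[[str], str]


def abstract_stage(llm: LLM, daily_rows: Sequence[dict]) -> str:
    """Stage 1: turn a raw window of daily features into a text summary."""
    table = "\n".join(
        f"Day {i + 1}: " + ", ".join(f"{k}={v}" for k, v in row.items())
        for i, row in enumerate(daily_rows)
    )
    return llm("Summarize the behavioral patterns in this data:\n" + table)


def predict_stage(llm: LLM, summary: str) -> float:
    """Stage 2: predict a score from the summary alone (no raw numbers passed)."""
    reply = llm("Given this summary, output a single score:\n" + summary)
    return float(reply.strip())  # a real model reply may need more robust parsing


def two_stage_predict(llm: LLM, daily_rows: Sequence[dict]) -> tuple[str, float]:
    summary = abstract_stage(llm, daily_rows)
    return summary, predict_stage(llm, summary)


def stub_llm(prompt: str) -> str:
    """Canned replies so the sketch runs without a model."""
    if prompt.startswith("Summarize"):
        return "Sleep was stable; mobility dipped slightly mid-window."
    return "2"


summary, score = two_stage_predict(stub_llm, [{"sleep_hours": 7.5, "steps": 8000}])
print(summary, score)
```

Keeping the two stages as separate functions makes the constraint easy to audit: anything the predictor uses must have survived into the summary text.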

If this is right

The same abstractions support accurate prediction on unseen sensing pipelines without any target-domain fine-tuning.
Cross-benchmark transfer performance approaches the level of within-domain training for both anxiety and depression tasks.
Under leave-one-dataset-out (LOSO) evaluation, mean absolute error drops 3.1 to 10.1 percent versus strong non-LLM baselines and 9.5 to 57.6 percent versus prior LLM baselines across the anxiety and depression tasks.
Outcome-aligned abstractions learned via RLVR eliminate the need for gold-standard intermediate annotations during training.
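The percentage figures above are relative reductions in MAE. As a short illustration of the arithmetic, assuming the conventional definition (baseline MAE minus model MAE, divided by baseline MAE), the helper below uses invented MAE values rather than numbers from the paper.

```python
def relative_mae_reduction(baseline_mae: float, model_mae: float) -> float:
    """Percent reduction in MAE relative to the baseline (conventional definition)."""
    if baseline_mae <= 0:
        raise ValueError("baseline_mae must be positive")
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Made-up MAE values, chosen only to show the calculation.
print(round(relative_mae_reduction(2.00, 1.80), 1))
```

A negative result would mean the model's MAE is higher than the baseline's.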

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same semantic routing could apply to other longitudinal sensing tasks such as activity recognition or sleep staging where cohort shifts are common.
If language abstractions prove reusable, new deployments might require far less labeled target data than current numeric models.
The approach suggests a broader pattern: insert an explicit language layer between sensor streams and downstream models to improve robustness to distribution shift.

Load-bearing premise

High-level natural language abstractions of raw signals generalize better across datasets and sensing pipelines than models that operate directly on the numeric time series.

What would settle it

A new leave-one-dataset-out test where a direct numeric time-series model matches or beats TimeSRL on mean absolute error for anxiety or depression would show the semantic route does not deliver the claimed generalization gain.
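Such a test requires both models to be scored on identical held-out folds. A minimal sketch of a leave-one-dataset-out harness follows. The record layout (`dataset`, `score`), the toy data, and the train-mean predictor are hypothetical placeholders; a real comparison would swap in the numeric model and TimeSRL at the marked line.

```python
from typing import Iterator


def leave_one_dataset_out(
    samples: list[dict], key: str = "dataset"
) -> Iterator[tuple[str, list[dict], list[dict]]]:
    """Yield (held_out_name, train, test), holding out one whole dataset per fold."""
    for held_out in sorted({s[key] for s in samples}):
        train = [s for s in samples if s[key] != held_out]
        test = [s for s in samples if s[key] == held_out]
        yield held_out, train, test


def mean_absolute_error(preds: list[float], golds: list[float]) -> float:
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and the same length")
    return sum(abs(p - g) for p, g in zip(preds, golds)) / len(golds)


# Toy records; dataset names and scores are invented.
samples = [
    {"dataset": "A", "score": 1.0}, {"dataset": "A", "score": 3.0},
    {"dataset": "B", "score": 2.0}, {"dataset": "B", "score": 4.0},
    {"dataset": "C", "score": 0.0}, {"dataset": "C", "score": 2.0},
]

fold_mae = {}
for held_out, train, test in leave_one_dataset_out(samples):
    # Placeholder model: predict the training mean for every test sample.
    train_mean = sum(s["score"] for s in train) / len(train)
    preds = [train_mean] * len(test)
    fold_mae[held_out] = mean_absolute_error(preds, [s["score"] for s in test])

print(fold_mae)
```

Because no sample from the held-out dataset appears in training, each fold's MAE reflects transfer to an unseen cohort rather than within-cohort fit.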

Figures

Figures reproduced from arXiv: 2605.21295 by Jingping Nie, Lilin Xu, Millie Wu, Qingyu Chen, Subigya Nepal, Xiaofan Jiang, Xin Liu, Xuhai "Orson" Xu, Yuang Fan, Yuzhe Yang, Zhuo Zhang.

**Figure 1.** Overview of TimeSRL, a two-stage LLM framework for robust longitudinal behavioral time-series modeling, instantiated on behavioral health prediction. While traditional ML models overfit numerical regularities and direct-prediction LLMs struggle with long numeric trajectories, TimeSRL addresses these distribution shift challenges by routing inference through an explicit semantic bottleneck. In Stage 1, it a…

**Figure 2.** The Two-Stage GRPO Tuning Pipeline for TimeSRL. The proposed architecture uses the same model for both stages. In Stage 1, the TimeSRL-LLM is given a prompt with behavioral data to examine the numerical data and summarize the findings. The model leverages an explicit reasoning process to generate #K semantic abstracted summaries. Next, only the generated summaries are extracted, passing through the semanti…

**Figure 3.** Example of the Two Stage Prompting used in the task of mental-health prediction. Starting from 14 days of tabular multi-variate time-series behavioral data, TimeSRL first constructs a semantic abstraction prompt in Stage 1 by organizing the data into a structured template, translating system feature names into descriptive labels, and converting raw sensor units into interpretable formats. This prompt guide…

**Figure 4.** LOSO MAE on GLOBEM and College Experience anxiety and depression prediction. Bars denote mean MAE and error bars denote 95% percentile bootstrap SE. Star annotations report paired bootstrap significance tests versus TimeSRL, indicating significantly higher MAE than TimeSRL, and significance levels are marked as ∗p < 0.05, ∗∗p < 0.01, and ∗∗∗p < 0.001. Across both datasets and tasks, TimeSRL maintains top-t…

**Figure 5.** MAE reduction from TimeSRL tuning across four LLM backbones on GLOBEM (LOSO). Bars compare direct prompting against the TimeSRL-tuned variant for anxiety and depression; error bars denote standard error. TimeSRL consistently improves every backbone, with relative MAE reductions of 38.4–61.6% across all backbone–task combinations.

**Figure 6.** Cross-benchmark (Cross-BM) transfer results on GLOBEM and College Experience. Each panel evaluates transfer in one target benchmark split after training on the other benchmark; the rightmost bar shows the within-benchmark (In-BM) TimeSRL reference for the same target study. Stars denote statistical significance vs. the Cross-BM reference (paired bootstrap; p < 0.05, p < 0.01, p < 0.001; n.s. = not signific…

**Figure 7.** Qualitative comparison of intermediate summaries on a 14-day window. Model-generated summaries and predictions are presented alongside highlighted framing sentences (colored text) and annotated compression/preservation patterns (shaded boxes). Both untuned two-stage baselines (GPT-5.0, Qwen3-4B) compress the trajectory into a predominantly concern-heavy narrative, emphasizing salient irregularities while u…

**Figure 8.** Qualitative comparison on a 14-day CollegeExperience DS3 sample (gold anxiety score = 3). Model-generated summaries and predictions are presented alongside highlighted framing sentences (colored text) and annotated compression/preservation patterns (shaded boxes). Untuned two-stage baselines misconstrue localized irregularities (e.g., the D5/D6 sleep swing and D14 crash) as a pervasive anxiety pattern, re…

**Figure 9.** Qualitative comparison on a 14-day CollegeExperience DS2 depression sample (gold depression score = 2). Model-generated summaries and predictions are presented alongside highlighted framing sentences (colored text) and annotated compression/preservation patterns (shaded boxes). The untuned two-stage baselines compress the trajectory into a global decline-and-withdrawal narrative: GPT-5.0 hedges on confou…
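The captions for Figures 4 and 6 mention paired bootstrap significance tests. One common way to run such a test on per-sample absolute errors is sketched below. The function name `paired_bootstrap`, the one-sided p-value definition, the percentile interval, and the resample count are assumptions for illustration; this page does not specify the paper's exact procedure.

```python
import random


def paired_bootstrap(
    baseline_err: list[float],
    model_err: list[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> tuple[float, tuple[float, float]]:
    """Paired bootstrap on the MAE difference (baseline minus model).

    Returns a one-sided p-value (share of resamples where the baseline is
    not worse than the model) and a 95% percentile interval for the difference.
    """
    if len(baseline_err) != len(model_err) or not model_err:
        raise ValueError("error lists must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(model_err)
    diffs = []
    for _ in range(n_resamples):
        # Resample sample indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(baseline_err[i] - model_err[i] for i in idx) / n)
    diffs.sort()
    p_value = sum(d <= 0 for d in diffs) / n_resamples
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return p_value, (lower, upper)


# Invented per-sample absolute errors for two models on the same test samples.
model_errors = [1.0, 2.0, 0.5, 1.5, 0.8, 1.2]
baseline_errors = [1.4, 2.1, 0.9, 1.3, 1.5, 1.6]
p, interval = paired_bootstrap(baseline_errors, model_errors)
print(p, interval)
```

Resampling indices rather than the two error lists separately is what makes the test paired: each resample compares both models on the same set of windows.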

Original abstract

Longitudinal passive sensing enables continuous health prediction, yet models often fail under cross-dataset distribution shifts. Traditional ML overfits cohort-specific artifacts, while Large Language Models (LLMs) struggle to reason reliably over long, heterogeneous time-series. We introduce TimeSRL, a two-stage LLM framework that routes predictions through an explicit semantic bottleneck. The model first abstracts raw signals into high-level natural language, then predicts behavioral outcomes from these abstractions alone. This forces the model to reason over semantic concepts that we argue generalize better than raw numbers. We optimize this process end-to-end using Group Relative Policy Optimization (GRPO) with Reinforcement Learning from Verifiable Rewards (RLVR), learning outcome-aligned abstractions without gold intermediate annotations. Instantiated on mental-health prediction, TimeSRL achieves state-of-the-art performance on a benchmark designed to stress-test cross-cohort generalization under a rigorous leave-one-dataset-out (LOSO) protocol, reducing mean absolute error (MAE) over strong non-LLM ML and LLM baselines by 3.1–10.1% and 9.5–44.1% for anxiety, and 3.2–9.6% and 27.4–57.6% for depression (all p < 0.05). TimeSRL significantly outperforms prior methods in cross-benchmark transfer across different sensing pipelines, rivaling its own within-domain performance without target-domain fine-tuning. These results demonstrate that semantic abstractions are reusable and point to a new direction for generalizable behavior modeling via RL-tuned LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TimeSRL gets measurable LOSO gains on mental health time-series by routing through language abstractions and tuning with GRPO, but the semantic mechanism still needs direct isolation.

The main takeaway is that a two-stage LLM setup can improve cross-cohort performance on anxiety and depression prediction from passive sensing data. The model first abstracts raw signals into natural language, then predicts from those abstractions alone, and the full pipeline is tuned end-to-end with Group Relative Policy Optimization using verifiable outcome rewards. The LOSO results and the cross-pipeline transfer without target fine-tuning are the concrete claims worth noting.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TimeSRL, a two-stage LLM framework for generalizable time-series behavioral modeling. Raw passive-sensing signals are first abstracted into high-level natural-language descriptions; predictions of behavioral outcomes (anxiety and depression scores) are then made exclusively from these abstractions. The abstraction and prediction stages are optimized end-to-end with Group Relative Policy Optimization (GRPO) under Reinforcement Learning from Verifiable Rewards (RLVR), using only outcome-level supervision. The method is evaluated on a leave-one-dataset-out (LOSO) benchmark spanning multiple cohorts and sensing pipelines. The authors claim statistically significant MAE reductions of 3.1–10.1 % versus strong non-LLM ML baselines and 9.5–44.1 % versus prior LLM baselines for anxiety (3.2–9.6 % and 27.4–57.6 % for depression), together with strong cross-benchmark transfer without target-domain fine-tuning.

Significance. If the central claim holds, the work demonstrates that explicit semantic natural-language bottlenecks can yield reusable abstractions that survive cross-cohort and cross-pipeline shifts better than direct numeric modeling, offering a concrete path for LLM-based longitudinal health prediction. The use of verifiable outcome rewards rather than fitted intermediate targets supplies external grounding, and the rigorous LOSO protocol is a methodological strength that directly addresses distribution-shift concerns common in passive-sensing studies.

major comments (2)

[Experiments (LOSO results and ablations)] The central claim is that routing predictions through an explicit semantic natural-language abstraction produces reusable concepts that drive the reported LOSO gains. No ablation is presented that keeps the base LLM and the GRPO/RLVR procedure fixed while removing the semantic bottleneck (i.e., feeding raw numeric series directly to the prediction stage). Without this isolation, the observed 3.1–10.1 % and 9.5–44.1 % MAE reductions cannot be attributed specifically to the semantic abstraction rather than to LLM capacity or RL tuning effects alone.
[Methods and Experimental Setup] Full details of baseline implementations, exact data-exclusion criteria, and the computation of error bars and p-values under the LOSO protocol are not provided. This prevents independent verification that the claimed improvements are free of post-hoc choices or implementation artifacts.

minor comments (2)

[Abstract] The abstract reports improvement ranges (e.g., 3.1--10.1 %) without mapping each endpoint to a specific baseline; a table or explicit listing would improve clarity.
[Notation and Methods] Notation for the semantic abstraction function and the precise reward formulation in RLVR should be defined once and used consistently across sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and for recognizing the potential significance of semantic bottlenecks in generalizable time-series modeling. We address each major comment below and commit to revisions that strengthen the manuscript.

Referee: [Experiments (LOSO results and ablations)] The central claim is that routing predictions through an explicit semantic natural-language abstraction produces reusable concepts that drive the reported LOSO gains. No ablation is presented that keeps the base LLM and the GRPO/RLVR procedure fixed while removing the semantic bottleneck (i.e., feeding raw numeric series directly to the prediction stage). Without this isolation, the observed 3.1–10.1 % and 9.5–44.1 % MAE reductions cannot be attributed specifically to the semantic abstraction rather than to LLM capacity or RL tuning effects alone.

Authors: We agree that this specific ablation is necessary to isolate the contribution of the semantic natural-language bottleneck from LLM capacity and RL tuning effects. While the manuscript includes comparisons to non-LLM ML baselines (which operate directly on raw numeric features) and prior LLM baselines, it does not hold the base LLM and GRPO/RLVR procedure fixed while bypassing the abstraction stage. We will add this control experiment in the revision: raw numeric time series will be provided directly to the prediction-stage LLM under identical GRPO/RLVR optimization, allowing direct attribution of gains to the semantic abstraction.

revision: yes

Referee: [Methods and Experimental Setup] Full details of baseline implementations, exact data-exclusion criteria, and the computation of error bars and p-values under the LOSO protocol are not provided. This prevents independent verification that the claimed improvements are free of post-hoc choices or implementation artifacts.

Authors: We acknowledge that these implementation details are essential for reproducibility and independent verification. The revised manuscript will include an expanded Methods section and a dedicated appendix providing: (i) exact hyperparameter settings and code-level descriptions for all baselines, (ii) precise data-exclusion criteria applied per cohort and sensing pipeline, and (iii) full specification of how error bars and p-values were computed under the LOSO protocol, including the statistical tests and multiple-comparison corrections used.

revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on external RLVR rewards and LOSO evaluation

The paper's central mechanism routes time-series through an explicit semantic abstraction step, then optimizes the full pipeline end-to-end via GRPO with RLVR. Rewards are defined from verifiable outcome labels (anxiety/depression scores) rather than from the same numeric targets used in final evaluation. The LOSO protocol further separates training and test distributions across datasets and sensing pipelines. No equation or step reduces the claimed generalization advantage to a fitted parameter or self-referential definition inside the paper; the performance deltas are presented as empirical outcomes of this externally grounded optimization. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested premise that semantic language abstractions are inherently more reusable across distributions than raw numeric features; this is stated as an argument rather than derived from prior evidence.

axioms (1)

domain assumption: Semantic concepts extracted from raw time-series signals generalize better than raw numeric features across cohorts and sensing pipelines.
Explicitly argued in the abstract as the reason the two-stage design should outperform direct numeric models.

pith-pipeline@v0.9.0 · 5863 in / 1368 out tokens · 55332 ms · 2026-05-21T06:06:45.550200+00:00 · methodology

Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 7 internal anchors

[1]

early to bed and early to rise

Saeed Abdullah, Mark Matthews, Elizabeth L. Murnane, Geri Gay, and Tanzeem Choudhury. 2014. Towards circadian computing: "early to bed and early to rise" makes some of us unhealthy and sleep deprived. InProceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’14). Association for Computing Machinery, New Y...

work page doi:10.1145/2632048.2632100 2014
[2]

Adler, Dror Ben-Zeev, Vincent W.-S

Daniel A. Adler, Dror Ben-Zeev, Vincent W.-S. Tseng, John M. Kane, Rachel Brian, Andrew T. Campbell, Marta Hauser, Emily A. Scherer, and Tanzeem Choudhury. 2020. Predicting Early Warning Signs of Psychotic Relapse From Passive Sensing Data: An Approach Using Encoder-Decoder Neural Networks.JMIR mHealth and uHealth8, 8 (Aug. 2020), e19962. doi:10.2196/19962

work page doi:10.2196/19962 2020
[3]

Adler, Fei Wang, David C

Daniel A. Adler, Fei Wang, David C. Mohr, and Tanzeem Choudhury. 2022. Machine learning for passive mental health symptom prediction: Generalization across different longitudinal mobile sensing studies.PLOS ONE17, 4 (April 2022), e0266516. doi:10.1371/journal.pone.0266516

work page doi:10.1371/journal.pone.0266516 2022
[4]

Iftikhar Ahmed, Anushree Brahmacharimayum, Raja Hashim Ali, Talha Ali Khan, and Muhammad Ovais Ahmad. 2025. Explainable AI for Depression Detection and Severity Classification From Activity Data: Development and Evaluation Study of an Interpretable Framework.JMIR Mental Health 12, 1 (Sept. 2025), e72038. doi:10.2196/72038

work page doi:10.2196/72038 2025
[5]

Rebeka Amin, Simon Schreynemackers, Hannah Oppenheimer, Milica Petrovic, Ulrich Hegerl, and Hanna Reich. 2025. Use of Mobile Sensing Data for Longitudinal Monitoring and Prediction of Depression Severity: Systematic Review.Journal of Medical Internet Research27 (Aug. 2025), e57418. doi:10.2196/57418

work page doi:10.2196/57418 2025
[6]

Puyana, Ryan Kurtz, Tammy Chung, and Anind K

Sangwon Bae, Denzil Ferreira, Brian Suffoletto, Juan C. Puyana, Ryan Kurtz, Tammy Chung, and Anind K. Dey. 2017. Detecting Drinking Episodes in Young Adults Using Smartphone-based Sensors.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.1, 2 (June 2017), 5:1–5:36. doi:10.1145/3090051 26 Fan et al

work page doi:10.1145/3090051 2017
[7]

Andrey Bogomolov, Bruno Lepri, Michela Ferron, Fabio Pianesi, and Alex (Sandy) Pentland. 2014. Daily Stress Recognition from Mobile Phone Data, Weather Conditions and Individual Traits. InProceedings of the 22nd ACM international conference on Multimedia (MM ’14). Association for Computing Machinery, New York, NY, USA, 477–486. doi:10.1145/2647868.2654933

work page doi:10.1145/2647868.2654933 2014
[8]

Borelli, Yuning Wang, Frances Haofei Li, Lyric N

Jessica L. Borelli, Yuning Wang, Frances Haofei Li, Lyric N. Russo, Marta Tironi, Ken Yamashita, Elayne Zhou, Jocelyn Lai, Brenda Nguyen, Iman Azimi, Christopher Marcotullio, Sina Labbaf, Salar Jafarlou, Nikil Dutt, and Amir Rahmani. 2025. Detection of Depressive Symptoms in College Students Using Multimodal Passive Sensing Data and Light Gradient Boostin...

work page doi:10.2196/67964 2025
[9]

Mehdi Boukhechba, Philip Chow, Karl Fua, Bethany A Teachman, and Laura E Barnes. 2018. Predicting Social Anxiety From Global Positioning System Traces of College Students: Feasibility Study.JMIR Mental Health5, 3 (July 2018), e10101. doi:10.2196/10101

work page doi:10.2196/10101 2018
[10]

Hello AI

Carrie J. Cai, Samantha Winter, David Steiner, Lauren Wilcox, and Michael Terry. 2019. "Hello AI": Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making.Proc. ACM Hum.-Comput. Interact.3, CSCW (Nov. 2019), 104:1–104:24. doi:10.1145/3359206

work page doi:10.1145/3359206 2019
[11]

Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. InProceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New York, NY, USA, 1293–1304. doi:10.1145/2750...

work page doi:10.1145/2750858.2805845 2015
[12]

Villalba, Janine M

Prerna Chikersal, Afsaneh Doryab, Michael Tumminia, Daniella K. Villalba, Janine M. Dutcher, Xinwen Liu, Sheldon Cohen, Kasey G. Creswell, Jennifer Mankoff, J. David Creswell, Mayank Goel, and Anind K. Dey. 2021. Detecting Depression and Predicting its Onset Using Longitudinal Symptoms Captured by Passive Sensing: A Machine Learning Approach With Robust F...

work page doi:10.1145/3422821 2021
[13]

DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai D...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.12948 2025
[14]

Afsaneh Doryab, Daniella K Villalba, Prerna Chikersal, Janine M Dutcher, Michael Tumminia, Xinwen Liu, Sheldon Cohen, Kasey Creswell, Jennifer Mankoff, John D Creswell, and Anind K Dey. 2019. Identifying Behavioral Phenotypes of Loneliness and Social Isolation with Passive Sensing: Statistical Analysis, Data Mining and Machine Learning of Smartphone and F...

work page doi:10.2196/13209 2019
[15]

Morris, Xuhai "Orson" Xu, Chun-Cheng Chang, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, and Vikram Iyer

Zachary Englhardt, Chengqian Ma, Margaret E. Morris, Xuhai "Orson" Xu, Chun-Cheng Chang, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, and Vikram Iyer. 2024. From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models.Proceedings of the ACM on Interactive, Mobile, Weara...

work page doi:10.1145/3659604 2024
[16]

Yuang Fan, Jingping Nie, Xinghua Sun, and Xiaofan Jiang. 2024. Exploring foundation models in detecting concerning daily functioning in psychotherapeutic context based on images from smart home devices. In2024 IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys). IEEE, 44–49

work page 2024
[17]

Ali Heydari, Maxwell A

Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu, Girish Narayanswamy, Yun Liu, Ming-Zher Poh, Yuzhe Yang, Mark Malhotra, Shwetak Patel, Hamid Palangi, Xuhai Xu, Daniel McDuff, Tim Althoff, and Xin Liu. 2025. RADAR: Benchmarking Language Models on Imperfect Tabular Data. doi:...

work page doi:10.48550/arxiv.2506.08249 2025
[18]

Harari, Nicholas D

Gabriella M. Harari, Nicholas D. Lane, Rui Wang, Benjamin S. Crosier, Andrew T. Campbell, and Samuel D. Gosling. 2016. Using Smartphones to Collect Behavioral Data in Psychological Science: Opportunities, Practical Considerations, and Challenges.Perspectives on Psychological Science: A Journal of the Association for Psychological Science11, 6 (Nov. 2016),...

work page doi:10.1177/1745691616650285 2016
[19]

Ali Heydari, Ken Gu, Vidya Srinivas, Hong Yu, Zhihan Zhang, Yuwei Zhang, Akshay Paruchuri, Qian He, Hamid Palangi, Nova Hammerquist, Ahmed A

A. Ali Heydari, Ken Gu, Vidya Srinivas, Hong Yu, Zhihan Zhang, Yuwei Zhang, Akshay Paruchuri, Qian He, Hamid Palangi, Nova Hammerquist, Ahmed A. Metwally, Brent Winslow, Yubin Kim, Kumar Ayush, Yuzhe Yang, Girish Narayanswamy, Maxwell A. Xu, Jake Garrison, Amy Armento Lee, Jenny Vafeiadou, Ben Graef, Isaac R. Galatzer-Levy, Erik Schenck, Andrew Barakat, J...

work page doi:10.48550/arxiv.2508.20148 2025
[20]

Karen Hovsepian, Mustafa al’Absi, Emre Ertin, Thomas Kamarck, Motohiro Nakajima, and Santosh Kumar. 2015. cStress: towards a gold standard for continuous stress assessment in the mobile environment. InProceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New Yor...

work page doi:10.1145/2750858.2807526 2015
[21]

Sheikh Asif Imran, Mohammad Nur Hossain Khan, Subrata Biswas, and Bashima Islam. 2025. LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors. doi:10.48550/arXiv.2406.14498 arXiv:2406.14498 [cs]

work page doi:10.48550/arxiv.2406.14498 2025
[22]

Natasha Jaques, Sara Taylor, Asaph Azaria, Asma Ghandeharioun, Akane Sano, and Rosalind Picard. 2015. Predicting students’ happiness from physiology, phone, mobility, and behavioral data.International Conference on Affective Computing and Intelligent Interaction and workshops : [proceedings]. ACII (Conference)2015 (Sept. 2015), 222–228. doi:10.1109/ACII.2...

work page doi:10.1109/acii.2015.7344575 2015
[23]

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. 2024. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. doi:10.48550/arXiv.2310.01728 arXiv:2310.01728 [cs]

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.01728 2024
[24]

James M. Joyce. 2011. Kullback-Leibler Divergence. InInternational Encyclopedia of Statistical Science, Miodrag Lovric (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 720–722. doi:10.1007/978-3-642-04898-2_327

work page doi:10.1007/978-3-642-04898-2_327 2011
[25]

Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, and Hae Won Park. 2024. Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data. doi:10.48550/arXiv.2401.06866 arXiv:2401.06866 [cs]

work page doi:10.48550/arxiv.2401.06866 2024
[26]

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. 2020. Concept Bottleneck Models. doi:10.48550/arXiv.2007.04612 arXiv:2007.04612 [cs]

work page doi:10.48550/arxiv.2007.04612 2020
[27]

Kroenke, R

K. Kroenke, R. L. Spitzer, and J. B. Williams. 2001. The PHQ-9: validity of a brief depression severity measure.Journal of General Internal Medicine 16, 9 (Sept. 2001), 606–613. doi:10.1046/j.1525-1497.2001.016009606.x

work page doi:10.1046/j.1525-1497.2001.016009606.x 2001
[28]

Spitzer, Janet B

Kurt Kroenke, Robert L. Spitzer, Janet B. W. Williams, and Bernd Löwe. 2009. An ultra-brief screening scale for anxiety and depression: the PHQ-4. Psychosomatics50, 6 (2009), 613–621. doi:10.1176/appi.psy.50.6.613

work page doi:10.1176/appi.psy.50.6.613 2009
[29]

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, and Hannaneh Hajishirzi...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2411.15124 2025
[30]

Sirui Li, Shuhan Xiao, Mihir Joshi, Ahmed Metwally, Daniel McDuff, Wei Wang, and Yuzhe Yang. 2026. HEARTS: Benchmarking LLM Reasoning on Health Time Series.arXiv preprint arXiv:2603.06638(2026)

work page arXiv 2026
[31]

Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, and Flora D. Salim. 2025. SensorLLM: Human-Intuitive Alignment of Multivariate Sensor Data with LLMs for Activity Recognition. doi:10.48550/arXiv.2410.10624 arXiv:2410.10624 [cs]

work page doi:10.48550/arxiv.2410.10624 2025
[32]

Lane, and Lin Zhong

Robert LiKamWa, Yunxin Liu, Nicholas D. Lane, and Lin Zhong. 2013. MoodScope: building a mood sensor from smartphone usage patterns. InProceeding of the 11th annual international conference on Mobile systems, applications, and services (MobiSys ’13). Association for Computing Machinery, New York, NY, USA, 389–402. doi:10.1145/2462456.2464449

work page doi:10.1145/2462456.2464449 2013
[33]

Mack, Alex W

Dante L. Mack, Alex W. DaSilva, Courtney Rogers, Elin Hedlund, Eilis I. Murphy, Vlado Vojdanovski, Jane Plomp, Weichen Wang, Subigya K. Nepal, Paul E. Holtzheimer, Dylan D. Wagner, Nicholas C. Jacobson, Meghan L. Meyer, Andrew T. Campbell, and Jeremy F. Huckins. 2021. Mental Health and Behavior of College Students During the COVID-19 Pandemic: Longitudina...

work page doi:10.2196/28892 2021
[34]

Lakmal Meegahapola, William Droz, Peter Kun, Amalia de Götzen, Chaitanya Nutakki, Shyam Diwakar, Salvador Ruiz Correa, Donglei Song, Hao Xu, Miriam Bidoglia, George Gaskell, Altangerel Chagnaa, Amarsanaa Ganbold, Tsolmon Zundui, Carlo Caprini, Daniele Miorandi, Alethia Hume, Jose Luis Zarza, Luca Cernuzzi, Ivano Bison, Marcelo Rodas Britez, Matteo Busso, ...

work page doi:10.1145/3569483 2023
[35]

Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y

Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, and Xin Liu

work page
[36]

2026), 1143

Transforming wearable data into personal health insights using large language model agents.Nature Communications17, 1 (Jan. 2026), 1143. doi:10.1038/s41467-025-67922-y

work page doi:10.1038/s41467-025-67922-y 2026
[37]

Jun-Ki Min, Afsaneh Doryab, Jason Wiese, Shahriyar Amini, John Zimmerman, and Jason I. Hong. 2014. Toss ’n’ turn: smartphone as sleep and sleep quality detector. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). Association for Computing Machinery, New York, NY, USA, 477–486. doi:10.1145/2556288.2557220

[38]

Varun Mishra, Gunnar Pope, Sarah Lord, Stephanie Lewia, Byron Lowens, Kelly Caine, Sougata Sen, Ryan Halter, and David Kotz. 2020. Continuous Detection of Physiological Stress with Commodity Hardware. ACM Trans. Comput. Healthcare 1, 2 (April 2020), 8:1–8:30. doi:10.1145/3361562

[39]

David C. Mohr, Mi Zhang, and Stephen M. Schueller. 2017. Personal Sensing: Understanding Mental Health Using Ubiquitous Sensors and Machine Learning. Annual Review of Clinical Psychology 13 (May 2017), 23–47. doi:10.1146/annurev-clinpsy-032816-044949

[40]

Isaac Moshe, Yannik Terhorst, Kennedy Opoku Asare, Lasse Bosse Sander, Denzil Ferreira, Harald Baumeister, David C. Mohr, and Laura Pulkki-Råback. 2021. Predicting Symptoms of Depression and Anxiety Using Smartphone and Wearable Data. Frontiers in Psychiatry 12 (Jan. 2021). doi:10.3389/fpsyt.2021.625247 Publisher: Frontiers

[41]

Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, and Daniel McDuff. 2024. Scaling Wearable Foundation Models. doi:10.48550/arXiv.2410.13638 arXiv:2410.13638 [cs]

[42]

Subigya Nepal, Wenjun Liu, Arvind Pillai, Weichen Wang, Vlado Vojdanovski, Jeremy F. Huckins, Courtney Rogers, Meghan L. Meyer, and Andrew T. Campbell. 2024. Capturing the College Experience: A Four-Year Mobile Sensing Study of Mental Health, Resilience and Behavior of College Students during the Pandemic. Proceedings of the ACM on interactive, mobile, wea...

doi:10.1145/3643501
[43]

Subigya Nepal, Arvind Pillai, William Campbell, Talie Massachi, Michael V. Heinz, Ashmita Kunwar, Eunsol Soul Choi, Xuhai Xu, Joanna Kuc, Jeremy F. Huckins, Jason Holden, Sarah M. Preum, Colin Depp, Nicholas Jacobson, Mary P. Czerwinski, Eric Granholm, and Andrew T. Campbell. 2024. MindScape Study: Integrating LLM and Behavioral Sensing for Personalized A...

doi:10.1145/3699761
[44]

Subigya Nepal, Arvind Pillai, Weichen Wang, Tess Griffin, Amanda C Collins, Michael Heinz, Damien Lekkas, Shayan Mirjafari, Matthew Nemesure, George Price, Nicholas Jacobson, and Andrew Campbell. 2024. MoodCapture: Depression Detection using In-the-Wild Smartphone Images. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI...

doi:10.1145/3613904.3642680
[45]

Jingping Nie, Yanchen Liu, Yigong Hu, Yuanyuting Wang, Stephen Xia, Matthias Preindl, and Xiaofan Jiang. 2021. SPIDERS+: A light-weight, wireless, and low-cost glasses-based wearable platform for emotion sensing and bio-signal acquisition. Pervasive and Mobile Computing 75 (2021), 101424

[46]

Jingping Nie, Hanya (Vera) Shao, Yuang Fan, Qijia Shao, Haoxuan You, Matthias Preindl, and Xiaofan Jiang. 2025. LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices. ACM Trans. Comput. Healthcare (Jan. 2025). doi:10.1145/3712299 Just Accepted

[47]

Jingping Nie, Minghui Zhao, Stephen Xia, Xinghua Sun, Hanya Shao, Yuang Fan, Matthias Preindl, and Xiaofan Jiang. 2022. AI therapist for daily functioning assessment and intervention using smart home devices. In Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems. 764–765

[48]

OpenAI. 2026. GPT-5. https://openai.com Accessed via ChatGPT interface

[49]

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human fee...

doi:10.48550/arXiv.2203.02155
[50]

Arvind Pillai, Subigya Kumar Nepal, Weichen Wang, Matthew Nemesure, Michael Heinz, George Price, Damien Lekkas, Amanda C. Collins, Tess Griffin, Benjamin Buck, Sarah Masud Preum, Trevor Cohen, Nicholas C. Jacobson, Dror Ben-Zeev, and Andrew Campbell. 2024. Investigating Generalizability of Speech-based Suicidal Ideation Detection Using Mobile Phones. Proc....

doi:10.1145/3631452
[51]

Mashfiqui Rabbi, Min Hane Aung, Mi Zhang, and Tanzeem Choudhury. 2015. MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New York, NY, USA, 707–718. d...

doi:10.1145/2750858.2805840
[52]

Yuri Rykov, Thuan-Quoc Thach, Iva Bojic, George Christopoulos, and Josip Car. 2021. Digital Biomarkers for Depression Screening With Wearable Devices: Cross-sectional Study With Machine Learning Modeling. JMIR mHealth and uHealth 9, 10 (Oct. 2021), e24872. doi:10.2196/24872

[53]

Sohrab Saeb, Mi Zhang, Christopher J. Karr, Stephen M. Schueller, Marya E. Corden, Konrad P. Kording, and David C. Mohr. 2015. Mobile Phone Sensor Correlates of Depressive Symptom Severity in Daily-Life Behavior: An Exploratory Study. Journal of Medical Internet Research 17, 7 (July 2015), e4273. doi:10.2196/jmir.4273

[54]

Akane Sano and Rosalind W. Picard. 2013. Stress Recognition Using Wearable Sensors and Mobile Phones. In Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII ’13). IEEE Computer Society, USA, 671–676. doi:10.1109/ACII.2013.117

[55]

Akane Sano, Sara Taylor, Andrew W. McHill, Andrew J. K. Phillips, Laura K. Barger, Elizabeth Klerman, and Rosalind Picard. 2018. Identifying Objective Physiological Markers and Modifiable Behaviors for Self-Reported Stress and Mental Health Status Using Wearable Sensors and Mobile Phones: Observational Study. Journal of Medical Internet Research 20, 6 (June 20...

doi:10.2196/jmir.9410
[56]

Sandra Servia-Rodríguez, Kiran K. Rachuri, Cecilia Mascolo, Peter J. Rentfrow, Neal Lathia, and Gillian M. Sandstrom. 2017. Mobile Sensing at the Service of Mental Well-being: a Large-scale Longitudinal Study. In Proceedings of the 26th International Conference on World Wide Web (WWW ’17). International World Wide Web Conferences Steering Committee, Republ...

doi:10.1145/3038912.3052618
[57]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. doi:10.48550/arXiv.2402.03300 arXiv:2402.03300 [cs]

[58]

Zitao Shuai, Zongzhe Xu, David Yang, Wei Wang, and Yuzhe Yang. 2026. OSF: On Pre-training and Scaling of Sleep Foundation Models. arXiv preprint arXiv:2603.00190 (2026).

[59]

Robert L. Spitzer, Kurt Kroenke, Janet B. W. Williams, and Bernd Löwe. 2006. A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine 166, 10 (May 2006), 1092–1097. doi:10.1001/archinte.166.10.1092

[60]

Shaoxiong Sun, Amos A. Folarin, Yuezhou Zhang, Nicholas Cummins, Rafael Garcia-Dias, Callum Stewart, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Petroula Laiou, Heet Sankesara, Faith Matcham, Daniel Leightley, Katie M. White, Carolin Oetzmann, Alina Ivan, Femke Lamers, Sara Siddi, Sara Simblett, Raluca Nica, Aki Rintala, David C. Mohr, Inez Myin-Ge...

doi:10.2196/45233
[61]

Qwen Team. 2025. Qwen3 Technical Report. arXiv:2505.09388 [cs.CL] https://arxiv.org/abs/2505.09388

[62]

Ye Tian, Xiaoyuan Ren, Zihao Wang, Onat Gungor, Xiaofan Yu, and Tajana Rosing. 2025. DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs. doi:10.48550/arXiv.2507.13737 arXiv:2507.13737 [cs] version: 1

[63]

Vincent W.-S. Tseng, Akane Sano, Dror Ben-Zeev, Rachel Brian, Andrew T. Campbell, Marta Hauser, John M. Kane, Emily A. Scherer, Rui Wang, Weichen Wang, Hongyi Wen, and Tanzeem Choudhury. 2020. Using behavioral rhythms and multi-task learning to predict fine-grained symptoms of schizophrenia. Scientific Reports 10, 1 (Sept. 2020), 15100. doi:10.1038/s41598-0...

doi:10.1038/s41598-020-71689-1
[64]

Rui Wang, Min S. H. Aung, Saeed Abdullah, Rachel Brian, Andrew T. Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Michael Merrill, Emily A. Scherer, Vincent W. S. Tseng, and Dror Ben-Zeev. 2016. CrossCheck: toward passive sensing and detection of mental health changes in people with schizophrenia. In Proceedings of the 2016 ACM International Joint Co...

doi:10.1145/2971648.2971740
[65]

Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T. Campbell. 2014. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computin...

doi:10.1145/2632048.2632054
[66]

Rui Wang, Gabriella Harari, Peilin Hao, Xia Zhou, and Andrew T. Campbell. 2015. SmartGPA: how smartphones can assess and predict academic performance of college students. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New York, NY, USA, 295–306. doi:10....

doi:10.1145/2750858.2804251
[67]

Rui Wang, Weichen Wang, Min S. H. Aung, Dror Ben-Zeev, Rachel Brian, Andrew T. Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Emily A. Scherer, and Megan Walsh. 2017. Predicting Symptom Trajectories of Schizophrenia using Mobile Sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1, 3 (Sept. 2017), 110:1–110:24. doi:10.1145/3130976

[68]

Rui Wang, Weichen Wang, Alex daSilva, Jeremy F. Huckins, William M. Kelley, Todd F. Heatherton, and Andrew T. Campbell. 2018. Tracking Depression Dynamics in College Students Using Mobile Phone and Wearable Sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 1 (March 2018), 43:1–43:26. doi:10.1145/3191775

[69]

Xumeng Wen, Zihan Liu, Shun Zheng, Shengyu Ye, Zhirong Wu, Yang Wang, Zhijian Xu, Xiao Liang, Junjie Li, Ziming Miao, Jiang Bian, and Mao Yang. 2025. Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs. doi:10.48550/arXiv.2506.14245 arXiv:2506.14245 [cs]

[70]

Wuyue Xia, Hanya Shao, Ningxin Kong, Yuang Fan, and Jingping Nie. 2025. The Convergence of Mental Health and AI: A Cross-Disciplinary Survey of Ubiquitous Sensing, LLMs, and Clinical Alignment. doi:10.36227/techrxiv.176521329.92810310/v1

[71]

Xuhai Xu, Prerna Chikersal, Afsaneh Doryab, Daniella K. Villalba, Janine M. Dutcher, Michael J. Tumminia, Tim Althoff, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Jennifer Mankoff, and Anind K. Dey. 2019. Leveraging Routine Behavior and Contextually-Filtered Features for Depression Detection among College Students. Proc. ACM Interact. Mob. Wearabl...

doi:10.1145/3351274
[72]

Xuhai Xu, Prerna Chikersal, Janine M. Dutcher, Yasaman S. Sefidgar, Woosuk Seo, Michael J. Tumminia, Daniella K. Villalba, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Afsaneh Doryab, Paula S. Nurius, Eve Riskin, Anind K. Dey, and Jennifer Mankoff. 2021. Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depress...

doi:10.1145/3448107
[73]

Xuhai Xu, Xin Liu, Han Zhang, Weichen Wang, Subigya Nepal, Yasaman Sefidgar, Woosuk Seo, Kevin S. Kuehn, Jeremy F. Huckins, Margaret E. Morris, Paula S. Nurius, Eve A. Riskin, Shwetak Patel, Tim Althoff, Andrew Campbell, Anind K. Dey, and Jennifer Mankoff. 2023. GLOBEM: Cross-Dataset Generalization of Longitudinal Human Behavior Modeling. Proc. ACM Interac...

doi:10.1145/3569485
[74]

Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, and Dakuo Wang. 2024. Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 1 (March 2024), 31:1–31:32. doi:10.1145/3643540

[75]

Xuhai Xu, Han Zhang, Yasaman Sefidgar, Yiyi Ren, Xin Liu, Woosuk Seo, Jennifer Brown, Kevin Kuehn, Mike Merrill, Paula Nurius, Shwetak Patel, Tim Althoff, Margaret E. Morris, Eve Riskin, Jennifer Mankoff, and Anind K. Dey. 2023. GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization. arXiv:2211.02733 [cs.LG] https://ar...

[76]

Zongzhe Xu, Zitao Shuai, Eideen Mozaffari, Ravi S Aysola, Rajesh Kumar, and Yuzhe Yang. 2026. SleepLM: Natural-Language Intelligence for Human Sleep. arXiv preprint arXiv:2602.23605 (2026).

[77]

Yuzhe Yang, Yuan Yuan, Guo Zhang, Hao Wang, Ying-Cong Chen, Yingcheng Liu, Christopher G Tarolli, Daniel Crepeau, Jan Bukartyk, Mithri R Junna, et al. 2022. Artificial intelligence-enabled detection and assessment of Parkinson’s disease using nocturnal breathing signals. Nature Medicine 28, 10 (2022), 2207–2215

[78]

Tianyi Zhang, Miu Kojima, and Simon D’Alfonso. 2024. AWARE Narrator and the Utilization of Large Language Models to Extract Behavioral Insights from Smartphone Sensing Data. doi:10.48550/arXiv.2411.04691 arXiv:2411.04691 [cs]

[79]

Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, and Yuzhe Yang. 2025. SensorLM: Learning the Language of Wearable Sensors. doi:10.48550/a...

doi:10.48550/arXiv.2506.09108
[80]

Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Hong Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, and Yingda Chen. 2025. SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning. doi:10.48550/arXiv.2408.05517 arXiv:2408.05517 [cs] version: 4


[32] [32]

Lane, and Lin Zhong

Robert LiKamWa, Yunxin Liu, Nicholas D. Lane, and Lin Zhong. 2013. MoodScope: building a mood sensor from smartphone usage patterns. InProceeding of the 11th annual international conference on Mobile systems, applications, and services (MobiSys ’13). Association for Computing Machinery, New York, NY, USA, 389–402. doi:10.1145/2462456.2464449

work page doi:10.1145/2462456.2464449 2013

[33] [33]

Mack, Alex W

Dante L. Mack, Alex W. DaSilva, Courtney Rogers, Elin Hedlund, Eilis I. Murphy, Vlado Vojdanovski, Jane Plomp, Weichen Wang, Subigya K. Nepal, Paul E. Holtzheimer, Dylan D. Wagner, Nicholas C. Jacobson, Meghan L. Meyer, Andrew T. Campbell, and Jeremy F. Huckins. 2021. Mental Health and Behavior of College Students During the COVID-19 Pandemic: Longitudina...

work page doi:10.2196/28892 2021

[34] [34]

Lakmal Meegahapola, William Droz, Peter Kun, Amalia de Götzen, Chaitanya Nutakki, Shyam Diwakar, Salvador Ruiz Correa, Donglei Song, Hao Xu, Miriam Bidoglia, George Gaskell, Altangerel Chagnaa, Amarsanaa Ganbold, Tsolmon Zundui, Carlo Caprini, Daniele Miorandi, Alethia Hume, Jose Luis Zarza, Luca Cernuzzi, Ivano Bison, Marcelo Rodas Britez, Matteo Busso, ...

work page doi:10.1145/3569483 2023

[35] [35]

Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y

Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, and Xin Liu

work page

[36] [36]

2026), 1143

Transforming wearable data into personal health insights using large language model agents.Nature Communications17, 1 (Jan. 2026), 1143. doi:10.1038/s41467-025-67922-y

work page doi:10.1038/s41467-025-67922-y 2026

[37] [37]

Jun-Ki Min, Afsaneh Doryab, Jason Wiese, Shahriyar Amini, John Zimmerman, and Jason I. Hong. 2014. Toss ’n’ turn: smartphone as sleep and sleep quality detector. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). Association for Computing Machinery, New York, NY, USA, 477–486. doi:10.1145/2556288.2557220

work page doi:10.1145/2556288.2557220 2014

[38] [38]

Varun Mishra, Gunnar Pope, Sarah Lord, Stephanie Lewia, Byron Lowens, Kelly Caine, Sougata Sen, Ryan Halter, and David Kotz. 2020. Continuous Detection of Physiological Stress with Commodity Hardware.ACM Trans. Comput. Healthcare1, 2 (April 2020), 8:1–8:30. doi:10.1145/3361562

work page doi:10.1145/3361562 2020

[39] [39]

Mohr, Mi Zhang, and Stephen M

David C. Mohr, Mi Zhang, and Stephen M. Schueller. 2017. Personal Sensing: Understanding Mental Health Using Ubiquitous Sensors and Machine Learning.Annual Review of Clinical Psychology13 (May 2017), 23–47. doi:10.1146/annurev-clinpsy-032816-044949 28 Fan et al

work page doi:10.1146/annurev-clinpsy-032816-044949 2017

[40] [40]

Mohr, and Laura Pulkki- Råback

Isaac Moshe, Yannik Terhorst, Kennedy Opoku Asare, Lasse Bosse Sander, Denzil Ferreira, Harald Baumeister, David C. Mohr, and Laura Pulkki- Råback. 2021. Predicting Symptoms of Depression and Anxiety Using Smartphone and Wearable Data.Frontiers in Psychiatry12 (Jan. 2021). doi:10.3389/fpsyt.2021.625247 Publisher: Frontiers

work page doi:10.3389/fpsyt.2021.625247 2021

[41] [41]

Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, and Daniel McDuff. 2024. Scaling Wearable Foundation Models. doi:10.48550/arXiv.2410.13638 arXiv:2410.13638 [cs]

work page doi:10.48550/arxiv.2410.13638 2024

[42]

Subigya Nepal, Wenjun Liu, Arvind Pillai, Weichen Wang, Vlado Vojdanovski, Jeremy F. Huckins, Courtney Rogers, Meghan L. Meyer, and Andrew T. Campbell. 2024. Capturing the College Experience: A Four-Year Mobile Sensing Study of Mental Health, Resilience and Behavior of College Students during the Pandemic. Proceedings of the ACM on interactive, mobile, wea... doi:10.1145/3643501

[43]

Subigya Nepal, Arvind Pillai, William Campbell, Talie Massachi, Michael V. Heinz, Ashmita Kunwar, Eunsol Soul Choi, Xuhai Xu, Joanna Kuc, Jeremy F. Huckins, Jason Holden, Sarah M. Preum, Colin Depp, Nicholas Jacobson, Mary P. Czerwinski, Eric Granholm, and Andrew T. Campbell. 2024. MindScape Study: Integrating LLM and Behavioral Sensing for Personalized A... doi:10.1145/3699761

[44]

Subigya Nepal, Arvind Pillai, Weichen Wang, Tess Griffin, Amanda C Collins, Michael Heinz, Damien Lekkas, Shayan Mirjafari, Matthew Nemesure, George Price, Nicholas Jacobson, and Andrew Campbell. 2024. MoodCapture: Depression Detection using In-the-Wild Smartphone Images. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI... doi:10.1145/3613904.3642680

[45]

Jingping Nie, Yanchen Liu, Yigong Hu, Yuanyuting Wang, Stephen Xia, Matthias Preindl, and Xiaofan Jiang. 2021. SPIDERS+: A light-weight, wireless, and low-cost glasses-based wearable platform for emotion sensing and bio-signal acquisition. Pervasive and Mobile Computing 75 (2021), 101424

[46]

Jingping Nie, Hanya (Vera) Shao, Yuang Fan, Qijia Shao, Haoxuan You, Matthias Preindl, and Xiaofan Jiang. 2025. LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices. ACM Trans. Comput. Healthcare (Jan. 2025). doi:10.1145/3712299 Just Accepted

[47]

Jingping Nie, Minghui Zhao, Stephen Xia, Xinghua Sun, Hanya Shao, Yuang Fan, Matthias Preindl, and Xiaofan Jiang. 2022. AI therapist for daily functioning assessment and intervention using smart home devices. In Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems. 764–765

[48]

OpenAI. 2026. GPT-5. https://openai.com Accessed via ChatGPT interface

[49]

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human fee... doi:10.48550/arXiv.2203.02155

[50]

Arvind Pillai, Subigya Kumar Nepal, Weichen Wang, Matthew Nemesure, Michael Heinz, George Price, Damien Lekkas, Amanda C. Collins, Tess Griffin, Benjamin Buck, Sarah Masud Preum, Trevor Cohen, Nicholas C. Jacobson, Dror Ben-Zeev, and Andrew Campbell. 2024. Investigating Generalizability of Speech-based Suicidal Ideation Detection Using Mobile Phones. Proc.... doi:10.1145/3631452

[51]

Mashfiqui Rabbi, Min Hane Aung, Mi Zhang, and Tanzeem Choudhury. 2015. MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New York, NY, USA, 707–718. doi:10.1145/2750858.2805840

[52]

Yuri Rykov, Thuan-Quoc Thach, Iva Bojic, George Christopoulos, and Josip Car. 2021. Digital Biomarkers for Depression Screening With Wearable Devices: Cross-sectional Study With Machine Learning Modeling. JMIR mHealth and uHealth 9, 10 (Oct. 2021), e24872. doi:10.2196/24872

[53]

Sohrab Saeb, Mi Zhang, Christopher J. Karr, Stephen M. Schueller, Marya E. Corden, Konrad P. Kording, and David C. Mohr. 2015. Mobile Phone Sensor Correlates of Depressive Symptom Severity in Daily-Life Behavior: An Exploratory Study. Journal of Medical Internet Research 17, 7 (July 2015), e4273. doi:10.2196/jmir.4273

[54]

Akane Sano and Rosalind W. Picard. 2013. Stress Recognition Using Wearable Sensors and Mobile Phones. In Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII ’13). IEEE Computer Society, USA, 671–676. doi:10.1109/ACII.2013.117

[55]

Akane Sano, Sara Taylor, Andrew W. McHill, Andrew Jk Phillips, Laura K. Barger, Elizabeth Klerman, and Rosalind Picard. 2018. Identifying Objective Physiological Markers and Modifiable Behaviors for Self-Reported Stress and Mental Health Status Using Wearable Sensors and Mobile Phones: Observational Study. Journal of Medical Internet Research 20, 6 (June 20... doi:10.2196/jmir.9410

[56]

Sandra Servia-Rodríguez, Kiran K. Rachuri, Cecilia Mascolo, Peter J. Rentfrow, Neal Lathia, and Gillian M. Sandstrom. 2017. Mobile Sensing at the Service of Mental Well-being: a Large-scale Longitudinal Study. In Proceedings of the 26th International Conference on World Wide Web (WWW ’17). International World Wide Web Conferences Steering Committee, Republ... doi:10.1145/3038912.3052618

[57]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. doi:10.48550/arXiv.2402.03300 arXiv:2402.03300 [cs]

[58]

Zitao Shuai, Zongzhe Xu, David Yang, Wei Wang, and Yuzhe Yang. 2026. OSF: On Pre-training and Scaling of Sleep Foundation Models. arXiv preprint arXiv:2603.00190 (2026).

[59]

Robert L. Spitzer, Kurt Kroenke, Janet B. W. Williams, and Bernd Löwe. 2006. A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine 166, 10 (May 2006), 1092–1097. doi:10.1001/archinte.166.10.1092

[60]

Shaoxiong Sun, Amos A. Folarin, Yuezhou Zhang, Nicholas Cummins, Rafael Garcia-Dias, Callum Stewart, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Petroula Laiou, Heet Sankesara, Faith Matcham, Daniel Leightley, Katie M. White, Carolin Oetzmann, Alina Ivan, Femke Lamers, Sara Siddi, Sara Simblett, Raluca Nica, Aki Rintala, David C. Mohr, Inez Myin-Ge... doi:10.2196/45233

[61]

Qwen Team. 2025. Qwen3 Technical Report. arXiv:2505.09388 [cs.CL] https://arxiv.org/abs/2505.09388

[62]

Ye Tian, Xiaoyuan Ren, Zihao Wang, Onat Gungor, Xiaofan Yu, and Tajana Rosing. 2025. DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs. doi:10.48550/arXiv.2507.13737 arXiv:2507.13737 [cs] version: 1

[63]

Vincent W.-S. Tseng, Akane Sano, Dror Ben-Zeev, Rachel Brian, Andrew T. Campbell, Marta Hauser, John M. Kane, Emily A. Scherer, Rui Wang, Weichen Wang, Hongyi Wen, and Tanzeem Choudhury. 2020. Using behavioral rhythms and multi-task learning to predict fine-grained symptoms of schizophrenia. Scientific Reports 10, 1 (Sept. 2020), 15100. doi:10.1038/s41598-020-71689-1

[64]

Rui Wang, Min S. H. Aung, Saeed Abdullah, Rachel Brian, Andrew T. Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Michael Merrill, Emily A. Scherer, Vincent W. S. Tseng, and Dror Ben-Zeev. 2016. CrossCheck: toward passive sensing and detection of mental health changes in people with schizophrenia. In Proceedings of the 2016 ACM International Joint Co... doi:10.1145/2971648.2971740

[65]

Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T. Campbell. 2014. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computin... doi:10.1145/2632048.2632054

[66]

Rui Wang, Gabriella Harari, Peilin Hao, Xia Zhou, and Andrew T. Campbell. 2015. SmartGPA: how smartphones can assess and predict academic performance of college students. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15). Association for Computing Machinery, New York, NY, USA, 295–306. doi:10.1145/2750858.2804251

[67]

Rui Wang, Weichen Wang, Min S. H. Aung, Dror Ben-Zeev, Rachel Brian, Andrew T. Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Emily A. Scherer, and Megan Walsh. 2017. Predicting Symptom Trajectories of Schizophrenia using Mobile Sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1, 3 (Sept. 2017), 110:1–110:24. doi:10.1145/3130976

[68]

Rui Wang, Weichen Wang, Alex daSilva, Jeremy F. Huckins, William M. Kelley, Todd F. Heatherton, and Andrew T. Campbell. 2018. Tracking Depression Dynamics in College Students Using Mobile Phone and Wearable Sensing. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 1 (March 2018), 43:1–43:26. doi:10.1145/3191775

[69]

Xumeng Wen, Zihan Liu, Shun Zheng, Shengyu Ye, Zhirong Wu, Yang Wang, Zhijian Xu, Xiao Liang, Junjie Li, Ziming Miao, Jiang Bian, and Mao Yang. 2025. Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs. doi:10.48550/arXiv.2506.14245 arXiv:2506.14245 [cs]

[70]

Wuyue Xia, Hanya Shao, Ningxin Kong, Yuang Fan, and Jingping Nie. 2025. The Convergence of Mental Health and AI: A Cross-Disciplinary Survey of Ubiquitous Sensing, LLMs, and Clinical Alignment. doi:10.36227/techrxiv.176521329.92810310/v1

[71]

Xuhai Xu, Prerna Chikersal, Afsaneh Doryab, Daniella K. Villalba, Janine M. Dutcher, Michael J. Tumminia, Tim Althoff, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Jennifer Mankoff, and Anind K. Dey. 2019. Leveraging Routine Behavior and Contextually-Filtered Features for Depression Detection among College Students. Proc. ACM Interact. Mob. Wearabl... doi:10.1145/3351274

[72]

Xuhai Xu, Prerna Chikersal, Janine M. Dutcher, Yasaman S. Sefidgar, Woosuk Seo, Michael J. Tumminia, Daniella K. Villalba, Sheldon Cohen, Kasey G. Creswell, J. David Creswell, Afsaneh Doryab, Paula S. Nurius, Eve Riskin, Anind K. Dey, and Jennifer Mankoff. 2021. Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depress... doi:10.1145/3448107

[73]

Xuhai Xu, Xin Liu, Han Zhang, Weichen Wang, Subigya Nepal, Yasaman Sefidgar, Woosuk Seo, Kevin S. Kuehn, Jeremy F. Huckins, Margaret E. Morris, Paula S. Nurius, Eve A. Riskin, Shwetak Patel, Tim Althoff, Andrew Campbell, Anind K. Dey, and Jennifer Mankoff. 2023. GLOBEM: Cross-Dataset Generalization of Longitudinal Human Behavior Modeling. Proc. ACM Interac... doi:10.1145/3569485

[74]

Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, and Dakuo Wang. 2024. Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 1 (March 2024), 31:1–31:32. doi:10.1145/3643540

[75]

Xuhai Xu, Han Zhang, Yasaman Sefidgar, Yiyi Ren, Xin Liu, Woosuk Seo, Jennifer Brown, Kevin Kuehn, Mike Merrill, Paula Nurius, Shwetak Patel, Tim Althoff, Margaret E. Morris, Eve Riskin, Jennifer Mankoff, and Anind K. Dey. 2023. GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization. arXiv:2211.02733 [cs.LG] https://ar...

[76]

Zongzhe Xu, Zitao Shuai, Eideen Mozaffari, Ravi S Aysola, Rajesh Kumar, and Yuzhe Yang. 2026. SleepLM: Natural-Language Intelligence for Human Sleep. arXiv preprint arXiv:2602.23605 (2026).

[77]

Yuzhe Yang, Yuan Yuan, Guo Zhang, Hao Wang, Ying-Cong Chen, Yingcheng Liu, Christopher G Tarolli, Daniel Crepeau, Jan Bukartyk, Mithri R Junna, et al. 2022. Artificial intelligence-enabled detection and assessment of Parkinson’s disease using nocturnal breathing signals. Nature Medicine 28, 10 (2022), 2207–2215

[78]

Tianyi Zhang, Miu Kojima, and Simon D’Alfonso. 2024. AWARE Narrator and the Utilization of Large Language Models to Extract Behavioral Insights from Smartphone Sensing Data. doi:10.48550/arXiv.2411.04691 arXiv:2411.04691 [cs]

[79]

Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, and Yuzhe Yang. 2025. SensorLM: Learning the Language of Wearable Sensors. doi:10.48550/arXiv.2506.09108

[80]

Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Hong Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, and Yingda Chen. 2025. SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning. doi:10.48550/arXiv.2408.05517 arXiv:2408.05517 [cs] version: 4