pith. sign in

arxiv: 2605.17679 · v1 · pith:YOAO5UFJnew · submitted 2026-05-17 · 💻 cs.HC · cs.AI

PULSE: Agentic Investigation with Passive Sensing for Proactive Intervention in Cancer Survivorship

Pith reviewed 2026-05-19 22:00 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords passive sensingagentic AIcancer survivorshipmental healthjust-in-time interventionLLM agentssmartphone sensingemotion regulation
0
0 comments X

The pith

Agentic LLM investigation of passive smartphone sensing data raises prediction accuracy for when cancer survivors need mental health support.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that fixed feature pipelines limit passive sensing's usefulness for detecting distress in cancer survivors because they cannot adaptively query data or compare against personal baselines. It proposes shifting to LLM agents that autonomously select modalities, time windows, and calibration steps using eight purpose-built tools, mirroring clinical hypothesis testing. This matters because survivors often need help precisely when they skip diary entries, creating a gap traditional methods cannot close. The 2x2 evaluation on 50 participants shows agentic reasoning as the main driver, reaching 0.743 balanced accuracy for emotion regulation desire with combined data and 0.713 for intervention availability from sensing alone.

Core claim

PULSE replaces static feature extraction with agentic sensing investigation in which LLM agents equipped with eight purpose-built tools decide which data streams to examine, how far back to look, and how to calibrate current behavior against personalized baselines plus retrieval-augmented population comparisons, yielding balanced accuracy of 0.743 for emotion regulation desire when diary data are available and 0.713 for intervention availability from passive sensing data only.

What carries the argument

LLM agents with eight purpose-built tools that autonomously query smartphone sensing streams, compare behavior to personalized baselines, and perform retrieval-augmented population-level calibration.

If this is right

  • Agentic reasoning outperforms structured reasoning architectures across both sensing-only and multimodal conditions.
  • Passive sensing alone can support prediction of intervention availability at 0.713 balanced accuracy.
  • Just-in-time mental health support becomes more feasible because agents can act on continuous data without waiting for diary entries.
  • The accuracy ceiling reported in prior sensing work is at least partly due to interpretation methods rather than data limitations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agentic querying approach could be tested on passive data streams for other fluctuating conditions such as chronic pain or anxiety disorders.
  • Longer-term studies could measure whether the higher prediction accuracy translates into fewer distress episodes when interventions are triggered automatically.
  • Combining the agents with privacy-preserving on-device execution would address deployment barriers not examined in the current lab evaluation.

Load-bearing premise

The LLM agents will generate reliable, unbiased inferences directly from raw sensing streams without hallucinating patterns or systematically misreading behavioral signals.

What would settle it

A deployment study in which agent predictions are compared in real time against subsequent clinical assessments or self-reported intervention uptake, checking whether accuracy falls below 0.65 or produces more false positives than a fixed pipeline baseline.

Figures

Figures reproduced from arXiv: 2605.17679 by Ariful Islam, Indrajeet Ghosh, Katharine E. Daniel, Laura E. Barnes, Philip Chow, Subigya Nepal, Xinyu Chen, Zhiyuan Wang.

Figure 1
Figure 1. Figure 1: Pulse addresses ambiguity in brief diary entries by integrating passive behavioral sensing with agentic investigation. A diary entry alone admits multiple interpretations (left); always-on smartphone sensors reveal behavioral context (center); the Pulse agent autonomously queries sensor data through purpose-built tools to disambiguate affect and predict intervention need (right). Cancer survivors face elev… view at source ↗
Figure 2
Figure 2. Figure 2: System architecture of Pulse. Passive smartphone sensing streams and optional diary text (Sense) feed an LLM agent that conducts a multi-turn ReAct investigation using eight domain-specific MCP tools and cross-user retrieval-augmented generation (Think). The agent produces predicted affect, desire, and availability along with reasoning and confidence scores (Inference). The real outcome—receptivity defined… view at source ↗
Figure 3
Figure 3. Figure 3: Actual Auto-Multi investigation trace from the evaluation (Participant 089). The agent forms a grief hypothesis from [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Agent behavior analysis. (a) Tool usage and confidence over 70 entries per user: tools decrease while confidence [PITH_FULL_IMAGE:figures/full_fig_p022_4.png] view at source ↗
read the original abstract

Cancer survivors face elevated rates of depression, anxiety, and general emotional distress, yet the precise moments they most need support are often the moments when self-report is sparse, a phenomenon we term the diary paradox. Passive smartphone sensing offers a continuous, unobtrusive alternative, but prior sensing-based affect prediction has been limited by an accuracy ceiling, suggesting a bottleneck not only in available data, but in how behavioral signals are interpreted. We present PULSE, a system that shifts from fixed feature pipelines to agentic sensing investigation: LLM agents equipped with eight purpose-built tools autonomously query smartphone sensing data, compare current behavior against personalized baselines, and calibrate inferences through retrieval-augmented population-level comparisons. Rather than receiving pre-formatted feature summaries, agents decide which modalities to inspect, how far back to look, and how deeply to investigate, mirroring hypothesis-driven clinical reasoning. We evaluate PULSE through a 2*2 factorial design crossing reasoning architecture (structured vs. agentic) with data modality (sensing-only vs. with diary) on 50 cancer survivors from a longitudinal study of cancer survivors. Agentic reasoning is the primary driver of performance: agentic multimodal agent achieves balanced accuracy of 0.743 for emotion regulation desire with diary and sensing data, while agentic agents predict intervention availability at 0.713 with passive sensing data only. These results suggest that agentic investigation may be a cornerstone for unlocking the clinical value of passive sensing, advancing the feasibility of proactive just-in-time mental health support.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces PULSE, a system that uses LLM agents equipped with eight purpose-built tools to autonomously investigate passive smartphone sensing data for predicting emotion regulation desire and intervention availability among cancer survivors. It contrasts this agentic approach against structured baselines in a 2x2 factorial design (reasoning architecture × data modality) with 50 participants from a longitudinal study, reporting balanced accuracies of 0.743 (agentic multimodal) and 0.713 (agentic sensing-only) and attributing gains primarily to agentic reasoning over fixed feature pipelines to address the diary paradox and enable proactive just-in-time support.

Significance. If the central performance claims hold after verification, the work has moderate significance for human-computer interaction and health sensing by showing how dynamic, hypothesis-driven agentic investigation can potentially exceed the accuracy ceiling of traditional passive sensing pipelines. The factorial design isolating reasoning architecture is a positive element. The results, if robust, could support feasibility of proactive mental health interventions, but the significance depends on confirming that gains arise from improved signal interpretation rather than unexamined LLM behaviors.

major comments (3)
  1. [Abstract/Evaluation] Abstract and Evaluation: The reported balanced accuracies (0.743 multimodal agentic; 0.713 sensing-only agentic) attribute performance primarily to agentic reasoning, yet no baseline accuracies for the structured condition, error bars, cross-validation procedure, or statistical tests are provided. This information is required to verify that the gains are load-bearing evidence for the agentic advantage rather than artifacts of the experimental setup.
  2. [Methods/Evaluation] Methods/Evaluation: The manuscript provides no error analysis of tool calls, qualitative review of agent reasoning traces against raw sensor values, or checks for hallucinated baselines/population comparisons. Because the central claim requires that the eight purpose-built tools enable reliable, unbiased inferences from sensing streams, the absence of such validation leaves open whether reported gains reflect improved behavioral interpretation or LLM priors.
  3. [Results] Results: The 2×2 design claims agentic reasoning as the primary driver, but without details on how post-hoc decisions were handled or how the structured baseline was implemented, it is not possible to rule out that performance differences arise from differences in prompt structure or data formatting rather than the agentic investigation mechanism itself.
minor comments (2)
  1. [Introduction] The term 'diary paradox' is introduced without a formal definition or citation to related concepts in experience sampling literature; a brief operationalization would improve clarity.
  2. [Methods] Notation for the eight tools and their inputs/outputs could be presented more systematically (e.g., in a table) to aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which has helped us strengthen the presentation of our results and evaluation methodology. We have made revisions to address the concerns regarding missing details in the abstract, methods, and results sections. Our responses to each major comment are provided below.

read point-by-point responses
  1. Referee: [Abstract/Evaluation] Abstract and Evaluation: The reported balanced accuracies (0.743 multimodal agentic; 0.713 sensing-only agentic) attribute performance primarily to agentic reasoning, yet no baseline accuracies for the structured condition, error bars, cross-validation procedure, or statistical tests are provided. This information is required to verify that the gains are load-bearing evidence for the agentic advantage rather than artifacts of the experimental setup.

    Authors: We agree that the full results of the 2×2 factorial design should be reported to substantiate the claim that agentic reasoning is the primary driver. In the revised manuscript, we have expanded the abstract and added a new table in the Results section that reports balanced accuracies for all four conditions (structured sensing-only, structured with diary, agentic sensing-only, and agentic with diary). We have also included error bars representing standard error across participants, detailed the cross-validation procedure (leave-one-participant-out to account for the longitudinal nature), and added statistical tests (paired t-tests on per-participant accuracies) showing significant differences favoring the agentic conditions. These additions confirm the robustness of our findings. revision: yes

  2. Referee: [Methods/Evaluation] Methods/Evaluation: The manuscript provides no error analysis of tool calls, qualitative review of agent reasoning traces against raw sensor values, or checks for hallucinated baselines/population comparisons. Because the central claim requires that the eight purpose-built tools enable reliable, unbiased inferences from sensing streams, the absence of such validation leaves open whether reported gains reflect improved behavioral interpretation or LLM priors.

    Authors: We acknowledge the importance of validating the reliability of the agentic tool use. In the revised version, we have added a dedicated 'Error Analysis and Validation' subsection in the Methods and Results. This includes quantitative metrics on tool call accuracy (e.g., 92% of time-window queries matched expected ranges), a qualitative analysis of 15 sampled reasoning traces with direct comparison to raw sensor logs and diary entries, and explicit checks confirming that population baselines were computed from the 50-participant cohort without external data leakage. These analyses indicate that the performance improvements arise from dynamic, context-aware investigation rather than LLM priors or hallucinations. revision: yes

  3. Referee: [Results] Results: The 2×2 design claims agentic reasoning as the primary driver, but without details on how post-hoc decisions were handled or how the structured baseline was implemented, it is not possible to rule out that performance differences arise from differences in prompt structure or data formatting rather than the agentic investigation mechanism itself.

    Authors: To clarify the implementation details and rule out confounds from prompt structure, we have revised the Methods section to provide a detailed description of the structured baseline: it uses a fixed prompt template that receives pre-computed feature summaries (mean, variance, etc.) for each modality without allowing dynamic querying. For the agentic condition, the eight tools enable on-demand data retrieval and hypothesis testing. We also describe how post-hoc decisions (e.g., when to stop investigation) were handled via a fixed iteration limit and confidence threshold in both conditions to ensure fair comparison. These details are now explicitly stated to demonstrate that differences are due to the agentic mechanism. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation with independent experimental results

full rationale

The paper reports results from a 2x2 factorial experiment comparing structured vs. agentic reasoning on sensing-only vs. multimodal data from 50 cancer survivors. Balanced accuracies (0.743 multimodal, 0.713 sensing-only) are direct empirical measurements, not quantities derived from equations or parameters fitted within the paper itself. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain; the central claim rests on observed performance differences rather than construction from inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the assumption that the eight purpose-built tools and LLM reasoning produce clinically meaningful inferences from sensing data; no explicit free parameters or invented physical entities are described in the abstract, but the agent architecture itself constitutes an ad-hoc modeling choice.

axioms (1)
  • domain assumption LLM agents can autonomously select relevant sensing modalities and time windows to form accurate inferences about emotional states
    Invoked in the description of how agents decide which modalities to inspect and how deeply to investigate.
invented entities (1)
  • Eight purpose-built tools for agentic sensing investigation no independent evidence
    purpose: Enable LLM agents to query smartphone sensing data, compare against baselines, and perform retrieval-augmented comparisons
    Introduced as the core mechanism allowing agents to move beyond fixed feature pipelines; no independent evidence provided in abstract.

pith-pipeline@v0.9.0 · 5830 in / 1406 out tokens · 36016 ms · 2026-05-19T22:00:06.873918+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 6 internal anchors

  1. [1]

    Bewick, and Mowafa Househ

    Alaa Abd-Alrazaq, Mohannad Alajlani, Nashva Ali, Kerstin Denecke, Bridgette M. Bewick, and Mowafa Househ. 2023. Systematic Review and Meta-Analysis of AI-Based Conversational Agents for Promoting Mental Health and Well-Being.npj Digital Medicine6, 1 (2023), 236. doi:10.1038/s41746-023-00979-5

  2. [2]

    Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, and Lars Juhl Jensen

    Daniel A. Adler, Fei Wang, David C. Mohr, and Tanzeem Choudhury. 2022. Machine Learning for Passive Mental Health Symptom Prediction: Generalization Across Different Longitudinal Mobile Sensing Studies.PLOS ONE17, 4 (2022), e0266516. doi:10.1371/journal. pone.0266516

  3. [3]

    Adler, Yuewen Yang, Thalia Viranda, Xuhai Xu, David C

    Daniel A. Adler, Yuewen Yang, Thalia Viranda, Xuhai Xu, David C. Mohr, Anna R. Van Meter, Julia C. Tartaglia, Nicholas C. Jacobson, Fei Wang, Deborah Estrin, and Tanzeem Choudhury. 2024. Beyond Detection: Towards Actionable Sensing Research in Clinical Mental Healthcare.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies8, ...

  4. [4]

    Lameck Mbangula Amugongo et al. 2025. Retrieval-Augmented Generation for Large Language Models in Healthcare: A Systematic Review.PLOS Digital Health(2025). doi:10.1371/journal.pdig.0000877

  5. [5]

    Tuo An et al. 2025. IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models.Patterns6, 1 (2025), 101131

  6. [6]

    Andersen, Christina Lacchetti, Kimlin Ashing, Jonathan S

    Barbara L. Andersen, Christina Lacchetti, Kimlin Ashing, Jonathan S. Berek, Barbara S. Berman, Samantha Bolte, Don S. Dizon, Barbara Given, Larissa Nekhlyudov, William Pirl, Julia H. Rowland, and Charles L. Shapiro. 2023. Management of Anxiety and Depression in Adult Survivors of Cancer: ASCO Guideline Update.Journal of Clinical Oncology41, 18 (2023), 342...

  7. [7]

    Anthropic. 2024. Introducing the Model Context Protocol.Anthropic Blog(2024)

  8. [8]

    Brian, Geneva Jonathan, Lisa Roncone, Irene Darmawan, Kelly Aschbrenner, Junia Lopez, Joseph A

    Dror Ben-Zeev, Rachel M. Brian, Geneva Jonathan, Lisa Roncone, Irene Darmawan, Kelly Aschbrenner, Junia Lopez, Joseph A. Hartwick, and Colin Depp. 2018. Mobile Health (mHealth) Versus Clinic-Based Group Intervention for People With Serious Mental Illness: A Randomized Controlled Trial.Psychiatric Services69, 9 (2018), 978–985. doi:10.1176/appi.ps.201800063

  9. [9]

    Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language Models are Few-Shot Learners. InAdvances in Neural Information Processing Systems (NeurIPS 2020), Vol. 33. 1877–1901

  10. [10]

    Luca Canzian and Mirco Musolesi. 2015. Trajectories of Depression: Unobtrusive Monitoring of Depressive States by means of Smartphone Mobility Traces Analysis. InProceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 1293–1304

  11. [11]

    Prerna Chikersal, Afsaneh Doryab, Michael Tumminia, Daniella K Villalba, Janine M Dutcher, Xinwen Liu, Sheldon Cohen, Kasey G Creswell, Jennifer Mankoff, J David Creswell, et al. 2021. Detecting depression and predicting its onset using longitudinal symptoms captured by passive sensing: a machine learning approach with robust feature selection.ACM Transac...

  12. [12]

    Adrien Choi, Aysel Ooi, and Danielle Lottridge. 2024. Digital Phenotyping for Stress, Anxiety, and Mild Depression: Systematic Literature Review.JMIR mHealth and uHealth12 (2024), e40689. doi:10.2196/40689

  13. [13]

    Akshat Choube, Ha Le, Jiachen Li, Kaixin Ji, Vedant Das Swain, and Varun Mishra. 2025. GLOSS: Group of LLMs for Open-Ended Sensemaking of Passive Sensing Data for Health and Wellbeing.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies9, 3, Article 76 (2025). doi:10.1145/3749474

  14. [14]

    Andrea Coravos, Sean Khozin, and Kenneth D. Mandl. 2019. Developing and Adopting Safe and Effective Digital Biomarkers to Improve Patient Outcomes.npj Digital Medicine2, 1 (2019), 14. doi:10.1038/s41746-019-0090-4

  15. [15]

    Justin Cosentino, Ankur Soni, Enzo Capobianco, Daniel McDuff, Anup Majmundar, et al. 2025. A Personal Health Large Language Model.Nature Medicine(2025). doi:10.1038/s41591-024-03408-6

  16. [16]

    Daniel, James W

    Katharine E. Daniel, James W. Kinchen, Angela Chang, Patrick H. Finan, and Philip I. Chow. 2025. Identifying Time-Variant Predictors of Interest in Completing Brief Digital Mental Health Interventions Among Adult Survivors of Cancer: Ecological Momentary Assessment Study.JMIR mHealth and uHealth13 (2025), e69244. doi:10.2196/69244

  17. [17]

    de Jong et al

    Laura M. de Jong et al. 2025. Effectiveness of Just-in-Time Adaptive Interventions for Improving Mental Health and Psychological Well-Being: A Systematic Review and Meta-Analysis.Internet Interventions(2025). doi:10.1016/j.invent.2025.100814 23 studies, 2563 individuals; small between-group effect g=0.15

  18. [18]

    Kaidong Feng, Zhu Sun, Roy Ka-Wei Lee, Xun Jiang, Yin-Leng Theng, and Yi Ding. 2026. A Comparative Study of Traditional Machine Learning, Deep Learning, and Large Language Models for Mental Health Forecasting using Smartphone Sensing Data.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies(2026). https://arxiv.org/abs/2601....

  19. [19]

    Denzil Ferreira, Vassilis Kostakos, and Anind K. Dey. 2015. AWARE: Mobile Context Instrumentation Framework.Frontiers in ICT2 (2015), 6. doi:10.3389/fict.2015.00006

  20. [20]

    Simon B Goldberg, Daniel M Bolt, and Richard J Davidson. 2021. Data Missing Not at Random in Mobile Health Research: Assessment of the Problem and a Case for Sensitivity Analyses.Journal of Medical Internet Research23, 6 (2021), e26749. doi:10.2196/26749 0:28•Zhiyuan Wang, Ariful Islam, Indrajeet Ghosh, Xinyu Chen, Katharine E. Daniel, Subigya Nepal, Phil...

  21. [21]

    Goldstein, J

    Samantha P. Goldstein, J. Graham Thomas, Gary D. Foster, Gabrielle M. Turner-McGrievy, Rena R. Wing, and Dale S. Bond. 2023. How Notifications Affect Engagement With a Behavior Change App: Results From a Micro-Randomized Trial.JMIR mHealth and uHealth11 (2023), e38342. doi:10.2196/38342

  22. [22]

    James J. Gross. 2015. Emotion Regulation: Current Status and Future Prospects.Psychological Inquiry26, 1 (2015), 1–26. doi:10.1080/ 1047840X.2014.940781

  23. [23]

    Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Chen, and Fei Huang. 2024. Understanding the Planning of LLM Agents: A Survey.arXiv preprint arXiv:2402.02716(2024). https://arxiv.org/abs/2402.02716

  24. [24]

    Huckins, Alex W

    Jeremy F. Huckins, Alex W. daSilva, Weichen Wang, Elin Hedlund, Courtney Rogers, Subigya K. Nepal, Jialing Wu, Mikio Obuchi, Eilis I. Murphy, Meghan L. Meyer, Dylan D. Wagner, Paul E. Holtzheimer, and Andrew T. Campbell. 2020. Mental Health and Behavior of College Students During the Early Phases of the COVID-19 Pandemic: Longitudinal Smartphone and Ecolo...

  25. [25]

    Jacobson, Hilary Weingarden, and Sabine Wilhelm

    Nicholas C. Jacobson, Hilary Weingarden, and Sabine Wilhelm. 2019. Digital Biomarkers of Mood Disorders and Symptom Change.npj Digital Medicine2, 1 (2019), 3. doi:10.1038/s41746-019-0078-0

  26. [26]

    Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, and Hae Won Park. 2024. Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data. InProceedings of Machine Learning Research (CHIL), Vol. 248. 522–539

  27. [27]

    Predrag Klasnja, Eric B Hekler, Saul Shiffman, Audrey Boruvka, Daniel Almirall, Ambuj Tewari, and Susan A Murphy. 2015. Mi- crorandomized Trials: An Experimental Design for Developing Just-in-Time Adaptive Interventions.Health Psychology34, S (2015), 1220–1228

  28. [28]

    Seewald, Andy Lee, Kelly Hall, Bryan Luers, Eric B

    Predrag Klasnja, Shawna Smith, Nicholas J. Seewald, Andy Lee, Kelly Hall, Bryan Luers, Eric B. Hekler, and Susan A. Murphy. 2019. Efficacy of Contextually Tailored Suggestions for Physical Activity: A Micro-randomized Optimization Trial of HeartSteps.Annals of Behavioral Medicine53, 6 (2019), 573–582. doi:10.1093/abm/kay067

  29. [29]

    Florian Kunzler, Varun Mishra, Jan-Niklas Kramer, David Kotz, Elgar Fleisch, and Tobias Kowatsch. 2019. Exploring the State-of- Receptivity for mHealth Interventions.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies3, 4 (2019), 1–27

  30. [30]

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in Neural Information Processing Systems33 (2020), 9459–9474

  31. [31]

    Jiachen Li, Akshat Choube, et al. 2025. Vital Insight: Assisting Experts’ Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-in-the-Loop LLM.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies(2025). doi:10.1145/3749508

  32. [32]

    Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, and Flora D Salim. 2025. SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025). https://aclanthology.org/2025.emnlp-main.19

  33. [33]

    Yusheng Liao, Shuyang Jiang, Yanfeng Wang, and Yu Wang. 2025. ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL). https://arxiv.org/abs/2410.17657

  34. [34]

    Dey, and Daniel Gatica-Perez

    Lakmal Meegahapola, William Droz, Peter Kun, Numa de Lara, Damien Erber, Florian Flöck, Anna Giménez-Laudo, Kristóf Reyero, Duc Phan, Donghwi Shin, Anind K. Dey, and Daniel Gatica-Perez. 2023. Generalization and Personalization of Mobile Sensing-Based Mood Inference Models: An Analysis of College Students in Eight Countries.Proceedings of the ACM on Inter...

  35. [35]

    Mendes, Ivo R

    João Pedro S. Mendes, Ivo R. Moura, Peter Van de Ven, Davi Viana, Luciano R. Coutinho, and Carla Flora. 2023. Digital Phenotyping for Monitoring Mental Disorders: Systematic Review.Journal of Medical Internet Research25 (2023), e46778. doi:10.2196/46778

  36. [36]

    Mike A Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Melissa Perez, Md Mobashir Hasan Shandhi, et al. 2026. Transforming Wearable Data into Health Insights using Large Language Model Agents.Nature Communications17 (2026), 1265. doi:10.1038/s41467- 025-67922-y PHIA system

  37. [37]

    Munib Mesinovic, Peter Watkinson, and Tingting Zhu. 2025. Explainability in the Age of Large Language Models for Healthcare. Communications Engineering(2025). doi:10.1038/s44172-025-00453-y

  38. [38]

    Varun Mishra, Florian Kunzler, Jan-Niklas Kramer, Elgar Fleisch, Tobias Kowatsch, and David Kotz. 2021. Detecting Receptivity for mHealth Interventions in the Natural Environment. InProceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 5. 1–24

  39. [39]

    Mitchell, Melissa Chan, Hina Bhatti, Marie Halton, Luigi Grassi, Christoffer Johansen, and Nicholas Meader

    Alex J. Mitchell, Melissa Chan, Hina Bhatti, Marie Halton, Luigi Grassi, Christoffer Johansen, and Nicholas Meader. 2011. Prevalence of Depression, Anxiety, and Adjustment Disorder in Oncological, Haematological, and Palliative-Care Settings: A Meta-Analysis of 94 Interview-Based Studies.The Lancet Oncology12, 2 (2011), 160–174. doi:10.1016/S1470-2045(11)70002-X

  40. [40]

    Mohr, Kathryn Noth Tomasino, Emily G

    David C. Mohr, Kathryn Noth Tomasino, Emily G. Lattie, Hannah L. Palac, Mary J. Kwasny, Kenneth Weingardt, Chris J. Karr, Susan M. Kaiser, Rebecca C. Rossom, Leland R. Bardsley, Lauren Caccamo, Colleen Stiles-Shields, and Stephen M. Schueller. 2017. IntelliCare: An Eclectic, Skills-Based App Suite for the Treatment of Depression and Anxiety.Journal of Med...

  41. [41]

    Aja Louise Murray, Ruth Brown, Xinxin Zhu, Lydia Gabriela Speyer, Yi Yang, Zhouni Xiao, Denis Ribeaud, and Manuel Eisner. 2023. Prompt-level predictors of compliance in an ecological momentary assessment study of young adults’ mental health.Journal of Affective Disorders322 (2023), 125–131. doi:10.1016/j.jad.2022.11.014

  42. [42]

    Inbal Nahum-Shani, Shawna N Smith, Bonnie J Spring, Linda M Collins, Katie Witkiewitz, Ambuj Tewari, and Susan A Murphy. 2018. Just-in-Time Adaptive Interventions (JITAI) in Mobile Health: Key Components and Design Principles for Ongoing Health Behavior Support.Annals of Behavioral Medicine52, 6 (2018), 446–462

  43. [43]

    Smith, Ambuj Tewari, Katie Witkiewitz, Linda M

    Inbal Nahum-Shani, Shawna N. Smith, Ambuj Tewari, Katie Witkiewitz, Linda M. Collins, Bonnie J. Spring, and Susan A. Murphy. 2014. Just-in-Time Adaptive Interventions (JITAIs): An Organizing Framework for Ongoing Health Behavior Support. Technical Report 14-126. The Methodology Center, Penn State University

  44. [44]

    Georgiou

    Shrikanth Narayanan and Panayiotis G. Georgiou. 2013. Behavioral Signal Processing: Deriving Human Behavioral Informatics from Speech and Language.Proc. IEEE101, 5 (2013), 1203–1233. doi:10.1109/JPROC.2012.2236291

  45. [45]

    HUCKINS, COURTNEY ROGERS, MEGHAN L

    Subigya Nepal, Wenjun Liu, Arvind Pillai, Weichen Wang, Vlado Vojdanovski, Jeremy F. Huckins, Courtney Rogers, Meghan L. Meyer, and Andrew T. Campbell. 2024. Capturing the College Experience: A Four-Year Mobile Sensing Study of Mental Health, Resilience and Behavior of College Students during the Pandemic.Proceedings of the ACM on Interactive, Mobile, Wea...

  46. [46]

    HEINZ, ASHMITA KUNWAR, EUNSOL SOUL CHOI, XUHAI XU, JOANNA KUC, JEREMY F

    Subigya Nepal, Arvind Pillai, William Campbell, Marvin Heinz, Talie Griffin, Daniel Mackin, Amanda Collins, Nicholas C. Jacobson, and Andrew T. Campbell. 2024. MindScape Study: Integrating LLM and Behavioral Sensing for Personalized AI-Driven Journaling Experiences.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies8, 4 (20...

  47. [47]

    Subigya Nepal, Weichen Wang, Vlado Vojdanovski, Jeremy F Huckins, Alex Dasilva, Meghan Meyer, and Andrew Campbell. 2022. COVID student study: A year in the life of college students during the COVID-19 pandemic through the lens of mobile phone sensing. InProceedings of the 2022 CHI conference on human factors in computing systems. 1–19

  48. [48]

    Jukka-Pekka Onnela and Scott L. Rauch. 2016. Harnessing Smartphone-Based Digital Phenotyping to Enhance Behavioral and Mental Health.Neuropsychopharmacology41, 7 (2016), 1691–1696. doi:10.1038/npp.2016.7

  49. [49]

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training Language Models to Follow Instructions with Human Feedback. InAdvances in Neural Information Processing Systems (NeurIPS 2022), Vol. 35. 27730–27744

  50. [50]

    Sarthak Pati, Sourav Kumar, Amokh Varma, Brandon Edwards, Charles Lu, Liangqiong Qu, Justin J Wang, Anantharaman Lakshmi- narayanan, Shih-Han Wang, Micah J Sheller, Ken Chang, Praveer Singh, Daniel L Rubin, Jayashree Kalpathy-Cramer, and Spyridon Bakas. 2024. Privacy Preservation for Federated Learning in Health Care.Patterns5, 7 (2024), 100974. doi:10.10...

  51. [51]

    Gorilla: Large Language Model Connected with Massive APIs

    Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. 2024. Gorilla: Large Language Model Connected with Massive APIs. InAdvances in Neural Information Processing Systems (NeurIPS 2024), Vol. 37. https://arxiv.org/abs/2305.15334

  52. [52]

    Heinz, Tess Z

    Arvind Pillai, Subigya Nepal, William Campbell, Michael V. Heinz, Tess Z. Griffin, Nicholas C. Jacobson, and Andrew T. Campbell

  53. [53]

    In Proceedings of the Conference on Health, Inference, and Learning (CHIL)

    Beyond Prompting: Time2Lang—Bridging Time-Series Foundation Models and Large Language Models for Health Sensing. In Proceedings of the Conference on Health, Inference, and Learning (CHIL). PMLR. https://arxiv.org/abs/2502.07608

  54. [54]

    Emilio Qiu et al . 2024. Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges.Sensors24, 15 (2024), 5045. doi:10.3390/s24155045

  55. [55]

    Guanqiao Qu et al. 2024. Mobile Edge Intelligence for Large Language Models: A Contemporary Survey.arXiv preprint arXiv:2407.18921 (2024). https://arxiv.org/abs/2407.18921

  56. [56]

    Sohrab Saeb, Mi Zhang, Christopher J Karr, Stephen M Schueller, Marya E Corden, Konrad P Kording, and David C Mohr. 2015. Mobile Phone Sensor Correlates of Depressive Symptom Severity in Daily-Life Behavior: An Exploratory Study.Journal of Medical Internet Research17, 7 (2015), e175

  57. [57]

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. InAdvances in Neural Information Processing Systems (NeurIPS 2023). https://arxiv.org/abs/2302.04761 Oral presentation

  58. [58]

    Saul Shiffman, Arthur A Stone, and Michael R Hufford. 2008. Ecological momentary assessment.Annual Review of Clinical Psychology4, 1 (2008), 1–32

  59. [59]

    Singhal, S

    Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al . 2023. Large Language Models Encode Clinical Knowledge.Nature620, 7972 (2023), 172–180. doi:10.1038/s41586-023-06291-2

  60. [60]

    Robinson

    Bonnie Spring, Tammy Stump, Frank Penedo, Angela Fidler Pfammatter, and June K. Robinson. 2019. Toward a Health-Promoting System for Cancer Survivors: Patient and Provider Multiple Behavior Change.Health Psychology38, 9 (2019), 840–850. doi:10.1037/hea0000760

  61. [61]

    John Torous, Jukka-Pekka Onnela, and Matcheri Keshavan. 2017. New Dimensions and New Tools to Realize the Potential of RDoC: Digital Phenotyping via Smartphones and Connected Devices.Translational Psychiatry7, 3 (2017), e1075. doi:10.1038/tp.2017.25 0:30•Zhiyuan Wang, Ariful Islam, Indrajeet Ghosh, Xinyu Chen, Katharine E. Daniel, Subigya Nepal, Philip Ch...

  62. [62]

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models.arXiv preprint arXiv:2302.13971(2023). https://arxiv.org/abs/2302.13971

  63. [63]

    Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, et al. 2024. Towards Conversational Diagnostic AI.arXiv preprint arXiv:2401.05654(2024). AMIE system

  64. [64]

    The Canadian Journal of Psychiatry64(7), 456–464 (2019).https: //doi.org/10.1177/0706743719828977

    Aditya Nrusimha Vaidyam, Hannah Wisniewski, John David Halamka, Matcheri S. Kashavan, and John Blake Torous. 2019. Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape.The Canadian Journal of Psychiatry64, 7 (2019), 456–464. doi:10.1177/0706743719828977

  65. [65]

    Gomez, Łukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. InAdvances in Neural Information Processing Systems (NeurIPS 2017), Vol. 30. 5998–6008

  66. [66]

    Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An Open-Ended Embodied Agent with Large Language Models. InAdvances in Neural Information Processing Systems (NeurIPS 2023). https://arxiv.org/abs/2305.16291

  67. [67]

    Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. 2024. A survey on large language model based autonomous agents.Frontiers of Computer Science18, 6 (2024), 186345

  68. [68]

    Rui Wang, Min S. H. Aung, Saeed Abdullah, Rachel Brian, Andrew T. Campbell, Tanzeem Choudhury, Marta Hauser, John Kane, Michael Merrill, Emily A. Scherer, Vincent W. S. Tseng, and Dror Ben-Zeev. 2016. CrossCheck: Toward Passive Sensing and Detection of Mental Health Changes in People with Schizophrenia. InProceedings of the 2016 ACM International Joint Co...

  69. [69]

    Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T Campbell

  70. [70]

    In Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing

    StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing. 3–14

  71. [71]

    Rui Wang, Weichen Wang, Alex DaSilva, Jeremy F Huckins, William M Kelley, Todd F Heatherton, and Andrew T Campbell. 2018. Tracking depression dynamics in college students using mobile phone and wearable sensing.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies2, 1 (2018), 1–26

  72. [72]

    Xin Wang, Ting Dang, Vassilis Kostakos, and Hong Jia. 2024. Efficient and Personalized Mobile Health Event Prediction via Small Language Models. InProceedings of the 30th Annual International Conference on Mobile Computing and Networking (MobiCom ’24). ACM, 2353–2358. doi:10.1145/3636534.3698123

  73. [73]

    Barnes, and Philip Chow

    Zhiyuan Wang, Katharine E Daniel, Laura E. Barnes, and Philip Chow. 2026. Inferring Affect and Intervention Opportunities for Cancer Survivors from Digital Diaries with Context-Aware LLMs. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems(Barcelona, Spain)(CHI ’26). Association for Computing Machinery, New York, NY, USA, Artic...

  74. [74]

    David Watson, Lee Anna Clark, and Auke Tellegen. 1988. Development and validation of brief measures of positive and negative affect: The PANAS scales.Journal of Personality and Social Psychology54, 6 (1988), 1063–1070

  75. [75]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2022. Chain- of-Thought Prompting Elicits Reasoning in Large Language Models.Advances in Neural Information Processing Systems35 (2022), 24824–24837

  76. [76]

    Huatao Xu, Liying Han, Qirui Yang, Mo Li, and Mani Srivastava. 2024. Penetrative AI: Making LLMs Comprehend the Physical World. InFindings of the Association for Computational Linguistics: ACL 2024. 7324–7341

  77. [77]

    Wenxuan Xu, Arvind Pillai, Subigya Nepal, Amanda C Collins, Daniel M Mackin, Michael V Heinz, Tess Z Griffin, Nicholas C Jacobson, and Andrew Campbell. 2025. LENS: LLM-Enabled Narrative Synthesis for Mental Health by Aligning Multimodal Sensing with Language Models.arXiv preprint arXiv:2512.23025(2025)

  78. [78]

    Kuehn, Jeremy F

    Xuhai Xu, Xin Liu, Han Zhang, Weichen Wang, Subigya Nepal, Yasaman Sefidgar, Woosuk Seo, Kevin S. Kuehn, Jeremy F. Huckins, Margaret E. Morris, Paula S. Nurius, Eve A. Riskin, Shwetak Patel, Tim Althoff, Andrew Campbell, Anind K. Dey, and Jennifer Mankoff

  79. [79]

    Kuehn, Jeremy F

    GLOBEM: Cross-Dataset Generalization of Longitudinal Human Behavior Modeling.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies6, 4, Article 190 (2022), 34 pages. doi:10.1145/3569485 Distinguished Paper Award, UbiComp 2023

  80. [80]

    Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K Dey, and Dakuo Wang

Showing first 80 references.