Human-computer interactions predict mental health
Pith reviewed 2026-05-17 05:04 UTC · model grok-4.3
The pith
Everyday cursor and touchscreen interactions encode mental health states that machine learning can extract accurately.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MAILA is a machine-learning framework trained on 18,200 cursor and touchscreen recordings paired with 1.3 million mental-health self-reports from 9,500 participants. It tracks dynamic mental states along 13 clinically relevant dimensions, resolves circadian fluctuations and experimental manipulations of arousal and valence, achieves near-ceiling accuracy at the group level, captures information only partially reflected in verbal self-report, and improves the ability of large language models to infer user mental health.
What carries the argument
MAILA, the MAchine-learning framework for Inferring Latent mental states from digital Activity, which extracts psychological signatures from patterns in cursor movements and touchscreen interactions.
If this is right
- Mental health can be assessed continuously and scalably through normal device use.
- Daily fluctuations in mental states become trackable without additional user effort.
- Large language models infer user mental health more accurately with MAILA than without it.
- Digital phenotyping gains human-computer interaction as a new, untapped data modality.
Where Pith is reading between the lines
- Apps could integrate similar tracking to provide real-time mental health insights during routine use.
- The method may reduce reliance on self-reports alone in both research and clinical settings.
- Generalization tests across devices, cultures, and clinical populations would clarify real-world limits.
Load-bearing premise
Self-reported mental health labels serve as accurate ground truth, and the interaction data contain no major unmeasured confounds from device type, task demands, demographics, or reporting biases.
What would settle it
An experiment in which participants undergo controlled manipulations of arousal or valence and the model fails to detect corresponding changes from their cursor or touchscreen data alone.
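One way to make this test concrete is sketched below. It is an illustration under stated assumptions, not the paper's procedure: the function name `sign_flip_p_value` is hypothetical, and the inputs are assumed to be one model-predicted score per participant before and after the manipulation. The sketch runs an exact paired sign-flip permutation test, which asks how often a mean change at least as large as the observed one would arise if the manipulation had no effect on the model's output.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(baseline, manipulated):
    """Exact two-sided paired sign-flip test on per-participant differences.

    baseline, manipulated: model-predicted scores for the same participants
    (hypothetical inputs; one value per participant, same order).
    """
    if len(baseline) != len(manipulated) or not baseline:
        raise ValueError("need equal-length, non-empty paired samples")
    diffs = [m - b for b, m in zip(baseline, manipulated)]
    observed = abs(mean(diffs))
    # Under the null hypothesis each difference is equally likely to carry
    # either sign, so enumerate all 2**n sign assignments.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With four participants whose predicted score rises by 1, 2, 3 and 4 units, the function returns 2/16 = 0.125, the smallest two-sided p-value four pairs can produce. Full enumeration grows as 2**n, so larger samples would need random sign flips instead. A failure to detect the manipulation would show up as a large p-value despite adequate sample size.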
Original abstract
Scalable assessments of mental illness remain a critical roadblock toward accessible and equitable care. Here, we show that everyday human-computer interactions encode mental health with biomarker accuracy. We introduce MAILA, a MAchine-learning framework for Inferring Latent mental states from digital Activity. We trained MAILA on 18,200 cursor and touchscreen recordings labeled with 1.3 million mental-health self-reports collected from 9,500 participants. MAILA tracks dynamic mental states along 13 clinically relevant dimensions, resolves circadian fluctuations and experimental manipulations of arousal and valence, achieves near-ceiling accuracy at the group level, captures information that is only partially reflected in verbal self-report, and improves the ability of large language models to infer user mental health. By extracting signatures of psychological function that have so far remained untapped, MAILA establishes human-computer interactions as a new modality for scalable digital phenotyping and a foundation for context-aware artificial intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that everyday human-computer interactions encode mental health with biomarker accuracy. It introduces MAILA, a machine-learning framework trained on 18,200 cursor and touchscreen recordings labeled with 1.3 million mental-health self-reports from 9,500 participants. MAILA is said to track dynamic mental states along 13 clinically relevant dimensions, resolve circadian fluctuations and experimental manipulations of arousal and valence, achieve near-ceiling accuracy at the group level, capture information only partially reflected in verbal self-report, and improve large language models' ability to infer user mental health.
Significance. If the central claims hold after addressing validation concerns, this work could significantly advance scalable digital phenotyping by establishing human-computer interactions as a new, passive modality for mental health assessment. The large-scale dataset and multi-dimensional approach represent a strength, potentially leading to more accessible care and context-aware AI systems.
major comments (3)
- [Abstract] The assertion that MAILA 'captures information that is only partially reflected in verbal self-report' is load-bearing for the novelty claim but is not supported by any described comparison to independent non-self-report criteria such as clinical interview scores or physiological markers.
- [Methods] The manuscript provides insufficient detail on validation splits, error bars, cross-validation strategy, and explicit controls for confounds including device type, task demands, demographics, and reporting biases, which are critical to evaluate whether the near-ceiling group-level accuracy reflects latent mental states rather than spurious correlations with self-report style.
- [Results] The claim that MAILA resolves experimental manipulations of arousal and valence is not accompanied by any account of how the same confounds are controlled in the large observational dataset, leaving the group-level accuracy vulnerable to alternative explanations.
minor comments (2)
- [Abstract] The acronym expansion for MAILA could be formatted more clearly on first use to improve immediate readability.
- Tables or figures presenting the 13 dimensions would benefit from explicit labeling of each dimension and associated performance metrics for clarity.
Simulated Author's Rebuttal
We are grateful to the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.
-
Referee: [Abstract] The assertion that MAILA 'captures information that is only partially reflected in verbal self-report' is load-bearing for the novelty claim but is not supported by any described comparison to independent non-self-report criteria such as clinical interview scores or physiological markers.
Authors: We thank the referee for this important observation. The statement in the abstract is based on supplementary analyses in the manuscript demonstrating that MAILA-derived predictions from human-computer interaction data account for unique variance in mental health self-reports not explained by other self-report measures alone. However, we acknowledge that this does not constitute validation against independent criteria such as clinical interviews or physiological markers, which are not available in our dataset. In the revised version, we will qualify this claim in the abstract and add a dedicated limitations section discussing the reliance on self-report labels and the need for future validation with clinical data. revision: yes
-
Referee: [Methods] The manuscript provides insufficient detail on validation splits, error bars, cross-validation strategy, and explicit controls for confounds including device type, task demands, demographics, and reporting biases, which are critical to evaluate whether the near-ceiling group-level accuracy reflects latent mental states rather than spurious correlations with self-report style.
Authors: We apologize for any lack of clarity in the methods description. The original manuscript uses participant-wise cross-validation to prevent data leakage, with error bars representing standard errors across cross-validation folds. To address the referee's concern, we will expand the Methods section with additional details on the validation strategy, including the exact split ratios and how confounds were controlled. Specifically, we will report subgroup analyses by device type and demographics, and include statistical controls for task demands and potential reporting biases where applicable. These additions will improve transparency and allow readers to better assess the robustness of our findings. revision: yes
-
Referee: [Results] The claim that MAILA resolves experimental manipulations of arousal and valence is not accompanied by any account of how the same confounds are controlled in the large observational dataset, leaving the group-level accuracy vulnerable to alternative explanations.
Authors: Thank you for pointing this out. The experimental manipulations of arousal and valence were conducted in a controlled experimental arm of the study, separate from the large observational dataset, and we report the results after accounting for time-of-day and other basic variables. For the observational data, we have applied controls for circadian effects and basic demographics. We agree that more explicit documentation is warranted. In the revision, we will add a paragraph in the Results section detailing the confound control procedures applied to the observational dataset and include any relevant sensitivity analyses. This will help rule out alternative explanations and strengthen the interpretation. revision: yes
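The participant-wise cross-validation and fold-level standard errors mentioned in the Methods response can be illustrated with a minimal sketch. It assumes only what the rebuttal states (splits are made by participant, and error bars are standard errors across folds); the function names `participant_folds` and `standard_error`, the round-robin assignment, and the fold count are hypothetical choices, not details from the manuscript.

```python
import random
import statistics
from math import sqrt


def participant_folds(participant_ids, n_folds=5, seed=0):
    """Return (train_idx, test_idx) pairs in which no participant
    contributes recordings to both sides of a split."""
    unique_ids = sorted(set(participant_ids))
    if n_folds < 2 or n_folds > len(unique_ids):
        raise ValueError("n_folds must be between 2 and the number of participants")
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Deal shuffled participants round-robin into folds (an assumption).
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique_ids)}
    folds = []
    for k in range(n_folds):
        test_idx = [i for i, pid in enumerate(participant_ids) if fold_of[pid] == k]
        train_idx = [i for i, pid in enumerate(participant_ids) if fold_of[pid] != k]
        folds.append((train_idx, test_idx))
    return folds


def standard_error(fold_scores):
    """Standard error of the mean across folds: sample SD / sqrt(n)."""
    return statistics.stdev(fold_scores) / sqrt(len(fold_scores))
```

Splitting by participant rather than by recording matters because several recordings from one person share idiosyncratic movement habits; if they straddled the split, a model could score well by recognizing the person rather than the mental state.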
Circularity Check
No circularity: empirical supervised learning on held-out self-report labels
full rationale
The paper trains MAILA on cursor/touchscreen features to predict held-out self-reported mental health labels collected from participants. This is standard supervised machine learning with performance measured on data splits independent of the training set; the reported accuracy is not equivalent to the input labels by construction, nor does any derivation step reduce to a self-definition, fitted parameter renamed as prediction, or self-citation chain. Claims about capturing information partially independent of verbal self-report and resolving experimental manipulations are presented as empirical outcomes rather than definitional. The derivation chain is therefore self-contained against the external benchmark of the collected labels and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
MAILA uses unsupervised representation learning to encode each participant’s cursor or touchscreen activity as a distribution over stereotyped movement patterns... LSTM autoencoder... K-means clusters... support vector regression
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
We trained MAILA on 18,200 cursor and touchscreen recordings labeled with 1.3 million mental-health self-reports
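The first quoted passage describes encoding each recording as a distribution over stereotyped movement patterns. A minimal sketch of that single step follows, under loud assumptions: the segment embeddings are taken as given (the passage attributes them to an LSTM autoencoder, which is not reproduced here), the cluster centroids are assumed to be already fitted (the passage names K-means), and the function name `motif_distribution` is hypothetical. The resulting vector is the kind of feature a downstream regressor, such as the support vector regression named in the passage, could consume.

```python
from collections import Counter
from math import dist


def motif_distribution(segment_embeddings, centroids):
    """Fraction of a recording's segments assigned to each centroid.

    segment_embeddings: one vector per movement segment (assumed to come
    from an upstream encoder that is not shown).
    centroids: pre-fitted cluster centers of the same dimensionality.
    """
    if not segment_embeddings:
        raise ValueError("need at least one segment")
    # Assign every segment to its nearest centroid (Euclidean distance).
    counts = Counter(
        min(range(len(centroids)), key=lambda k: dist(seg, centroids[k]))
        for seg in segment_embeddings
    )
    n = len(segment_embeddings)
    # Normalize counts so the output sums to 1 for any recording length.
    return [counts[k] / n for k in range(len(centroids))]
```

Normalizing by the number of segments makes recordings of different lengths comparable, which is one plausible reason to use a distribution rather than raw counts.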
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.