A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism

Bin Liu; Chuanbo Hu; Lynn K. Paul; Minglei Yin; Shuo Wang; Wenqi Li; Xin Li

arxiv: 2605.22993 · v1 · pith:Z5MOCWPH · submitted 2026-05-21 · 💻 cs.CL · cs.AI


Chuanbo Hu, Minglei Yin, Bin Liu, Wenqi Li, Lynn K. Paul, Shuo Wang, Xin Li

Pith reviewed 2026-05-25 05:41 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords autism · social language disorder · dialogue framework · multi-agent system · ADOS-2 · trait assessment · proactive questioning · LLM


The pith

A proactive multi-agent system called TPA raises simulated social language disorder (SLD) trait coverage to 82.1 percent by having an AI doctor agent track unobserved traits and pick targeted questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Characteristic language traits in autism often remain hidden in ordinary talk and only surface under particular questioning conditions. The paper tests whether an LLM-based doctor agent can improve diagnostic yield by first identifying which traits have not yet appeared and then selecting a matching clinical strategy before asking the next question. A second agent simulates patient responses grounded in real ADOS-2 sessions, allowing repeated trials without live participants. Across 484 simulated episodes the method reaches 82.1 percent trait coverage and an area-under-curve score (AUCC, a measure of per-turn efficiency) of 0.628, exceeding both six automated dialogue-planning baselines and automated replays of actual clinician dialogues. The gain in per-turn efficiency suggests that deliberate strategy choice can make automated screening more practical.

Core claim

TPA lets the doctor agent reason explicitly over remaining unobserved SLD traits, select a clinically grounded questioning strategy, and generate the next utterance, producing 82.1 percent trait coverage and an AUCC of 0.628 on 484 episodes from 35 patients—16.6 points above the 65.5 percent coverage obtained from automated replays of real clinician dialogues and 0.170 above their AUCC of 0.458.
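The abstract does not spell out how trait coverage and AUCC are computed, a gap the referee also raises. The sketch below assumes that coverage is the fraction of target traits observed at least once in an episode, and that AUCC is the area under the cumulative coverage curve normalized by the number of turns (that is, the mean per-turn coverage). Both definitions, the trait labels, and the function names are illustrative assumptions, not the paper's code.

```python
from typing import List, Set


def coverage_curve(observed_per_turn: List[Set[str]], all_traits: Set[str]) -> List[float]:
    """Cumulative fraction of target traits seen at least once by the end of each turn."""
    if not all_traits:
        raise ValueError("all_traits must be non-empty")
    seen: Set[str] = set()
    curve: List[float] = []
    for turn_traits in observed_per_turn:
        seen |= turn_traits & all_traits  # ignore labels outside the target set
        curve.append(len(seen) / len(all_traits))
    return curve


def aucc(curve: List[float]) -> float:
    """Assumed AUCC: area under the cumulative coverage curve divided by episode length.

    A value of 1.0 would mean every target trait surfaced on the first turn.
    """
    return sum(curve) / len(curve) if curve else 0.0


# Hypothetical four-turn episode with three target traits.
traits = {"echoic_repetition", "pronoun_displacement", "stereotyped_quoting"}
episode = [set(), {"echoic_repetition"}, set(), {"pronoun_displacement"}]
curve = coverage_curve(episode, traits)
final_coverage, episode_aucc = curve[-1], aucc(curve)
```

Under these assumptions, an episode that surfaces traits early scores a higher AUCC than one that reaches the same final coverage late. That is why the 0.170 AUCC gap (0.628 vs. 0.458) can be read as a per-turn efficiency gain separate from the 16.6-point coverage gap (82.1% vs. 65.5%).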

What carries the argument

The Think-Plan-Ask loop: the doctor agent first enumerates unobserved traits, then chooses a strategy from a clinically defined set, then produces the question.
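The Think-Plan-Ask loop can be sketched as three small functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the trait names, the priority order, the `STRATEGIES` table, and the prompt wording are hypothetical placeholders, and in TPA each step is carried out by an LLM rather than by fixed rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical trait -> strategy table; the paper's clinically defined set is not reproduced here.
STRATEGIES: Dict[str, str] = {
    "stereotyped_quoting": "invite a retelling of a favourite film or TV scene",
    "pronoun_displacement": "ask the patient to describe an event involving another person",
    "echoic_repetition": "pose an open question that reuses unusual wording",
}


@dataclass
class DoctorState:
    targets: List[str]  # target traits, in an assumed priority order
    observed: Set[str] = field(default_factory=set)


def think(state: DoctorState) -> List[str]:
    """Think: list target traits not yet observed, preserving priority order."""
    return [t for t in state.targets if t not in state.observed]


def plan(unobserved: List[str]) -> Optional[str]:
    """Plan: choose the highest-priority unobserved trait that has a known strategy."""
    for trait in unobserved:
        if trait in STRATEGIES:
            return trait
    return None


def ask(trait: str) -> str:
    """Ask: build the (hypothetical) instruction an LLM would turn into the next question."""
    return (f"Target trait: {trait}. Strategy: {STRATEGIES[trait]}. "
            "Write one natural, non-leading question for the patient.")


def next_turn(state: DoctorState) -> Optional[str]:
    """One Think -> Plan -> Ask step; None once no actionable trait remains."""
    trait = plan(think(state))
    return ask(trait) if trait is not None else None
```

The design choice the sketch highlights is that the question is conditioned on what is still missing, so observing a trait (adding it to `observed`) immediately redirects the next question toward a different trait.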

If this is right

Substantially more diagnostic information per conversation turn than either competing dialogue-planning baselines or replayed clinical dialogues.
Reproducible, repeatable evaluation of dialogue policies without requiring live patient participation.
Outperformance on every primary metric against six competitive planning baselines.
Direct applicability to the language-assessment portion of ADOS-2 Module 4.
Demonstration that proactive strategy selection improves automated SLD trait assessment efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same patient-simulation approach could test dialogue policies for other disorders whose signs appear only under narrow conversational conditions.
Combining TPA outputs with human clinician oversight might further raise coverage while preserving safety.
If the efficiency gain holds, fewer total turns would be needed to reach a given diagnostic threshold, lowering assessment cost.
The framework could be adapted to train human clinicians by showing which strategies surface which traits most reliably.
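The turns-to-threshold argument in the third point can be made concrete with a small helper that reads a per-turn cumulative coverage curve and reports the first turn at which coverage reaches a target. The function names, curves, and threshold values here are illustrative assumptions, not quantities reported by the paper.

```python
from typing import List, Optional


def turns_to_threshold(curve: List[float], threshold: float) -> Optional[int]:
    """1-based index of the first turn whose cumulative coverage reaches threshold, else None."""
    for turn, coverage in enumerate(curve, start=1):
        if coverage >= threshold:
            return turn
    return None


def turns_saved(curve_a: List[float], curve_b: List[float], threshold: float) -> Optional[int]:
    """How many fewer turns policy A needs than policy B to hit the same threshold.

    Returns None if either policy never reaches the threshold.
    """
    ta = turns_to_threshold(curve_a, threshold)
    tb = turns_to_threshold(curve_b, threshold)
    if ta is None or tb is None:
        return None
    return tb - ta
```

With hypothetical curves `[0.2, 0.5, 0.8, 0.9]` and `[0.1, 0.3, 0.5, 0.6, 0.8]`, the first policy reaches 0.8 coverage two turns earlier; whether such savings translate into lower assessment cost would still have to be checked with real patients.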

Load-bearing premise

The patient agent, built from real ADOS-2 transcripts, generates replies that closely match how actual patients would respond to the new questions chosen by TPA.

What would settle it

Administer the exact question sequences generated by TPA to real patients and compare the resulting SLD trait detection rate against the 82.1 percent obtained in simulation.
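One way to run this comparison, sketched under simplifying assumptions: treat each (patient, trait) opportunity as an independent binary detection, compute a Wilson score interval for the real-patient detection rate, and check whether the simulated 82.1% falls inside it. Independence is optimistic, since traits within one patient are correlated, so a patient-clustered analysis would be more defensible; the function names and counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half


def consistent_with_simulation(successes: int, trials: int, simulated: float = 0.821) -> bool:
    """True if the simulated coverage lies inside the real-patient Wilson interval."""
    lo, hi = wilson_interval(successes, trials)
    return lo <= simulated <= hi
```

For example, 82 detections out of 100 trait opportunities gives an interval of roughly 0.73 to 0.88, which contains 0.821, whereas 50 out of 100 does not.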

Original abstract

Characteristic linguistic behaviors associated with Social Language Disorder (SLD) in autism spectrum disorder, including echoic repetition, pronoun displacement, and stereotyped media quoting, are largely absent from spontaneous conversation and only emerge under specific conversational conditions. In structured clinical assessments, this latency means that questioning strategy selection is a critical yet underappreciated determinant of how much diagnostic information a conversation yields. Whether large language models (LLMs) can be guided to proactively select questioning strategies that systematically surface these latent traits remains largely unexplored. Here we present TPA (Think, Plan, Ask), a proactive multi-agent dialogue framework applied to the language assessment component of the Autism Diagnostic Observation Schedule Module 4 (ADOS-2), in which a doctor agent explicitly reasons about which traits remain unobserved before selecting a clinically grounded strategy and generating a targeted question. A patient agent grounded in real ADOS-2 clinical data enables reproducible evaluation without real patient participation, validated across three independent experiments confirming adequate fidelity to real patient language. Evaluated on 484 episodes from 35 patients, TPA outperforms six competitive dialogue planning baselines across all primary metrics, achieving 82.1% SLD trait coverage, 16.6 percentage points higher than automated replay of real clinical dialogues conducted by trained clinicians (65.5%), with substantially greater per-turn diagnostic efficiency (AUCC: 0.628 vs. 0.458, absolute gain +0.170). These results demonstrate that proactive questioning strategy selection substantially improves the efficiency of automated SLD trait assessment, with direct implications for scalable AI-assisted clinical screening.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

TPA applies multi-agent LLMs to pick targeted questions for surfacing latent autism language traits, with gains that rest on how faithfully the patient simulator handles those novel questions.


The paper's main move is a doctor agent that reasons about which SLD traits are still missing, picks a clinically motivated strategy, and asks a focused question. That proactive loop is a straightforward but new application of multi-agent setups to the ADOS-2 language module, where trait latency makes strategy choice important. The patient agent is built from real clinical data and checked for basic fidelity in three experiments, which lets the authors run 484 episodes without live patients. The headline numbers (82.1% trait coverage against 65.5% for clinician dialogue replays, and a 0.170 AUCC lift) look like a concrete efficiency win if the simulation is reliable. The authors deserve credit for staying grounded in actual ADOS-2 material rather than building synthetic patients from scratch.

The central risk is exactly the load-bearing premise identified above. The validation experiments are described only at a high level, so it is not clear whether they tested the agent's responses to the out-of-distribution questions TPA actually generates. If the simulator smooths over the conditional appearance of echoic repetition or pronoun displacement under those questions, the measured advantage over the replay baseline will not carry over to real conversations. Baseline implementations and any statistical tests are also left unspecified in the abstract, which makes it hard to judge how much of the gap is real versus setup-dependent. There is no circularity in the evaluation itself, since comparisons use external replays.

This is for groups working on AI-assisted clinical screening or multi-agent dialogue in constrained domains. A reader who needs a reproducible testbed for questioning strategies would find usable pieces here. The underlying idea is coherent and the motivation tracks clinical practice, so the work deserves referee time even though the simulation details will need tightening.

Referee Report

2 major / 2 minor

Summary. The paper presents TPA (Think, Plan, Ask), a proactive multi-agent dialogue framework for assessing Social Language Disorder (SLD) traits (echoic repetition, pronoun displacement, stereotyped quoting) in the language component of ADOS-2. A doctor agent explicitly reasons over unobserved traits before selecting a clinically grounded strategy and generating a targeted question; a patient agent is constructed from real ADOS-2 clinical data. On 484 episodes from 35 patients, TPA is reported to reach 82.1% SLD trait coverage (16.6 points above automated replay of real clinician dialogues at 65.5%) and AUCC 0.628 (vs. 0.458), outperforming six dialogue-planning baselines. The central claim is that explicit proactive strategy selection materially improves diagnostic efficiency in automated SLD assessment.

Significance. If the patient-agent fidelity holds under out-of-distribution proactive questions, the result would supply concrete evidence that LLM-based agents can improve the yield of structured clinical dialogues without real-patient participation. The reproducible evaluation setup (three validation experiments on a grounded simulator) is a methodological strength that could support follow-on work in scalable screening. The magnitude of the reported gains (+0.170 AUCC, +16.6 percentage points of coverage) would be clinically relevant if the simulation preserves the conditional sparsity of latent traits.

major comments (2)

[Abstract / Evaluation] Abstract and evaluation description: the headline metrics (82.1% coverage, AUCC 0.628) are obtained exclusively by running TPA and baselines against the same patient agent. The paper states the agent is 'grounded in real ADOS-2 clinical data' and 'validated across three independent experiments confirming adequate fidelity,' yet supplies no quantitative comparison of trait-latency distributions (e.g., conditional probability of echoic repetition or pronoun displacement) under TPA-generated questions versus the real-clinician replay distribution. Because the measured gains are defined relative to this simulator, the absence of such evidence makes it impossible to determine whether the 16.6-point improvement reflects clinical reality or simulation artifact.
[Evaluation] Evaluation section: the abstract asserts that TPA 'outperforms six competitive dialogue planning baselines across all primary metrics' and reports specific numbers, but provides no description of baseline implementations, hyper-parameter choices, statistical tests, confidence intervals, or patient exclusion criteria. Without these details the numerical claims cannot be assessed for robustness or reproducibility.

minor comments (2)

[Abstract] The acronym AUCC is used without expansion on first appearance; clarify whether it denotes area under a cumulative-coverage curve or another quantity.
[Methods] The patient-agent construction is described only at high level; a short paragraph or table summarizing the three fidelity experiments (e.g., metrics, sample sizes) would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments on our manuscript. We address each of the major comments below and indicate where revisions will be made to improve clarity and completeness.


Referee: [Abstract / Evaluation] Abstract and evaluation description: the headline metrics (82.1% coverage, AUCC 0.628) are obtained exclusively by running TPA and baselines against the same patient agent. The paper states the agent is 'grounded in real ADOS-2 clinical data' and 'validated across three independent experiments confirming adequate fidelity,' yet supplies no quantitative comparison of trait-latency distributions (e.g., conditional probability of echoic repetition or pronoun displacement) under TPA-generated questions versus the real-clinician replay distribution. Because the measured gains are defined relative to this simulator, the absence of such evidence makes it impossible to determine whether the 16.6-point improvement reflects clinical reality or simulation artifact.

Authors: We agree that providing explicit quantitative comparisons of trait-latency distributions under different questioning strategies would strengthen the validation of the patient agent. While the three independent experiments confirm overall fidelity to real patient language patterns, we will add in the revised manuscript specific analyses comparing conditional probabilities of SLD traits (echoic repetition, pronoun displacement, stereotyped quoting) when the patient agent is queried with TPA-generated questions versus the real clinician dialogue replays. This will help demonstrate that the patient responses remain consistent with clinical data even under proactive questioning. revision: yes
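The promised comparison could start from per-trait emergence rates: for each trait, the fraction of patient turns in which it is annotated, computed separately for simulator turns that answer TPA-generated questions and for real turns from the clinician dialogues. The sketch assumes each turn has already been annotated with a set of trait labels; conditioning on a questioning strategy amounts to passing in only the turns that followed that strategy. Function names and labels are hypothetical, and a full analysis would add uncertainty estimates to each gap.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def emergence_rates(turns: List[Set[str]], traits: Iterable[str]) -> Dict[str, float]:
    """Per-trait fraction of patient turns in which the trait was annotated."""
    counts = Counter(t for turn in turns for t in turn)  # sets: at most one count per turn
    n = len(turns)
    return {t: (counts[t] / n if n else 0.0) for t in traits}


def rate_gaps(sim: List[Set[str]], real: List[Set[str]], traits: List[str]) -> Dict[str, float]:
    """Absolute per-trait difference between simulated and real emergence rates."""
    rs, rr = emergence_rates(sim, traits), emergence_rates(real, traits)
    return {t: abs(rs[t] - rr[t]) for t in traits}
```

Large gaps for a trait would flag exactly the failure mode the referee describes: a simulator that produces a trait more (or less) readily than real patients do under the new questions.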
Referee: [Evaluation] Evaluation section: the abstract asserts that TPA 'outperforms six competitive dialogue planning baselines across all primary metrics' and reports specific numbers, but provides no description of baseline implementations, hyper-parameter choices, statistical tests, confidence intervals, or patient exclusion criteria. Without these details the numerical claims cannot be assessed for robustness or reproducibility.

Authors: We acknowledge the need for greater detail in the evaluation section to support reproducibility. In the revised manuscript, we will expand the Evaluation section to include: (1) full descriptions and implementations of all six baseline methods, (2) hyperparameter choices and tuning procedures, (3) statistical tests used (e.g., paired t-tests or Wilcoxon tests) with p-values and confidence intervals for the reported metrics, and (4) explicit patient exclusion criteria and dataset split details. These additions will allow readers to fully assess the robustness of our claims. revision: yes
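For the promised significance testing, a paired design fits the setup, since every policy is run against the same patients. The sketch below uses a Monte Carlo sign-flip permutation test on per-patient metric differences, a distribution-free alternative swapped in for the paired t-test or Wilcoxon test the authors mention. Averaging episodes within each patient first is an assumption here; it avoids treating the 484 episodes from 35 patients as independent. The function name and defaults are illustrative.

```python
import random
from statistics import mean
from typing import List


def paired_sign_flip_test(a: List[float], b: List[float],
                          n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided Monte Carlo sign-flip permutation p-value for mean(a - b) != 0.

    a[i] and b[i] are the same patient's averaged metric (e.g. coverage) under two policies.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty paired samples of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # seeded for reproducible p-values
    extreme = 0
    for _ in range(n_resamples):
        # Under the null, each patient's difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)
```

With 35 patients, exact enumeration of all 2^35 sign patterns is impractical, which is why the sketch samples sign flips instead.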

Circularity Check

0 steps flagged

No significant circularity detected


The paper reports empirical performance of TPA versus external baselines on a simulated patient agent whose fidelity is asserted via separate validation experiments against real ADOS-2 data. No equations, parameter fits, self-definitional loops, or load-bearing self-citations are present that would reduce the reported metrics (82.1% coverage, AUCC gains) to the inputs by construction. The evaluation chain remains externally benchmarked rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the fidelity of the simulated patient agent to real clinical language data; this is asserted but not independently evidenced beyond the abstract's mention of three validation experiments.

axioms (1)

Domain assumption: The patient agent grounded in real ADOS-2 clinical data has adequate fidelity to real patient language.
Stated directly in the abstract as the basis for reproducible evaluation without real patients.

invented entities (1)

TPA (Think, Plan, Ask) multi-agent framework (no independent evidence)
purpose: Proactive selection of questioning strategies to surface latent SLD traits
New framework introduced by the authors; no independent evidence outside the paper is provided.

