pith. machine review for the scientific record.

arxiv: 2605.02916 · v1 · submitted 2026-04-09 · 💻 cs.LG

Recognition: unknown

From Synthesis to Clinical Assistance: A Strategy-Aware Agent Framework for Autism Intervention based on Real Clinical Dataset


Pith reviewed 2026-05-10 17:30 UTC · model grok-4.3

classification 💻 cs.LG
keywords autism spectrum disorder · applied behavior analysis · multi-agent framework · dialogue synthesis · synthetic clinical data · strategy consistency · small language models · clinical decision support

The pith

ASDAgent uses an O-T-A-C reasoning loop and probabilistic child modeling to generate autism intervention dialogues that match human therapist strategies at nearly 80 percent consistency and improve small language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ASDAgent to solve data scarcity and strategic inconsistency in AI tools for autism intervention, where general LLMs often fail to follow Applied Behavior Analysis procedures despite fluent language. It splits the task into a DoctorAgent that makes every ABA step explicit through an Observe-Think-Act-Correct loop and a ChildAgent that adds realistic variability via probabilistic response modeling. Experiments show the resulting dialogues track human strategy distributions closely enough to serve as reliable training material. This matters because it turns limited clinical records into scalable datasets that can raise the performance of smaller, deployable models without requiring massive new human data collection.

Core claim

By equipping a DoctorAgent with an Observe-Think-Act-Correct reasoning loop to enforce explicit ABA execution and a ChildAgent with probabilistic behavior modeling to generate diverse, non-deterministic child responses, the ASDAgent framework produces intervention dialogues whose strategy distribution closely matches human therapists, measured by a KL divergence of 0.083 and nearly 80 percent strategic consistency with experts in real settings. The same synthetic data successfully transfers professional clinical knowledge into small language models, measurably raising their therapeutic capabilities.
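
The distribution-matching metric in this claim can be illustrated with a short sketch. It compares the strategy label frequencies of two dialogue sets with a KL divergence. The KL direction (human relative to synthetic), the natural logarithm, and the small additive smoothing constant are assumptions for illustration; the page does not state how the paper computes its 0.083 figure. The five strategy names are those listed in the paper's prompt figures.

```python
import math
from collections import Counter

# Strategy taxonomy as named in the paper's prompt figures.
STRATEGIES = ["Instruction", "Half-Assistance", "Full-Assistance",
              "Reinforcement", "Other"]

def strategy_distribution(labels, smoothing=1e-6):
    """Turn a list of strategy labels into a smoothed probability distribution.

    `smoothing` is an assumed additive constant that keeps every probability
    strictly positive so the KL divergence stays finite.
    """
    counts = Counter(labels)
    total = sum(counts[s] for s in STRATEGIES) + smoothing * len(STRATEGIES)
    return {s: (counts[s] + smoothing) / total for s in STRATEGIES}

def kl_divergence(p, q):
    """KL(p || q) in nats over the shared strategy vocabulary."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in STRATEGIES)

if __name__ == "__main__":
    # Toy labels, not data from the paper.
    human = ["Instruction"] * 4 + ["Reinforcement"] * 4 + ["Other"] * 2
    synthetic = ["Instruction"] * 5 + ["Reinforcement"] * 3 + ["Half-Assistance"] * 2
    p = strategy_distribution(human)
    q = strategy_distribution(synthetic)
    print(f"KL(human || synthetic) = {kl_divergence(p, q):.4f}")
```

A value near zero means the two label distributions are close. It says nothing about whether strategies occur in the right order or context, which is why the consistency measure is reported separately.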

What carries the argument

The dual-agent architecture that pairs the DoctorAgent's O-T-A-C reasoning loop for controllable ABA adherence with the ChildAgent's probabilistic behavior modeling to produce varied ASD responses.
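
The control flow of this pairing can be sketched as a toy simulation. In the paper both agents are LLM-driven; here they are replaced by rule-based stand-ins so the loop structure is visible. The response probabilities, the escalation rule (half assistance first, then full assistance), and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import random

# Child response types as named in the paper's ChildAgent prompt figures.
RESPONSE_TYPES = ["Relevant", "Irrelevant", "Repetitive"]

def child_agent(rng, probs):
    """Sample a response type from a categorical distribution (assumed form)."""
    weights = [probs[t] for t in RESPONSE_TYPES]
    return rng.choices(RESPONSE_TYPES, weights=weights, k=1)[0]

def observe(response_type):
    """Observe: categorize the child's response (trivial in this toy)."""
    return {"response_type": response_type}

def think(observation, prompts_given):
    """Think: pick a strategy. Hypothetical rule: reinforce relevant answers,
    otherwise escalate from half to full assistance."""
    if observation["response_type"] == "Relevant":
        return "Reinforcement"
    return "Half-Assistance" if prompts_given == 0 else "Full-Assistance"

def act(strategy):
    """Act: produce an utterance tagged with the planned strategy."""
    return f"[{strategy}] <utterance placeholder>"

def correct(strategy, utterance):
    """Correct: regenerate if the utterance does not carry the planned strategy."""
    return utterance if utterance.startswith(f"[{strategy}]") else act(strategy)

def run_dialogue(turns, probs, seed=0):
    """Return a list of (child_response, strategy, utterance) tuples."""
    rng = random.Random(seed)
    log = [(None, "Instruction", act("Instruction"))]  # opening instruction
    prompts_given = 0
    for _ in range(turns):
        response = child_agent(rng, probs)
        strategy = think(observe(response), prompts_given)
        utterance = correct(strategy, act(strategy))
        prompts_given = 0 if strategy == "Reinforcement" else prompts_given + 1
        log.append((response, strategy, utterance))
    return log

if __name__ == "__main__":
    # Made-up probabilities, not values from the paper.
    probs = {"Relevant": 0.5, "Irrelevant": 0.3, "Repetitive": 0.2}
    for turn in run_dialogue(5, probs, seed=42):
        print(turn)
```

Keeping strategy selection (`think`) separate from utterance generation (`act`) is what makes the strategy sequence inspectable, which is the property the paper attributes to the O-T-A-C loop.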

If this is right

  • Synthetic dialogues generated by ASDAgent can be used to distill expert clinical knowledge into small language models, significantly enhancing their performance in autism intervention tasks.
  • The framework achieves nearly 80 percent strategic consistency with human experts, supporting its direct use for both large-scale data synthesis and real-time clinical decision support.
  • It overcomes data scarcity and homogeneity by producing controllable, high-fidelity ABA dialogues at scale.
  • Generated dialogues mirror human strategy distributions, enabling production of training sets that maintain clinical standards without further collection of sensitive patient records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same dual-agent design could be adapted to create synthetic training data for other data-scarce behavioral or medical domains that require strict protocol adherence.
  • Embedding ASDAgent into therapy software might let therapists receive real-time strategy suggestions while sessions are underway.
  • Large synthetic datasets produced this way could serve as standardized benchmarks for evaluating new AI tools in pediatric behavioral health.
  • Deployment across multiple clinics would test whether the reported consistency levels hold when the underlying patient population differs from the original dataset.

Load-bearing premise

The O-T-A-C reasoning loop and probabilistic behavior modeling faithfully reproduce real clinical ABA procedures and the full range of non-deterministic ASD child responses without introducing systematic biases or artifacts not present in human data.

What would settle it

A blinded expert rating study in which practicing therapists evaluate ASDAgent-generated dialogues against matched real clinical transcripts for strategic fidelity and naturalness, or a controlled trial showing that small language models fine-tuned on the synthetic data produce no measurable gain in live intervention quality over models trained on limited human data alone.

Figures

Figures reproduced from arXiv: 2605.02916 by Chenyu Yan, Haifeng Li, Junhong Lai, Lin Yao, Shuzhong Lai, Wanlin Chen, Yanhao Yu, Yueming Wang.

Figure 1: An example of DoctorAgent performing Observe-Think-Action-Correct. In the Observe phase, DoctorAgent categorizes and interprets the child’s responses. In the Think phase, DoctorAgent performs iterative, multi-round reasoning to determine appropriate intervention strategies based on the observed information. After each Think step, DoctorAgent immediately enters the Act and Correct phase, generating a con…
Figure 2: An overview of our framework, ASDAgent, for both Dialogue Synthesis and Real Autism Intervention.
Figure 3: Human and LLM-based Preference Evaluation
Figure 4: Human and LLM-based Scoring between Real Data and Synthetic data.
Figure 5: Evaluation on Real Intervention Dialogues.
Figure 6: Win–Tie–Lose Comparison Between Model-Generated and Human Doctor Responses Across Different Evaluators.
Figure 7: Training dynamics of Qwen3-4B during Su…
Figure 8: Case Study in Dialogue Synthesis with Topic "Buy fruit" and the same Turns. The left side shows a real…
Figure 9: Case Study in Real Autism Intervention. The diagram above illustrates the intervention responses of…
Figure 10: Qualitative comparison between our O-T-A-C framework and the Tree-of-Thought (ToT) baseline.
Figure 11: Qualitative case studies demonstrating the efficacy of the…
Figure 12: The standard workflow of Discrete Trial Training (DTT) derived from ABA literature (PACKET), illustrating the structured interaction cycle. Doctors can adjust their treatment strategies as needed, based on the actual intervention situation.
Figure 14: t-SNE scatter plot of macro topics across 10…
Figure 15: Distribution of Key Metrics by Child Cate…
Figure 16: t-SNE Clustering of Child Profiles
Figure 17: Distribution of Conversation Turns (Log…
Figure 18: Base prompt
Figure 19: ABA prompt
Figure 20: Prompt for Strategy Labeling
Figure 21: Prompt for DoctorAgent: Observe
Figure 22: Prompt for DoctorAgent: Think
Figure 23: Prompt for DoctorAgent: Act in Strategy Instruction
Figure 24: Prompt for DoctorAgent: Act in Strategy Half-Assistance
Figure 25: Prompt for DoctorAgent: Act in Strategy Full-Assistance
Figure 26: Prompt for DoctorAgent: Act in Strategy Other
Figure 27: Prompt for DoctorAgent: Act in Strategy Reinforcement
Figure 28: Prompt for ChildAgent: Act in Type Irrelevant Response
Figure 29: Prompt for ChildAgent: Act in Type Relevant Response
Figure 30: Prompt for ChildAgent: Act in Type Repetitive Response
Figure 31: Prompt for ToT
Figure 32: Prompt for LLM evaluation: Turing-like Test
Figure 33: Prompt for LLM evaluation: Scoring for Quality of dialogue synthesis
Figure 34: Prompt for LLM evaluation: Scoring for Clinical intervention effect
Original abstract

The development of AI-assisted Early Intensive Behavioral Intervention (EIBI) for Autism Spectrum Disorder (ASD) is severely constrained by data scarcity. Furthermore, while Applied Behavior Analysis (ABA) serves as the gold standard for clinical intervention, general-purpose Large Language Models (LLMs) struggle to strictly adhere to its standardized procedures, often resulting in interactions that are linguistically fluent but strategically inconsistent. To address these challenges, we introduce ASDAgent, a strategy-aware framework designed to unify high-fidelity intervention dialogue synthesis and clinical decision support. ASDAgent incorporates two specialized components to solve distinct problems: (i) a DoctorAgent equipped with an Observe-Think-Act-Correct (O-T-A-C) reasoning loop, which resolves the issue of strategy collapse in LLMs by making ABA execution explicit and controllable; and (ii) a ChildAgent that utilizes probabilistic behavior modeling to mitigate data homogeneity, simulating diverse and non-deterministic ASD response patterns. Experiments demonstrate that dialogues generated by ASDAgent closely mirror the strategy distribution of human therapists (KL divergence: 0.083). In real autism intervention, ASDAgent achieves nearly 80% strategic consistency with human experts. Moreover, we show that synthetic data produced by ASDAgent effectively distills professional clinical knowledge into small language models (SLMs), significantly enhancing their therapeutic capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript introduces ASDAgent, a strategy-aware agent framework for autism intervention consisting of a DoctorAgent with an Observe-Think-Act-Correct (O-T-A-C) reasoning loop and a ChildAgent with probabilistic behavior modeling. It claims that generated dialogues mirror human therapist strategy distributions with a KL divergence of 0.083, achieve nearly 80% strategic consistency with human experts in real interventions, and that synthetic data from the framework can distill clinical knowledge into small language models to enhance their therapeutic capabilities. The work is based on a real clinical dataset to address data scarcity in EIBI for ASD.

Significance. If the results hold after proper validation, the framework could meaningfully advance AI-assisted early intensive behavioral intervention by enabling high-fidelity synthetic dialogue generation and clinical decision support, directly tackling data scarcity and LLM strategy inconsistency in ABA-based autism therapy. Grounding in a real clinical dataset is a constructive element.

major comments (3)
  1. [Abstract] Abstract: the headline metrics (KL divergence 0.083 and ~80% strategic consistency) are presented without any description of the evaluation protocol, data splits, inter-rater reliability for the consistency measure, or controls for prompt-engineering effects, making it impossible to assess whether the mirroring claim is load-bearing or artifactual.
  2. [ChildAgent description] ChildAgent probabilistic behavior modeling: no calibration or validation is reported against real clinical ABA response statistics (e.g., strategy transition probabilities or child behavior distributions) from the dataset; the O-T-A-C loop and probabilistic parameters are therefore unanchored, so the reported distribution match could arise from modeling choices rather than clinical fidelity.
  3. [Experiments] Experiments section: the absence of details on how the free probabilistic transition parameters are set or validated against the clinical dataset leaves the central claim that ASDAgent dialogues 'closely mirror' human therapists without empirical grounding, directly undermining the distillation-to-SLM result.
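
The inter-rater reliability requested in major comment 1 could be reported with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two raters who each assign one strategy label per turn. Which statistic the authors use, if any, is not stated on this page, so both the choice of kappa and the toy labels are assumptions.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

if __name__ == "__main__":
    # Toy labels, not data from the paper.
    a = ["Reinforcement", "Instruction", "Instruction", "Other", "Reinforcement"]
    b = ["Reinforcement", "Instruction", "Half-Assistance", "Other", "Reinforcement"]
    print(f"kappa = {cohens_kappa(a, b):.3f}")
```

Raw percent agreement can look high when one strategy dominates the label distribution; kappa discounts the agreement expected by chance, which is why it is a common companion to a headline consistency figure.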

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback. We address each major comment point by point below and indicate the revisions we will make to strengthen the manuscript's clarity and empirical grounding.

point-by-point responses
  1. Referee: [Abstract] Abstract: the headline metrics (KL divergence 0.083 and ~80% strategic consistency) are presented without any description of the evaluation protocol, data splits, inter-rater reliability for the consistency measure, or controls for prompt-engineering effects, making it impossible to assess whether the mirroring claim is load-bearing or artifactual.

    Authors: We agree that the abstract would benefit from additional context on the evaluation methods. In the revised manuscript, we will expand the abstract to briefly describe the evaluation protocol: the KL divergence is computed between strategy distributions in ASDAgent-generated dialogues and the real clinical dataset using the ABA strategy taxonomy; the ~80% consistency is measured via expert raters assessing alignment on held-out intervention cases, with inter-rater reliability reported (e.g., Cohen's kappa). We will also note that the O-T-A-C loop provides explicit structural control over strategy adherence, as shown in our ablation studies, distinguishing it from prompt engineering alone. These additions will be made while adhering to abstract length limits. revision: yes

  2. Referee: [ChildAgent description] ChildAgent probabilistic behavior modeling: no calibration or validation is reported against real clinical ABA response statistics (e.g., strategy transition probabilities or child behavior distributions) from the dataset; the O-T-A-C loop and probabilistic parameters are therefore unanchored, so the reported distribution match could arise from modeling choices rather than clinical fidelity.

    Authors: We acknowledge that the current description lacks sufficient detail on calibration. The probabilistic parameters in ChildAgent were derived from empirical statistics in the real clinical dataset, including observed child response patterns and strategy transition frequencies. In the revision, we will add a dedicated subsection detailing the estimation method (frequency counts from ABA sessions), example transition matrices, and validation against held-out data to confirm reproduction of real distributions. This will explicitly anchor the model in clinical data. revision: yes

  3. Referee: [Experiments] Experiments section: the absence of details on how the free probabilistic transition parameters are set or validated against the clinical dataset leaves the central claim that ASDAgent dialogues 'closely mirror' human therapists without empirical grounding, directly undermining the distillation-to-SLM result.

    Authors: We agree that the Experiments section needs expanded detail on parameter setting and validation. We will revise it to describe: (1) how transition parameters were set via maximum-likelihood estimation from the clinical dataset's observed sessions; (2) validation using KL divergence on strategy sequences with explicit train/test splits; and (3) ablations isolating the probabilistic component's contribution. These additions will provide empirical grounding for the mirroring claim. The SLM distillation results are supported by independent downstream evaluations of therapeutic task performance, which will be further contextualized in the revision. revision: yes
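
The frequency-count estimation described in responses 2 and 3 can be sketched as follows. For a first-order transition model, the maximum-likelihood estimate of each transition probability is the observed count of that transition divided by the total count leaving the source state. The first-order assumption, the uniform fallback for states never observed as a source, and the function name are illustrative choices; the rebuttal does not specify the model form.

```python
from collections import Counter, defaultdict

def estimate_transitions(sessions, states):
    """Maximum-likelihood transition probabilities from labeled sequences.

    `sessions` is a list of label sequences, one per session. Rows for states
    that never occur as a source fall back to a uniform distribution (an
    assumed convention, since the MLE is undefined there).
    """
    counts = defaultdict(Counter)
    for seq in sessions:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s][t] for t in states)
        if total == 0:
            matrix[s] = {t: 1.0 / len(states) for t in states}
        else:
            matrix[s] = {t: counts[s][t] / total for t in states}
    return matrix

if __name__ == "__main__":
    states = ["Relevant", "Irrelevant", "Repetitive"]
    # Toy sequences, not data from the paper.
    train = [
        ["Relevant", "Relevant", "Irrelevant", "Relevant"],
        ["Irrelevant", "Repetitive", "Relevant"],
    ]
    for src, row in estimate_transitions(train, states).items():
        print(src, row)
```

Fitting the matrix on one subset of sessions and comparing it with counts from a disjoint subset is one way to realize the explicit train/test split the authors promise.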

Circularity Check

0 steps flagged

No significant circularity; results from empirical evaluation on held-out clinical data

Full rationale

The paper defines ASDAgent via an O-T-A-C reasoning loop in DoctorAgent and probabilistic modeling in ChildAgent, both informed by a real clinical dataset. It then reports separate experimental outcomes (KL divergence 0.083, ~80% strategic consistency, and SLM distillation gains) as measured performance. No equations, parameter-fitting steps, or self-citations are shown that would make these metrics algebraically equivalent to the input dataset statistics or agent design choices by construction. The chain remains self-contained with external empirical checks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The framework rests on the domain assumption that ABA procedures are the correct target for fidelity and that probabilistic modeling can capture real child variability; two new agent entities are introduced without independent falsifiable handles outside the paper's own metrics.

free parameters (1)
  • probabilistic transition parameters in ChildAgent
    Used to generate diverse ASD response patterns; values must be set or learned to avoid homogeneity but are not specified in the abstract.
axioms (1)
  • domain assumption: Applied Behavior Analysis is the gold standard for ASD intervention
    Invoked to justify the requirement that generated dialogues must strictly follow ABA strategies.
invented entities (2)
  • DoctorAgent with Observe-Think-Act-Correct loop (no independent evidence)
    purpose: Enforce explicit, controllable ABA strategy execution in LLM responses
    New component introduced to prevent strategy collapse; no external validation cited.
  • ChildAgent with probabilistic behavior modeling (no independent evidence)
    purpose: Simulate non-deterministic and diverse ASD child responses
    Invented to address data homogeneity; no independent clinical dataset or validation mentioned.

pith-pipeline@v0.9.0 · 5592 in / 1526 out tokens · 54781 ms · 2026-05-10T17:30:42.906708+00:00 · methodology



  37. [48]

    Figure 25: Prompt for DoctorAgent: Act in Strategy Full-Assistance System Prompt ## Role Setting You are a professional ABA therapist

    Your responses must be consistent with the child's actual information, such as their age and language development stage. Figure 25: Prompt for DoctorAgent: Act in Strategy Full-Assistance System Prompt ## Role Setting You are a professional ABA therapist. You have just completed a deep logical thinking process (Think), and now you need to execute the stra...

  38. [50]

    - The response must be a declarative sentence

    Other - Accept children's semantically irrelevant answers or responses that connect to the context but do not fall under the four strategies mentioned above. - The response must be a declarative sentence. - The declarative sentence should acknowledge the child's answer and express understanding and attention to it

  39. [54]

    Reinforcement

    Your responses must be consistent with the child's actual information, such as their age and language development stage. Figure 26: Prompt for DoctorAgent: Act in Strategy Other System Prompt ## Role Setting You are a professional ABA therapist. You have just completed a deep logical thinking process (Think), and now you need to execute the strategy **Rei...

  40. [55]

    Please follow ABA principles in the analysis and decision-making process, combining Discrete Trial Training (DTT) and Natural Environment Teaching (NET) principles

  41. [56]

    - The feedback must be in the form of a declarative or exclamatory sentence

    Reinforcement - Provide timely positive feedback and praise to encourage correct and positive behavior. - The feedback must be in the form of a declarative or exclamatory sentence. - The declarative or exclamatory sentence should clearly express recognition and encouragement of the child's correct response
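The strategy definitions attach sentence-form constraints to some strategies: Other "must be a declarative sentence" and Reinforcement "must be in the form of a declarative or exclamatory sentence", while the question preference for Instruction and Half-Assistance is only stated as "generally". A rough post-generation check could look like the sketch below. It assumes that final punctuation is a good enough proxy for sentence form, which is a simplification, and the names `sentence_form`, `REQUIRED_FORMS`, and `violates_form` are hypothetical.

```python
def sentence_form(utterance: str) -> str:
    """Classify an utterance by its final punctuation (a crude heuristic)."""
    text = utterance.strip()
    if text.endswith(("?", "？")):
        return "question"
    if text.endswith(("!", "！")):
        return "exclamatory"
    return "declarative"


# Only the hard ("must") constraints from the quoted definitions are enforced.
REQUIRED_FORMS = {
    "Other": {"declarative"},
    "Reinforcement": {"declarative", "exclamatory"},
}


def violates_form(strategy: str, utterance: str) -> bool:
    """Return True if the utterance breaks a hard form constraint."""
    allowed = REQUIRED_FORMS.get(strategy)
    if allowed is None:
        # Soft preferences ("generally", "usually") are not enforced here.
        return False
    return sentence_form(utterance) not in allowed
```

For example, `violates_form("Other", "What is that?")` flags a question issued under the Other strategy, whereas an Instruction phrased as a statement passes because its constraint is soft.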

  42. [57]

    - When the child repeats the response, do not accept or reinforce it; reissue the instruction or provide appropriate partial or full assistance to facilitate a correct response

    Principles for handling different types of child responses: - When the child responds correctly (relevant answer), reinforcement should be given first; - When the child does not respond, appropriate partial or full assistance should be given first to facilitate a correct response; - When the child responds incorrectly (irrelevant answer), first accept the...

  43. [58]

    When executing one strategy, do not mix in content from other strategies, emphasizing atomic operations

  44. [59]

    You need to consider the conversation topic and the historical conversation content to ensure your response is relevant and coherent within the context!

  45. [60]

    Figure 27: Prompt for DoctorAgent: Act in Strategy Reinforcement accurately reflecting the rigid behavioral patterns observed in ASD

    Your responses must be consistent with the child's actual information, such as their age and language development stage. Figure 27: Prompt for DoctorAgent: Act in Strategy Reinforcement accurately reflecting the rigid behavioral patterns observed in ASD. J.7 Prompt for ToT J.8 Prompt for LLM evaluation System Prompt ## Character Profile You are a {age}-ye...

  46. [61]

    "I" and "you"

    Based on the doctor's input, you need to generate an **Irrelevant Response**. Irrelevant responses include the following (you can choose freely, but you must specify the type in the output JSON): - **Incorrect answer**: Although the answer is related to the doctor's question, the answer is clearly wrong. - **Completely irrelevant to the doctor's instructi...

  47. [62]

    Your response must strictly adhere to your character profile, including: name, gender, age, language development age, and the conversational style from the previous dialogue history. Figure 28: Prompt for ChildAgent: Act in Type Irrelevant Response System Prompt ## Character Profile You are a {age}-year-old autistic child named {name} , and your gender is...

  48. [63]

    You need to generate a **Relevant Response** based on the doctor's input language. The relevant response should include the following content, which you can choose freely, but you must specify the type in the output JSON: - **Correct Answer**: A simple, direct, and possibly memorized correct answer that matches the doctor's instructions or reply. If langu...

  49. [64]

    Your response must strictly adhere to your character profile, including: name, gender, age, language development age, and the conversational style from the previous dialogue history. Figure 29: Prompt for ChildAgent: Act in Type Relevant Response System Prompt ## Character Profile You are a {age}-year-old autistic child named {name} , and your gender is {...

  50. [65]

    The repetitive response should include the following content, which you can choose freely, but you must specify the type in the output JSON: - **Mimicry**: Mimicry

    You need to generate a **Repetitive Response** based on the doctor's input language. The repetitive response should include the following content, which you can choose freely, but you must specify the type in the output JSON: - **Mimicry**: Mimicry. Repeat the last few words of the doctor's instructions or reply. - **Self-repetition**: Not repeat...

  51. [66]

    Your response must strictly adhere to your character profile, including: name, gender, age, language development age, and the conversational style from the previous dialogue history. Figure 30: Prompt for ChildAgent: Act in Type Repetitive Response System Prompt ## Role Setting You are an experienced interventional therapist for children with autism, prov...

  52. [67]

    Please follow the ABA principles in your analysis and decision-making process, combining Discrete Trial Training (DTT) and Natural Environment Teaching (NET) principles, paying attention to the following five strategies: Instruction, Half-Assistance, Full-Assistance, Reinforcement, and Other

  53. [68]

    * Half-Assistance - Provide appropriate verbal support when the child needs help to promote correct responses

    Definitions of the five strategies: * Instruction - Provide clear and simple instructions to guide the child in conversation around the topic. * Half-Assistance - Provide appropriate verbal support when the child needs help to promote correct responses. * Full-Assistance - Provide full verbal support when the child needs help to promote correct responses....

  54. [69]

    Child response types include: Relevant Responses / Irrelevant Responses / Repetition / No Response
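The four response types listed above are the categories a simulated child can produce. The paper's abstract describes the ChildAgent as using probabilistic response modeling, but this excerpt does not give the distribution, so the sketch below only illustrates the general idea of drawing a response type from caller-supplied weights. The function name `sample_response_type` and any weights passed to it are hypothetical.

```python
import random
from typing import Sequence

RESPONSE_TYPES = [
    "Relevant Response",
    "Irrelevant Response",
    "Repetition",
    "No Response",
]


def sample_response_type(weights: Sequence[float], rng: random.Random) -> str:
    """Draw one response type; weights are hypothetical, not taken from the paper."""
    if len(weights) != len(RESPONSE_TYPES):
        raise ValueError("expected one weight per response type")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    return rng.choices(RESPONSE_TYPES, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance keeps a synthesis run reproducible, which helps when comparing the resulting strategy distributions across runs.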

  55. [70]

    the child's reaction

    Principles for Handling Different Types of Children's Reactions: - When a child responds correctly (relevantly), reinforcement should be given first. - When a child does not respond, appropriate partial or full assistance should be provided to encourage a correct response. - When a child responds incorrectly (irrelevantly), accept the child's semantically inco...

  56. [71]

    Do not repeat the same strategy already executed in this round; if the previous action was a command, wait for the child's response first
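The constraint above amounts to a filter over the strategies still available within a round. A minimal sketch follows, assuming that "command" in the quoted prompt refers to the Instruction strategy and that an empty result means "wait for the child"; the names `STRATEGIES` and `allowed_next` are hypothetical.

```python
from typing import List

STRATEGIES = [
    "Instruction",
    "Half-Assistance",
    "Full-Assistance",
    "Reinforcement",
    "Other",
]


def allowed_next(executed_this_round: List[str]) -> List[str]:
    """Return strategies still permitted in the current round.

    An empty list signals that the agent should stop and wait for the child.
    """
    # Assumption: "command" in the prompt means the Instruction strategy.
    if executed_this_round and executed_this_round[-1] == "Instruction":
        return []
    # Never repeat a strategy already executed in this round.
    return [s for s in STRATEGIES if s not in executed_this_round]
```

For instance, after a neutral acceptance (`["Other"]`) the remaining four strategies are still available, while `["Other", "Instruction"]` ends the turn.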

  57. [72]

    Reason based on observation and history, not on unfounded assumptions. Figure 31: Prompt for ToT System Prompt ## Role Setting You are a professional reviewer in the field of child rehabilitation and ABA intervention, tasked with comparing the quality of two doctor-ASD child dialogues. ## Task Requirements

  58. [73]

    Your judgment should be based on clinical professionalism, language quality, and safety, not on dialogue length or superficial politeness

  59. [74]

    ## Strategy Definitions

    Please focus on evaluating the teaching logic and interaction quality, ignoring minor speech imperfections or grammatical errors in the real recording transcripts. ## Strategy Definitions

  60. [75]

    of higher overall quality and more suitable for real clinical intervention

    Choose the dialogue you believe is "of higher overall quality and more suitable for real clinical intervention."

  61. [76]

    If the two dialogues are very similar in quality, you may choose “Tie”
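Because the pairwise reviewer may return "Tie" as well as a preference, any aggregation has to track three outcomes. The following sketch tallies verdicts into rates; the labels `"A"` and `"B"` for the two dialogues and the function name `tally_verdicts` are hypothetical, since the excerpt does not show the reviewer's output format.

```python
from collections import Counter
from typing import Dict, Iterable

VALID_VERDICTS = ("A", "B", "Tie")  # hypothetical labels for the two dialogues


def tally_verdicts(verdicts: Iterable[str]) -> Dict[str, float]:
    """Return the share of comparisons won by A, won by B, or tied."""
    items = list(verdicts)
    if not items:
        raise ValueError("no verdicts to tally")
    unknown = set(items) - set(VALID_VERDICTS)
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    counts = Counter(items)
    return {label: counts[label] / len(items) for label in VALID_VERDICTS}
```

Reporting the tie rate separately, instead of splitting ties between the two sides, keeps it visible how often the reviewer could not distinguish the dialogues.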

  62. [77]

    ## Review Dimensions ### A

    Provide detailed reasons based on the review dimensions, listing key points. ## Review Dimensions ### A. Professionalism A1. Dialogue Principles Does the dialogue follow the principles of Discrete Trial Training (DTT) or Natural Environment Teaching (NET) in Applied Behavior Analysis (ABA)? A2. Instructions Are the doctor's instructions clear and concise,...

  63. [78]

    Evaluate three different intervention dialogues between children with ASD and doctors

  64. [79]

    seems friendly

    Please remain objective, neutral, and strictly adhere to the scoring criteria. Do not relax the scoring standards because the dialogue content "seems friendly."

  65. [80]

    ## Scoring Method - Each item is scored individually, ranging from 0 to 4 points

    Provide detailed reasons for your scores based on the scoring dimensions, listing key points. ## Scoring Method - Each item is scored individually, ranging from 0 to 4 points. - Please strictly follow the definitions below for scoring: 0 points: None of the dialogues meet the requirements for this item. 1 point: A small portion of the dialogues meet the r...
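The scoring method quoted above gives each item an integer from 0 to 4. A small validation and aggregation helper could look like the sketch below. Averaging the item scores is an assumption made for illustration, since the excerpt does not say how per-item scores are combined, and the name `mean_item_score` is hypothetical; item keys such as `"A1"` and `"A2"` follow the review dimensions named in the evaluation prompt.

```python
from typing import Dict

MIN_SCORE, MAX_SCORE = 0, 4  # range stated in the scoring method


def mean_item_score(item_scores: Dict[str, int]) -> float:
    """Validate per-item scores and return their mean (aggregation is assumed)."""
    if not item_scores:
        raise ValueError("no item scores given")
    for item, score in item_scores.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, int):
            raise ValueError(f"{item}: score must be an integer")
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"{item}: score {score} outside 0-4")
    return sum(item_scores.values()) / len(item_scores)
```

Rejecting out-of-range values before aggregating guards against an LLM judge that returns, say, a 5 or a fractional score despite the instructions.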

  66. [81]

    evaluate the quality of the dialogue between the ASD child and the doctor

    Your task is not to generate dialogue, but to "evaluate the quality of the dialogue between the ASD child and the doctor" and assign a score. Please remain objective, neutral, and strictly adhere to the scoring criteria. Do not relax the scoring standards simply because the dialogue "seems friendly."

  67. [82]

    ## Scoring Method - Each item is scored separately, ranging from 0 to 4 points

    Provide detailed reasons for your score based on the scoring dimensions, listing key points. ## Scoring Method - Each item is scored separately, ranging from 0 to 4 points. - Please strictly follow the definitions below for scoring: 0 points: None of the doctor's dialogue meets the requirements for this item. 1 point: A small portion of the doctor's dialo...