First, Do No Harm: AI Supervisor Scaffolds Novice Growth in Counselor Education

Chen Xu, Zhenyu Lyu, Tian Lan, Yi Yang, Yu Ji, Luyao Ji, Jian Shen, Zhihua Wang, Leyang Cui, Jieshuo Zhang, Qunxi Dong, Minqiang Yang, Juan Wang, Xiuling Liu, Bin Hu

arxiv: 2508.09042 · v3 · submitted 2025-08-12 · 💻 cs.CL


Pith reviewed 2026-05-18 23:18 UTC · model grok-4.3

classification 💻 cs.CL

keywords: AI supervision · counselor education · ethical violations · novice growth · self-efficacy · simulation training · GRPO optimization


The pith

An AI supervisor built with simulated ethical mistakes and growth-focused rewards helps novice counselors gain self-efficacy that carries into real client sessions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that an AI can supervise novice counselors: it spots subtle ethical violations in their responses, diagnoses them against professional standards, and gives feedback that explains the risk and suggests better alternatives. To create training data without relying on scarce real-world labels, the authors program a controllable AI novice to commit specific categories of ethical errors deliberately during simulated counseling exchanges, so every generated example arrives already labeled for the supervisor. They optimize the supervisor not for giving correct answers but with a reward that checks whether its feedback actually causes a weaker novice model to improve. If this holds, beginning counselors could practice recognizing and correcting dangerous mistakes in a safe environment before working with vulnerable clients, reducing the chance of unnoticed harm during early training. A user study with real counseling students then tests whether the simulated training raises their confidence (self-efficacy) across key ethical competencies.
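The labeling trick can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the category names are hypothetical placeholders, and `simulate_novice_turn` stands in for a prompted LLM call. The point is that the generator itself decides which turn carries which mistake, so the supervision label (turn index and category) is known without post-hoc annotation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical placeholder categories; the paper's actual category list is not given here.
MISTAKE_CATEGORIES = ["boundary_blurring", "premature_advice", "confidentiality_slip"]


@dataclass
class SupervisionExample:
    dialogue: list[str]   # alternating "Client:" / "Novice:" lines
    violating_turn: int   # index in `dialogue` of the injected mistake
    category: str         # known because the generator chose it


def simulate_novice_turn(client_utterance: str, category: str | None) -> str:
    """Stand-in for an LLM call; a real pipeline would prompt the novice to enact `category`."""
    if category is None:
        return f"[appropriate reply to: {client_utterance}]"
    return f"[reply enacting {category} to: {client_utterance}]"


def generate_example(client_turns: list[str], rng: random.Random) -> SupervisionExample:
    target = rng.randrange(len(client_turns))   # which exchange gets the mistake
    category = rng.choice(MISTAKE_CATEGORIES)   # which mistake to enact
    dialogue: list[str] = []
    for i, client in enumerate(client_turns):
        dialogue.append(f"Client: {client}")
        injected = category if i == target else None
        dialogue.append(f"Novice: {simulate_novice_turn(client, injected)}")
    # Each exchange adds two lines, so the novice reply to exchange `target` sits at 2*target + 1.
    return SupervisionExample(dialogue, 2 * target + 1, category)
```

In a human-in-the-loop setting, reviewers would still need to confirm that each injected turn actually exhibits the intended violation; the sketch simply assumes it does.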

Core claim

The authors claim that a supervisor trained on the ETHICSCAFF dataset, which a controllable AI novice generates, and optimized via GRPO under the Novice Growth Reward (1) leads guided novices to better clinical metrics than unguided peers, (2) sharpens its own ethical detection, and (3) yields significant self-efficacy gains across all eight assessed competencies for novice counseling-psychology students, with the improvements transferring from simulation to real-world practice.
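GRPO (Group Relative Policy Optimization) samples several outputs per prompt and replaces a learned value baseline with the group's own reward statistics. The sketch below shows only that advantage step, assuming each sampled piece of supervisor feedback has already been given a scalar reward such as an NGR value; the clipped policy-gradient update and KL penalty are omitted, and the function name is illustrative.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each sample's reward standardized against its own group.

    `rewards` holds one scalar per feedback sampled for the same dialogue/prompt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population SD; implementations vary (sample SD is also used)
    return [(r - mean) / (std + eps) for r in rewards]


advantages = group_relative_advantages([0.2, 0.5, 0.8])
# Above-average feedback gets a positive advantage, below-average a negative one.
```

When every sampled feedback earns the same reward, all advantages are zero and that group contributes no policy-gradient signal, so a reward needs some spread across samples to drive learning.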

What carries the argument

The Novice Growth Reward (NGR), which scores a supervisor's feedback by whether it produces measurable performance gains in a weaker novice model rather than by the surface-level correctness of the explanation.
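The NGR's exact functional form is not given in this review. A minimal sketch of one possible shape, assuming the weaker novice is scored on the same item before and after reading the feedback, and that a scaling factor mixes that growth with whether the supervisor detected the violation correctly. The function name, the `alpha` parameter, and the linear mix are all hypothetical.

```python
def novice_growth_reward(
    score_before: float,
    score_after: float,
    detection_correct: bool,
    alpha: float = 0.5,  # hypothetical scaling factor; not a value reported by the paper
) -> float:
    """Hypothetical NGR: reward feedback by how much a weaker novice improves after reading it.

    score_before / score_after: the weaker novice's score on the same item (e.g., in [0, 1])
    without and with the supervisor's feedback. detection_correct: whether the supervisor
    located and labeled the violation correctly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    growth = score_after - score_before
    return alpha * growth + (1.0 - alpha) * float(detection_correct)
```

Setting `alpha=1.0` reduces the sketch to pure novice improvement: feedback that names the violation correctly but leaves the novice no better earns zero, and feedback that confuses the novice is penalized.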

If this is right

A novice guided by the supervisor outperforms an unguided peer on clinical metrics.
Teaching-oriented optimization via NGR further sharpens the supervisor's own ethical detection.
Novice counseling students show significant self-efficacy gains across all eight assessed competencies after receiving the AI feedback.
The benefits observed in simulation transfer to real-world clinical practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The controllable-mistake approach for creating synthetic supervision data could be extended to other high-stakes training settings where subtle errors carry serious consequences.
Longer-term studies could test whether the reported self-efficacy gains persist and produce observable reductions in ethical incidents during actual student-client sessions.
Widespread adoption might decrease reliance on scarce human supervisors during the earliest phases of counselor education.

Load-bearing premise

The predefined mistake categories that the controllable AI novice is programmed to enact must accurately reflect the distribution and subtlety of ethical violations that arise in actual clinical interactions with vulnerable clients.

What would settle it

A controlled study in which novice students later work with real clients and show neither fewer ethical violations nor higher self-efficacy scores than an unguided control group would falsify the claimed transfer of the scaffold.
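One way such a study could be analyzed, assuming each real-client session is coded simply as containing an ethical violation or not, is a pooled two-proportion z-test comparing the guided and unguided groups. The function name and binary outcome coding are assumptions for illustration; a real trial would likely also need to model repeated sessions per student.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    x1 of n1 guided-group sessions and x2 of n2 control-group sessions contain a violation.
    """
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        raise ValueError("test is undefined when every or no session has a violation")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A negative z indicates fewer violations in the guided group.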

Original abstract

The most dangerous mistakes a novice counselor makes are not the obvious ones: they are utterances that sound caring while quietly violating professional ethics and leaving vulnerable clients less protected. We build an AI supervisor that does not replace novice counselors, but grows them, teaching them to internalize ethical violations they would otherwise never notice. What makes this supervisor non-trivial is not detection but teaching: it must locate the ethical-violating utterance, diagnose the ethical violation against APA principles, and deliver feedback that explains not just what went wrong, but why it is risky and how to respond differently. The core obstacle is that (1) ethical violations are by nature unlabeled in real clinical data, and (2) existing AI counselors trained only to match correct answers will never learn to teach. We resolve both at once: a controllable AI novice that intentionally enacts predefined mistake categories makes supervision labels a natural byproduct of generation, yielding ETHICSCAFF, a 9,915-instance human-in-the-loop dataset; and GRPO under a Novice Growth Reward (NGR) optimizes the supervisor not for answer correctness but for whether a weaker novice model actually improves after reading its explanation. Experiments show that a novice guided by our supervisor outperforms an unguided peer on clinical metrics, and that teaching-oriented optimization via NGR further sharpens the supervisor's own ethical detection. In a user study with novice counseling-psychology students, participants show significant self-efficacy gains across all eight assessed competencies after receiving AI supervisory feedback, demonstrating that the scaffold transfers from simulation to real-world practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds an AI supervisor by generating ethical mistakes on purpose with a controllable novice and optimizing via a growth reward so the novice actually improves, but the user-study evidence stays at self-reported gains.


The one thing to know is that this work creates supervision labels by making an AI novice deliberately produce ethical violations in set categories, then trains the supervisor with GRPO under a Novice Growth Reward that scores how much a weaker model gets better after the feedback. That pipeline is the concrete step forward. Most prior AI counseling systems chase correct responses; here the target is teaching the novice to notice and avoid the subtle ethical slips that real data never labels. The human-in-the-loop dataset of nearly ten thousand instances follows directly from the controllable generation, which is a practical way around the unlabeled-data problem the authors flag. The shift to measuring supervisor quality by novice improvement rather than static accuracy also fits the teaching goal.

The user study reports significant self-efficacy gains across eight competencies after AI feedback, which at least shows the scaffold can move student confidence in a controlled setting. The soft spot is the transfer claim. Self-efficacy is a pre/post self-report from students in simulation; it does not track actual counselor behavior, ethical decisions, or client outcomes in live sessions with vulnerable people. The stress-test note is right that this leaves the real-world inference unsupported, and the paper's own motivation about subtle violations makes the gap noticeable. Predefined mistake categories may also miss the messier distribution that occurs outside the lab.

This is for researchers working on AI tools for professional training in psychology or adjacent fields. A reader who wants a worked example of growth-oriented RL for feedback systems could extract the NGR setup. It has a clear applied angle and enough technical grounding to deserve peer review, though the evaluation section will need more objective behavioral measures to hold up under scrutiny. I'd send it out with that flag.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ETHICSCAFF, a 9,915-instance human-in-the-loop dataset generated by a controllable AI novice that intentionally produces predefined ethical mistake categories, and trains an AI supervisor via GRPO under a Novice Growth Reward (NGR) that optimizes for measurable improvement in a weaker novice model rather than answer correctness alone. Experiments indicate that novices receiving supervisor feedback outperform unguided peers on clinical metrics, NGR further improves the supervisor's ethical detection, and a user study with novice counseling-psychology students reports significant self-efficacy gains across eight competencies, which the authors take as evidence that the scaffold transfers from simulation to real-world practice.

Significance. If the core results hold under stronger validation, the work offers a practical method for generating supervision signals in domains where ethical errors are unlabeled and high-stakes, by coupling synthetic mistake generation with a growth-oriented reward. The human-in-the-loop dataset construction and the explicit focus on teaching rather than detection constitute a clear technical contribution that could inform similar scaffolds in other professional training settings.

major comments (2)

[User Study] User Study section: The central claim that AI supervisory feedback demonstrates transfer 'from simulation to real-world practice' rests on pre/post self-efficacy self-reports collected from counseling-psychology students. Self-efficacy is a subjective measure that does not directly assess actual counselor utterances, ethical decision-making, or client outcomes in live sessions; the manuscript provides no objective behavioral metrics, control-group comparison, or real-client component, which leaves the transfer inference unsupported given the paper's own emphasis on subtle, unlabeled ethical violations with vulnerable clients.
[Dataset construction] Dataset construction / §3: The controllable AI novice relies on a fixed set of predefined mistake categories to produce supervision labels. No independent validation (e.g., expert review of category coverage or comparison against real clinical transcripts) is reported to confirm that these categories adequately capture the distribution and subtlety of ethical violations that arise in actual interactions, which is load-bearing for the claim that the resulting ETHICSCAFF dataset enables realistic supervision training.
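The referee's request for independent validation could be made concrete with a coverage check. A minimal sketch, assuming experts have annotated a sample of real (de-identified) transcripts with mistake categories: it reports the total variation distance between the synthetic and reference category frequencies, plus any category experts observed that the predefined set never produces. The function name and the choice of total variation distance are illustrative assumptions.

```python
from collections import Counter


def category_coverage(synthetic: list[str], expert_labels: list[str]) -> dict[str, object]:
    """Compare mistake-category frequencies: synthetic dataset vs. expert-annotated real sample."""
    if not synthetic or not expert_labels:
        raise ValueError("both label lists must be non-empty")
    syn, ref = Counter(synthetic), Counter(expert_labels)
    n_syn, n_ref = len(synthetic), len(expert_labels)
    # Total variation distance: half the L1 gap between the two frequency distributions (0 = identical).
    tvd = 0.5 * sum(abs(syn[c] / n_syn - ref[c] / n_ref) for c in set(syn) | set(ref))
    # Categories experts observed that the synthetic generator never produced.
    uncovered = sorted(set(ref) - set(syn))
    return {"total_variation": tvd, "uncovered_categories": uncovered}
```

A frequency check like this says nothing about subtlety within a category; judging that still requires expert review of individual examples.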

minor comments (2)

[Abstract] Abstract and §4: The abstract states 'significant self-efficacy gains' without reporting the statistical test, p-values, effect sizes, or sample size; adding these details would improve interpretability of the user-study results.
[Method] Notation: The scaling factor in the Novice Growth Reward is listed among free parameters; an explicit equation or pseudocode for how the reward is computed from novice improvement would aid reproducibility.
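The first minor comment asks for the statistical test, effect sizes, and sample size. A minimal sketch of a per-competency report, assuming a within-subject design in which the same students rate their self-efficacy before and after feedback; the function name and returned fields are illustrative. It returns the paired t statistic and Cohen's d_z; the two-sided p-value needs the t distribution with n - 1 degrees of freedom (`scipy.stats.ttest_rel` computes both statistic and p-value), and with eight competencies a multiple-comparison correction such as Holm's would normally be reported as well.

```python
import math
import statistics


def paired_prepost_summary(pre: list[float], post: list[float]) -> dict[str, float]:
    """Paired pre/post summary for one competency (same participants measured twice)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matched pre/post scores for at least two participants")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the per-participant differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t and d_z are undefined")
    d_z = mean_diff / sd_diff          # Cohen's d_z for within-subject designs
    t = d_z * math.sqrt(n)             # paired t = mean_diff / (sd_diff / sqrt(n)), df = n - 1
    return {"n": n, "df": n - 1, "mean_diff": mean_diff, "t": t, "d_z": d_z}
```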

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which highlight important aspects of validation and evidence strength. We address each major comment below, providing clarifications and outlining planned revisions to improve the manuscript's rigor while preserving the core contributions of the ETHICSCAFF dataset and Novice Growth Reward.


Referee: [User Study] User Study section: The central claim that AI supervisory feedback demonstrates transfer 'from simulation to real-world practice' rests on pre/post self-efficacy self-reports collected from counseling-psychology students. Self-efficacy is a subjective measure that does not directly assess actual counselor utterances, ethical decision-making, or client outcomes in live sessions; the manuscript provides no objective behavioral metrics, control-group comparison, or real-client component, which leaves the transfer inference unsupported given the paper's own emphasis on subtle, unlabeled ethical violations with vulnerable clients.

Authors: We agree that self-efficacy is a subjective measure and does not constitute direct evidence of changes in counselor behavior, ethical decision-making, or client outcomes. In counselor education research, however, self-efficacy is a standard, validated construct (rooted in Bandura's framework) used to evaluate training interventions because it predicts subsequent performance and is ethically feasible to assess with novices. Our user study was intentionally scoped as an initial demonstration with counseling-psychology students in a controlled setting, where real-client interactions would raise significant ethical barriers. We did not include a control group or objective utterance analysis in this phase due to resource and IRB constraints. We will revise the User Study section and Discussion to explicitly acknowledge these limitations, reframe the transfer claim as 'preliminary evidence suggestive of transfer' rather than a definitive demonstration, and add recommendations for future work incorporating behavioral coding or controlled trials. This addresses the concern without overstating the current results.
revision: partial
Referee: [Dataset construction] Dataset construction / §3: The controllable AI novice relies on a fixed set of predefined mistake categories to produce supervision labels. No independent validation (e.g., expert review of category coverage or comparison against real clinical transcripts) is reported to confirm that these categories adequately capture the distribution and subtlety of ethical violations that arise in actual interactions, which is load-bearing for the claim that the resulting ETHICSCAFF dataset enables realistic supervision training.

Authors: The predefined mistake categories were systematically derived from the APA Ethical Principles of Psychologists and Code of Conduct, supplemented by iterative input from counseling psychology experts during the human-in-the-loop generation process to ensure clinical relevance. While we did not conduct a standalone expert validation study or direct comparison against de-identified real clinical transcripts (due to privacy regulations), the dataset construction involved multiple rounds of human review and refinement to align generations with realistic novice errors. We will revise §3 to provide greater detail on the category selection methodology, document the expert consultations, and add an explicit limitations subsection discussing potential gaps in coverage of rare or context-specific violations. This will enhance transparency and allow readers to better evaluate the dataset's realism.
revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper's methodology constructs a dataset via controllable AI novice with predefined mistake categories to generate supervision labels as a byproduct, then applies GRPO optimization under Novice Growth Reward defined explicitly as measured improvement in a separate weaker model. This constitutes an intentional RL design for teaching rather than a self-referential loop or fitted input renamed as prediction. The user study reports independent self-efficacy gains from novice counseling students as an external human evaluation. No self-definitional equations, load-bearing self-citations, uniqueness theorems from prior author work, or ansatz smuggling appear in the abstract or described steps. The derivation remains self-contained against the stated benchmarks and human-in-the-loop process.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the untested assumption that simulated ethical mistakes generated by a controllable AI novice are representative of real clinical errors; no independent evidence for this mapping is provided in the abstract. The NGR introduces at least one fitted component whose exact functional form is not specified.

free parameters (1)

Novice Growth Reward scaling factor
The reward that measures whether a weaker novice improves after feedback must contain at least one scaling or weighting parameter chosen to balance detection accuracy against growth signal.

axioms (1)

domain assumption: Predefined mistake categories cover the space of APA ethical violations that novices actually produce in practice.
The dataset construction step relies on this coverage assumption to turn generated mistakes into reliable supervision labels.

pith-pipeline@v0.9.0 · 5860 in / 1392 out tokens · 28774 ms · 2026-05-18T23:18:43.971334+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

mistake-driven multi-agent dialogue-feedback generation pipeline... MATE dataset... GRPO under a Novice Growth Reward

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.