CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

Heajun An; Jin-Hee Cho; Qi Zhang; Vedanth Achanta

arxiv: 2605.21609 · v1 · pith:ZDE3AW4Unew · submitted 2026-05-20 · 💻 cs.CL · cs.AI · cs.CY

Pith reviewed 2026-05-22 09:18 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.CY

keywords adolescent LLM safety · response rewriting · guardrails · CR4T · refusal avoidance · developmental alignment · AI safety · selective reconstruction

The pith

Targeted rewriting turns unsafe or refusal-style LLM outputs into age-appropriate guidance for adolescents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that safety mechanisms for large language models interacting with teenagers work better when they transform problematic responses instead of refusing them outright. It presents CR4T as a way to detect risks with a lightweight check and then rewrite outputs so that they remove harmful elements, avoid shutting down conversations, and add suitable guidance based on developmental needs. This matters because refusal approaches can leave teens without support and create frustration, whereas reconstruction keeps the helpful core while accounting for age-specific vulnerabilities. The reported tests indicate that the method cuts unsafe and refusal-style results without intervening in acceptable exchanges. If this holds, it points to a more constructive path for keeping AI safe yet usable in youth settings.

Core claim

CR4T is a model-agnostic framework that pairs lightweight risk detection with domain-conditioned rewriting to selectively reconstruct unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent, thereby reducing unsafe and refusal-oriented outcomes without unnecessary intervention on acceptable interactions.

What carries the argument

The CR4T critique-and-revise process, which detects risk-amplifying content and reconstructs it into developmentally aligned guidance.
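The detect-then-rewrite flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify interfaces, so `Verdict`, `keyword_detector`, `stub_rewriter`, and `guarded_reply` are all hypothetical names, the keyword rules are a toy stand-in for a trained detector, and the rewriter is a placeholder for a domain-conditioned LLM call.

```python
from dataclasses import dataclass
from typing import Callable

# Every name here is hypothetical; the abstract does not specify CR4T's interfaces.
SAFE, UNSAFE, REFUSAL = "safe", "unsafe", "refusal"


@dataclass
class Verdict:
    label: str   # SAFE, UNSAFE, or REFUSAL
    domain: str  # placeholder domain tag used to condition the rewrite


def keyword_detector(prompt: str, response: str) -> Verdict:
    """Toy stand-in for a lightweight detector; a real one would be a trained classifier."""
    text = response.lower()
    if text.startswith(("i'm sorry", "i can't", "i cannot")):
        return Verdict(REFUSAL, "general")
    if "hide it from your parents" in text:
        return Verdict(UNSAFE, "general")
    return Verdict(SAFE, "general")


def stub_rewriter(prompt: str, response: str, verdict: Verdict) -> str:
    """Placeholder for a domain-conditioned rewrite call to an LLM."""
    return f"[{verdict.label}/{verdict.domain}] guidance-oriented rewrite goes here"


def guarded_reply(
    prompt: str,
    response: str,
    detect: Callable[[str, str], Verdict],
    rewrite: Callable[[str, str, Verdict], str],
) -> str:
    verdict = detect(prompt, response)
    if verdict.label == SAFE:
        return response  # selective: acceptable outputs pass through untouched
    return rewrite(prompt, response, verdict)
```

Because the detector and rewriter are passed in as plain callables, the wrapper does not depend on any particular base model, which is one way to read the "model-agnostic" claim. The early return on `SAFE` is what keeps intervention selective.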

If this is right

Conversations with adolescents can continue productively instead of ending in refusals.
Safety can shift from suppression to guided transformation while keeping user intent intact.
Developmental considerations can be built directly into response handling for teen users.
The approach works across different base models without requiring retraining or fine-tuning.
Fewer conversational dead-ends may support more sustained and positive AI interactions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This style of rewriting might build greater ongoing trust in AI tools among adolescent users by avoiding abrupt blocks.
Live deployment in apps could test whether the method improves both safety metrics and user engagement over time.
Similar reconstruction techniques could extend to other user groups with specific sensitivity needs.
Pairing the system with ongoing user feedback might allow further tailoring of guidance to individual contexts.

Load-bearing premise

Lightweight risk detection can reliably flag unsafe or refusal-style outputs and rewriting can convert them into suitable guidance without introducing inaccuracies or altering benign intent.
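One way to probe this premise is to compare detector flags against human annotations and report false positive and false negative rates. The sketch below assumes binary labels (`True` means the response should be, or was, flagged); the function name and the sample labels are illustrative and do not come from the paper.

```python
def detector_error_rates(y_true: list[bool], y_pred: list[bool]) -> dict[str, float]:
    """Compute false positive and false negative rates from paired binary labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    # Guard against empty classes so the function never divides by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"false_positive_rate": fpr, "false_negative_rate": fnr}


# Made-up annotations for illustration only.
human_labels = [True, True, False, False, False]
detector_flags = [True, False, True, False, False]
rates = detector_error_rates(human_labels, detector_flags)
```

In this setting a false positive means an acceptable response gets rewritten unnecessarily, while a false negative means an unsafe or refusal-style response reaches the adolescent user unchanged, so the two rates map onto the two failure modes the premise has to rule out.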

What would settle it

Human review of a sample of original and CR4T-rewritten responses showing either new factual errors, lost original meaning, or missed unsafe content that the system failed to catch.

Figures

Figures reproduced from arXiv: 2605.21609 by Heajun An, Jin-Hee Cho, Qi Zhang, Vedanth Achanta.

**Figure 1.** Overview of the CR4T framework. The pipeline first performs adolescent-specific domain classification and generates …

**Figure 2.** CR4T transforms unsafe and refusal-oriented responses into safe, constructive, and developmentally aligned guidance.

Original abstract

Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CR4T pushes rewrite-based guardrails over refusal for teen LLMs but the abstract leaves the experiments uncheckable.

The main point is that this paper frames adolescent LLM safety as a rewrite problem instead of a refusal problem. CR4T uses lightweight detection to spot unsafe or overly blocked outputs, then reconstructs them into age-appropriate guidance while trying to keep the original intent. That shift from adult-style suppression to developmentally tuned transformation is the clearest new angle here, and it directly addresses how refusal can create dead-ends in real teen conversations about advice or sensitive topics. The model-agnostic claim and the selective approach are practical strengths if they scale. The paper does a solid job laying out why current guardrails fall short for this age group and why a socio-technical fix makes sense.

On the downside, the abstract states that experiments show reduced unsafe and refusal outcomes without unnecessary intervention, yet it gives no methods, datasets, metrics, baselines, or controls. That leaves the central claim hard to evaluate and puts heavy weight on the untested assumption that detection and rewriting can stay reliable and accurate in adolescent contexts. Without those details the results stay promotional rather than demonstrative.

This is for people building or studying safety layers for youth-facing AI systems. A reader already working on guardrails or developmental alignment could pull useful framing from it, but anyone wanting reproducible evidence would need the full methods and data. It deserves peer review so the experiments can be examined properly rather than desk-rejected on the abstract alone.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CR4T (Critique-and-Revise-for-Teenagers), a model-agnostic framework for adolescent LLM safety. It reframes safety as a socio-technical transformation problem rather than refusal-based filtering, using lightweight risk detection combined with domain-conditioned rewriting to convert unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. The central claim is that this selective reconstruction substantially reduces unsafe and refusal-oriented outcomes without unnecessary intervention on acceptable interactions, supported by experimental results.

Significance. If the experimental claims are substantiated with rigorous controls and metrics, the work could meaningfully advance LLM safety research by shifting focus from suppression to constructive, developmentally aligned guidance. This offers a potential alternative to current refusal-centric guardrails and emphasizes human-centered design for vulnerable user groups, with possible broader applicability to other sensitive domains.

major comments (2)

[Abstract] The abstract states that 'Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions,' but provides no description of the datasets, evaluation metrics, baselines, controls, or statistical analysis used. This absence makes the central empirical claim impossible to verify and is load-bearing for the paper's conclusions.
[Introduction / Proposed Method] The framework relies on the assumption that lightweight risk detection can reliably identify unsafe or refusal-style outputs and that domain-conditioned rewriting can transform them without introducing new inaccuracies or losing benign intent. No details are given on how the detector or rewriter are trained, validated, or evaluated for false positives/negatives in adolescent contexts.

minor comments (2)

[Abstract] The term 'age-appropriate' is used repeatedly but not operationalized with specific developmental guidelines or references to adolescent psychology literature.
[Proposed Method] Clarify whether CR4T is intended as a post-hoc filter or integrated into the generation process, as this affects reproducibility and deployment considerations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed the major comments carefully and provide point-by-point responses below. Revisions have been made to address the concerns regarding the presentation of empirical claims and methodological details.

Referee: [Abstract] The abstract states that 'Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions,' but provides no description of the datasets, evaluation metrics, baselines, controls, or statistical analysis used. This absence makes the central empirical claim impossible to verify and is load-bearing for the paper's conclusions.

Authors: We acknowledge that the abstract's brevity omits explicit references to the evaluation protocol. The Experiments section details the adolescent query dataset (curated from public interaction logs with age-appropriate filtering), metrics (unsafe response rate, refusal rate, intent preservation, and guidance quality scores), baselines (standard refusal guardrails and no-intervention controls), and statistical analysis (paired t-tests with reported p-values and effect sizes). To improve verifiability without expanding the abstract excessively, we have added a single sentence summarizing the evaluation framework and key controls.
Revision: yes
Referee: [Introduction / Proposed Method] The framework relies on the assumption that lightweight risk detection can reliably identify unsafe or refusal-style outputs and that domain-conditioned rewriting can transform them without introducing new inaccuracies or losing benign intent. No details are given on how the detector or rewriter are trained, validated, or evaluated for false positives/negatives in adolescent contexts.

Authors: The Proposed Method section describes the detector as a lightweight fine-tuned classifier and the rewriter as a domain-conditioned prompt-based module grounded in adolescent developmental principles. We agree that explicit training, validation, and error analysis details would strengthen the claims. We have inserted a dedicated subsection reporting the training corpus (synthetic adolescent queries plus expert-annotated examples), validation procedure (5-fold cross-validation), and false positive/negative rates evaluated against adolescent psychology expert annotations, including a confusion matrix and discussion of edge cases.
Revision: yes
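The simulated rebuttal names outcome rates and paired t-tests as the evaluation tools. As a rough sketch of what those computations involve, the code below derives a per-label rate and a paired t statistic using only the Python standard library. The function names and every number are illustrative assumptions; none of them are results or procedures taken from the paper.

```python
import math
import statistics


def outcome_rate(labels: list[str], target: str) -> float:
    """Fraction of responses carrying the target label (e.g. 'unsafe' or 'refusal')."""
    return sum(1 for label in labels if label == target) / len(labels)


def paired_t_statistic(before: list[float], after: list[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) over paired differences d = before - after."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Made-up unsafe-response rates for four hypothetical base models,
# measured without and with a rewriting guardrail.
unsafe_before = [0.30, 0.25, 0.40, 0.35]
unsafe_after = [0.10, 0.05, 0.20, 0.10]
t_stat = paired_t_statistic(unsafe_before, unsafe_after)
```

A paired design fits here because the same prompts (or the same base models) are scored with and without the guardrail, so each difference isolates the effect of rewriting. Converting the statistic to a p-value requires the t distribution with n - 1 degrees of freedom, which the standard library does not provide.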

Circularity Check

0 steps flagged

No significant circularity; framework is operational and externally benchmarked

The paper proposes the CR4T framework as a socio-technical approach combining lightweight risk detection and domain-conditioned rewriting to transform unsafe or refusal-style LLM outputs into age-appropriate guidance. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the abstract or described structure. The central claims rest on experimental results rather than internal definitions or reductions to prior self-authored uniqueness theorems. The derivation chain is therefore self-contained against external benchmarks, with the reported outcomes presented as falsifiable observations rather than tautological restatements of the method.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on domain assumptions about adolescent developmental needs and the feasibility of accurate risk detection plus rewriting; no free parameters or new physical entities are introduced.

axioms (1)

Domain assumption: Adolescent users have distinct developmental vulnerabilities requiring age-appropriate guidance rather than refusal or suppression.
Invoked to argue against adult-centric norms and for the transformation problem framing in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

work page

[2] [2]

Classification Problem Solving

Clancey, William J. Classification Problem Solving. Proceedings of the Fourth National Conference on Artificial Intelligence

work page

[3] [3]

, title =

Robinson, Arthur L. , title =. 1980 , doi =. https://science.sciencemag.org/content/208/4447/1019.full.pdf , journal =

work page 1980

[4] [4]

New Ways to Make Microcircuits Smaller---Duplicate Entry

Robinson, Arthur L. New Ways to Make Microcircuits Smaller---Duplicate Entry. Science

work page

[5] [5]

Clancey and Glenn Rennels , abstract =

Diane Warner Hasling and William J. Clancey and Glenn Rennels , abstract =. Strategic explanations for a diagnostic consultation system , journal =. 1984 , issn =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , url =

work page doi:10.1016/s0020-7373(84)80003-6 1984

[6] [6]

and Rennels, Glenn R

Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas. Strategic Explanations in Consultation---Duplicate. The International Journal of Man-Machine Studies

work page

[7] [7]

Poligon: A System for Parallel Problem Solving

Rice, James. Poligon: A System for Parallel Problem Solving

work page

[8] [8]

Transfer of Rule-Based Expertise through a Tutorial Dialogue

Clancey, William J. Transfer of Rule-Based Expertise through a Tutorial Dialogue

work page

[9] [9]

The Engineering of Qualitative Models

Clancey, William J. The Engineering of Qualitative Models

work page

[10] [10]

2017 , eprint=

Attention Is All You Need , author=. 2017 , eprint=

work page 2017

[11] [11]

Pluto: The 'Other' Red Planet

NASA. Pluto: The 'Other' Red Planet

work page

[12] [12]

2025 , institution=

How people use chatgpt , author=. 2025 , institution=

work page 2025

[13] [13]

Teens, Social Media and AI Chatbots 2025 , year =

work page 2025

[14] [14]

Artificial intelligence review , volume=

Safeguarding large language models: A survey , author=. Artificial intelligence review , volume=. 2025 , publisher=

work page 2025

[15] [15]

Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security , pages=

YouthSafe: A Youth-Centric Safety Benchmark and Safeguard Model for Large Language Models , author=. Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security , pages=

work page 2025

[16] [16]

Advances in neural information processing systems , volume=

Wildguard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of llms , author=. Advances in neural information processing systems , volume=

work page

[17] [17]

LLM safety for children , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track) , pages=

work page 2025

[18] [18]

arXiv preprint arXiv:2510.05484 , year=

Evaluating llm safety across child development stages: A simulated agent approach , author=. arXiv preprint arXiv:2510.05484 , year=

work page arXiv

[19] [19]

The Llama 3 Herd of Models

The llama 3 herd of models , author=. arXiv preprint arXiv:2407.21783 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[20] [20]

Twenty-First Symposium on Usable Privacy and Security (SOUPS 2025) , pages=

Youth-Centered GAI Risks (YAIR): A Taxonomy of Generative AI Risks from Empirical Data , author=. Twenty-First Symposium on Usable Privacy and Security (SOUPS 2025) , pages=

work page 2025

[21] [21]

Proceedings of the 41st International Conference on Machine Learning , pages=

RigorLLM: resilient guardrails for large language models against undesired content , author=. Proceedings of the 41st International Conference on Machine Learning , pages=

work page

[22] [22]

Learning, Media and Technology , volume=

‘No, Alexa, no!’: designing child-safe AI and protecting children from the risks of the ‘empathy gap’in large language models , author=. Learning, Media and Technology , volume=. 2025 , publisher=

work page 2025

[23] Parents’ perceptions about the use of generative AI systems by adolescents. Proceedings of the 24th Interaction Design and Children.

[24] Safety in cyberspace: Adolescents' safety and exposure online. Youth & Society, 2006.

[25] Exploring generative models with middle school students. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021.

[26] Exploring parent-child perceptions on safety in generative AI: concerns, mitigation strategies, and design implications. 2025 IEEE Symposium on Security and Privacy (SP), 2025.

[27] A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 2025.

[28] Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems.

[29] Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. arXiv preprint arXiv:2312.06674.

[30] ShieldGemma: Generative AI Content Moderation Based on Gemma. arXiv preprint arXiv:2407.21772.

[31] Jailbroken: How does LLM safety training fail? Advances in Neural Information Processing Systems.

[32] ‘I’m sorry Dave, I’m afraid I can’t do that’: Moral regulation in refusals by LLM chatbots. New Media & Society, 2025.

[33] Safety is not only about refusal: Reasoning-enhanced fine-tuning for interpretable LLM safety. Findings of the Association for Computational Linguistics: ACL 2025, 2025.

[34] Improving alignment and robustness with circuit breakers. Advances in Neural Information Processing Systems.

[35] Steering Language Model Refusal with Sparse Autoencoders.

[36] Interpretation meets safety: A survey on interpretation methods and tools for improving LLM safety. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025.

[37] AI's empathy gap: The risks of conversational Artificial Intelligence for young children's well-being and key ethical considerations for early childhood education and care. Contemporary Issues in Early Childhood, 2025.

[38] Health Advisory on Social Media Use in Adolescents. 2023.

[39] Role of media reports in completed and prevented suicide: Werther v. Papageno effects. The British Journal of Psychiatry, 2010.

[40] Prevention, early intervention, harm reduction, and treatment of substance use in young people. The Lancet Psychiatry, 2016.

[41] Perceived barriers and facilitators to mental health help-seeking in young people: a systematic review. BMC Psychiatry, 2010.

[42] Are cyberbullying intervention and prevention programs effective? A systematic and meta-analytical review. Aggression and Violent Behavior, 2019.

[43] OpenAI GPT-5 System Card. arXiv preprint arXiv:2601.03267.

[44] Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805.

[45] The dawn of the AI era: Teens, parents, and the adoption of generative AI at home and school. Common Sense Media, 2024. Available online: https://www.commonsensemedia.org/sites/default/files/research/report/2024-the-dawn-of-the-ai-era_final-release-for-web.pdf (accessed on 4 November 2025).

[46] MinorBench: A hand-built benchmark for content-based risks for children. AI for Children: Healthcare, Psychology, Education.

[47] Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions. arXiv preprint arXiv:2506.13510.

[48] Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems.

[49] Bullying, cyberbullying, and suicide. Archives of Suicide Research, 2010.

[50] The effectiveness of an intervention to promote awareness and reduce online risk behavior in early adolescence. Journal of Youth and Adolescence, 2016.

[51] Adolescent substance use disorder treatment: an update on evidence-based strategies. Current Psychiatry Reports, 2019.

[52] Qwen2.5 Technical Report. arXiv.

[53] gpt-oss-120b & gpt-oss-20b model card.

[54] Mistral 7B. arXiv.

[55] Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.

[56] Term-weighting approaches in automatic text retrieval. Information Processing & Management, 1988.

[57] Efficient few-shot learning without prompts. arXiv preprint arXiv:2209.11055.

[58] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities. arXiv preprint arXiv:2507.06261.