Artificial Intelligence Review

Safeguarding large language models: A survey · 2025

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

CR4T is a model-agnostic framework using lightweight risk detection and domain-conditioned rewriting to convert unsafe or refusal-style LLM responses into developmentally appropriate guidance for adolescents.

Do LLMs have core beliefs?

cs.LG · 2026-05-05 · unverdicted · novelty 5.0

LLMs generally fail to maintain stable worldviews under adversarial conversational pressure, indicating they lack core beliefs akin to those in human cognition.

citing papers explorer

Showing 2 of 2 citing papers.

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety · cs.CL · 2026-05-20 · unverdicted · none · ref 14
Do LLMs have core beliefs? · cs.LG · 2026-05-05 · unverdicted · none · ref 78
