Proceedings of the 41st International Conference on Machine Learning

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

CR4T is a model-agnostic framework using lightweight risk detection and domain-conditioned rewriting to convert unsafe or refusal-style LLM responses into developmentally appropriate guidance for adolescents.

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

SafeHarbor introduces a hierarchical memory-augmented guardrail with adversarial rule extraction and entropy-driven self-evolution to balance safety and utility in LLM agents.

citing papers explorer

Showing 2 of 2 citing papers.

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety · cs.CL · 2026-05-20 · unverdicted · none · ref 21
SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety · cs.CR · 2026-05-07 · unverdicted · none · ref 11
