Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection

Hoang Phan, Victor Li, Qi Lei · 2025 · DOI 10.18653/v1/2025.findings-emnlp.503

3 Pith papers cite this work.

representative citing papers

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

Safety Reflection Pretraining adds regular safety reflections to pretraining data to integrate self-monitoring and reduce unsafe generalization from safe data in LLMs.

Understanding the Self-Reflection Mechanisms of LLMs through Biased Attitude Associations

cs.SI · 2026-05-30 · unverdicted · novelty 4.0

ReBias-Lens shows that LLM self-reflection produces layer-wise smoothing of global valence fluctuations, which reduces behavioral bias overall yet selectively locks in and amplifies certain category-specific biases.

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

cs.CL · 2026-05-13 · unverdicted · novelty 3.0

AERIC uses a 387-parameter head on LLM hidden states to detect implicit harm anticipatorily within the same forward pass. It reports AUROC gains on DiaSafety and Harmful Advice, as well as low-latency trigger rates on HarmBench and SocialHarmBench.
