arXiv preprint arXiv:2408.06223

2 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.CL 1 cs.LG 1

years

2026 2

verdicts

UNVERDICTED 2

representative citing papers

Safe-RULE: Safe Reinforcement UnLEarning

cs.LG · 2026-06-08 · unverdicted · novelty 5.0

Safe-RULE introduces a reinforcement unlearning defense for offline safe RL that counters data poisoning by removing malicious data influence while preserving task performance and safety.

citing papers explorer

Showing 2 of 2 citing papers.

  • On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study · cs.CL · 2026-06-10 · unverdicted · none · ref 133

    Systematic experiments reveal that activation steering trades fluency for concept control and is less effective on instruction-tuned models, that prompting and SFT excel at concept injection but not removal, and that textual metrics correlate with LLM judges.

  • Safe-RULE: Safe Reinforcement UnLEarning · cs.LG · 2026-06-08 · unverdicted · none · ref 24

    Safe-RULE introduces a reinforcement unlearning defense for offline safe RL that counters data poisoning by removing malicious data influence while preserving task performance and safety.