arXiv preprint arXiv:2408.06223
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: unverdicted (3)
Citing papers
- TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models
  TimeROME-DLM enables training-free knowledge editing in masked diffusion language models via temporal causal tracing and low-rank residual edit memory applied at inference time.
- On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study
  Systematic experiments reveal that activation steering trades fluency for concept control, is less effective on instruction-tuned models, and that prompting/SFT excel at injection but not removal, with textual metrics correlating to LLM judges.
- Safe-RULE: Safe Reinforcement UnLEarning
  Safe-RULE introduces a reinforcement unlearning defense for offline safe RL that counters data poisoning by removing malicious data influence while preserving task performance and safety.