Learning to refuse: Towards mitigating privacy risks in LLMs
4 Pith papers cite this work.
citing papers explorer
- DualOptim+: Bridging Shared and Decoupled Optimizer States for Better Machine Unlearning in Large Language Models
  DualOptim+ introduces base and delta optimizer states that adaptively bridge shared and decoupled components based on gradient directional conflicts to improve trade-offs in LLM machine unlearning.
- Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning
  Existing LLM unlearning methods fail honesty standards by hallucinating on forgotten knowledge; ReVa improves rejection rates nearly twofold while enhancing retained honesty.
- CURaTE: Continual Unlearning in Real Time with Ensured Preservation of LLM Knowledge
  CURaTE performs continual unlearning in LLMs in real time by using sentence embeddings to detect and refuse forget requests without changing model parameters, achieving effective forgetting and perfect knowledge preservation.
- Malicious and Unintentional Disclosure Risks in Large Language Models for Code Generation
  The study decomposes memorization risks in code LLMs into unintentional and malicious disclosure, demonstrates assessment methods on OLMo models and Dolma data, and finds that data changes affect risks differently depending on sensitive information type.