Learning to refuse: Towards mitigating privacy risks in LLMs

2024 · arXiv 2407.10058

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DualOptim+: Bridging Shared and Decoupled Optimizer States for Better Machine Unlearning in Large Language Models

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

DualOptim+ introduces base and delta optimizer states that adaptively bridge shared and decoupled components based on gradient directional conflicts to improve trade-offs in LLM machine unlearning.

Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning

cs.LG · 2026-05-09 · conditional · novelty 6.0

Existing LLM unlearning methods fail honesty standards by hallucinating on forgotten knowledge; ReVa improves rejection rates nearly twofold while enhancing retained honesty.

CURaTE: Continual Unlearning in Real Time with Ensured Preservation of LLM Knowledge

cs.CL · 2026-04-16 · unverdicted · novelty 6.0

CURaTE performs continual unlearning in LLMs in real time by using sentence embeddings to detect and refuse forget requests without changing model parameters, achieving effective forgetting and perfect knowledge preservation.

Malicious and Unintentional Disclosure Risks in Large Language Models for Code Generation

cs.CR · 2025-03-27 · unverdicted · novelty 5.0

The study decomposes memorization risks in code LLMs into unintentional and malicious disclosure, demonstrates assessment methods on OLMo models and Dolma data, and finds that data changes affect risks differently depending on sensitive information type.

citing papers explorer

DualOptim+: Bridging Shared and Decoupled Optimizer States for Better Machine Unlearning in Large Language Models · cs.LG · 2026-05-20 · unverdicted · none · ref 4
Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning · cs.LG · 2026-05-09 · conditional · none · ref 5
CURaTE: Continual Unlearning in Real Time with Ensured Preservation of LLM Knowledge · cs.CL · 2026-04-16 · unverdicted · none · ref 2
Malicious and Unintentional Disclosure Risks in Large Language Models for Code Generation · cs.CR · 2025-03-27 · unverdicted · none · ref 27
