Towards safer large language models through machine unlearning

Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang · 2024 · arXiv 2402.10058

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

PPU-Bench: Real World Benchmark for Personalized Partial Unlearning in Vision Language Models

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

PPU-Bench is a real-world benchmark exposing forget-retain trade-offs in MLLM unlearning and motivating Boundary-Aware Optimization to enforce intra-subject factual boundaries.

Is your algorithm unlearning or untraining?

cs.LG · 2026-04-09 · conditional · novelty 7.0

Machine unlearning conflates reversing the influence of specific training examples (untraining) with removing the full underlying distribution or behavior (unlearning).

ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

ASRU combines activation redirection and reward-optimized fine-tuning to unlearn cross-modal sensitive knowledge in MLLMs, reporting +24.6% better unlearning effectiveness and 5.8x higher generation quality on Qwen3-VL while preserving utility with limited retained data.

Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

A contrastive visual forgetting technique constrained to the null space of retained knowledge enables targeted unlearning of visual concepts in MLLMs while preserving non-target visual and all textual knowledge.

Representation-Guided Parameter-Efficient LLM Unlearning

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.

From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models

cs.CL · 2026-04-15 · unverdicted · novelty 6.0

MAGE builds a memory graph from a user anchor to generate its own supervision signals for corpus-free unlearning, matching the effectiveness of methods that use external reference data on TOFU and RWKU benchmarks.

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

cs.LG · 2024-08-14 · accept · novelty 4.0

The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.

Machine Unlearning: A Comprehensive Survey

cs.CR · 2024-05-13 · unverdicted · novelty 2.0

A survey classifying machine unlearning into centralized (exact and approximate), distributed/irregular data, verification, and privacy/security categories with technique overviews.

citing papers explorer

Showing 8 of 8 citing papers.

PPU-Bench: Real World Benchmark for Personalized Partial Unlearning in Vision Language Models · cs.CV · 2026-05-09 · unverdicted · none · ref 14
Is your algorithm unlearning or untraining? · cs.LG · 2026-04-09 · conditional · none · ref 19
ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models · cs.CL · 2026-05-15 · unverdicted · none · ref 12
Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning · cs.AI · 2026-05-07 · unverdicted · none · ref 6
Representation-Guided Parameter-Efficient LLM Unlearning · cs.CL · 2026-04-19 · unverdicted · none · ref 174
From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models · cs.CL · 2026-04-15 · unverdicted · none · ref 3
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities · cs.LG · 2024-08-14 · accept · none · ref 139
Machine Unlearning: A Comprehensive Survey · cs.CR · 2024-05-13 · unverdicted · none · ref 81
