Toward understanding unlearning difficulty: A mechanistic perspective and circuit-guided difficulty metric

Jiali Cheng, Ziheng Chen, Chirag Agarwal, Hadi Amiri · 2026 · arXiv 2601.09624

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

CURE: Circuit-Aware Unlearning for LLM-based Recommendation

cs.IR · 2026-04-04 · unverdicted · novelty 7.0

CURE disentangles LLM recommendation circuits into forget-specific, retain-specific, and task-shared modules with tailored update rules to achieve more effective unlearning than weighted baselines.

Towards Understanding the Robustness of Sparse Autoencoders

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Integrating pretrained sparse autoencoders into LLM residual streams reduces jailbreak success rates by up to 5x across multiple models and attacks.

citing papers explorer

CURE: Circuit-Aware Unlearning for LLM-based Recommendation
cs.IR · 2026-04-04 · unverdicted · none · ref 10
Towards Understanding the Robustness of Sparse Autoencoders
cs.LG · 2026-04-20 · unverdicted · none · ref 23
