Toward understanding unlearning difficulty: A mechanistic perspective and circuit-guided difficulty metric

Jiali Cheng, Ziheng Chen, Chirag Agarwal, Hadi Amiri · 2026 · arXiv 2601.09624

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary: background 2

representative citing papers

TRACER: Token ReAssignment for Concept ERasure in Generative Recommendation

cs.IR · 2026-06-05 · unverdicted · novelty 7.0

TRACER uses token reassignment for concept-related items plus a coherence regularizer to unlearn specific concepts in generative recommendation while preserving utility better than baselines.

CURE: Circuit-Aware Unlearning for LLM-based Recommendation

cs.IR · 2026-04-04 · unverdicted · novelty 7.0

CURE disentangles LLM recommendation circuits into forget-specific, retain-specific, and task-shared modules with tailored update rules to achieve more effective unlearning than weighted baselines.

Towards Understanding the Robustness of Sparse Autoencoders

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Integrating pretrained sparse autoencoders into LLM residual streams reduces jailbreak success rates by up to 5x across multiple models and attacks.

citing papers explorer

TRACER: Token ReAssignment for Concept ERasure in Generative Recommendation
cs.IR · 2026-06-05 · unverdicted · none · ref 13
CURE: Circuit-Aware Unlearning for LLM-based Recommendation
cs.IR · 2026-04-04 · unverdicted · none · ref 10
Towards Understanding the Robustness of Sparse Autoencoders
cs.LG · 2026-04-20 · unverdicted · none · ref 23
