Title resolution pending

Association for Computational Linguistics · 2023 · DOI 10.18653/v1/2023.emnlp-main.84

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open at publisher browse 2 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Improving LLM Unlearning Robustness via Random Perturbations

cs.CL · 2025-01-31 · unverdicted · novelty 7.0

LLM unlearning is reframed as inadvertently installing backdoor triggers on forget-tokens; Random Noise Augmentation is introduced as a defense that improves robustness with theoretical guarantees.

Harder to Defend: Towards Chinese Toxicity Attacks via Implicit Enhancement and Obfuscation Rewriting

cs.CL · 2026-05-21 · unverdicted · novelty 5.0

CITA generates Chinese implicit toxicity samples that cause 69.48% average missed detection across seven tested detectors while preserving harmfulness, and the same data improves robustness when used to fine-tune a CITD defense model.

citing papers explorer

Showing 2 of 2 citing papers.

Improving LLM Unlearning Robustness via Random Perturbations cs.CL · 2025-01-31 · unverdicted · none · ref 34
LLM unlearning is reframed as inadvertently installing backdoor triggers on forget-tokens; Random Noise Augmentation is introduced as a defense that improves robustness with theoretical guarantees.
Harder to Defend: Towards Chinese Toxicity Attacks via Implicit Enhancement and Obfuscation Rewriting cs.CL · 2026-05-21 · unverdicted · none · ref 5
CITA generates Chinese implicit toxicity samples that cause 69.48% average missed detection across seven tested detectors while preserving harmfulness, and the same data improves robustness when used to fine-tune a CITD defense model.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer