Extracting Training Data from Large Language Models

30th USENIX Security Symposium (USENIX Security 21)

8 Pith papers cite this work.

citing papers

Contrastive Identification and Generation in the Limit

cs.LG · 2026-05-07 · unverdicted · novelty 8.0 · ref 17

Contrastive pair presentations yield exact identifiability characterizations via a geometric refinement of Angluin's condition, a new contrastive closure dimension for generation, mutual incomparability with text identification, and a single algorithm that tolerates any finite corruption budget.

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain

cs.CL · 2026-05-09 · unverdicted · novelty 7.0 · ref 37

LLMs copy biased analyst ratings when making investment decisions, but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.

Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models

cs.CV · 2026-05-05 · unverdicted · novelty 7.0 · ref 26

CoVUBench is the first benchmark framework for evaluating multimodal copyright unlearning in LVLMs, using synthetic data, systematic variations, and a dual protocol that measures both forgetting efficacy and utility preservation.

Probing Privacy Leaks in LLM-based Code Generation via Test Generation

cs.SE · 2026-05-14 · unverdicted · novelty 6.0 · ref 7

A test-driven pipeline with an auto-constructed privacy feature library detects 2.56 times more confirmed privacy leaks in LLM-based code generation than existing baselines.

The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

cs.LG · 2026-05-21 · unverdicted · novelty 5.0 · ref 17

ZCP detects direct and evasive data contamination in LLMs by truncating chain-of-thought (CoT) reasoning and contrasting zero-CoT accuracy on original versus perturbed isomorphic datasets; it also introduces a Contamination Confidence metric.

Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective

cs.CR · 2026-04-20 · unverdicted · novelty 5.0 · ref 63

BPE tokenization creates a gibberish bias in code LLMs (CLLMs): because of shifts in the training data distribution, secrets with high character entropy but low token entropy are preferentially memorized.

ALDEN: Boosting Private Data Extraction from Retrieval-Augmented Generation Systems via Active Learning and Distribution Estimation

cs.IR · 2026-04-10 · unverdicted · novelty 5.0 · ref 55

ALDEN boosts private data extraction rates from RAG systems by combining active learning for query diversification with dynamic estimation of the underlying knowledge-base topic distribution.

Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure

cs.CL · 2026-05-03 · unreviewed · ref 6
