Membership inference attack susceptibility of clinical language models

· 2021 · arXiv 2104.08305

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Detecting Pretraining Data from Large Language Models

cs.CL · 2023-10-25 · conditional · novelty 7.0

Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving a 7.4% improvement over prior methods on the new WikiMIA benchmark.

MC-PDD: Masked Corpus-Level Pretraining Data Detection for Black-Box Large Language Models

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

A masked-token hit-rate comparison method detects pretraining data membership in black-box LLMs with performance comparable to white-box approaches.

Data Compressibility Quantifies LLM Memorization

cs.CL · 2025-07-08 · unverdicted · novelty 5.0

Set-level data entropy estimators show linear correlation with LLM memorization scores, forming the Entropy-Memorization Linearity.

Towards the Anonymization of the Language Modeling

cs.CL · 2025-01-05 · unverdicted · novelty 4.0

Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.

citing papers explorer

Showing 3 of 3 citing papers after filters.

MC-PDD: Masked Corpus-Level Pretraining Data Detection for Black-Box Large Language Models

cs.CL · 2026-06-06 · unverdicted · none · ref 30

A masked-token hit-rate comparison method detects pretraining data membership in black-box LLMs with performance comparable to white-box approaches.

Data Compressibility Quantifies LLM Memorization

cs.CL · 2025-07-08 · unverdicted · none · ref 53

Set-level data entropy estimators show linear correlation with LLM memorization scores, forming the Entropy-Memorization Linearity.

Towards the Anonymization of the Language Modeling

cs.CL · 2025-01-05 · unverdicted · none · ref 29

Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.
