Training data extraction from pre-trained language models: A survey

2023 · arXiv 2305.16157

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training Paradigms

cs.LG · 2026-04-18 · unverdicted · novelty 7.0

Unlearnable examples fail under pretraining-finetuning due to semantic filtering by frozen layers, but Shallow Semantic Camouflage restores effectiveness by confining perturbations to semantically valid subspaces.

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

First unified survey formalizing Pretraining Data Exposure across exposure levels and reviewing attack, defense, and contamination methods for LLMs.

Selective Token-Level Cryptographic Redaction for Privacy-Preserving Clinical Deployment of Large Language Models

cs.CL · 2026-06-02 · unverdicted · novelty 4.0

HERALD selectively encrypts sensitive tokens via medical NER, POS policies, and deterministic ciphertext substitution to enable privacy-preserving clinical LLM use while recovering near-plaintext task performance.

Industry Practitioners Perspectives on AI Model Quality: Perceptions, Challenges, and Solutions

cs.SE · 2024-02-26 · unverdicted · novelty 4.0

Industry AI practitioners view model quality through nine attributes with context-dependent priorities, where data imbalance is a key challenge addressed by strategies like active learning, as confirmed by interviews and a follow-up survey.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Industry Practitioners Perspectives on AI Model Quality: Perceptions, Challenges, and Solutions

cs.SE · 2024-02-26 · unverdicted · none · ref 69

Industry AI practitioners view model quality through nine attributes with context-dependent priorities, where data imbalance is a key challenge addressed by strategies like active learning, as confirmed by interviews and a follow-up survey.
