Privacy in large language models: Attacks, defenses and future directions

· 2023 · arXiv 2310.10383

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

representative citing papers

Understanding User Privacy Perceptions of GenAI Smartphones

cs.CR · 2026-04-07 · unverdicted · novelty 7.0

Users show limited grasp of GenAI smartphone data practices but heightened privacy concerns across collection, storage, and control, calling for better transparency and user controls.

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion

cs.CV · 2025-03-08 · unverdicted · novelty 6.0

RedDiffuser is a reinforced diffusion framework that generates adversarial visual contexts to audit and expose widespread multimodal safety failures in VLMs, increasing unsafe response rates by up to 10.69% on LLaVA with transfer to other models.

Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework

cs.LG · 2025-09-11 · unverdicted · novelty 5.0

Safe-SAIL supplies a pre-explanation metric and segment-level simulation to interpret 1758 safety SAE features across pornography, politics, violence, and terror, with public models and tools released.

On the Privacy of LLMs: An Ablation Study

cs.CR · 2026-05-04 · unverdicted · novelty 4.0

Privacy attacks on LLMs show strong signals for membership inference and backdoors but weaker performance for attribute inference and data extraction, with risks highly dependent on system configuration.

LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models

cs.CR · 2026-04-26 · unverdicted · novelty 4.0

LLM-CEG applies differential privacy during fine-tuning of DistilGPT-2 to reduce membership inference attack success by 71.5% while increasing out-of-distribution utility by 47-50%.

Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts

cs.CR · 2025-10-26 · unverdicted · novelty 4.0

Sentra-Guard reports 99.96% detection of adversarial LLM prompts with AUC 1.00 and ASR of 0.004% using a hybrid SBERT-FAISS and transformer classifier architecture with multilingual translation and human feedback.

citing papers explorer

Showing 7 of 7 citing papers.

Understanding User Privacy Perceptions of GenAI Smartphones cs.CR · 2026-04-07 · unverdicted · none · ref 43
Users show limited grasp of GenAI smartphone data practices but heightened privacy concerns across collection, storage, and control, calling for better transparency and user controls.
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling cs.LG · 2026-04-22 · unverdicted · none · ref 33
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion cs.CV · 2025-03-08 · unverdicted · none · ref 23
RedDiffuser is a reinforced diffusion framework that generates adversarial visual contexts to audit and expose widespread multimodal safety failures in VLMs, increasing unsafe response rates by up to 10.69% on LLaVA with transfer to other models.
Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework cs.LG · 2025-09-11 · unverdicted · none · ref 23
Safe-SAIL supplies a pre-explanation metric and segment-level simulation to interpret 1758 safety SAE features across pornography, politics, violence, and terror, with public models and tools released.
On the Privacy of LLMs: An Ablation Study cs.CR · 2026-05-04 · unverdicted · none · ref 27
Privacy attacks on LLMs show strong signals for membership inference and backdoors but weaker performance for attribute inference and data extraction, with risks highly dependent on system configuration.
LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models cs.CR · 2026-04-26 · unverdicted · none · ref 15
LLM-CEG applies differential privacy during fine-tuning of DistilGPT-2 to reduce membership inference attack success by 71.5% while increasing out-of-distribution utility by 47-50%.
Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts cs.CR · 2025-10-26 · unverdicted · none · ref 1
Sentra-Guard reports 99.96% detection of adversarial LLM prompts with AUC 1.00 and ASR of 0.004% using a hybrid SBERT-FAISS and transformer classifier architecture with multilingual translation and human feedback.

Privacy in large language models: Attacks, defenses and future directions

fields

years

verdicts

representative citing papers

citing papers explorer