Users show limited grasp of GenAI smartphone data practices but heightened privacy concerns across collection, storage, and control, calling for better transparency and user controls.
Privacy in large language models: Attacks, defenses and future directions
7 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 7representative citing papers
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
RedDiffuser is a reinforced diffusion framework that generates adversarial visual contexts to audit and expose widespread multimodal safety failures in VLMs, increasing unsafe response rates by up to 10.69% on LLaVA with transfer to other models.
Safe-SAIL supplies a pre-explanation metric and segment-level simulation to interpret 1758 safety SAE features across pornography, politics, violence, and terror, with public models and tools released.
Privacy attacks on LLMs show strong signals for membership inference and backdoors but weaker performance for attribute inference and data extraction, with risks highly dependent on system configuration.
LLM-CEG applies differential privacy during fine-tuning of DistilGPT-2 to reduce membership inference attack success by 71.5% while increasing out-of-distribution utility by 47-50%.
Sentra-Guard reports 99.96% detection of adversarial LLM prompts with AUC 1.00 and ASR of 0.004% using a hybrid SBERT-FAISS and transformer classifier architecture with multilingual translation and human feedback.
citing papers explorer
-
Understanding User Privacy Perceptions of GenAI Smartphones
Users show limited grasp of GenAI smartphone data practices but heightened privacy concerns across collection, storage, and control, calling for better transparency and user controls.
-
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
-
RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion
RedDiffuser is a reinforced diffusion framework that generates adversarial visual contexts to audit and expose widespread multimodal safety failures in VLMs, increasing unsafe response rates by up to 10.69% on LLaVA with transfer to other models.
-
Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework
Safe-SAIL supplies a pre-explanation metric and segment-level simulation to interpret 1758 safety SAE features across pornography, politics, violence, and terror, with public models and tools released.
-
On the Privacy of LLMs: An Ablation Study
Privacy attacks on LLMs show strong signals for membership inference and backdoors but weaker performance for attribute inference and data extraction, with risks highly dependent on system configuration.
-
LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models
LLM-CEG applies differential privacy during fine-tuning of DistilGPT-2 to reduce membership inference attack success by 71.5% while increasing out-of-distribution utility by 47-50%.
-
Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts
Sentra-Guard reports 99.96% detection of adversarial LLM prompts with AUC 1.00 and ASR of 0.004% using a hybrid SBERT-FAISS and transformer classifier architecture with multilingual translation and human feedback.