Membership inference attacks against language models via neighbourhood comparison

· 2023 · arXiv 2305.18462

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

cs.CR · 2024-10-03 · unverdicted · novelty 7.0

ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.

Auditing Data Membership in Reinforcement Learning With Verifiable Rewards

cs.CR · 2025-11-18 · unverdicted · novelty 6.0

DIBA detects membership of prompts in RLVR training by measuring reward success changes and policy behavioral drift between pre- and post-RLVR model checkpoints.

Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set

cs.CR · 2026-05-01

citing papers explorer

Showing 3 of 3 citing papers.

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents · cs.CR · 2024-10-03 · unverdicted · none · ref 122
ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.
Auditing Data Membership in Reinforcement Learning With Verifiable Rewards · cs.CR · 2025-11-18 · unverdicted · none · ref 24
DIBA detects membership of prompts in RLVR training by measuring reward success changes and policy behavioral drift between pre- and post-RLVR model checkpoints.
Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set · cs.CR · 2026-05-01 · unreviewed · ref 72

