Adversarial examples are not easily detected: Bypassing ten detection methods

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

roles: background 3

citation-polarity summary

polarities: unclear 2, background 1

representative citing papers

Adversarial Hubness in Multi-Modal Retrieval

cs.CR · 2024-12-18 · unverdicted · novelty 7.0

Adversarial hubs can be generated to be retrieved as top-1 for over 84% of test queries in text-to-image retrieval, far exceeding natural hubs.

Scalable Extraction of Training Data from (Production) Language Models

cs.LG · 2023-11-28 · conditional · novelty 7.0

Adversaries can scalably extract gigabytes of training data from open, semi-open, and closed language models via querying attacks, including a divergence method that increases extraction rates 150x on aligned models like ChatGPT.

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

An agentic red teaming system automates the creation of adversarial testing workflows from natural language goals, unifying ML and generative AI attacks and achieving an 85% success rate on Meta Llama Scout with no custom human code.

Low-Resource Languages Jailbreak GPT-4

cs.CL · 2023-10-03 · conditional · novelty 6.0

Translating unsafe inputs to low-resource languages jailbreaks GPT-4 at rates on par with or exceeding state-of-the-art attacks.

Improved Baselines with Visual Instruction Tuning

cs.CV · 2023-10-05 · conditional · novelty 4.0

Simple changes to LLaVA using CLIP-ViT-L-336px, an MLP connector, and academic VQA data yield state-of-the-art results on 11 benchmarks with only 1.2M public examples and one-day training on 8 A100 GPUs.
