arXiv preprint arXiv:2403.01251

Accelerating greedy coordinate gradient via probe sampling · 2024 · arXiv 2403.01251

3 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.

ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs

cs.CR · 2025-11-04 · unverdicted · novelty 5.0

ASTRA is an automated closed-loop framework that discovers, retrieves, and evolves jailbreak attack strategies for LLMs using a dynamic three-tier strategy library and outperforms baselines in black-box settings.

Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models

cs.LG · 2024-10-20 · unverdicted · novelty 5.0

Faster-GCG improves GCG efficiency 8x via regularization, temperature sampling, and duplicate avoidance, reaching 78.1% success rate with 32K evaluations across five aligned LLMs.

citing papers explorer


REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations · cs.CL · 2026-05-12 · unverdicted · none · ref 66
ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs · cs.CR · 2025-11-04 · unverdicted · none · ref 59
Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models · cs.LG · 2024-10-20 · unverdicted · none · ref 11
