Proceedings of the 37th International Conference on Machine Learning

Francesco Croce, Matthias Hein · 2020

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SDM: A Powerful Tool for Evaluating Model Robustness

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.

Adaptive Probe-based Steering for Robust LLM Jailbreaking

cs.CR · 2026-05-19 · unverdicted · novelty 5.0

Adaptive probe-based steering guided by model extraction and activation statistics improves LLM jailbreak success rates from 6% to 70% average harmfulness without extra contrastive prompts or manual tuning.

citing papers explorer

Showing 2 of 2 citing papers.

SDM: A Powerful Tool for Evaluating Model Robustness cs.CV · 2026-05-19 · unverdicted · none · ref 5
Adaptive Probe-based Steering for Robust LLM Jailbreaking cs.CR · 2026-05-19 · unverdicted · none · ref 37
