On the impossible safety of large AI models
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
fields: cs.LG 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
citing papers explorer
- High-probability zeroth-order online convex optimisation beyond Euclidean geometry
  Unified high-probability regret bounds for online convex optimisation with ℓq-Lipschitz losses via ℓp-regularised FTRL and cone-measure sampling from ℓr-spheres, for all p,q,r in [1,∞].
- Jailbroken: How Does LLM Safety Training Fail?
  LLM safety training fails due to competing objectives and mismatched generalization, enabling new jailbreaks that succeed on all unsafe prompts from red-teaming sets in GPT-4 and Claude.