MART: Improving LLM Safety with Multi-round Automatic Red-Teaming

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · method 1

citation-polarity summary

fields

cs.CR 5 · cs.AI 2

polarities

background 3

representative citing papers

Adaptive Instruction Composition for Automated LLM Red-Teaming

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

Adaptive Instruction Composition uses a neural contextual bandit trained with RL to adaptively combine crowdsourced texts, generating more effective and more diverse LLM jailbreaks on HarmBench than random composition or prior adaptive methods.
