Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain

2025 · cs.CL · arXiv 2509.10546

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

Large Language Models (LLMs) are increasingly deployed in finance, where unsafe behavior can lead to serious regulatory risks. However, most red-teaming research focuses on overtly harmful content and overlooks attacks that appear legitimate on the surface yet induce regulatory-violating responses. We address this gap by introducing a controllable black-box multi-turn risk-concealed red-teaming framework (CoRT) that progressively conceals surface-level risk while exploiting regulatory-violating behaviors. CoRT contains two key components: (i) a Risk Concealment Attacker (RCA) that generates multi-turn prompts via iterative refinement, and (ii) a Risk Concealment Controller (RCC) that predicts a turn-level Risk Concealment Score (RCS) to steer RCA's follow-up style. We also built a domain-specific benchmark, FinRisk-Bench, with 522 instructions spanning six financial risk categories. Experiments on nine widely used LLMs show that CoRT (RCA) achieves 93.19% average attack success rate (ASR), and CoRT (RCA+RCC) further improves the average ASR to 95.00%. Our code and FinRisk-Bench are available at https://github.com/gcheng128/CoRT.
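
The average ASR figures quoted in the abstract can be read as a mean over per-model success rates. The abstract does not state how the average is formed, so the following is a minimal sketch that assumes an unweighted (macro) mean across models and assumes each attack attempt has already been judged as a success or failure. The function names and the toy data are hypothetical and are not taken from the CoRT repository.

```python
from statistics import mean


def attack_success_rate(outcomes):
    """Return the percentage of attack attempts judged successful.

    `outcomes` is a non-empty sequence of booleans, one per attempt.
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return 100.0 * sum(outcomes) / len(outcomes)


def average_asr(per_model_outcomes):
    """Unweighted mean of per-model ASR values (an assumed macro average)."""
    return mean(attack_success_rate(o) for o in per_model_outcomes.values())


# Hypothetical toy data: two models, four attempts each.
toy = {
    "model_a": [True, True, False, True],   # 3 of 4 succeed
    "model_b": [True, True, True, True],    # 4 of 4 succeed
}
print(average_asr(toy))
```

If the benchmark had different numbers of instructions per model, a pooled (micro) average over all attempts would generally give a different figure, which is why the averaging choice is flagged as an assumption here.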

representative citing papers

MESA: Prioritizing Vulnerable Communication Channels for Securing Multi-Agent Systems

cs.CR · 2026-06-29 · unverdicted · novelty 6.0

MESA ranks MAS communication edges by vulnerability via graph-theoretic metrics and dynamic probes, achieving mean Spearman ρ=+0.60 correlation with empirical per-edge attack success and 3x interception gain when monitoring the top 10%.

citing papers explorer

MESA: Prioritizing Vulnerable Communication Channels for Securing Multi-Agent Systems

cs.CR · 2026-06-29 · unverdicted · none · ref 13 · internal anchor
