Title resolution pending

2023 · arXiv 2311.08592

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain

cs.CL · 2025-09-07 · unverdicted · novelty 6.0

CoRT achieves a 95% average attack success rate across nine LLMs by using iterative risk-concealing prompts and a controller that scores concealment levels, evaluated on a new 522-instruction financial risk benchmark.

ShieldGemma: Generative AI Content Moderation Based on Gemma

cs.CL · 2024-07-31 · unverdicted · novelty 4.0

ShieldGemma delivers a family of Gemma2-based classifiers that outperform Llama Guard and WildGuard on public safety benchmarks while introducing a synthetic-data curation pipeline for safety tasks.

citing papers explorer

Showing 2 of 2 citing papers.

Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain

cs.CL · 2025-09-07 · unverdicted · none · ref 32

ShieldGemma: Generative AI Content Moderation Based on Gemma

cs.CL · 2024-07-31 · unverdicted · none · ref 17
