SG-Bench: Evaluating LLM safety generalization across diverse tasks and prompt types

Yutao Mou, Shikun Zhang, Wei Ye · 2024 · arXiv 2410.21965

3 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

VoxSafeBench: Not Just What Is Said, but Who, How, and Where

cs.SD · 2026-04-16 · unverdicted · novelty 8.0

VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.

Refusal Evaluation in Coding LLMs and Code Agents: A Systematic Review of Thirteen Malicious-Code Prompt Corpora (2023-2025)

cs.CR · 2026-05-19 · accept · novelty 7.0

A systematic review of thirteen malicious-code prompt corpora used to evaluate refusal in coding LLMs. It catalogs construction methods, surfaces gaps in human baselines, cross-corpus comparability, and malware taxonomies, and proposes methodological improvements.

Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers

cs.AI · 2026-05-28 · unverdicted · novelty 4.0

SafeDIG applies position-aware sparse feature transfer via SAEs in DiT models to reduce unsafe generations in target risk domains on FLUX.1 Dev and SD 3.5 while keeping source safety and quality.

citing papers explorer

Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers

cs.AI · 2026-05-28 · unverdicted · none · ref 34

SafeDIG applies position-aware sparse feature transfer via SAEs in DiT models to reduce unsafe generations in target risk domains on FLUX.1 Dev and SD 3.5 while keeping source safety and quality.
