InInternational Conference on Learning Representations

Measuring massive multitask language understanding · 2025 · arXiv 2507.21134

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

VoxSafeBench: Not Just What Is Said, but Who, How, and Where

cs.SD · 2026-04-16 · unverdicted · novelty 8.0

VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation

cs.CR · 2026-05-06 · unverdicted · novelty 6.0

NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.

StealthGraph: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation

cs.CL · 2026-01-08 · unverdicted · novelty 6.0

StealthGraph generates implicit domain-relevant harmful prompts via knowledge-graph guidance and two-strategy obfuscation to enable more realistic red-teaming of LLM safety.

citing papers explorer

Showing 1 of 1 citing paper after filters.

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation cs.CR · 2026-05-06 · unverdicted · none · ref 59
NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.

InInternational Conference on Learning Representations

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer