Disentangled Safety Adapters decouple safety computations from task-optimized LLMs via lightweight adapters, yielding up to 53% better AUC on safety tasks and dynamic inference-time alignment with reduced performance trade-offs.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
Training-free methods for LLM trustworthiness show inconsistent results across dimensions, with clear trade-offs in utility, robustness, and overhead depending on where they intervene during inference.
citing papers explorer
-
Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment
Disentangled Safety Adapters decouple safety computations from task-optimized LLMs via lightweight adapters, yielding up to 53% better AUC on safety tasks and dynamic inference-time alignment with reduced performance trade-offs.