Training Details. We have done all the training of LLMs with LLaMA-Factory (Zheng et al., 2024), which is a popular toolbox for LLM training

into the safety training data · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection

cs.LG · 2026-02-08 · conditional · novelty 6.0

OGPSA projects safety gradients orthogonal to a low-rank subspace from general capability gradients, improving safety-utility trade-offs in SFT and DPO pipelines on Qwen2.5-7B and Llama3.1-8B.
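
The projection step named in this summary can be illustrated with a minimal sketch. It is not the OGPSA implementation: the function names `low_rank_basis` and `project_orthogonal`, the use of an SVD to obtain the subspace, and the choice of rank are all assumptions made for illustration, and the gradients here are plain random vectors rather than model gradients.

```python
import numpy as np


def low_rank_basis(capability_grads: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (d x rank) for the dominant subspace of stacked gradients.

    `capability_grads` has shape (n, d): one flattened gradient per row.
    Hypothetical helper; the SVD is an assumed way to get a low-rank subspace.
    """
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(capability_grads, full_matrices=False)
    return vt[:rank].T


def project_orthogonal(safety_grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of `safety_grad` lying in span(basis): g - U U^T g."""
    return safety_grad - basis @ (basis.T @ safety_grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    capability_grads = rng.normal(size=(8, 16))  # toy stand-in for capability gradients
    basis = low_rank_basis(capability_grads, rank=4)
    safety_grad = rng.normal(size=16)            # toy stand-in for a safety gradient
    projected = project_orthogonal(safety_grad, basis)
    # The projected gradient has (numerically) no component in the subspace.
    print(np.allclose(basis.T @ projected, 0.0))
```

In this toy setup, an update along `projected` leaves the directions captured by `basis` untouched to first order, which is the intuition behind projecting one objective's gradient away from another's subspace.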

citing papers explorer

Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection

cs.LG · 2026-02-08 · conditional · none · ref 29

OGPSA projects safety gradients orthogonal to a low-rank subspace from general capability gradients, improving safety-utility trade-offs in SFT and DPO pipelines on Qwen2.5-7B and Llama3.1-8B.
