pith.

Mixed citations

Jailbreak and guard aligned language models with only few in-context demonstrations. arXiv preprint arXiv:2310.06387.

Mixed citation behavior. Most common role is background (60%).

20 Pith papers citing it
Background: 60% of classified citations

citation-role summary

background: 3 · method: 2

representative citing papers

Secure LLM Fine-Tuning via Safety-Aware Probing

cs.LG · 2025-05-22 · unverdicted · novelty 6.0

SAP locates safety-correlated directions via contrastive signals and perturbs hidden-state propagation with a lightweight probe to preserve safety while fine-tuning LLMs for task performance.