QLoRA: Efficient Finetuning of Quantized LLMs
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 3 unverdicted
citing papers explorer
- Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
  HMNS is a new jailbreak method that uses causal head identification and nullspace-constrained injection to achieve higher attack success rates than prior techniques on aligned language models.
- Leveraging RAG for Training-Free Alignment of LLMs
  RAG-Pref is a training-free RAG-based alignment technique that conditions LLMs on contrastive preference samples during inference, yielding over 3.7x average improvement in agentic attack refusals when combined with offline methods across five LLMs.
- Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments
  AnalyticScore applies new FGTI interpretability principles to text-based scoring and achieves accuracy within 0.06 QWK of uninterpretable state-of-the-art while matching human featurization on the ASAP-SAS dataset.